The script kaltura_aws.py is the main script to be run.
It lists, archives, restores videos in Kaltura.
It relies on environment variables to authenticate against the Kaltura KMC and AWS.
Start with the template files bash.rc and csh.rc to set the relevant environment variables.
Use the help option for further documentation
python kaltura_aws.py --help
In production we run te script in a Docker container under AWS Fargate, Please read more about this at README-AWS.md
Only videos that have not been played recently are archived. Only videos smaller than 10000000kb are archived
kaltura_aws.py provides parameters to select videos based on the following criteria:
- number of years since placed/created
- played/created within given number of years
- category id
- tag value
- the number of times the video was played
- status of entry - eg READY (2), QUEUED (0), ERROR (-1), .. see KalturaEntryStatus class in KalturaClient.Plugins.Core
- entry id
Most critera can be combined.
All actions that trigger changes, like archiving videos in s3 or replacing with the place holder video are perfomed in dryRun mode by default.
Videos entries can be in three states:
- initial
- archived
- place_holder
In the initial state:
- s3 does not contain a copy of the entry's original flavor
- entry does not have the 'archived_to_s3' tag
- entry does not have the 'deleted_flavors' tag
- its original flavor is the entry's 'real' video - lets call this the real original flavor
In the archived state:
- s3 contain a copy of the entry's real original flavor
- entry has the 'archived_to_s3' tag
- entry does not have the 'deleted_flavors' tag -- same as initial state
- its original flavor is the real original flavor -- same as initial state
In the place_holder state:
- s3 contains a copy of the entry's real original flavor -- same as archived state
- entry has the 'archived_to_s3' tag -- same as archived state
- entry has the 'deleted_flavors' tag
- its original flavor is the place holder video
The kaltura_aws.py subcommands moves entries between states:
initial state -------- copy cmd ------> archive state
archive state -------- copy cmd ------> archive state
archive state -------- place_holder cmd --> place_holder state
place_holder state --- place_holder cmd --> place_holder state
place_holder state --- restore cmd -------> archived state
Subcommands print errors if they are applied to entries in a state that they can not work with, for example the place_holder command refuses to work with entries in the initial state.
In addition kaltura_aws.py has a health and count subcommands.
if entry.original_flavor != None
and if entry.original_flavor.status == READY
and AWS bucket does not contain a file with the same name as entries id
transfer to AWS-s3 bucket
apply 'archived_to_s3' tag to entry
if entry.original_flavor != None
and if entry.original_flavor.status == READY
and AWS bucket contains a file with the same name as entries id
and that file has the same size as the original flavor
delete all flavors including original
upload place holder video; create new thumbnail;
apply tag 'flavors_deleted'
wait until entry goes into READY state (sleep in between checks)
check the following invariant - see kaltura_aws.entry_health_check method
has original flavor and is in READY status
if AWS S3 contains a file with the entries ID
then the entry has the 'archived_to_s3' tag
if the entry does not have the 'archived_to_s3' tag
then there is no entry in AWS
if the entry has the 'flavors_deleted' tag
it should also have the 'archived_to_s3' tag
if the entry does not have 'flavors_deleted' and has 'archived_to_s3' tag
size of original flavor and size of s3 entry should match
to archive and replace with the place holder video choose a filter option that captures the videos you want to work on, then do:
kaltura_aws.py s3copy [filter-options]
kaltura_aws.py replace_video [filter-options]
kaltura_aws.py health [filter-options]
The health command should list all selected videos in a HEALTHY state and all videos should be tagged 'archived_to_s3' and 'flavors_deleted'
To count matchig videos
#entries with archived_to_s3
kaltura_aws.py count --tag archived_to_s3
#entries without archived_to_s3
kaltura_aws.py count --tag !archived_to_s3
# entries that were played between 2-3 years ago
kaltura_aws.py count --played_within 3 --unplayed_for 2
# entries that are not in the READY state
./kaltura_aws.py count --status --status 1 -2 0 1 7 4
To list videos that were created 3-4 years ago but never plyed:
python kaltura_aws.py list --created_before 3 --created_within 4 --plays 0
Kaltura updates the lastPlayedAt property of videos when they are played. So when a placeholder video is played it receives the current date as is its lastPlayedAt property
Apply the following steps to videos with the tag "archived_to_s3" that have been played within the last year
- Request restoral from AWS Glacier IF not yet available in S3; This request will take 5 - 12 hours to complete.
- Restore Video:
- replace placeholder video with original video IF video is availabe in S3
- recreate flavors and remove tag: "flavors_deleted"
It is rare that videos are edited. Very few users have this ability. The procedure above will result in errors if
- a video is archived
- a user replaces the original flavor with a new version after archival
The kaltura_aws.py command will error out when asked to replace the video unless the file size of the initial video and the versioned video are the same. So only if a video is replaced after archival which is unlikely since few users have that ability and if the versioned video has the same size, which is even more unlikely, will the procedure loose the versioned video.
add_ldap_status.py adds three columns 'status', 'name', and 'org_unit' to a tsv report. It derives these values from the creator_id value in the tsv report and ldap status data.
It uses the hard coded ldap base, 'o=princeton university,c=us', and the following ldap expression todetermine whether a user is stillactive, see puldap.py
(&(uid=<NETID>)(|(puresource=authentication=enabled)(puresource=authentication=goingaway)))
Use as follows:
./kaltura_aws.py list [filter-options] > list.tsv
cat list.tsv | add_ldap_status.py > emhanced_list.tsv
counts videos played / created within given years, how many have place holder videos, ... ;
useful to estimate how much more space can be saved by replacing videos with the placeholder
stats.py
set environment variabes so code connects to TEST KMC, then run in project's top directory
setenv PYTHONPATH src
python -m unittest discover -v src/test
to run individual tests do something along these lines:
python src/test/test_kaltura_aws.py TestKalturaAwsCli.testa_list_played_within_unplayed_for
run on the command line - you may have to install the openldap package
on EC2 server run 'sudo yum install openldap-clients'
Look for 'puresource=authentication=disabled' ; print only dn property
ldapsearch -LLL -x -H "ldaps://$LDAP_SERVER/" -b "o=Princeton University,c=US" -D "uid=NETID,o=princeton university,c=us" -W 'puresource=authentication=disabled' 1.1
Look for entries with a valid uid and with no pususpended value; in addition to dn, print uid and puresource properties
ldapsearch -LLL -x -H "ldaps://$LDAP_SERVER/" -b "o=Princeton University,c=US" -D "uid=NETID,o=princeton university,c=us" -W '(&(uid=*)(!(pususpended=*)))' uid puresource
The code in the python subdirectory runs only with Python v2.x, see .python-version
Please install pyenv according to github.com/pyenv/pyenv Pyenv allows you to install a local python copy and python libraries which functions independently of the system version.
# install pyenv
pyenv install 2.7
pyenv versions
which python # should show ~/.pyenv/shims/python
python -V # should return Python 2.7
Install necessary packages
pip install setuptools
pip install -r src/requirements.txt
V1.0.0 kaltura archiver executing in docker
V1.1.0 kaltura archiver and restorer in same docker image
V1.1.1 INFO logging in restore.rc, ERROR logging in archive.rc, pip install -r src/requirements.txt
V1.2.0
- Use Kaltura Api v15.10.0
- Filter on play lower than instead of equal number
- Stubs only for category filter
- kaltura_aws.py - exits with no-zero only if incorrect usage
- bash scripts archive.rc/restore.rc exit on first failed command
- bash scripts archive.rc/restore.rc rely on envirnmane variables: N_YEARS, DOIT
- implemented additional test cases
- moved docker_pull to bin/ directory
- added convenience scripts bin/docker_pull, bin/start_kaltura_session.sh