This Python algorithm takes a music track as input and outputs a music video that fits it, made from segments of YouTube MVs. This project was made as part of a Master's thesis in the Machine Learning department of Tsinghua University. See here for a summary of the whole project: related work, database constitution, generation algorithm, and results on how well the generated videos can fool humans.
The easiest way to set up the environment is with conda.
- Python 3.5.6
- Librosa: `conda install -c conda-forge librosa`
- PySceneDetect: `cd` to the folder, then `python setup.py install`
- OpenCV: `pip install opencv-python`
- MSAF: `pip install msaf`
- ACRCloud Python SDK: `python -m pip install git+https://github.com/acrcloud/acrcloud_sdk_python`
For the algorithm to work, you must have a data folder containing, for each video:
- the video file (e.g. video1.mp4)
- a folder with the same name (e.g. video1/) containing:
  - all the video scenes from this video (e.g. video1_001.mp4, video1_002.mp4, ...)
  - for each scene file, a JSON file with 2 keys: its color histogram (array of size 768) and its length in seconds
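As a sketch, a scene's JSON sidecar could be loaded and validated like this. The key names `histogram` and `length` are assumptions for illustration; match them to whatever keys your extraction step actually writes:

```python
import json

def load_scene_features(json_path):
    """Load a scene's color histogram and duration from its JSON sidecar.

    Assumes keys named "histogram" (list of 768 numbers) and "length"
    (duration in seconds) -- adapt to the keys your pipeline writes.
    """
    with open(json_path) as f:
        features = json.load(f)
    assert len(features["histogram"]) == 768, "expected a 768-bin histogram"
    assert features["length"] > 0, "scene length must be positive"
    return features["histogram"], features["length"]
```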
For better results, you can also use a file statistics/songs_on_server.csv containing info on each video:
- the id (the file is named id.mp4)
- the style of the video (electro/hiphop/pop/rock)
- the resolution of the video (size within the black bars)
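A minimal sketch of reading this CSV with the standard library. The column names (`id`, `style`, `resolution`) are assumptions based on the fields listed above; match them to the header your CSV actually uses:

```python
import csv

def read_song_info(csv_path):
    """Read per-video info from a songs_on_server.csv-style file.

    Column names ("id", "style", "resolution") are assumed here --
    adapt them to the header your database actually contains.
    """
    songs = {}
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            songs[row["id"]] = {
                "style": row["style"],            # electro/hiphop/pop/rock
                "resolution": row["resolution"],  # size within the black bars
            }
    return songs
```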
In the next section we explain how to create a database that meets these requirements. Finally, you need an ACRCloud API key and a Last.fm API key if you want to use genre recognition. Otherwise, you can just give the genre manually when running the algorithm.
`main.py` takes 3 arguments: the 2 required ones are the input and output paths; the last one, `--genre`, is optional and must be in the `AUTHORIZED_GENRES` list.

```
python main.py --input /path/to/music.mp3 --output /path/to/output_video.mp4 (--genre pop) (--no-csv /path/to/database)
```
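A minimal sketch of how this command line could be parsed with `argparse`. The contents of `AUTHORIZED_GENRES` are assumed from the four styles listed above, and the help strings are illustrative, not taken from the project:

```python
import argparse

# Assumed from the styles listed in this README (electro/hiphop/pop/rock)
AUTHORIZED_GENRES = ["electro", "hiphop", "pop", "rock"]

def parse_args(argv=None):
    parser = argparse.ArgumentParser(
        description="Generate a music video for an input track.")
    parser.add_argument("--input", required=True,
                        help="path to the input music file")
    parser.add_argument("--output", required=True,
                        help="path for the generated video")
    parser.add_argument("--genre", choices=AUTHORIZED_GENRES,
                        help="skip genre recognition and use this genre")
    parser.add_argument("--no-csv", metavar="DATABASE",
                        help="run without songs_on_server.csv, "
                             "using this database path")
    return parser.parse_args(argv)
```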
As explained above, the database must have a precise structure.
- Download YouTube videos into a `data/` folder. You can use `youtube-dl` for this.
- Create the `statistics/songs_on_server.csv` file by running the function `database_info_to_csv()` in `src/database_constitution.py`.
- (Optional) Resize the videos so that they all have the same format by running the function `harmonize_video(video_path)` in `src/database_constitution.py`.
- Extract the scenes for all the videos by running the function `find_scenes(video_path)` in `src/video_analysis.py`.
- Store the color histogram and length of each scene in a JSON file by running the function `store_color_features(data_path)` in `src/video_analysis.py`.
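The 768-value histogram stored for each scene is presumably three concatenated 256-bin per-channel histograms (3 × 256 = 768); that layout is an assumption, as is the helper below, which is not the project's `store_color_features`. A minimal NumPy sketch for one 8-bit color frame:

```python
import numpy as np

def color_histogram(frame):
    """Concatenate one 256-bin histogram per color channel (3 * 256 = 768).

    `frame` is an (H, W, 3) uint8 array. The 768-bin concatenated layout
    is an assumption about how the stored histograms are organized.
    """
    hists = [np.bincount(frame[..., c].ravel(), minlength=256)
             for c in range(3)]
    return np.concatenate(hists)  # shape (768,)
```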
Then you can stop here and run the algorithm using the `--no-csv` option.
If you only complete the previous steps, the result will not take into account:
- the genre of the videos used in the database
- the video resolution

This gives lower-quality results: the generated video's size might change partway through, and the video content will be less consistent.
If you wish to take these into account, you must generate a .csv file containing info on the database in statistics/songs_on_server.csv:
- Extract audio files from the videos. In bash:

```bash
for file in data/*.mp4; do
    ffmpeg -i "$file" -ab 160k -ac 2 -ar 44100 -loglevel quiet -vn "${file%.*}.mp3"
    echo "Extracted mp3 for $file"
done
```
- Get music and video info for all videos by running the function `database_info_to_csv(data_path)` in `src/database_constitution.py`.
Want to know more about the data in your database? That is the purpose of the CSV files in the `statistics/` folder.
- The file `statistics/songs_on_server.csv` (generated by `database_info_to_csv` in `src/database_constitution.py`) contains info on each MV: name, artist, genre, video resolution, and length.
- The file `statistics/scenes_number.csv` (generated by `find_scenes` in `src/video_analysis.py`) gives the number of scenes for each video.
- The file `statistics/scenes_length.csv` (generated by `find_scenes` in `src/video_analysis.py`) gives, for each scene, the length in seconds and the number of frames.
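As a sketch, a quick summary statistic can be pulled from the scenes CSV with the standard library. The column names (`scene`, `length`, `frames`) are assumptions for illustration; adapt them to the header `find_scenes` actually writes:

```python
import csv
import io

def average_scene_length(csv_file):
    """Average scene length in seconds from a scenes_length-style CSV.

    Column name "length" is an assumption -- adapt it to the header
    your find_scenes() output actually uses.
    """
    lengths = [float(row["length"]) for row in csv.DictReader(csv_file)]
    return sum(lengths) / len(lengths)

# Works on any file-like object, e.g. open("statistics/scenes_length.csv")
sample = io.StringIO("scene,length,frames\nvideo1_001,2.0,48\nvideo1_002,4.0,96\n")
print(average_scene_length(sample))  # 3.0
```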
You can display the .csv contents using the `statistics/analysis.py` script.