KentoW/melody-lyrics

Melody-lyric alignment data

All source URLs of the 1,000 songs for creating melody-lyric alignment data [1]

In progress (512 songs / 1,000 songs)

Description

We provide scripts for melody-lyric alignment.

Requirements

Python2
pip install romkan
pip install jaconv
pip install jcconv

Install stanford corenlp pywrapper (a Python wrapper for Stanford CoreNLP).

Japanese morphological analyzer MeCab
Japanese dependency parser CaboCha
Python modules for MeCab and CaboCha
MeCab dictionaries ipadic and UniDic

nkf (character code converter (Shift-JIS -> UTF8))

Usage

0a. Change default encoding

Change to Python's site-packages directory (e.g. ~/anaconda2/envs/py2/lib/python2.3/site-packages/) and add the following lines to sitecustomize.py (create the file if it does not exist):

import sys
sys.setdefaultencoding("utf-8")
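
To confirm that the change took effect, a quick check along these lines may help. This is only a sketch: the helper name `default_encoding_is_utf8` is hypothetical and not part of this repository. On an unmodified Python 2 interpreter the default encoding is typically `ascii`, so the check would report False until sitecustomize.py is picked up.

```python
import sys


def default_encoding_is_utf8():
    """Return True when the interpreter's default string encoding is UTF-8."""
    # Normalize spellings such as "utf-8", "UTF-8" and "utf8".
    name = sys.getdefaultencoding().lower().replace("-", "")
    return name == "utf8"


# Print the current default encoding and whether it is UTF-8.
print(sys.getdefaultencoding(), default_encoding_is_utf8())
```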

0b. Prepare dictionary files

wget http://nlp.stanford.edu/software/stanford-corenlp-full-2013-06-20.zip 
unzip stanford-corenlp-full-2013-06-20.zip

Download ipadic from the MeCab site ("MeCab: Yet Another Part-of-Speech and Morphological Analyzer") and unidic from the UniDic site, then move them into dic/:

mv unidic dic/
mv dic/dicrc dic/unidic/
mv ipadic dic/

1. Collect text and melody files

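  1. Prepare lyrics.txt in the following format: @title and @artist header lines, followed by the lyrics, one line per lyric line, with a single blank line between paragraphs. (The Japanese sample text below states the same rules and notes that Japanese songs containing English are also supported.)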
@title sample
@artist anonymous
これはサンプルです
歌詞は行と段落で構成されます

段落の間には1行の空行があります

英語が混ざっている日本語の曲も対応しています
  2. Prepare melody.ust in UTAU's UST format (see the Utau article on Wikipedia).

  3. Convert the character encoding of the UTAU file from Shift-JIS to UTF-8.

nkf -w8 --overwrite melody.ust
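
The lyrics.txt layout from step 1 can be checked before running the alignment with a small parser. The sketch below is not the repository's own reader: the function name `parse_lyrics` is hypothetical, and it assumes that every line starting with `@` is a `key value` header and that any blank line ends a paragraph. The sample text uses ASCII placeholders in place of real lyrics.

```python
def parse_lyrics(text):
    """Split lyrics.txt content into header metadata and paragraphs of lines."""
    meta = {}
    paragraphs = []
    current = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("@"):
            # Header line such as "@title sample" (assumed "key value" form).
            key, _, value = line[1:].partition(" ")
            meta[key] = value.strip()
        elif line:
            current.append(line)
        elif current:
            # A blank line closes the paragraph collected so far.
            paragraphs.append(current)
            current = []
    if current:
        paragraphs.append(current)
    return meta, paragraphs


# Placeholder content following the layout described in step 1.
sample = "\n".join([
    "@title sample",
    "@artist anonymous",
    "first line",
    "second line",
    "",
    "third line",
])
meta, paragraphs = parse_lyrics(sample)
print(meta, paragraphs)
```

A file whose parsed result has no `title`, or a single very long paragraph, probably deviates from the expected layout.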

2. Move text and melody files

mkdir pair_data   
mkdir pair_data/sample  
cp lyrics.txt pair_data/sample/sample.txt
cp melody.ust pair_data/sample/sample.ust
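
When many songs need to be added, the same layout can be produced from Python. This is a minimal sketch, assuming (as in the sample above) that each song lives in pair_data/NAME/ and that both files share the directory's name; the helper `add_song` and its parameters are hypothetical and not part of this repository.

```python
import os
import shutil


def add_song(name, lyrics_path, melody_path, root="pair_data"):
    """Copy one lyrics/melody pair to root/name/name.txt and root/name/name.ust."""
    song_dir = os.path.join(root, name)
    if not os.path.isdir(song_dir):
        os.makedirs(song_dir)
    txt = os.path.join(song_dir, name + ".txt")
    ust = os.path.join(song_dir, name + ".ust")
    shutil.copy(lyrics_path, txt)
    shutil.copy(melody_path, ust)
    return txt, ust
```

For the sample above, `add_song("sample", "lyrics.txt", "melody.ust")` would mirror the four shell commands.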

3. Run!

  • Text format: python align_data_readable.py > data.txt

  • JSON format: python align_data_json.py > data.jsonl

Data format

See sample data.txt or data.jsonl
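
For downstream processing, data.jsonl can probably be read one record at a time. The sketch below assumes only what the .jsonl extension suggests, namely one UTF-8 encoded JSON value per line; it makes no assumption about the field names inside each record, and the helper `read_jsonl` is hypothetical.

```python
import io
import json


def read_jsonl(path):
    """Yield one decoded JSON value per non-empty line of a JSON Lines file."""
    with io.open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)
```

Iterating with `for record in read_jsonl("data.jsonl"):` avoids loading the whole file into memory.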


  • [1] Kento Watanabe, Yuichiroh Matsubayashi, Satoru Fukayama, Masataka Goto, Kentaro Inui and Tomoyasu Nakano. A Melody-conditioned Lyrics Language Model. In Proceedings of the 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2018).
