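Bilinear Attention Networks (BAN) for Korean Visual Question Answering

This repository implements Bilinear Attention Networks for training on the KVQA dataset, which was collected to support visual question answering in Korean.

The table below reports the average scores on the validation data over five repeated runs.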

Embedding	Dimension	All	Yes/No	Number	Other	Unanswerable
Word2vec	200	29.75 ± 0.28	72.59	16.94	17.16	78.74
GloVe	100	30.93 ± 0.19	71.91	17.65	18.93	78.26
fastText	200	30.94 ± 0.09	72.48	17.74	18.96	77.92
BERT	768	30.56 ± 0.12	69.28	17.48	18.65	78.28
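As a reading aid for the "All" column, here is a minimal sketch of how a "mean ± spread" entry can be produced from repeated runs. It assumes the ± value is the sample standard deviation over the five runs, which the table itself does not state. The `summarize` helper and the scores in `example_runs` are hypothetical placeholders, not results from this repository.

```python
from statistics import mean, stdev


def summarize(scores):
    """Format repeated-run scores as 'mean ± std' with two decimals.

    Uses the sample standard deviation (n - 1 denominator); this choice is
    an assumption, since the README does not say how the spread is computed.
    """
    if len(scores) < 2:
        raise ValueError("need at least two runs to estimate a spread")
    return f"{mean(scores):.2f} ± {stdev(scores):.2f}"


# Placeholder scores for five hypothetical runs (NOT results from this repository).
example_runs = [30.0, 31.0, 30.5, 30.5, 30.5]
print(summarize(example_runs))
```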

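Some of the code in this repository is borrowed or adapted from @hengyuan-hu's repository. We are grateful for the permission to use that code.

Prerequisites

A server or workstation with a Titan-class GPU and 64 GB of CPU memory is required. PyTorch v1.1.0 on Python 3 is required, and we strongly recommend using this Docker image.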

pip install -r requirements.txt

To install mecab, run the following commands.

sudo apt-get install default-jre curl
bash <(curl -s https://raw.githubusercontent.com/konlpy/konlpy/master/scripts/mecab.sh)

Downloading the KVQA dataset

The KVQA dataset can be downloaded via this link. Note that a separate license (Korean VQA License) applies.

Preprocessing

This implementation uses pretrained image features extracted with bottom-up-attention. Features can be precomputed for a variable number of objects per image, between 10 and 100. For Korean word vectors, refer to the following repositories: Word2vec, GloVe, fastText, and BERT.
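To make the feature format more concrete, here is a minimal sketch of reading one such tsv file. It assumes the files follow the layout commonly used by the bottom-up-attention tools: six tab-separated columns (`image_id`, `image_w`, `image_h`, `num_boxes`, `boxes`, `features`), with the last two stored as base64-encoded float32 buffers. That layout, and the `read_features` and `decode_matrix` helper names, are assumptions for illustration; check the files you actually download, and note that the repository's own preprocessing may read them differently.

```python
import base64
import csv
from array import array

# Assumed column layout (bottom-up-attention convention); not confirmed by this README.
FIELDNAMES = ["image_id", "image_w", "image_h", "num_boxes", "boxes", "features"]


def decode_matrix(b64_text, num_rows):
    """Decode a base64 float32 buffer into num_rows equally sized rows."""
    flat = array("f")  # 4-byte floats in native byte order
    flat.frombytes(base64.b64decode(b64_text))
    if num_rows <= 0 or len(flat) % num_rows != 0:
        raise ValueError("buffer size does not match the number of boxes")
    width = len(flat) // num_rows
    return [flat[i * width:(i + 1) * width] for i in range(num_rows)]


def read_features(path):
    """Yield one dict per image with decoded boxes and features."""
    # Feature columns are far larger than the csv module's default field limit.
    csv.field_size_limit(2**31 - 1)
    with open(path, "r", newline="") as f:
        reader = csv.DictReader(f, delimiter="\t", fieldnames=FIELDNAMES)
        for row in reader:
            num_boxes = int(row["num_boxes"])  # expected to lie between 10 and 100
            yield {
                "image_id": row["image_id"],
                "num_boxes": num_boxes,
                "boxes": decode_matrix(row["boxes"], num_boxes),
                "features": decode_matrix(row["features"], num_boxes),
            }
```

Iterating with a generator keeps only one image's features in memory at a time, which matters because these files can be many gigabytes in size.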

Follow the steps below to prepare the data.

Place the data obtained in "Downloading the KVQA dataset" at the following paths.

data
├── KVQA_annotations_train.json
├── KVQA_annotations_val.json
├── KVQA_annotations_test.json
└── features
    ├── KVQA_resnet101_faster_rcnn_genome.tsv
    └── VizWiz_resnet101_faster_rcnn_genome.tsv
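Before running the scripts, it can help to confirm that the layout above is in place. The following is a small sketch, assuming the `data` directory sits in the current working directory; the `missing_files` helper is hypothetical and not part of this repository.

```python
from pathlib import Path

# Relative paths taken from the directory tree above.
EXPECTED = [
    "KVQA_annotations_train.json",
    "KVQA_annotations_val.json",
    "KVQA_annotations_test.json",
    "features/KVQA_resnet101_faster_rcnn_genome.tsv",
    "features/VizWiz_resnet101_faster_rcnn_genome.tsv",
]


def missing_files(data_root="data"):
    """Return the expected files that are not present under data_root."""
    root = Path(data_root)
    return [rel for rel in EXPECTED if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing files:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All expected files are present.")
```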

If you download the preprocessed image feature files (tsv), you can train without downloading the image files themselves.

Run the download.sh and process.sh scripts.

./tools/download.sh
./tools/process.sh

Training

Run the following command to start training.

python3 main.py

The training and validation scores are reported after every epoch. The best model is saved under the saved_models directory. To train with a different question embedding, run the following command.

python3 main.py --q_emb glove-rg

Citation

If you use any part of this repository for research purposes, please cite the following papers.

@inproceedings{Kim_Lim2019,
author = {Kim, Jin-hwa and Lim, Soohyun and Park, Jaesun and Cho, Hansu},
booktitle = {AI for Social Good workshop at NeurIPS},
title = {{Korean Localization of Visual Question Answering for Blind People}},
year = {2019}
}
@inproceedings{Kim2018,
author = {Kim, Jin-Hwa and Jun, Jaehyun and Zhang, Byoung-Tak},
booktitle = {Advances in Neural Information Processing Systems 31},
title = {{Bilinear Attention Networks}},
pages = {1571--1581},
year = {2018}
}

License

Korean VQA License for the KVQA Dataset
Creative Commons License Deed (CC BY 4.0) for the VizWiz subset
GNU GPL v3.0 for the Code

Acknowledgments

We thank the staff at Testworks for their help with data collection.
