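- Using the output of the last BLSTM cell, training accuracy is 93% and test accuracy is 83%, so the model is overfitting. Planned remedies are dropout and regularization, but neither has been tried yet. Data preprocessing is also not finished.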
- The single-layer LSTM has problems. It is worth continuing to work on, and the cause is basically understood.
- /data/csv/train.csv : the public Quora dataset, with labels
- /data/csv/test_part_aa, /data/csv/test_part_bb : the test data (test.py) after being split into parts; the parts can be joined back together with cat.
- /data/vovab.model : the saved VocabularyProcessor model (max_length = 60)
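
For environments where `cat` is not available, joining the split test parts can be sketched in Python as below. This is a minimal sketch under assumptions: the helper name `concat_parts` and any output file name are hypothetical and do not come from this repo, and the parts are assumed to be plain byte-wise splits that must be joined in alphabetical order (aa, then bb).

```python
import shutil
from pathlib import Path


def concat_parts(part_paths, out_path):
    """Append each part file to out_path, byte for byte, in the order given.

    Hypothetical helper: it mirrors `cat part_aa part_bb > out` and assumes
    the parts were produced by a plain byte-wise split.
    """
    with open(out_path, "wb") as out:
        for part in part_paths:
            with open(part, "rb") as src:
                # Stream the copy so large CSV parts are never fully in memory.
                shutil.copyfileobj(src, out)
    return Path(out_path)
```

A call would look like `concat_parts(["/data/csv/test_part_aa", "/data/csv/test_part_bb"], "/data/csv/test.csv")`, where the output name `test.csv` is an assumption.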
http://blog.csdn.net/autocyz/article/details/53149760
[1] Ways of Asking and Replying in Duplicate Question Detection
http://www.aclweb.org/anthology/S17-1030
[3] Chinese-language blog post
https://www.leiphone.com/news/201802/X2NTBDXGARIUTWVs.html
[4] Quora Question Duplication
https://web.stanford.edu/class/cs224n/reports/2761178.pdf
[5] Shanghai Jiao Tong University report (very important)
http://xiuyuliang.cn/about/kaggle_report.pdf
[6] Deep text-pair classification with Quora’s 2017 question dataset
https://explosion.ai/blog/quora-deep-text-pair-classification
[7] NOTES FROM QUORA DUPLICATE QUESTION PAIRS FINDING KAGGLE COMPETITION
http://laknath.com/2017/09/12/notes-from-quora-duplicate-question-pairs-finding-kaggle-competition/