hitxujian/Quora_query

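Quora Question Pairs (short-text topic similarity)

Uses a Siamese network architecture:

  1. Takes the output of the last BLSTM unit. Training accuracy is 93% and test accuracy is 83%, so the model overfits. Planned remedies are dropout and regularization, neither of which is implemented yet. Data preprocessing is also unfinished.
  2. The single-layer LSTM variant has a problem. It is worth further work, and the cause is largely understood.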
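To make the Siamese BLSTM idea concrete, here is a minimal pure-Python sketch of the forward pass only. It is not the repository's code: the weights are random, the helper names (`make_lstm_params`, `lstm_final_state`, `blstm_encode`, `siamese_distance`) are hypothetical, and it assumes the "last output" means the final hidden state of each direction concatenated together. The key point is that both questions pass through the same weights, and only the distance between the two encodings is compared.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def make_lstm_params(input_dim, hidden_dim, rng):
    # Four gates (input, forget, output, candidate); each row holds the
    # weights for [x, h, bias]. Random values stand in for trained weights.
    cols = input_dim + hidden_dim + 1
    return [[[rng.uniform(-0.1, 0.1) for _ in range(cols)]
             for _ in range(hidden_dim)] for _ in range(4)]

def lstm_final_state(seq, params, hidden_dim):
    # Run one LSTM direction over a list of input vectors and
    # return the hidden state after the last step.
    h = [0.0] * hidden_dim
    c = [0.0] * hidden_dim
    for x in seq:
        z = x + h + [1.0]  # concatenate input, previous state, bias term
        pre = [[sum(w * v for w, v in zip(row, z)) for row in gate]
               for gate in params]
        i = [sigmoid(a) for a in pre[0]]
        f = [sigmoid(a) for a in pre[1]]
        o = [sigmoid(a) for a in pre[2]]
        g = [math.tanh(a) for a in pre[3]]
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h

def blstm_encode(seq, fwd, bwd, hidden_dim):
    # Assumption: concatenate the final state of the forward pass with the
    # final state of the backward pass (which reads the sequence reversed).
    return (lstm_final_state(seq, fwd, hidden_dim)
            + lstm_final_state(seq[::-1], bwd, hidden_dim))

def siamese_distance(seq_a, seq_b, fwd, bwd, hidden_dim):
    # Siamese: both inputs are encoded with the SAME parameters.
    ea = blstm_encode(seq_a, fwd, bwd, hidden_dim)
    eb = blstm_encode(seq_b, fwd, bwd, hidden_dim)
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(ea, eb)))

rng = random.Random(0)
INPUT_DIM, HIDDEN_DIM = 3, 4  # toy sizes chosen for illustration
fwd = make_lstm_params(INPUT_DIM, HIDDEN_DIM, rng)
bwd = make_lstm_params(INPUT_DIM, HIDDEN_DIM, rng)
q1 = [[rng.uniform(-1, 1) for _ in range(INPUT_DIM)] for _ in range(5)]
q2 = [[rng.uniform(-1, 1) for _ in range(INPUT_DIM)] for _ in range(7)]
same = siamese_distance(q1, q1, fwd, bwd, HIDDEN_DIM)
diff = siamese_distance(q1, q2, fwd, bwd, HIDDEN_DIM)
```

Because the encoder is shared, identical questions always map to a distance of zero, and the two questions may have different lengths. In a real model the word vectors would come from an embedding layer and the weights would be trained, with dropout and regularization applied to the encoder to address the overfitting noted above.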

Data (data folder)

  1. /data/csv/train.csv : the public Quora dataset, with labels
  2. /data/csv/test_part_aa, /data/csv/test_part_bb : the test data (test.py) after running split; the parts can be rejoined with cat.
  3. /data/vovab.model : the saved VocabularyProcessor model (max_length = 60)
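As a rough illustration of what a vocabulary model with max_length = 60 does to each question, the sketch below maps words to integer ids, then truncates or pads every question to exactly 60 ids. It is a simplified stand-in, not the VocabularyProcessor implementation: the function names (`build_vocab`, `to_ids`), the lowercase whitespace tokenization, and the use of id 0 for both padding and unknown words are assumptions made for this example.

```python
MAX_LENGTH = 60  # from the README: VocabularyProcessor max_length = 60
PAD_ID = 0       # assumption: id 0 serves as both padding and unknown word

def build_vocab(sentences):
    # Assign ids in order of first appearance, starting at 1 (0 is reserved).
    vocab = {}
    for sentence in sentences:
        for token in sentence.lower().split():
            if token not in vocab:
                vocab[token] = len(vocab) + 1
    return vocab

def to_ids(sentence, vocab, max_length=MAX_LENGTH):
    # Map tokens to ids, cut off anything past max_length,
    # then right-pad with PAD_ID so every row has the same length.
    ids = [vocab.get(tok, PAD_ID) for tok in sentence.lower().split()]
    ids = ids[:max_length]
    return ids + [PAD_ID] * (max_length - len(ids))

train_questions = [
    "How do I learn Python",
    "What is the best way to learn Python",
]
vocab = build_vocab(train_questions)
row = to_ids("How do I learn Rust", vocab)
```

Fixed-length rows let both branches of the network consume batches of the same shape. Questions longer than 60 tokens lose their tail, which is worth keeping in mind when choosing the length.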

Contrastive Loss (blog link)


http://blog.csdn.net/autocyz/article/details/53149760
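For readers who do not want to follow the link, here is a minimal sketch of one common form of contrastive loss. It assumes the convention that label 1 marks a duplicate pair and label 0 a non-duplicate pair, uses a margin of 1.0 picked for illustration, and averages with a factor of 1/(2N). The function name `contrastive_loss` is hypothetical, and the repository's own implementation may differ in any of these choices.

```python
def contrastive_loss(distances, labels, margin=1.0):
    # Loss = 1/(2N) * sum( y * d^2 + (1 - y) * max(margin - d, 0)^2 )
    # y = 1: duplicate pair, penalized for being far apart.
    # y = 0: non-duplicate pair, penalized only while closer than the margin.
    total = 0.0
    for d, y in zip(distances, labels):
        similar_term = y * d ** 2
        dissimilar_term = (1 - y) * max(margin - d, 0.0) ** 2
        total += similar_term + dissimilar_term
    return total / (2 * len(distances))

# A duplicate pair at distance 0.5 and a non-duplicate pair at distance 0.5
# are both penalized; a non-duplicate pair beyond the margin costs nothing.
loss_similar_far = contrastive_loss([0.5], [1])
loss_dissimilar_close = contrastive_loss([0.5], [0])
loss_dissimilar_far = contrastive_loss([2.0], [0])
```

The distances here would be the Euclidean distances between the two encodings produced by the shared network, so minimizing the loss pulls duplicate questions together and pushes other pairs at least a margin apart.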

Related references and papers

[1] Ways of Asking and Replying in Duplicate Question Detection
   http://www.aclweb.org/anthology/S17-1030

[2] English blog post
   https://medium.com/mlreview/implementing-malstm-on-kaggles-quora-question-pairs-competition-8b31b0b16a07

[3] Chinese blog post
   https://www.leiphone.com/news/201802/X2NTBDXGARIUTWVs.html

[4] Quora Question Duplication
   https://web.stanford.edu/class/cs224n/reports/2761178.pdf

[5] Shanghai Jiao Tong University report (very important)
   http://xiuyuliang.cn/about/kaggle_report.pdf

[6] Deep text-pair classification with Quora’s 2017 question dataset
   https://explosion.ai/blog/quora-deep-text-pair-classification

[7] NOTES FROM QUORA DUPLICATE QUESTION PAIRS FINDING KAGGLE COMPETITION
   http://laknath.com/2017/09/12/notes-from-quora-duplicate-question-pairs-finding-kaggle-competition/
