This is the repository for the SIGIR 2020 paper "Enhancing Text Classification via Discovering Additional Semantic Clues from Logograms".
By leveraging the cross-linguistic variation of two types of writing systems, Leco utilizes logograms to capture reliable clues for the text classification of phonographic languages, especially for low-resource ones.
- code/ contains the source code (Leco Classifier and Gaussian Embedding).
- data/ contains example datasets used for evaluation.
- Python (≥3.0)
- PyTorch (≥1.0)
- BERT-Base: To obtain BERT embeddings, initialize self.bert in class TextEmbedding with a pretrained BERT model.
- Hyperparameters are in _public.py.
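
The BERT-Base requirement above could be met along the following lines. This is only a minimal sketch, not the repository's actual code: it assumes the Hugging Face transformers package, the checkpoint name "bert-base-uncased", and a simplified stand-in for TextEmbedding whose constructor arguments and embed method are hypothetical. Only the attribute name self.bert comes from this README.

```python
class TextEmbedding:
    """Simplified, hypothetical stand-in for the repository's TextEmbedding class."""

    def __init__(self, bert=None, tokenizer=None, model_name="bert-base-uncased"):
        # Assumption: the checkpoint name and the use of Hugging Face
        # transformers are illustrative choices, not taken from the repository.
        if bert is None or tokenizer is None:
            # Imported lazily so the class can also be built from objects
            # that were loaded elsewhere (or from stubs in tests).
            from transformers import BertModel, BertTokenizer

            tokenizer = BertTokenizer.from_pretrained(model_name)
            bert = BertModel.from_pretrained(model_name)
            bert.eval()  # inference mode: disables dropout
        self.tokenizer = tokenizer
        self.bert = bert

    def embed(self, text):
        """Return one contextual vector per token of `text` (hypothetical helper)."""
        import torch

        inputs = self.tokenizer(text, return_tensors="pt", truncation=True)
        with torch.no_grad():  # no gradients needed to read out embeddings
            outputs = self.bert(**inputs)
        # last_hidden_state has shape (batch, tokens, hidden); drop the batch axis.
        return outputs.last_hidden_state[0]


# Example usage (downloads the checkpoint on first run):
#   embedder = TextEmbedding()
#   vectors = embedder.embed("a short example sentence")
```

If the repository expects a different checkpoint (for instance a multilingual or language-specific one), only model_name would need to change in this sketch.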
If you find this study helpful or related, please kindly consider citing as:
@inproceedings{Leco,
title = {Enhancing Text Classification via Discovering Additional Semantic Clues from Logograms},
author = {Chen Qian and Fuli Feng and Lijie Wen and Li Lin and Tat-Seng Chua},
booktitle = {Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR)},
year = {2020},
pages = {1201--1210}
}