elyase/eikon_challenge

eikon_challenge

This is part of the code for the Eikon challenge. Probably only the features.py file and the design diagram are of interest here.

In this challenge, Thomson Reuters was looking for an algorithm to accurately tag incoming news items by relevance to the companies or organizations mentioned in them. I built a system that recognizes alternative company names (using DBpedia data), identifies companies by stock ticker (using Bloomberg Symbology data), and discriminates between candidates by the countries mentioned in the news text. The system has the following structure:

diagram

  • Lookup tagger: Performs authority-driven mention detection, i.e. extracts possible mentions of company names with high recall.
  • Candidate generation: For each possible company mention, several candidate companies are suggested.
  • Feature generation: For each mention-candidate pair, features are generated.
  • Classifier: This component uses the features to find the correct candidates.

One of the greatest challenges was finding data sources, available under the accepted licenses, to augment the information about the list of companies.
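The four stages listed above can be illustrated with a minimal Python sketch. This is not the code from this repository: the gazetteer, the company IDs, the two features, and the hand-set weights are all hypothetical stand-ins, and the threshold scorer merely takes the place of whatever trained classifier the real system used.

```python
import re
from dataclasses import dataclass

# Hypothetical toy gazetteer: lower-cased alias -> candidate company IDs.
# In the real system this role was played by DBpedia and ticker data.
ALIASES = {
    "apple": ["ORG_APPLE_INC", "ORG_APPLE_RECORDS"],
    "aapl": ["ORG_APPLE_INC"],
}

# Hypothetical home country per company, used for country-based discrimination.
COUNTRY = {
    "ORG_APPLE_INC": "united states",
    "ORG_APPLE_RECORDS": "united kingdom",
}

# Hand-set weights (an assumption); a trained model would learn these.
WEIGHTS = {"is_ticker": 1.0, "country_in_text": 2.0}


@dataclass(frozen=True)
class Mention:
    text: str
    start: int
    end: int


def lookup_tagger(text):
    """High-recall mention detection: report every gazetteer alias in the text."""
    lowered = text.lower()
    mentions = []
    for alias in ALIASES:
        for m in re.finditer(r"\b" + re.escape(alias) + r"\b", lowered):
            mentions.append(Mention(text[m.start():m.end()], m.start(), m.end()))
    return sorted(mentions, key=lambda mention: mention.start)


def generate_candidates(mention):
    """Suggest every company that the mention's surface form could refer to."""
    return ALIASES.get(mention.text.lower(), [])


def generate_features(mention, candidate, text):
    """Build a feature dict for one mention-candidate pair."""
    country = COUNTRY.get(candidate)
    return {
        # Crude ticker heuristic: the mention is written in all caps.
        "is_ticker": float(mention.text.isupper()),
        # The candidate's home country is named somewhere in the news text.
        "country_in_text": float(country is not None and country in text.lower()),
    }


def classify(features, threshold=1.5):
    """Stand-in classifier: weighted feature sum compared against a threshold."""
    score = sum(WEIGHTS[name] * value for name, value in features.items())
    return score >= threshold


def tag(text):
    """Run the pipeline and return the accepted (mention, company) pairs."""
    accepted = []
    for mention in lookup_tagger(text):
        for candidate in generate_candidates(mention):
            if classify(generate_features(mention, candidate, text)):
                accepted.append((mention.text, candidate))
    return accepted


print(tag("Apple (AAPL) opened a new store in the United States."))
```

The lookup stage deliberately over-generates; precision is recovered only at the final stage, which is why each mention-candidate pair is scored independently.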
