Record linkage aims to identify duplicate records across datasets. Cross Language Record Linkage (CLRL) helps users link records from datasets in different languages.
This folder contains Python files that extract datasets from DBpedia infobox files and article title files. It also includes blocking and labeling using the interlanguage links provided by DBpedia.
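As a rough illustration of labeling with interlanguage links, the sketch below marks a candidate pair as a match when the link table maps the source title to the target title. The function name `label_pairs`, the dictionary layout, and the toy titles are assumptions made for this example; they are not taken from the files in this folder.

```python
def label_pairs(candidate_pairs, interlanguage_links):
    """Label (source_title, target_title) pairs: 1 = match, 0 = non-match.

    interlanguage_links is assumed to map a source-language title to the
    title of the linked article in the target language.
    """
    labeled = []
    for src, tgt in candidate_pairs:
        label = 1 if interlanguage_links.get(src) == tgt else 0
        labeled.append((src, tgt, label))
    return labeled


# Toy data, invented for illustration only.
links = {"Munich": "München", "Cologne": "Köln"}
pairs = [("Munich", "München"), ("Munich", "Köln"), ("Cologne", "Köln")]
labeled = label_pairs(pairs, links)
```

A pair whose source title has no entry in the link table is labeled 0 here; a real pipeline might prefer to drop such pairs instead of treating them as negatives.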
This folder contains the files that manage feature extraction and out-of-vocabulary (OOV) treatment.
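One possible shape for an OOV fallback is sketched below: a token pair is scored through a bilingual dictionary when the source token is known, and through character-level string similarity otherwise. The dictionary format, the function `token_similarity`, and the choice of `difflib.SequenceMatcher` as the fallback are all assumptions for illustration and may differ from what the files here do.

```python
from difflib import SequenceMatcher


def token_similarity(src_token, tgt_token, dictionary):
    """Score a cross-language token pair in [0, 1].

    dictionary is assumed to map a lowercase source token to a set of
    lowercase target-language translations.
    """
    src, tgt = src_token.lower(), tgt_token.lower()
    translations = dictionary.get(src)
    if translations is not None:
        # In-vocabulary: exact match against the known translations.
        return 1.0 if tgt in translations else 0.0
    # OOV fallback: surface similarity, useful for names and cognates.
    return SequenceMatcher(None, src, tgt).ratio()


# Toy dictionary, invented for illustration only.
toy_dictionary = {"city": {"stadt"}, "river": {"fluss"}}
sim_known = token_similarity("City", "Stadt", toy_dictionary)
sim_oov = token_similarity("Hamburg", "Hamburg", toy_dictionary)
```

The fallback only helps when both languages share a script; across scripts, a transliteration step would be needed first.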
This folder contains test files that compare our approach with baselines and measure the performance of different features.
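A comparison of this kind is usually reported as precision, recall, and F1 over labeled record pairs. The helper below is a minimal sketch of that computation; the function name and the toy label lists are assumptions, and the metrics actually used by the test files may differ.

```python
def precision_recall_f1(gold, predicted):
    """Compute precision, recall, and F1 for binary match labels (1 = match)."""
    tp = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy labels, invented for illustration only.
gold = [1, 0, 1, 1, 0]
predicted = [1, 0, 0, 1, 1]
precision, recall, f1 = precision_recall_f1(gold, predicted)
```

Running the same helper on the predictions of each baseline and each feature subset gives directly comparable scores.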