PaIRSES

Products of a Bachelor thesis dissertation, presented on 2013-09-23

Abstract

The large quantity of structured data that DBpedia extracts from the infoboxes of Wikipedia articles is a good starting point for further data extraction. This project aims to collect and catalogue the natural language patterns with which that data is presented in the running text of Wikipedia articles, and to exploit these patterns to obtain and store an analogous data set (i.e. RDF statements pertaining to the predicates associated with these patterns) from both Wikipedia and external text sources. Mimicking the natural human approach described by current chunking theories of language acquisition, the experimental algorithms developed for this purpose employ the Stanford Typed Dependencies model and reach a precision of 0.26 and a recall of 0.26. These figures result from tests on 200 sentences sampled from the same corpus of 51,536 Wikipedia articles on human settlements (cities, towns, etc.) that was used for collecting patterns, in conjunction with a training set retrieved article by article from DBpedia. Statements obtained by matching a pattern to the sentence it originated from are not counted as retrieved.
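The harvest-then-match idea in the abstract can be illustrated with a deliberately naive, string-level sketch. This is not the thesis's algorithm, which works on Stanford Typed Dependencies rather than raw strings. The function names, the `{S}`/`{O}` placeholders, the example sentences, and the use of `dbo:populationTotal` are all assumptions made for illustration only.

```python
import re


def harvest_pattern(sentence, subject, value):
    """Build a surface pattern by replacing a known subject and object
    value (e.g. taken from an infobox) with placeholders.
    Returns None when either string is missing from the sentence."""
    if subject not in sentence or value not in sentence:
        return None
    return sentence.replace(subject, "{S}", 1).replace(value, "{O}", 1)


def pattern_to_regex(pattern):
    """Compile a harvested pattern into a regex with named groups."""
    escaped = re.escape(pattern)
    escaped = escaped.replace(re.escape("{S}"), r"(?P<s>.+?)", 1)
    escaped = escaped.replace(re.escape("{O}"), r"(?P<o>.+?)", 1)
    return re.compile("^" + escaped + "$")


def extract(sentence, regex, predicate):
    """Return a (subject, predicate, object) triple, or None if the
    sentence does not match the pattern."""
    match = regex.match(sentence)
    if match is None:
        return None
    return (match.group("s"), predicate, match.group("o"))


def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved, recall = hits / relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Made-up training sentence and infobox value (hypothetical data).
pattern = harvest_pattern(
    "Alphaville has a population of 12,345.", "Alphaville", "12,345"
)
regex = pattern_to_regex(pattern)

# Apply the pattern to a different sentence, never to its own source.
triple = extract(
    "Betatown has a population of 6,789.", regex, "dbo:populationTotal"
)
print(pattern)
print(triple)
```

Matching a pattern only against sentences other than the one it was harvested from mirrors the evaluation rule stated in the abstract, under which such self-matches are not counted as retrieved.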

WikiExtractor
docs
pairseslib
preliminaryStudy
sandbox
.DS_Store
README.md
englishWikiModule.py
naivePatternHarvester.py
pairses.cfg
patternHarvester.py
patternMatcher.py
patternMatcherShell.py
patterns.obj
randomArticles.py
sampleCities.txt
sampler.py
sentencesSample.txt
sentencesSampler.py
wikidump.cfg

riccardoangius/pairses
