lobnaHosny/ArXiv-Sentence-Extractor

ArXiv-Sentence-Extractor (Python 3.6)

This is a tool to help research students find relevant quotations in research papers automatically.

It downloads PDF papers related to the research topic from arXiv, then extracts only the relevant sentences from these papers based on key terms or phrases entered by the user. It then writes these sentences to a txt file, along with their references.
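The extraction step described above can be illustrated with a minimal sketch. This is not the code in extract.py; the function names split_sentences and extract_sentences are hypothetical, and the naive punctuation-based sentence splitting and case-insensitive matching are assumptions made for illustration.

```python
import re


def split_sentences(text):
    """Naively split text into sentences at '.', '!' or '?' followed by whitespace."""
    # Collapse line breaks and repeated spaces, which PDF-to-text conversion often introduces.
    normalized = re.sub(r"\s+", " ", text).strip()
    if not normalized:
        return []
    return re.split(r"(?<=[.!?])\s+", normalized)


def extract_sentences(text, phrase):
    """Return the sentences that contain `phrase`, ignoring case (an assumption)."""
    needle = phrase.lower()
    return [s for s in split_sentences(text) if needle in s.lower()]


sample = "Air filters trap dust. The fan is loud.\nA clogged filter\nreduces airflow!"
for sentence in extract_sentences(sample, "filter"):
    print(sentence)
```

A splitter this simple will break on abbreviations such as "e.g." or "Fig. 2", so real paper text would likely need a more careful tokenizer.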

It can be used as either a script file or a web application.

Installation

1- Download the files in this repo.
Note: If you want to use this tool as a web application (see below), and assuming you are using XAMPP, move the files to the htdocs folder.

2- Run the following command from inside the folder downloaded from the repo: py -m pip install .

Usage as a Script File

The script file, extract.py, takes five arguments:

py extract.py  search_arxiv_and   search_arxiv_or   dest   max_len   search_pdf

Where:

Argument: Notes

search_arxiv_and: comma-separated list of search terms that must all be in the result.
    E.g.: "nuclear, energy" translates to nuclear AND energy.
    If you don't want to use this argument, set it to "0".

search_arxiv_or: comma-separated list of search terms where at least one must be in the result.
    E.g.: "nuclear, energy" translates to nuclear OR energy.
    If you don't want to use this argument, set it to "0".

dest: directory to save results to. It's recommended to use a different directory each time the tool is used.

max_len: maximum number of papers to download.

search_pdf: phrase or word to look for; sentences in the papers that contain it are extracted.
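To make the AND/OR semantics concrete, here is a rough sketch of how the two comma-separated arguments could be turned into an arXiv API query. This is an illustration, not the logic in extract.py: the helper names (parse_terms, field, build_search_query, build_url) are hypothetical, and searching the all: field and quoting multi-word phrases are assumptions. The endpoint and the search_query and max_results parameters are those of the public arXiv API.

```python
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"


def parse_terms(arg):
    """Split a comma-separated argument into terms; "0" means the argument is unused."""
    if arg.strip() == "0":
        return []
    return [t.strip() for t in arg.split(",") if t.strip()]


def field(term):
    """Search all fields (an assumption); quote multi-word terms so they match as a phrase."""
    return f'all:"{term}"' if " " in term else f"all:{term}"


def build_search_query(and_arg, or_arg):
    """Combine the AND group and the OR group into one query string."""
    and_terms = parse_terms(and_arg)
    or_terms = parse_terms(or_arg)
    parts = []
    if and_terms:
        parts.append(" AND ".join(field(t) for t in and_terms))
    if or_terms:
        or_part = " OR ".join(field(t) for t in or_terms)
        # Parenthesize the OR group when it is combined with AND terms.
        needs_parens = bool(and_terms) and len(or_terms) > 1
        parts.append(f"({or_part})" if needs_parens else or_part)
    return " AND ".join(parts)


def build_url(and_arg, or_arg, max_len):
    """Build the full request URL, limited to max_len results."""
    params = {"search_query": build_search_query(and_arg, or_arg), "max_results": max_len}
    return f"{ARXIV_API}?{urlencode(params)}"


print(build_search_query("air, filter", "0"))  # all:air AND all:filter
print(build_url("nuclear energy, generator", "power, electricity", 5))
```

Treating "0" as an empty list keeps the rest of the logic free of special cases, so either group can be omitted without changing how the other is built.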

Examples

Ex #1

py extract.py  "air, filter"  "0"   C:/Users/JohnDoe/Desktop/myresults   3   "filter"

The above will download 3 papers that have the words air AND filter, and will extract the sentences that have the word filter from them.

Ex #2

py extract.py  "nuclear energy, harms "  "0"   C:/Users/JohnDoe/Desktop/myresults   10   "danger"

The above will download 10 papers that have the words nuclear energy AND harms, and will extract the sentences that have the word danger from them.

Ex #3

py extract.py  "nuclear energy, generator"  "power, electricity"   C:/Users/JohnDoe/Desktop/myresults   5   "danger"

The above will download 5 papers that have the words nuclear energy AND generator, plus at least one of power OR electricity, and will extract the sentences that have the word danger from them.

Usage as a web application

This is quite similar to the script format, except that it comes with a simple GUI.

To use the web application, type localhost/[name of the directory inside htdocs where the files are saved]/interface.php in your web browser.

It will probably look something like this: localhost/ArXiv-Sentence-Extractor-master/interface.php

Output

After using the tool, the directory given in dest will contain the papers found on arXiv as PDFs, as well as a file called results.txt, which contains the extracted sentences with their references.
There will also be another txt file called convert.txt, which contains the text of all the downloaded papers.
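As a quick check after a run, a small helper along these lines could confirm that the expected files are present in dest. The function name summarize_output is hypothetical, and the sketch assumes the PDFs sit directly in dest with a lowercase .pdf extension; only the file names results.txt and convert.txt come from the description above.

```python
from pathlib import Path


def summarize_output(dest):
    """Report which of the expected output files exist in the dest directory."""
    dest = Path(dest)
    return {
        # Assumes PDFs are saved directly in dest, not in subfolders.
        "pdfs": sorted(p.name for p in dest.glob("*.pdf")),
        "has_results": (dest / "results.txt").is_file(),
        "has_convert": (dest / "convert.txt").is_file(),
    }
```

For example, summarize_output("C:/Users/JohnDoe/Desktop/myresults") would return a dictionary listing the downloaded PDF file names and whether the two txt files exist.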
