This is a tool to help research students automatically find relevant quotations in research papers.
It downloads PDF papers related to a research topic from arXiv, then extracts only the sentences
that contain the key terms or phrases entered by the user.
It then writes these sentences to a .txt file, along with their references.
It can be used either as a script or as a web application.
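To make the extraction step more concrete, here is a minimal Python sketch of keeping only the sentences that contain a given phrase and pairing each one with its source. It is not taken from extract.py. The `extract_sentences` helper, the regex-based sentence splitting, the case-insensitive substring match and the placeholder reference string are all assumptions for illustration.

```python
import re

# Assumption: a sentence ends at ".", "!" or "?" followed by whitespace.
# Real PDF text (abbreviations, equations, reference lists) needs more care.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def extract_sentences(papers, phrase):
    """Return (sentence, reference) pairs for every sentence containing phrase.

    papers maps a reference string to the plain text extracted from that paper.
    Matching is a case-insensitive substring test (an assumption).
    """
    needle = phrase.lower()
    matches = []
    for reference, text in papers.items():
        # Collapse the line breaks that PDF-to-text conversion leaves mid-sentence.
        flat = " ".join(text.split())
        for sentence in SENTENCE_END.split(flat):
            if needle in sentence.lower():
                matches.append((sentence, reference))
    return matches


if __name__ == "__main__":
    # Placeholder paper text and reference, for illustration only.
    papers = {"Example paper (placeholder reference)":
              "Air filters trap dust.\nThis filter is\ncheap. Nothing else here."}
    for sentence, reference in extract_sentences(papers, "filter"):
        print(f"{sentence} [{reference}]")
```

Because this sketch uses a plain substring test, searching for filter also matches filters; the actual tool may match differently.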
1. Download the files in this repo.
   Note: If you want to use this tool as a web application (see below) and you are using XAMPP,
   move the files to the `htdocs` folder.
2. From inside the folder downloaded from the repo, run the following command:

   ```
   py -m pip install .
   ```
The script, `extract.py`, takes five arguments:

```
py extract.py search_arxiv_and search_arxiv_or dest max_len search_pdf
```
Where:

| Argument | Notes |
|---|---|
| `search_arxiv_and` | Comma-separated list of search terms that must all appear in the results. E.g. `nuclear, energy` translates to nuclear AND energy. If you don't want to use this argument, set it to `"0"`. |
| `search_arxiv_or` | Comma-separated list of search terms of which at least one must appear in the results. E.g. `nuclear, energy` translates to nuclear OR energy. If you don't want to use this argument, set it to `"0"`. |
| `dest` | Directory to save the results to. It's recommended to use a different directory each time the tool is used. |
| `max_len` | Maximum number of papers to download. |
| `search_pdf` | Word or phrase to look for; only the sentences that contain it are extracted from the papers. |
Examples:

```
py extract.py "air, filter" "0" C:/Users/JohnDoe/Desktop/myresults 3 "filter"
```

The above will download up to 3 papers that contain the words air AND filter, and will extract the sentences that contain the word filter from them.

```
py extract.py "nuclear energy, harms" "0" C:/Users/JohnDoe/Desktop/myresults 10 "danger"
```

The above will download up to 10 papers that contain nuclear energy AND harms, and will extract the sentences that contain the word danger from them.

```
py extract.py "nuclear energy, generator" "power, electricity" C:/Users/JohnDoe/Desktop/myresults 5 "danger"
```

The above will download up to 5 papers that contain nuclear energy AND generator, as well as power OR electricity, and will extract the sentences that contain the word danger from them.
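As a rough illustration of how these arguments could map onto an arXiv search, the sketch below parses the five arguments and builds a request URL for the public arXiv API (`http://export.arxiv.org/api/query`, with its `search_query`, `start` and `max_results` parameters). It is a hypothetical example, not the code in extract.py: the helper names, the `all:` field prefix, quoting multi-word terms as phrases, and joining the OR group to the AND terms with AND are all assumptions.

```python
import argparse
from urllib.parse import urlencode

# Public arXiv API endpoint; whether extract.py queries it this way is an assumption.
ARXIV_API = "http://export.arxiv.org/api/query"


def split_terms(arg):
    """Split a comma-separated argument; "0" means the argument is unused."""
    if arg.strip() == "0":
        return []
    return [term.strip() for term in arg.split(",") if term.strip()]


def field_term(term):
    """Search all fields; multi-word terms are quoted as phrases (assumption)."""
    return f'all:"{term}"' if " " in term else f"all:{term}"


def build_query(and_arg, or_arg):
    """Hypothetical: AND terms joined with AND, plus one parenthesized OR group."""
    parts = [field_term(t) for t in split_terms(and_arg)]
    or_terms = [field_term(t) for t in split_terms(or_arg)]
    if or_terms:
        parts.append("(" + " OR ".join(or_terms) + ")")
    if not parts:
        raise ValueError("set at least one of search_arxiv_and / search_arxiv_or")
    return " AND ".join(parts)


def build_url(and_arg, or_arg, max_len):
    """Build the request URL; max_len caps how many results arXiv returns."""
    params = {"search_query": build_query(and_arg, or_arg), "start": 0, "max_results": max_len}
    return ARXIV_API + "?" + urlencode(params)


def parse_args(argv=None):
    """The five positional arguments, in the order this README lists them."""
    parser = argparse.ArgumentParser(description="Sketch of the extract.py command line")
    parser.add_argument("search_arxiv_and")
    parser.add_argument("search_arxiv_or")
    parser.add_argument("dest")
    parser.add_argument("max_len", type=int)
    parser.add_argument("search_pdf")
    return parser.parse_args(argv)


if __name__ == "__main__":
    # Mirrors the third example command in this README.
    args = parse_args(["nuclear energy, generator", "power, electricity", "myresults", "5", "danger"])
    print(build_query(args.search_arxiv_and, args.search_arxiv_or))
    print(build_url(args.search_arxiv_and, args.search_arxiv_or, args.max_len))
```

Under these assumptions, the third example above would yield the query `all:"nuclear energy" AND all:generator AND (all:power OR all:electricity)`.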
The web application works much like the script, except that it comes with a simple GUI.
To use it, open `localhost/[name of the directory inside htdocs where the files are saved]/interface.php`
in your web browser.
It will probably look something like this: `localhost/ArXiv-Sentence-Extractor-master/interface.php`
After the tool has run, the directory given as `dest` will contain the papers downloaded from arXiv as PDFs,
as well as a file called `results.txt`, which contains the extracted sentences with their references.
There will also be another text file, `convert.txt`, which contains the text of all the downloaded papers.
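If you want to check a run from Python, a small sketch like the following lists what ended up in `dest`. Only the file names `results.txt` and `convert.txt` come from this README; the `summarize_results` helper, the UTF-8 encoding and the 500-character preview are assumptions, since the exact layout of `results.txt` isn't specified.

```python
import sys
from pathlib import Path


def summarize_results(dest, preview_chars=500):
    """List the PDFs in dest and check for results.txt and convert.txt.

    The file names come from the README; the UTF-8 encoding and the
    preview length are assumptions.
    """
    dest = Path(dest)
    results = dest / "results.txt"
    return {
        "pdfs": sorted(p.name for p in dest.glob("*.pdf")),
        "has_results": results.is_file(),
        "has_convert": (dest / "convert.txt").is_file(),
        "results_preview": (
            results.read_text(encoding="utf-8", errors="replace")[:preview_chars]
            if results.is_file() else ""
        ),
    }


if __name__ == "__main__":
    # Usage (hypothetical file name): py summarize.py C:/Users/JohnDoe/Desktop/myresults
    summary = summarize_results(sys.argv[1] if len(sys.argv) > 1 else ".")
    print(f"{len(summary['pdfs'])} PDF(s): {', '.join(summary['pdfs'])}")
    print("results.txt found" if summary["has_results"] else "results.txt missing")
    print("convert.txt found" if summary["has_convert"] else "convert.txt missing")
    print(summary["results_preview"])
```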