Deprecated version
BIBOT stands for bibliography bot. It uses natural language processing (NLP) approaches to parse the abstracts of a large number of publications. NLP is a field of artificial intelligence, increasingly driven by machine learning, that aims to capture the meaning of sentences and texts written in a natural language (here, English), such as scientific articles. BIBOT is written in Python 2.7 and is designed to interact with the Medline database through the NCBI API, in order to retrieve abstracts, articles and metadata about these articles (year of publication, author list, journal of publication, language, conflict of interest statement, etc.).
BIBOT depends on the following Python modules:
- Biopython
- unidecode
- nltk
- itertools
- os
- time
- shutil
- datetime
- glob
- getopt
- sys
Most of the Python modules used by BIBOT are part of the Python 2.7 standard library. Type the following lines to install the other dependencies:
Command line:
pip install biopython
pip install bioservices
pip install nltk
pip install unidecode
Python console:
import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('maxent_ne_chunker')
nltk.download('words')
You can then use BIBOT as a standard Python script.
BIBOT v1 takes two mandatory parameters and one optional parameter:
- -a --action (mandatory)
- -r --rterm (mandatory)
- -c --conf (optional)
action is the global action to perform. Only two options are available for now:
- run
- debug
run performs a standard BIBOT run; debug displays some information about the initialization of variables.
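BIBOT lists getopt among its dependencies, so the three options above are presumably read with it. The following is a minimal sketch of how such parsing could look; the function name parse_args and the error messages are hypothetical and not taken from BIBOT's source.

```python
import getopt

VALID_ACTIONS = ("run", "debug")


def parse_args(argv):
    """Parse BIBOT-style options and return (action, rterm, conf).

    Hypothetical helper: conf is None when -c/--conf is not given.
    """
    try:
        # "a:r:c:" means each short option expects a value.
        opts, _ = getopt.getopt(argv, "a:r:c:", ["action=", "rterm=", "conf="])
    except getopt.GetoptError as err:
        raise SystemExit("Invalid arguments: %s" % err)

    action, rterm, conf = None, None, None
    for opt, value in opts:
        if opt in ("-a", "--action"):
            action = value
        elif opt in ("-r", "--rterm"):
            rterm = value
        elif opt in ("-c", "--conf"):
            conf = value

    # Enforce the two mandatory parameters.
    if action not in VALID_ACTIONS:
        raise SystemExit("--action must be one of: %s" % ", ".join(VALID_ACTIONS))
    if rterm is None:
        raise SystemExit("--rterm is mandatory")
    return action, rterm, conf
```

In a script, this would be called as parse_args(sys.argv[1:]).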
rterm is the list of terms you want to use to screen the NCBI database. Each combination of at least two keywords is used to generate a query, which leads to 2^n - n - 1 generated queries per run, where n is the number of keywords provided to the program. rterm can be a semicolon-delimited list of terms or the name of a .txt file containing a semicolon-delimited list of terms.
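The query count follows from counting every subset of the n terms (2^n) and removing the empty set and the n single-term subsets. The sketch below enumerates those combinations with itertools.combinations. The helper names load_terms and build_queries are hypothetical, and the query syntax (quoted terms joined with AND) is an assumption, not necessarily the format BIBOT sends to NCBI.

```python
import itertools
import os


def load_terms(rterm):
    """Return search terms from a semicolon-delimited string or .txt file."""
    if rterm.endswith(".txt") and os.path.isfile(rterm):
        with open(rterm) as handle:
            rterm = handle.read()
    # Split on semicolons, trim whitespace and drop empty entries.
    return [term.strip() for term in rterm.split(";") if term.strip()]


def build_queries(terms):
    """Build one query per combination of at least two terms."""
    queries = []
    for size in range(2, len(terms) + 1):
        for combo in itertools.combinations(terms, size):
            # Assumed query syntax: quoted terms joined with AND.
            queries.append(" AND ".join('"%s"' % term for term in combo))
    return queries


terms = load_terms("machine learning;SjS;big data")
queries = build_queries(terms)
print(len(queries))  # 4, i.e. 2**3 - 3 - 1
```

Because the number of queries grows exponentially with n, a list of 10 keywords already produces 1013 queries.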
conf is an optional parameter giving the name of a configuration file. The configuration file should be a semicolon-delimited file and should contain the following lines:
- min year
- authorized languages
- validation keywords
Several validation keywords lines can be added; only articles matching at least one term in each of the validation keywords lists will be selected.
Example configuration file:
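The selection rule reads as an AND across lists and an OR within each list. Here is a minimal sketch of that logic, assuming case-insensitive whole-word matching on the abstract text; BIBOT's actual matching method is not documented here, and the function name passes_validation is hypothetical.

```python
import re


def passes_validation(text, keyword_lists):
    """Return True if text matches at least one term from every keyword list.

    Assumption: case-insensitive matching on word boundaries, so that a
    short term such as "RA" does not match inside words like "rates".
    """
    lowered = text.lower()

    def matches(term):
        pattern = r"\b" + re.escape(term.lower()) + r"\b"
        return re.search(pattern, lowered) is not None

    # all() over lists (AND), any() within a list (OR).
    return all(any(matches(term) for term in keywords) for keywords in keyword_lists)


validation = [
    ["autoimmunity", "SLE", "RA"],
    ["machine learning", "big data", "artificial intelligence"],
]
print(passes_validation("We apply machine learning to autoimmunity cohorts.", validation))  # True
print(passes_validation("A study of rates in big data", validation))  # False
```

With no validation keywords lines, all() over an empty list returns True, so every article would pass this check.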
min year;2015
authorized languages;eng
validation keywords;autoimmunity,SLE,RA
validation keywords;machine learning, big data, artificial intelligence
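A file in this format can be read line by line by splitting each line at the first semicolon. The sketch below is one possible parser; the function name parse_conf is hypothetical, it assumes that authorized languages are comma-separated like the keywords, and it uses None or empty lists for missing entries because BIBOT's actual default values are not listed here.

```python
def parse_conf(lines):
    """Parse semicolon-delimited BIBOT-style configuration lines (sketch)."""
    # Placeholders for unset values, not BIBOT's real defaults.
    conf = {"min year": None, "authorized languages": [], "validation keywords": []}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition(";")
        key = key.strip().lower()
        items = [item.strip() for item in value.split(",") if item.strip()]
        if key == "min year":
            conf["min year"] = int(value)
        elif key == "authorized languages":
            conf["authorized languages"] = items
        elif key == "validation keywords":
            # Each line adds one more list that an article must match.
            conf["validation keywords"].append(items)
    return conf


example = """min year;2015
authorized languages;eng
validation keywords;autoimmunity,SLE,RA
validation keywords;machine learning, big data, artificial intelligence"""
conf = parse_conf(example.splitlines())
print(conf["min year"])  # 2015
print(conf["validation keywords"][1])  # ['machine learning', 'big data', 'artificial intelligence']
```

To read a real file, pass the open file handle instead, e.g. parse_conf(open("myconf.csv")).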
If no configuration file is provided, default values are used.
Usage example:
python bibot.py -a run -r "machine learning;SjS;big data" -c myconf.csv
When the run is complete, BIBOT stores the selected articles in the abstract subfolder and the corresponding metadata in the meta subfolder. Each article is identified by its PMID.
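The output layout can be sketched as one file per article in each subfolder, named after the PMID. This is an illustration only: the helper store_article, the .txt and .csv extensions and the key;value metadata layout are assumptions, and the PMID used in the demo is a placeholder.

```python
import os
import tempfile


def store_article(output_dir, pmid, abstract, meta):
    """Write an abstract and its metadata under abstract/ and meta/ (sketch)."""
    abstract_dir = os.path.join(output_dir, "abstract")
    meta_dir = os.path.join(output_dir, "meta")
    for folder in (abstract_dir, meta_dir):
        if not os.path.isdir(folder):
            os.makedirs(folder)

    # Both files are named after the PMID; extensions are assumptions.
    abstract_path = os.path.join(abstract_dir, "%s.txt" % pmid)
    meta_path = os.path.join(meta_dir, "%s.csv" % pmid)
    with open(abstract_path, "w") as handle:
        handle.write(abstract)
    with open(meta_path, "w") as handle:
        # One semicolon-delimited key;value line per metadata field.
        for key in sorted(meta):
            handle.write("%s;%s\n" % (key, meta[key]))
    return abstract_path, meta_path


# Demo in a temporary directory with a placeholder PMID.
demo_dir = tempfile.mkdtemp()
abstract_path, meta_path = store_article(
    demo_dir, "12345678", "Example abstract text.", {"year": 2015, "language": "eng"}
)
print(os.path.basename(abstract_path))  # 12345678.txt
```

Naming files by PMID makes it easy to join an abstract with its metadata later, since both share the same identifier.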