PyPubmedText

A Python wrapper for fetching PubMed articles

If you are doing biomedical text mining on MEDLINE abstracts or PMC articles, you will often want to build a corpus of your own. You can maintain a local copy of the MEDLINE baseline, but the baseline is updated on a yearly basis, so you might also want to retrieve the latest articles. PyPubmedText is a wrapper for both purposes. It is based on the Entrez package of Biopython.

Usage: python PyPubmedText.py -f corpus.file -c config.ini

Note: the corpus file is a mandatory input argument. The configuration file defaults to "config.ini".
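The command line above could be parsed roughly as follows. This is a minimal sketch using the standard `argparse` module, not the script's actual code: the function name `build_parser` and the attribute names `corpus_file` and `config_file` are hypothetical, and only the `-f` and `-c` flags and the "config.ini" default come from the usage note.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented -f / -c options (names are assumptions)."""
    parser = argparse.ArgumentParser(prog="PyPubmedText.py")
    # -f is mandatory according to the usage note.
    parser.add_argument("-f", dest="corpus_file", required=True,
                        help="corpus file containing PMIDs")
    # -c falls back to config.ini when it is not given.
    parser.add_argument("-c", dest="config_file", default="config.ini",
                        help="configuration file (default: config.ini)")
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(["-f", "corpus.file"])
print(args.corpus_file, args.config_file)
```

Omitting `-f` makes `argparse` exit with an error message, which matches the note that the corpus file is mandatory.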

== 2013-12-09

1. Fixed the bug caused by unicode strings. The solution is to check whether a string is unicode and, if so, call .encode('utf-8') on it.
2. DB cursors are opened and closed within each function.
3. Command-line arguments are accepted for the corpus file name and database connection details.
4. Added the ReadConfig module for the configuration in 3.
5. PyPubmedText can be imported and used as a module.
6. You need to set up the corresponding database tables using the provided 'create_db_tables.sql'.
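The unicode fix mentioned in this changelog can be sketched as below. This is an illustration under assumptions, not the code in PyPubmedText.py: the helper name `to_utf8` is hypothetical, and the `try`/`except` shim is added only so the same check runs on both Python 2 (where the `unicode` type exists) and Python 3 (where text is `str`).

```python
try:
    text_type = unicode  # Python 2: the separate unicode text type
except NameError:
    text_type = str      # Python 3: str is already unicode text


def to_utf8(value):
    """Return UTF-8 encoded bytes for text values; pass anything else through unchanged."""
    if isinstance(value, text_type):
        return value.encode('utf-8')
    return value


# A title containing a non-ASCII character, as often appears in author names.
encoded = to_utf8(u"Caf\xe9 study")
print(repr(encoded))
```

Values that are already byte strings, numbers, or `None` are returned as they are, so the helper can be applied to every field before it is written to the database.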

== 2013-12-05

For the moment, PyPubmedText takes a corpus file (of PMIDs), fetches articles from the local database first, and then tries to retrieve from PubMed any that are missing locally. All retrieved information is stored in a dictionary (PMID as the key, an NcbiArticle object holding the article information as the value).
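The local-first lookup described above might look roughly like this. It is a sketch under stated assumptions rather than the module's real implementation: `read_pmids`, `collect_articles`, `fetch_local`, and `fetch_remote` are hypothetical names, the corpus file is assumed to hold one PMID per line, and plain strings stand in for NcbiArticle objects. In the real tool the remote step would go through Biopython's Entrez package and the local step through the database.

```python
def read_pmids(lines):
    """Return the non-empty, stripped PMIDs from an iterable of lines (one PMID per line assumed)."""
    return [line.strip() for line in lines if line.strip()]


def collect_articles(pmids, fetch_local, fetch_remote):
    """Build {pmid: article}, asking the local store first and the remote source for the rest.

    fetch_local(pmid) returns an article or None.
    fetch_remote(list_of_pmids) returns a dict {pmid: article}.
    """
    articles = {}
    missing = []
    for pmid in pmids:
        record = fetch_local(pmid)
        if record is None:
            missing.append(pmid)   # not in the local database
        else:
            articles[pmid] = record
    if missing:
        # One batched remote request for everything the local store lacked.
        articles.update(fetch_remote(missing))
    return articles


# Stand-ins for the database and for PubMed, used only for this demonstration.
local_db = {"111": "local record 111"}
remote_calls = []


def fake_remote(pmids):
    remote_calls.append(list(pmids))
    return {pmid: "remote record " + pmid for pmid in pmids}


result = collect_articles(read_pmids(["111\n", "222\n", "\n"]), local_db.get, fake_remote)
print(result)
```

Passing the two lookups in as functions keeps the sketch independent of any particular database driver or network access, and batching the missing PMIDs keeps the number of remote requests low.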
