downloads all the bibtex citations of a conference from an ACM DL url

tangym/acm-citation-crawler

acm-citation-crawler

Introduction

This program downloads all the BibTeX citations from an ACM conference URL.

How to use

Follow these steps:

  • Find an ACM conference proceedings page, for example RecSys '14. Note that the page should be switched to flat view and should contain the table of contents (links to each paper).

  • Press Ctrl-S to save the conference page in the directory where the program is located.

  • Copy the whole file name of the saved web page, including the extension .html, and paste it into pages.txt, one file name per line. The program will automatically parse each web page listed in pages.txt and extract the BibTeX citations to corresponding files.

  • (Possibly optional) Find some free proxies and create a proxy.txt file in the following format, one proxy per line:

<protocol>://<ip>:<port>

For example, the content of the proxy.txt file can be:

http://1.2.3.4:80
https://5.6.7.8:90
  • Open a command prompt and run python crawler.py. If it warns that a package is not installed, try pip install -r requirements.txt. If the program fails to parse citations in the conference proceedings page, check whether multiple elements belong to the text12 class; if so, delete all the others and keep only the one that contains citations.
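The input files described in the steps above can be sanity-checked before a run. The following is a minimal sketch using only the standard library; it is not taken from crawler.py, and the helper names (`parse_proxy_lines`, `read_page_names`, `count_class_elements`) are hypothetical. It assumes proxy lines follow the `<protocol>://<ip>:<port>` format shown above and that the citation container is identified by the `text12` class.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


def parse_proxy_lines(lines):
    """Turn '<protocol>://<ip>:<port>' lines into (protocol, host, port) tuples."""
    proxies = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        parts = urlsplit(line)
        if not parts.scheme or not parts.hostname or parts.port is None:
            raise ValueError(f"malformed proxy line: {line!r}")
        proxies.append((parts.scheme, parts.hostname, parts.port))
    return proxies


def read_page_names(lines):
    """Return the non-empty file names listed one per line in pages.txt."""
    return [line.strip() for line in lines if line.strip()]


class _ClassCounter(HTMLParser):
    """Counts start tags whose class attribute contains a given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.count += 1


def count_class_elements(html_text, class_name="text12"):
    """Count elements carrying class_name; more than one suggests manual cleanup."""
    counter = _ClassCounter(class_name)
    counter.feed(html_text)
    return counter.count
```

With the real files this could be called as `parse_proxy_lines(open("proxy.txt"))`, and `count_class_elements` could be applied to the text of each saved page; a count above one indicates the situation described in the last step.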

The program will take a while to finish collecting all the BibTeX citations, because the ACM library limits the connection speed from a single IP. The program may also occasionally fail to crawl some citations, and it will not output any information about the failed citations. So if it fails, just try again until it succeeds. :)
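The "try again until it succeeds" advice can also be automated. The sketch below is one possible approach and not how crawler.py behaves: `crawl_with_retries` is a hypothetical helper, and `fetch` stands in for whatever function downloads a single citation. It pauses between requests and returns the items that still failed, so they can at least be reported.

```python
import time


def crawl_with_retries(items, fetch, max_rounds=3, delay_seconds=2.0, sleep=time.sleep):
    """Fetch every item, retrying failures for up to max_rounds passes.

    Returns (results, failed): a dict of successful fetches and the list of
    items that never succeeded.
    """
    results = {}
    pending = list(items)
    for _ in range(max_rounds):
        still_failing = []
        for item in pending:
            try:
                results[item] = fetch(item)
            except Exception:
                still_failing.append(item)  # remember it for the next pass
            sleep(delay_seconds)  # be gentle with the rate-limited server
        pending = still_failing
        if not pending:
            break
    return results, pending
```

Passing `sleep` as a parameter is only a convenience for testing without real delays; the delay value itself is an arbitrary placeholder, not a known ACM limit.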

BibTeX format citation

JabRef group comments grammar

@comment{jabref-meta: groupsversion:3;}

@comment{jabref-meta: groupstree:
0 AllEntriesGroup:;
1 ExplicitGroup:1\;0\;b\;;
2 ExplicitGroup:1.1\;0\;a\;;
2 ExplicitGroup:c\;0\;;
}

[depth] ExplicitGroup:[group name];0;[bibtex key1];[bibtex key2];;

The parent of a node is the nearest previous line whose depth is less than that of the current node.
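The parent rule can be illustrated with a small stack-based sketch. It assumes the field layout seen in the example above (fields separated by `\;`, with a final `;` closing the line) and does not attempt to cover JabRef's full grammar, such as escaped characters inside group names. The function names are hypothetical.

```python
def parse_group_line(line):
    """Split one groupstree line into (depth, group name, list of BibTeX keys)."""
    depth_text, _, spec = line.strip().partition(" ")
    group_type, _, fields_text = spec.partition(":")
    if fields_text.endswith(";"):
        fields_text = fields_text[:-1]  # drop the ';' that terminates the line
    fields = fields_text.split("\\;")
    name = fields[0] or group_type  # the root line has no name of its own
    keys = [field for field in fields[2:] if field]
    return int(depth_text), name, keys


def resolve_parents(lines):
    """Return (name, parent name, keys) for each line, in order."""
    stack = []  # (depth, name) of the ancestors of the current line
    resolved = []
    for line in lines:
        depth, name, keys = parse_group_line(line)
        # Pop siblings and deeper nodes until a shallower line is on top.
        while stack and stack[-1][0] >= depth:
            stack.pop()
        parent = stack[-1][1] if stack else None
        resolved.append((name, parent, keys))
        stack.append((depth, name))
    return resolved


example = [
    r"0 AllEntriesGroup:;",
    r"1 ExplicitGroup:1\;0\;b\;;",
    r"2 ExplicitGroup:1.1\;0\;a\;;",
    r"2 ExplicitGroup:c\;0\;;",
]
tree = resolve_parents(example)
```

For the example tree, groups `1.1` and `c` both resolve to parent `1`, which in turn sits under the root `AllEntriesGroup`.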

Once you have a .bib citation file, open it with JabRef and browse the citations. There is a set of extensions that can help download all the PDFs automatically.

-- EOF --
