rzjfr/LTCrawler

LT Crawler

A crawler for collecting information from LT.

Modules

  • books.py: methods for finding data about a given book
  • users.py: methods for finding data about a given username
  • helpers.py: general-purpose helper methods

Scripts

  • find_books_users.py: find the books of a list of members, using the book ids in the JSON data from the LT JSON API
  • find_books_users_html.py: find the books of a list of members, using each member's LT catalog
  • find_friends_book.py: find the friends of the members of a book
  • find_user_data.py: get all JSON data from the LT JSON API for all members of a given book
  • graph_friends.py: graph analysis of the members of two different books
  • reviews.py: text analysis of reviews by book or by user
  • xml_work_to_isbn.py: add ISBNs to work records from the LT feed files
  • clean_up_mess.py: clean up the data sets if something goes wrong

Folders

  • data/backup/: all backup files
  • data/book/: the members page of a book, identified by work id, in HTML format
  • data/feeds/: LT feed files
  • data/profile/: all information about each member, in JSON format, from the LT JSON API
  • data/profile/html/: all books of each member, in HTML format
  • figures: all figures, including charts and graphs, in PDF, PNG and SVG formats

Data Sets

friends.json

Each line holds a username with the list of that user's friends' usernames.

{"CorkyRingspot": ["loubyloo", "wordhound"]}

The *_members.json files contain the friends lists for the members of a single book.
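
As a minimal sketch of reading this one-object-per-line layout, the Python below merges the lines into a single dictionary. The function name `load_json_lines` is hypothetical and not part of the repository; the sample line is the one shown above.

```python
import json


def load_json_lines(lines):
    """Merge one-JSON-object-per-line records into a single dict."""
    merged = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        merged.update(json.loads(line))
    return merged


# In practice `lines` would be an open file handle; a list keeps the sketch self-contained.
sample = ['{"CorkyRingspot": ["loubyloo", "wordhound"]}']
friends = load_json_lines(sample)
print(friends["CorkyRingspot"])
```

Any iterable of strings works as input, so an open file such as `open("friends.json")` can be passed directly.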

books.json

Each line holds a username with the list of work ids of that user's books.

{"username": ["workid", "workid"]}
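
With the data in this shape, the books two members have in common can be found with a set intersection, which is presumably the kind of count compare.csv records. This is only a sketch: `shared_books` is a hypothetical helper, and the usernames and work ids are made-up sample values.

```python
def shared_books(books, user_a, user_b):
    """Return the set of work ids that appear in both users' lists."""
    return set(books.get(user_a, [])) & set(books.get(user_b, []))


# Hypothetical sample data in the books.json layout (username -> work ids).
books = {
    "alice": ["1060", "483379", "111247"],
    "bob": ["111247", "1060", "42"],
}
common = shared_books(books, "alice", "bob")
print(len(common))
```

Unknown usernames yield an empty set rather than an error, which is convenient when the crawl is incomplete.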

book_review.json

Each line holds a work id with its list of reviews. Each review contains the text, the username and the rank.

{"workid": [{"text": "...", "name": "username", "rank": "2"}, {...}]}
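
Because the rank is stored as a string, it has to be converted before any arithmetic. The sketch below averages the ranks per work, assuming every non-empty rank is a numeric string; `mean_rank` and the sample reviews are hypothetical.

```python
import json
from statistics import mean


def mean_rank(reviews):
    """Average the numeric ranks of a list of reviews, or None if there are none."""
    ranks = [float(r["rank"]) for r in reviews if r.get("rank")]
    return mean(ranks) if ranks else None


# Made-up sample line in the book_review.json layout.
line = (
    '{"1060": [{"text": "great", "name": "alice", "rank": "4"}, '
    '{"text": "meh", "name": "bob", "rank": "2"}]}'
)
record = json.loads(line)
averages = {work_id: mean_rank(reviews) for work_id, reviews in record.items()}
print(averages)
```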

tags.json

Each line holds a work id with its tags. Each tag is stored with the tag text as key and its frequency as value.

{"1060": {"read": "42", "fic": "2", "print book": "1"}}

tags_user.json is similar to tags.json, but it contains the tags for users.
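
Tag frequencies are strings in these files, so sorting them as-is would compare text rather than numbers. A minimal sketch of ranking tags by frequency, using the tags.json sample line above (`top_tags` is a hypothetical helper):

```python
import json


def top_tags(tag_counts, n=2):
    """Return the n most frequent tags as (tag, count) pairs."""
    counts = {tag: int(freq) for tag, freq in tag_counts.items()}
    # Sort by descending count, then alphabetically to break ties.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:n]


line = '{"1060": {"read": "42", "fic": "2", "print book": "1"}}'
record = json.loads(line)
result = top_tags(record["1060"])
print(result)
```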

isbn_to_work.csv

Each line holds an ISBN with its corresponding work id.

isbn,workid
002901986,483379
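
The sample ISBN starts with zeros, so the values should be read as strings rather than integers. A sketch using the standard `csv` module, assuming the header row shown above; `load_mapping` is a hypothetical helper name.

```python
import csv
import io


def load_mapping(handle, key_field, value_field):
    """Read a two-column CSV with a header row into a dict of strings."""
    reader = csv.DictReader(handle)
    return {row[key_field]: row[value_field] for row in reader}


# An in-memory file stands in for open("isbn_to_work.csv", newline="").
sample = io.StringIO("isbn,workid\n002901986,483379\n")
isbn_to_work = load_mapping(sample, "isbn", "workid")
print(isbn_to_work["002901986"])
```

The same helper would work for any other two-column CSV with a header row by passing different field names.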

bookid.csv

Each line holds a book id with its corresponding work id.

bookid,workid
86408774,111247

Other files

  • compare.csv: number of shared books between two members

    username,username,#of_shared_books

  • AFINN-111.text: TSV file; each line holds an English word and its sentiment rank

  • english.stop: list of English stop words, one per line
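
One common way to use a word list like AFINN together with a stop-word list is to sum the ranks of the words in a review. The sketch below illustrates that idea only; it is not necessarily what reviews.py does. The function names are hypothetical, and the word scores and stop words are illustrative samples rather than lines copied from the real files.

```python
import re


def load_scores(lines):
    """Parse tab-separated 'word<TAB>rank' lines into a dict."""
    scores = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Split on the last tab so multi-word entries stay intact.
        word, value = line.rsplit("\t", 1)
        scores[word] = int(value)
    return scores


def sentiment(text, scores, stop_words=frozenset()):
    """Sum the ranks of all scored words in text, skipping stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(scores.get(t, 0) for t in tokens if t not in stop_words)


# Illustrative sample entries, not taken from the real data files.
afinn_lines = ["good\t3\n", "bad\t-3\n"]
stop_words = {"the", "a", "was"}
scores = load_scores(afinn_lines)
total = sentiment("The plot was good, the ending bad, but good overall", scores, stop_words)
print(total)
```

This simple tokenizer only matches single words, so multi-word entries in the score file would need extra handling.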

URIs

TODO

Data preparation

  • make a work to ISBN dataset
  • find friends of a given user
  • find books of a given user
  • find tags of a work or user
  • find reviews of a work or user
  • find authors of a user
  • find members of a work

Analysis

  • compare members of two books
  • analyse reviews by book or by user
