Libretti Rolandi Entity Extraction

Contents

scraper: downloads the manifests of the libretti into the manifests folder.
place extraction: runs OCR on the covers (coperte) of the libretti and extracts a tentative city name; stores a CSV file with the existing metadata and the extracted city in the data folder.
fuzzy place extraction: extracts a tentative city name using fuzzy matching; stores a new CSV file in the data folder.
composers extraction: extracts composer names from the covers and titles; stores a new CSV file in the data folder.
location extraction: extracts the venue of the performance (e.g. the name of the theater or church); stores a new CSV file in the data folder.
title extraction: extracts the bare title from the title metadata field; stores a new CSV file in the data folder.
genre extraction: extracts the opera genre from the title; stores a new CSV file in the data folder.
occasion extraction: extracts the occasion of the performance (e.g. carnival, fair); stores a new CSV file in the data folder.
quick fixes: improves composer extraction and Wikimedia linking; stores a new CSV file in the data folder.
data: contains all the generated CSV files, ordered from oldest to most recent (librettos_8 is the final version). It also contains a ground truth file with the expected and observed entities for 20 randomly chosen libretti.
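
To illustrate the fuzzy place extraction step, here is a minimal sketch of matching noisy OCR tokens against a list of known city names. It uses `difflib` from the Python standard library as a stand-in; the README does not say which matching library the project uses. The city list, the function name `fuzzy_city`, and the cutoff value are assumptions made for the example.

```python
from difflib import get_close_matches

# Hypothetical reference list; the project's actual list of cities is not shown here.
KNOWN_CITIES = ["Venezia", "Milano", "Napoli", "Roma", "Firenze", "Bologna"]


def fuzzy_city(ocr_tokens, cities=KNOWN_CITIES, cutoff=0.8):
    """Return the first known city that closely matches an OCR token, or None."""
    # Map lowercase names back to their canonical spelling.
    lowered = {city.lower(): city for city in cities}
    for token in ocr_tokens:
        # Strip surrounding punctuation that OCR output often carries.
        cleaned = token.strip(".,;:()[]").lower()
        matches = get_close_matches(cleaned, list(lowered), n=1, cutoff=cutoff)
        if matches:
            return lowered[matches[0]]
    return None


# A misspelled token from a cover page still resolves to a known city.
print(fuzzy_city(["IN", "VENEZA,", "MDCCXL"]))
```

A higher cutoff reduces false matches on short tokens at the cost of missing heavily garbled names, so the value would need tuning against the ground truth.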

Visualization

index.html: the entry page; it provides the skeleton of the visualization, which the JavaScript code then builds upon.
code/scripts: contains all the Python scripts that preprocess and prepare the data for visualization, e.g. collecting all common composer or title links.
js/mapIntegration.js: builds the page structure by manipulating the DOM and contains most of the visualization logic, e.g. mapping theaters, visualizing links, and viewing the librettos over time.
css/style.css: the single CSS file that provides the styling for the visualization.
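
As a rough illustration of the preprocessing done in code/scripts, the sketch below builds "common composer" links, read here as pairs of libretti that share a composer. This reading, the function name `composer_links`, and the `id` and `composer` field names are assumptions; the real scripts and CSV columns may differ.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical rows; the real CSV columns may be named differently.
rows = [
    {"id": "lib_1", "composer": "Vivaldi"},
    {"id": "lib_2", "composer": "Galuppi"},
    {"id": "lib_3", "composer": "Vivaldi"},
    {"id": "lib_4", "composer": ""},  # no composer extracted
]


def composer_links(rows):
    """Return (id_a, id_b, composer) for every pair of libretti sharing a composer."""
    by_composer = defaultdict(list)
    for row in rows:
        if row["composer"]:  # skip libretti without an extracted composer
            by_composer[row["composer"]].append(row["id"])
    links = []
    for composer, ids in sorted(by_composer.items()):
        for id_a, id_b in combinations(sorted(ids), 2):
            links.append((id_a, id_b, composer))
    return links


print(composer_links(rows))
```

The same grouping would work for title links by swapping the key used to build the groups.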

To develop the visualization locally

You can work on and develop the visualization on your local machine with the existing code base. To avoid Cross-Origin Resource Sharing (CORS) errors, copy the Python script below into the parent directory and run it there, so that your machine serves the data and you can work locally.

#!/usr/bin/env python3
"""Serve the current directory over HTTP with a permissive CORS header."""
import sys
from http.server import HTTPServer, SimpleHTTPRequestHandler, test

class CORSRequestHandler(SimpleHTTPRequestHandler):
    def end_headers(self):
        # Allow pages from any origin to fetch the served files.
        self.send_header('Access-Control-Allow-Origin', '*')
        super().end_headers()

if __name__ == '__main__':
    # Optional first argument: the port to listen on (default 8000).
    port = int(sys.argv[1]) if len(sys.argv) > 1 else 8000
    test(CORSRequestHandler, HTTPServer, port=port)
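
To check that a handler of this kind really adds the header, the following sketch starts the server on a free local port in a background thread and reads the response header back. The helper name `cors_header_for` is hypothetical, and the handler is repeated so that the example is self-contained.

```python
import threading
import urllib.request
from http.server import HTTPServer, SimpleHTTPRequestHandler


class CORSRequestHandler(SimpleHTTPRequestHandler):
    def end_headers(self):
        self.send_header("Access-Control-Allow-Origin", "*")
        super().end_headers()

    def log_message(self, format, *args):
        pass  # keep the example output quiet


def cors_header_for(path="/"):
    """Serve the current directory briefly and return the CORS header of one response."""
    server = HTTPServer(("127.0.0.1", 0), CORSRequestHandler)  # port 0 picks a free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        url = f"http://127.0.0.1:{server.server_port}{path}"
        # Bypass any configured proxy so the request stays on this machine.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(url) as response:
            return response.headers.get("Access-Control-Allow-Origin")
    finally:
        server.shutdown()
        server.server_close()


print(cors_header_for())
```

Note that the wildcard origin is convenient for local development only; it lets any page read the served files.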

Authors

Harshdeep
Aurel Maeder
Ludovica Schaerf

Harshdeep1996/Harshdeep1996.github.io

Folders and files

Latest commit

History

Repository files navigation

Libretti Rolandi Entity Extraction

Contents

Visualization

To develop the visualization locally

Authors

About

Resources

Stars

Watchers

Forks

Languages