kawai924/SementicNYWeatherAccident

Generating and querying semantic data using NY weather and accident data

CECS 571 - Fundamentals of Semantic Web Technologies

Project 2 - Generate semantic data
Team 3: Dennis Lo, Andreas Saplacan, Mandar Vijay Kulkarni, Vatsal Patel

Project 3 - Querying semantic data
Team 2: Dennis Lo, Andreas Saplacan, Upasana Garg, Aditi Tomar, Gayathri Venna

This project converts two datasets, New York weather data and crash accident reports, from plain .csv files to the semantic .rdf format. The conversion gives the datasets a shared meaning, relates weather and accident concepts to each other, and enables systems to infer knowledge. The project also includes queries that demonstrate how to answer complex questions using SPARQL as the query language.

Project structure

.
├── converter                       # Contains converter scripts for weather and accident  (project 2)
├── data                            # Hosts input and output data  (project 2)
│   ├── csv                         # Input: datasets in .csv format
│   └── rdf                         # Output: populated ontologies in .rdf format (in zip file)
├── ontology                        # Contains generated ontologies using Protege in .ttl format  (project 2)
├── query                           # Contains query and graph scripts for executing SPARQL queries  (project 3)
│   └── output                      # Output: result of SPARQL queries in HTML format  (project 3)
├── venv                            # Dependencies needed to run the project
├── entrypoint.py                   # Main entry point of the program to execute converter and queries
├── NY_weather_data_extraction.py   # Script to pull and generate weather data into data/csv
└── README.md

How to convert .csv to meaningful .rdf

  1. Gather datasets: Use APIs to pull the dataset from a given web service, or search for and download a dataset from https://www.data.gov/.

  2. Design ontology: Use Protege to construct domain models and knowledge-based concepts.

Follow this tutorial on how to use Protege for ontology design

  3. Populate ontology: Add instances to the generated ontology using Python and RDFLib.

Documentation on how to create graphs and triples using RDFLib

How to query an ontology using SPARQL and RDFLib

Use RDFLib to load the graph and query it with SPARQL.

Follow this tutorial on how to create simple queries

Setting up and running the project

Disclaimer: This project was developed and tested on macOS.

Python has to be installed on the machine (it comes by default with Xcode on macOS).
The project ships with all required dependencies in venv.zip.

  1. Open the terminal and clone the project repository
git clone https://github.com/kawai924/SementicNYWeatherAccident.git
  2. Open the project root folder in the file browser and unzip the shipped dependency file venv.zip

There are 2 ways to run this project: Using (a) PyCharm IDE or using (b) the Terminal.

(a) Using PyCharm:

  1. Open the project in PyCharm

  2. Select Python 3 as the interpreter in the configuration at the top right corner

  3. Press Run at the top right corner to start the program

(b) Using the macOS Terminal:

  1. Navigate into the root folder of the project
cd SementicNYWeatherAccident
  2. Activate the Python virtual environment
source venv/bin/activate
  3. Execute the entrypoint script
python entrypoint.py
  4. After the script has finished, exit the virtual environment
deactivate

The generated RDF files can be found under data/rdf/*.rdf
The generated HTML files can be found under query/output/*.html

Run different parts of the project

Depending on which part of the project you want to run, comment out the other lines in entrypoint.py so that only the relevant one remains:

Leaving only line 71: runs only project 2 (generating RDF from CSV)
Leaving only line 72: runs only project 3 (all pre-defined queries)
Leaving only line 73: runs only project 3 (manual query)

Queries and their developers

The following queries were tested on the generated ontologies for accidents and weather in New York City. Each is listed with its developer.

| Query | Developer |
| --- | --- |
| What is the monthly summary of accidents including injuries and weather data? | Upasana Garg |
| How many accidents in Queens could have been caused by Distraction due to Thunder in 2020? | Andreas Saplacan |
| What are the top 5 vehicle types that were involved in the most accidents in Manhattan due to ice? | Andreas Saplacan |
| Which weather station is located in the county of Ontario? | Dennis Lo |
| Which accident happened due to view obstruction in heavy fog? | Gayathri Venna / Aditi Tomar |
| Input query via terminal | Aditi Tomar / Gayathri Venna |

Running your own query

The input query lets you query our graph manually by typing a SPARQL query into the terminal, similar to using a search engine.

Steps to run input query are as follows:

  1. Uncomment line 73, execute_manual_query(), in entrypoint.py and comment out lines 71 and 72 to run only the manual input query
  2. Run entrypoint.py
  3. Once all RDF data has loaded and the console asks for input, type your query in the terminal. When you are done, press Enter once and type ";;"
  4. Press Enter again and the query will start running
  5. Watch the console output for runtime information
  6. The results can be found in ./query/output/input_results.html
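
The ";;" convention in step 3 can be sketched as a small read loop. This is an illustration of the idea only, not the code behind execute_manual_query(): the function name `read_query` and the injectable `read_line` parameter are hypothetical, and the sketch assumes the terminator is typed on a line of its own.

```python
from typing import Callable

TERMINATOR = ";;"  # end-of-query marker described in the steps above


def read_query(read_line: Callable[[], str] = input) -> str:
    """Collect lines until one that contains only the terminator."""
    lines = []
    while True:
        line = read_line()
        if line.strip() == TERMINATOR:
            break
        lines.append(line)
    return "\n".join(lines).strip()


# Non-interactive demonstration: canned lines replace real terminal input.
canned = iter(["SELECT ?s", "WHERE { ?s ?p ?o }", ";;"])
query_text = read_query(lambda: next(canned))
```

Called without arguments, `read_query()` reads from the terminal via `input`, and the returned string can be passed to RDFLib's `Graph.query`.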

Prefixes for namespaces:

| Prefix | Namespace | Description |
| --- | --- | --- |
| act | http://github.com/kawai924/SementicNYWeatherAccident/accident# | Accident data |
| STA | http://github.com/kawai924/SementicNYWeatherAccident/station# | Weather station ID |
| wea | http://github.com/kawai924/SementicNYWeatherAccident/weather# | Weather type by station and date |
| wean | http://github.com/kawai924/SementicNYWeather/stationID# | Weather number by station and date |

Web server hosting:
After entrypoint.py finishes, it opens a browser and serves the output files in ./query/output at localhost:8888. The files are hosted by the httpd package in Python.
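
A minimal sketch of this behavior, using Python's standard `http.server` module as a stand-in (an assumption; the project's actual hosting code may differ). The function names `build_server` and `serve_output` are hypothetical.

```python
import functools
import http.server
import webbrowser


def build_server(directory: str, port: int = 8888) -> http.server.ThreadingHTTPServer:
    """Create (but do not start) a server that serves files from `directory`."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    return http.server.ThreadingHTTPServer(("localhost", port), handler)


def serve_output(directory: str = "query/output", port: int = 8888) -> None:
    """Open the browser on the output folder and serve until interrupted."""
    with build_server(directory, port) as httpd:
        webbrowser.open(f"http://localhost:{port}/")
        httpd.serve_forever()  # blocks; stop with Ctrl+C
```

Calling `serve_output()` from the project root would serve ./query/output at localhost:8888 until the process is interrupted.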

Software dependencies:

All dependencies are included in venv.zip, which has to be unzipped before running the project

Python 3 - Python interpreter
Pandas - Tool for efficient data analysis and manipulation
RDFLib - Library for creating and manipulating RDF
Iribaker - Library for creating URIs easily

Data Source:

All datasets were found via https://www.data.gov

Documentation:
