Semantic-Scholar-Evaluation

This project investigates the coverage and the role of the Semantic Scholar (S2) search engine in conducting secondary studies in software engineering.

To run the scripts, you need to:

download the latest S2 corpus from http://api.semanticscholar.org/corpus/download/ and put it in the following path: data/sscholardump/.
install all required packages from the requirements.txt file.
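
The two steps above can be verified with a small check before launching anything. This is only a sketch and not part of the repository: the function name check_setup is hypothetical, and it assumes the script is run from the project root. Only the paths data/sscholardump/ and requirements.txt come from this README.

```python
from pathlib import Path


def check_setup(root="."):
    """Return a list of problems found in the expected project layout."""
    root = Path(root)
    problems = []

    # The S2 corpus dump is expected under data/sscholardump/ (path from this README).
    dump_dir = root / "data" / "sscholardump"
    if not dump_dir.is_dir():
        problems.append(f"missing directory: {dump_dir}")
    elif not any(dump_dir.iterdir()):
        problems.append(f"no corpus files in: {dump_dir}")

    # The package list is expected at the project root.
    if not (root / "requirements.txt").is_file():
        problems.append("missing requirements.txt")

    return problems
```

Calling check_setup() from the project root returns an empty list when both prerequisites are in place, and one message per missing item otherwise.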

The project contains 4 main folders:

data:

This folder includes the data used in the project:

cso: Computer Science Ontology, stored as a JSON file
swebok: Software Engineering Body of Knowledge, stored as a JSON file
sscholardump: this is where the S2 dump needs to be saved for the scripts to run properly

results:

This folder includes the results obtained:

Findings.xlsx: includes the final and intermediate results of the project
Studies.bib: includes the metadata of the studies included in the review
Metadata.bib: includes the metadata of all the papers included in the selected studies (Studies.bib)

scripts:

This folder includes the Python scripts used to automate the project:

bibtexloader.py: loads bibtex files and extracts the information to be searched in the S2 dump
onto_handler.py: cleans cso.owl and transforms it into an appropriate JSON file
locate_papers_in_corpus.py: implements the preliminary searches in which papers are located in the corpus
semantic_scholar_search.py: implements functions to search the corpus with the provided queries; it also implements the snowballing process
query_analyzer.py: implements search query construction and expansion using ontology terms
main.py: the main file used to launch the execution of the scripts.
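
To illustrate the idea behind locating papers in the corpus, here is a minimal sketch. It is not the code in locate_papers_in_corpus.py. It assumes the dump is a sequence of JSON lines, each carrying a "title" field, and that papers are matched on normalized titles; the function names normalize_title and locate_titles are hypothetical.

```python
import json
import re


def normalize_title(title):
    """Lowercase a title and collapse punctuation and whitespace into single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def locate_titles(wanted_titles, corpus_lines):
    """Map each wanted title to the first corpus record with the same normalized title.

    corpus_lines is any iterable of JSON strings, one record per line
    (assumed layout of the dump; the "title" field name is an assumption).
    """
    wanted = {normalize_title(t): t for t in wanted_titles}
    found = {}
    for line in corpus_lines:
        record = json.loads(line)
        key = normalize_title(record.get("title") or "")
        if key in wanted and wanted[key] not in found:
            found[wanted[key]] = record
    return found
```

Titles that do not appear in the returned mapping are candidates for the "missing from Semantic Scholar" category. Exact matching on normalized titles is deliberately strict; a fuzzy comparison would catch more variants at the cost of false positives.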

studies:

For each review selected in the study, this folder provides:

All.bib: list of all the included papers by the review
-Query.bib: list of papers not identified by the original query. References highlighted in red are missing from Semantic Scholar; those highlighted in yellow are found by the query but under a research field other than computer science; those highlighted in orange are also found by the query but fall outside the year range specified in the review.
-Snowballing.bib: list of papers not identified after snowballing
-Ontology.bib: list of papers not identified after searching with refined queries
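
The three highlight colors used in -Query.bib correspond to three reasons a paper can be missed by the original query. The sketch below expresses that classification in code. It is an illustration only: the function name classify_missed_paper and the record fields "fieldsOfStudy" and "year" are assumptions about how a corpus record might look, not the repository's actual data model.

```python
def classify_missed_paper(record, year_range, field="Computer Science"):
    """Return the reason a paper was not identified by the original query.

    record is the corpus entry for the paper, or None if it was not found.
    year_range is an inclusive (start, end) pair taken from the review.
    """
    if record is None:
        # Highlighted in red: the paper is not in Semantic Scholar at all.
        return "missing from Semantic Scholar"

    if field not in (record.get("fieldsOfStudy") or []):
        # Highlighted in yellow: indexed under another research field.
        return "different research field"

    start, end = year_range
    year = record.get("year")
    if year is not None and not (start <= year <= end):
        # Highlighted in orange: outside the years covered by the review.
        return "outside year range"

    return "other"
```

The checks are ordered so that each paper receives exactly one label, which mirrors the single-color highlighting described above.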

datasets for automatic screening of papers:

Each dataset contains the set of included studies for a specific SLR, as stated by the corresponding authors, extracted and saved in a readable format (.bib). To obtain a reasonable set of excluded studies, we ran the same query for each SLR in Scopus and applied the same inclusion criteria as the original SLR: period, type and language of publications. The studies returned by Scopus but not included in the SLR are considered excluded studies.
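
The construction of the excluded set amounts to a set difference between the Scopus results and the included studies. A minimal sketch follows, assuming studies are compared by normalized title; the matching key actually used to build the datasets is not stated here, and the function names are hypothetical.

```python
import re


def normalize_title(title):
    """Lowercase a title and collapse punctuation and whitespace into single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def derive_excluded(scopus_titles, included_titles):
    """Return the Scopus results that are not among the SLR's included studies.

    The order of the Scopus results is preserved.
    """
    included = {normalize_title(t) for t in included_titles}
    return [t for t in scopus_titles if normalize_title(t) not in included]
```

For example, derive_excluded(["Paper A", "Paper B"], ["paper a."]) keeps only "Paper B", because "Paper A" matches an included study after normalization.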
