russelljjarvis/ScienceAccessibility

This project has moved to this repository

A live version of the app is here

Example screenshot

First Step

git clone https://github.com/russelljjarvis/ScienceAccess.git
cd ScienceAccess

If you don't have python3:

sudo bash install_python3.sh

Installation (macOS)

sudo bash apple_setup.sh

Installation (Linux)

sudo bash setup.sh

Run

streamlit run app.py

Manuscript

Overview

Understanding a big word is hard, so when big ideas are written down with lots of big words, the large pile of big words is also hard to understand.

We used a computer to quickly visit and read many different websites to see how hard each piece of writing was to understand. People may avoid learning hard ideas only because they encounter too many hard words in the process. We think we can help by explaining the problem with smaller words, and by creating tools to address the problem.

Why Are We Doing This?

We want to promote clearer and simpler writing in science by encouraging scientists in the same field to compete with each other to write more clearly.

How Are We Doing This?

Machine Estimation of Writing Complexity:

The accessibility of the written word can be approximated by a computer program that reads the text and estimates the mental difficulty associated with comprehending it. The program maps reading difficulty onto a quantity informed by the cognitive load of the writing and the number of years of schooling needed to decode the language in the document. For convenience, we refer to this difficulty as the 'complexity' of the document.
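
As a concrete example, here is a minimal sketch of this kind of estimate, assuming the third-party textstat package; the project's own pipeline may combine several measures differently.

```python
# Minimal sketch: score a text's "complexity" as a reading grade level.
# Assumes the third-party `textstat` package (pip install textstat).
import textstat

def complexity(text: str) -> float:
    """Approximate the years of schooling needed to decode `text`."""
    return textstat.flesch_kincaid_grade(text)

if __name__ == "__main__":
    simple = "Understanding a big word is hard."
    dense = ("Phytochromobilin isomerization proceeds via a concerted "
             "photochemical mechanism under femtosecond excitation.")
    print(complexity(simple))  # low grade level
    print(complexity(dense))   # much higher grade level
```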

How do some well-known texts do?

First, we sample some extremes in writing style and then tabulate the results, so we have some reference points to help us make sense of other results. On the lower and upper limits we have XKCD: pushing the limits of extremely readable science, and for comparison, some machine-generated postmodern nonsense.

Higher is worse:

complexity   text
6.0          upgoer5
9.0          readability of science declining
14.0         science of writing
14.9         mean Wikipedia
16.5         mean postmodern essay generator

Some particular cases:

complexity   text
13.0         this readme.md
17.0         The number of olfactory stimuli that humans can discriminate is still unknown
18.68        Intermittent dynamics and hyper-aging in dense colloidal gels
37.0         Phytochromobilin C15-Z,syn - C15-E,anti isomerization: concerted or stepwise?

Proposed Remedies:

  • 1. Previously I mentioned creating tools to remedy inaccessible academic research. One tool, a natural extension of this work, would enable 'clear writing' tournaments between prominent academic researchers, for example:
mean complexity   author
28.85             Professor R Gerkin
29.8              [other_author]
30.58             [other_author]

Example code for the proposed tool would allow you to select academic authors and use their writing contributions in a tournament where members compete to write simpler text (a minimal sketch appears after this list). A more recently maintained version of that file is available.

  • 2. A different proposed remedy is to run the text through simplify and evaluate the document's complexity again after it has been simplified. How different are the scores?
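
The sketch below illustrates the tournament idea from the first remedy: rank authors by the mean complexity of their writing samples, lowest first. The textstat scorer and the author-to-texts mapping are assumptions for illustration, not the project's exact tooling.

```python
# Sketch of a 'clear writing' tournament: rank authors by mean complexity.
from statistics import mean
import textstat

def rank_authors(samples: dict[str, list[str]]) -> list[tuple[str, float]]:
    """Return (author, mean complexity) pairs, simplest writer first."""
    scores = {
        author: mean(textstat.flesch_kincaid_grade(t) for t in texts)
        for author, texts in samples.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1])

# Hypothetical usage with abstracts gathered per author:
# leaderboard = rank_authors({"author_a": [abstract1, abstract2],
#                             "author_b": [abstract3]})
```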

The following is a plot of the distribution of science writing versus non-science writing in the ART science corpus:

[image: distribution plot]

The science writing niche is characterized by a mean reading grade level of 18, a neutral to negatively polarized sentiment, and an almost complete absence of subjectivity. Science writing is also more resistant to file compression, meaning that its information entropy is high due to concise, coded language. These statistical features give us quite a lot to go on when using language style to predict the scientific status of a randomly selected web document. The notion that entropy is generally higher in science is corroborated by the perplexity measure, which quantifies how improbable the particular frequency distribution of words observed in a document is.
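
As a rough illustration of the compression argument above, resistance to compression can be estimated with a standard compressor: a higher compressed-to-raw size ratio suggests higher information entropy. This is a sketch using Python's zlib, not the measurement code used in the manuscript.

```python
# Sketch: compression ratio as a crude proxy for information entropy.
import zlib

def compression_ratio(text: str) -> float:
    """Ratio of compressed size to raw size; closer to 1.0 means harder to compress."""
    raw = text.encode("utf-8")
    return len(zlib.compress(raw, 9)) / len(raw)

# Dense, low-redundancy science prose tends to yield a higher ratio than
# repetitive or conversational writing.
```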

Developer Overview

Non-scientific writing typically exceeds genuine scientific writing in two important respects: in contrast to genuine science, non-science is often expressed in a less complex and more engaging writing style. We believe non-science writing occupies a more accessible niche that academic science writing should also occupy.

Unfortunately, writing styles intended for different audiences are predictably different. We show that computers can learn to guess the type of a written document (blog, Wikipedia, opinion, or traditional science) by first sampling a large variety of web documents and then classifying them using sentiment, complexity, and other variables. By predicting which of several niches a document occupies, we are able to characterize the different writing types and to describe strategies to remedy writing complexity.
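
A minimal sketch of such a classifier follows, assuming textstat for complexity, TextBlob for sentiment and subjectivity, and a scikit-learn random forest; the project's actual feature set and model may differ.

```python
# Sketch: classify document type from style features.
import textstat
from textblob import TextBlob                     # sentiment and subjectivity
from sklearn.ensemble import RandomForestClassifier

def features(text: str) -> list[float]:
    blob = TextBlob(text)
    return [
        textstat.flesch_kincaid_grade(text),      # reading grade level
        blob.sentiment.polarity,                  # -1 (negative) .. 1 (positive)
        blob.sentiment.subjectivity,              # 0 (objective) .. 1 (subjective)
    ]

def train(docs: list[str], labels: list[str]) -> RandomForestClassifier:
    """docs and labels would come from the sampled web corpus."""
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit([features(d) for d in docs], labels)
    return clf
```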

Multiple stakeholders benefit when science is communicated with less complex expression of ideas. With lower-complexity science writing, knowledge would be more readily transferred into public awareness. Additionally, the digital organization of facts derived from journal articles would occur more readily, since successful machine comprehension of documented science would require less human intervention.

The impact of science on society is likely proportional to the accessibility of the written work. Objectively describing the character of the different writing styles will allow us to prescribe how to shift academic science writing into a more accessible niche, where science can more aggressively compete with pseudo-science and blogs.

Similar projects