yewno

yewno project

Project Workflow

MVP1: Create Gutenberg book pipeline

- Create a pipeline to pull data from Gutenberg.org - STALLED
	- Books are saved to a specific folder for now
- Use that pipeline to save a few books into CSV format - COMPLETE
- Expansion idea: Retrieve books based on more metadata (e.g. Author, Topic/Subject)
- Database idea: Set up a database rather than saving books as CSV
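
A minimal sketch of what the MVP1 steps could look like, not the project's actual code. The URL pattern, the function names, and the `book_id`/`text` column names are all assumptions made for illustration. `fetch_book` needs network access; the CSV helpers work on any in-memory dictionary of books.

```python
import csv
from urllib.request import urlopen

# Assumption: plain-text download URL pattern, not confirmed by this project.
GUTENBERG_TXT_URL = "https://www.gutenberg.org/cache/epub/{book_id}/pg{book_id}.txt"


def fetch_book(book_id, timeout=30):
    """Download one book as text (hypothetical helper, needs network access)."""
    url = GUTENBERG_TXT_URL.format(book_id=book_id)
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def save_books_to_csv(books, csv_path):
    """Write a {book_id: text} dictionary to a CSV file, one row per book."""
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["book_id", "text"])
        for book_id, text in books.items():
            writer.writerow([book_id, text])


def load_books_from_csv(csv_path):
    """Read the CSV back into a {book_id: text} dictionary."""
    # Whole books exceed the csv module's default field size limit, so raise it.
    csv.field_size_limit(100_000_000)
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return {row["book_id"]: row["text"] for row in csv.DictReader(handle)}
```

Usage would be along the lines of `save_books_to_csv({"1342": fetch_book(1342)}, "Data/books.csv")`, where the book ID and output path are only examples.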

MVP2: Preprocess book data

- Preprocess the book data in the data folder - COMPLETE
- Save the preprocessed data - COMPLETE
- Expansion idea: Add more preprocessing steps to get cleaner text (e.g. stemming)
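
The README does not list the preprocessing steps, so the following is only a sketch of two plausible ones: cutting away the Project Gutenberg header and footer, and splitting the remaining text into sentences. The marker patterns and function names are assumptions, and the sentence splitter is deliberately naive (it will split on abbreviations such as "Mr.").

```python
import re

# Assumption: marker lines that Gutenberg plain-text files typically place around the book body.
START_MARKER = re.compile(r"\*\*\* ?START OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*")
END_MARKER = re.compile(r"\*\*\* ?END OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*")


def strip_gutenberg_boilerplate(raw_text):
    """Keep only the text between the START and END marker lines, if present."""
    start = START_MARKER.search(raw_text)
    end = END_MARKER.search(raw_text)
    begin = start.end() if start else 0
    finish = end.start() if end and end.start() >= begin else len(raw_text)
    return raw_text[begin:finish]


def split_sentences(text):
    """Collapse whitespace, then split after '.', '!' or '?' followed by a space."""
    flat = re.sub(r"\s+", " ", text).strip()
    if not flat:
        return []
    return [sentence for sentence in re.split(r"(?<=[.!?])\s+", flat) if sentence]
```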

MVP3: Test book data for language

- Create algorithm to test each sentence in book for language - COMPLETE
- Save the algorithmic percentage of language per sentence per book - COMPLETE
- Expansion idea: Add more language detection methods and compare them against each other
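
The detection algorithm itself is not described here. As an illustration only, the sketch below uses a simple stopword-counting heuristic as a stand-in: each sentence gets a percentage per language based on how many of its words appear in that language's stopword list. The word lists, language set, and function names are all assumptions, and the lists are far too small for real use.

```python
import re

# Tiny illustrative stopword lists; a real run would need much larger ones.
STOPWORDS = {
    "english": {"the", "and", "of", "to", "in", "is", "that", "it", "was", "with"},
    "french": {"le", "la", "les", "et", "de", "un", "une", "est", "que", "dans"},
    "german": {"der", "die", "das", "und", "ist", "nicht", "ein", "eine", "zu", "mit"},
}


def language_percentages(sentence):
    """Return {language: percent of stopword hits} for one sentence."""
    tokens = re.findall(r"[^\W\d_]+", sentence.lower())
    hits = {
        lang: sum(token in words for token in tokens)
        for lang, words in STOPWORDS.items()
    }
    total = sum(hits.values())
    if total == 0:
        return {lang: 0.0 for lang in STOPWORDS}
    return {lang: 100.0 * count / total for lang, count in hits.items()}


def detect_language(sentence):
    """Return the top-scoring language, or 'unknown' when no stopword matched."""
    scores = language_percentages(sentence)
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"
```

Storing the full percentage dictionary per sentence, rather than only the winning label, keeps enough information to compare this heuristic against other detectors later.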

MVP4: Create feedback loop for crowd sourced corrections

- If crowd sourced responses correct the detected language of a sentence, and the agreement among those responses is "overwhelming", change the sentence label from the detected language to the crowd sourced language
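
The rule above could be sketched as follows. What counts as "overwhelming" is not defined in this README, so the thresholds here (at least 5 votes, at least 80% agreement) and the function name are placeholder assumptions.

```python
from collections import Counter


def resolve_label(detected_language, crowd_votes, min_votes=5, min_share=0.8):
    """Return the crowd label when the votes overwhelmingly agree on one language.

    min_votes and min_share are assumed thresholds, not values from the project.
    """
    if not crowd_votes or len(crowd_votes) < min_votes:
        # Too few responses to override the detector.
        return detected_language
    top_language, top_count = Counter(crowd_votes).most_common(1)[0]
    if top_count / len(crowd_votes) >= min_share:
        return top_language
    return detected_language
```

For example, `resolve_label("english", ["french"] * 5)` switches the label to the crowd's choice, while a 3-to-2 split keeps the detected language.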

MVP5: Create front end for the solution

MVP6: Train machine learning models to detect language using the labeled data

skyballin/yewno
