PDF Scraper and Language Detector

A technical overview of scraping tabular data from PDFs and detecting the language of a PDF document.

Explore the docs »

View Demo · Report Bug · Request Feature

About The Project

This project handles two tasks:

Export tabular data from PDFs to JSON
Detect the language of both image-based and text-based PDFs
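
The two tasks can be illustrated with a minimal, standard-library-only sketch. It is not the code in this repository: it assumes the table rows and the document text have already been extracted from the PDF by some other tool (image-based PDFs would need OCR first), and the names `rows_to_records`, `guess_language` and `STOPWORDS` are hypothetical. The language guess uses a toy stopword count as a stand-in for a real detection library.

```python
import json
from collections import Counter


def rows_to_records(rows):
    """Turn a table (header row followed by data rows) into a list of dicts."""
    if not rows:
        return []
    header = [str(cell).strip() for cell in rows[0]]
    records = []
    for row in rows[1:]:
        # Pad short rows with None so every record has the same keys.
        padded = list(row) + [None] * (len(header) - len(row))
        records.append(dict(zip(header, padded)))
    return records


# Tiny illustrative stopword lists (assumption); real detection needs far more data.
STOPWORDS = {
    "en": {"the", "and", "of", "to", "is", "in"},
    "de": {"der", "die", "und", "das", "ist", "nicht"},
    "fr": {"le", "la", "et", "les", "des", "est"},
}


def guess_language(text):
    """Return the language code whose stopwords occur most often, or None."""
    words = Counter(text.lower().split())
    scores = {
        lang: sum(words[w] for w in stops) for lang, stops in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    table = [["name", "qty"], ["bolt", "4"], ["nut"]]
    print(json.dumps(rows_to_records(table), indent=2))
    print(guess_language("the cat is in the garden"))
```

Keeping the JSON conversion separate from PDF parsing means the same function works regardless of which extraction library produced the rows.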

Built With

This is an open-source project built with Python.

Getting Started

To get started, clone the GitHub repository and install the required packages.

Prerequisites

Python 3.8

sudo apt update -y
sudo apt install python3.8

Create a virtual environment

cd <Project DIR>
python3 -m venv venv

Activate the virtual environment

source venv/bin/activate
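
Before installing packages, it can help to confirm that the active interpreter is the one inside the virtual environment and is at least Python 3.8. The following is a small sketch for that check; the helper names `in_virtualenv` and `meets_minimum` are hypothetical and not part of this project.

```python
import sys


def in_virtualenv():
    """True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment and
    # sys.base_prefix at the Python installation it was created from.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def meets_minimum(version=None, minimum=(3, 8)):
    """True when the (major, minor) version is at least `minimum`."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= tuple(minimum)


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("Python 3.8 or newer:", meets_minimum())
```

If the first check prints `False`, the activation step was probably skipped in the current shell.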

Installation

Clone the repo

git clone https://github.com/virajds/PDFTools

Install packages

pip install -r PDFTools/requirements.txt
