A technical overview of scraping tabular data from PDFs and detecting the language of a PDF document.
This project handles two tasks:
- Export tabular data from PDFs to JSON
- Detect the language of both image-based and text-based PDFs
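To make the first task more concrete, here is a minimal sketch of the JSON export step only. It assumes a table has already been extracted from a PDF as a list of rows with the header first; the function name `table_to_records` and the sample rows are hypothetical and are not taken from this project, whose actual output format may differ.

```python
import json


def table_to_records(rows):
    """Turn a table (header row followed by data rows) into a list of dicts."""
    if not rows:
        return []
    header, *body = rows
    # Pair each cell with its column name from the header row.
    return [dict(zip(header, row)) for row in body]


# Hypothetical rows standing in for a table already extracted from a PDF.
rows = [
    ["item", "qty"],
    ["widget", "2"],
    ["gadget", "5"],
]

print(json.dumps(table_to_records(rows), indent=2))
```

Each data row becomes one JSON object keyed by the column names, which keeps the export readable without needing the original PDF layout.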
This is an open-source project built with Python 3.8.
To get started, clone the GitHub repository and install the required packages.
- Install Python 3.8
sudo apt update -y
sudo apt install python3.8
- Create a virtual environment
cd <Project DIR>
python3 -m venv venv
- Activate virtual environment
source venv/bin/activate
- Clone the repo
git clone https://github.com/virajds/PDFTools
- Install packages
pip install -r PDFTools/requirements.txt
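
After the steps above, a quick sanity check can confirm that the virtual environment is active and that the interpreter is recent enough. This is a minimal sketch using only the standard library; the helper names `in_virtualenv` and `python_is_supported` are hypothetical and not part of PDFTools.

```python
import sys


def in_virtualenv():
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base interpreter.
    return sys.prefix != sys.base_prefix


def python_is_supported(minimum=(3, 8)):
    # Compare (major, minor) of the running interpreter with the minimum.
    return sys.version_info[:2] >= minimum


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("Python 3.8 or newer:", python_is_supported())
```

If the first check reports False, the environment was probably not activated in the current shell.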