
milindmaha/SearchPDF



Problem statement: Searching for specific content in a PDF file without opening it

Problem description: A “.pdf” document can be thought of as a graphical representation rather than a linear text file. Its content is unordered: the file is essentially a set of instructions that specify where each item is placed on the page. A PDF page is typically made up of text boxes, lines, figure blocks, curves, rectangles, annotations, notes, and so on. The table of contents in a PDF records the page number of each entry as well as the physical location of the content within that page. If the table of contents can be extracted with mining techniques, the contents of the PDF file can be searched through it.
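To make the page model above concrete, here is a minimal sketch that walks the layout elements of a PDF and keeps only the text boxes, together with their page number and bounding box. It assumes the pdfminer.six package (`extract_pages`, `LTTextContainer`); the names `TextBox`, `iter_text_boxes`, and `find_in_boxes` are hypothetical and are not taken from this repository.

```python
from typing import Iterable, Iterator, List, NamedTuple, Tuple


class TextBox(NamedTuple):
    page: int                                 # 1-based page number
    bbox: Tuple[float, float, float, float]   # (x0, y0, x1, y1) in PDF points
    text: str


def iter_text_boxes(pdf_path: str) -> Iterator[TextBox]:
    """Yield every text box in the PDF with its page number and position."""
    # Assumption: pdfminer.six is installed. It is imported lazily so that
    # find_in_boxes() below can be used without it.
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LTTextContainer

    for page_number, page_layout in enumerate(extract_pages(pdf_path), start=1):
        for element in page_layout:
            # Lines, curves, rectangles, and figures are skipped here.
            if isinstance(element, LTTextContainer):
                yield TextBox(page_number, element.bbox, element.get_text().strip())


def find_in_boxes(boxes: Iterable[TextBox], query: str) -> List[TextBox]:
    """Return the text boxes whose text contains the query, ignoring case."""
    needle = query.casefold()
    return [box for box in boxes if needle in box.text.casefold()]
```

A call such as `find_in_boxes(iter_text_boxes("report.pdf"), "invoice")` would then return each matching box with the page and coordinates where it sits; `report.pdf` is a placeholder file name.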

Idea: The idea is to develop a protocol that provides interfaces for searching content within a PDF file without opening it. This approach applies only to a particular set of PDF files. The backbone of the protocol will be developed in Python, with the aim of keeping searches fast and the user experience smooth.
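The README does not specify what the search interface looks like. As one possible shape, the sketch below takes a list of PDF paths, a search term, and a text-extraction callback, and returns the pages on which the term appears in each file. The function names are hypothetical, and the default extractor assumes the pypdf package (`PdfReader`), which may differ from the library version the project actually uses.

```python
import os
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# An extractor maps a file path to (page_number, page_text) pairs.
Extractor = Callable[[str], Iterable[Tuple[int, str]]]


def iter_pdf_paths(root: str) -> Iterator[str]:
    """Yield the path of every .pdf file under root, recursively."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.lower().endswith(".pdf"):
                yield os.path.join(dirpath, name)


def pypdf_extractor(path: str) -> Iterator[Tuple[int, str]]:
    """Assumed default extractor: page text via pypdf (imported lazily)."""
    from pypdf import PdfReader

    for number, page in enumerate(PdfReader(path).pages, start=1):
        yield number, page.extract_text() or ""


def search_files(paths: Iterable[str], query: str,
                 extract: Extractor = pypdf_extractor) -> Dict[str, List[int]]:
    """Map each path to the 1-based page numbers that contain the query."""
    needle = query.casefold()
    results: Dict[str, List[int]] = {}
    for path in paths:
        pages = [n for n, text in extract(path) if needle in text.casefold()]
        if pages:  # files without a match are left out of the result
            results[path] = pages
    return results
```

Passing the extractor as a parameter keeps the search logic independent of any one PDF library, so `search_files(iter_pdf_paths("docs"), "invoice")` and a PDFMiner-backed variant could share the same code; `docs` is a placeholder directory.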

Output: We analyzed the problems PDF users face when searching for content across multiple files or a whole directory, and built the SearchPDF application as a practical way to search the contents of their directory in a cloud location.

Steps to use the PDF Search site:

1) Go to the URL: http://searchinpdf.mybluemix.net/

2) To search in PDFs, link your Dropbox account and authorize it for use with the PDF search application.

3) Once the Dropbox account is configured, select the path from the dropdown list, enter the text to search for, and click Search.

4) The final search result will be displayed, and a copy of the search result page will be placed in the user's Dropbox.
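The Dropbox side of steps 3 and 4 could be sketched roughly as follows, assuming the official Dropbox Python SDK (`files_list_folder`, `files_list_folder_continue`, `files_upload`). The site's actual implementation is not shown in this README, so the helper names, the plain-text report format, and the destination path are all hypothetical. Account linking (step 2) is an OAuth flow and is left out.

```python
from typing import Dict, List


def list_pdf_paths(dbx, folder: str) -> List[str]:
    """Return Dropbox paths of entries in `folder` whose name ends in .pdf."""
    result = dbx.files_list_folder(folder)
    paths: List[str] = []
    while True:
        # Simplification: matches on the name only, so a folder named
        # "x.pdf" would also be listed.
        paths += [entry.path_lower for entry in result.entries
                  if entry.name.lower().endswith(".pdf")]
        if not result.has_more:
            return paths
        result = dbx.files_list_folder_continue(result.cursor)


def format_report(query: str, hits: Dict[str, List[int]]) -> str:
    """Render search hits (path -> page numbers) as a plain-text report."""
    lines = [f"Search results for: {query}"]
    for path in sorted(hits):
        pages = ", ".join(str(page) for page in hits[path])
        lines.append(f"{path}: pages {pages}")
    if not hits:
        lines.append("No matches found.")
    return "\n".join(lines)


def upload_report(dbx, report: str, dest_path: str) -> None:
    """Save the report to the user's Dropbox, replacing any earlier copy."""
    # Assumption: the `dropbox` package is installed; imported lazily.
    from dropbox.files import WriteMode

    dbx.files_upload(report.encode("utf-8"), dest_path, mode=WriteMode.overwrite)
```

Here `dbx` would be a `dropbox.Dropbox` client created from the access token obtained when the user links the account, and `dest_path` could be something like `/search_results.txt`.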

About

Python, Dropbox, PDFMiner, PDFQuery, PyPdf, Text Search
