shyok0/String_parser-from_document

String_parser-from_document

Strips a PDF or text file into a list of strings

args in: {file_name, type = 'file_type', delete = [], start = [], end = [], both = [], lower_case = Bool}

file_name : name of the file, including its extension, as a string
path      : only specify if the file is in a different directory; input as a string
type      : 'txt' or 'pdf'
delete    : list of characters removed everywhere in string_list
start     : list of characters removed at the start of string_list, but not everywhere
end       : list of characters removed at the end of string_list, but not everywhere
both      : list of characters removed at either end. If both is defined, the start and end lists are redundant
lower_case: Boolean that forces all characters to lower case; True by default
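
To illustrate how the delete, start, end, both and lower_case options might interact, here is a minimal sketch. It is not the repository's code: the function name clean_strings is hypothetical, and it assumes that the options apply to each string in the list and that both is simply combined with start and end.

```python
def clean_strings(strings, delete=(), start=(), end=(), both=(), lower_case=True):
    """Hypothetical helper mirroring the documented options (an assumption,
    not the repository's actual implementation)."""
    # Characters in `both` are stripped from the front and from the back.
    leading = "".join(start) + "".join(both)
    trailing = "".join(end) + "".join(both)

    cleaned = []
    for s in strings:
        # delete: remove the character wherever it occurs.
        for ch in delete:
            s = s.replace(ch, "")
        # start / end: strip only at the edges, never inside the string.
        if leading:
            s = s.lstrip(leading)
        if trailing:
            s = s.rstrip(trailing)
        if lower_case:
            s = s.lower()
        cleaned.append(s)
    return cleaned


words = '"Hello, World!" (it said)'.split()
print(clean_strings(words, delete=[","], start=["("], end=[")"], both=['"', "!"]))
# -> ['hello', 'world', 'it', 'said']
```

In this sketch a comma is removed everywhere, while the parentheses and quotes are removed only where they sit at the edge of a string.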

No packages are required for text files. Parsing strings from a PDF requires one package, 'tika'.

Note: tika requires Java 7+ to run.

If errors occur while running tika: make sure that Java 7 or later is installed and added to the PATH variable, and that the PC has been restarted once Java is installed.
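
A quick way to check the Java requirement before parsing a PDF is sketched below. The helper names java_on_path and pdf_to_text are hypothetical and not part of this repository; the sketch assumes the 'tika' package is installed and uses its parser.from_file call, which returns a dictionary whose 'content' entry holds the extracted text.

```python
import shutil


def java_on_path():
    """Return True if a `java` executable can be found on PATH."""
    return shutil.which("java") is not None


def pdf_to_text(pdf_path):
    """Hypothetical wrapper: extract text from a PDF with tika.

    Fails early with a readable message when Java is missing.
    """
    if not java_on_path():
        raise RuntimeError("Java not found on PATH; tika needs Java 7+.")
    # Imported here so text-only use does not require tika to be installed.
    from tika import parser

    parsed = parser.from_file(pdf_path)
    # 'content' can be None when no text could be extracted.
    return parsed.get("content") or ""


print("java found:", java_on_path())
```

If java_on_path() returns False right after installing Java, the PATH change has probably not been picked up yet, which is why a restart is suggested above.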
