
vbyravarasu/nifty

 
 


Nifty

Nifty is a library of utility functions and classes that simplify common tasks in Python programming: a handy add-on to the standard library that makes Python even easier to use. In addition, Nifty contains a number of advanced tools for web scraping, data processing and data mining. Brought to you by Marcin Wojnarski (Twitter, LinkedIn). Licensed under the GPL.

Contents

Basic utilities in nifty.util, including 100 one-liners for common tasks:

  • is...() dynamic type checking: isstring, isint, isnumber, islist, istuple, isdict, istype, isfunction, isiterable, isgenerator, ...
  • classes and types inspection: classname, issubclass, baseclasses, subclasses, types
  • objects, generic types with extended interface: Object, NoneObject, ObjDict
  • collections: unique, flatten, list2str, obj2dict, dict2obj, subdict, splitkeys, lowerkeys, getattrs, setattrs, copyattrs, setdefaults, Heap
  • strings and text: merge_spaces, ascii, prefix, indent
  • JSON encoding & serialization of arbitrary objects: JsonObjEncoder, dumpjson, printjson, JsonDict
  • numbers: minmax, percent, bound, divup, noise, mnoise, parseint
  • date & time: Timer, now, nowString, utcnow, timestamp, asdatetime, convertTimezone, secondsBetween (minutes, hours, ...), secondsSince (minutes, hours, ...)
  • files: fileexists, normpath, filesize, filetime, filectime, filedatetime, readfile, writefile, Tee
  • file folders: normdir, listdir, listdirs, listfiles, findfiles, ifindfiles
  • concurrency: Lock, NoneLock
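To give a feel for the kind of one-liners listed above, here is a minimal standalone sketch of a few of them. These are re-implemented from scratch for illustration and assume Python 3; the actual signatures and edge-case behavior in nifty.util are not shown in this README and may differ.

```python
# Illustrative stand-ins only: not copied from nifty.util, whose real
# signatures and behavior may differ from what is assumed here.

def isiterable(x):
    """True if x supports iteration (strings count as iterable here)."""
    try:
        iter(x)
        return True
    except TypeError:
        return False

def unique(seq):
    """Drop duplicates while keeping first-seen order (items must be hashable)."""
    seen = set()
    out = []
    for item in seq:
        if item not in seen:
            seen.add(item)
            out.append(item)
    return out

def flatten(seq):
    """Recursively flatten nested lists and tuples into one flat list."""
    out = []
    for item in seq:
        if isinstance(item, (list, tuple)):
            out.extend(flatten(item))
        else:
            out.append(item)
    return out

def divup(x, y):
    """Integer division rounded up, assuming y is positive."""
    return -(-x // y)

def bound(x, lo, hi):
    """Clamp x into the closed interval [lo, hi]."""
    return max(lo, min(hi, x))
```

Helpers like these are small enough to inline, which is the point of collecting them in one module: the call site stays readable and the edge cases are handled in one place.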

Text processing routines in nifty.text:

  • Levenshtein distance: levenshtein, levendist, levenscore
  • Bag-of-words model with TF-IDF weights: WordsModel
  • N-grams: ngrams
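The two simplest routines above can be sketched as follows. This is a textbook dynamic-programming edit distance with unit costs and a character-level n-gram splitter, written here for illustration under the assumption of Python 3; the parameters and return types of the real functions in nifty.text are not documented in this README and may differ.

```python
# Illustrative stand-ins only: not the code from nifty.text.

def levenshtein(a, b):
    """Edit distance between sequences a and b with unit costs."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

def ngrams(text, n):
    """All contiguous character n-grams of text, in order of appearance."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]
```

Keeping only two rows of the distance table brings memory use down to the length of the second string, which matters when comparing long documents.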

Web scraping tools in nifty.redex:

  • Redex patterns - a new language for extracting data from any markup document. Similar in spirit and structure to regular expressions (regex), but better suited to searching large tagged documents. Redex bridges the gap between regexes and XPaths as used in web scraping. It combines the consistency and compactness of regexes (a single pattern matches the whole document and extracts multiple variables at once) with the strength and precision of XPaths: a redex pattern is written in a form much simpler than a regex and can span multiple fragments of the document, providing precise context for where each fragment is allowed to match.
  • parsing of basic data types from human-readable formats used in web pages: pdate, pdatetime, pint, pfloat, pdecimal, percent
  • url absolutization & unquoting: url, url_unquote
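The last two items can be illustrated with the Python 3 standard library alone. The sketch below is not Nifty's code: `absolutize` and `parse_int` are hypothetical names chosen for this example, and the separator handling in `parse_int` is an assumption about what "human-readable" integers on web pages look like.

```python
import re
from urllib.parse import urljoin, unquote

# Hypothetical helpers for illustration; nifty.redex exposes its own
# functions (url, url_unquote, pint, ...) whose signatures may differ.

def absolutize(base_url, href):
    """Resolve a possibly relative link against the URL of the page it came from."""
    return urljoin(base_url, href)

def parse_int(text):
    """Parse integers such as '1,234' or ' 12 345 ' by dropping commas and whitespace."""
    digits = re.sub(r"[,\s]", "", text)
    return int(digits)

page = "http://example.com/a/b.html"
link = absolutize(page, "../c?x=1")       # relative link found on the page
title = unquote("caf%C3%A9%20au%20lait")  # percent-decoding, UTF-8 by default
count = parse_int("1,234,567")
```

Scraped links are usually relative to the page they were found on, so absolutization has to be given the page URL and not just the site root.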

Data Pipes. Architecture for scalable pipeline-oriented processing of Big Data, in nifty.data.pipes.
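As a rough idea of what pipeline-oriented processing means, the sketch below chains plain Python generators so that items flow through the stages one at a time. It does not use nifty.data.pipes; the stage functions and `run_pipeline` are hypothetical names invented for this example, and the real Data Pipes classes and operators may look quite different.

```python
# Generic generator pipeline for illustration; not the nifty.data.pipes API.

def read_numbers(lines):
    """Source stage: turn non-empty text lines into integers."""
    for line in lines:
        line = line.strip()
        if line:
            yield int(line)

def keep_even(numbers):
    """Filter stage: pass through even numbers only."""
    for n in numbers:
        if n % 2 == 0:
            yield n

def square(numbers):
    """Transform stage: square each item."""
    for n in numbers:
        yield n * n

def run_pipeline(source, *stages):
    """Feed source through each stage in order and return the lazy result stream."""
    stream = source
    for stage in stages:
        stream = stage(stream)
    return stream

result = list(run_pipeline(["1\n", "2\n", "\n", "4\n"], read_numbers, keep_even, square))
```

Because every stage is lazy, only one item is in flight at a time, so the same structure works for inputs that do not fit in memory.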

Data storage and object serialization with a new DAST format, in nifty.data.dast.

For more information, check the pydocs and comments in the source code. Other modules will be documented in the near future.

Nifty includes code from Waxeye, a PEG parser generator (MIT license), which is used to generate the parser for Redex.

Use cases

Projects that use Nifty:

  • Paperity, an aggregator of scholarly literature
