
python implementation of jordansissel's grok regular expression library


fanfannothing/pygrok

 
 


pygrok

A Python library to parse strings and extract information from structured/unstructured data

What can I use Grok for?

  • parsing and matching patterns in a string (log line, message, etc.)
  • avoiding the need to write complex regular expressions by hand
  • extracting information from structured/unstructured data

Installation

First, install regex:

    $ pip install regex

or from source: download regex from https://pypi.python.org/pypi/regex, uncompress it, and install it:

    $ python setup.py install

Then download, uncompress and install pygrok from here:

    $ tar zxvf pygrok-xx.tar.gz
    $ cd pygrok_dir
    $ sudo python setup.py install

Getting Started

>>> import pygrok
>>> text = 'gary is male, 25 years old and weighs 68.5 kilograms'
>>> pattern = '%{WORD:name} is %{WORD:gender}, %{NUMBER:age} years old and weighs %{NUMBER:weight} kilograms'
>>> print(pygrok.grok_match(text, pattern))
{'gender': 'male', 'age': '25', 'name': 'gary', 'weight': '68.5'}
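
Note that every captured value in the result above is a string, including the age and the weight. The following is a minimal sketch of converting selected fields after matching. The `converters` mapping and the `convert_fields` helper are hypothetical names used for illustration and are not part of pygrok; the dictionary is copied from the output above rather than produced by calling the library.

```python
# Result dictionary copied from the example above; all values are strings.
match = {'gender': 'male', 'age': '25', 'name': 'gary', 'weight': '68.5'}

# Hypothetical mapping from field name to converter (not a pygrok feature).
converters = {'age': int, 'weight': float}


def convert_fields(fields, converters):
    """Return a copy of fields with the listed values converted."""
    return {key: converters.get(key, str)(value)
            for key, value in fields.items()}


typed = convert_fields(match, converters)
print(typed['age'] + 1)     # 26
print(typed['weight'] * 2)  # 137.0
```

Fields without an entry in `converters` are left as strings.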

Pretty cool! Some of the patterns you can use are listed here:

`WORD` means \b\w+\b in regular expression syntax.
`NUMBER` means (?:%{BASE10NUM})
`BASE10NUM` means (?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+)))

Other patterns include `IP`, `HOSTNAME`, `URIPATH`, `DATE`, `TIMESTAMP_ISO8601`, `COMMONAPACHELOG`, and more.

See All patterns here
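
To illustrate how a `%{NAME:field}` reference can turn into a regular expression, here is a rough sketch that rewrites each reference as a named capture group. It is not pygrok's actual implementation. The names `PATTERNS`, `expand` and `simple_grok_match` are hypothetical, and `NUMBER` is a simplified stand-in without the lookbehind and atomic group used by `BASE10NUM`, so that the sketch runs on the standard `re` module alone.

```python
import re

# Simplified stand-ins for two grok patterns (assumption: the real NUMBER
# expands to BASE10NUM, which uses atomic grouping; this version does not).
PATTERNS = {
    'WORD': r'\b\w+\b',
    'NUMBER': r'[+-]?(?:[0-9]+(?:\.[0-9]+)?|\.[0-9]+)',
}

# Matches a reference of the form %{PATTERN_NAME:field_name}.
GROK_REF = re.compile(r'%\{(\w+):(\w+)\}')


def expand(pattern):
    """Replace each %{NAME:field} with a named capture group."""
    def to_group(ref):
        name, field = ref.group(1), ref.group(2)
        return '(?P<%s>%s)' % (field, PATTERNS[name])
    return GROK_REF.sub(to_group, pattern)


def simple_grok_match(text, pattern):
    """Return the captured fields as a dict, or None when nothing matches."""
    found = re.search(expand(pattern), text)
    return found.groupdict() if found else None


print(expand('%{WORD:name}'))  # (?P<name>\b\w+\b)
```

Text outside the `%{...}` references is passed through unchanged and is therefore treated as regular expression syntax.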

More details

Because Python's built-in re module does not support the atomic grouping syntax, (?>...), pygrok requires regex to be installed.
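
As a small illustration of what atomic grouping changes, the sketch below compares an ordinary group with an atomic one. It uses only the standard `re` module: Python 3.11 added `(?>...)` support to `re`, while older interpreters reject the syntax, so the example first checks whether the construct compiles. The helper name `supports_atomic_groups` is hypothetical.

```python
import re


def supports_atomic_groups():
    """Report whether this interpreter's re module compiles (?>...)."""
    try:
        re.compile(r'(?>a+)')
        return True
    except re.error:
        return False


# An ordinary group can give back characters it has matched (backtracking).
backtracking = re.search(r'(?:a+)ab', 'aaab')
print(backtracking.group(0))  # aaab

if supports_atomic_groups():
    # An atomic group keeps every 'a' it matched, so the trailing 'ab'
    # can never match and the search fails.
    print(re.search(r'(?>a+)ab', 'aaab'))  # None
else:
    print('re cannot compile (?>...); the regex package is needed')
```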

pygrok is inspired by Grok, developed by Jordan Sissel. It is not a wrapper around Jordan Sissel's Grok; it is implemented entirely by me.

Grok is a simple piece of software that lets you easily parse strings, logs and other files. With Grok, you can turn unstructured log and event data into structured data. pygrok does the same thing.

I recommend having a look at the logstash grok filter, which explains how Grok-like tools work.

The pattern files come from the logstash grok filter's pattern files.

TODO

I use Trello to manage the TODO list for this project.

Contribute

  • You are encouraged to fork, improve the code, then make a pull request.
  • Issue tracker

Get Help

mail: garygaowork@gmail.com
twitter: @garyelephant
