bashlex is a Python port of the parser used internally by GNU bash.
For the most part it is transliterated from C; the major differences are:
- it does not execute anything
- it is reentrant
- it generates a complete AST
$ pip install bashlex
$ python
>>> import bashlex
>>> parts = bashlex.parse('true && cat <(echo $(echo foo))')
>>> for ast in parts:
...     print(ast.dump())
ListNode(pos=(0, 31), parts=[
  CommandNode(pos=(0, 4), parts=[
    WordNode(pos=(0, 4), word='true'),
  ]),
  OperatorNode(op='&&', pos=(5, 7)),
  CommandNode(pos=(8, 31), parts=[
    WordNode(pos=(8, 11), word='cat'),
    WordNode(pos=(12, 31), word='<(echo $(echo foo))', parts=[
      ProcesssubstitutionNode(command=
        CommandNode(pos=(14, 30), parts=[
          WordNode(pos=(14, 18), word='echo'),
          WordNode(pos=(19, 30), word='$(echo foo)', parts=[
            CommandsubstitutionNode(command=
              CommandNode(pos=(21, 29), parts=[
                WordNode(pos=(21, 25), word='echo'),
                WordNode(pos=(26, 29), word='foo'),
              ]), pos=(19, 30)),
          ]),
        ]), pos=(12, 31)),
    ]),
  ]),
])
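The pos tuples in the dump read as half-open (start, end) character offsets into the input string, so slicing the original command with them should recover each node's source text. The following is a minimal sketch that checks this using only offsets copied by hand from the output above; the spans dictionary and its labels are illustrative and not part of bashlex.

```python
source = 'true && cat <(echo $(echo foo))'

# (start, end) offsets copied from the dump above; the labels are illustrative.
spans = {
    'list': (0, 31),
    'operator': (5, 7),
    'process substitution': (12, 31),
    'command substitution': (19, 30),
    'inner command': (21, 29),
}

# Slice the input with each span to recover the text that node covers.
snippets = {label: source[start:end] for label, (start, end) in spans.items()}

for label, text in snippets.items():
    print(f'{label:22} {text!r}')
```

Because the offsets index into the original string, a tool built on the AST can highlight or rewrite a construct without re-serializing the command.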
It is also possible to only use the tokenizer and get similar behaviour to shlex.split, but bashlex understands more complex constructs such as command and process substitutions:
>>> list(bashlex.split('cat <(echo "a $(echo b)") | tee'))
['cat', '<(echo "a $(echo b)")', '|', 'tee']
Compared to shlex:
>>> import shlex
>>> shlex.split('cat <(echo "a $(echo b)") | tee')
['cat', '<(echo', 'a $(echo b))', '|', 'tee']
The examples/ directory contains a sample script that demonstrates how to traverse the AST to do more complicated things.
Currently the parser has the following limitations:
- arithmetic expressions $((..)) are not supported
- the more complicated parameter expansions such as ${parameter#word} are taken literally and do not produce child nodes
I wrote this library for another project of mine, explainshell, which needed a new parsing backend to support complex constructs such as process and command substitutions.
The license for this is the same as that used by GNU bash, GNU GPL v3+.