/
wordish.py
659 lines (498 loc) · 21 KB
/
wordish.py
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
"""
Wordish is a script which executes a shell session parsed from a
documentation in the restructured text [#]_ format, then tests and builds a
report of the execution. To mark up shell code in an article,
*wordish* uses the custom directive ``sourcecode``, with the rquired
argument ``sh``. When presented with an article:
#. *wordish* filters out the text which is not marked with
``sourcecode``,
#. then, separates the blocks of shell codes between *commands* and
*outputs*:
#. *wordish* consumes the prompt, parses for the newline which ends
command,
#. then parses for the prompt which ends the output,
Example::
.. sourcecode:: sh
~$ echo "hello world" # Mmmh, insightful comment
hello world
This simply renders like:
.. sourcecode:: sh
~$ echo "hello world" # Mmmh, insightful comment
hello world
The *command* starts after the prompt ("~$ ") and continues until the
first newline is found in the source code block. Here, it is just
after the word ``comment``. The command is ``echo "hello world" # Mmmh,
insightful comment`` The output is the text block found until the next
prompt: this is ``hello world``. There are **two** possible prompts:
``~$``, and ``~#``. Both are required to be followed by a space are are
treated the same.
Note: the newlines which are nested in curly brackets or parentheses are
**not** interpreted as an *end of command* character. Shells do the same:
curly brackets are used to define functions and parentheses makes the
nested command to be interpreted in a subprocess shell. The two
following examples from the introduction make it clear:
.. sourcecode:: sh
~$ (
echo $((1+1)) )
2
~$ sum () {
echo $(( $1 + $2 ))
}
The first command is ``echo $((1+1))`` in a subproces, and it's output
is ``2``. The second command is the definition of a function named
``sum`` and has no output. Defining functions sometimes clarify the
intent in subsequent uses of the function. For functions to be re-used,
the state of the shell must be kept between each snippets. *wordish*
keep a connection with the same *shell* subprocess (*bash* is used)
for the duration of the article.
.. sourcecode:: sh
~$ sum 42 58
3
See how the output is obviously incorrect? we will see later how this
is reported.
When the output can not be completely predicted, such as when
displaying ``$RANDOM``, or displaying the size of a partitions in
bytes, there is a handy wildcard pattern which can be used:
``...``. It matches everything like ``.*`` in regexp [#]_.
.. sourcecode:: sh
~$ echo "a random number: " $RANDOM
...
One last thing, if a command does not exit gracefully, *wordish*
precautiously aborts, refusing to execute commands on the system under
test which is in an undefined state. *wordish* displays the remaining
unexecuted commands.
.. sourcecode:: sh
~$ What have the Romans ever done for us
aqueduct? roads? wine !
~$ echo "Bye bye"
Bye bye
This introduction is embedded in the wordish module as the
docstring. Just run *wordish* with no argument to get the example
report of this article:
.. sourcecode:: sh
~$ python -m wordish
Trying: echo "hello world" # Mmmh, insightful comment...
Expecting: hello world
ok
Trying: (
echo $((1+1)) )
Expecting: 2
ok
Trying: sum () {
echo $(( $1 + $2 ))
}
Expecting:
ok
Trying: sum 42 58
Expecting: 3
Failed, got: 100, 0
Trying: echo "a random number: " $RANDOM
Expecting: ...
ok
Trying: What have the Romans ever done for us
Expecting: aqueduct? roads? wine !
Failed, got: /bin/bash: line 19: What: command not found, 127
Command aborted, bailing out
Untested command:
echo "Bye bye"
python -m wordish
6 tests found.
4 tests passed, 2 tests failed.
There is another example real world article_ which is also included in
the sources distribution, and tested before each release. This is the
article which prompted the need for the development of *wordish*.
.. _article: http://jdb.github.com/sources/lvm.txt
.. [#] This syntax can be seen as light and readable version of html
or latex, and was designed with the experience of existing Wiki
syntaxes. The Sphinx project has a nice introduction_ on *rst*,
the reference documentation is here_.
.. _introduction: http://sphinx.pocoo.org/rest.html
.. _here: http://docutils.sourceforge.net/rst.html#user-documentation
.. [#] Regexp are not directly used so that the various special regexp
characters do not need to be escaped.
"""
__version__ = '1.0.2'
from subprocess import Popen, STDOUT, PIPE
from shlex import shlex
from itertools import takewhile, chain
from StringIO import StringIO
import re
from docutils import core
from docutils.parsers import rst
def trace( decorated ):
def f( *arg,**kwarg):
print "%s: %s, %s => " % (decorated.__name__, arg, kwarg),
ret = decorated( *arg,**kwarg )
print ret
return ret
return f
class ShellSessionParser( object ):
r"""
An iterator which parses a text file of a shell session and yields
pairs of commands and outputs
>>> sess = ShellSessionParser( "~$ (echo coucou) # hello\ncoucou" )
>>> for c,o in sess: print "command : %s\noutput : %s" % (c,o)
...
command : (echo coucou) # hello
output : coucou
"""
def __init__( self, f, prompts=['~# ', '~$ '], com='', whi='' ):
r"""
The constructor takes a filename or an open file or a string
as the shell session.
The constructor sets the :attr:`tokens` member attribute with
a shlex token stream initialised with the correct options for
parsing comments and whitespace.
The token commenters and whitespace are set to the empty string
and can be modified with the function arguments 'com' and
'whi'.
If the argument shlex_object is set to True then it'is not the
list of tokens but the shlex object itself so that you can
experiment with the :obj:`shlex` and it multiple attribute and
method.
>>> list(ShellSessionParser( "Yozza 1 2" ).tokens)
['Yozza', ' ', '1', ' ', '2']
>>> tokens = ShellSessionParser( "Yozza 1 2").tokens
>>> tokens.whitespace = ' '
>>> list(tokens)
['Yozza', '1', '2']
>>> list( ShellSessionParser("Yozza # a comment you dont want to see", whi=' ', com='#' ).tokens)
['Yozza']
"""
self.tokens = shlex( f if hasattr(f, "read") else StringIO( f ))
self.tokens.commenters = com
# deactivate shlex comments facility which won't work for us.
# The terminating linefeed means two things: end of comment an
# end of command. As shlex consume the terminating linefeed,
# there is no end of command left.
self.tokens.whitespace = whi
# deactivate shlex whitespace munging. characters cited in
# ``shlex.whitespace`` are not returned by get_token. If
# empty, whitespaces are returned as they are which is what we
# want: they definitely count in bash, and may count in
# output, so we just want to keep them as they are.
self.nested = 0
self.prompts = []
for p in prompts:
s=shlex(p)
s.commenters, s.whitespace = com, whi
self.prompts.append( list( s ) )
self.max_prompt_len = max([ len(p) for p in self.prompts ])
def _has_token( self ):
r"""
Returns False if the token list is empty (which usually means
the end of file has been reached). Returns True otherwise.
>>> sess = ShellSessionParser( "~$ env | grep USER\nUSER=jd" )
>>> sess._has_token()
True
>>> for t in sess.tokens: pass
>>> sess._has_token()
False
"""
t = self.tokens.get_token()
return False if t==self.tokens.eof else self.tokens.push_token( t ) or True
def is_output( self, token ):
r"""
Returns true if the token is after the last token of the
output of a command, else returns false. The functions stops
right before the prompt which can be either '~# ' or '~$ '.
>>> sess = ShellSessionParser( "youpi\n~# " )
>>> sess.tokens.next()
'youpi'
>>> sess.is_output( 'youpi' )
True
>>> sess.tokens.next(),sess.tokens.next()
('\n', '~')
>>> sess.is_output( '~' )
False
"""
# TODO: in the unittest, make sure '~' occurs in the -1, -2, and -3 position.
# also make sure it occurs in the three last position at the same time.
if token in [ p[0] for p in self.prompts ]:
tokens = [token,] + [ self.tokens.next()
for i in range( self.max_prompt_len - 1) ]
return False if any([tokens==p for p in self.prompts ]) else (
[ self.tokens.push_token(t) for t in reversed(tokens[1:]) ] or True)
else:
return True
def is_command( self, token ):
r"""
Returns true if the token is after the last token of a command
in the section, else returns false. The functions stops right
before the linefeed but only when the linefeed is not nested in
brackets or parenthesis.
>>> sess = ShellSessionParser( "date\n" )
>>> sess.tokens.next()
'date'
>>> sess.is_command( 'date' )
True
>>> sess.tokens.next()
'\n'
>>> sess.is_command( '\n' )
False
This is the end of output, since the linefeed was not nested.
>>> sess = ShellSessionParser( "(youpi\n)\n" )
>>> [ sess.is_command( sess.tokens.next()) for i in range(3) ]
[True, True, True]
>>> sess.is_command( sess.tokens.next()); sess.is_command( sess.tokens.next());
True
False
The first linefeed was nested in a subshell, hence was
not the end of a command. The second linefedd was.
"""
if token == '\n' and self.nested==0:
return False
elif token in '({': self.nested += 1
elif token in '})': self.nested -= 1
return True
def _takewhile( self, is_output=False ):
r"""
Returns a command or an output depending of the terminator
provided. i.e. every token on which the terminator returns
False. The terminator is a function which takes a token and
returns whether the stream of tokens is terminated or not.
>>> session = ShellSessionParser
>>> session( "cmd" )._takewhile()
'cmd'
>>> session( "t () {\ncmd\n}\nhello" )._takewhile()
't () {\ncmd\n}'
"""
return ''.join (
list( takewhile(
self.is_output if is_output else self.is_command,
self.tokens )
) ).strip()
def _get_command(self):
"""
Returns a string from the current token to the end of the
next *command*.
Maybe be confused about what is the end of a command when called
in the middle of an output.
"""
return self._takewhile(is_output=False)
def _get_output(self):
"""
Returns a string from the current token to the end of the
next *output*.
"""
return self._takewhile(is_output=True)
def __iter__(self):
r"""
Returns the next pair of command and output.
TODO: some unittest on the corner cases: empty file, only command, only outputs
>>> session = ShellSessionParser
>>> sess = session( "~# cmd\noutput\n~# true\n" )
>>> for c,o in sess: print "command: %s\noutput: %s" % (c,o)
command: cmd
output: output
command: true
output:
"""
self._get_output()
while self._has_token() :
yield self._get_command(), self._get_output()
def toscript(self):
commentize = lambda o: '# %s\n' % o.replace( '\n', '\n# ' )
return "#!/bin/sh\nset -e # -x\n\n%s\n" % '\n'.join(
chain( *( ( c, commentize(o) ) for c, o in self ) ) )
class CommandOutput( object ):
# TODO: docstrings
def __init__(self, out=None, err=None, returncode=None, cmd=None, match='ellipsis' ):
self.out, self.err, self.returncode = out, err, returncode
self.cmd = cmd
assert match in ['string', 're', 'ellipsis']
self.match = getattr(self, match + '_match')
def __str__(self):
"""
The string representation of a CommandOutput is a comma
separated list of the non null value of the out, err and
returncode members (in this order).
>>> CommandOutput(out=1,err=2,returncode=3)
1, 2, 3
>>> CommandOutput(out="2010-01-22")
2010-01-22
"""
return ', '.join( [ str( getattr(self,a) )
for a in 'out', 'err', 'returncode'
if not(getattr(self,a) in [None, ''])
] )
__repr__ = __str__
def __neq__(self, other ):
return not self.__eq__(other)
def __eq__(self, other ):
if isinstance(other, basestring):
return self.match( other, self.out )
attrs = 'out', 'err', 'returncode'
if any( [ not hasattr( other, a ) for a in attrs ]):
raise TypeError( "equality: argument must either a "
"string or a CommandOutput instance.")
return all( [
self.match( str( getattr( other, a )) , str( getattr( self, a )))
for a in attrs
if getattr( other, a ) is not None
and getattr( self, a ) is not None ] )
def ellipsis_match(self,pattern,string):
if '...' in pattern:
start, end = pattern.split('...')
return string.startswith(start) and string.endswith(end)
else:
return pattern==string
def string_match(self,pattern,string):
return pattern==string
def re_match(self,pattern,string):
return re.match(pattern, string) is not None
def exited_gracefully(self):
return self.returncode==0
def aborted(self):
return self.returncode!=0
class CommandRunner ( object ):
r""" Implements a python "context manager", when entering the
context (the *with* block ), creates a shell in a subprocess,
right before exiting the context (leaving the block or processing
an exception), the shell is assured to be terminated.
The call method takes a string meant to contain a shell command
and send it to the shell which executes it. call ret output of the
command.
>>> with CommandRunner() as sh:
... sh("echo coucou")
... sh("a=$((1+1))")
... sh("echo $a")
...
coucou, 0
0
2, 0
"""
separate_stderr = True
def __enter__( self ):
if CommandRunner.separate_stderr:
self.terminator = '\necho "~$ $?" \n' + 'echo "~$ " >&2 \n'
self.shell = Popen( "/bin/bash", shell=True,
stdin=PIPE, stdout=PIPE,stderr=PIPE,
universal_newlines=True)
self.stdout = ShellSessionParser( self.shell.stdout )
self.stderr = ShellSessionParser( self.shell.stderr )
else:
self.terminator = '\necho "~$ " \n'
self.shell = Popen( "sh", shell=True,
stdin=PIPE, stdout=PIPE, stderr=STDOUT)
self.stdout = ShellSessionParser( self.shell.stdout )
self.stderr = None
return self
def __call__( self, cmd):
r"""in it and sends it to the shell via stdin. Then, stdout is
read until a prompt following a linefeed. The prompt is
suppressed and the tokens read are joined and returned as
the"""
self.shell.stdin.write( cmd + self.terminator )
return CommandOutput( *self.read_output(), cmd=cmd )
def read_output(self):
out = self.stdout._takewhile( is_output=True )
ret = int( self.stdout.tokens.next() )
err = self.stderr._takewhile( True
) if CommandRunner.separate_stderr else None
return out,err, ret
def __exit__( self, *arg):
self.shell.terminate()
self.shell.wait()
# If the child shell is hanged, maybe
# self.p.send_signal( signal.SIGKILL ) will be needed
class TestReporter( object):
# should clearly present aborted from failed, and also not tested
# will need a crazy dot graph to sort this mess
def __init__( self ):
self.passcount, self.failcount = 0, 0
self.last_expected, self.last_output = None, None
def failed( self, output ):
self.failcount += 1
return "Failed, got:\t%s\n" % output
def passed( self, output):
self.passcount += 1
return "ok\n"
def before( self, cmd, expected ):
self.last_expected = expected
return "Trying:\t\t%s\nExpecting:\t%s" % ( cmd, expected )
def after(self, output ):
self.last_output = output
return self.passed( output
) if output==self.last_expected else self.failed( output )
def summary( self ):
print "%s tests found. " % (self.passcount + self.failcount)
if self.failcount==0:
print "All tests passed"
else:
print "%s tests passed, %s tests failed." % (
self.passcount, self.failcount)
return self.failcount
class BlockSelector( object):
# this object could be a function after all, the prototype would
# be a little more complex.
# to build more complex node matching, cleanup, articles, it may
# be easier to work on the doctree directly. Or the cleanup and
# articles separators could be inlined in comments. Need to
# clearify the ins and outs of both solutions. A shared instance
# of this object is not thread safe. A function would be thread
# safe but the class Directive would be redefined whenever called
# which might not be a problem in our case, but is unneeded in the
# general case.
def __init__( self, directive='sourcecode', arg=['sh'] ):
self.f = StringIO()
class Directive( rst.Directive ):
has_content = True
def run( this ):
this.assert_has_content()
if this.content[0].split()==arg:
self.f.write( '\n'.join(this.content[2:])+'\n' )
# The content could also be returned but it is not the
# point, the point of this function is the side effect
# of writing into self.f, which is done by now.
return []
rst.directives.register_directive( directive, Directive )
def __call__( self, f ):
# The following line is only good for its side effect of
# filling self.f with the filtered content, that is why there
# is no storage of the doctree.
self.f.seek(0)
self.f.truncate()
core.publish_doctree(f.read()).traverse()
self.f.seek(0)
return self.f
#########
######### Console script entry points
import sys
def wordish():
files = sys.argv[1:] if len( sys.argv ) > 1 else (StringIO( __doc__ ),)
files = [ f if f!="-" else sys.stdin for f in files ]
files = [ f if hasattr(f, 'read') else file(f) for f in files ]
ret = 0
for f in files:
report = TestReporter()
filter = BlockSelector( directive='sourcecode', arg=['sh'])
session = iter( ShellSessionParser( filter( f ) ))
with CommandRunner() as run:
for cmd, expected in session:
print report.before( cmd, expected )
print report.after( run( cmd ) )
if report.last_output.aborted():
# sole condition if expected.returncode is None if
# expected returncode is not None and expected!=actual
# output should bailout. Test and errors should be
# different.
print( "Command aborted, bailing out")
remaining_cmds = [ cmd for cmd, _ in session ]
if len( remaining_cmds )==0:
print( "No remaining command" )
else:
print "Untested command%s:\n\t" % (
"s" if len( remaining_cmds )>1 else "" ),
print( "\n\t".join( remaining_cmds ))
ret += report.summary()
return ret
def rst2sh():
files = sys.argv[1:] if len( sys.argv ) > 1 else StringIO( __doc__ ),
filter = BlockSelector( directive='sourcecode', arg=['sh'])
for f in files:
print ShellSessionParser(
filter ( f if hasattr(f, 'read') else file(f) )
).toscript()
if __name__=='__main__':
sys.exit( wordish() )