Example #1
File: session-3.py Project: datakid/nltk
merged = merger(aust.results, [1, 2],  newname = 'australian(s)')
plot('After merging Australian and Australians', merged, num_to_plot = 2)

# <headingcell level=4>
# conc()

# <markdowncell>
# The final function is *conc()*, which produces concordances of a subcorpus based on a Tregex query. Its main arguments are:

# 1. A subcorpus to search *(remember to put it in quotation marks!)*
# 2. A Tregex query

# <codecell>
# here, we use a subcorpus of politics articles,
# rather than the total annual editions.
conc(os.path.join(path,'1966'), r'/(?i)\baustral.?/') # words beginning with 'austral'
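
# <markdowncell>
# As a rough illustration of the regular expression inside the slashes of the query above, the sketch below applies the same pattern to a few made-up tokens using Python's `re` module. The token list and the `matches_query` helper are hypothetical and are not part of *conc()*; the sketch also assumes the pattern may match anywhere in a token.

# <codecell>
```python
import re

# the regex from the query above, without the surrounding Tregex slashes
pattern = re.compile(r'(?i)\baustral.?')

def matches_query(token):
    """Return True if the pattern is found anywhere in the token (assumed behaviour)."""
    return pattern.search(token) is not None

# made-up tokens, for illustration only
tokens = ['Australia', 'Australian', 'australs', 'Austria', 'the']
matched = [t for t in tokens if matches_query(t)]
print(matched)
```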

# <markdowncell>
# You can set *conc()* to print *n* random concordances with the *random = n* parameter. You can also store the output in a variable for further searching.

# <codecell>
randoms = conc(os.path.join(path,'1963'), r'/(?i)\baustral.?/', random = 5)
randoms

# <markdowncell>
# *conc()* takes another argument, *window*, which sets the amount of co-text shown on either side of the match.

# <codecell>
conc(os.path.join(path,'1981'), r'/(?i)\baustral.?/', random = 5, window = 50)

# <markdowncell>
Example #2
# <codecell>
#

# <markdowncell>
# ### conc()

# <markdowncell>
# `conc()` produces concordances of a subcorpus. Its main arguments are:

# 1. A subcorpus to search *(remember to put it in quotation marks!)*
# 2. A query

# If your data consists of parse trees, you can use a Tregex query. If your data is one or more plain-text files, you can just use a regex. We'll show Tregex style here.

# <codecell>
lines = conc('data/nyt/years/1999', r'/JJ.?/ << /(?i).?\brisk.?\b/') # adj containing a risk word

# <markdowncell>
# You can set `conc()` to print only the first ten examples with `n = 10`, or 15 random ones with `n = 15, random = True`.

# <codecell>
lines = conc('data/nyt/years/2007', r'/VB.?/ < /(?i).?\brisk.?\b/', n = 15, random = True)
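
# <markdowncell>
# The difference between taking the first `n` lines and taking `n` random lines can be sketched in plain Python. This is only an illustration of the idea, not how `conc()` is implemented: the `pick_lines` helper, its `random_sample` and `seed` parameters, and the made-up lines are all assumptions.

# <codecell>
```python
import random

def pick_lines(lines, n=10, random_sample=False, seed=None):
    """Return the first n lines, or n lines chosen at random without repeats."""
    if random_sample:
        rng = random.Random(seed)  # seed is optional; it only makes the draw repeatable
        return rng.sample(lines, min(n, len(lines)))
    return lines[:n]

# made-up concordance lines, for illustration only
all_lines = ['line %d' % i for i in range(100)]

first_ten = pick_lines(all_lines, n=10)
random_fifteen = pick_lines(all_lines, n=15, random_sample=True, seed=1)
print(len(first_ten), len(random_fifteen))
```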

# <markdowncell>
# `conc()` takes another argument, `window`, which sets the amount of co-text shown on either side of the match. The default is 50 characters.

# <codecell>
lines = conc('data/nyt/topics/health/2013', r'/VB.?/ << /(?i).?\brisk.?\b/', n = 15, random = True, window = 20)
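
# <markdowncell>
# To make the effect of `window` concrete, here is a minimal keyword-in-context sketch for plain text. It uses an ordinary regex rather than Tregex, and the `kwic` function and the sample sentence are hypothetical: it shows what a character window does, not how `conc()` works internally.

# <codecell>
```python
import re

def kwic(text, pattern, window=50):
    """Return (left, match, right) tuples with up to `window` characters of co-text per side."""
    results = []
    for m in re.finditer(pattern, text):
        left = text[max(0, m.start() - window):m.start()]
        right = text[m.end():m.end() + window]
        results.append((left, m.group(0), right))
    return results

# made-up sentence, for illustration only
text = 'Doctors warn that smoking may risk long-term damage to health.'

narrow = kwic(text, r'(?i)\brisk\w*', window=20)
for left, match, right in narrow:
    print('%20s  %s  %s' % (left, match, right))
```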

# <markdowncell>
# `conc()` also allows you to view parse trees. This option is off by default: