Jieba word segmentation. First load the corpus from an xlsx file, then segment the corpus text.
Developer: 沙振宇 (沙师弟专栏)
Created: 2019-12-2
Last updated: 2019-12-5
CSDN blog: https://shazhenyu.blog.csdn.net/article/details/103403711
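# Full mode: emit every word the dictionary can find, including overlapping ones
seg_list = jieba.cut(label, cut_all=True)
# Precise mode: one non-overlapping segmentation, suited to text analysis
seg_list = jieba.cut(label, cut_all=False)
# Search-engine mode: precise mode, then long words are split again to improve recall
seg_list = jieba.cut_for_search(label)
# Default call: same as precise mode (cut_all defaults to False)
seg_list = jieba.cut(label)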
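Each call above overwrites `seg_list`, so only the last result survives. A minimal sketch of how the modes might be collected side by side follows. The helper name `segment_all_modes` and its `jieba_module` parameter are hypothetical, added so the function can also be exercised with a stand-in object; when the parameter is omitted, the real `jieba` package is imported lazily.

```python
def segment_all_modes(label, jieba_module=None):
    """Return the tokens produced by each segmentation mode for one string.

    `jieba_module` is a hypothetical injection point: any object exposing
    `cut(text, cut_all=...)` and `cut_for_search(text)` will work.
    """
    if jieba_module is None:
        # Imported lazily; assumes jieba is installed (pip install jieba).
        import jieba as jieba_module

    # jieba.cut and jieba.cut_for_search return generators,
    # so materialize them into lists before storing.
    return {
        "full": list(jieba_module.cut(label, cut_all=True)),
        "precise": list(jieba_module.cut(label, cut_all=False)),
        "search": list(jieba_module.cut_for_search(label)),
    }
```

With jieba installed, `segment_all_modes(label)` returns a dict whose three lists can be printed or compared directly.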
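# TextRank keyword extraction (needs `import jieba.analyse`): top 50 keywords, no weights,
# restricted to place names (ns), nouns (n), verbal nouns (vn), and verbs (v)
words = jieba.analyse.textrank(label, topK=50, withWeight=False, allowPOS=('ns', 'n', 'vn', 'v'))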
# Drop every character that is not an ASCII letter, a digit, or a CJK ideograph (U+4E00..U+9FA5)
rule = re.compile(r"[^a-zA-Z0-9\u4e00-\u9fa5]")
label = rule.sub('', label)
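As a small self-contained illustration of the whitelist filter above, the sketch below wraps the same pattern in a function. The names `NON_WORD` and `keep_word_chars` are hypothetical. Note that whitespace is not on the whitelist, so spaces are removed as well.

```python
import re

# Matches any character that is NOT an ASCII letter, a digit,
# or a CJK ideograph in the range U+4E00..U+9FA5.
NON_WORD = re.compile(r"[^a-zA-Z0-9\u4e00-\u9fa5]")


def keep_word_chars(label):
    """Remove punctuation, whitespace, and any other non-whitelisted character."""
    return NON_WORD.sub("", label)


print(keep_word_chars("你好, world! 123"))  # prints: 你好world123
```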
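# Punctuation to strip. A raw string keeps the backslash in `\]` from being read as a string escape.
# Inside the character class, `+-/` is a range, so `,` and `.` are removed as well.
punctuation = r"""!?。"#$%&'()*+-/:;<=>@[\]^_`{|}~⦅⦆「」、、〃》「」『』【】〔〕〖〗〘〙〚〛〜〝〞〟〰〾〿–—‘'‛“”„‟…‧﹏"""
re_punctuation = "[{}]+".format(punctuation)
# Remove runs of punctuation, then trim surrounding whitespace
label = re.sub(re_punctuation, "", label).strip()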
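Because the punctuation list is pasted straight into a character class, sequences such as `+-/` act as ranges. One possible variant, sketched below, passes the list through `re.escape` so that every character is matched literally. This is a swapped-in alternative rather than the original approach, and the shortened `PUNCTUATION` constant and the `strip_punctuation` helper are hypothetical.

```python
import re

# A shortened, illustrative punctuation list (not the full list used above).
PUNCTUATION = "!?。、;:()[]{}<>+-/\\\"'…"

# re.escape backslash-escapes characters that are special in a regex,
# including "-", "]", and "\", so none of them can form a range or
# close the character class early.
PUNCT_RE = re.compile("[{}]+".format(re.escape(PUNCTUATION)))


def strip_punctuation(label):
    """Delete listed punctuation, then trim surrounding whitespace."""
    return PUNCT_RE.sub("", label).strip()


print(strip_punctuation("你好!世界。"))  # prints: 你好世界
print(strip_punctuation("3.14"))        # prints: 3.14 ("." is not in the list)
```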
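# Open the workbook (needs `import xlrd`). xlrd 2.0 and later read only .xls files,
# so .xlsx input needs xlrd 1.2.0 or another reader such as openpyxl.
workbook = xlrd.open_workbook(path)
# Select a worksheet by its zero-based index
sheet = workbook.sheet_by_index(page)
print("Sheet name:", sheet.name, ", rows:", sheet.nrows, ", columns:", sheet.ncols)
# Read one cell by zero-based row and column index
sheet.cell_value(rown, coln)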
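The sheet calls above read one cell at a time. A minimal sketch of how a whole corpus column might be collected follows, assuming the corpus text sits in a single column. The helper `read_column` and its `start_row` parameter are hypothetical; the function relies only on the `nrows` attribute and the `cell_value(rowx, colx)` method that xlrd sheets provide.

```python
def read_column(sheet, coln, start_row=0):
    """Collect the non-empty, whitespace-trimmed strings from one column.

    `sheet` is expected to behave like an xlrd sheet: it must expose
    `nrows` and `cell_value(row, col)`.
    """
    values = []
    for rown in range(start_row, sheet.nrows):
        # Cells may hold numbers, so convert to str before trimming.
        text = str(sheet.cell_value(rown, coln)).strip()
        if text:  # skip empty cells
            values.append(text)
    return values
```

With a sheet opened as above, `read_column(sheet, 0, start_row=1)` would skip a header row and return the remaining entries of the first column, ready to be passed to the segmenter one by one.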
# Trim leading and trailing whitespace from every item
s = [x.strip() for x in item_arr]