Python WikiAccessor.getIndex Examples

Programming language: Python

Namespace/package name: pywikiaccessor.wiki_accessor

Class/type: WikiAccessor

Method/function: getIndex

Examples on hotexamples.com: 2

Python WikiAccessor.getIndex - 2 examples found. These are the top-rated real-world Python examples of pywikiaccessor.wiki_accessor.WikiAccessor.getIndex, extracted from open source projects.
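Both examples on this page call getIndex with an index class and use the object that comes back. The sketch below imitates that call pattern with stand-in classes so it can be run without the Wikipedia data directory. StubAccessor, StubTitleIndex, the sample titles, and the caching behaviour are all assumptions made for illustration; this is not how pywikiaccessor itself is implemented.

```python
class StubTitleIndex:
    """Hypothetical in-memory index; stands in for the real TitleIndex."""

    def __init__(self, accessor):
        self.accessor = accessor
        # Made-up sample data for the sketch.
        self._titles = {1: "ALCAM", 2: "Python"}

    def getTitleById(self, doc_id):
        return self._titles[doc_id]

    def getIdByTitle(self, title):
        for doc_id, known_title in self._titles.items():
            if known_title == title:
                return doc_id
        return None


class StubAccessor:
    """Hypothetical stand-in for WikiAccessor."""

    def __init__(self, directory):
        self.directory = directory
        self._indexes = {}

    def getIndex(self, index_class):
        # Assumption: one index object per class, built from the accessor
        # and reused on later calls.
        if index_class not in self._indexes:
            self._indexes[index_class] = index_class(self)
        return self._indexes[index_class]


accessor = StubAccessor("data/")
title_index = accessor.getIndex(StubTitleIndex)
same_index = accessor.getIndex(StubTitleIndex)
print(title_index.getTitleById(1))
print(title_index is same_index)
```

Keying the cache on the class keeps call sites as short as those in the examples: the caller names the index type and never handles construction details.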

Frequently used methods

WikiAccessor(12)

getIndex(2)

Example #1

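import codecs

from pywikiaccessor.wiki_accessor import WikiAccessor
from pywikiaccessor.title_index import TitleIndex
# CategoryIndex, DocumentTypeIndex, HeadersFileBuilder and HeadersFileIndex
# belong to the same project; their module paths are not shown in this excerpt.


def buildHeaders(categories, prefix):
    directory = "C:\\WORK\\science\\onpositive_data\\python\\"
    accessor = WikiAccessor(directory)
    # getIndex is called with an index class and returns an index object.
    categoryIndex = accessor.getIndex(CategoryIndex)
    titleIndex = accessor.getIndex(TitleIndex)
    documentTypes = accessor.getIndex(DocumentTypeIndex)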

    pages = set()
    for cat in categories:
        categoryId = categoryIndex.getIdByTitle(cat)
        catPages = categoryIndex.getAllPagesAsSet(categoryId)
        pages.update(catPages)
    with codecs.open(directory + 'titles.txt', 'w', 'utf-8') as f:
        # Iterate over a copy so that pages can be discarded from the set.
        for p in list(pages):
            if (documentTypes.isDocType(p, 'person')
                    or documentTypes.isDocType(p, 'location')
                    or documentTypes.isDocType(p, 'entertainment')
                    or documentTypes.isDocType(p, 'organization')
                    or documentTypes.isDocType(p, 'event')):
                pages.discard(p)
            else:
                f.write(titleIndex.getTitleById(p) + '\n')
    print(len(pages))
    hb = HeadersFileBuilder(accessor, list(pages), prefix)
    hb.build()
    hi = HeadersFileIndex(accessor, prefix)
    stat = hi.getAllStat()
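    with codecs.open(directory + 'headers.txt', 'w', 'utf-8') as f:
        for item in stat:
            # Stop at the first header that occurs only once. This only works
            # as a cutoff if stat is sorted by descending count, which is
            # presumably the case but is not shown in this excerpt.
            if item['cnt'] == 1:
                break
            line = item['text'] + ": " + str(item['cnt'])
            print(line)
            f.write(line + '\n')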
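
The page-selection step in Example #1 (collect the pages of several categories, then drop pages of certain document types) can be tried in isolation. The sketch below replaces the category and document-type indexes with plain dictionaries; select_pages, the dictionary layout, and the sample data are hypothetical and only mirror the logic of the example.

```python
EXCLUDED_TYPES = ("person", "location", "entertainment", "organization", "event")


def select_pages(categories, pages_by_category, doc_types_by_page,
                 excluded=EXCLUDED_TYPES):
    """Return the union of the categories' pages minus excluded document types."""
    pages = set()
    for category in categories:
        pages.update(pages_by_category.get(category, set()))
    return {
        page for page in pages
        if not any(doc_type in doc_types_by_page.get(page, ())
                   for doc_type in excluded)
    }


# Made-up sample data: page ids per category and document types per page.
pages_by_category = {"Chemistry": {1, 2, 3}, "Biology": {3, 4}}
doc_types_by_page = {2: {"person"}, 4: {"event", "location"}}

selected = select_pages(["Chemistry", "Biology"],
                        pages_by_category, doc_types_by_page)
print(sorted(selected))
```

Building a new set avoids changing the collection while it is being iterated, which is why the original loops over list(pages) before calling discard.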

Example #2

File: wiki_headers.py Project: egoralvolk/pywikitext

                'text': element[1],
                'cnt': element[2]
            })
        return res


if __name__ == "__main__":
    #regex1 = re.compile('\n[ \t]*==([^=]*)==[ \t\r]*\n')
    #text = " kdkd\n == kdkd==\n"
    #match = regex1.search(text)
    #print(match.end())
    from pywikiaccessor.title_index import TitleIndex
    directory = "C:\\WORK\\science\\onpositive_data\\python\\"
    accessor = WikiAccessor(directory)
    docTypesIndex = DocumentTypeIndex(accessor)
    docIds = docTypesIndex.getDocsOfType("substance")
    titleIndex = accessor.getIndex(TitleIndex)
    for docId in docIds:
        print(titleIndex.getTitleById(docId))
    doc_id = titleIndex.getIdByTitle("ALCAM")
    print(docTypesIndex.getDocTypeById(doc_id))
#hb = HeadersDBBuilder(accessor,list(docIds))
#hb.build()
#hb.preProcess()
#hb.processDocument(doc_id)
#hi = HeadersDBIndex(accessor)
#hi.getCountHeadersForDoc(docIds)
#stat = hi.getAllStat(docIds)
#for s in stat:
#    print (s['text']+": "+str(s['cnt']))