Python Document.from_mapping Examples

Programming language: Python

Namespace/package name: data

Class/type: Document

Method/function: from_mapping

Examples on hotexamples.com: 5

Python Document.from_mapping: 5 examples found. These are top-rated real-world Python examples of data.Document.from_mapping, extracted from open-source projects.
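All five examples use from_mapping as an alternate constructor: each JSONL line is parsed into a dict, and the dict is turned into a Document. The real data.Document is not shown on this page, so the sketch below is a hypothetical stand-in. The field names (paragraphs, summary), the nesting of paragraphs into sentences into tokens, the to_dict method's exact output, and the default value of lower are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, List, Mapping


# Hypothetical stand-in for data.Document; the real class has more fields and methods.
@dataclass
class Document:
    # Assumed layout: paragraphs -> sentences -> tokens.
    paragraphs: List[List[List[str]]]
    summary: List[List[str]] = field(default_factory=list)

    @classmethod
    def from_mapping(cls, obj: Mapping[str, Any], lower: bool = False) -> "Document":
        """Build a Document from a parsed JSON object (assumed keys: 'paragraphs', 'summary')."""
        def norm(tokens: List[str]) -> List[str]:
            # Optionally lowercase every token; always return a fresh list.
            return [t.lower() for t in tokens] if lower else list(tokens)

        paragraphs = [[norm(sent) for sent in para] for para in obj['paragraphs']]
        summary = [norm(sent) for sent in obj.get('summary', [])]
        return cls(paragraphs=paragraphs, summary=summary)

    def to_dict(self) -> dict:
        """Inverse of from_mapping, so a Document can be written back out as JSON."""
        return {'paragraphs': self.paragraphs, 'summary': self.summary}


doc = Document.from_mapping(
    {'paragraphs': [[['Halo', 'Dunia', '.']]], 'summary': [['Halo', '.']]},
    lower=True,
)
```

Keeping from_mapping and to_dict as a matched pair is what lets Examples #4 and #5 read JSONL, transform the documents, and print them back out as JSONL.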

Frequently used methods

Document(6)

from_mapping(5)

raw_sentences(2)

_cache_nc(1)

add_child(1)

restore_documents(1)

select(1)

store_documents(1)

Example #1

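import json

from data import Document


def read_jsonl(path, _log, _run, name='test', encoding='utf-8', lower=True):
    _log.info('Reading %s JSONL file from %s', name, path)
    with open(path, encoding=encoding) as f:
        for line in f:
            # One JSON object per line; from_mapping turns it into a Document.
            yield Document.from_mapping(json.loads(line.strip()), lower=lower)
    # SAVE_FILES is a module-level flag defined elsewhere in the project.
    if SAVE_FILES:
        _run.add_resource(path)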
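Because read_jsonl in Example #1 is a generator, the code after its loop (the SAVE_FILES check) only runs once the caller has consumed every document; if the caller stops early, it never runs. The minimal sketch below makes that visible. It yields plain dicts instead of Documents, takes an already-open file object, and uses a hypothetical on_done callback in place of _run.add_resource.

```python
import io
import json
from typing import Callable, Iterator, TextIO


def read_jsonl(f: TextIO, on_done: Callable[[], None]) -> Iterator[dict]:
    """Yield one parsed object per line, mirroring the structure of Example #1."""
    for line in f:
        yield json.loads(line.strip())
    # Like `_run.add_resource(path)` in Example #1, this line is reached
    # only after the caller has consumed every record.
    on_done()


events = []
gen = read_jsonl(io.StringIO('{"id": 1}\n{"id": 2}\n'), lambda: events.append('done'))
first = next(gen)   # only the first line has been parsed so far; events is still empty
rest = list(gen)    # exhausting the generator runs the post-loop code
```

This is one plausible reason Example #3 registers the resource before opening the file.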

Example #2

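import json
import os

from data import Document


def main(args):
    os.makedirs(args.output_dir, exist_ok=True)
    with open(args.path, encoding=args.encoding) as f:
        for line in f:
            # Parse one JSONL record and build a Document from it.
            doc = Document.from_mapping(json.loads(line.strip()),
                                        lower=args.lower)
            # write_neuralsum_oracle is a project helper (not shown).
            write_neuralsum_oracle(doc,
                                   args.output_dir,
                                   encoding=args.encoding)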
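Example #2 reads four attributes from args (path, output_dir, encoding, lower), but the argument parser is not shown. A hypothetical argparse setup that would supply them could look like this; only the attribute names come from the example, while the flag spellings, help texts, and defaults are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical CLI: attribute names match Example #2, everything else is assumed.
    p = argparse.ArgumentParser(description='Convert a JSONL corpus into per-document output files.')
    p.add_argument('path', help='input JSONL file, one document per line')
    p.add_argument('output_dir', help='directory for the generated files')
    p.add_argument('--encoding', default='utf-8', help='encoding for input and output files')
    p.add_argument('--lower', action='store_true', help='lowercase tokens in from_mapping')
    return p


args = build_parser().parse_args(['train.jsonl', 'out', '--lower'])
```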

Example #3

File: corpus.py Project: MinhajulMU/mysum

import json

from data import Document


def read_jsonl(path, _log, _run, name='test', encoding='utf-8', lower=True, remove_puncts=True,
               replace_digits=True, stopwords_path=None):
    _log.info('Reading %s JSONL file from %s', name, path)
    # Register the input file up front (SAVE_FILES and read_stopwords are defined elsewhere).
    if SAVE_FILES:
        _run.add_resource(path)
    stopwords = None if stopwords_path is None else read_stopwords(stopwords_path)

    with open(path, encoding=encoding) as f:
        for line in f:
            # The preprocessing options are forwarded to Document.from_mapping.
            yield Document.from_mapping(
                json.loads(line.strip()), lower=lower, remove_puncts=remove_puncts,
                replace_digits=replace_digits, stopwords=stopwords)
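Example #3 passes four preprocessing options (lower, remove_puncts, replace_digits, stopwords) to from_mapping, but how this fork's Document applies them is not shown. The hypothetical token filter below is one plausible reading of those flag names: a token made only of ASCII punctuation is dropped, every digit becomes '0', and stopwords are matched after lowercasing. Treat all of these rules as assumptions.

```python
import re
import string
from typing import Iterable, List, Optional, Set


def normalize_tokens(tokens: Iterable[str], lower: bool = True, remove_puncts: bool = True,
                     replace_digits: bool = True,
                     stopwords: Optional[Set[str]] = None) -> List[str]:
    """Hypothetical filter mirroring the option names passed to from_mapping in Example #3."""
    out = []
    for tok in tokens:
        if lower:
            tok = tok.lower()
        # Assumption: a token consisting only of ASCII punctuation counts as punctuation.
        if remove_puncts and tok and all(ch in string.punctuation for ch in tok):
            continue
        if replace_digits:
            tok = re.sub(r'\d', '0', tok)  # map every digit to '0', e.g. '2018' -> '0000'
        if stopwords is not None and tok in stopwords:
            continue
        out.append(tok)
    return out


tokens = normalize_tokens(['Pada', '2018', ',', 'harga', 'naik', '.'], stopwords={'pada'})
```

Replacing digits instead of deleting them keeps sentence length intact while collapsing all numbers of the same shape into one vocabulary entry.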

Example #4

File: make_oracle.py Project: xhendyagsx/indosum

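import json
import sys

from data import Document


def main(args):
    docs = []
    with open(args.path, encoding=args.encoding) as f:
        for linum, line in enumerate(f):
            try:
                obj = json.loads(line.strip())
                docs.append(Document.from_mapping(obj))
            except Exception as e:
                # Report the 1-based line number and keep the original traceback.
                message = f'line {linum+1}: {e}'
                raise RuntimeError(message) from e

    # Executor and label_sentences are imported or defined elsewhere in make_oracle.py.
    with Executor(max_workers=args.max_workers) as ex:
        results = ex.map(label_sentences, docs)
        for best_rouge, doc in results:
            # One JSON object per output line, with deterministic key order.
            print(json.dumps(doc.to_dict(), sort_keys=True))
            if args.verbose:
                print(f'ROUGE-1-F: {best_rouge:.2f}', file=sys.stderr)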
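Example #4 combines two patterns: it tags parse errors with their line number, and it fans work out through an executor whose map returns results in input order, so the printed JSONL stays aligned with the input file. The sketch below reproduces both with plain dicts. The score function is a hypothetical stand-in for label_sentences, and ThreadPoolExecutor is used only to keep the example self-contained; Example #4's Executor is whatever the project binds to that name.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, List, Tuple


def parse_lines(lines: Iterable[str]) -> List[dict]:
    """Parse JSONL lines, reporting the 1-based line number of the first bad record."""
    objs = []
    for linum, line in enumerate(lines, start=1):
        try:
            objs.append(json.loads(line.strip()))
        except json.JSONDecodeError as e:
            raise RuntimeError(f'line {linum}: {e}') from e
    return objs


def score(obj: dict) -> Tuple[int, dict]:
    # Hypothetical stand-in for label_sentences: returns (score, record).
    return len(obj['text']), obj


objs = parse_lines(['{"text": "abc"}', '{"text": "a"}'])
with ThreadPoolExecutor(max_workers=2) as ex:
    # Executor.map yields results in the order of its inputs, not in completion order.
    results = list(ex.map(score, objs))
```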

Example #5

File: tokenize_jsonl.py Project: xhendyagsx/indosum

import json
from concurrent.futures import ProcessPoolExecutor
from functools import partial

import spacy

from data import Document


def main(args):
    objs = []
    with open(args.path, encoding=args.encoding) as f:
        for linum, line in enumerate(f):
            try:
                objs.append(json.loads(line.strip()))
            except Exception as e:
                message = f'line {linum+1}: {e}'
                raise RuntimeError(message) from e

    # A blank Indonesian ('id') spaCy pipeline contains only a tokenizer.
    nlp = spacy.blank('id')
    with ProcessPoolExecutor(max_workers=args.max_workers) as exc:
        # tokenize_obj and has_long_summary are project helpers (not shown).
        tok_objs = exc.map(partial(tokenize_obj, nlp), objs, chunksize=args.chunk_size)
        docs = [Document.from_mapping(obj) for obj in tok_objs]
        if args.discard_long_summary:
            docs = [doc for doc in docs if not has_long_summary(doc)]
        print('\n'.join(json.dumps(doc.to_dict(), sort_keys=True) for doc in docs))
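In Example #5, partial(tokenize_obj, nlp) fixes the tokenizer as the first argument so that exc.map only has to supply each record. The sketch below shows the same shape with hypothetical stand-ins: tokenize_obj splits on whitespace, and has_long_summary uses an invented length-ratio rule on plain dicts rather than Documents (the real helpers are not shown). It uses ThreadPoolExecutor to stay self-contained, which is also why chunksize is omitted: chunksize only has an effect with ProcessPoolExecutor.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from typing import Callable, List


def tokenize_obj(tokenize: Callable[[str], List[str]], obj: dict) -> dict:
    """Hypothetical stand-in for the project's tokenize_obj: tokenize each text field."""
    return {'paragraph': tokenize(obj['paragraph']), 'summary': tokenize(obj['summary'])}


def has_long_summary(obj: dict, max_ratio: float = 0.5) -> bool:
    """Hypothetical rule: the summary has more than max_ratio times the paragraph's tokens."""
    return len(obj['summary']) > max_ratio * len(obj['paragraph'])


objs = [
    {'paragraph': 'satu dua tiga empat', 'summary': 'satu'},
    {'paragraph': 'satu dua', 'summary': 'satu dua'},
]
with ThreadPoolExecutor(max_workers=2) as ex:
    # partial binds str.split as the tokenizer; map supplies one obj per call.
    tok_objs = list(ex.map(partial(tokenize_obj, str.split), objs))
kept = [obj for obj in tok_objs if not has_long_summary(obj)]
```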