Python get_foia_url Examples

Programming language: Python

Namespace/package name: utils

Method/function: get_foia_url

Examples at hotexamples.com: 2

Python get_foia_url - 2 examples found. These are real-world Python examples of utils.get_foia_url extracted from open source projects.

Example #1

File: procure.py Project: rlucioni/clinton

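import json

import requests

from utils import get_foia_url

# clean_timestamps is a project helper defined elsewhere (not shown here).


def get_records(count=1):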
    """Retrieve the requested number of records.

    Keyword Arguments:
        count (int): The number of records to retrieve.

    Returns:
        dict: The retrieved records.
    """
    url = get_foia_url('Search/SubmitSimpleQuery')
    params = {
        'collectionMatch': 'Clinton_Email',
        'searchText': '*',
        'beginDate': 'false',
        'endDate': 'false',
        'postedBeginDate': 'false',
        'postedEndDate': 'false',
        'caseNumber': 'false',
        'page': 1,
        'start': 0,
        'limit': count
    }

    # SSL certificate verification fails. To get around this,
    # ignore verification of the SSL certificate.
    response = requests.get(url, params=params, verify=False)

    text = clean_timestamps(response.text)
    records = json.loads(text)

    return records
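
The listing above calls get_foia_url with a relative path but does not show its body. The sketch below illustrates one way such a helper could work, by joining the relative path onto a base URL. The function name build_foia_url and the constant FOIA_BASE_URL are hypothetical stand-ins, and the example.gov address is a placeholder, not the project's real endpoint.

```python
from urllib.parse import urljoin

# Hypothetical base URL (assumption); the real value lives in the
# project's utils module and is not shown in the source.
FOIA_BASE_URL = 'https://foia.example.gov/searchapp/'


def build_foia_url(path):
    """Join a relative path onto the base URL (illustrative sketch only)."""
    # Strip any leading slash so urljoin keeps the base URL's path prefix
    # instead of resolving the path against the host root.
    return urljoin(FOIA_BASE_URL, path.lstrip('/'))


print(build_foia_url('Search/SubmitSimpleQuery'))
```

The trailing slash on the base URL matters here: without it, urljoin would replace the last path segment of the base instead of appending to it.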

Example #2

File: tasks.py Project: rlucioni/clinton

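import requests

from utils import get_foia_url

# INTERESTING_SENDERS, datetime_from_timestamp, get_filename,
# save_and_extract, and get_body are defined elsewhere in the project
# (not shown here).


def download(email):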
    """Process the provided dictionary of email metadata.

    Download the corresponding PDF and extract plain text from it.

    Arguments:
        email (dict): A dictionary of email metadata. For example,
            {
                'from': 'H',
                'pdfLink': 'DOCUMENTS/HRCEmail_August_Web/IPS-0128/DOC_0C05775316/C05775316.pdf',
                'docDate': 1277956800000,
                'documentClass': 'Clinton_Email_August_Release',
                'messageNumber': '',
                'to': 'preines',
                'caseNumber': 'F-2014-20439',
                'subject': 'TEST',
                'originalLink': None,
                'postedDate': 1440993600000
            }

    Returns:
        dict: Containing the provided metadata, transformed if necessary,
            in addition to text from the downloaded PDF. None if the
            sender is not in INTERESTING_SENDERS.
    """
    if email['from'] not in INTERESTING_SENDERS:
        return None

    # TODO: These timestamps only give dates, not times. However, the emails
    # themselves contain dates and times. Extract these.
    email['sent'] = datetime_from_timestamp(email.pop('docDate'))
    email['pdf_posted'] = datetime_from_timestamp(email.pop('postedDate'))

    # TODO: Don't download the email if it's present on disk. Return None
    # so that a duplicate record isn't written to the database.
    url = get_foia_url(email.pop('pdfLink'))
    email['pdf_link'] = url

    # SSL certificate verification fails. To get around this,
    # ignore verification of the SSL certificate.
    response = requests.get(url, verify=False)
    pdf = response.content

    filename = get_filename(url)
    email['document_id'] = filename

    pdf_path, text = save_and_extract(filename, pdf)
    email['pdf_path'] = pdf_path

    body, is_redacted = get_body(text)
    email['body'] = body
    email['is_redacted'] = is_redacted

    return email
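
The download function relies on two helpers whose bodies are not shown: datetime_from_timestamp and get_filename. The sketch below shows how they might behave, assuming that docDate and postedDate are Unix timestamps in milliseconds (consistent with the sample values in the docstring) and that the document ID is the PDF's file name without its extension. Both functions carry a _sketch suffix to mark them as hypothetical, and the example.gov URL is a placeholder.

```python
import posixpath
from datetime import datetime, timezone
from urllib.parse import urlparse


def datetime_from_timestamp_sketch(milliseconds):
    """Convert a millisecond Unix timestamp to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(milliseconds / 1000, tz=timezone.utc)


def get_filename_sketch(url):
    """Return the last path component of a URL without its extension.

    Assumption: the real helper may keep the extension or behave differently.
    """
    basename = posixpath.basename(urlparse(url).path)
    stem, _extension = posixpath.splitext(basename)
    return stem


# Sample values taken from the docstring above; the host is a placeholder.
sent = datetime_from_timestamp_sketch(1277956800000)
document_id = get_filename_sketch(
    'https://foia.example.gov/searchapp/DOCUMENTS/HRCEmail_August_Web/'
    'IPS-0128/DOC_0C05775316/C05775316.pdf'
)
print(sent.isoformat(), document_id)
```

The sample docDate resolves to 04:00 UTC on 2010-07-01, which suggests the source stores midnight in a US Eastern time zone. That matches the TODO in the listing noting that these timestamps carry dates but no meaningful times.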