Python KenyaParser.convert_html_to_data examples

Programming language: Python

Namespace/package name: pombola.hansard.kenya_parser

Class/type: KenyaParser

Method/function: convert_html_to_data

Code examples on hotexamples.com: 5

Python KenyaParser.convert_html_to_data - 5 code examples found. These are the top-rated real-world Python examples of pombola.hansard.kenya_parser.KenyaParser.convert_html_to_data, all extracted from open source projects.

Frequently used methods

convert_html_to_data(3)

convert_pdf_to_html(3)

create_entries_from_data_and_source(3)

parse_time_string(2)

Example #1

    def handle_noargs(self, **options):

        for source in Source.objects.all().requires_processing():

            if int(options.get('verbosity')) >= 2:
                print("Looking at %s" % source)

            # Record the attempt time before parsing starts
            source.last_processing_attempt = datetime.datetime.now()
            source.save()

            # PDF -> HTML -> data -> database entries
            pdf = source.file()
            html = KenyaParser.convert_pdf_to_html(pdf)
            data = KenyaParser.convert_html_to_data(html)
            KenyaParser.create_entries_from_data_and_source(data, source)

Example #2

File: hansard_process_sources.py Project: Sinar/pombola

    def handle_noargs(self, **options):

        for source in Source.objects.all().requires_processing():

            if int(options.get('verbosity')) >= 2:
                print("Looking at %s" % source)

            # Record the attempt time before parsing starts
            source.last_processing_attempt = datetime.datetime.now()
            source.save()

            # PDF -> HTML -> data -> database entries
            pdf = source.file()
            html = KenyaParser.convert_pdf_to_html(pdf)
            data = KenyaParser.convert_html_to_data(html)
            KenyaParser.create_entries_from_data_and_source(data, source)

Example #3

    def handle_noargs(self, **options):

        verbose = int(options.get('verbosity')) >= 2

        for source in Source.objects.all().requires_processing():

            if verbose:
                message = "{0}: Looking at {1}"
                print(message.format(source.list_page, source))

            # Record the attempt time before parsing starts
            source.last_processing_attempt = datetime.datetime.now()
            source.save()

            pdf = source.file()
            try:
                html = KenyaParser.convert_pdf_to_html(pdf)
                data = KenyaParser.convert_html_to_data(html)
                KenyaParser.create_entries_from_data_and_source(data, source)
            except Exception:
                # Report which PDF failed, then re-raise the original exception
                print("There was an exception when parsing {0}".format(pdf))
                raise
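
The three examples above run the same pipeline: PDF to HTML, HTML to data, data to stored entries. The following is a minimal sketch of that flow for a single source, not the project's implementation. `StubParser`, `process_source`, and the dict-based `source` are hypothetical stand-ins, because the real `KenyaParser` needs the pombola project and a Django database. The `meta` and `transcript` keys are taken from the tests on this page; everything else the stub returns is an assumption.

```python
import logging

logger = logging.getLogger(__name__)


class StubParser(object):
    """Hypothetical stand-in for KenyaParser; only the call order mirrors the examples."""

    @classmethod
    def convert_pdf_to_html(cls, pdf):
        # Assumption: the real method shells out to a PDF converter.
        return "<html>%s</html>" % pdf

    @classmethod
    def convert_html_to_data(cls, html):
        # The tests on this page only show that the result has these two keys.
        return {"meta": {}, "transcript": [html]}

    @classmethod
    def create_entries_from_data_and_source(cls, data, source):
        # Assumption: the real method writes database rows; here we fill a dict.
        source["entries"] = list(data["transcript"])


def process_source(source, parser=StubParser):
    """Run the three stages for one source; log and re-raise on failure."""
    pdf = source["pdf"]
    try:
        html = parser.convert_pdf_to_html(pdf)
        data = parser.convert_html_to_data(html)
        parser.create_entries_from_data_and_source(data, source)
    except Exception:
        # Same idea as Example #3: say which input failed, keep the traceback.
        logger.exception("There was an exception when parsing %s", pdf)
        raise
    return source
```

Passing the parser in as an argument is a design choice of this sketch, so that the flow can be exercised without the real class, for example `process_source({"pdf": "sample"})`.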

Example #4

File: test_kenya_parser.py Project: Code4SA/pombola

    def test_converting_html_to_data(self):
        """test the convert_html_to_data function"""

        # Read the sample HTML, closing the file when done
        with open(self.sample_html, 'r') as html_file:
            html = html_file.read()

        data = KenyaParser.convert_html_to_data(html=html)

        # Whilst developing the code this proved useful
        # out = open( self.expected_data_json, 'w')
        # json_string = json.dumps( data, sort_keys=True, indent=4 )
        # json_string = re.sub(r" +\n", "\n", json_string) # trim trailing whitespace
        # json_string += "\n"
        # out.write( json_string )
        # out.close()

        # Load the stored expected output to compare against
        with open(self.expected_data_json, 'r') as expected_file:
            expected = json.load(expected_file)

        self.assertEqual(data['transcript'], expected['transcript'])

        # FIXME
        self.assertEqual(data['meta'], expected['meta'])

Example #5

File: test_kenya_parser.py Project: Ufadhili/Ajibika

    def test_converting_html_to_data(self):
        """test the convert_html_to_data function"""

        # Read the sample HTML, closing the file when done
        with open(self.sample_html, 'r') as html_file:
            html = html_file.read()

        data = KenyaParser.convert_html_to_data(html=html)

        # Whilst developing the code this proved useful
        # out = open( self.expected_data_json, 'w')
        # json_string = json.dumps( data, sort_keys=True, indent=4 )
        # json_string = re.sub(r" +\n", "\n", json_string) # trim trailing whitespace
        # json_string += "\n"
        # out.write( json_string )
        # out.close()

        # Load the stored expected output to compare against
        with open(self.expected_data_json, 'r') as expected_file:
            expected = json.load(expected_file)

        self.assertEqual(data['transcript'], expected['transcript'])

        # FIXME
        self.assertEqual(data['meta'], expected['meta'])
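
The commented-out lines in the two tests above regenerate the expected-data JSON file from the parser output. Below is a sketch of that step as a standalone helper. The name `dump_expected_json` is hypothetical and does not appear in the project; the serialisation settings (`sort_keys=True`, `indent=4`, trailing-whitespace trim, final newline) are copied from the commented-out code.

```python
import json
import re


def dump_expected_json(data):
    """Serialise parser output in a stable, diff-friendly form."""
    # Sorted keys and fixed indentation keep the file stable between runs.
    json_string = json.dumps(data, sort_keys=True, indent=4)
    # Trim trailing spaces before newlines (Python 2's json.dumps leaves
    # them after commas when indent is set; on Python 3 this is a no-op).
    json_string = re.sub(r" +\n", "\n", json_string)
    # End the file with a single newline.
    return json_string + "\n"
```

Writing the returned string to the path held in `self.expected_data_json` would refresh the fixture that the tests compare against.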