Python PhillyLegistarSiteWrapper.extract_pdf_text Examples

Programming language: Python

Namespace/Package: phillyleg.management.scraper_wrappers

Method/Function: extract_pdf_text

Examples at hotexamples.com: 6

Python PhillyLegistarSiteWrapper.extract_pdf_text - 6 examples found. These are the top-rated real-world examples of phillyleg.management.scraper_wrappers.PhillyLegistarSiteWrapper.extract_pdf_text extracted from open source projects. You can rate the examples to help improve their quality.

Frequently used methods

PhillyLegistarSiteWrapper(12)

extract_pdf_text(3)

urlopen(3)

get_minutes_date(2)

check_for_new_content(1)

collect_minutes(1)

convert_date(1)

extract_xml_text(1)

get_minutes_doc(1)

is_error_page(1)

scrape_legis_file(1)

Example #1

File: management_tests.py Project: phxdata/mesa-councilmatic

def test_DealsWith404PdfAddressesCorrectly(self):
    # I don't know why they'd be deleting these files, but when they do (and
    # they do) we have to handle it.
    wrapper = PhillyLegistarSiteWrapper(root_url='')

    expected_text = ''
    attachment_pdf = 'http://legislation.phila.gov/attachments/115954.pdf'
    attachment_text = wrapper.extract_pdf_text(attachment_pdf)

    self.assertEqual(attachment_text, expected_text)
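The test above only pins down the observable behaviour: a URL that answers with a 404 yields an empty result. The following is a minimal sketch of a fetch helper that could satisfy such a contract, assuming Python 3's urllib. `fetch_or_empty` and its `opener` parameter are hypothetical names introduced for illustration and are not part of PhillyLegistarSiteWrapper.

```python
from urllib.error import HTTPError
from urllib.request import urlopen


def fetch_or_empty(url, opener=urlopen):
    """Return the response body as bytes, or b'' if the server answers 404.

    Hypothetical helper: `opener` is injectable so the function can be
    exercised without network access.
    """
    try:
        response = opener(url)
    except HTTPError as err:
        if err.code == 404:
            # The attachment was deleted upstream; treat it as empty.
            return b''
        raise  # Any other HTTP error is still surfaced to the caller.
    try:
        return response.read()
    finally:
        response.close()
```

Re-raising everything except 404 is a design choice of this sketch: it keeps server errors and timeouts visible instead of silently turning them into empty documents.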

Example #2

File: management_tests.py Project: phxdata/mesa-councilmatic

def test_ResolutionPdfParsesCorrectly(self):
    wrapper = PhillyLegistarSiteWrapper(root_url='')

    expected_text = """\n\n\n\n\n\n\n\n\nCity of Philadelphia \n \n \n \n \nCity of Philadelphia \n- 1 - \n \n \n \nCity Council \nChief Clerk's Office \n402 City Hall \nPhiladelphia, PA 19107 \nRESOLUTION NO. 110406 \n \n \nIntroduced May 12, 2011 \n \n \nCouncilmember DiCicco \n \n \nReferred to the \nCommittee of the Whole \n \n \nRESOLUTION \n \nAppointing David Campoli to the Board of Directors of the Center City District. \n \n \n \nRESOLVED, BY THE COUNCIL OF THE CITY OF PHILADELPHIA, \nTHAT David Campoli is hereby appointed as a member of the Board of Directors of the \nCenter City District, to serve in a term ending December 31, 2012. \n \n \n\n\n\nCity of Philadelphia \n \nRESOLUTION NO. 110406 continued \n \n \n \n \n \nCity of Philadelphia \n- 2 - \n \n \n \n \n\n"""

    # Raw stream (read in binary mode and close the file when done)
    with open(os.path.join(self.pdfs_dir, '11530.pdf'), 'rb') as pdf_file:
        resolution_pdf = pdf_file.read()
    resolution_text = wrapper.extract_pdf_text(resolution_pdf)
    self.assertEqual(resolution_text, expected_text)

    # File URL
    resolution_pdf = 'file://' + os.path.join(self.pdfs_dir, '11530.pdf')
    resolution_text = wrapper.extract_pdf_text(resolution_pdf)
    self.assertEqual(resolution_text, expected_text)

    # Web URL -- This will only work if you're online.
    resolution_pdf = 'http://legislation.phila.gov/attachments/11530.pdf'
    resolution_text = wrapper.extract_pdf_text(resolution_pdf)
    self.assertEqual(resolution_text, expected_text)
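The test feeds `extract_pdf_text` three kinds of input: the raw contents of a PDF, a `file://` URL, and an `http://` URL. How the wrapper tells them apart is not shown on this page. One possible approach, sketched here purely as an assumption, is to look for the `%PDF-` header on byte input and for a known URL scheme on string input. `classify_pdf_source` and `URL_SCHEMES` are hypothetical names.

```python
from urllib.parse import urlparse

# Schemes this sketch treats as fetchable addresses (an assumption).
URL_SCHEMES = {'http', 'https', 'file'}


def classify_pdf_source(source):
    """Return 'url' for a fetchable address, 'raw' for PDF data.

    Hypothetical helper, not the wrapper's actual logic.
    """
    if isinstance(source, bytes):
        # PDF files begin with the magic header "%PDF-".
        if source.lstrip().startswith(b'%PDF-'):
            return 'raw'
        raise ValueError('bytes input does not look like a PDF')
    if urlparse(source).scheme in URL_SCHEMES:
        return 'url'
    raise ValueError('unrecognised PDF source: %r' % (source,))
```

A caller could branch on the returned label, fetching the document first when it is `'url'` and passing the data straight to the parser when it is `'raw'`.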

Example #3

File: management_tests.py Project: citizennerd/councilmatic

def test_DealsWith404PdfAddressesCorrectly(self):
    # I don't know why they'd be deleting these files, but when they do (and
    # they do) we have to handle it.
    wrapper = PhillyLegistarSiteWrapper()

    expected_text = ''
    attachment_pdf = 'http://legislation.phila.gov/attachments/115954.pdf'
    attachment_text = wrapper.extract_pdf_text(attachment_pdf)

    self.assertEqual(attachment_text, expected_text)

Example #4

File: management_tests.py Project: citizennerd/councilmatic

def test_ResolutionPdfParsesCorrectly(self):
    wrapper = PhillyLegistarSiteWrapper()

    expected_text = """\n\n\n\n\n\n\n\n\nCity of Philadelphia \n \n \n \n \nCity of Philadelphia \n- 1 - \n \n \n \nCity Council \nChief Clerk's Office \n402 City Hall \nPhiladelphia, PA 19107 \nRESOLUTION NO. 110406 \n \n \nIntroduced May 12, 2011 \n \n \nCouncilmember DiCicco \n \n \nReferred to the \nCommittee of the Whole \n \n \nRESOLUTION \n \nAppointing David Campoli to the Board of Directors of the Center City District. \n \n \n \nRESOLVED, BY THE COUNCIL OF THE CITY OF PHILADELPHIA, \nTHAT David Campoli is hereby appointed as a member of the Board of Directors of the \nCenter City District, to serve in a term ending December 31, 2012. \n \n \n\n\n\nCity of Philadelphia \n \nRESOLUTION NO. 110406 continued \n \n \n \n \n \nCity of Philadelphia \n- 2 - \n \n \n \n \n\n"""

    # Raw stream (read in binary mode and close the file when done)
    with open(os.path.join(self.pdfs_dir, '11530.pdf'), 'rb') as pdf_file:
        resolution_pdf = pdf_file.read()
    resolution_text = wrapper.extract_pdf_text(resolution_pdf)
    self.assertEqual(resolution_text, expected_text)

    # File URL
    resolution_pdf = 'file://' + os.path.join(self.pdfs_dir, '11530.pdf')
    resolution_text = wrapper.extract_pdf_text(resolution_pdf)
    self.assertEqual(resolution_text, expected_text)

    # Web URL -- This will only work if you're online.
    resolution_pdf = 'http://legislation.phila.gov/attachments/11530.pdf'
    resolution_text = wrapper.extract_pdf_text(resolution_pdf)
    self.assertEqual(resolution_text, expected_text)

Example #5

File: management_tests.py Project: citizennerd/councilmatic

def test_MinutesDocumentConstructedCorrectly(self):
    wrapper = PhillyLegistarSiteWrapper()

    # Stub out the date parsing and PDF extraction so that only
    # get_minutes_doc itself is exercised.
    wrapper.get_minutes_date = mock.Mock(return_value=dt.date(2083, 12, 6))
    wrapper.extract_pdf_text = mock.Mock(return_value='This is the text')

    expected_doc = {'url': 'http://www.example.com/doc.pdf',
                    'fulltext': 'This is the text',
                    'date_taken': dt.date(2083, 12, 6)}
    minutes_doc = wrapper.get_minutes_doc('http://www.example.com/doc.pdf')

    self.assertEqual(minutes_doc, expected_doc)
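The test isolates `get_minutes_doc` by replacing `get_minutes_date` and `extract_pdf_text` on the instance with `mock.Mock` objects. Below is a self-contained sketch of the same pattern using Python 3's `unittest.mock`. `FakeWrapper` is a hypothetical stand-in, and the way its `get_minutes_doc` combines the two helpers is an assumption inferred from the expected dictionary, not the wrapper's confirmed implementation.

```python
import datetime as dt
from unittest import mock


class FakeWrapper:
    """Hypothetical stand-in for the wrapper under test."""

    def extract_pdf_text(self, url):
        raise NotImplementedError('would download and parse the PDF')

    def get_minutes_date(self, text):
        raise NotImplementedError('would parse a date out of the text')

    def get_minutes_doc(self, url):
        # Assumed composition, inferred from the expected dictionary.
        fulltext = self.extract_pdf_text(url)
        return {
            'url': url,
            'fulltext': fulltext,
            'date_taken': self.get_minutes_date(fulltext),
        }


wrapper = FakeWrapper()
# Instance attributes shadow the class methods, so no network access or
# PDF parsing happens when get_minutes_doc runs.
wrapper.extract_pdf_text = mock.Mock(return_value='This is the text')
wrapper.get_minutes_date = mock.Mock(return_value=dt.date(2083, 12, 6))

minutes_doc = wrapper.get_minutes_doc('http://www.example.com/doc.pdf')
```

Because the stubs are `Mock` objects, a test can also check how they were called, for instance with `wrapper.extract_pdf_text.assert_called_once_with(...)`.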

Example #6

File: management_tests.py Project: phxdata/mesa-councilmatic

def test_MinutesDocumentConstructedCorrectly(self):
    wrapper = PhillyLegistarSiteWrapper(root_url='')

    # Stub out the date parsing and PDF extraction so that only
    # get_minutes_doc itself is exercised.
    wrapper.get_minutes_date = mock.Mock(return_value=dt.date(2083, 12, 6))
    wrapper.extract_pdf_text = mock.Mock(return_value='This is the text')

    expected_doc = {
        'url': 'http://www.example.com/doc.pdf',
        'fulltext': 'This is the text',
        'date_taken': dt.date(2083, 12, 6)
    }
    minutes_doc = wrapper.get_minutes_doc('http://www.example.com/doc.pdf')

    self.assertEqual(minutes_doc, expected_doc)