Python HarvestObject.HarvestObject Examples

Programming language: Python

Namespace/package name: ckanext.harvestodm.model

Class/type: HarvestObject

Method/function: HarvestObject

Examples on hotexamples.com: 3

Python HarvestObject.HarvestObject - 3 examples found. These are the top-rated real-world Python examples of ckanext.harvestodm.model.HarvestObject.HarvestObject, extracted from open source projects.

Frequently used methods

HarvestObject (3)

save (3)

get (2)

add (1)

Example #1

    def gather_stage(self, harvest_job):
        log.debug('In SocrataHarvester 2 gather_stage (%s)' %
                  harvest_job.source.url)
        get_all_packages = True

        dcatUrl = "%s/api/dcat.rdf" % harvest_job.source.url.rstrip('/')
        log.debug(dcatUrl)

        adaptorInstance = socrataAdaptor()
        package_ids = adaptorInstance.listDatasetIds(dcatUrl)
        log.debug('Received %d package ids: %r',
                  len(package_ids), package_ids)

        try:
            object_ids = []
            if len(package_ids):
                for package_id in package_ids:
                    if "http" not in package_id:
                        # Create a new HarvestObject for this identifier
                        obj = HarvestObject(guid=package_id, job=harvest_job)
                        obj.save()
                        object_ids.append(obj.id)

                return object_ids

            else:
                # The original referenced an undefined name `url` here
                self._save_gather_error('No packages received for URL: %s' %
                                        harvest_job.source.url, harvest_job)
                return None
        except Exception as e:
            # `e.message` exists only in Python 2; format the exception itself
            self._save_gather_error('%r' % e, harvest_job)

Example #2

File: htmlharvester.py Project: rossjones/ckanext-htmlharvest

    def gather_stage(self, harvest_job):

        print('Html Harvest Gather Stage')
        db2 = client.odm
        collection = db2.html_jobs
        backupi = 0
        ## Get source URL
        source_url = harvest_job.source.url
        ## mongoDb connection
        document = collection.find_one({"cat_url": source_url})
        id1 = document['_id']
        if 'btn_identifier' in document:
            if document['btn_identifier'] is not None and document[
                    'btn_identifier'] != '':
                cat_url = document['cat_url']
                dataset_identifier = document['identifier']
                btn_identifier = document['btn_identifier']
                action_type = document['action_type']
                # Fall back to 3 when no sleep time is configured
                sleep_time = document.get('sleep_time', 3)
                package_ids = javascript_case.ParseJavascriptPages(
                    cat_url, dataset_identifier, btn_identifier, action_type,
                    sleep_time)
                print(package_ids)
            else:
                package_ids = harvester_final.read_data(id1, backupi)
        else:
            package_ids = harvester_final.read_data(id1, backupi)
        print(package_ids)
        #print(len(package_ids))
        #package_ids=[]
        #package_ids.append('http://data.belgium.be/dataset/mortality-tables-gender')
        #package_ids.append('test')
        try:
            object_ids = []
            if len(package_ids):
                for package_id in package_ids:
                    # Create a new HarvestObject for this identifier
                    obj = HarvestObject(guid=package_id, job=harvest_job)
                    obj.save()
                    object_ids.append(obj.id)

                return object_ids

            else:
                self._save_gather_error(
                    'No packages received for URL: %s' % source_url,
                    harvest_job)
                return None
        except Exception as e:
            # `e.message` exists only in Python 2; format the exception itself
            self._save_gather_error('%r' % e, harvest_job)
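The optional-field handling in Example #2 (checking `btn_identifier` and falling back to a default `sleep_time`) can be sketched on its own. This is a minimal sketch that uses a plain dict in place of the MongoDB document; `read_button_config` is a hypothetical helper name that does not appear in the original project. Note that the truthiness check here treats a missing key, `None` and `''` alike, but it would also reject other falsy values such as `0`.

```python
def read_button_config(document, default_sleep=3):
    """Return the button-driven crawl settings, or None if not configured.

    `document` is any dict-like object; here it stands in for the MongoDB
    document that Example #2 loads with find_one().
    """
    # Missing key, None and '' all mean "no button configured"
    if not document.get('btn_identifier'):
        return None
    return {
        'cat_url': document['cat_url'],
        'dataset_identifier': document['identifier'],
        'btn_identifier': document['btn_identifier'],
        'action_type': document['action_type'],
        # dict.get replaces the try/except around document['sleep_time']
        'sleep_time': document.get('sleep_time', default_sleep),
    }


# Illustrative documents; the field values are made up for this sketch
with_button = read_button_config({
    'cat_url': 'http://example.org/catalog',
    'identifier': 'dataset',
    'btn_identifier': 'next',
    'action_type': 'click',
})
without_button = read_button_config({
    'cat_url': 'http://example.org/catalog',
    'btn_identifier': '',
})
```

When the helper returns `None`, the caller would take the same fallback path that Example #2 takes through `harvester_final.read_data`.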

Example #3

File: base.py Project: rossjones/ckanext-htmlharvest

    def _create_harvest_objects(self, remote_ids, harvest_job):
        '''
        Given a list of remote ids and a Harvest Job, create as many Harvest Objects and
        return a list of their ids to be passed to the fetch stage.

        TODO: Not sure it is worth keeping this function
        '''
        try:
            object_ids = []
            if len(remote_ids):
                for remote_id in remote_ids:
                    # Create a new HarvestObject for this identifier
                    obj = HarvestObject(guid=remote_id, job=harvest_job)
                    obj.save()
                    object_ids.append(obj.id)
                return object_ids
            else:
                self._save_gather_error(
                    'No remote datasets could be identified', harvest_job)
        except Exception as e:
            # `e.message` exists only in Python 2; format the exception itself
            self._save_gather_error('%r' % e, harvest_job)
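All three examples share one pattern: create a HarvestObject per remote identifier, save it, and return the collected object ids for the next stage. The sketch below reproduces that pattern so it can be run without CKAN installed. `FakeHarvestObject` is a hypothetical in-memory stand-in, not the real `ckanext.harvestodm.model.HarvestObject`; assigning the id inside `save()` and collecting messages in an `errors` list (in place of `_save_gather_error`) are assumptions made for illustration.

```python
import uuid


class FakeHarvestObject(object):
    """Hypothetical in-memory stand-in for HarvestObject (not the real model)."""

    saved = []  # plays the role of the database table

    def __init__(self, guid, job):
        self.guid = guid
        self.job = job
        self.id = None  # assumption: the id is assigned when the object is saved

    def save(self):
        self.id = str(uuid.uuid4())
        FakeHarvestObject.saved.append(self)


def create_harvest_objects(remote_ids, harvest_job, errors):
    """Create one object per remote id and return the new object ids.

    Returns None and records a message in `errors` when no ids were given.
    """
    if not remote_ids:
        errors.append('No remote datasets could be identified')
        return None
    object_ids = []
    for remote_id in remote_ids:
        obj = FakeHarvestObject(guid=remote_id, job=harvest_job)
        obj.save()
        object_ids.append(obj.id)
    return object_ids


errors = []
ids = create_harvest_objects(['dataset-a', 'dataset-b'], 'job-1', errors)
empty = create_harvest_objects([], 'job-1', errors)
print(len(ids), empty, errors)
```

Returning `None` for the empty case mirrors Examples #1 and #2, which report the problem and return `None` instead of an empty list.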