Python WithKeys Examples

Programming language: Python

Namespace/package name: apache_beam

Class/type: WithKeys

Examples on hotexamples.com: 4

Python WithKeys - 4 examples found. These are top-rated real-world Python examples of apache_beam.WithKeys, extracted from open source projects.
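As a rough illustration of what the transform does, the sketch below mimics the pairing in plain Python, without Beam: a constant argument becomes the key of every element, while a callable argument is applied to each element to compute its key. The helper name `with_keys` is hypothetical and is not part of apache_beam.

```python
from typing import Any, Callable, Iterable, Iterator, Tuple, Union


def with_keys(values: Iterable[Any],
              key: Union[Any, Callable[[Any], Any]]) -> Iterator[Tuple[Any, Any]]:
    """Hypothetical plain-Python stand-in for the pairing that WithKeys performs."""
    for value in values:
        # A callable computes the key from the element; anything else is used as-is.
        yield (key(value) if callable(key) else key, value)


# Constant key, as in WithKeys('').
constant_keyed = list(with_keys(["a", "bb"], ""))
# Computed key, as in WithKeys(some_function).
computed_keyed = list(with_keys(["a", "bb"], len))
print(constant_keyed)
print(computed_keyed)
```

The examples that follow use both forms: a lambda that computes a key per element, and the constant key `''`.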

Example #1

import random

from apache_beam import GroupByKey, ParDo, WindowInto, WithKeys
from apache_beam.transforms.window import FixedWindows

# Method of a PTransform subclass; the class and the AddTimestamp DoFn are not shown.
def expand(self, pcoll):
    return (
        pcoll
        # Bind window info to each element using element timestamp (or publish time).
        | "Window into fixed intervals" >> WindowInto(
            FixedWindows(self.window_size))
        | "Add timestamp to windowed elements" >> ParDo(AddTimestamp())
        # Assign a random key to each windowed element based on the number of shards.
        | "Add key" >>
        WithKeys(lambda _: random.randint(0, self.num_shards - 1))
        # Group windowed elements by key. All the elements in the same window must
        # fit in memory for this. If not, you need to use `beam.util.BatchElements`.
        | "Group by key" >> GroupByKey())
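To make the effect of the random key concrete, here is a minimal plain-Python sketch of the "add key, then group by key" steps. It is not Beam code: `shard_and_group` and its `seed` parameter are hypothetical names introduced only for illustration, and windowing is left out.

```python
import random
from collections import defaultdict


def shard_and_group(elements, num_shards, seed=None):
    """Assign each element a random shard key, then group elements by that key."""
    rng = random.Random(seed)  # seeded only to make the sketch repeatable
    groups = defaultdict(list)
    for element in elements:
        # Same key range as in the example above: 0 .. num_shards - 1.
        key = rng.randint(0, num_shards - 1)
        groups[key].append(element)
    return dict(groups)


groups = shard_and_group(range(10), num_shards=3, seed=0)
for key in sorted(groups):
    print(key, groups[key])
```

Every element lands in exactly one of at most `num_shards` groups, which is what bounds the number of groups produced per window in the pipeline above.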

Example #2

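from apache_beam import Create, FlatMap, ParDo, Pipeline, WithKeys
from apache_beam.options.pipeline_options import PipelineOptions, SetupOptions

# NUM_SHARDS, ELEMENT_BYTES, NUM_ELEMENTS_PER_SHARD and BigBagDoFn are defined
# elsewhere in the source project and are not shown here.
def main():
    options = PipelineOptions()
    options.view_as(SetupOptions).save_main_session = True

    p = Pipeline(options=options)
    (p
     | Create(list(range(NUM_SHARDS)))
     # Expand each shard into NUM_ELEMENTS_PER_SHARD byte strings.
     | FlatMap(lambda _:
               (bytes(ELEMENT_BYTES) for _ in range(NUM_ELEMENTS_PER_SHARD)))
     # Give every element the same constant key.
     | WithKeys('')
     | ParDo(BigBagDoFn()))

    p.run()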

Example #3

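import logging

from apache_beam import Create, Map, ParDo, Pipeline, WithKeys
from apache_beam.options.pipeline_options import PipelineOptions, SetupOptions

# make_large_elements and GroupIntoBatchesWithMultiBags are defined elsewhere
# in the source project and are not shown here.
def main():
  options = PipelineOptions()
  options.view_as(SetupOptions).save_main_session = True

  BATCH_SIZE = 1000000
  BUFFERING_SECS = 600

  p = Pipeline(options=options)
  (p
   | Create(range(100), reshuffle=True)
   | ParDo(make_large_elements)  # 128 KiB
   | WithKeys('')
   # Big batch size with a 600-second (10-minute) buffering limit.
   | GroupIntoBatchesWithMultiBags(BATCH_SIZE, BUFFERING_SECS)
   | Map(lambda kv: logging.info('key: %s, value count: %s',
                                 kv[0], len(kv[1]))))

  run = p.run()
  run.wait_until_finish()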

Example #4

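import logging

from apache_beam import Create, GroupByKey, Map, ParDo, Pipeline, WindowInto, WithKeys
from apache_beam.options.pipeline_options import PipelineOptions, SetupOptions
from apache_beam.transforms.trigger import (
    AccumulationMode, AfterAny, AfterCount, AfterProcessingTime, Repeatedly)
from apache_beam.transforms.window import GlobalWindows

# make_large_elements is defined elsewhere in the source project and is not shown here.
def main():
    options = PipelineOptions()
    options.view_as(SetupOptions).save_main_session = True

    BATCH_SIZE = 1000000
    BUFFERING_SECS = 600

    p = Pipeline(options=options)
    (p
     | Create(range(100), reshuffle=True)
     | ParDo(make_large_elements)  # 128 KiB
     | WithKeys('')
     # Fire a pane when either the element count or the processing-time delay is reached.
     | WindowInto(GlobalWindows(),
                  trigger=Repeatedly(
                      AfterAny(AfterCount(BATCH_SIZE),
                               AfterProcessingTime(BUFFERING_SECS))),
                  accumulation_mode=AccumulationMode.DISCARDING)
     | GroupByKey()
     | Map(lambda kv: logging.info('key: %s, value count: %s',
                                   kv[0], len(kv[1]))))

    run = p.run()
    run.wait_until_finish()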
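The count-or-time trigger in the last example can be approximated in plain Python to show how a single-key stream gets split into panes. This is only a sketch under simplifying assumptions, not how a Beam runner evaluates triggers: `simulate_panes` is a hypothetical helper, arrival times are supplied by hand, the time condition is checked only when an element arrives (a real runner uses timers), and leftover elements are flushed at the end of the input.

```python
def simulate_panes(events, batch_size, buffering_secs):
    """Split (arrival_time, value) events into panes, discarding after each firing."""
    panes, current, first_time = [], [], None
    for arrival_time, value in events:
        # Time condition: the open pane has been buffering for buffering_secs or more.
        if current and arrival_time - first_time >= buffering_secs:
            panes.append(current)
            current, first_time = [], None
        if not current:
            first_time = arrival_time  # the delay is measured from the pane's first element
        current.append(value)
        # Count condition: the open pane holds at least batch_size elements.
        if len(current) >= batch_size:
            panes.append(current)
            current, first_time = [], None
    if current:
        panes.append(current)  # simplification: flush the remainder at end of input
    return panes


events = [(0, "a"), (1, "b"), (2, "c"), (3, "d"), (20, "e"), (21, "f")]
panes = simulate_panes(events, batch_size=3, buffering_secs=10)
print(panes)  # [['a', 'b', 'c'], ['d'], ['e', 'f']]
```

Because each pane starts empty after a firing, no element appears in more than one pane, which corresponds to the discarding accumulation mode chosen above.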