Python RandomShuffler.RandomShuffler examples

Programming language: Python

Namespace/package name: rikai.parquet.shuffler

Class/type: RandomShuffler

Method/function: RandomShuffler

Examples on hotexamples.com: 8

Python RandomShuffler.RandomShuffler - 8 examples found. These are the top-rated real-world Python examples of rikai.parquet.shuffler.RandomShuffler.RandomShuffler, extracted from open source projects. You can rate examples to help improve their quality.

Frequently used methods

RandomShuffler(8)

append(2)

full(2)

pop(2)

Example #1

File: test_shuffler.py Project: AdaZhou/rikai

def test_randomness(self):
    shuffler = RandomShuffler(16)
    expected = list(range(100))
    actual = self.shuffle_numbers(shuffler, expected)
    self.assertEqual(100, len(actual))
    self.assertNotEqual(expected, actual)
    self.assertEqual(expected, sorted(actual))

Example #2

def test_randomness():
    shuffler = RandomShuffler(16)
    expected = list(range(100))
    actual = shuffle_numbers(shuffler, expected)
    assert len(actual) == 100
    assert expected != actual
    assert expected == sorted(actual)
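
The tests on this page call a `shuffle_numbers` helper and a `RandomShuffler` class that the snippets do not show. The following is a minimal sketch of how the two could fit together, assuming only the interface visible in the examples (`append`, `full`, `pop`, `len()`, truthiness, and an optional seed). `SketchShuffler` is a hypothetical stand-in written for illustration, not rikai's implementation, and the body of `shuffle_numbers` is an assumption modelled on the append/drain loop used elsewhere in these examples.

```python
import random
from typing import Iterable, List, Optional


class SketchShuffler:
    """Hypothetical stand-in for RandomShuffler (NOT rikai's implementation)."""

    def __init__(self, capacity: int, seed: Optional[int] = None):
        self.capacity = capacity
        self._rng = random.Random(seed)  # private RNG so a seed gives repeatable order
        self._buffer: List = []

    def __len__(self) -> int:
        return len(self._buffer)

    def __bool__(self) -> bool:
        # Truthy while at least one element is still buffered.
        return len(self._buffer) > 0

    def full(self) -> bool:
        return len(self._buffer) >= self.capacity

    def append(self, item) -> None:
        self._buffer.append(item)

    def pop(self):
        if not self._buffer:
            raise IndexError("pop from an empty shuffler")
        # Swap a randomly chosen element to the end, then remove it in O(1).
        idx = self._rng.randrange(len(self._buffer))
        self._buffer[idx], self._buffer[-1] = self._buffer[-1], self._buffer[idx]
        return self._buffer.pop()


def shuffle_numbers(shuffler, numbers: Iterable[int]) -> List[int]:
    """Assumed helper: feed numbers through the shuffler and collect the output."""
    returned = []
    for n in numbers:
        shuffler.append(n)
        # Keep the buffer at or below its capacity.
        while shuffler.full():
            returned.append(shuffler.pop())
    # Drain whatever is left once the input is exhausted.
    while shuffler:
        returned.append(shuffler.pop())
    return returned
```

Under these assumptions a capacity of 1 makes every appended element fill the buffer immediately, so it is popped right away and the input order is preserved. That is the behaviour the `test_fifo` examples check. A larger capacity lets each `pop` choose among several buffered elements, which is where the reordering comes from.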

Example #3

def __iter__(self):
    shuffler = RandomShuffler(
        self.shuffler_capacity if self.shuffle else 1, self.seed)
    group_count = 0
    for filepath in self.files:
        fs, path = FileSystem.from_uri(filepath)
        with fs.open_input_file(path) as fobj:
            parquet = pg.ParquetFile(fobj)
            for group_idx in range(parquet.num_row_groups):
                # A simple form of row-group level bucketing without memory overhead.
                # Pros:
                #  - It requires zero communication to initialize the distributed policy
                #  - It uses little memory and no startup overhead, i.e. collecting row groups.
                # Cons:
                #   The drawback would be if the world size is much larger than
                #   the average number of row groups. As a result, many of the
                #   file open operations would be wasted.
                group_count += 1
                if group_count % self.world_size != self.rank:
                    continue
                row_group = parquet.read_row_group(group_idx,
                                                   columns=self.columns)
                for batch in row_group.to_batches():  # type: RecordBatch
                    # TODO: read batches not using pandas
                    for _, row in batch.to_pandas().iterrows():
                        shuffler.append(row)
                        # Maintain the shuffler buffer around its capacity.
                        while shuffler.full():
                            yield self._convert(shuffler.pop().to_dict(),
                                                self.spark_row_metadata)
    while shuffler:
        yield self._convert(shuffler.pop().to_dict(),
                            self.spark_row_metadata)

Example #4

File: test_shuffler.py Project: AdaZhou/rikai

def test_randomness_with_large_capacity(self):
    """Test the case that the capacity is larger than total number of elements."""
    shuffler = RandomShuffler(128)
    expected = list(range(100))
    actual = self.shuffle_numbers(shuffler, expected)
    self.assertEqual(100, len(actual))
    self.assertNotEqual(expected, actual)
    self.assertEqual(expected, sorted(actual))

Example #5

def test_fifo_with_single_item():
    shuffler = RandomShuffler(capacity=1)
    shuffler.append(1)
    assert shuffler
    assert shuffler.full()
    assert len(shuffler) == 1
    assert shuffler.pop() == 1

    assert not shuffler.full()

Example #6

def test_randomness_with_large_capacity():
    """Test the case that the capacity is larger than total number
    of elements.
    """
    shuffler = RandomShuffler(128)
    expected = list(range(100))
    actual = shuffle_numbers(shuffler, expected)
    assert len(actual) == 100
    assert expected != actual
    assert expected == sorted(actual)

Example #7

File: test_shuffler.py Project: AdaZhou/rikai

def test_fifo(self):
    shuffler = RandomShuffler(capacity=1)
    returned = self.shuffle_numbers(shuffler, range(100))
    self.assertEqual(list(range(100)), returned)

Example #8

def test_fifo():
    shuffler = RandomShuffler(capacity=1)
    returned = shuffle_numbers(shuffler, range(100))
    assert len(returned) == 100