Python Context.saveAsTextFile Examples

Programming Language: Python

Namespace/Package Name: fast_pyspark_tester

Class/Type: Context

Method/Function: saveAsTextFile

Examples at hotexamples.com: 3

Python Context.saveAsTextFile: 3 examples found. These are the top-rated real-world Python examples of fast_pyspark_tester.Context.saveAsTextFile, extracted from open source projects.

Example #1

File: test_textFile.py Project: svaningelgem/fast_pyspark_tester

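import random

from fast_pyspark_tester import Context  # assumed import path, based on the package name above

# S3_TEST_PATH is expected to be defined at module level in test_textFile.py; it is not shown in this excerpt.

def test_s3_textFile_loop():
    random.seed()

    # '{:d}' accepts only integers, so truncate the random float first.
    fn = '{}/pysparkling_test_{:d}.txt'.format(S3_TEST_PATH, int(random.random() * 999999.0))

    rdd = Context().parallelize('Line {0}'.format(n) for n in range(200))
    rdd.saveAsTextFile(fn)
    rdd_check = Context().textFile(fn)

    # Round trip: the lines read back must match the lines written, in order.
    assert rdd.count() == rdd_check.count() and all(e1 == e2 for e1, e2 in zip(rdd.collect(), rdd_check.collect()))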

Example #2

File: test_textFile.py Project: svaningelgem/fast_pyspark_tester

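import random

from fast_pyspark_tester import Context  # assumed import path, based on the package name above
from fast_pyspark_tester.fileio import File  # assumed module path for File

# HDFS_TEST_PATH is expected to be defined at module level in test_textFile.py; it is not shown in this excerpt.

def test_hdfs_file_exists():
    random.seed()

    # '{:d}' accepts only integers, so truncate the random float first.
    fn1 = '{}/pysparkling_test_{:d}.txt'.format(HDFS_TEST_PATH, int(random.random() * 999999.0))
    fn2 = '{}/pysparkling_test_{:d}.txt'.format(HDFS_TEST_PATH, int(random.random() * 999999.0))

    rdd = Context().parallelize('Hello World {0}'.format(x) for x in range(10))
    rdd.saveAsTextFile(fn1)

    # Only fn1 was written, so fn2 is expected to be absent.
    assert File(fn1).exists() and not File(fn2).exists()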
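
The test file names above are built with a `{:d}` format field, which accepts only integers, while `random.random()` returns a float. The sketch below shows one way to build such a name safely. It uses only the standard library; `random_test_path` and the base path `/tmp/example_base` are hypothetical names chosen for illustration and are not part of fast_pyspark_tester.

```python
import random


def random_test_path(base):
    """Return a randomized test file path under `base` (hypothetical helper)."""
    # int() truncates the float so that the '{:d}' field receives an integer.
    suffix = int(random.random() * 999999.0)
    return '{}/pysparkling_test_{:d}.txt'.format(base, suffix)


# Passing the float directly to '{:d}' raises ValueError.
try:
    '{:d}'.format(random.random() * 999999.0)
    float_accepted = True
except ValueError:
    float_accepted = False

example_path = random_test_path('/tmp/example_base')  # hypothetical base path
print(example_path)
print('float accepted by {:d}:', float_accepted)
```

Two calls can in principle return the same name, so a test that relies on the second path not existing (as Example #2 does) carries a small chance of a collision.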

Example #3

File: test_textFile.py Project: svaningelgem/fast_pyspark_tester

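import random

from fast_pyspark_tester import Context  # assumed import path, based on the package name above

# HDFS_TEST_PATH is expected to be defined at module level in test_textFile.py; it is not shown in this excerpt.

def test_hdfs_textFile_loop():
    random.seed()

    # '{:d}' accepts only integers, so truncate the random float first.
    fn = '{}/pysparkling_test_{:d}.txt'.format(HDFS_TEST_PATH, int(random.random() * 999999.0))
    print('HDFS test file: {0}'.format(fn))

    rdd = Context().parallelize('Hello World {0}'.format(x) for x in range(10))
    rdd.saveAsTextFile(fn)
    read_rdd = Context().textFile(fn)
    print(rdd.collect())
    print(read_rdd.collect())
    # Round trip: the lines read back must match the lines written, in order.
    assert rdd.count() == read_rdd.count() and all(r1 == r2 for r1, r2 in zip(rdd.collect(), read_rdd.collect()))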
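
The round-trip assertions in these tests check the element counts as well as the pairwise comparison. The sketch below illustrates why both are needed: `zip` stops at the shorter input, so a pairwise check alone would not notice missing trailing lines. Plain lists stand in for the results of `collect()`, and `same_lines` is a hypothetical helper, not part of fast_pyspark_tester.

```python
def same_lines(expected, actual):
    """Return True if both sequences hold the same items in the same order."""
    expected = list(expected)
    actual = list(actual)
    # The length check catches truncation that zip() would silently hide.
    return len(expected) == len(actual) and all(
        e == a for e, a in zip(expected, actual)
    )


written = ['Hello World {0}'.format(x) for x in range(10)]
truncated = written[:-1]  # simulates a read that lost the last line

# Pairwise comparison alone passes, because zip() ignores the extra item.
zip_only_ok = all(e == a for e, a in zip(written, truncated))

print(zip_only_ok)
print(same_lines(written, truncated))
print(same_lines(written, list(written)))
```

For plain lists, `expected == actual` gives the same result more directly; the explicit form mirrors the assertion used in the tests.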