forked from WarrenWeckesser/textreader
dhomeier/textreader
Yet another text reader for NumPy.

The textreader module has the function readrows, which reads CSV (and
similar) text files into a numpy array.

This is experimental, unreleased software.  Use at your own risk.

To install, you will need cython.  In the top directory, run::

    $ python setup.py install

So far, readrows() has been tested with Python 2.7.2, using numpy 1.4.1
and 1.6.1.

The C function strptime is used for parsing dates.  A work-around for
this (perhaps just disabling date parsing as a quick fix) will be needed
to get textreader compiled on Windows.

The directory python/examples/ contains, well, examples.

While the primary motivation for this code is to have a faster, more
efficient tool for reading CSV files into NumPy arrays, most of the code
is written in C (in the src/ directory), and can be used independently of
Python.  Other than setup.py in the top directory, all the Python-specific
code is in the python/ directory.

Features
--------

* Fields may be delimited with spaces or with a given delimiter character.

* Fields may be quoted with a given quote character.

* The maximum number of rows to read can be specified.

  Normally readrows() reads the file twice; the first pass just counts the
  number of rows in the file.  This allows the function to allocate an
  array that is exactly the size of the data to be read.

  If the number of rows is specified by the user, the first pass is
  skipped, and the memory for the array is allocated based on the
  specified number of rows.  If fewer rows than specified are found in
  the file, the array that is returned is a view of the originally
  allocated array.  If the file has more rows than specified, the file
  pointer will be left at the start of the unread data, so readrows() (or
  anything else) can be called to continue reading the file.  (See
  examples/read_multi.py for an example of reading three arrays from one
  file.)

* Dates are parsed into datetime64 values (requires numpy version 1.6.1).
  The format of the date is specified with a string using the conventions
  of the C library function strptime():

      http://pubs.opengroup.org/onlinepubs/007904975/functions/strptime.html

  Currently the only datetime64 unit allowed is 'us' (microseconds).

* Embedded newlines are allowed in string fields.

* The decimal point for floating point numbers can be specified
  (typically this is either '.' or ',').

* The character used for exponential notation can be specified as either
  'E' or 'D' (case insensitive).  The 'D' character is used by Fortran
  when writing floating point values with double precision format, e.g.::

            write(*, 100) x, y
      100   format(1X, D14.9, 1X, D12.5)

* Complex numbers, e.g. '1.23-45.6j', can be read.  (The exact set of
  complex strings that should be parsed probably needs some work, though.)
  Currently, there must not be any spaces between the real and imaginary
  parts (regardless of the delimiter).  Also, the imaginary part must
  include at least one digit, so '1-j', for example, is invalid.  Either
  'i' or 'j' can be used for the imaginary part.
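
The row-count behaviour described in the features list can be illustrated
with a small pure-Python sketch.  It is not textreader's implementation or
API: the name read_rows_sketch and its parameters max_rows and delimiter
are hypothetical, only unquoted numeric fields are handled, and a plain
list stands in for the NumPy array that readrows() allocates.

```python
import io


def read_rows_sketch(f, max_rows=None, delimiter=","):
    """Hypothetical helper illustrating the one-pass / two-pass strategy."""
    if max_rows is None:
        # First pass: count the remaining rows, then rewind to where we began.
        start = f.tell()
        nrows = sum(1 for _ in iter(f.readline, ""))
        f.seek(start)
    else:
        # Row count supplied by the caller: the counting pass is skipped.
        nrows = max_rows

    # Allocate storage for exactly nrows rows before parsing anything.
    rows = [None] * nrows
    count = 0
    while count < nrows:
        line = f.readline()
        if not line:
            break  # the file held fewer rows than requested
        rows[count] = [float(x) for x in line.split(delimiter)]
        count += 1

    # Return only the filled part.  (A list slice copies; with a NumPy
    # array the equivalent slice would be a view of the allocation.)
    return rows[:count]


# The file position is left at the first unread row, so a second call
# simply continues where the first one stopped.
f = io.StringIO("1,2\n3,4\n5,6\n")
first = read_rows_sketch(f, max_rows=2)   # [[1.0, 2.0], [3.0, 4.0]]
rest = read_rows_sketch(f)                # [[5.0, 6.0]]
```

Reading with readline() rather than iterating over the file keeps tell()
and seek() usable on ordinary text files, which is what makes the rewind
after the counting pass possible in this sketch.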
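
To make the date feature concrete, the following sketch converts a date
string to a count of microseconds since 1970-01-01, which is the integer
that a datetime64 value with unit 'us' represents.  It uses Python's
datetime.strptime, which accepts largely the same format directives as the
C function, rather than textreader itself; the helper name
parse_date_us_sketch is hypothetical.

```python
from datetime import datetime, timedelta

# datetime64 values count from the Unix epoch.
EPOCH = datetime(1970, 1, 1)


def parse_date_us_sketch(text, fmt):
    """Hypothetical helper: parse text with a strptime-style format and
    return whole microseconds since the epoch."""
    dt = datetime.strptime(text, fmt)
    # Dividing one timedelta by another with // yields an integer.
    return (dt - EPOCH) // timedelta(microseconds=1)


one_day = parse_date_us_sketch("1970-01-02", "%Y-%m-%d")
# one_day is 86400000000, i.e. 24 * 60 * 60 * 10**6 microseconds
```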
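
The numeric conventions listed above (configurable decimal point, 'E' or
'D' exponent character, and the restrictions on complex strings) can be
sketched in pure Python as follows.  This is only an illustration of the
stated rules, not the C parser in src/: the function names and the
decimal and sci parameters are hypothetical, and corner cases the README
does not describe are handled in whatever way Python's float() and
complex() handle them.

```python
def parse_float_sketch(text, decimal=".", sci="E"):
    """Hypothetical helper: parse a float with a chosen decimal point and
    exponent character."""
    s = text.strip()
    if decimal != ".":
        if "." in s:
            raise ValueError("unexpected '.' in %r" % text)
        s = s.replace(decimal, ".")
    if sci.upper() == "D":
        # Assumption: 'D' and 'd' are both accepted (case insensitive),
        # and 'E' is still accepted alongside them.
        s = s.replace("D", "E").replace("d", "e")
    return float(s)


def parse_complex_sketch(text):
    """Hypothetical helper applying the README's complex-number rules."""
    if any(ch.isspace() for ch in text):
        raise ValueError("spaces are not allowed in %r" % text)
    s = text
    if s.endswith("i"):
        s = s[:-1] + "j"  # 'i' and 'j' are interchangeable
    if s.endswith("j"):
        # The imaginary part must contain a digit, so '1-j' is rejected.
        if len(s) < 2 or not (s[-2].isdigit() or s[-2] == "."):
            raise ValueError("imaginary part needs a digit: %r" % text)
    return complex(s)


x = parse_float_sketch("1,5", decimal=",")       # 1.5
y = parse_float_sketch("1.5D+03", sci="D")       # 1500.0
z = parse_complex_sketch("1.23-45.6i")           # (1.23-45.6j)
```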