Skip to content

dhomeier/textreader

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

30 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Yet another text reader for NumPy.

The textreader module has the function readrows, which reads CSV (and similar)
text files into a numpy array.

This is experimental, unreleased software.  Use at your own risk.

To install, you will need cython.  In the top directory, run
$ python setup.py install

So far, readrows() has been tested with Python 2.7.2, using
numpy 1.4.1 and 1.6.1.

The C function strptime is used for parsing dates.  A work-around for this
(perhaps just disabling date parsing as a quick fix) will be needed get
textreader compiled on Windows.

The directory python/examples/ contains, well, examples.

While the primary motivation for this code is to have a faster, more efficient
tool for reading CSV files into NumPy arrays, most of the code is written in
C (in the src/ directory), and can be used independent of Python.  Other than
setup.py in the top directory, all the Python-specific code is in the python/
directory.


Features
--------

* Fields may be delimited with spaces or with a given delimiter character.

* Fields may be quoted with a given quote character.

* The maximum number of rows to read can be specified.  Normally readrows()
  reads the file twice; the first pass just counts the number of rows
  in the file.  This allows the function to allocate an array that is
  exactly the size of the data to be read.  If the number of rows is
  specified by the user, the first pass is skipped, and the memory for
  the array is allocated based on the specified number of rows.

  If fewer rows than specified are found in the file, the array that is
  returned is a view of the originally allocated array.

  If the file has more rows than specified, the file pointer will be
  left at the start of the unread data, so readrows() (or anything else)
  can be called to continue reading the file.  (See examples/read_multi.py
  for an example of reading three arrays from one file.)

* Dates are parsed into datetime64 values (requires numpy version 1.6.1).
  The format of the date is specified with a string using the conventions
  of the C library function strptime():
      http://pubs.opengroup.org/onlinepubs/007904975/functions/strptime.html
  Currently the only datetime64 unit allowed is 'us' (microseconds).

* Embedded newlines are allowed in string fields.

* The decimal point for floating point numbers can be specified (typically
  this is either '.' or ',').

* The character used for exponential notation can specified as either 'E'
  or 'D' (case insensitive).  The 'D' character is used by Fortran when
  writing floating point values with double precision format, e.g.::

        write(100, x, y)
   100  format(1X, D14.9, 1X, D12.5)

* Complex numbers e.g. '1.23-45.6j' can be read.  (The exact set of
  complex strings that should be parsed probably needs some work, though.)
  Currently, there must not be any spaces between the real and imaginary
  part (regardless of the delimiter).  Also, the imaginary part must
  include at least one digit.  E.g. '1-j' is invalid.  Either 'i' or 'j'
  can be used for the imaginary part.


About

Yet another text file reader for numpy.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • C 71.4%
  • Python 28.6%