chuanwuliu/data-parser

Fixed width file data parser and generator

A data parser for fixed width files. The codebase has three main parts:

  1. dataparser.py defines the main parser class
  2. generator.py defines the methods that are used to generate the example fixed width files
  3. testcases.py includes all the test cases. The test data are in the tests directory

Run the Code

Prerequisites:

* Python 3.6
* Git

Download the repo

# Download the repo
git clone https://github.com/chuanwuliu/data-parser.git

Parse a fixed width file

  • The main data parser function is defined in dataparser.py. To convert a fixed width input_file and save the result to output_file:
    python dataparser.py input_file output_file
    For example:
    python dataparser.py tests/test_input1.txt tests/_temp_output2.csv
  • The default delimiter is a comma. You can customise the delimiter with the -d argument. For example, to use @ as the delimiter:
    python dataparser.py tests/test_input1.txt tests/_temp_output2.csv -d @
  • Full usage:
 usage: dataparser.py [-h] [-d DELIMITER] [-s SPEC_FILE] input_file output_file
 
 positional arguments:
   input_file    Path to the input (fixed width) file
   output_file   Path to save the output
 
 optional arguments:
   -h, --help    show this help message and exit
   -d DELIMITER  Delimiter for parsing the file
   -s SPEC_FILE  Path to specification (json) file
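
To illustrate the kind of conversion the command line above performs, here is a minimal sketch of slicing fixed width lines and writing them with a chosen delimiter. It is not the code in dataparser.py: the function names split_fixed_width and convert, the widths list, and the sample data are all assumptions made for this example, and the real specification file format is not shown in this README.

```python
import csv
import io


def split_fixed_width(line, widths):
    """Slice one fixed width line into fields and trim the padding spaces."""
    fields, start = [], 0
    for width in widths:
        fields.append(line[start:start + width].strip(" "))
        start += width
    return fields


def convert(lines, widths, delimiter=","):
    """Return delimited text for an iterable of fixed width lines."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    for line in lines:
        writer.writerow(split_fixed_width(line.rstrip("\n"), widths))
    return buffer.getvalue()


# Hypothetical layout: three columns of width 5, 5 and 3.
widths = [5, 5, 3]
sample = [
    "ab".ljust(5) + "12345" + "x".rjust(3),   # left and right aligned fields
    " " * 5 + "9".rjust(5) + "yz".ljust(3),   # first field is blank
]
print(convert(sample, widths, delimiter="@"))
# ab@12345@x
# @9@yz
```

Trimming with strip(" ") rather than strip() is a deliberate choice in this sketch: only plain padding spaces are removed, so other whitespace characters stay visible in the output.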

Test Cases:

Run the test cases

python testcases.py

The following cases are tested:

  • Parse input file with fully filled up fields
  • Parse input file with left aligned fields and blank fields
  • Parse input file with right aligned fields and blank fields
  • Parse input file with all blank fields

In each case, the sample input is parsed and the parser's output is compared with a manually parsed expected output.
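
The four cases listed above could be expressed with unittest roughly as follows. This is a sketch under assumptions, not the contents of testcases.py: the stand-in parse_line function, the WIDTHS layout, and the inline sample strings are hypothetical, whereas the repository's tests read their data from the tests directory.

```python
import unittest

WIDTHS = [4, 6]  # hypothetical column widths, not the repository's spec


def parse_line(line, widths=WIDTHS):
    """Stand-in parser: slice by width and trim padding spaces."""
    fields, start = [], 0
    for width in widths:
        fields.append(line[start:start + width].strip(" "))
        start += width
    return fields


class FixedWidthCases(unittest.TestCase):
    def test_fully_filled(self):
        self.assertEqual(parse_line("abcd" + "123456"), ["abcd", "123456"])

    def test_left_aligned_with_blank(self):
        self.assertEqual(parse_line("ab".ljust(4) + " " * 6), ["ab", ""])

    def test_right_aligned_with_blank(self):
        self.assertEqual(parse_line(" " * 4 + "12".rjust(6)), ["", "12"])

    def test_all_blank(self):
        self.assertEqual(parse_line(" " * 10), ["", ""])


# Run the cases without exiting the interpreter.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(FixedWidthCases)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```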

Currently, fields in the fixed width file may contain only letters, digits and plain space characters. Fields containing other whitespace characters, such as \t and \r, have not been considered or tested.
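
Because of this limitation, it can help to check an input for other whitespace before parsing it. The helper below is a hypothetical sketch and is not part of the repository; the name unsupported_whitespace is an assumption made for this example.

```python
def unsupported_whitespace(line):
    """Return the whitespace characters in `line` other than the plain space."""
    return sorted({ch for ch in line if ch.isspace() and ch != " "})


# Strip the line terminator first so that "\n" itself is not reported.
clean = "ab   12345\n".rstrip("\n")
dirty = "ab\t 12345\r\n".rstrip("\n")

print(unsupported_whitespace(clean))  # []
print(unsupported_whitespace(dirty))  # ['\t', '\r']

# str.strip(" ") removes only spaces, while str.strip() removes all whitespace.
field = "\tab \r"
print(repr(field.strip(" ")))  # '\tab \r'
print(repr(field.strip()))     # 'ab'
```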

A helper script is provided for generating some example files:

python generator.py
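
As a rough idea of how example fixed width lines can be produced, the sketch below pads random values to a set of column widths. It does not reproduce generator.py: the names make_line and random_field, the widths, and the fixed random seed are assumptions chosen for illustration.

```python
import random
import string

ALPHABET = string.ascii_letters + string.digits


def make_line(values, widths, align="left"):
    """Pad each value to its column width and join the columns into one line."""
    padded = []
    for value, width in zip(values, widths):
        text = str(value)[:width]  # truncate values that are too long
        padded.append(text.ljust(width) if align == "left" else text.rjust(width))
    return "".join(padded)


def random_field(rng, width):
    """Return a random alphanumeric string of length 0..width (0 gives a blank field)."""
    length = rng.randint(0, width)
    return "".join(rng.choice(ALPHABET) for _ in range(length))


rng = random.Random(0)  # seeded so the example files are reproducible
widths = [5, 8, 3]      # hypothetical column widths
lines = [
    make_line([random_field(rng, w) for w in widths], widths)
    for _ in range(3)
]
for line in lines:
    print(repr(line))
```

Every generated line has the same length, the sum of the column widths, which is the property a fixed width parser relies on.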

Contact:

Charles Liu: dr.liuchuanwu@gmail.com
