Utilities to help export survey data from SPSS proprietary SAV to CSV or Triple-S format

angloc/savutil

savutil - converting SPSS .sav file data to open formats

The savutil utilities convert an SPSS® .sav file to Triple-S XML metadata and data via an intermediate JSON format. Triple-S is imported by a wide range of survey data analysis software, and JSON is a convenient platform for taking SPSS data into the modern "big-data" world.

For those who want to use data from SAV files without any interest in the Triple-S or JSON formats, an option exists to convert a SAV file into a CSV file, optionally interpreting the value labels so that the file is immediately useful with, for instance, Excel pivot tables.

The sav2json component of savutil is based on the Windows DLL provided by IBM for programmed access to SAV files.

savutil comprises two programs:

  • sav2json - converts a SAV file into an intermediate JSON format
  • json2sss - converts a JSON file created by sav2json into a Triple-S data set

Installation on Windows

Download the two executables sav2json.exe and json2sss.exe from this GitHub project into a folder of your own choice, let's say <some-folder>.

To run sav2json your PATH should include the root folder of the SPSS toolkit. This may exist already if you are an SPSS Statistics user. Otherwise it is the folder into which you extract the IO_Module_for_SPSS_Statistics_xx.zip download from IBM (xx is the version number, which changes from time to time, e.g. 23; version 23 or later should be fine). The crucial subfolders are win32 and win64. The distributed executables use win32.

The toolkit is available free of charge from IBM; an SPSS licence is not required.

For your convenience the toolkit is also provided as a download on these pages, which the author believes to be in conformity with IBM's license therein (line 729 of the English text). If you use this download read and comply with the terms of the IBM license.

The sav2json program can find the toolkit without any change to your PATH if you extract the toolkit into the folder <some-folder>\spss, i.e. so that the critical folder is <some-folder>\spss\win32.
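The two ways of locating the toolkit described above can be pictured with a short Python sketch. It is an illustration only, not the code sav2json uses: the function name `find_spss_toolkit` and the rule "a folder qualifies if it has a win32 subfolder" are assumptions made for this example.

```python
import os


def find_spss_toolkit(exe_folder, path_value, arch="win32"):
    """Return the first folder that has an <arch> subfolder, or None.

    exe_folder -- folder holding sav2json.exe (<some-folder> in the text)
    path_value -- contents of the PATH environment variable
    """
    # 1. Prefer a toolkit extracted next to the executables: <some-folder>\spss
    candidates = [os.path.join(exe_folder, "spss")]
    # 2. Otherwise try every folder listed on PATH, in order
    candidates += [p for p in path_value.split(os.pathsep) if p]
    for folder in candidates:
        # Assumed test: the toolkit root contains the win32 (or win64) subfolder
        if os.path.isdir(os.path.join(folder, arch)):
            return folder
    return None
```

A caller might pass `os.environ.get("PATH", "")` as `path_value`; a result of `None` would mean the toolkit still needs to be extracted or added to PATH.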

Both programs must be executed from a command prompt.

Running sav2json

The sav2json command line

<some-folder>\sav2json [switches] <SAV-file>

where <SAV-file> is the path to and name of the SAV file to be converted.

Switches taking no value

  • The -c switch if specified forces output of a CSV file
  • The -d switch if specified includes the data in the JSON file as well as the descriptions of the variables. This is mandatory if json2sss is to be run afterwards.
  • The -h switch if specified includes a header line in the CSV file
  • The -i switch if specified causes values in the CSV file if generated to be replaced by their SPSS value labels if available
  • The -p switch if specified causes the JSON output to be "pretty-printed" for readability. By default the JSON is compact.
  • The -t switch if specified causes any descriptive text in the SAV file to be written to a text file
  • The -v switch if specified displays the sav2json version number.

Switches requiring a value

NB: values are terminated by the next space or hyphen. To include a space or hyphen in a value, enclose the switch and value in double quotes, e.g. "-tMy title"

  • The -e switch specifies the character encoding to be used in the CSV and/or TXT files if generated, by default cp-1252 (which should be fine for Windows users in almost all locales). The JSON file is always encoded in UTF-8 as this is part of the JSON specification.
  • The -o switch specifies the path and root file name for the output files. By default the output files (.json, .csv, .txt) have the same path and name as the SAV file.

sav2json Example

sav2json -dp survey

Converts the file survey.sav in the current working directory creating the file survey.json, including the data values and formatting the JSON for readability.

Practical considerations

SPSS SAV files contain variables that are either floating point or character strings. Categorical variables are represented by assigning labels to specific values - "value labels".

Semantic information about the variables comes from the output format, and this is often inappropriate, either because it was set by default or because the SAV file was generated by automatic means, for instance by the Dimensions application.

sav2json is intended to produce a file that is immediately useful rather than a literal rendering of the SAV file. To facilitate this sav2json calculates a frequency distribution for each variable, and uses this in combination with the output format to perform "duck typing" of each field.

In this way variables that only have integer values are exported as integers and excessively precise decimal places are avoided.
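As a rough illustration of this kind of "duck typing", the sketch below guesses a field type from a frequency distribution alone. It is a simplified assumption of how such a check could look, not the logic sav2json implements (which also takes the output format into account); the function name `infer_type` and the type names it returns are hypothetical.

```python
def infer_type(frequencies):
    """Guess a field type from a {value: count} frequency distribution.

    None stands for a missing value and is ignored when guessing.
    """
    values = [v for v in frequencies if v is not None]
    if not values:
        return "empty"
    # SAV variables are either character strings or floating point numbers
    if any(isinstance(v, str) for v in values):
        return "string"
    # Every observed number is whole, so the field can be exported as integers
    if all(float(v).is_integer() for v in values):
        return "integer"
    return "decimal"
```

With this rule a variable stored as 1.0, 2.0 and 3.0 would be reported as an integer field, while a single fractional value anywhere in the distribution makes it a decimal field.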

Since the object is to provide data rather than a report, the multifarious representations of time and date information are all normalised to ISO 8601 format.

Durations (DTIME variables) are rendered as ISO 8601 durations.
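The sketch below shows one way such a normalisation could be written in Python. SPSS stores date and time values as a number of seconds counted from midnight on 14 October 1582; the function names and the exact output layout (for instance always writing the days, hours, minutes and seconds parts of a duration) are assumptions for illustration and may differ from what sav2json writes.

```python
from datetime import datetime, timedelta

# SPSS date/time values count seconds from this instant
SPSS_EPOCH = datetime(1582, 10, 14)


def spss_datetime_to_iso(seconds):
    """Render an SPSS date/time value as an ISO 8601 timestamp."""
    return (SPSS_EPOCH + timedelta(seconds=seconds)).isoformat()


def spss_duration_to_iso(seconds):
    """Render a non-negative duration in seconds as an ISO 8601 duration."""
    total = int(round(seconds))
    days, rest = divmod(total, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, secs = divmod(rest, 60)
    return "P%dDT%dH%dM%dS" % (days, hours, minutes, secs)
```

Negative durations are deliberately left out of this sketch, since how they should be written is a separate design decision.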

The frequency distributions are stored in the JSON file, along with metadata such as variable titles and the value labels. The JSON file thus provides metadata to supplement the column headings in the CSV file.

sav2json can also include the case data as well as metadata in the JSON file, creating a single file with the same information as the SAV file but in a much more accessible form, i.e. no library is required to use it.

The case data are stored in the JSON file in a transposed form, i.e. there is one array for each variable containing the values for each case. To reduce space and time requirements repeated values in consecutive cases are compressed.
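Because the JSON layout is not documented, the following Python sketch only illustrates the two ideas in general terms: turning case rows into one array per variable, and collapsing runs of repeated consecutive values. The function names and the `[value, count]` pair representation are assumptions, not the encoding sav2json uses.

```python
def transpose(cases):
    """Turn a list of case rows into one list of values per variable."""
    return [list(column) for column in zip(*cases)]


def compress_runs(values):
    """Collapse consecutive repeats into [value, count] pairs."""
    runs = []
    for value in values:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1          # same value as the previous case
        else:
            runs.append([value, 1])   # a new run starts here
    return runs


def expand_runs(runs):
    """Inverse of compress_runs: rebuild the full list of values."""
    return [value for value, count in runs for _ in range(count)]
```

Run-length compression of this kind pays off on transposed survey data because neighbouring cases often share the same value for a variable, for instance a wave number or a region code.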

The JSON file format is not further documented at this time.

Running json2sss

The json2sss command line

<some-folder>\json2sss [switches] <JSON-file>

where <JSON-file> is the path to and name of a JSON file created by sav2json.

Switches taking no value

  • The -c switch specifies that a CSV data file should be created in Triple-S format. The name of the file will be `_sss.csv`. By default json2sss creates a fixed-length record `_sss.asc`.
  • The -s switch disables "sensible string lengths". Sensible string lengths are applied by default and are useful for data sets coming from Quancept®, which may have extremely long string variable lengths. By default each long string variable is reduced to a length no greater than the next power of 2 at least equal to the maximum size found in the file for that variable.
  • The -v switch forces display of the version number of json2sss
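The "sensible string lengths" rule for the -s switch can be written down as a small Python sketch. The function names are hypothetical and the code is only a reading of the rule as stated above, not an extract from json2sss.

```python
def next_power_of_two(n):
    """Smallest power of 2 that is at least n (1 for n <= 1)."""
    power = 1
    while power < n:
        power *= 2
    return power


def sensible_length(declared_length, max_found):
    """Length to export for a string variable.

    declared_length -- width declared for the variable in the SAV file
    max_found       -- longest value actually found in the data
    """
    # Never widen a variable; only shrink it to the power-of-2 ceiling
    return min(declared_length, next_power_of_two(max_found))
```

For example, a variable declared with a width of 32767 whose longest value has 37 characters would be exported with a width of 64 under this reading.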

Switches taking a value

  • The -e switch specifies the character encoding to be used in the CSV or ASCII text files if generated. By default `cp-1252` (which should be fine for Windows users in almost all locales).
  • The -h switch is used to specify an href attribute for the <record> element. By default json2sss includes an href which is a relative reference to the .asc file. To exclude the href altogether use
    "-h "
  • The -i switch is used to specify the ident attribute of the `record` element of the XML file. By default this is 'A', i.e. the default is
    -iA
  • The -t switch is used to specify the contents of the <name> and <title> elements in the XML. Separate the name and title by a semicolon. If there is no semicolon the whole value is used as the title. There is no default name or title. E.g.
    "-tOmnibus201401;January 2014 Omnibus survey"

    Name is 'Omnibus201401' and title is 'January 2014 Omnibus survey'

    "-tJanuary 2014 Omnibus survey"

    No name, and title is 'January 2014 Omnibus survey'

  • The -x switch can be used to specify the value of the 'user' element in the XML file. The value should not contain a semicolon (';') character. Note that switches whose values contain spaces should be enclosed in double quotes in the command line as shown above. The user element does not appear by default.

Triple-S considerations

json2sss exports a Triple-S XML version 2.0 file, though in most cases it will be conformant to the version 1.2 specification. Version 2.0 allows greater freedom to export zero and literal string valued codes, which do occur quite frequently in SAV files.

Metadata

The contents of the <user> element may be controlled by the -x parameter as described above. Otherwise:

  • The <date> element shows the date of the json2sss run
  • The <time> element shows the time of day of the json2sss run
  • The <origin> element contains
    json2sss {version} (Windows) by Computable Functions (http://www.computable-functions.com)

Missing values

A .sav file may declare certain values for a variable to be missing values. Triple-S represents missing values with a blank field, so json2sss outputs all missing values as blanks (they are `null` values in the JSON file). The codes for missing values are not included in the XML file, since these codes never appear in the exported data.

Numeric ranges

The SAV file does not provide information about valid value ranges. json2sss uses the information in the frequency distribution for each variable to infer a sensible range and precision.
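A minimal Python sketch of such an inference is shown below. It assumes the frequency distribution is available as a `{value: count}` mapping with `None` for missing values; the function name `infer_range` and the choice to return the bounds as strings with a common number of decimal places are assumptions for illustration, not json2sss's actual code.

```python
from decimal import Decimal


def infer_range(frequencies):
    """Return (low, high) as strings sharing one precision, or None."""
    # Go through str() so that 2.25 becomes Decimal('2.25'), not a long binary tail
    values = [Decimal(str(v)) for v in frequencies if v is not None]
    if not values:
        return None
    # Decimal places needed by the most precise value that was observed
    places = max(max(0, -v.normalize().as_tuple().exponent) for v in values)
    low = "{0:.{1}f}".format(min(values), places)
    high = "{0:.{1}f}".format(max(values), places)
    return low, high
```

Formatting both bounds with the same number of decimal places keeps the range consistent even when only one of the extreme values has a fractional part.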

Anomalous code values

SPSS allows variables with labelled values to be incompletely coded, and to have negative code values.

This is not compatible with the requirements for categorical variables in Triple-S.

Therefore variables with a value list are only exported as categorical, i.e. single, if either:

  • all values in the value list are non-negative integers. Such variables are exported as singles in numeric format. If there are (non-negative) values in the data that do not appear in the value list, json2sss will provide a range element as well as the individual value elements.

  • there are values in the value list not compatible with numeric format but all the values in the data can be found in the value list. Such variables are exported in literal format.

Variables that do not meet either of these criteria are converted as Triple-S quantity variables.
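The decision rule above can be summarised in a short Python sketch. It is a literal reading of the two criteria, with hypothetical function names and return values; json2sss may treat edge cases (for instance negative data values alongside a non-negative value list) differently.

```python
def is_code(value):
    """True for a non-negative whole number such as 0, 3 or 7.0."""
    return (isinstance(value, (int, float))
            and value >= 0
            and float(value).is_integer())


def classify(value_labels, data_values):
    """Return (triple_s_type, needs_range) for a variable with a value list.

    value_labels -- {value: label} taken from the SAV file (must not be empty)
    data_values  -- values observed in the data; None means missing
    """
    if not value_labels:
        raise ValueError("the rule only applies to variables with a value list")
    labelled = set(value_labels)
    observed = set(v for v in data_values if v is not None)
    if all(is_code(v) for v in labelled):
        # Numeric single; unlabelled data values call for a range element too
        return "single (numeric)", bool(observed - labelled)
    if observed <= labelled:
        # Some labelled values are not valid codes, but the data is fully labelled
        return "single (literal)", False
    return "quantity", False
```

For instance, a yes/no variable coded 1 and 2 would come out as a numeric single, while a variable with a labelled value of -1 and unlabelled values in the data would fall through to quantity.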

CSV output

If the -c switch is used a .CSV file is generated as specified in the Triple-S standard.

Known issues

Truncated Unicode characters

It seems that in some circumstances an SPSS character data field of fixed length will be truncated in the middle of a multi-byte Unicode character.

It is not known whether this is a problem in the SPSSIO DLL, in some programs creating SAV files or elsewhere.

When detected, json2sss removes such characters silently.
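The effect can be reproduced and worked around in a few lines of Python. This is a sketch under the assumption that the field holds UTF-8 bytes; the function name is hypothetical and the approach is not necessarily the one json2sss takes.

```python
def drop_truncated_characters(raw):
    """Decode UTF-8 bytes, silently discarding bytes that do not form a character."""
    # errors="ignore" drops the dangling lead byte left by a mid-character cut
    # (it would equally drop invalid bytes elsewhere in the field)
    return raw.decode("utf-8", errors="ignore")


# "caf\u00e9" is 5 bytes in UTF-8; a 4-byte fixed-length field cuts the last character in half
field = "caf\u00e9".encode("utf-8")[:4]
cleaned = drop_truncated_characters(field)
```

Here `cleaned` contains only the three intact characters, whereas decoding `field` strictly would raise a `UnicodeDecodeError`.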

Release history

Version 0.1.2: September 2015

First version.

Version 0.1.3: October 14, 2015

Maintenance release

Enhancement

  • Caching introduced to speed up sav2json.

Bug fixes

  • Minimum and maximum values correctly calculated in sav2json where field has mixture of integer and fractional values
  • Number of decimal places consistent between minimum and maximum values in range output by json2sss

Version 0.1.4: October 15, 2015

Maintenance release

Bug fix

  • Error fixed in writing out descriptive text information

Version 0.1.5: October 22, 2015

Maintenance release

Enhancement

  • Temporary CSV file created to mitigate memory usage on large files

Using the source (for Python developers)

savutil was developed with Python 2.7.

Testing

To run savutil in the Python interpreter, run the scripts sav2json.py and json2sss.py.

Building

  1. Download the sources into a convenient folder.
  2. Building the Windows executable requires the py2exe library.
  3. Review the script setup.bat for its suitability on your system.
  4. If incorporating the SPSS DLLs, extract the toolkit into a subfolder `spss`.
  5. Execute setup.bat to create `sav2json.exe` and `json2sss.exe` in a subfolder `.\output`. The SPSS files if found will be copied to the subfolder `output\spss`.

Credits

The software is based on the Python wrapper for the SPSSIO DLL created and maintained by Albert-Jan Roskam (http://code.activestate.com/recipes/577811-python-reader-writer-for-spss-sav-files/?in=user-4177640). A slightly modified version is used here as the file savdllwrapper.py.

There is a newer version of the wrapper that bundles the DLLs (https://pypi.python.org/pypi/savReaderWriter/), and savutil may be adapted in due course to use it.

Disclaimer

The savutil software is provided 'as is' without warranty of any kind, either express or implied, including, but not limited to, the implied warranties of fitness for a purpose, or the warranty of non-infringement.

Without limiting the foregoing, the author makes no warranty that:

  • the software will meet your requirements
  • the software will be uninterrupted, timely, secure or error-free
  • the results that may be obtained from the use of the software will be effective, accurate or reliable
  • the quality of the software will meet your expectations
  • any errors in the software will be corrected.

The savutil software and its documentation:

  • could include technical or other mistakes, inaccuracies or typographical errors
  • may be changed without notice
  • may be out of date, and the author makes no commitment to update these materials.

The author assumes no responsibility for errors or omissions in the savutil software or its documentation.

In no event shall the author be liable to you or any third parties for any special, punitive, incidental, indirect or consequential damages of any kind, or any damages whatsoever, including, without limitation, those resulting from loss of use, data or profits, whether or not the author has been advised of the possibility of such damages, and on any theory of liability, arising out of or in connection with the use of this software.

The use of the savutil software is made at your own discretion and risk and with agreement that you will be solely responsible for any damage to your computer system or loss of data that results from such activities.

No advice or information, whether oral or written, obtained by you from the author or from the documentation shall create any warranty for the software.
