paulscherrerinstitute/data_api_python

Overview

This package gives access to the data in the PSI archiving systems. It can be used to download channel data for a given time range. For historical reasons there are currently separate modules for the different API versions that these archiving systems expose. In the future these modules will be merged and the old versions will be removed.

Short overview of some modules (see below for details):

Module data_api.client returns data as a Pandas data frame. This is the current way to access the DataBuffer. It works with the current databuffer server at https://data-api.psi.ch but has problems with duplicate timestamps, stray NaN values and inefficient transfers.

Module data_api3.h5 saves data as HDF5. This is the current way to access the ImageBuffer. It is only available with the imagebuffer and with a pre-release service for the databuffer within the machine network. This will become the recommended usage for the databuffer as well.

Installation

Install via Anaconda/Miniconda:

conda config --prepend channels paulscherrerinstitute
conda install data_api

Usage from commandline with current https://data-api.psi.ch

data_api save --filename output.h5 --from_time 2020-10-08T19:30:00Z --to_time 2020-10-08T19:31:00Z --channels SARES11-LSCP10-FNS:CH0:VAL_GET,SARES11-LSCP10-FNS:CH3:VAL_GET

Usage from commandline with /api/1 service

This newer service is currently in testing.

api3 --baseurl https://data-api.psi.ch/api/1 --default-backend sf-databuffer save output.h5 2020-10-08T19:30:00.123Z 2020-10-08T19:33:00.789Z SINLH01-DBAM010:EOM1_T1

Usage as library with /api/1 service

sf-databuffer

import data_api3.h5
query = {
  "channels": ["SINLH01-DBAM010:EOM1_T1"],
  "range": {
    "startDate": "2023-02-03T03:09:00Z",
    "endDate": "2023-02-03T03:09:02Z",
  },
}
data_api3.h5.request(query, baseurl="https://data-api.psi.ch/api/1", filename="output.h5", default_backend="sf-databuffer")

sf-imagebuffer

import data_api3.h5
query = {
  "channels": ["SOME-CAMERA:FPICTURE"],
  "range": {
    "startDate": "2020-10-08T19:30:00Z",
    "endDate": "2020-10-08T19:31:00Z",
  },
}
data_api3.h5.request(query, baseurl="http://sf-daq-5.psi.ch:8380/api/1", filename="output.h5", default_backend="sf-imagebuffer")
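data_api3.h5.request writes the result directly to the given HDF5 file. To check what ended up in the file you can list its contents with h5py (not part of this package); this is only a small sketch, since the internal group layout is not documented here and may differ between backends:

import h5py

# List every group and dataset contained in the downloaded file.
# The exact layout depends on the service that produced it.
with h5py.File("output.h5", "r") as f:
    f.visititems(lambda name, obj: print(name, obj))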

Usage as library with default service

import data_api as api

Search for channels:

channels = api.search("SINSB02-RIQM-DCP10:FOR-PHASE")

The channels variable will hold something like this:

[{'backend': 'sf-databuffer',
  'channels': ['SINSB02-RIQM-DCP10:FOR-PHASE',
   'SINSB02-RIQM-DCP10:FOR-PHASE-AVG']},
 {'backend': 'sf-archiverappliance',
  'channels': ['SINSB02-RIQM-DCP10:FOR-PHASE-AVG-P2P',
   'SINSB02-RIQM-DCP10:FOR-PHASE-JIT-P2P',
   'SINSB02-RIQM-DCP10:FOR-PHASE-STDEV']}]
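The result is a plain list of dictionaries, one per backend, so it can be post-processed with ordinary Python. As a small sketch based on the structure shown above, the matching channels can be flattened into backend-prefixed names (the form also accepted by backend-specific queries, see below):

# Flatten the search result into "backend/channel" strings.
prefixed = [f"{entry['backend']}/{name}"
            for entry in channels
            for name in entry['channels']]
print(prefixed)  # e.g. ['sf-databuffer/SINSB02-RIQM-DCP10:FOR-PHASE', ...]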

Get data by global timestamp:

import datetime
now = datetime.datetime.now()
end = now-datetime.timedelta(minutes=1)
start = end-datetime.timedelta(seconds=10)
data = api.get_data(channels=['SINSB02-RIQM-DCP10:FOR-PHASE'], start=start, end=end)

To query a specific backend, specify the base_url option in the get_data call. For example, for HIPA use api.get_data(..., base_url='https://data-api.psi.ch/hipa').
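As a sketch, the same call with an explicit base_url (the channel name below is a placeholder, not a real HIPA channel):

# Query the HIPA backend explicitly; replace the placeholder channel name.
data = api.get_data(channels=['SOME-HIPA-CHANNEL'],
                    start=start, end=end,
                    base_url='https://data-api.psi.ch/hipa')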

Get data by pulseId:

import datetime
start_pulse_id = 123456
stop_pulse_id = 234567
data = api.get_data(channels=['SINSB02-RIQM-DCP10:FOR-PHASE'], start=start_pulse_id, end=stop_pulse_id, range_type="pulseId")

Get approximate pulseId by global timestamp:

Warning: This will not give you an exact pulse_id, just the closest pulse_id in the data buffer to the global timestamp you requested. The pulse id might be off by up to 30 seconds.

from datetime import datetime
global_timestamp = datetime.now()

# If you do not pass a global_timestamp, the current time will be used.
pulse_id = api.get_pulse_id_from_timestamp(global_timestamp)
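Combining this with the pulseId query shown above, a sketch for fetching roughly a 10 second window ending one minute in the past by pulse id (keep the warning about the approximation in mind):

from datetime import datetime, timedelta

# Look up approximate pulse ids for the window boundaries, then query by pulseId range.
end_pulse = api.get_pulse_id_from_timestamp(datetime.now() - timedelta(minutes=1))
start_pulse = api.get_pulse_id_from_timestamp(datetime.now() - timedelta(minutes=1, seconds=10))
data = api.get_data(channels=['SINSB02-RIQM-DCP10:FOR-PHASE'],
                    start=start_pulse, end=end_pulse, range_type="pulseId")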

Show head of datatable:

data.head()

Find all data corresponding to a given index:

data.loc["1468476300.047550981"]

Plot data:

import matplotlib.pyplot as plt
data.plot.scatter("SINSB02-RIQM-DCP10:FOR-PHASE-AVG", "SINSB02-RKLY-DCP10:FOR-PHASE-AVG")
plt.show()


import matplotlib.pyplot as plt
data[['SINSB02-RIQM-DCP10:FOR-PHASE-AVG', ]].plot.box()
plt.show()


Plot waveforms:

plt.plot(data['SINSB02-RIQM-DCP10:FOR-PHASE']['1468476300.237551000'])
plt.show()


Find where you have data:

data[data['SINSB02-RIQM-DCP10:FOR-PHASE'].notnull()]

Save data:

# to csv
data.to_csv("test.csv")

# to hdf5
data.to_hdf("test.h5", "/dataset")

Use Server-Side Aggregation

To minimize data transfer requirements, data can be requested from the API in aggregated form. The server then takes care of aggregating the values and only sends the aggregated values to the client.

import data_api as api
import datetime

now = datetime.datetime.now()
end = now - datetime.timedelta(minutes=1)
start = end - datetime.timedelta(seconds=10)

channel_list = ['SINSB02-RIQM-DCP10:FOR-PHASE']

# Just set the parameters you explicitly want to set - this example shows the defaults.
# For more details about the parameters and their effect see
# https://git.psi.ch/sf_daq/ch.psi.daq.queryrest#data-aggregation
aggregation = api.Aggregation(aggregation_type="value",
                              aggregations=["min", "mean", "max"],
                              extrema=None,
                              nr_of_bins=None,
                              duration_per_bin=None,
                              pulses_per_bin=None)

data = api.get_data(channel_list, start=start, end=end, aggregation=aggregation)

For more details on the aggregation values and their effects see: https://git.psi.ch/sf_daq/ch.psi.daq.queryrest#data-aggregation
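As a further sketch, if you only want a fixed number of bins you can set just nr_of_bins and leave the other parameters at their defaults (see the linked documentation for the exact server-side behaviour):

# Request the data reduced to 100 bins over the queried range.
aggregation = api.Aggregation(nr_of_bins=100)
data = api.get_data(channel_list, start=start, end=end, aggregation=aggregation)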

Query Specific Backend

By default the data API first queries the DataBuffer for the channel; if the channel is not found there, it then queries the Epics Archiver.

If you want to explicitly specify which backend/system a channel should be queried from, prepend the channel name with either sf-databuffer/ or sf-archiverappliance/:

"sf-databuffer/CHAN1"
# or
"sf-archiverappliance/CHAN1"

Query For PulseId Global Timestamp Mapping

To find the corresponding global timestamp of a given pulse id, this method can be used:

import data_api as api

api.get_global_date(pulseid)

# Query for multiple pulseids mappings
api.get_global_date([pulseid1, pulseid2])

The method accepts a single pulse id or a list of pulse ids and returns a list of global dates for the specified pulse ids. By default the method uses the beam-ok channel (SIN-CVME-TIFGUN-EVR0:BUNCH-1-OK) to do the mapping. If the mapping cannot be done, the method raises a ValueException. In that case a different mapping channel can be specified via the function's optional mapping_channel parameter.
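A sketch of passing the mapping channel explicitly (the channel shown is simply the documented default; substitute another suitable channel if the default has no data for your pulse ids):

# Explicitly choose the channel used for the pulse id -> timestamp mapping.
dates = api.get_global_date([pulseid1, pulseid2],
                            mapping_channel="SIN-CVME-TIFGUN-EVR0:BUNCH-1-OK")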

Command Line Interface

The package's functionality is also provided by a command line tool. On the command line, data can be retrieved as follows:

$ data_api -h
usage: data_api [-h] [--regex REGEX] [--from_time FROM_TIME]
                [--to_time TO_TIME] [--from_pulse FROM_PULSE]
                [--to_pulse TO_PULSE] [--channels CHANNELS]
                [--filename FILENAME] [--overwrite] [--split SPLIT] [--print]
                [--binary]
                action

Command line interface for the Data API

positional arguments:
  action                Action to be performed. Possibilities: search, save

optional arguments:
  -h, --help            show this help message and exit
  --regex REGEX         String to be searched
  --from_time FROM_TIME
                        Start time for the data query
  --to_time TO_TIME     End time for the data query
  --from_pulse FROM_PULSE
                        Start pulseId for the data query
  --to_pulse TO_PULSE   End pulseId for the data query
  --channels CHANNELS   Channels to be queried, comma-separated list
  --filename FILENAME   Name of the output file
  --overwrite           Overwrite the output file
  --split SPLIT         Number of pulses or duration (ISO8601) per file
  --print               Prints out the downloaded data. Output can be cut.
  --binary              Download as binary

To export data to an HDF5 file the command line tool can be used as follows:

data_api --from_time "2017-10-30 10:59:45.788" --to_time "2017-10-30 11:00:45.788" --channels S10CB01-RLOD100-PUP10:SIG-AMPLT-AVG --filename testit.h5  save

To improve download speeds, use the --binary option when saving data into an HDF5 file.

Downloads can be quite big, and unless you use the --binary option the current implementation needs to keep all data in memory before writing it. In that case use the --split option to split the output into several data files. With this option specified, the query is split into several smaller queries.

For a pulse-based query this argument takes an integer (number of pulses per file); for a time-based query it takes an ISO8601 duration string. Please note that year and month durations are not supported!

Pulse based query:

data_api --from_pulse 5166875100 --to_pulse 5166876100 --channels sf-databuffer/SINEG01-RCIR-PUP10:SIG-AMPLT --split 500 --filename testit.h5 save

Time based query:

data_api --from_time "2018-04-05 09:00:00.000" --to_time "2018-04-05 10:00:00.000" --channels sf-databuffer/SINEG01-RCIR-PUP10:SIG-AMPLT --split PT30M --filename testit.h5 save

Example durations:

  • PT2M - 2 minutes
  • PT1H2M - 1 hour and 2 minutes
  • PT10S - 10 seconds
  • P1W - 1 week
  • P1DT6H - one day and 6 hours

Examples

Jupyter Notebook

If you want to run our Jupyter Notebook examples, please clone this repository locally, then:

cd examples
ipython notebook

About

Releases: new releases are published on conda, or you can of course use this git repository directly and check out a version tag.
