paulscherrerinstitute/data_api_python

Overview

This package gives access to the data in the PSI archiving systems. It can be used to download channel data for a given time range. For historical reasons there are currently separate modules for the different API versions that these archiving systems expose. In the future these modules will be merged and the old versions will be removed.

Short overview of some modules (see below for details):

Module data_api.client returns data as a Pandas data frame. This is the current way to access the DataBuffer. It works with the current databuffer server at https://data-api.psi.ch but has problems with duplicate timestamps, stray NaN values and inefficient transfers.

Module data_api3.h5 saves data as HDF5. This is the current way to access the ImageBuffer. It is only available with the imagebuffer and with a pre-release service for the databuffer within the machine network. This will become the recommended usage for the databuffer as well.

Installation

Install via Anaconda/Miniconda:

conda config --prepend channels paulscherrerinstitute
conda install data_api

Usage from commandline with current https://data-api.psi.ch

data_api save --filename output.h5 --from_time 2020-10-08T19:30:00Z --to_time 2020-10-08T19:31:00Z --channels SARES11-LSCP10-FNS:CH0:VAL_GET,SARES11-LSCP10-FNS:CH3:VAL_GET

Usage from commandline with /api/1 service

This newer service is currently in testing.

api3 --baseurl https://data-api.psi.ch/api/1 --default-backend sf-databuffer save output.h5 2020-10-08T19:30:00.123Z 2020-10-08T19:33:00.789Z SINLH01-DBAM010:EOM1_T1

Usage as library with /api/1 service

sf-databuffer

import data_api3.h5
query = {
  "channels": ["SINLH01-DBAM010:EOM1_T1"],
  "range": {
    "startDate": "2023-02-03T03:09:00Z",
    "endDate": "2023-02-03T03:09:02Z",
  },
}
data_api3.h5.request(query, baseurl="https://data-api.psi.ch/api/1", filename="output.h5", default_backend="sf-databuffer")

sf-imagebuffer

import data_api3.h5
query = {
  "channels": ["SOME-CAMERA:FPICTURE"],
  "range": {
    "startDate": "2020-10-08T19:30:00Z",
    "endDate": "2020-10-08T19:31:00Z",
  },
}
data_api3.h5.request(query, baseurl="http://sf-daq-5.psi.ch:8380/api/1", filename="output.h5", default_backend="sf-imagebuffer")
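data_api3.h5.request writes the result directly to the given HDF5 file. To check what ended up in the file you can list its contents with h5py (not part of this package); this is only a small sketch, since the internal group layout is not documented here and may differ between backends:

import h5py

# List every group and dataset contained in the downloaded file.
# The exact layout depends on the service that produced it.
with h5py.File("output.h5", "r") as f:
    f.visititems(lambda name, obj: print(name, obj))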

Usage as library with default service

import data_api as api

Search for channels:

channels = api.search("SINSB02-RIQM-DCP10:FOR-PHASE")

The channels variable will hold something like this:

[{'backend': 'sf-databuffer',
  'channels': ['SINSB02-RIQM-DCP10:FOR-PHASE',
   'SINSB02-RIQM-DCP10:FOR-PHASE-AVG']},
 {'backend': 'sf-archiverappliance',
  'channels': ['SINSB02-RIQM-DCP10:FOR-PHASE-AVG-P2P',
   'SINSB02-RIQM-DCP10:FOR-PHASE-JIT-P2P',
   'SINSB02-RIQM-DCP10:FOR-PHASE-STDEV']}]
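The result is a plain list of dictionaries, one per backend, so it can be post-processed with ordinary Python. As a small sketch based on the structure shown above, the matching channels can be flattened into backend-prefixed names (the form also accepted by backend-specific queries, see below):

# Flatten the search result into "backend/channel" strings.
prefixed = [f"{entry['backend']}/{name}"
            for entry in channels
            for name in entry['channels']]
print(prefixed)  # e.g. ['sf-databuffer/SINSB02-RIQM-DCP10:FOR-PHASE', ...]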

Get data by global timestamp:

import datetime
now = datetime.datetime.now()
end = now-datetime.timedelta(minutes=1)
start = end-datetime.timedelta(seconds=10)
data = api.get_data(channels=['SINSB02-RIQM-DCP10:FOR-PHASE'], start=start, end=end)

To query a specific backend, specify the base_url option in the get_data call. For example, for HIPA use api.get_data(..., base_url='https://data-api.psi.ch/hipa').
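As a sketch, the same call with an explicit base_url (the channel name below is a placeholder, not a real HIPA channel):

# Query the HIPA backend explicitly; replace the placeholder channel name.
data = api.get_data(channels=['SOME-HIPA-CHANNEL'],
                    start=start, end=end,
                    base_url='https://data-api.psi.ch/hipa')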

Get data by pulseId:

import datetime
start_pulse_id = 123456
stop_pulse_id = 234567
data = api.get_data(channels=['SINSB02-RIQM-DCP10:FOR-PHASE'], start=start_pulse_id, end=stop_pulse_id, range_type="pulseId")

Get approximate pulseId by global timestamp:

Warning: This will not give you an exact pulse_id, just the closest pulse_id in the data buffer to the global timestamp you requested. The pulse id might be off by up to 30 seconds.

from datetime import datetime
global_timestamp = datetime.now()

# If you do not pass a global_timestamp, the current time will be used.
pulse_id = api.get_pulse_id_from_timestamp(global_timestamp)
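Combining this with the pulseId query shown above, a sketch for fetching roughly a 10 second window ending one minute in the past by pulse id (keep the warning about the approximation in mind):

from datetime import datetime, timedelta

# Look up approximate pulse ids for the window boundaries, then query by pulseId range.
end_pulse = api.get_pulse_id_from_timestamp(datetime.now() - timedelta(minutes=1))
start_pulse = api.get_pulse_id_from_timestamp(datetime.now() - timedelta(minutes=1, seconds=10))
data = api.get_data(channels=['SINSB02-RIQM-DCP10:FOR-PHASE'],
                    start=start_pulse, end=end_pulse, range_type="pulseId")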

Show head of datatable:

data.head()

Find all data corresponding to a given index:

data.loc["1468476300.047550981"]

Plot data:

import matplotlib.pyplot as plt
data.plot.scatter("SINSB02-RIQM-DCP10:FOR-PHASE-AVG", "SINSB02-RKLY-DCP10:FOR-PHASE-AVG")
plt.show()


import matplotlib.pyplot as plt
data[['SINSB02-RIQM-DCP10:FOR-PHASE-AVG', ]].plot.box()
plt.show()


Plot waveforms:

plt.plot(data['SINSB02-RIQM-DCP10:FOR-PHASE']['1468476300.237551000'])
plt.show()


Find where you have data:

data[data['SINSB02-RIQM-DCP10:FOR-PHASE'].notnull()]

Save data:

# to csv
data.to_csv("test.csv")

# to hdf5
data.to_hdf("test.h5", "/dataset")

Use Server-Side Aggregation

To minimize data transfer requirements, data can be requested from the API in aggregated form. The server then takes care of aggregating the values and only sends the aggregated values to the client.

import data_api as api
import datetime

now = datetime.datetime.now()
end = now - datetime.timedelta(minutes=1)
start = end - datetime.timedelta(seconds=10)

channel_list = ['SINSB02-RIQM-DCP10:FOR-PHASE']

# Just set the parameters you explicitly want to set - this example shows the defaults.
# For more details about the parameters and their effect see
# https://git.psi.ch/sf_daq/ch.psi.daq.queryrest#data-aggregation
aggregation = api.Aggregation(aggregation_type="value",
                              aggregations=["min", "mean", "max"],
                              extrema=None,
                              nr_of_bins=None,
                              duration_per_bin=None,
                              pulses_per_bin=None)

data = api.get_data(channel_list, start=start, end=end, aggregation=aggregation)

For more details on the aggregation values and their effects see: https://git.psi.ch/sf_daq/ch.psi.daq.queryrest#data-aggregation
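As a further sketch, if you only want a fixed number of bins you can set just nr_of_bins and leave the other parameters at their defaults (see the linked documentation for the exact server-side behaviour):

# Request the data reduced to 100 bins over the queried range.
aggregation = api.Aggregation(nr_of_bins=100)
data = api.get_data(channel_list, start=start, end=end, aggregation=aggregation)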

Query Specific Backend

By default the data API first queries the DataBuffer for the channel; if the channel is not found there, it then queries the Epics Archiver.

If you want to explicitly specify which backend/system a channel should be queried from, prepend the channel name with either sf-databuffer/ or sf-archiverappliance/:

"sf-databuffer/CHAN1"
# or
"sf-archiverappliance/CHAN1"

Query For PulseId Global Timestamp Mapping

To find the corresponding global timestamp of a given pulse id, this method can be used:

import data_api as api

api.get_global_date(pulseid)

# Query for multiple pulseids mappings
api.get_global_date([pulseid1, pulseid2])

The method accepts a single pulse id or a list of pulse ids and returns a list of global dates for the specified pulse ids. By default the method uses the beam-ok channel (SIN-CVME-TIFGUN-EVR0:BUNCH-1-OK) to do the mapping. If the mapping cannot be done, the method raises a ValueException. In that case a different mapping channel can be specified via the function's optional mapping_channel parameter.
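A sketch of passing the mapping channel explicitly (the channel shown is simply the documented default; substitute another suitable channel if the default has no data for your pulse ids):

# Explicitly choose the channel used for the pulse id -> timestamp mapping.
dates = api.get_global_date([pulseid1, pulseid2],
                            mapping_channel="SIN-CVME-TIFGUN-EVR0:BUNCH-1-OK")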

Command Line Interface

The package's functionality is also provided by a command line tool. On the command line, data can be retrieved as follows:

$ data_api -h
usage: data_api [-h] [--regex REGEX] [--from_time FROM_TIME]
                [--to_time TO_TIME] [--from_pulse FROM_PULSE]
                [--to_pulse TO_PULSE] [--channels CHANNELS]
                [--filename FILENAME] [--overwrite] [--split SPLIT] [--print]
                [--binary]
                action

Command line interface for the Data API

positional arguments:
  action                Action to be performed. Possibilities: search, save

optional arguments:
  -h, --help            show this help message and exit
  --regex REGEX         String to be searched
  --from_time FROM_TIME
                        Start time for the data query
  --to_time TO_TIME     End time for the data query
  --from_pulse FROM_PULSE
                        Start pulseId for the data query
  --to_pulse TO_PULSE   End pulseId for the data query
  --channels CHANNELS   Channels to be queried, comma-separated list
  --filename FILENAME   Name of the output file
  --overwrite           Overwrite the output file
  --split SPLIT         Number of pulses or duration (ISO8601) per file
  --print               Prints out the downloaded data. Output can be cut.
  --binary              Download as binary

To export data to an HDF5 file the command line tool can be used as follows:

data_api --from_time "2017-10-30 10:59:45.788" --to_time "2017-10-30 11:00:45.788" --channels S10CB01-RLOD100-PUP10:SIG-AMPLT-AVG --filename testit.h5  save

To improve download speeds, use the --binary option when saving data into an HDF5 file.

Downloads can be quite big, and unless you use the --binary option the current implementation needs to keep all data in memory before writing it. In that case use the --split option to split the output into several data files. With this option specified, the query is split into several smaller queries.

For a pulse-based query this argument takes an integer (number of pulses per file); for a time-based query it takes an ISO8601 duration string. Please note that year and month durations are not supported!

Pulse based query:

data_api --from_pulse 5166875100 --to_pulse 5166876100 --channels sf-databuffer/SINEG01-RCIR-PUP10:SIG-AMPLT --split 500 --filename testit.h5 save

Time based query:

data_api --from_time "2018-04-05 09:00:00.000" --to_time "2018-04-05 10:00:00.000" --channels sf-databuffer/SINEG01-RCIR-PUP10:SIG-AMPLT --split PT30M --filename testit.h5 save

Example durations:

  • PT2M - 2 minutes
  • PT1H2M - 1 hour and 2 minutes
  • PT10S - 10 seconds
  • P1W - 1 week
  • P1DT6H - one day and 6 hours

Examples

Jupyter Notebook

If you want to run our Jupyter Notebook examples, please clone this repository locally, then:

cd examples
ipython notebook

About

Releases: new releases are published on conda, or you can of course use this git repository directly and check out a version tag.
