Skip to content

vshulyak/ts-eval

Repository files navigation

ts-eval Time Series analysis and evaluation tools

pypi Build Status codecov python3 Code style: black License: MIT Contributions welcome


A set of tools to make time series analysis easier.

🧩 Current features

  • N-step ahead time series evaluation – using a Jupyter widget.
  • Friedman / Nemenyi rank test (posthoc) – to see which model statistically performs better.
  • Relative Metrics – rMSE, rMAE + Forecasted Value analogues.
  • Prediction Interval Metrics – MIS, rMIS, FVrMIS
  • Fixed fourier series generation – fixed in time according to pandas index
  • Naive/Seasonal models for baseline predictions (with prediction intervals)
  • Statsmodels n-step evaluation – helper functions to evaluate n-step ahead forecasts using Statsmodels models or naive/seasonal naive models.

👩🏾‍🎨 Widget Preview

In:

TSMetrics(target, sm_seas, default)
.use_reference(snaive)
.for_horizons(0, 1, 5, 23)
.for_time_slices(time_slices.all, time_slices.weekend)
.with_description()
.with_prediction_rankings(mtx.FVrMSE, mtx.FVrMIS)
.with_predictions_plot()
.show()

Out: Demo Screenshot

👩🏾‍🚒 Demo

For a more elaborate example, please check out the Demo Notebook.

Alternatively, check out interactive Binder demo

🤦🏾‍ Motivation

While working on a long term time series analysis project, I had a need to summarize and store performance metrics of different models and compare them. As it's daunting to do this across dozens of notebooks, I huddled over some code to do what I want in a few lines of code.

👩🏾‍🚀 Installation

  pip install ts-eval

📋 Release Planning:

  • Release 0.3
    • remove collection of deps in style [tests_and_bla_bla] to [tests,bla]
    • links to papers – AvgRelMAE (Davydenko and Fildes, 2013); link to Nemenyi paper / implementations
    • make graphs with PIs more narrow on 0,1,.. steps as there's too much space left (with an option to turn this off).
    • better API for the end user – minimize interaction with xarray
    • pep517 build / wheels / better setup.py as per Hynek
    • travis: add 3.8 default python when it's available
    • docs: supported metrics & API options
    • Maybe use api like Summary in statsmodels MLEModel class, it has extend methods and warn/info messages
    • pretty legend for lots like here https://studywolf.wordpress.com/2017/11/21/matplotlib-legends-for-mean-and-confidence-interval-plots/
    • Look for TODOs
    • changable colors
    • turn off colored display option
    • a nicer API for raw metrics container
    • codacy badge
    • boxplots to compare models (as an alternative)
    • violin plots to compare predictions – areas can be colored, different metrics on left and right (like relative...)
  • Release 0.4
    • holiday/fourier features model
    • fix viz module to have less of important stuff
    • a gif with project visualization
    • check shapes of input arrays (target vs preds), now it doesn't raise an error
    • Baseline prediction using target dataset (without explicit calculation, but losing some time points)

💡 Ideas

  • components
    • Graph: Visualize outliers from confidence interval
    • Multi-comparison component: scikit_posthocs lib or homecooked?
    • inspect true confidence interval coverage via sampling (was done in postings around bayesian dropout sampling)
    • xarrays: compare if compared datasets are actually equal (offets by dates, shapes, maybe even hashing)
    • bin together step performance, like steps 0-1, 2-5, 6-12, 13-24
    • highlight regions using a mask (holidays, etc.)
    • option to view interactively points using widget (plotly)?
    • diagnostics: bias to over / underestimate points
    • animated graphs for change in seasonality
  • features
    • example notebook for fourier?
    • tests for fourier
    • nint generation
  • utils:
    • model adaptor (for different models, generic) which generates 3d prediction dataset. For stastmodels using dyn forecast or kalman filter
    • future importance calculator, but only if I can manipulate input features
    • feature selection using PACF / prewhiten?
  • project
  • sMAPE & MASE can be added for the jupyter evaluation tables
  • ? Residual stats: since I have residuals => Ljung-Box, Heteroscedasticity test, Jarque-Bera – like in statsmodels results, but probably these stats were inspected already by the user... and on which step should they be computed then?

See also

🤹🏼‍♂️ Development

Recommended development workflow:

pipenv install -e .[dev]
pipenv shell

The library doesn't use Flit/Poetry, so the suggested workflow is based on Pipenv (as per pypa/pipenv#1911). Pipfile* are ignored in the .gitignore.