skaters

Fast online univariate time series. Zero dependencies. Runs in Pyodide.

distributional · O(1) per observation · pure Python · browser-compatible · composable

Install with pip install skaters. Every prediction is a Dist, a weighted Gaussian mixture, rather than a point estimate. Transforms compose, ensembles nest, and the whole stack runs in the browser via WebAssembly.

Quick start

from skaters import skater

f = skater(k=3)
state = None
for y in observations:
    dists, state = f(y, state)
    dists[0].mean              # point forecast
    dists[0].std               # uncertainty
    dists[0].quantile(0.975)   # 97.5th percentile (upper end of a central 95% interval)
    dists[0].logpdf(y)         # log-likelihood
    dists[0].cdf(y)            # CDF at y

Every skater returns list[Dist] — one weighted mixture per horizon $h = 1, \ldots, k$. Point forecasts, uncertainty, density evaluation, and quantiles are all facets of the same object.
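To make the "facets of the same object" idea concrete, here is a minimal sketch of a weighted Gaussian mixture with the accessors used in the quick start. MixtureSketch and its fields are hypothetical names for illustration only; this is not the library's Dist implementation, and it assumes the weights sum to one.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class MixtureSketch:
    """Hypothetical stand-in for a Dist: a weighted mixture of Gaussians."""
    weights: tuple  # assumed non-negative and summing to 1
    means: tuple
    stds: tuple     # assumed strictly positive

    def _parts(self):
        return zip(self.weights, self.means, self.stds)

    @property
    def mean(self):
        return sum(w * m for w, m, _ in self._parts())

    @property
    def std(self):
        # Mixture variance = E[y^2] - (E[y])^2, using per-component second moments.
        second = sum(w * (s * s + m * m) for w, m, s in self._parts())
        return math.sqrt(max(second - self.mean ** 2, 0.0))

    def cdf(self, y):
        return sum(
            w * 0.5 * (1.0 + math.erf((y - m) / (s * math.sqrt(2.0))))
            for w, m, s in self._parts()
        )

    def logpdf(self, y):
        # Note: raises ValueError if the density underflows to zero far in the tails.
        density = sum(
            w * math.exp(-0.5 * ((y - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
            for w, m, s in self._parts()
        )
        return math.log(density)

    def quantile(self, p, iterations=100):
        # A mixture has no closed-form quantile, so bisect on the CDF.
        lo = min(m - 10.0 * s for _, m, s in self._parts())
        hi = max(m + 10.0 * s for _, m, s in self._parts())
        for _ in range(iterations):
            mid = 0.5 * (lo + hi)
            if self.cdf(mid) < p:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)


bimodal = MixtureSketch(weights=(0.5, 0.5), means=(-1.0, 1.0), stds=(1.0, 1.0))
point_forecast, uncertainty = bimodal.mean, bimodal.std
```

The point forecast and the uncertainty are both read off the same mixture, which is why nothing separate needs to be returned for each.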

Named search policies

Every named function builds a Bayesian ensemble over the same full candidate population. The names represent different search strategies — different priors, learning rates, and complexity penalties — not different models.

from skaters import holt, hosking, laplace, samuelson, wald, dantzig

f = holt(k=1)       # expect trends (Holt 1957)
f = hosking(k=1)    # expect long memory (Hosking 1981)
f = laplace(k=1)    # no opinion — let the data decide
f = samuelson(k=1)  # there's a drift, find it carefully (Samuelson 1965)
f = wald(k=1)       # minimax caution (Wald)
f = dantzig(k=1)    # optimize under compute constraints (Dantzig 1947)

| Policy | After | Prior | $\eta$ | $\lambda$ | Best for |
|---|---|---|---|---|---|
| holt | Holt 1957 | Differencing + Holt linear | 0.50 | 0.02 | Trending data |
| hosking | Hosking 1981 | Fractional differencing | 0.50 | 0.01 | Long memory |
| laplace | Laplace | Uniform | 0.80 | 0.005 | General purpose (default) |
| samuelson | Samuelson 1965 | Drift + Holt | 0.40 | 0.01 | Persistent drift (GDP, prices) |
| wald | Wald | Depth 0 | 0.15 | 0.08 | Adversarial, non-stationary |
| dantzig | Dantzig 1947 | Adaptive search | 0.30 | 0.01 | Adaptive (grows pool online) |

Architecture: transforms all the way down

Every model in skaters is a chain of bijective transforms with a distributional leaf at the bottom:

$y \;\xrightarrow{T_1}\; y' \;\xrightarrow{T_2}\; y'' \;\xrightarrow{\cdots}\; \text{leaf} \;\rightarrow\; \hat{D}$

The leaf estimates $\hat{D} = \mathcal{N}(0, \hat\sigma^2)$ from residuals via Welford's algorithm. Predictions in the original space come from inverting the transform chain: $\hat{D}_{\text{original}} = T_1^{-1}\bigl(T_2^{-1}\bigl(\cdots(\hat{D})\bigr)\bigr).$

Every node returns a list[Dist]. There is no separate “point forecast” vs “uncertainty” — both are aspects of the same $\hat{D}$. An EMA doesn't “predict”; it strips off a running level $\ell_t$, leaving simpler residuals $\varepsilon_t = y_t - \ell_t$, and the prediction is whatever the leaf's estimate becomes when run back through the inverse.
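The EMA-plus-leaf picture can be sketched in a few lines. This is an illustration under stated assumptions, not the library's code: ema_leaf_sketch is a hypothetical name, the residual is measured against the level held before the update, and the leaf variance is the Welford sample variance of those residuals.

```python
import math


def ema_leaf_sketch(alpha=0.1):
    """Return a step(y) function giving (mean, std) of a one-step Gaussian forecast."""
    level = None
    n, resid_mean, m2 = 0, 0.0, 0.0  # Welford accumulators for the residuals

    def step(y):
        nonlocal level, n, resid_mean, m2
        if level is None:
            level = y  # first observation initialises the level
        else:
            resid = y - level  # forward transform: strip off the running level
            n += 1
            delta = resid - resid_mean
            resid_mean += delta / n
            m2 += delta * (resid - resid_mean)
            level += alpha * resid  # EMA update of the level
        var = m2 / (n - 1) if n > 1 else 0.0
        # Inverse transform: the leaf's N(0, var) is shifted back by the level.
        return level, math.sqrt(var)

    return step


step = ema_leaf_sketch(alpha=0.5)
for y in [0.0, 2.0, 0.0]:
    mean, std = step(y)
```

Each observation costs a constant amount of work and no history is stored, which is the O(1) property the package advertises.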

Transforms

| Transform | Forward | Use case |
|---|---|---|
| ema_transform(α) | $y'_t = y_t - \ell_t$ | Remove level |
| difference() | $y'_t = y_t - y_{t-1}$ | Random walk |
| drift(α, λ) | $y'_t = \Delta y_t - \hat\mu_t$ | Random walk + drift |
| holt_linear(α, β) | $y'_t = y_t - (\ell_t + b_t)$ | Level + trend (Holt 1957) |
| ar(p) | $y'_t = y_t - \sum \hat\phi_j y_{t-j}$ | Autoregression (online RLS) |
| fractional_difference(d) | $y'_t = (1-B)^d y_t$ | Long memory |
| standardize(α) | $y'_t = (y_t - \hat\mu_t)/\hat\sigma_t$ | Remove scale |
| garch(ω, α, β) | $y'_t = y_t / \hat\sigma_t$ | Volatility clustering |
| seasonal_difference(s) | $y'_t = y_t - y_{t-s}$ | Periodicity |
| power_transform(p) | $y'_t = \mathrm{sgn}(y_t)\lvert y_t \rvert^p$ | Tail compression |
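Of the forward maps above, $(1-B)^d$ is the least obvious to compute. A minimal sketch, assuming the standard binomial expansion truncated to a fixed number of lags: the weights follow the recursion $w_0 = 1$, $w_j = w_{j-1}(j - 1 - d)/j$. The function names and the n_terms truncation are illustrative assumptions, not the package's API.

```python
def frac_diff_weights(d, n_terms):
    """Coefficients of (1 - B)^d on y_t, y_{t-1}, ..., truncated to n_terms."""
    weights = [1.0]
    for j in range(1, n_terms):
        weights.append(weights[-1] * (j - 1 - d) / j)
    return weights


def frac_diff(series, d, n_terms=50):
    """Apply the truncated filter; early points simply use the lags available."""
    w = frac_diff_weights(d, n_terms)
    return [
        sum(w[j] * series[t - j] for j in range(min(t + 1, n_terms)))
        for t in range(len(series))
    ]


half = frac_diff_weights(0.5, 4)          # slowly decaying weights: long memory
ordinary = frac_diff([1, 3, 6, 10], d=1)  # d = 1 recovers ordinary differencing
```

With d = 1 every weight beyond the first two vanishes, so the filter reduces to difference(); fractional d keeps a long, slowly decaying tail of lags.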

Conjugation

Transforms compose via conjugation. Given a transform $T$ and a skater $f$, $f_{\text{conj}}(y) = T^{-1}\bigl(f\bigl(T(y)\bigr)\bigr).$

from skaters import conjugate, ema, difference, standardize

# diff removes trend, then EMA predicts the differenced series
f = conjugate(ema(alpha=0.1, k=3), difference(), k=3)

# Chain: standardize, then difference, then EMA
f = conjugate(
    conjugate(ema(alpha=0.1, k=3), difference(), k=3),
    standardize(),
    k=3,
)
# canonical name: std|diff|ema_t|leaf
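What conjugation does to a forecast can be shown without the library. The sketch below is hypothetical throughout: it represents each horizon as a (mean, variance) pair rather than a Dist, uses a running-mean inner skater, and assumes forecast differences are independent across horizons so their variances add when the differencing is inverted.

```python
def mean_skater(k):
    """Hypothetical inner skater: running mean and variance, repeated for k horizons."""
    def f(y, state):
        n, mean, m2 = state or (0, 0.0, 0.0)
        n += 1
        delta = y - mean
        mean += delta / n
        m2 += delta * (y - mean)
        var = m2 / (n - 1) if n > 1 else 0.0
        return [(mean, var)] * k, (n, mean, m2)
    return f


def conjugate_with_difference(inner, k):
    """Forecast in differenced space, then invert by cumulating from the last value."""
    def f(y, state):
        prev, inner_state = state or (None, None)
        if prev is None:
            # No difference is available yet, so persist the first observation.
            return [(y, 0.0)] * k, (y, None)
        preds, inner_state = inner(y - prev, inner_state)  # T(y)
        out, mean, var = [], y, 0.0
        for step_mean, step_var in preds:                  # T^{-1}
            mean += step_mean
            var += step_var  # assumes independent steps
            out.append((mean, var))
        return out, (y, inner_state)
    return f


f = conjugate_with_difference(mean_skater(k=2), k=2)
state = None
for y in [1.0, 2.0, 3.0, 4.0]:
    forecasts, state = f(y, state)
```

On this linear ramp the inner skater only ever sees a constant difference of 1, so the conjugated forecasts continue the ramp.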

Ensembles

Precision-weighted (MSE)

Each model is weighted by $w_i \propto 1/\text{MSE}_i$, where $\text{MSE}_i = \text{bias}_i^2 + \text{variance}_i$.

from skaters import precision_weighted_ensemble, ema

f = precision_weighted_ensemble([
    ema(alpha=0.05, k=1),
    ema(alpha=0.2, k=1),
], k=1)
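The weighting rule itself is short. As a rough sketch, assuming each model's past forecast errors are available as a list (the library tracks these online; precision_weights and the small floor on the MSE are illustrative choices):

```python
def mse(errors):
    return sum(e * e for e in errors) / len(errors)


def precision_weights(errors_by_model, floor=1e-12):
    """Normalised weights proportional to 1 / MSE for each model."""
    inverse = [1.0 / max(mse(errs), floor) for errs in errors_by_model]
    total = sum(inverse)
    return [w / total for w in inverse]


def bias_and_variance(errors):
    """Decompose MSE into bias^2 + variance (population variance of the errors)."""
    bias = sum(errors) / len(errors)
    variance = sum((e - bias) ** 2 for e in errors) / len(errors)
    return bias, variance


weights = precision_weights([[1.0, -1.0], [2.0, -2.0]])
bias, variance = bias_and_variance([1.0, 3.0])
```

A model with four times the MSE receives a quarter of the unnormalised weight, so a biased model is penalised just as a noisy one is.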

Bayesian (log-likelihood with XGBoost-inspired regularisation)

Each model accumulates a log-weight updated at every observation: $\log w_i \mathrel{+}= \eta \cdot \log p_i(y_t) - \lambda \cdot d_i,$ where $\eta$ is the learning rate (shrinkage), $\lambda$ is the complexity penalty, and $d_i$ is the model's depth. Predictions are combined via Dist.combine with softmax weights.

from skaters import bayesian_ensemble, ema

f = bayesian_ensemble(
    [ema(alpha=0.05, k=1), ema(alpha=0.2, k=1)],
    k=1,
    learning_rate=0.5,       # eta: prevents over-concentrating
    complexity_penalty=0.02, # lambda: penalises deeper chains
    depths=[1, 1],
)
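One step of the log-weight update and the softmax that turns log-weights into mixing weights can be sketched as follows. The function names are hypothetical, and the max-subtraction in the softmax is a standard numerical-stability step rather than something the source specifies.

```python
import math


def update_log_weights(log_w, log_liks, depths, eta=0.5, lam=0.02):
    """log w_i += eta * log p_i(y_t) - lam * d_i, applied to every model."""
    return [lw + eta * ll - lam * d for lw, ll, d in zip(log_w, log_liks, depths)]


def softmax(log_w):
    """Normalise log-weights to weights that sum to one."""
    top = max(log_w)  # subtract the max so exp() cannot overflow
    exps = [math.exp(lw - top) for lw in log_w]
    total = sum(exps)
    return [e / total for e in exps]


log_w = update_log_weights([0.0, 0.0], log_liks=[-1.0, -3.0], depths=[1, 1])
weights = softmax(log_w)
```

With $\eta < 1$ the gap in log-likelihood is shrunk before it reaches the weights, so a single surprising observation moves the ensemble less than a plain Bayesian update would.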

Adaptive search

Beam search over the transform grammar. Grows the candidate population online: expand top performers with new transforms, replay recent history to warm-start, prune losers.

from skaters import search

f = search(
    k=1,
    expand_interval=100,   # expand top performers every 100 obs
    max_depth=3,           # max transform chain depth
    replay_buffer=500,     # warm-start new candidates on recent history
    max_pool=30,
)
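The expand-and-prune cycle can be sketched on its own. Everything here is an assumption made for illustration: chains are tuples of transform names, score stands in for replaying recent history through a new candidate, and top_n is a hypothetical beam width. The library's actual search may differ in each of these.

```python
def expand_and_prune(pool, grammar, score, top_n=2, max_depth=3, max_pool=30):
    """One search step: grow the best chains by one transform, then keep the best."""
    pool = dict(pool)  # leave the caller's pool untouched
    ranked = sorted(pool, key=pool.get, reverse=True)
    for chain in ranked[:top_n]:
        if len(chain) >= max_depth:
            continue  # do not grow chains beyond the depth limit
        for transform in grammar:
            child = chain + (transform,)
            if child not in pool:
                pool[child] = score(child)  # warm-start score, e.g. from a replay
    keep = sorted(pool, key=pool.get, reverse=True)[:max_pool]
    return {chain: pool[chain] for chain in keep}


pool = {("ema",): -1.0, ("diff",): -2.0}
new_pool = expand_and_prune(
    pool,
    grammar=["std", "diff"],
    score=lambda chain: -0.5 * len(chain),  # toy score standing in for a replay
    top_n=1,
    max_pool=3,
)
```

Here only the best chain is expanded, its two children are scored, and the weakest of the four candidates is pruned to respect max_pool.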

Design

Lineage

This package distills ideas from timemachines, which provided a common skater interface for dozens of time series packages. skaters is a from-scratch rewrite focused on speed, distributional predictions, and browser compatibility.

Get the source

github.com/microprediction/skaters · pip install skaters · Examples