skaters
Fast online univariate time series. Zero dependencies. Runs in Pyodide.
pip install skaters
Every prediction is a Dist (a weighted Gaussian mixture), not a point estimate. Transforms compose, ensembles nest, and the whole
stack runs in the browser via WebAssembly.
Quick start
from skaters import skater
f = skater(k=3)
state = None
for y in observations:
    dists, state = f(y, state)
    dists[0].mean             # point forecast
    dists[0].std              # uncertainty
    dists[0].quantile(0.975)  # 97.5th percentile (upper end of a 95% interval)
    dists[0].logpdf(y)        # log-likelihood
    dists[0].cdf(y)           # CDF at y
Every skater returns list[Dist] — one weighted mixture per horizon
$h = 1, \ldots, k$. Point forecasts, uncertainty, density evaluation, and quantiles are all
facets of the same object.
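To make the "facets of the same object" idea concrete, here is a minimal pure-Python sketch of a weighted Gaussian mixture. The class name `MixtureSketch` and its constructor are hypothetical and are not the skaters `Dist` API; the sketch only assumes that a mixture is described by weights, means, and standard deviations.

```python
import math


class MixtureSketch:
    """Hypothetical stand-in for a weighted Gaussian mixture (not skaters' Dist)."""

    def __init__(self, weights, means, stds):
        total = sum(weights)
        self.w = [w / total for w in weights]  # normalise weights to sum to 1
        self.mu = list(means)
        self.sd = list(stds)

    @property
    def mean(self):
        return sum(w * m for w, m in zip(self.w, self.mu))

    @property
    def std(self):
        # Law of total variance: within-component plus between-component spread.
        m = self.mean
        var = sum(w * (s * s + (mu - m) ** 2)
                  for w, mu, s in zip(self.w, self.mu, self.sd))
        return math.sqrt(var)

    def cdf(self, y):
        return sum(w * 0.5 * (1.0 + math.erf((y - mu) / (s * math.sqrt(2.0))))
                   for w, mu, s in zip(self.w, self.mu, self.sd))

    def logpdf(self, y):
        dens = sum(w * math.exp(-0.5 * ((y - mu) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
                   for w, mu, s in zip(self.w, self.mu, self.sd))
        return math.log(dens)

    def quantile(self, p, tol=1e-9):
        # A mixture CDF has no closed-form inverse, so bisect on the CDF.
        lo = min(mu - 10.0 * s for mu, s in zip(self.mu, self.sd))
        hi = max(mu + 10.0 * s for mu, s in zip(self.mu, self.sd))
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if self.cdf(mid) < p:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)


d = MixtureSketch(weights=[0.7, 0.3], means=[0.0, 2.0], stds=[1.0, 0.5])
interval = (d.quantile(0.025), d.quantile(0.975))  # central 95% interval
```

The mean, spread, density, and quantiles all come from the same three lists, which is why no separate point-forecast object is needed.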
Named search policies
Every named function builds a Bayesian ensemble over the same full candidate population. The names represent different search strategies — different priors, learning rates, and complexity penalties — not different models.
from skaters import holt, hosking, laplace, samuelson, wald, dantzig
f = holt(k=1) # expect trends (Holt 1957)
f = hosking(k=1) # expect long memory (Hosking 1981)
f = laplace(k=1) # no opinion — let the data decide
f = samuelson(k=1) # there's a drift, find it carefully (Samuelson 1965)
f = wald(k=1) # minimax caution (Wald)
f = dantzig(k=1) # optimize under compute constraints (Dantzig 1947)
| Policy | After | Prior | $\eta$ | $\lambda$ | Best for |
|---|---|---|---|---|---|
| holt | Holt 1957 | Differencing + Holt linear | 0.50 | 0.02 | Trending data |
| hosking | Hosking 1981 | Fractional differencing | 0.50 | 0.01 | Long memory |
| laplace | Laplace | Uniform | 0.80 | 0.005 | General purpose (default) |
| samuelson | Samuelson 1965 | Drift + Holt | 0.40 | 0.01 | Persistent drift (GDP, prices) |
| wald | Wald | Depth 0 | 0.15 | 0.08 | Adversarial, non-stationary |
| dantzig | Dantzig 1947 | Adaptive search | 0.30 | 0.01 | Adaptive (grows pool online) |
Architecture: transforms all the way down
Every model in skaters is a chain of bijective transforms with a distributional
leaf at the bottom:
$y \;\xrightarrow{T_1}\; y' \;\xrightarrow{T_2}\; y'' \;\xrightarrow{\cdots}\; \text{leaf} \;\rightarrow\; \hat{D}$
The leaf estimates $\hat{D} = \mathcal{N}(0, \hat\sigma^2)$ from residuals via Welford's algorithm. Predictions in the original space come from inverting the transform chain: $\hat{D}_{\text{original}} = T_1^{-1}\bigl(T_2^{-1}\bigl(\cdots(\hat{D})\bigr)\bigr).$
Every node returns a list[Dist]. There is no separate “point
forecast” vs “uncertainty” — both are aspects of the same
$\hat{D}$. An EMA doesn't “predict”; it strips off a running level
$\ell_t$, leaving simpler residuals $\varepsilon_t = y_t - \ell_t$, and the prediction is
whatever the leaf's estimate becomes when run back through the inverse.
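The EMA-plus-leaf picture above can be sketched in a few lines. This is an illustration under stated assumptions, not the package's implementation: `WelfordLeaf` and `ema_forecast` are hypothetical names, the level is initialised at the first observation, and the leaf uses the sample variance of the residuals.

```python
import math


class WelfordLeaf:
    """Running variance of residuals via Welford's algorithm (hypothetical helper)."""

    def __init__(self):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def std(self):
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


def ema_forecast(observations, alpha=0.1):
    """Return (mean, std) of a one-step forecast in the original space."""
    leaf = WelfordLeaf()
    level = None
    for y in observations:
        if level is None:
            level = y              # assumption: start the level at the first value
            continue
        residual = y - level       # forward transform: strip the running level
        leaf.update(residual)      # the leaf only ever sees residuals
        level += alpha * residual  # update the level after computing the residual
    # Inverse transform: shift the leaf's zero-centred Gaussian by the level.
    return level, leaf.std


mean, std = ema_forecast([1.0, 2.0, 3.0, 4.0], alpha=0.5)
```

The forecast mean is just the current level, and the forecast spread is whatever the leaf learned about the residuals.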
Transforms
| Transform | Forward | Use case |
|---|---|---|
| ema_transform(α) | $y'_t = y_t - \ell_t$ | Remove level |
| difference() | $y'_t = y_t - y_{t-1}$ | Random walk |
| drift(α, λ) | $y'_t = \Delta y_t - \hat\mu_t$ | Random walk + drift |
| holt_linear(α, β) | $y'_t = y_t - (\ell_t + b_t)$ | Level + trend (Holt 1957) |
| ar(p) | $y'_t = y_t - \sum \hat\phi_j y_{t-j}$ | Autoregression (online RLS) |
| fractional_difference(d) | $y'_t = (1-B)^d y_t$ | Long memory |
| standardize(α) | $y'_t = (y_t - \hat\mu_t)/\hat\sigma_t$ | Remove scale |
| garch(ω, α, β) | $y'_t = y_t / \hat\sigma_t$ | Volatility clustering |
| seasonal_difference(s) | $y'_t = y_t - y_{t-s}$ | Periodicity |
| power_transform(p) | $y'_t = \mathrm{sgn}(y_t)\,\lvert y_t \rvert^p$ | Tail compression |
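The fractional differencing row is the least obvious of these, so here is a small sketch of how $(1-B)^d$ expands into lag weights. The function names and the truncation length `n` are assumptions made for illustration; the package may compute this differently.

```python
def frac_diff_weights(d, n):
    """First n coefficients of (1 - B)^d from the binomial recursion.

    w_0 = 1 and w_k = w_{k-1} * (k - 1 - d) / k.
    """
    w = [1.0]
    for k in range(1, n):
        w.append(w[-1] * (k - 1 - d) / k)
    return w


def frac_diff(series, d, n=50):
    """Apply the truncated filter; early points use only the lags available."""
    w = frac_diff_weights(d, n)
    out = []
    for t in range(len(series)):
        out.append(sum(w[j] * series[t - j] for j in range(min(n, t + 1))))
    return out


half = frac_diff_weights(0.5, 4)           # slowly decaying weights: long memory
ordinary = frac_diff([1, 3, 6, 10], d=1)   # d = 1 recovers plain differencing
```

With $d = 1$ the weights collapse to $(1, -1, 0, \ldots)$, which is the `difference()` row; fractional $d$ keeps a long tail of small weights instead.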
Conjugation
Transforms compose via conjugation. Given a transform $T$ and a skater $f$, $f_{\text{conj}}(y) = T^{-1}\bigl(f\bigl(T(y)\bigr)\bigr).$
from skaters import conjugate, ema, difference, standardize
# diff removes trend, then EMA predicts the differenced series
f = conjugate(ema(alpha=0.1, k=3), difference(), k=3)
# Chain: standardize, then difference, then EMA
f = conjugate(
    conjugate(ema(alpha=0.1, k=3), difference(), k=3),
    standardize(),
    k=3,
)
# canonical name: std|diff|ema_t|leaf
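As a rough illustration of what conjugation does, the sketch below wraps a toy forecaster in a differencing transform. It handles point forecasts only and uses hypothetical helpers (`running_mean`, `conjugate_with_difference`); it mirrors the `f(y, state)` calling pattern from the quick start but is not the package's `conjugate`.

```python
def running_mean(y, state):
    """Toy inner skater: forecasts the mean of everything it has seen."""
    n, total = state or (0, 0.0)
    n, total = n + 1, total + y
    return total / n, (n, total)


def conjugate_with_difference(inner):
    """Wrap `inner` so it sees y_t - y_{t-1}; map its forecast back afterwards."""
    def wrapped(y, state):
        prev, inner_state = state or (None, None)
        if prev is None:
            # No difference exists yet, so fall back to a flat forecast.
            return y, (y, inner_state)
        pred_diff, inner_state = inner(y - prev, inner_state)  # forward: T(y)
        return y + pred_diff, (y, inner_state)                 # inverse: T^{-1}
    return wrapped


f_sketch = conjugate_with_difference(running_mean)
state = None
for y in [1.0, 3.0, 5.0, 7.0]:
    pred, state = f_sketch(y, state)
```

The inner forecaster never sees the trend: it learns that differences average 2, and the inverse step adds that back onto the last observation.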
Ensembles
Precision-weighted (MSE)
Each model is weighted by $w_i \propto 1/\text{MSE}_i$, where $\text{MSE}_i = \text{bias}_i^2 + \text{variance}_i$.
from skaters import precision_weighted_ensemble, ema
f = precision_weighted_ensemble([
    ema(alpha=0.05, k=1),
    ema(alpha=0.2, k=1),
], k=1)
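The weighting rule is easy to check by hand. The helper below is a hypothetical sketch of $w_i \propto 1/\text{MSE}_i$ with normalisation; the small `eps` guard against division by zero is an assumption, not something the package documents.

```python
def precision_weights(mses, eps=1e-12):
    """Normalised inverse-MSE weights (illustrative helper)."""
    inv = [1.0 / (m + eps) for m in mses]
    total = sum(inv)
    return [v / total for v in inv]


weights = precision_weights([0.5, 2.0])            # roughly [0.8, 0.2]
forecasts = [1.0, 2.0]
combined = sum(w * f for w, f in zip(weights, forecasts))
```

A model with a quarter of the error gets four times the weight, so the combined forecast sits much closer to the more accurate model.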
Bayesian (log-likelihood with XGBoost-inspired regularisation)
Each model accumulates a log-weight updated at every observation:
$\log w_i \mathrel{+}= \eta \cdot \log p_i(y_t) - \lambda \cdot d_i,$
where $\eta$ is the learning rate (shrinkage), $\lambda$ is the complexity penalty, and
$d_i$ is the model's depth. Predictions are combined via
Dist.combine with softmax weights.
from skaters import bayesian_ensemble, ema
f = bayesian_ensemble(
    [ema(alpha=0.05, k=1), ema(alpha=0.2, k=1)],
    k=1,
    learning_rate=0.5,        # eta: prevents over-concentrating
    complexity_penalty=0.02,  # lambda: penalises deeper chains
    depths=[1, 1],
)
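The update rule can be traced with two fixed Gaussian forecasters. This is a sketch of the formula above only; the function names are hypothetical, and the actual combination step in the package goes through `Dist.combine`.

```python
import math


def normal_logpdf(y, mu, sigma):
    return -0.5 * math.log(2.0 * math.pi * sigma * sigma) - 0.5 * ((y - mu) / sigma) ** 2


def update_log_weights(log_w, log_liks, depths, eta=0.5, lam=0.02):
    """One step of log w_i += eta * log p_i(y_t) - lam * d_i."""
    return [lw + eta * ll - lam * d for lw, ll, d in zip(log_w, log_liks, depths)]


def softmax(log_w):
    m = max(log_w)  # subtract the max for numerical stability
    exps = [math.exp(lw - m) for lw in log_w]
    total = sum(exps)
    return [e / total for e in exps]


# Model 0 forecasts N(0, 1), model 1 forecasts N(3, 1); the data sit near 0.
log_w = [0.0, 0.0]
for y in [0.1, -0.3, 0.2]:
    log_liks = [normal_logpdf(y, 0.0, 1.0), normal_logpdf(y, 3.0, 1.0)]
    log_w = update_log_weights(log_w, log_liks, depths=[1, 1])
weights = softmax(log_w)
```

With $\eta < 1$ the evidence is discounted, so the weights concentrate more slowly than a full Bayesian update would; with equal likelihoods, the $\lambda \cdot d_i$ term tilts the weights toward shallower chains.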
Adaptive search
Beam search over the transform grammar. Grows the candidate population online: expand top performers with new transforms, replay recent history to warm-start, prune losers.
from skaters import search
f = search(
    k=1,
    expand_interval=100,  # expand top performers every 100 obs
    max_depth=3,          # max transform chain depth
    replay_buffer=500,    # warm-start new candidates on recent history
    max_pool=30,
)
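One expand-and-prune round might look roughly like the sketch below. Everything here is hypothetical: the grammar, the representation of a chain as a tuple of transform names, and the `score_fn` callback that stands in for replaying recent history through a new candidate.

```python
GRAMMAR = ["diff", "std", "ema_t"]  # hypothetical transform names


def search_step(scores, replay, score_fn, top_n=2, max_depth=3, max_pool=30):
    """One expand/prune round over transform chains.

    scores   : dict mapping chain (tuple of names) -> cumulative score, higher is better
    replay   : recent observations used to warm-start new candidates
    score_fn : callable(chain, replay) -> score for a newly created chain
    """
    scores = dict(scores)  # do not mutate the caller's pool
    ranked = sorted(scores, key=scores.get, reverse=True)
    for chain in ranked[:top_n]:
        if len(chain) >= max_depth:
            continue  # chain is already at the depth limit
        for t in GRAMMAR:
            child = (t,) + chain  # wrap the chain in one more outer transform
            if child not in scores:
                scores[child] = score_fn(child, replay)  # warm start via replay
    survivors = sorted(scores, key=scores.get, reverse=True)[:max_pool]
    return {c: scores[c] for c in survivors}


# Toy scoring rule: pretend differencing helps and anything else hurts slightly.
def toy_score(chain, replay):
    return -10.0 + (5.0 if chain[0] == "diff" else -1.0)


pool = search_step({("ema_t",): -10.0}, replay=[], score_fn=toy_score, max_pool=2)
```

In this toy run the differenced chain enters the pool with the best score and the two weaker children are pruned immediately.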
Design
- Online only: $O(1)$ per observation, no batch recomputation.
- Distributional: every prediction is a Dist, not a point estimate.
- Composable: transforms chain, ensembles nest, everything returns Dist.
- Pure Python: zero dependencies. Only math.erf and math.exp.
- Pyodide compatible: works in the browser via WebAssembly.
Lineage
This package distills ideas from
timemachines, which
provided a common skater interface for dozens of time series packages. skaters
is a from-scratch rewrite focused on speed, distributional predictions, and browser
compatibility.
Get the source
github.com/microprediction/skaters · pip install skaters · Examples