We need a principled way of managing state in real-time ML pipelines.

Written by Sarah Wooders, Peter Schafhalter, and Joey Gonzalez

The RISE of Feature Stores

As more models are deployed in real-world pipelines, the recurring lesson is that data and data featurization matter above all else. The last generation of big data systems scaled ML to real-world datasets, and now feature stores are quickly emerging as a new frontier for connecting models to real-time data [1].

Feature stores, as the name implies, store features derived from raw data and serve them to downstream models for training and inference. For…

