Hey folks. I built Timefence because I kept hitting the same bug when building ML training sets: you LEFT JOIN feature tables to labels, and some rows end up with feature timestamps after the prediction event, so the model trains on future data. It took me forever to debug the first time because nothing errors out.
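To make the failure mode concrete, here is a minimal sketch. It uses Python's built-in sqlite3 instead of DuckDB. The tables, the columns (user_id, label_time, feature_time, logins_7d) and the values are all made up for illustration. Timestamps are stored as ISO date strings so they compare correctly as text. The join runs without complaint, but one of the rows it returns carries a feature recorded after the label.

```python
import sqlite3

# Hypothetical toy data: one churn label per user, features logged over time.
# ISO date strings compare correctly as plain text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE labels (user_id INTEGER, label_time TEXT, churned INTEGER);
CREATE TABLE features (user_id INTEGER, feature_time TEXT, logins_7d INTEGER);
INSERT INTO labels VALUES (1, '2024-03-01', 1), (2, '2024-03-01', 0);
INSERT INTO features VALUES
  (1, '2024-02-20', 5),
  (1, '2024-03-05', 0),   -- recorded AFTER user 1's label: future data
  (2, '2024-02-25', 7);
""")

# Naive join: matches on the entity key only and ignores time entirely.
rows = conn.execute("""
SELECT l.user_id, l.label_time, f.feature_time, f.logins_7d
FROM labels l LEFT JOIN features f ON f.user_id = l.user_id
ORDER BY l.user_id, f.feature_time
""").fetchall()

# Rows where the feature was observed after the label's timestamp.
leaky = [r for r in rows if r[2] is not None and r[2] > r[1]]
print(len(rows), "rows,", len(leaky), "leaky")  # 3 rows, 1 leaky
```

The join also returns two rows for user 1, one per feature snapshot. That is a second quiet problem with joining on the entity key alone.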
Timefence audits your dataset for any rows where feature_time > label_time and can rebuild it with point-in-time-correct joins. It's built on DuckDB and handles 1M labels × 10 features in ~12s. It also has a --strict flag for CI.
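Here is a rough sketch of what the audit check and a point-in-time rebuild mean. It again uses stdlib sqlite3 and hypothetical tables. This is not Timefence's implementation or API, and `audit` is a made-up helper. For each label, the as-of join keeps only the most recent feature row with feature_time <= label_time, which is exactly the complement of the audit condition. The sketch assumes (user_id, feature_time) is unique, since ties would duplicate rows.

```python
import sqlite3

# Hypothetical toy tables; ISO date strings compare correctly as text.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # access columns by name
conn.executescript("""
CREATE TABLE labels (user_id INTEGER, label_time TEXT, churned INTEGER);
CREATE TABLE features (user_id INTEGER, feature_time TEXT, logins_7d INTEGER);
INSERT INTO labels VALUES (1, '2024-03-01', 1), (2, '2024-03-01', 0);
INSERT INTO features VALUES
  (1, '2024-02-20', 5), (1, '2024-03-05', 0), (2, '2024-02-25', 7);
""")


def audit(rows):
    """Hypothetical helper: return rows where feature_time > label_time."""
    return [r for r in rows
            if r["feature_time"] is not None and r["feature_time"] > r["label_time"]]


# Naive join on the entity key only (this is what leaks).
naive = conn.execute("""
SELECT l.user_id AS user_id, l.label_time AS label_time,
       f.feature_time AS feature_time, f.logins_7d AS logins_7d
FROM labels l LEFT JOIN features f ON f.user_id = l.user_id
""").fetchall()

# Point-in-time join: latest feature row at or before each label_time.
# Labels with no qualifying feature row keep NULL features.
pit = conn.execute("""
SELECT l.user_id AS user_id, l.label_time AS label_time, l.churned AS churned,
       f.feature_time AS feature_time, f.logins_7d AS logins_7d
FROM labels l
LEFT JOIN features f
  ON f.user_id = l.user_id
 AND f.feature_time = (
       SELECT MAX(f2.feature_time) FROM features f2
       WHERE f2.user_id = l.user_id AND f2.feature_time <= l.label_time)
ORDER BY l.user_id
""").fetchall()

print(len(audit(naive)), len(audit(pit)))  # 1 0
```

DuckDB also has a native ASOF JOIN that expresses the same "latest row at or before this time" match directly, without the correlated subquery. The sketch above just sticks to the standard library.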
To try it:
pip install timefence
timefence quickstart churn-example && cd churn-example
timefence audit data/train_LEAKY.parquet
MIT licensed. Happy to answer any questions you might have.