LR-ASD ★★★★★
The 2025 state of the art for 'which face is actually talking.' Fast, tiny, accurate.
why it matters
LR-ASD is the newest open-source active speaker detection (ASD) model, described in a 2025 paper in Springer's International Journal of Computer Vision (IJCV). Given a frame with several faces, it tells your video pipeline which person is actually talking. It is more accurate than the older TalkNet model and 23 times lighter, so it is fast enough to run on every frame rather than only on sampled frames.
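Once an ASD model has scored every face, a pipeline still has to pick one speaker per frame. Below is a minimal Python sketch of that downstream step. It is not LR-ASD's code. It assumes, hypothetically, that you have one score per face track per frame (higher meaning more likely speaking, above 0 counting as speaking) with all lists aligned to the same frames. `smooth` and `active_speaker_per_frame` are illustrative names, and the moving average is a simple swapped-in way to stop the choice flickering between faces.

```python
from typing import Dict, List, Optional


def smooth(scores: List[float], window: int = 5) -> List[float]:
    """Centered moving average (use an odd window); it shrinks at the ends."""
    half = window // 2
    out = []
    for i in range(len(scores)):
        lo, hi = max(0, i - half), min(len(scores), i + half + 1)
        out.append(sum(scores[lo:hi]) / (hi - lo))
    return out


def active_speaker_per_frame(
    track_scores: Dict[str, List[float]],
    threshold: float = 0.0,  # assumed "speaking" cut-off, not from LR-ASD
    window: int = 5,
) -> List[Optional[str]]:
    """For each frame, return the id of the face whose smoothed score is
    highest and above `threshold`, or None if no face clears it."""
    if not track_scores:
        return []
    lengths = {len(s) for s in track_scores.values()}
    if len(lengths) != 1:
        raise ValueError("all score lists must cover the same frames")
    n_frames = lengths.pop()
    smoothed = {tid: smooth(s, window) for tid, s in track_scores.items()}
    result: List[Optional[str]] = []
    for f in range(n_frames):
        best_id, best_score = None, threshold
        for tid, s in smoothed.items():
            if s[f] > best_score:  # strictly above threshold and current best
                best_id, best_score = tid, s[f]
        result.append(best_id)
    return result


if __name__ == "__main__":
    # Hypothetical scores: face_a talks first, then face_b takes over.
    scores = {
        "face_a": [2.0, 2.0, -1.0, -1.0, -1.0],
        "face_b": [-1.0, -1.0, 1.5, 1.5, 1.5],
    }
    print(active_speaker_per_frame(scores, window=3))
    # ['face_a', 'face_a', 'face_b', 'face_b', 'face_b']
```

A wider window gives steadier picks at the cost of reacting later when the speaker changes.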
If you're building your own clipping or auto-crop pipeline and care more about accuracy than the convenience of a pre-built library, this is the model to drop in. MIT-licensed, free, Python.
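In an auto-crop pipeline, the chosen speaker's face box then drives the crop window. A minimal sketch, assuming a hypothetical `(x1, y1, x2, y2)` pixel box for the active face and a full-height 9:16 vertical output; `vertical_crop` is an illustrative helper, not part of LR-ASD.

```python
from typing import Tuple

# (x1, y1, x2, y2) in pixels, top-left origin (assumed box format)
Box = Tuple[float, float, float, float]


def vertical_crop(
    frame_w: int, frame_h: int, face: Box, aspect: float = 9 / 16
) -> Tuple[int, int, int, int]:
    """Return (left, top, width, height) of a full-height crop whose
    width/height ratio is `aspect`, centred horizontally on the face box
    and shifted as needed so it stays inside the frame."""
    crop_h = frame_h
    crop_w = min(frame_w, int(crop_h * aspect))  # int() rounds a positive value down
    face_cx = (face[0] + face[2]) / 2
    left = int(face_cx - crop_w / 2)
    left = max(0, min(left, frame_w - crop_w))  # clamp to the frame edges
    return left, 0, crop_w, crop_h


if __name__ == "__main__":
    # 1080p landscape frame, speaker on the right-hand side
    print(vertical_crop(1920, 1080, (1500, 300, 1700, 600)))
    # (1296, 0, 607, 1080)
```

In practice you would also smooth `left` over time so the crop pans instead of jumping whenever the active speaker changes.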
install
git clone https://github.com/Junhua-Liao/LR-ASD && cd LR-ASD && pip install -r requirements.txt
where to find it
GitHub: https://github.com/Junhua-Liao/LR-ASD
No commits in the past year. That doesn't mean it's broken (some small repos are simply "finished"), but if you hit an install issue, it may not get patched quickly.
Skip it if you're not building a pipeline yourself: this is a research model, not a product.