CV Brief · Monday, 18 May 2026
Research & Papers
VLM alignment: pre-train vision features closer to text space
Deep Pre-Alignment aligns visual features to the text space before they reach the LLM layers, so less of the model's depth is spent on superficial modality alignment. Directly applicable to production VLM pipelines where feature-to-text mapping bottlenecks inference speed and accuracy.
Multimodal detection under forest canopy via LiDAR-thermal fusion
Combines LiDAR and thermal imaging to detect humans under dense occlusion, a hard real-world problem. Demonstrates a practical multimodal fusion strategy for structured occlusion scenarios common in remote sensing and surveillance deployments.
COPRA: Adapt video anomaly detection with RL during inference
Uses reinforcement learning to dynamically adjust VLM parameters at inference time for video anomaly detection, addressing distribution shift between training and real-world scenarios. Tackles the practical problem of model drift in deployed VAD systems.
Getting Started in CV/ML
Face Detection in Python & OpenCV: Complete Beginner Guide
Hands-on tutorial that walks through a face detection implementation line by line using Python and OpenCV. Covers the fundamentals of how face detection works under the hood, with runnable code examples that practitioners can apply immediately to real projects.
pHash deduplication for video crops: use a Hamming distance of ≤10 as your threshold. A tighter threshold misses duplicates; a looser one removes valid unique crops.
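A minimal sketch of how the ≤10 threshold might be applied. It assumes each crop's perceptual hash has already been computed elsewhere and is available as a 64-bit integer; the hash length, the function names `hamming_distance` and `dedupe_by_phash`, and the greedy first-seen-wins policy are illustrative assumptions, not part of the tip itself.

```python
HAMMING_THRESHOLD = 10  # threshold from the tip; tune per dataset


def hamming_distance(a: int, b: int) -> int:
    """Count the bits that differ between two integer hashes."""
    return bin(a ^ b).count("1")


def dedupe_by_phash(hashes: list[int], threshold: int = HAMMING_THRESHOLD) -> list[int]:
    """Return indices of crops to keep (hypothetical helper).

    Greedy policy (an assumption): a crop is kept only if its hash is more
    than `threshold` bits away from every crop already kept.
    """
    kept: list[int] = []
    for i, h in enumerate(hashes):
        if all(hamming_distance(h, hashes[j]) > threshold for j in kept):
            kept.append(i)
    return kept


if __name__ == "__main__":
    # Made-up hashes: 0x3FF differs from 0x0 by 10 bits, 0x7FF by 11 bits.
    example = [0x0, 0x3FF, 0x7FF, 0xFFFFFFFFFFFFFFFF]
    print(dedupe_by_phash(example))  # → [0, 2, 3]
```

The pairwise comparison is quadratic in the number of kept crops, which is usually fine for the crops from a single video; a larger corpus would likely need an index structure such as a BK-tree.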