AI Research Brief

April 28, 2026

ProEval Cuts Benchmark Eval Samples 8-65x

  • Benchmark Eval Becomes a Probability Problem. Google's ProEval treats LLM benchmark scoring as Bayesian estimation with a pretrained Gaussian process surrogate, cutting sample budgets 8-65x at 1% error.
  • FT vs ICL Finally Has a Clean Comparison. On formal-language tasks, in-distribution FT wins clearly, out-of-distribution the two tie, and ICL's sensitivity to model scale and tokenization turns out to be structural rather than noise.
  • Copyrighted Corpora Get a Legal Workaround. Annotations are released in plaintext while the source text ships as non-reversible hashes; cross-edition alignment still reaches 98.7%-99.79% token match.
  • SAM in the Clinic Stalls on Prompts, Not the Model. Saliency-guided anatomical priors plus cross-slice consistency keep SAM stable when the only input is a sloppy midline point.
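The ProEval item above reduces to a sampling question: how few benchmark items can you score while still bounding the error on the full-benchmark number? A minimal sketch of that estimation step, using a Beta-Binomial posterior as a deliberately simplified stand-in for the paper's pretrained Gaussian-process surrogate (the accuracy value, sample size, and function names here are illustrative assumptions, not details from the paper):

```python
import math
import random

def posterior_accuracy(correct: int, n: int, a: float = 1.0, b: float = 1.0):
    """Beta(a, b) prior plus n scored items -> posterior mean accuracy
    and an approximate 95% half-width (normal approximation)."""
    mean = (correct + a) / (n + a + b)
    var = mean * (1.0 - mean) / (n + a + b + 1.0)
    return mean, 1.96 * math.sqrt(var)

random.seed(0)
TRUE_ACC = 0.72  # hypothetical model accuracy on the full benchmark
# Score only 200 items instead of the whole suite.
sample = [random.random() < TRUE_ACC for _ in range(200)]

mean, half_width = posterior_accuracy(sum(sample), len(sample))
print(f"estimated accuracy: {mean:.3f} +/- {half_width:.3f}")
```

The interval shrinks like 1/sqrt(n), which is where the large sample-budget reductions come from: a surrogate with a good prior needs far fewer scored items to hit a 1% error target than naive full evaluation.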

Also Notable

  • Searching Surveillance Footage for Anomalous Behavior via Text. A cascade framework runs coarse alignment first and then refines, splitting geometric structure from semantic intent across two stages.
  • VLM Pseudo-Labels Carry Systematic Bias in Open-Vocabulary Detection. Hierarchical consistency constraints debias the labels so objectness doesn't inherit pretraining-distribution skew.
  • Same Person, Different Roles Across Events in a Video. Multimodal coreference makes identity-role mapping explicit so VidSitu stops fragmenting one person into many.
  • Text-to-Motion Modeled at Multiple Time Scales Separately. Hierarchical flow matching handles coarse structure and fine motion together, avoiding the single-scale tradeoff.
  • Semi-Supervised Medical Segmentation Goes Beyond Masks. Generative dual-distribution alignment adds feature-level supervision, mining more signal from unlabeled data.

Read the full edition →
