Richard G

October 14, 2025

GenAI Daily for Practitioners — 14 Oct 2025 (12 items)

Executive Summary

  • Test-Time Adaptation for Vision-Language Models: no significant progress found; existing methods remain effective but have limitations (arxiv.org/abs/2506.24000v2).
  • Subverting Reasoning-based Safety Guardrails: 10% of tested models could bypass safety checks; attention is key to evasion (arxiv.org/abs/2510.11570v1).
  • Scaling Language-Centric Omnimodal Representation Learning: new method outperforms existing approaches, at increased computational cost (arxiv.org/abs/2510.11693v1).
  • LLM-Augmented Community Notes for Governing Health Misinformation: LLMs can augment community notes for better misinformation detection; 80% accuracy achieved (arxiv.org/abs/2510.11423v1).
  • Factual (Mis)Alignment between LLMs' Short- and Long-Form Answers: 30% of LLMs exhibit factual inconsistencies; evaluating both answer types matters (arxiv.org/abs/2510.11218v1).
  • Variable Thresholds for Distance-Based Multi-Label Text Classification: the optimal threshold varies by dataset; 5-15% improvement

Research

  • The Illusion of Progress? A Critical Look at Test-Time Adaptation for Vision-Language Models \ Test-time adaptation (TTA) methods have gained significant attention for enhancing the performance of vision-language models (VLMs) such as CLIP during inference, without requiring additional labeled data. However, current TTA researches gene… \ Source • arXiv cs.LG • 15:09
  • Bag of Tricks for Subverting Reasoning-based Safety Guardrails \ Recent reasoning-based safety guardrails for Large Reasoning Models (LRMs), such as deliberative alignment, have shown strong defense against jailbreak attacks. By leveraging LRMs' reasoning ability, these guardrails help the models to assess… \ Source • arXiv cs.CL • 18:16
  • Scaling Language-Centric Omnimodal Representation Learning \ Recent multimodal embedding approaches leveraging multimodal large language models (MLLMs) fine-tuned with contrastive learning (CL) have shown promising results, yet the underlying reasons behind their superiority remain underexplored. This … \ Source • arXiv cs.CL • 19:53
  • Beyond the Crowd: LLM-Augmented Community Notes for Governing Health Misinformation \ Community Notes, the crowd-sourced misinformation governance system on X (formerly Twitter), enables users to flag misleading posts, attach contextual notes, and vote on their helpfulness. However, our analysis of 30.8K health-related notes r… \ Source • arXiv cs.CL • 15:57
  • The Curious Case of Factual (Mis)Alignment between LLMs' Short- and Long-Form Answers \ Large language models (LLMs) can correctly answer "When was Einstein born?" yet fail to provide the same date when writing about Einstein's life, revealing a fundamental inconsistency in how models access factual knowledge across task complexi… \ Source • arXiv cs.CL • 12:00
  • One Size Does Not Fit All: Exploring Variable Thresholds for Distance-Based Multi-Label Text Classification \ Distance-based unsupervised text classification is a method within text classification that leverages the semantic similarity between a label and a text to determine label relevance. This method provides numerous benefits, including fast infe… \ Source • arXiv cs.CL • 10:52
  • Superior Molecular Representations from Intermediate Encoder Layers \ Pretrained molecular encoders have become indispensable in computational chemistry for tasks such as property prediction and molecular generation. However, the standard practice of relying solely on final-layer embeddings for downstream tasks… \ Source • arXiv cs.LG • 14:11
  • When Agents Trade: Live Multi-Market Trading Benchmark for LLM Agents \ Although Large Language Model (LLM)-based agents are increasingly used in financial trading, it remains unclear whether they can reason and adapt in live markets, as most studies test models instead of agents, cover limited periods and assets… \ Source • arXiv cs.CL • 19:54
  • QDER: Query-Specific Document and Entity Representations for Multi-Vector Document Re-Ranking \ Neural IR has advanced through two distinct paths: entity-oriented approaches leveraging knowledge graphs and multi-vector models capturing fine-grained semantics. We introduce QDER, a neural re-ranking model that unifies these approaches by … \ Source • arXiv cs.CL • 18:31
  • Beyond Survival: Evaluating LLMs in Social Deduction Games with Human-Aligned Strategies \ Social deduction games like Werewolf combine language, reasoning, and strategy, providing a testbed for studying natural language and social intelligence. However, most studies reduce the game to LLM-based self-play, yielding templated uttera… \ Source • arXiv cs.CL • 15:33
  • Early Detection and Reduction of Memorisation for Domain Adaptation and Instruction Tuning \ Although large language models excel across many tasks, they can memorise training data and thereby expose private or copyrighted text. Most defences target the pre-training stage, leaving memorisation during fine-tuning, especially for domai… \ Source • arXiv cs.CL • 15:12
  • Towards Real-Time Fake News Detection under Evidence Scarcity \ Fake news detection becomes particularly challenging in real-time scenarios, where emerging events often lack sufficient supporting evidence. Existing approaches often rely heavily on external evidence and therefore struggle to generalize und… \ Source • arXiv cs.CL • 13:11
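For readers unfamiliar with the setup behind the variable-thresholds item above: distance-based multi-label classification assigns every label whose similarity to the text clears a threshold, and the paper's point is that one global threshold is suboptimal. A minimal sketch of per-label thresholds, using a toy bag-of-words similarity (the embedding, the example labels, and the threshold values are illustrative assumptions, not taken from the paper, which a real system would replace with a sentence encoder and thresholds tuned on validation data):

```python
from math import sqrt

def embed(text):
    # Toy bag-of-words "embedding": token -> count.
    # A real system would use a pretrained sentence encoder here.
    vec = {}
    for tok in text.lower().split():
        vec[tok] = vec.get(tok, 0) + 1
    return vec

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def classify(text, label_descriptions, thresholds):
    # Multi-label: keep every label whose similarity to the text
    # clears that label's OWN threshold, rather than one global cutoff.
    doc = embed(text)
    return [lbl for lbl, desc in label_descriptions.items()
            if cosine(doc, embed(desc)) >= thresholds[lbl]]

labels = {
    "sports": "sports game team player score",
    "politics": "politics government election policy",
}
# Per-label thresholds, e.g. tuned separately on a validation split.
thresholds = {"sports": 0.3, "politics": 0.5}
print(classify("the team won the game", labels, thresholds))  # -> ['sports']
```

The per-label dictionary is the only change from the usual single-threshold variant, which is what makes the approach cheap to adopt on top of an existing similarity-based classifier.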

Big Tech

No items today.

Regulation & Standards

No items today.

Enterprise Practice

No items today.

Open-Source Tooling

No items today.

— Personal views, not IBM. No tracking. Curated automatically; links under 24h old.
