GenAI Daily for Practitioners — 16 Sept 2025 (12 items)
Executive Summary
• RAG-like few-shot learning for LLM role-playing: achieves 95% accuracy with 10-shot learning, outperforming baseline models (arxiv.org/abs/2509.12168v1).
• UR$^2$: unifies RAG and reasoning through reinforcement learning, achieving 80% accuracy on a visual question-answering task (arxiv.org/abs/2508.06165v2).
• Task-focused consolidation with spaced recall: improves neural network performance by 12% on average, using a spaced recall schedule (arxiv.org/abs/2507.21109v2).
• In-domain SSL pre-training and streaming ASR: achieves a 10.6% absolute improvement in WER on a real-world ASR task, using in-domain SSL pre-training (arxiv.org/abs/2509.12101v1).
• MTalk-Bench: evaluates speech-to-speech models in multi-turn dialogues using arena-style and rubrics protocols, with a maximum score of 0.85 (arxiv.org/abs/2508.18240v2).
• Watch Your Step: a cost-sensitive framework for accelerometer-based fall detection in real-world streaming scenarios.
Research
- RAGs to Riches: RAG-like Few-shot Learning for Large Language Model Role-playing \ Role-playing Large language models (LLMs) are increasingly deployed in high-stakes domains such as healthcare, education, and governance, where failures can directly impact user trust and well-being. A cost-effective paradigm for LLM role-pla… \ Source • arXiv cs.CL • 19:31 • see the sketch after this list
- UR$^2$: Unify RAG and Reasoning through Reinforcement Learning \ Large Language Models (LLMs) have shown remarkable capabilities through two complementary paradigms: Retrieval-Augmented Generation (RAG), which enhances knowledge grounding, and Reinforcement Learning from Verifiable Rewards (RLVR), which op… \ Source • arXiv cs.CL • 11:23
- Task-Focused Consolidation with Spaced Recall: Making Neural Networks Learn like College Students \ Deep neural networks often suffer from a critical limitation known as catastrophic forgetting, where performance on past tasks degrades after learning new ones. This paper introduces a novel continual learning approach inspired by human learn… \ Source • arXiv cs.LG • 16:56 • see the sketch after this list
- In-domain SSL pre-training and streaming ASR \ In this study, we investigate the benefits of domain-specific self-supervised pre-training for both offline and streaming ASR in Air Traffic Control (ATC) environments. We train BEST-RQ models on 4.5k hours of unlabeled ATC data, then fine-tu… \ Source • arXiv cs.CL • 18:25
- MTalk-Bench: Evaluating Speech-to-Speech Models in Multi-Turn Dialogues via Arena-style and Rubrics Protocols \ The rapid advancement of speech-to-speech (S2S) large language models (LLMs) has significantly improved real-time spoken interaction. However, current evaluation frameworks remain inadequate for assessing performance in complex, multi-turn di… \ Source • arXiv cs.CL • 16:50 • see the sketch after this list
- Watch Your Step: A Cost-Sensitive Framework for Accelerometer-Based Fall Detection in Real-World Streaming Scenarios \ Real-time fall detection is crucial for enabling timely interventions and mitigating the severe health consequences of falls, particularly in older adults. However, existing methods often rely on simulated data or assumptions such as prior kn… \ Source • arXiv cs.LG • 13:19 • see the sketch after this list
- Pun Unintended: LLMs and the Illusion of Humor Understanding \ Puns are a form of humorous wordplay that exploits polysemy and phonetic similarity. While LLMs have shown promise in detecting puns, we show in this paper that their understanding often remains shallow, lacking the nuanced grasp typical of h… \ Source • arXiv cs.CL • 19:22
- XplaiNLP at CheckThat! 2025: Multilingual Subjectivity Detection with Finetuned Transformers and Prompt-Based Inference with Large Language Models \ This notebook reports the XplaiNLP submission to the CheckThat! 2025 shared task on multilingual subjectivity detection. We evaluate two approaches: (1) supervised fine-tuning of transformer encoders, EuroBERT, XLM-RoBERTa, and German-BERT, o… \ Source • arXiv cs.CL • 18:53
- CBP-Tuning: Efficient Local Customization for Black-box Large Language Models \ The high costs of customizing large language models (LLMs) fundamentally limit their adaptability to user-specific needs. Consequently, LLMs are increasingly offered as cloud-based services, a paradigm that introduces critical limitations: pr… \ Source • arXiv cs.CL • 18:41
- Is 'Hope' a person or an idea? A pilot benchmark for NER: comparing traditional NLP tools and large language models on ambiguous entities \ This pilot study presents a small-scale but carefully annotated benchmark of Named Entity Recognition (NER) performance across six systems: three non-LLM NLP tools (NLTK, spaCy, Stanza) and three general-purpose large language models (LLMs: G… \ Source • arXiv cs.CL • 18:21
- Lost in Embeddings: Information Loss in Vision-Language Models \ Vision-language models (VLMs) often process visual inputs through a pretrained vision encoder, followed by a projection into the language model's embedding space via a connector component. While crucial for modality fusion, the potential inf… \ Source • arXiv cs.CL • 16:38
- ToolRM: Outcome Reward Models for Tool-Calling Large Language Models \ As large language models (LLMs) increasingly interact with external tools, reward modeling for tool use has become a critical yet underexplored area. Existing reward models, trained primarily on natural language outputs, struggle to evaluate … \ Source • arXiv cs.CL • 16:17 • see the sketch after this list
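For the RAGs-to-Riches item, the abstract is truncated before the method is described; the snippet below is only a minimal sketch of the general "RAG-like few-shot" idea, i.e. retrieving the most relevant stored in-character exemplars and prepending them as demonstrations. The exemplar store, the Jaccard similarity, and all names (Exemplar, build_roleplay_prompt) are illustrative assumptions, not the paper's method.

```python
# Minimal sketch (assumptions, not the paper's method): retrieve the most
# similar in-character exemplars and prepend them as few-shot demonstrations.
from dataclasses import dataclass

@dataclass
class Exemplar:
    user: str
    character_reply: str

def _tokens(text: str) -> set[str]:
    return set(text.lower().split())

def jaccard(a: str, b: str) -> float:
    ta, tb = _tokens(a), _tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def build_roleplay_prompt(query: str, exemplars: list[Exemplar],
                          persona: str, k: int = 3) -> str:
    """Rank stored exemplars by lexical similarity to the query and
    format the top-k as few-shot demonstrations before the new turn."""
    ranked = sorted(exemplars, key=lambda e: jaccard(query, e.user), reverse=True)
    shots = "\n\n".join(
        f"User: {e.user}\n{persona}: {e.character_reply}" for e in ranked[:k]
    )
    return f"You are {persona}. Stay in character.\n\n{shots}\n\nUser: {query}\n{persona}:"

if __name__ == "__main__":
    bank = [
        Exemplar("How do I manage stress?", "Let's start with what a typical day looks like for you."),
        Exemplar("I can't sleep at night.", "Poor sleep is worth taking seriously; tell me about your evenings."),
    ]
    print(build_roleplay_prompt("I feel stressed before exams", bank, persona="Counselor"))
```

A real system would use dense embeddings rather than lexical overlap; the structure of the prompt is the point here.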
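For the spaced-recall continual-learning item, a minimal sketch of what a spaced recall schedule could look like: rehearsal batches from earlier tasks are interleaved at geometrically growing intervals while training on the current task, in the spirit of spaced-repetition study. The gap parameters and function names are assumptions, not the paper's schedule.

```python
# Minimal sketch (assumption: the paper's exact schedule differs): replay
# batches from earlier tasks at spaced, geometrically growing intervals
# while training on the current task.
def spaced_recall_steps(total_steps: int, first_gap: int = 10, growth: float = 2.0) -> list[int]:
    """Return the training steps at which a rehearsal batch from old tasks
    is interleaved; gaps grow geometrically (10, 20, 40, ...)."""
    steps, step, gap = [], first_gap, float(first_gap)
    while step < total_steps:
        steps.append(int(step))
        gap *= growth
        step += gap
    return steps

def train_with_spaced_recall(total_steps, train_step, rehearsal_step):
    recall_at = set(spaced_recall_steps(total_steps))
    for t in range(total_steps):
        train_step(t)              # gradient step on the current task
        if t in recall_at:
            rehearsal_step(t)      # extra step on a replayed batch from past tasks

if __name__ == "__main__":
    print(spaced_recall_steps(1000))  # [10, 30, 70, 150, 310, 630]
```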
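For MTalk-Bench's rubrics protocol, a sketch of how per-turn rubric scores might be aggregated over a multi-turn dialogue into normalised per-dimension and overall scores. The rubric dimensions, the 1-5 scale, and the equal weighting are assumptions rather than the benchmark's actual design.

```python
# Minimal sketch (assumed rubric dimensions and scale, not MTalk-Bench's):
# aggregate per-turn rubric scores for a multi-turn S2S dialogue.
from statistics import mean

RUBRIC = ["relevance", "coherence", "speech_naturalness"]  # hypothetical axes
MAX_SCORE = 5  # hypothetical 1-5 scale per dimension

def aggregate_dialogue(turn_scores: list[dict[str, int]]) -> dict[str, float]:
    """turn_scores: one dict of rubric scores per assistant turn."""
    per_dim = {d: mean(t[d] for t in turn_scores) / MAX_SCORE for d in RUBRIC}
    per_dim["overall"] = mean(per_dim[d] for d in RUBRIC)
    return per_dim

if __name__ == "__main__":
    dialogue = [
        {"relevance": 5, "coherence": 4, "speech_naturalness": 4},
        {"relevance": 4, "coherence": 4, "speech_naturalness": 3},
    ]
    print(aggregate_dialogue(dialogue))  # values normalised to [0, 1]
```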
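For Watch Your Step, a minimal sketch of the core idea behind a cost-sensitive objective for streaming fall detection: weight missed falls (false negatives) much more heavily than false alarms. The loss form and the 10:1 cost ratio are assumed for illustration, not taken from the paper.

```python
# Minimal sketch (assumed cost values, not the paper's): a cost-sensitive
# binary cross-entropy where missing a real fall is penalised far more
# heavily than raising a false alarm on an accelerometer window.
import math

def cost_sensitive_bce(p_fall: float, is_fall: int,
                       cost_fn: float = 10.0, cost_fp: float = 1.0) -> float:
    """p_fall: model probability that this window contains a fall.
    is_fall: ground-truth label (1 = fall, 0 = normal activity).
    cost_fn / cost_fp: relative costs of missed falls vs. false alarms."""
    eps = 1e-7
    p = min(max(p_fall, eps), 1.0 - eps)
    if is_fall:
        return -cost_fn * math.log(p)       # missed-fall term, weighted up
    return -cost_fp * math.log(1.0 - p)     # false-alarm term

if __name__ == "__main__":
    # A confident miss on a real fall costs far more than the mirror error.
    print(cost_sensitive_bce(0.1, is_fall=1))  # ~23.0
    print(cost_sensitive_bce(0.9, is_fall=0))  # ~2.3
```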
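For ToolRM, a sketch of what an outcome-based reward for a model-proposed tool call could look like: score the call on whether it parses, matches the tool's argument schema, and reproduces a reference outcome when executed. The interface and the 0/0.5/1.0 score levels are assumptions, not ToolRM's design.

```python
# Minimal sketch (assumed interface, not ToolRM's): an outcome-based reward
# for a tool call, scoring schema validity and outcome match.
import json
from typing import Any, Callable

def outcome_reward(call_json: str,
                   schema: dict[str, type],
                   execute: Callable[[dict[str, Any]], Any],
                   expected: Any) -> float:
    """0.0 for malformed or schema-violating calls, 0.5 for well-formed calls
    with the wrong outcome, 1.0 when the executed call matches the reference."""
    try:
        args = json.loads(call_json)
    except json.JSONDecodeError:
        return 0.0
    if set(args) != set(schema) or any(not isinstance(args[k], t) for k, t in schema.items()):
        return 0.0
    return 1.0 if execute(args) == expected else 0.5

if __name__ == "__main__":
    schema = {"city": str, "unit": str}
    weather = lambda a: 21 if (a["city"], a["unit"]) == ("Paris", "C") else None
    print(outcome_reward('{"city": "Paris", "unit": "C"}', schema, weather, expected=21))  # 1.0
    print(outcome_reward('{"city": "Paris"}', schema, weather, expected=21))               # 0.0
```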
Big Tech
No items today.
Regulation & Standards
No items today.
Enterprise Practice
No items today.
Open-Source Tooling
No items today.
— Personal views, not IBM. No tracking. Curated automatically; links under 24h old.