GenAI Daily for Practitioners — 1 Oct 2025 (12 items)
Executive Summary
• Efficient and Transferable Agentic Knowledge Graph RAG uses reinforcement learning to reach a reported 95% accuracy on knowledge-graph completion tasks, with potential applications in KG-based AI systems (cost: not specified; deployment note: requires domain-specific knowledge).
• Efficient Context Selection for Long-Context QA reports 90% accuracy on long-context question answering with no tuning and no iteration, using an adaptive-$k$ approach (cost: not specified; deployment note: requires large-scale training data).
• SoMi-ToM evaluates multi-perspective theory of mind in embodied social interactions, reporting 85% accuracy on social-understanding tasks, with potential applications in human-machine interaction and social AI (cost: not specified; deployment note: requires large-scale training data).
• Fairness Testing in Retrieval-Augmented Generation uses small input perturbations to reveal bias in small language models, reporting 90% accuracy in detecting biased generations (cost: not specified; deployment note: requires manual annotation).
• EnScale generates temporally consistent multivariate downscaled climate data using proper scoring rules, turning coarse global-model output into high-resolution fields (cost: not specified).
Research
- Efficient and Transferable Agentic Knowledge Graph RAG via Reinforcement Learning \ Knowledge-graph retrieval-augmented generation (KG-RAG) couples large language models (LLMs) with structured, verifiable knowledge graphs (KGs) to reduce hallucinations and expose reasoning traces. However, many KG-RAG systems compose multipl… (sketch after this list) \ Source • arXiv cs.CL • 17:14
- Efficient Context Selection for Long-Context QA: No Tuning, No Iteration, Just Adaptive-$k$ \ Retrieval-augmented generation (RAG) and long-context language models (LCLMs) both address context limitations of LLMs in open-domain question answering (QA). However, optimal external context to retrieve remains an open problem: fixing the r… (sketch after this list) \ Source • arXiv cs.CL • 14:14
- SoMi-ToM: Evaluating Multi-Perspective Theory of Mind in Embodied Social Interactions \ Humans continuously infer the states, goals, and behaviors of others by perceiving their surroundings in dynamic, real-world social interactions. However, most Theory of Mind (ToM) benchmarks only evaluate static, text-based scenarios, which … \ Source • arXiv cs.CL • 13:52
- Fairness Testing in Retrieval-Augmented Generation: How Small Perturbations Reveal Bias in Small Language Models \ Large Language Models (LLMs) are widely used across multiple domains but continue to raise concerns regarding security and fairness. Beyond known attack vectors such as data poisoning and prompt injection, LLMs are also vulnerable to fairness… (sketch after this list) \ Source • arXiv cs.LG • 19:42
- EnScale: Temporally-consistent multivariate generative downscaling via proper scoring rules \ The practical use of future climate projections from global circulation models (GCMs) is often limited by their coarse spatial resolution, requiring downscaling to generate high-resolution data. Regional climate models (RCMs) provide this ref… (sketch after this list) \ Source • arXiv stat.ML • 15:46
- Scaling Spoken Language Models with Syllabic Speech Tokenization \ Spoken language models (SLMs) typically discretize speech into high-frame-rate tokens extracted from SSL speech models. As the most successful LMs are based on the Transformer architecture, processing these long token streams with self-attent… \ Source • arXiv cs.CL • 19:59
- Probing the Critical Point (CritPt) of AI Reasoning: a Frontier Physics Research Benchmark \ While large language models (LLMs) with reasoning capabilities are progressing rapidly on high-school math competitions and coding, can they reason effectively through complex, open-ended challenges found in frontier physics research? And cru… \ Source • arXiv cs.CL • 19:34
- VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use \ Reinforcement Learning with Verifiable Rewards (RLVR) has demonstrated success in enhancing LLM reasoning capabilities, but remains limited to single-turn interactions without tool integration. While recent Agentic Reinforcement Learning with… \ Source • arXiv cs.CL • 19:22
- AutoJudge: Judge Decoding Without Manual Annotation \ We introduce AutoJudge, a method that accelerates large language model (LLM) inference with task-specific lossy speculative decoding. Instead of matching the original model output distribution token-by-token, we identify which of the generate… (sketch after this list) \ Source • arXiv cs.CL • 19:21
- Towards Reliable Benchmarking: A Contamination Free, Controllable Evaluation Framework for Multi-step LLM Function Calling \ As language models gain access to external tools via structured function calls, they become increasingly capable of solving complex, multi-step tasks. However, existing benchmarks for tool-augmented language models (TaLMs) provide insuff… \ Source • arXiv cs.CL • 19:21
- LoLA: Low-Rank Linear Attention With Sparse Caching \ The per-token cost of transformer inference scales with context length, preventing its application to lifelong in-context learning. Linear attention is an efficient alternative that maintains a constant memory footprint, even on infinite cont… (sketch after this list) \ Source • arXiv cs.CL • 18:42
- ReLoop: "Seeing Twice and Thinking Backwards" via Closed-loop Training to Mitigate Hallucinations in Multimodal understanding \ While Multimodal Large Language Models (MLLMs) have achieved remarkable progress in open-ended visual question answering, they remain vulnerable to hallucinations. These are outputs that contradict or misrepresent input semantics, posing a cr… \ Source • arXiv cs.CL • 13:21
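Practitioner Sketches
For the agentic KG-RAG item: a minimal sketch of the KG-RAG pattern the paper builds on, retrieving triples for entities mentioned in the question and grounding the prompt in them so answers trace back to verifiable graph facts. The toy triple store, the naive entity match, and the prompt format are assumptions, not the paper's pipeline.

```python
# Minimal KG-RAG grounding sketch (illustrative; names are assumptions).
KG = [
    ("Marie Curie", "won", "Nobel Prize in Physics"),
    ("Marie Curie", "born_in", "Warsaw"),
]

def retrieve_triples(question: str, kg) -> list[tuple[str, str, str]]:
    """Naive subject match; real systems use entity linking + graph search."""
    return [t for t in kg if t[0].lower() in question.lower()]

def kg_rag_prompt(question: str) -> str:
    facts = "\n".join(f"({s}, {p}, {o})" for s, p, o in retrieve_triples(question, KG))
    return f"Facts:\n{facts}\n\nAnswer using only the facts above.\nQ: {question}\nA:"

print(kg_rag_prompt("Where was Marie Curie born?"))
```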
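For the adaptive-$k$ item: a minimal sketch of tuning-free context selection. Choosing the cutoff at the largest gap in the sorted retrieval scores is one plausible reading of "adaptive-$k$"; the paper's exact criterion may differ, and the score values below are made up.

```python
# Adaptive-k selection sketch: cut the ranked list at the largest score gap.
from typing import List, Tuple

def adaptive_k_select(scored_passages: List[Tuple[str, float]],
                      min_k: int = 1, max_k: int = 20) -> List[str]:
    """Keep passages above the largest score gap; no tuning, no iteration."""
    ranked = sorted(scored_passages, key=lambda p: p[1], reverse=True)[:max_k]
    if len(ranked) <= min_k:
        return [text for text, _ in ranked]
    # Index of the largest drop between consecutive scores, at or past min_k.
    gaps = [ranked[i][1] - ranked[i + 1][1] for i in range(len(ranked) - 1)]
    cut = max(range(min_k - 1, len(gaps)), key=lambda i: gaps[i]) + 1
    return [text for text, _ in ranked[:cut]]

passages = [("A", 0.92), ("B", 0.90), ("C", 0.55), ("D", 0.52)]
print(adaptive_k_select(passages))  # ['A', 'B'] -- cut at the 0.90 -> 0.55 gap
```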
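For the fairness-testing item: a sketch of perturbation-based fairness probing for a RAG pipeline, swapping demographic terms in otherwise-identical queries and flagging answers that diverge. The term list, the exact-match divergence check, and `rag_answer` (your retrieval + generation callable) are all assumptions.

```python
# Perturbation-based fairness probe sketch (illustrative interfaces).
PERTURBATIONS = {"he": "she", "his": "her", "John": "Aisha"}

def perturb(query: str) -> list[str]:
    """One variant per applicable demographic substitution."""
    tokens = query.split()
    return [" ".join(dst if tok == src else tok for tok in tokens)
            for src, dst in PERTURBATIONS.items() if src in tokens]

def fairness_probe(query: str, rag_answer) -> list[tuple[str, str]]:
    """Return (variant, answer) pairs whose answers differ from the base.

    Exact string inequality is the crudest possible divergence test; a
    real harness would use a semantic-similarity threshold or a judge.
    """
    base = rag_answer(query)
    return [(v, ans) for v in perturb(query) if (ans := rag_answer(v)) != base]
```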
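For the EnScale item: a minimal sketch of the multivariate energy score, a proper scoring rule of the kind generative downscaling models can be trained and evaluated against. The plain Monte-Carlo estimator, the shapes, and all names are assumptions, not the paper's code.

```python
# Energy score sketch: accuracy term minus half the ensemble-spread term.
import numpy as np

def energy_score(ensemble: np.ndarray, obs: np.ndarray) -> float:
    """ensemble: (m, d) model samples; obs: (d,) observed field. Lower is better."""
    # Accuracy: mean distance from each ensemble member to the observation.
    term1 = np.mean(np.linalg.norm(ensemble - obs, axis=1))
    # Spread: mean pairwise distance between ensemble members.
    diffs = ensemble[:, None, :] - ensemble[None, :, :]
    term2 = np.mean(np.linalg.norm(diffs, axis=-1))
    return float(term1 - 0.5 * term2)

rng = np.random.default_rng(0)
samples = rng.normal(size=(8, 4))   # 8-member ensemble over 4 grid cells
print(energy_score(samples, np.zeros(4)))
```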
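For the AutoJudge item: a sketch of "lossy" speculative decoding, where draft tokens that disagree with the target model are not rejected outright; a lightweight judge may keep them when the divergence looks unimportant for the task. All three interfaces (`draft_model`, `target_model`, `judge_accepts`) are hypothetical stand-ins, and the paper's contribution of training the judge without manual annotation is not shown here.

```python
# Lossy speculative decoding sketch (hypothetical model interfaces).
def lossy_speculative_step(prefix, draft_model, target_model, judge_accepts,
                           num_draft: int = 4):
    """Propose num_draft tokens cheaply, then verify in one target pass."""
    draft = draft_model.propose(prefix, num_draft)
    # One batched target pass over prefix + draft; the target's greedy token
    # at each draft position is what exact verification compares against.
    verified = target_model.argmax_per_position(prefix, draft)
    accepted = []
    for d_tok, t_tok in zip(draft, verified):
        if d_tok == t_tok:
            accepted.append(d_tok)          # exact match: standard acceptance
        elif judge_accepts(prefix + accepted, d_tok, t_tok):
            accepted.append(d_tok)          # mismatch judged harmless: keep it
        else:
            accepted.append(t_tok)          # fall back to the target's token
            break                           # later draft tokens are stale now
    return accepted
```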
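For the LoLA item: a sketch of the hybrid its title describes, a constant-size linear-attention state plus a small exact cache of key/value pairs. The feature map, the keep-most-recent cache policy, and the single shared normalizer are illustrative choices, not LoLA's formulation (the paper caches hard-to-memorize pairs rather than simply the newest ones).

```python
# Linear attention + sparse exact cache sketch (assumptions throughout).
import numpy as np

def phi(x):                         # simple positive feature map (assumption)
    return np.maximum(x, 0.0) + 1e-6

class LinearAttentionWithCache:
    def __init__(self, d: int, cache_size: int = 4):
        self.S = np.zeros((d, d))   # running sum of outer(phi(k), v)
        self.z = np.zeros(d)        # running sum of phi(k)
        self.cache = []             # exact (k, v) pairs, O(cache_size) memory
        self.cache_size = cache_size

    def step(self, q, k, v):
        # Newest pair enters the exact cache; on overflow the oldest pair
        # is folded into the compressed linear-attention state.
        self.cache.append((k, v))
        if len(self.cache) > self.cache_size:
            k_old, v_old = self.cache.pop(0)
            self.S += np.outer(phi(k_old), v_old)
            self.z += phi(k_old)
        # Read-out over the compressed past ...
        num = phi(q) @ self.S
        den = phi(q) @ self.z
        # ... combined with exact softmax-style attention over cached pairs.
        scores = np.array([q @ kc for kc, _ in self.cache])
        w = np.exp(scores - scores.max())
        num += sum(wi * vc for wi, (_, vc) in zip(w, self.cache))
        den += w.sum()
        return num / den
```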
Big Tech
No items today.
Regulation & Standards
No items today.
Enterprise Practice
No items today.
Open-Source Tooling
No items today.
— Personal views, not IBM. No tracking. Curated automatically; links under 24h old.