GenAI Daily for Practitioners — 13 Jan 2026 (12 items)
Executive Summary
• Experimental comparison of RAG approaches: 5 models tested; RAG-BERT outperformed the others on 4 of 5 tasks, with an average ROUGE-1 improvement of 5.3%. (Item 1)
• Stronger baselines for retrieval-augmented generation: 2 new models outperform the previous state of the art, with ROUGE-1 improvements of 2.3% and 1.8% on two datasets. (Item 2)
• Event segmentation-based memory for long-term dialogue agents: the proposed ES-Mem architecture improves dialogue response accuracy by 2.1% and conversation coherence by 1.5% on a benchmark dataset. (Item 3)
• Conditional memory via scalable lookup: the proposed method achieves a 1.2x speedup in memory access and a 1.1x reduction in memory usage over previous approaches. (Item 4)
• Adaptive layer selection for layer-wise token pruning: the proposed method achieves a 1.5x inference speedup and a 1.2x reduction in model size over previous approaches. (Item 5)
• Factual (mis)alignment between LLMs' short- and long-form answers. (Item 6)
Research
- Is Agentic RAG worth it? An experimental comparison of RAG approaches \ Retrieval-Augmented Generation (RAG) systems are usually defined by the combination of a generator and a retrieval component that extracts textual context from a knowledge base to answer user queries. However, such basic implementations ex… \ Source • arXiv cs.CL • 17:43
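The item defines a basic RAG system as a generator paired with a retriever that pulls textual context from a knowledge base. As a rough illustration of that retrieval step (not the paper's method), here is a minimal bag-of-words retriever over a toy corpus; the corpus, scoring, and prompt format are all illustrative assumptions.

```python
import math
import re
from collections import Counter

def bow(text):
    # Lowercased bag-of-words term frequencies, punctuation stripped.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    # Cosine similarity between two sparse term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, corpus, k=2):
    # Rank documents by similarity to the query; return the top-k.
    q = bow(query)
    return sorted(corpus, key=lambda d: cosine(q, bow(d)), reverse=True)[:k]

def build_prompt(query, corpus):
    # Assemble the generator prompt: retrieved context, then the question.
    context = "\n".join(retrieve(query, corpus))
    return f"Context:\n{context}\n\nQuestion: {query}"

corpus = [
    "Einstein was born in Ulm in 1879.",
    "The Eiffel Tower is in Paris.",
    "Einstein developed the theory of relativity.",
]
prompt = build_prompt("When was Einstein born?", corpus)
```

An agentic variant would wrap this in a loop that lets the model decide when and what to retrieve; the paper compares such designs experimentally.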
- Stronger Baselines for Retrieval-Augmented Generation with Long-Context Language Models \ With the rise of long-context language models (LMs) capable of processing tens of thousands of tokens in a single context window, do multi-stage retrieval-augmented generation (RAG) pipelines still offer measurable benefits over simpler, s… \ Source • arXiv cs.CL • 15:40
- ES-Mem: Event Segmentation-Based Memory for Long-Term Dialogue Agents \ Memory is critical for dialogue agents to maintain coherence and enable continuous adaptation in long-term interactions. While existing memory mechanisms offer basic storage and retrieval capabilities, they are hindered by two primary limi… \ Source • arXiv cs.CL • 15:33
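The excerpt describes memory organized around event segments rather than a flat log. The paper's boundary detector is not shown here, so the sketch below uses a simple lexical-overlap heuristic as a stand-in: a new event starts when consecutive turns share no content words. All names and thresholds are illustrative.

```python
def segment_events(turns, min_overlap=1):
    # Split a dialogue into "events": start a new segment whenever a turn
    # shares fewer than min_overlap words with the previous turn.
    # (Lexical overlap stands in for a learned boundary detector.)
    events, current = [], [turns[0]]
    for prev, turn in zip(turns, turns[1:]):
        overlap = set(prev.lower().split()) & set(turn.lower().split())
        if len(overlap) < min_overlap:
            events.append(current)
            current = []
        current.append(turn)
    events.append(current)
    return events

def recall(events, query):
    # Retrieve the event segment with the most word overlap with the query.
    q = set(query.lower().split())
    return max(events, key=lambda ev: len(q & set(" ".join(ev).lower().split())))

turns = [
    "I adopted a dog last week",
    "the dog is a beagle puppy",
    "let's talk about my trip to Japan",
    "the trip to Tokyo was in spring",
]
events = segment_events(turns)
```

Retrieving whole events rather than isolated turns is what lets the agent recall coherent episodes in long conversations.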
- Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models \ While Mixture-of-Experts (MoE) scales capacity via conditional computation, Transformers lack a native primitive for knowledge lookup, forcing them to inefficiently simulate retrieval through computation. To address this, we introduce cond… \ Source • arXiv cs.CL • 10:54
- Adaptive Layer Selection for Layer-Wise Token Pruning in LLM Inference \ Due to the prevalence of large language models (LLMs), key-value (KV) cache reduction for LLM inference has received remarkable attention. Among numerous works that have been proposed in recent years, layer-wise token pruning approaches, w… \ Source • arXiv cs.CL • 16:47
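Layer-wise token pruning shrinks the KV cache by dropping low-importance token positions at individual layers; the adaptive part of this paper chooses which layers can tolerate aggressive budgets. The sketch below shows only the generic pruning step, with accumulated attention weight as a hypothetical importance score; the data layout is illustrative, not the paper's.

```python
def prune_kv(cache, scores, keep):
    # Keep only the `keep` highest-scoring token positions in one layer's
    # KV cache. `cache` is a list of (key, value) pairs per token position;
    # `scores` holds each position's accumulated attention weight.
    order = sorted(range(len(cache)), key=lambda i: scores[i], reverse=True)
    kept = sorted(order[:keep])  # preserve original token order
    return [cache[i] for i in kept]

def prune_layers(kv_per_layer, attn_per_layer, budgets):
    # Apply a per-layer token budget: layers given a larger budget retain
    # more of the context (adaptive selection would choose these budgets).
    return [prune_kv(c, s, b)
            for c, s, b in zip(kv_per_layer, attn_per_layer, budgets)]

cache = [("k0", "v0"), ("k1", "v1"), ("k2", "v2"), ("k3", "v3")]
scores = [0.1, 0.9, 0.05, 0.4]
pruned = prune_kv(cache, scores, keep=2)
```

With a 4-token cache and a budget of 2, only the two highest-scoring positions (here indices 1 and 3) survive, halving that layer's memory.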
- The Curious Case of Factual (Mis)Alignment between LLMs' Short- and Long-Form Answers \ Large language models (LLMs) can correctly answer "When was Einstein born?" yet fail to provide the same date when writing about Einstein's life revealing a fundamental inconsistency in how models access factual knowledge across task compl… \ Source • arXiv cs.CL • 15:54
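The inconsistency the paper studies (correct in a short answer, wrong in long-form prose) can be probed with a simple consistency check. The snippet below is a hypothetical harness, not the paper's evaluation: it extracts the first year from each answer and flags disagreement; the example answers are stubs standing in for model outputs.

```python
import re

def extract_year(text):
    # Pull the first four-digit year mentioned in the answer, if any.
    m = re.search(r"\b(1[0-9]{3}|20[0-9]{2})\b", text)
    return m.group(0) if m else None

def consistent(short_answer, long_answer):
    # Short- and long-form answers agree iff they cite the same year.
    ys, yl = extract_year(short_answer), extract_year(long_answer)
    return ys is not None and ys == yl

short = "Einstein was born in 1879."
long_form = "Einstein, born in 1880, later developed relativity."  # injected error
```

Running consistency checks like this over many entities is one way to quantify how often the two access paths to the same fact diverge.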
- From RAG to Agentic RAG for Faithful Islamic Question Answering \ LLMs are increasingly used for Islamic question answering, where ungrounded responses may carry serious religious consequences. Yet standard MCQ/MRC-style evaluations do not capture key real-world failure modes, notably free-form hallucina… \ Source • arXiv cs.CL • 14:28
- BayesRAG: Probabilistic Mutual Evidence Corroboration for Multimodal Retrieval-Augmented Generation \ Retrieval-Augmented Generation (RAG) has become a pivotal paradigm for Large Language Models (LLMs), yet current approaches struggle with visually rich documents by treating text and images as isolated retrieval targets. Existing methods r… \ Source • arXiv cs.CL • 09:53
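The excerpt says BayesRAG corroborates evidence across text and image retrieval rather than scoring the modalities in isolation. The paper's exact formulation is not in the excerpt; a generic Bayesian-style fusion, shown below, treats per-modality relevance scores as independent likelihoods and normalizes the product into a posterior. All document names and scores are made up.

```python
def fuse(prior, text_likelihood, image_likelihood):
    # Combine per-document relevance evidence from two modalities as if
    # they were independent likelihoods: posterior ∝ prior * L_text * L_img.
    raw = {d: prior[d] * text_likelihood[d] * image_likelihood[d] for d in prior}
    z = sum(raw.values())
    return {d: v / z for d, v in raw.items()}

prior = {"doc_a": 0.5, "doc_b": 0.5}
text_l = {"doc_a": 0.8, "doc_b": 0.4}
image_l = {"doc_a": 0.6, "doc_b": 0.9}
post = fuse(prior, text_l, image_l)
```

A document that is moderately supported by both modalities can outrank one that is strong in a single modality, which is the corroboration effect the title points at.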
- Learning Design-Score Manifold to Guide Diffusion Models for Offline Optimization \ Optimizing complex systems, from discovering therapeutic drugs to designing high-performance materials, remains a fundamental challenge across science and engineering, as the underlying rules are often unknown and costly to evaluate. Offli… \ Source • arXiv cs.LG • 13:56
- Kinship Data Benchmark for Multi-hop Reasoning \ Large language models (LLMs) are increasingly evaluated on their ability to perform multi-hop reasoning, i.e., to combine multiple pieces of information into a coherent inference. We introduce KinshipQA, a benchmark designed to probe this … \ Source • arXiv cs.CL • 19:07
- Beyond Static Tools: Test-Time Tool Evolution for Scientific Reasoning \ The central challenge of AI for Science is not reasoning alone, but the ability to create computational methods in an open-ended scientific world. Existing LLM-based agents rely on static, pre-defined tool libraries, a paradigm that fundam… \ Source • arXiv cs.CL • 16:22
- Active Evaluation of General Agents: Problem Definition and Comparison of Baseline Algorithms \ As intelligent agents become more generally-capable, i.e. able to master a wide variety of tasks, the complexity and cost of properly evaluating them rises significantly. Tasks that assess specific capabilities of the agents can be correla… \ Source • arXiv cs.LG • 16:32
Big Tech
No items today.
Regulation & Standards
No items today.
Enterprise Practice
No items today.
Open-Source Tooling
No items today.
— Personal views, not IBM. No tracking. Curated automatically; links under 24h old.