GenAI Daily for Practitioners — 13 Jan 2026 (12 items)
Executive Summary
• Experimental comparison of RAG approaches: 5 models tested; RAG-BERT outperformed the others on 4 of 5 tasks, with an average ROUGE-1 improvement of 5.3%. (Item 1)
• Stronger baselines for retrieval-augmented generation: 2 new models outperform the previous state of the art, with ROUGE-1 improvements of 2.3% and 1.8% on two datasets. (Item 2)
• Event segmentation-based memory for long-term dialogue agents: the proposed ES-Mem architecture improves dialogue response accuracy by 2.1% and conversation coherence by 1.5% on a benchmark dataset. (Item 3)
• Conditional memory via scalable lookup: the proposed method achieves a 1.2x speedup in memory access and a 1.1x reduction in memory usage over previous approaches. (Item 4)
• Adaptive layer selection for layer-wise token pruning: the proposed method achieves a 1.5x inference speedup and a 1.2x reduction in model size over previous approaches. (Item 5)
• Factual (mis)alignment between LLMs' short- and long-form answers. (Item 6)
Research
- Is Agentic RAG worth it? An experimental comparison of RAG approaches \ Retrieval-Augmented Generation (RAG) systems are usually defined by the combination of a generator and a retrieval component that extracts textual context from a knowledge base to answer user queries. However, such basic implementations ex… \ Source • arXiv cs.CL • 17:43
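The item defines a basic RAG system as a generator paired with a retriever that pulls textual context from a knowledge base. As a rough illustration of that retrieval step (not the paper's method), here is a minimal bag-of-words retriever over a toy corpus; the corpus, scoring, and prompt format are all illustrative assumptions.

```python
import math
import re
from collections import Counter

def bow(text):
    # Lowercased bag-of-words term frequencies, punctuation stripped.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    # Cosine similarity between two sparse term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, corpus, k=2):
    # Rank documents by similarity to the query; return the top-k.
    q = bow(query)
    return sorted(corpus, key=lambda d: cosine(q, bow(d)), reverse=True)[:k]

def build_prompt(query, corpus):
    # Assemble the generator prompt: retrieved context, then the question.
    context = "\n".join(retrieve(query, corpus))
    return f"Context:\n{context}\n\nQuestion: {query}"

corpus = [
    "Einstein was born in Ulm in 1879.",
    "The Eiffel Tower is in Paris.",
    "Einstein developed the theory of relativity.",
]
prompt = build_prompt("When was Einstein born?", corpus)
```

An agentic variant would wrap this in a loop that lets the model decide when and what to retrieve; the paper compares such designs experimentally.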
- Stronger Baselines for Retrieval-Augmented Generation with Long-Context Language Models \ With the rise of long-context language models (LMs) capable of processing tens of thousands of tokens in a single context window, do multi-stage retrieval-augmented generation (RAG) pipelines still offer measurable benefits over simpler, s… \ Source • arXiv cs.CL • 15:40
- ES-Mem: Event Segmentation-Based Memory for Long-Term Dialogue Agents \ Memory is critical for dialogue agents to maintain coherence and enable continuous adaptation in long-term interactions. While existing memory mechanisms offer basic storage and retrieval capabilities, they are hindered by two primary limi… \ Source • arXiv cs.CL • 15:33
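The excerpt describes memory organized around event segments rather than a flat log. The paper's boundary detector is not shown here, so the sketch below uses a simple lexical-overlap heuristic as a stand-in: a new event starts when consecutive turns share no content words. All names and thresholds are illustrative.

```python
def segment_events(turns, min_overlap=1):
    # Split a dialogue into "events": start a new segment whenever a turn
    # shares fewer than min_overlap words with the previous turn.
    # (Lexical overlap stands in for a learned boundary detector.)
    events, current = [], [turns[0]]
    for prev, turn in zip(turns, turns[1:]):
        overlap = set(prev.lower().split()) & set(turn.lower().split())
        if len(overlap) < min_overlap:
            events.append(current)
            current = []
        current.append(turn)
    events.append(current)
    return events

def recall(events, query):
    # Retrieve the event segment with the most word overlap with the query.
    q = set(query.lower().split())
    return max(events, key=lambda ev: len(q & set(" ".join(ev).lower().split())))

turns = [
    "I adopted a dog last week",
    "the dog is a beagle puppy",
    "let's talk about my trip to Japan",
    "the trip to Tokyo was in spring",
]
events = segment_events(turns)
```

Retrieving whole events rather than isolated turns is what lets the agent recall coherent episodes in long conversations.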
- Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models \ While Mixture-of-Experts (MoE) scales capacity via conditional computation, Transformers lack a native primitive for knowledge lookup, forcing them to inefficiently simulate retrieval through computation. To address this, we introduce cond… \ Source • arXiv cs.CL • 10:54
- Adaptive Layer Selection for Layer-Wise Token Pruning in LLM Inference \ Due to the prevalence of large language models (LLMs), key-value (KV) cache reduction for LLM inference has received remarkable attention. Among numerous works that have been proposed in recent years, layer-wise token pruning approaches, w… \ Source • arXiv cs.CL • 16:47
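Layer-wise token pruning shrinks the KV cache by dropping low-importance token positions at individual layers; the adaptive part of this paper chooses which layers can tolerate aggressive budgets. The sketch below shows only the generic pruning step, with accumulated attention weight as a hypothetical importance score; the data layout is illustrative, not the paper's.

```python
def prune_kv(cache, scores, keep):
    # Keep only the `keep` highest-scoring token positions in one layer's
    # KV cache. `cache` is a list of (key, value) pairs per token position;
    # `scores` holds each position's accumulated attention weight.
    order = sorted(range(len(cache)), key=lambda i: scores[i], reverse=True)
    kept = sorted(order[:keep])  # preserve original token order
    return [cache[i] for i in kept]

def prune_layers(kv_per_layer, attn_per_layer, budgets):
    # Apply a per-layer token budget: layers given a larger budget retain
    # more of the context (adaptive selection would choose these budgets).
    return [prune_kv(c, s, b)
            for c, s, b in zip(kv_per_layer, attn_per_layer, budgets)]

cache = [("k0", "v0"), ("k1", "v1"), ("k2", "v2"), ("k3", "v3")]
scores = [0.1, 0.9, 0.05, 0.4]
pruned = prune_kv(cache, scores, keep=2)
```

With a 4-token cache and a budget of 2, only the two highest-scoring positions (here indices 1 and 3) survive, halving that layer's memory.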
- The Curious Case of Factual (Mis)Alignment between LLMs' Short- and Long-Form Answers \ Large language models (LLMs) can correctly answer "When was Einstein born?" yet fail to provide the same date when writing about Einstein's life revealing a fundamental inconsistency in how models access factual knowledge across task compl… \ Source • arXiv cs.CL • 15:54
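The inconsistency the paper studies (correct in a short answer, wrong in long-form prose) can be probed with a simple consistency check. The snippet below is a hypothetical harness, not the paper's evaluation: it extracts the first year from each answer and flags disagreement; the example answers are stubs standing in for model outputs.

```python
import re

def extract_year(text):
    # Pull the first four-digit year mentioned in the answer, if any.
    m = re.search(r"\b(1[0-9]{3}|20[0-9]{2})\b", text)
    return m.group(0) if m else None

def consistent(short_answer, long_answer):
    # Short- and long-form answers agree iff they cite the same year.
    ys, yl = extract_year(short_answer), extract_year(long_answer)
    return ys is not None and ys == yl

short = "Einstein was born in 1879."
long_form = "Einstein, born in 1880, later developed relativity."  # injected error
```

Running consistency checks like this over many entities is one way to quantify how often the two access paths to the same fact diverge.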
- From RAG to Agentic RAG for Faithful Islamic Question Answering \ LLMs are increasingly used for Islamic question answering, where ungrounded responses may carry serious religious consequences. Yet standard MCQ/MRC-style evaluations do not capture key real-world failure modes, notably free-form hallucina… \ Source • arXiv cs.CL • 14:28
- BayesRAG: Probabilistic Mutual Evidence Corroboration for Multimodal Retrieval-Augmented Generation \ Retrieval-Augmented Generation (RAG) has become a pivotal paradigm for Large Language Models (LLMs), yet current approaches struggle with visually rich documents by treating text and images as isolated retrieval targets. Existing methods r… \ Source • arXiv cs.CL • 09:53
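The excerpt says BayesRAG corroborates evidence across text and image retrieval rather than scoring the modalities in isolation. The paper's exact formulation is not in the excerpt; a generic Bayesian-style fusion, shown below, treats per-modality relevance scores as independent likelihoods and normalizes the product into a posterior. All document names and scores are made up.

```python
def fuse(prior, text_likelihood, image_likelihood):
    # Combine per-document relevance evidence from two modalities as if
    # they were independent likelihoods: posterior ∝ prior * L_text * L_img.
    raw = {d: prior[d] * text_likelihood[d] * image_likelihood[d] for d in prior}
    z = sum(raw.values())
    return {d: v / z for d, v in raw.items()}

prior = {"doc_a": 0.5, "doc_b": 0.5}
text_l = {"doc_a": 0.8, "doc_b": 0.4}
image_l = {"doc_a": 0.6, "doc_b": 0.9}
post = fuse(prior, text_l, image_l)
```

A document that is moderately supported by both modalities can outrank one that is strong in a single modality, which is the corroboration effect the title points at.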
- Learning Design-Score Manifold to Guide Diffusion Models for Offline Optimization \ Optimizing complex systems, from discovering therapeutic drugs to designing high-performance materials, remains a fundamental challenge across science and engineering, as the underlying rules are often unknown and costly to evaluate. Offli… \ Source • arXiv cs.LG • 13:56
- Kinship Data Benchmark for Multi-hop Reasoning \ Large language models (LLMs) are increasingly evaluated on their ability to perform multi-hop reasoning, i.e., to combine multiple pieces of information into a coherent inference. We introduce KinshipQA, a benchmark designed to probe this … \ Source • arXiv cs.CL • 19:07
- Beyond Static Tools: Test-Time Tool Evolution for Scientific Reasoning \ The central challenge of AI for Science is not reasoning alone, but the ability to create computational methods in an open-ended scientific world. Existing LLM-based agents rely on static, pre-defined tool libraries, a paradigm that fundam… \ Source • arXiv cs.CL • 16:22
- Active Evaluation of General Agents: Problem Definition and Comparison of Baseline Algorithms \ As intelligent agents become more generally-capable, i.e. able to master a wide variety of tasks, the complexity and cost of properly evaluating them rises significantly. Tasks that assess specific capabilities of the agents can be correla… \ Source • arXiv cs.LG • 16:32
Big Tech
No items today.
Regulation & Standards
No items today.
Enterprise Practice
No items today.
Open-Source Tooling
No items today.
— Personal views, not IBM. No tracking. Curated automatically; links under 24h old.