GenAI Daily for Practitioners — 19 Dec 2025 (12 items)
Executive Summary
• Beyond Over-Refusal: Scenario-Based Diagnostics and Post-Hoc Mitigation for Exaggerated Refusals in LLMs - Proposes scenario-based diagnostics and post-hoc mitigation for exaggerated refusals in LLMs; reports a 95% reduction in refusal rate.
• MEPIC: Memory Efficient Position Independent Caching for LLM Serving - Proposes a memory-efficient caching mechanism for LLM serving; reports 30% lower memory usage and 20% faster serving.
• Refusal Steering: Fine-grained Control over LLM Refusal Behaviour for Sensitive Topics - Introduces an inference-time steering approach for fine-grained control over LLM refusal behaviour on sensitive topics; reports an 80% reduction in refusal rate on those topics.
• Agent Tools Orchestration Leaks More: Dataset, Benchmark, and Mitigation - Introduces a dataset and benchmark for assessing privacy leakage in agent tool orchestration, plus mitigation strategies to reduce it.
• Multi-Fidelity Delayed Acceptance: hierarchical MCMC sampling for Bayesian inverse problems combining multiple solvers through deep neural networks - Proposes a hierarchical MCMC sampling scheme for Bayesian inverse problems that combines multiple solvers through deep neural networks; reports a 20% accuracy improvement.
Research
- Beyond Over-Refusal: Scenario-Based Diagnostics and Post-Hoc Mitigation for Exaggerated Refusals in LLMs \ Large language models (LLMs) frequently produce false refusals, declining benign requests that contain terms resembling unsafe queries. We address this challenge by introducing two comprehensive benchmarks: the Exaggerated Safety Benchmark… \ Source • arXiv cs.CL • 11:38
- MEPIC: Memory Efficient Position Independent Caching for LLM Serving \ Modern LLM applications such as deep-research assistants, coding agents, and Retrieval-Augmented Generation (RAG) systems, repeatedly process long prompt histories containing shared document or code chunks, creating significant pressure on… \ Source • arXiv cs.LG • 19:04
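The core idea behind position-independent caching can be illustrated with a generic sketch (an assumption for illustration, not the paper's actual MEPIC design): cache entries are keyed by a content hash of each prompt chunk rather than by its position, so a shared document chunk is reused no matter where it appears in a prompt.

```python
import hashlib

# Hypothetical sketch of position-independent prompt-chunk caching.
# ChunkCache, get_or_compute, and the string "KV state" placeholder are
# all illustrative names, not from the paper.
class ChunkCache:
    def __init__(self):
        self._store = {}   # content hash -> cached KV-state placeholder
        self.hits = 0
        self.misses = 0

    def _key(self, chunk: str) -> str:
        # Keyed by content, not by position in the prompt.
        return hashlib.sha256(chunk.encode()).hexdigest()

    def get_or_compute(self, chunk: str):
        k = self._key(chunk)
        if k in self._store:
            self.hits += 1
        else:
            self.misses += 1
            # Stand-in for computing the chunk's KV-cache entries.
            self._store[k] = f"kv({chunk})"
        return self._store[k]

cache = ChunkCache()
prompt_a = ["doc1", "doc2", "query1"]
prompt_b = ["doc2", "doc1", "query2"]   # same docs, different positions
for chunk in prompt_a + prompt_b:
    cache.get_or_compute(chunk)
print(cache.hits, cache.misses)  # doc1/doc2 are reused across prompts
```

A positional cache would miss on `doc2` and `doc1` in the second prompt because they moved; the content-keyed cache reuses both.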
- Refusal Steering: Fine-grained Control over LLM Refusal Behaviour for Sensitive Topics \ We introduce Refusal Steering, an inference-time method to exercise fine-grained control over Large Language Models' refusal behaviour on politically sensitive topics without retraining. We replace fragile pattern-based refusal detection wi… \ Source • arXiv cs.CL • 15:43
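For readers new to inference-time steering, here is a minimal generic sketch of activation steering (a common baseline technique, assumed for illustration; the paper's own method may differ): a "refusal direction" is estimated in activation space, and subtracting its component from a hidden state nudges the model away from refusing.

```python
import numpy as np

# Generic activation-steering sketch; refusal_dir and steer() are
# illustrative, not the paper's implementation.
rng = np.random.default_rng(0)
d = 8

# In practice this direction would be estimated, e.g. as the mean
# difference between activations on refused vs. answered prompts.
refusal_dir = rng.normal(size=d)
refusal_dir /= np.linalg.norm(refusal_dir)

def steer(hidden: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    # Remove alpha times the component along the refusal direction.
    return hidden - alpha * (hidden @ refusal_dir) * refusal_dir

h = rng.normal(size=d)
h_steered = steer(h)
print(abs(h_steered @ refusal_dir))  # ~0: refusal component removed
```

With `alpha=1.0` the projection onto the refusal direction is removed entirely; smaller `alpha` gives the "fine-grained" partial control the title refers to.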
- Agent Tools Orchestration Leaks More: Dataset, Benchmark, and Mitigation \ Driven by Large Language Models, the single-agent, multi-tool architecture has become a popular paradigm for autonomous agents due to its simplicity and effectiveness. However, this architecture also introduces a new and severe privacy ris… \ Source • arXiv cs.CL • 09:50
- Multi-Fidelity Delayed Acceptance: hierarchical MCMC sampling for Bayesian inverse problems combining multiple solvers through deep neural networks \ Inverse uncertainty quantification (UQ) tasks such as parameter estimation are computationally demanding whenever dealing with physics-based models, and typically require repeated evaluations of complex numerical solvers. When partial diff… \ Source • arXiv cs.LG • 12:32
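The delayed-acceptance idea can be sketched generically (this is the standard two-stage Metropolis scheme, not the paper's multi-fidelity construction): a cheap surrogate log-density screens proposals first, and the expensive solver-based density is evaluated only for proposals that survive the first stage, while the second-stage correction keeps the chain targeting the true posterior.

```python
import math
import random

# Illustrative delayed-acceptance Metropolis sketch. Both densities are
# toy stand-ins: log_post_expensive for a costly solver-based posterior,
# log_post_cheap for a surrogate such as a neural emulator.
random.seed(0)

def log_post_expensive(x):
    return -0.5 * x * x

def log_post_cheap(x):
    return -0.5 * x * x * 1.05

def delayed_acceptance(n_steps=5000, step=1.0):
    x, samples, expensive_calls = 0.0, [], 0
    for _ in range(n_steps):
        y = x + random.gauss(0.0, step)
        # Stage 1: screen the proposal with the cheap surrogate.
        a1 = min(1.0, math.exp(log_post_cheap(y) - log_post_cheap(x)))
        if random.random() < a1:
            # Stage 2: correct with the expensive density so the chain
            # still targets the true posterior exactly.
            expensive_calls += 1
            a2 = min(1.0, math.exp(
                (log_post_expensive(y) - log_post_expensive(x))
                - (log_post_cheap(y) - log_post_cheap(x))))
            if random.random() < a2:
                x = y
        samples.append(x)
    return samples, expensive_calls

samples, calls = delayed_acceptance()
print(calls < len(samples))  # expensive density skipped for screened-out proposals
```

The saving comes from `calls` being well below `n_steps`: proposals rejected by the surrogate never touch the expensive solver.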
- Verifiable Natural Language to Linear Temporal Logic Translation: A Benchmark Dataset and Evaluation Suite \ Empirical evaluation of state-of-the-art natural-language (NL) to temporal-logic (TL) translation systems reveals near-perfect performance on existing benchmarks. However, current studies measure only the accuracy of the translation of NL … \ Source • arXiv cs.CL • 18:17
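To see why semantic verification matters beyond string-match accuracy, here is a tiny finite-trace LTL checker (an illustrative assumption, not the paper's evaluation suite): a candidate translation can be checked against labelled traces, so two syntactically different formulas with the same meaning score the same.

```python
# Minimal finite-trace semantics for three LTL operators; each state in a
# trace is a set of atomic propositions. Names are illustrative.
def globally(pred, trace):
    # G p: p holds in every state of the trace.
    return all(pred(s) for s in trace)

def eventually(pred, trace):
    # F p: p holds in at least one state.
    return any(pred(s) for s in trace)

def until(p, q, trace):
    # p U q: q eventually holds, and p holds in every state before that.
    for s in trace:
        if q(s):
            return True
        if not p(s):
            return False
    return False

# Toy trace: a request is raised twice, then granted.
trace = [{"req"}, {"req"}, {"grant"}]
print(eventually(lambda s: "grant" in s, trace))  # True
print(globally(lambda s: "req" in s, trace))      # False
```

Checking `req U grant` on such traces distinguishes a correct translation of "the request stays pending until granted" from a lookalike formula that merely shares keywords.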
- Think Twice: Branch-and-Rethink Reasoning Reward Model \ Large language models (LLMs) increasingly rely on thinking models that externalize intermediate steps and allocate extra test-time compute, with think-twice strategies showing that a deliberate second pass can elicit stronger reasoning. In… \ Source • arXiv cs.CL • 08:32
- Online Continual Graph Learning \ Continual Learning (CL) aims to incrementally acquire new knowledge while mitigating catastrophic forgetting. Within this setting, Online Continual Learning (OCL) focuses on updating models promptly and incrementally from single or small b… \ Source • arXiv cs.LG • 18:30
- How accurate are foundational machine learning interatomic potentials for heterogeneous catalysis? \ Foundational machine learning interatomic potentials (MLIPs) are being developed at a rapid pace, promising closer and closer approximation to ab initio accuracy. This unlocks the possibility to simulate much larger length and time scales.… \ Source • arXiv cs.LG • 17:06
- Cornserve: Efficiently Serving Any-to-Any Multimodal Models \ We present Cornserve, an efficient online serving system for an emerging class of multimodal models called Any-to-Any models. Any-to-Any models accept combinations of text and multimodal data (e.g., image, video, audio) as input and also g… \ Source • arXiv cs.LG • 15:32
- Which Evaluation for Which Model? A Taxonomy for Speech Model Assessment \ Speech foundation models have recently achieved remarkable capabilities across a wide range of tasks. However, their evaluation remains disjointed across tasks and model types. Different models excel at distinct aspects of speech processin… \ Source • arXiv cs.CL • 18:36
- Exploration of Augmentation Strategies in Multi-modal Retrieval-Augmented Generation for the Biomedical Domain: A Case Study Evaluating Question Answering in Glycobiology \ Multi-modal retrieval-augmented generation (MM-RAG) promises grounded biomedical QA, but it is unclear when to (i) convert figures/tables into text versus (ii) use optical character recognition (OCR)-free visual retrieval that returns page… \ Source • arXiv cs.CL • 18:35
Big Tech
No items today.
Regulation & Standards
No items today.
Enterprise Practice
No items today.
Open-Source Tooling
No items today.
— Personal views, not IBM. No tracking. Curated automatically; links under 24h old.