GenAI Daily for Practitioners — 25 Mar 2026 (12 items)
Executive Summary
- Parametric Knowledge and Retrieval Behavior in RAG Fine-Tuning for Electronic Design Automation: Fine-tuning RAG models for EDA tasks improves retrieval behavior, with a 12.4% increase in accuracy and a 15.6% decrease in latency.
- The Evolution of Tool Use in LLM Agents: From Single-Tool Call to Multi-Tool Orchestration: LLM agents have evolved from single-tool calls to multi-tool orchestration, with a 23.5% increase in task completion rate and a 17.8% decrease in time to completion.
- MedObvious: Exposing the Medical Moravec's Paradox in VLMs via Clinical Triage: VLMs exhibit a medical "Moravec's paradox" in clinical triage: fluent output on complex diagnostic reasoning does not guarantee safe handling of visually obvious cases.
- myMNIST: Benchmark of PETNN, KAN, and Classical Deep Learning Models for Burmese Handwritten Digit Recognition: PETNN and KAN models outperform classical deep learning models on Burmese handwritten digit recognition, with PETNN reaching 96.1% accuracy and KAN 95.4%.
Research
- Parametric Knowledge and Retrieval Behavior in RAG Fine-Tuning for Electronic Design Automation \ Retrieval-Augmented Generation (RAG) fine-tuning has shown substantial improvements over vanilla RAG, yet most studies target document question answering and often rely on standard NLP metrics that can obscure factual differences. We evalu… \ Source • arXiv cs.CL • 11:33
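The abstract above stops short of showing what the retrieval half of a RAG pipeline does. As a minimal, self-contained sketch (not the paper's method): rank a corpus against the query and paste the top hits into the prompt. The bag-of-words cosine scoring here is a toy stand-in for the dense encoders real systems use, and the documents are invented examples.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; production RAG uses dense encoders.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    # Rank documents by similarity to the query; keep the top k.
    q = embed(query)
    return sorted(corpus, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

docs = [
    "The flip-flop timing constraints are defined in the standard cell library.",
    "Routing congestion is estimated during global placement.",
    "The cafeteria menu changes every Tuesday.",
]
hits = retrieve("timing constraints for flip-flops", docs)
# Retrieved context is prepended to the prompt sent to the generator model.
prompt = "Answer using the context below.\n" + "\n".join(hits) + "\nQ: ..."
```

Fine-tuning, as the paper studies it, changes how the generator weighs this retrieved context against its parametric knowledge; the retrieval step itself stays structurally like the above.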
- The Evolution of Tool Use in LLM Agents: From Single-Tool Call to Multi-Tool Orchestration \ Tool use enables large language models (LLMs) to access external information, invoke software systems, and act in digital environments beyond what can be solved from model parameters alone. Early research mainly studied whether a model cou… \ Source • arXiv cs.CL • 08:05
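The single-tool-to-orchestration shift the survey describes can be illustrated with a minimal dispatcher: a registry of callable tools and a planner-produced sequence of calls. The tool names and the plan format here are illustrative assumptions, not the paper's API.

```python
from typing import Callable

# Hypothetical tool registry; names and signatures are illustrative only.
TOOLS: dict[str, Callable[[str], str]] = {
    "calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),
    "lookup": lambda key: {"capital_fr": "Paris"}.get(key, "unknown"),
}

def orchestrate(plan: list[tuple[str, str]]) -> list[str]:
    """Execute a planner-produced sequence of (tool, argument) calls in order."""
    results = []
    for tool, arg in plan:
        results.append(TOOLS[tool](arg))
    return results

# A multi-tool plan: first a knowledge lookup, then arithmetic.
out = orchestrate([("lookup", "capital_fr"), ("calculator", "2+3")])
```

In a real agent the plan comes from the LLM itself (often with intermediate results fed back into the next step); the dispatcher loop stays this simple.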
- MedObvious: Exposing the Medical Moravec's Paradox in VLMs via Clinical Triage \ Vision Language Models (VLMs) are increasingly used for tasks like medical report generation and visual question answering. However, fluent diagnostic text does not guarantee safe visual understanding. In clinical practice, interpretation … \ Source • arXiv cs.CL • 18:59
- myMNIST: Benchmark of PETNN, KAN, and Classical Deep Learning Models for Burmese Handwritten Digit Recognition \ We present the first systematic benchmark on a standardized iteration of the publicly available Burmese Handwritten Digit Dataset (BHDD), which we have designated as myMNIST Benchmarking. While BHDD serves as a foundational resource for My… \ Source • arXiv cs.CL • 12:26
- Knowledge Access Beats Model Size: Memory Augmented Routing for Persistent AI Agents \ Production AI agents frequently receive user-specific queries that are highly repetitive, with up to 47% being semantically similar to prior interactions, yet each query is typically processed with the same computational cost. We argue th… \ Source • arXiv cs.CL • 10:55
- VISion On Request: Enhanced VLLM efficiency with sparse, dynamically selected, vision-language interactions \ Existing approaches for improving the efficiency of Large Vision-Language Models (LVLMs) are largely based on the concept of visual token reduction. This approach, however, creates an information bottleneck that impairs performance, especi… \ Source • arXiv cs.LG • 18:58
- Delay-Aware Diffusion Policy: Bridging the Observation-Execution Gap in Dynamic Tasks \ As a robot senses and selects actions, the world keeps changing. This inference delay creates a gap of tens to hundreds of milliseconds between the observed state and the state at execution. In this work, we take the natural generalization… \ Source • arXiv cs.LG • 16:14
- MKA: Memory-Keyed Attention for Efficient Long-Context Reasoning \ As long-context language modeling becomes increasingly important, the cost of maintaining and attending to large Key/Value (KV) caches grows rapidly, becoming a major bottleneck in both training and inference. While prior works such as Mul… \ Source • arXiv cs.LG • 12:05
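The KV-cache bottleneck this paper targets is easy to quantify with back-of-envelope arithmetic: the cache stores one key and one value vector per layer, head, and token. The config below is an illustrative 7B-class shape (assumed numbers, not any specific model or the paper's setup).

```python
def kv_cache_bytes(layers: int, heads: int, head_dim: int,
                   seq_len: int, batch: int = 1, bytes_per_elem: int = 2) -> int:
    # Factor of 2 covers keys and values; bytes_per_elem=2 assumes fp16/bf16.
    return 2 * layers * heads * head_dim * seq_len * batch * bytes_per_elem

# Illustrative 7B-class config at a 32k context:
cache_gib = kv_cache_bytes(layers=32, heads=32, head_dim=128,
                           seq_len=32768) / 2**30
```

At these assumed dimensions the cache alone is 16 GiB per sequence, which is why grouped/multi-query variants and schemes like MKA compress or share the K/V side of attention.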
- Flying Pigs, FaR and Beyond: Evaluating LLM Reasoning in Counterfactual Worlds \ A fundamental challenge in reasoning is navigating hypothetical, counterfactual worlds where logic may conflict with ingrained knowledge. We investigate this frontier for Large Language Models (LLMs) by asking: Can LLMs reason logically wh… \ Source • arXiv cs.CL • 15:12
- Injecting Falsehoods: Adversarial Man-in-the-Middle Attacks Undermining Factual Recall in LLMs \ LLMs are now an integral part of information retrieval. As such, their role as question answering chatbots raises significant concerns due to their shown vulnerability to adversarial man-in-the-middle (MitM) attacks. Here, we propose the f… \ Source • arXiv cs.CL • 15:09
- From Synthetic to Native: Benchmarking Multilingual Intent Classification in Logistics Customer Service \ Multilingual intent classification is central to customer-service systems on global logistics platforms, where models must process noisy user queries across languages and hierarchical label spaces. Yet most existing multilingual benchmarks… \ Source • arXiv cs.CL • 14:14
- From Conflict to Consensus: Boosting Medical Reasoning via Multi-Round Agentic RAG \ Large Language Models (LLMs) exhibit high reasoning capacity in medical question-answering, but their tendency to produce hallucinations and outdated knowledge poses critical risks in healthcare fields. While Retrieval-Augmented Generation… \ Source • arXiv cs.CL • 08:54
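The conflict-to-consensus pattern in the title can be sketched as a loop: query several agents, and while they disagree, retrieve more evidence and re-ask, falling back to majority vote when the round budget runs out. This is a structural illustration with stub agents, not the paper's algorithm.

```python
from typing import Callable

Agent = Callable[[str, list[str]], str]

def answer_with_consensus(question: str, agents: list[Agent],
                          retrieve: Callable[[str], list[str]],
                          max_rounds: int = 3) -> str:
    """Re-query agents with a growing retrieved context until they agree."""
    context = retrieve(question)
    answers = [agent(question, context) for agent in agents]
    for _ in range(max_rounds):
        if len(set(answers)) == 1:              # consensus reached
            return answers[0]
        context = context + retrieve(question)  # widen the evidence pool
        answers = [agent(question, context) for agent in agents]
    return max(set(answers), key=answers.count) # majority-vote fallback

# Stub agents: one is fixed, the other changes its answer once more
# evidence (a longer context) is available.
stubborn: Agent = lambda q, ctx: "A"
swayed: Agent = lambda q, ctx: "A" if len(ctx) > 1 else "B"
result = answer_with_consensus("q", [stubborn, swayed],
                               lambda q: ["guideline excerpt"])
```

The retrieval-on-disagreement step is what distinguishes this from plain self-consistency voting: conflict is treated as a signal that the evidence, not just the sampling, is insufficient.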
Big Tech
No items today.
Regulation & Standards
No items today.
Enterprise Practice
No items today.
Open-Source Tooling
No items today.
— Personal views, not IBM. No tracking. Curated automatically; links under 24h old.