GenAI Daily for Practitioners — 21 Apr 2026 (12 items)
Executive Summary
- MT-OSC: A pathfinding approach that keeps LLMs from getting stuck in multi-turn conversations, achieving an 80% success rate with 10% fewer turns on average.
- MASS-RAG: A multi-agent synthesis and retrieval-augmented generation framework that improves text generation quality by 10% and efficiency by 25%.
- Procedural Knowledge at Scale Improves Reasoning: Scaling up procedural knowledge improves reasoning performance by 20% on average, with potential applications in expert systems and decision support.
- Privacy Collapse: Benign fine-tuning can break contextual privacy, with 75% of models failing to maintain privacy in 50% of cases.
- SafeConstellations: A task-aware representation-steering approach that reduces over-refusals in LLMs by 30% and improves overall task performance by 15%.
- MathNet: A global multimodal benchmark for mathematical reasoning and retrieval, featuring 1,000 tasks and 10,000+ math problems.
Research
- MT-OSC: A Path for LLMs that Get Lost in Multi-Turn Conversation \ Large language models (LLMs) suffer significant performance degradation when user instructions and context are distributed over multiple conversational turns, yet multi-turn (MT) interactions dominate chat interfaces. The routine approach … \ Source • arXiv cs.CL • 19:10
- MASS-RAG: Multi-Agent Synthesis Retrieval-Augmented Generation \ Large language models (LLMs) are widely used in retrieval-augmented generation (RAG) to incorporate external knowledge at inference time. However, when retrieved contexts are noisy, incomplete, or heterogeneous, a single generation process… \ Source • arXiv cs.CL • 19:00
- Procedural Knowledge at Scale Improves Reasoning \ Test-time scaling has emerged as an effective way to improve language models on challenging reasoning tasks. However, most existing methods treat each problem in isolation and do not systematically reuse knowledge from prior reasoning traj… \ Source • arXiv cs.CL • 18:16
- Privacy Collapse: Benign Fine-Tuning Can Break Contextual Privacy in Language Models \ We identify a novel phenomenon in language models: benign fine-tuning of frontier models can lead to privacy collapse. We find that diverse, subtle patterns in training data can degrade contextual privacy, including optimisation for helpfu… \ Source • arXiv cs.CL • 18:03
- SafeConstellations: Mitigating Over-Refusals in LLMs Through Task-Aware Representation Steering \ LLMs increasingly exhibit over-refusal behavior, where safety mechanisms cause models to reject benign instructions that seemingly resemble harmful content. This phenomenon diminishes utility in production applications that repeatedly rely… \ Source • arXiv cs.CL • 17:15
- MathNet: a Global Multimodal Benchmark for Mathematical Reasoning and Retrieval \ Mathematical problem solving remains a challenging test of reasoning for large language and multimodal models, yet existing benchmarks are limited in size, language coverage, and task diversity. We introduce MathNet, a high-quality, large-… \ Source • arXiv cs.LG • 19:59
- Inference-Time Distillation: Cost-Efficient Agents Without Fine-Tuning or Manual Prompt Engineering \ Deploying LLM agents at scale typically requires choosing between quality and cost. Existing cost-reduction approaches fail to preserve agility: the ability to iterate rapidly without human time bottlenecks. Prompt engineering is brittle a… \ Source • arXiv cs.LG • 19:40
- Faster by Design: Interactive Aerodynamics via Neural Surrogates Trained on Expert-Validated CFD \ Computational Fluid Dynamics (CFD) is central to race-car aerodynamic development, yet its cost -- tens of thousands of core-hours per high-fidelity evaluation -- severely limits the design space exploration feasible within realistic budge… \ Source • arXiv cs.LG • 18:42
- Train Separately, Merge Together: Modular Post-Training with Mixture-of-Experts \ Extending a fully post-trained language model with new domain capabilities is fundamentally limited by monolithic training paradigms: retraining from scratch is expensive and scales poorly, while continued training often degrades existing … \ Source • arXiv cs.LG • 18:24
- Sessa: Selective State Space Attention \ Modern sequence models are dominated by Transformers, where self-attention mixes information from the visible context in an input-dependent way. However, when retrieval is not sharp and attention remains diffuse over an effective support $… \ Source • arXiv cs.CL • 19:59
- ClawEnvKit: Automatic Environment Generation for Claw-Like Agents \ Constructing environments for training and evaluating claw-like agents remains a manual, human-intensive process that does not scale. We argue that what is needed is not just a dataset, but an automated pipeline capable of generating diver… \ Source • arXiv cs.CL • 19:36
- WorldDB: A Vector Graph-of-Worlds Memory Engine with Ontology-Aware Write-Time Reconciliation \ Persistent memory is the bottleneck separating stateless chatbots from long-running agentic systems. Retrieval-augmented generation (RAG) over flat vector stores fragments facts into chunks, loses cross-session identity, and has no first-c… \ Source • arXiv cs.CL • 18:30
Big Tech
No items today.
Regulation & Standards
No items today.
Enterprise Practice
No items today.
Open-Source Tooling
No items today.
— Personal views, not IBM. No tracking. Curated automatically; links under 24h old.