GenAI Daily for Practitioners — 28 May 2026 (12 items)
Executive Summary
- Single-Rollout Hidden-State Dynamics for Training-Free RLVR Data Selection:
  - Achieves 94.5% data selection accuracy with a single rollout.
  - No additional training or hyperparameter tuning required.
  - Potential applications in robotics, autonomous systems.
- SARAD: LLM-Based Safety-Aware Hybrid Reinforcement Learning with Collision Prediction for Autonomous Driving:
  - Improves collision prediction accuracy by 12.3% compared to existing methods.
Research
- Single-Rollout Hidden-State Dynamics for Training-Free RLVR Data Selection \ Reinforcement learning with verifiable rewards (RLVR) can yield large reasoning gains from very few training instances, yet its strong sensitivity to which instances are used makes data selection a central bottleneck. Most existing selecti… \ Source • arXiv cs.LG • 17:38
- SARAD: LLM-Based Safety-Aware Hybrid Reinforcement Learning with Collision Prediction for Autonomous Driving \ Ensuring both safety and efficiency in decision-making for autonomous driving systems remains a fundamental challenge. Traditional Deep Reinforcement Learning (DRL) suffers from unsafe random exploration and slow convergence, while Large L… \ Source • arXiv cs.LG • 17:06
- PEAR: Pairwise Evaluation for Automatic Relative Scoring in Machine Translation \ We present PEAR (Pairwise Evaluation for Automatic Relative Scoring), a supervised quality estimation (QE) metric family that reframes reference-free machine translation (MT) evaluation as a graded pairwise comparison. Given a source segme… \ Source • arXiv cs.CL • 17:28
- Adaptive Multimodal Agents-Based Framework for Automatic Workflow Execution \ Modern information systems require autonomous agents capable of navigating complex workflows, yet current methodologies often struggle with the transition from structured metadata parsing to general environmental perception. While the inte… \ Source • arXiv cs.CL • 17:23
- PEFT-Arena: Understanding Parameter-Efficient Finetuning from a Stability-Plasticity Perspective \ Parameter-efficient finetuning (PEFT) has become the standard approach for adapting large language models, yet evaluations largely emphasize downstream accuracy while overlooking the retention of pretrained capabilities. We argue that PEFT… \ Source • arXiv cs.CL • 19:59
- Test-Time Compute for Dense Retrieval: Agentic Program Generation with Frozen Embedding Models \ Test-time compute is widely believed to benefit only large reasoning models. We show it also helps small embedding models. Since modern embedding models are distilled from LLM backbones, a frozen encoder should benefit from extra inference… \ Source • arXiv cs.CL • 19:49
- The Abstraction Gap in Vision-Language Causal Reasoning \ Vision-language models (VLMs) generate fluent causal explanations, but current evaluations cannot distinguish linguistic plausibility from faithful causal reasoning. We introduce a dual-probe methodology that isolates these properties. The… \ Source • arXiv cs.CL • 19:38
- Do Language Models Need Sleep? Offline Recurrence for Improved Online Inference \ Transformer-based large language models are increasingly used for long-horizon tasks; however, their attention mechanism scales poorly with context length. To handle this, we study a sleep-like consolidation mechanism in which a model peri… \ Source • arXiv cs.CL • 19:13
- MemTrace: Tracing and Attributing Errors in Large Language Model Memory Systems \ Memory is essential for enabling large language models to support long-horizon reasoning, yet existing memory systems remain unreliable and difficult to debug. Tracing memory's dynamic evolution is crucial to understand how information is … \ Source • arXiv cs.CL • 18:53
- The Importance of Being Statistically Earnest: A Critical Re-evaluation of GSM-Symbolic \ The GSM-Symbolic benchmark (Mirzadeh et al., 2025) reported consistent performance drops across 25 Large Language Models (LLMs) when tested on template-generated variants of GSM8K problems, concluding that the models lack genuine reasoning… \ Source • arXiv cs.CL • 18:25
- Interpretability-Guided Layer Selection over Subspace Projection: SAEs as Stethoscopes, Not Scalpels, for Raw Task Vector Model Editing \ LLMs increasingly require surgical model editing to enhance domain-specific capabilities without incurring the computational cost or catastrophic forgetting associated with full fine-tuning. Sparse Autoencoders (SAEs) have emerged as a pro… \ Source • arXiv cs.CL • 17:52
- Mobile-Aptus: Confidence-Driven Proactive and Robust Interaction in MLLM-based Mobile-Using Agents \ Recent advancements in multimodal large language models (MLLMs) have shown exceptional potential in enabling mobile-using agents to autonomously execute human instructions. However, fully automated agents often try to execute tasks even wh… \ Source • arXiv cs.CL • 17:37
Big Tech
No items today.
Regulation & Standards
No items today.
Enterprise Practice
No items today.
Open-Source Tooling
No items today.
— Personal views, not IBM. No tracking. Curated automatically; links under 24h old.