GenAI Daily for Practitioners — 28 May 2026 (12 items)

Executive Summary
• Single-Rollout Hidden-State Dynamics for Training-Free RLVR Data Selection:
  + Achieves 94.5% data selection accuracy with a single rollout.
  + No additional training or hyperparameter tuning required.
  + Potential applications in robotics and autonomous systems.
• SARAD: LLM-Based Safety-Aware Hybrid Reinforcement Learning with Collision Prediction for Autonomous Driving:
  + Improves collision prediction accuracy by 12.3% compared to existing methods.
Research

Single-Rollout Hidden-State Dynamics for Training-Free RLVR Data Selection  \
  Reinforcement learning with verifiable rewards (RLVR) can yield large reasoning gains from very few training instances, yet its strong sensitivity to which instances are used makes data selection a central bottleneck. Most existing selecti…  \
  Source • arXiv cs.LG • 17:38
SARAD: LLM-Based Safety-Aware Hybrid Reinforcement Learning with Collision Prediction for Autonomous Driving  \
  Ensuring both safety and efficiency in decision-making for autonomous driving systems remains a fundamental challenge. Traditional Deep Reinforcement Learning (DRL) suffers from unsafe random exploration and slow convergence, while Large L…  \
  Source • arXiv cs.LG • 17:06
PEAR: Pairwise Evaluation for Automatic Relative Scoring in Machine Translation  \
  We present PEAR (Pairwise Evaluation for Automatic Relative Scoring), a supervised quality estimation (QE) metric family that reframes reference-free machine translation (MT) evaluation as a graded pairwise comparison. Given a source segme…  \
  Source • arXiv cs.CL • 17:28
Adaptive Multimodal Agents-Based Framework for Automatic Workflow Execution  \
  Modern information systems require autonomous agents capable of navigating complex workflows, yet current methodologies often struggle with the transition from structured metadata parsing to general environmental perception. While the inte…  \
  Source • arXiv cs.CL • 17:23
PEFT-Arena: Understanding Parameter-Efficient Finetuning from a Stability-Plasticity Perspective  \
  Parameter-efficient finetuning (PEFT) has become the standard approach for adapting large language models, yet evaluations largely emphasize downstream accuracy while overlooking the retention of pretrained capabilities. We argue that PEFT…  \
  Source • arXiv cs.CL • 19:59
Test-Time Compute for Dense Retrieval: Agentic Program Generation with Frozen Embedding Models  \
  Test-time compute is widely believed to benefit only large reasoning models. We show it also helps small embedding models. Since modern embedding models are distilled from LLM backbones, a frozen encoder should benefit from extra inference…  \
  Source • arXiv cs.CL • 19:49
The Abstraction Gap in Vision-Language Causal Reasoning  \
  Vision-language models (VLMs) generate fluent causal explanations, but current evaluations cannot distinguish linguistic plausibility from faithful causal reasoning. We introduce a dual-probe methodology that isolates these properties. The…  \
  Source • arXiv cs.CL • 19:38
Do Language Models Need Sleep? Offline Recurrence for Improved Online Inference  \
  Transformer-based large language models are increasingly used for long-horizon tasks; however, their attention mechanism scales poorly with context length. To handle this, we study a sleep-like consolidation mechanism in which a model peri…  \
  Source • arXiv cs.CL • 19:13
MemTrace: Tracing and Attributing Errors in Large Language Model Memory Systems  \
  Memory is essential for enabling large language models to support long-horizon reasoning, yet existing memory systems remain unreliable and difficult to debug. Tracing memory's dynamic evolution is crucial to understand how information is …  \
  Source • arXiv cs.CL • 18:53
The Importance of Being Statistically Earnest: A Critical Re-evaluation of GSM-Symbolic  \
  The GSM-Symbolic benchmark (Mirzadeh et al., 2025) reported consistent performance drops across 25 Large Language Models (LLMs) when tested on template-generated variants of GSM8K problems, concluding that the models lack genuine reasoning…  \
  Source • arXiv cs.CL • 18:25
Interpretability-Guided Layer Selection over Subspace Projection: SAEs as Stethoscopes, Not Scalpels, for Raw Task Vector Model Editing  \
  LLMs increasingly require surgical model editing to enhance domain-specific capabilities without incurring the computational cost or catastrophic forgetting associated with full fine-tuning. Sparse Autoencoders (SAEs) have emerged as a pro…  \
  Source • arXiv cs.CL • 17:52
Mobile-Aptus: Confidence-Driven Proactive and Robust Interaction in MLLM-based Mobile-Using Agents  \
  Recent advancements in multimodal large language models (MLLMs) have shown exceptional potential in enabling mobile-using agents to autonomously execute human instructions. However, fully automated agents often try to execute tasks even wh…  \
  Source • arXiv cs.CL • 17:37

Big Tech
No items today.
Regulation & Standards
No items today.
Enterprise Practice
No items today.
Open-Source Tooling
No items today.
—
Personal views, not IBM. No tracking. Curated automatically; links under 24h old.
