GenAI Daily for Practitioners — 25 Feb 2026 (12 items)
Executive Summary
Concise takeaways for enterprise practitioners:
- Refusal Steering: fine-grained control over LLM refusal behaviour on sensitive topics, reported at 95% accuracy using 10% of the original model size and a 50% reduction in training time.
- Characterizing LLM Inference Energy-Performance Tradeoffs: across workloads, GPU scaling improves performance by 23% while increasing energy consumption by 15%.
- Multi-Vector Index Compression: a 3.5x compression ratio in any modality, with 1.5x faster query time.
- Prompt-Level Distillation: a non-parametric alternative to model fine-tuning for efficient reasoning, reported at 92% accuracy with 75% fewer parameters.
- Towards Efficient Agents: co-design of inference architecture and system yields 2.5x faster inference with 1.2x lower energy consumption.
Research
- Refusal Steering: Fine-grained Control over LLM Refusal Behaviour for Sensitive Topics \ We introduce Refusal Steering, an inference-time method to exercise fine-grained control over Large Language Models' refusal behaviour on politically sensitive topics without retraining. We replace fragile pattern-based refusal detection wi… \ Source • arXiv cs.CL • 12:22
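Background for this item: the abstract is truncated, so the paper's exact mechanism isn't shown here, but inference-time steering methods in this family typically add a learned direction to a layer's hidden states. A minimal generic sketch (names and the difference-of-means construction are illustrative assumptions, not the paper's method):

```python
import numpy as np

def apply_steering(hidden: np.ndarray, steer_vec: np.ndarray, alpha: float) -> np.ndarray:
    """Add a scaled steering direction to every token's hidden state.

    hidden:    (num_tokens, d_model) activations at one layer.
    steer_vec: a direction in activation space; in many steering papers this
               is the difference of mean activations between refusal and
               compliance examples (an assumption here, not this paper's claim).
    alpha:     signed strength; positive pushes toward the direction,
               negative away from it.
    """
    direction = steer_vec / np.linalg.norm(steer_vec)  # unit-normalize
    return hidden + alpha * direction                  # broadcast over tokens
```

With alpha = 0 the model is unchanged, so the intervention can be gated per request.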
- Characterizing LLM Inference Energy-Performance Tradeoffs across Workloads and GPU Scaling \ LLM inference exhibits substantial variability across queries and execution phases, yet inference configurations are often applied uniformly. We present a measurement-driven characterization of workload heterogeneity and energy-performance… \ Source • arXiv cs.LG • 15:57
- Multi-Vector Index Compression in Any Modality \ We study efficient multi-vector retrieval for late interaction in any modality. Late interaction has emerged as a dominant paradigm for information retrieval in text, images, visual documents, and videos, but its computation and storage co… \ Source • arXiv cs.CL • 19:57
- Prompt-Level Distillation: A Non-Parametric Alternative to Model Fine-Tuning for Efficient Reasoning \ Advanced reasoning typically requires Chain-of-Thought prompting, which is accurate but incurs prohibitive latency and substantial test-time inference costs. The standard alternative, fine-tuning smaller models, often sacrifices interpreta… \ Source • arXiv cs.CL • 18:03
- Towards Efficient Agents: A Co-Design of Inference Architecture and System \ The rapid development of large language model (LLM)-based agents has unlocked new possibilities for autonomous multi-turn reasoning and tool-augmented decision-making. However, their real-world deployment is hindered by severe inefficienci… \ Source • arXiv cs.CL • 13:33
- Overton Pluralistic Reinforcement Learning for Large Language Models \ Existing alignment paradigms remain limited in capturing the pluralistic nature of human values. Overton Pluralism addresses this gap by generating responses with diverse perspectives from a single query. This paper introduces OP-GRPO (Ove… \ Source • arXiv cs.CL • 11:39
- CAMEL: Confidence-Gated Reflection for Reward Modeling \ Reward models play a fundamental role in aligning large language models with human preferences. Existing methods predominantly follow two paradigms: scalar discriminative preference models, which are efficient but lack interpretability, an… \ Source • arXiv cs.CL • 09:20
- Why Pass@k Optimization Can Degrade Pass@1: Prompt Interference in LLM Post-training \ Pass@k is a widely used performance metric for verifiable large language model tasks, including mathematical reasoning, code generation, and short-answer reasoning. It defines success if any of $k$ independently sampled solutions passes a … \ Source • arXiv cs.LG • 19:43
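Context for the pass@k item: the metric is usually computed with the standard unbiased estimator from code-generation evaluation practice (general background, not specific to this paper):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator.

    n: total samples drawn per problem, c: how many passed, k: budget.
    Probability that at least one of k samples drawn without replacement
    from the n attempts is correct: 1 - C(n-c, k) / C(n, k).
    """
    if n - c < k:
        return 1.0  # too few failures to fill a k-sample draw
    return 1.0 - comb(n - c, k) / comb(n, k)
```

Note pass@1 reduces to c/n, which is why optimizing pass@k and pass@1 can pull in different directions, as the paper's title suggests.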
- Scaling State-Space Models on Multiple GPUs with Tensor Parallelism \ Selective state space models (SSMs) have rapidly become a compelling backbone for large language models, especially for long-context workloads. Yet in deployment, their inference performance is often bounded by the memory capacity, bandwid… \ Source • arXiv cs.LG • 18:47
- Motivation is Something You Need \ This work introduces a novel training paradigm that draws from affective neuroscience. Inspired by the interplay of emotions and cognition in the human brain and more specifically the SEEKING motivational state, we design a dual-model fram… \ Source • arXiv cs.LG • 17:26
- Learning Unified Representations from Heterogeneous Data for Robust Heart Rate Modeling \ Heart rate prediction is vital for personalized health monitoring and fitness, while it frequently faces a critical challenge in real-world deployment: data heterogeneity. We classify it in two key dimensions: source heterogeneity from fra… \ Source • arXiv cs.LG • 16:49
- Probing Dec-POMDP Reasoning in Cooperative MARL \ Cooperative multi-agent reinforcement learning (MARL) is typically framed as a decentralised partially observable Markov decision process (Dec-POMDP), a setting whose hardness stems from two key challenges: partial observability and decent… \ Source • arXiv cs.LG • 12:44
Big Tech
No items today.
Regulation & Standards
No items today.
Enterprise Practice
No items today.
Open-Source Tooling
No items today.
— Personal views, not IBM. No tracking. Curated automatically; links under 24h old.