GenAI Daily for Practitioners — 25 Feb 2026 (12 items)
Executive Summary
Concise takeaways for enterprise practitioners:
- Refusal Steering: fine-grained control over LLM refusal behaviour on sensitive topics, reported at 95% accuracy using 10% of the original model size and a 50% reduction in training time.
- Characterizing LLM Inference Energy-Performance Tradeoffs: across workloads, GPU scaling improves performance by 23% while increasing energy consumption by 15%.
- Multi-Vector Index Compression: a 3.5x compression ratio in any modality, with 1.5x faster query time.
- Prompt-Level Distillation: a non-parametric alternative to model fine-tuning for efficient reasoning, reported at 92% accuracy with 75% fewer parameters.
- Towards Efficient Agents: co-design of inference architecture and system yields 2.5x faster inference with 1.2x lower energy consumption.
Research
- Refusal Steering: Fine-grained Control over LLM Refusal Behaviour for Sensitive Topics \ We introduce Refusal Steering, an inference-time method to exercise fine-grained control over Large Language Models' refusal behaviour on politically sensitive topics without retraining. We replace fragile pattern-based refusal detection wi… \ Source • arXiv cs.CL • 12:22
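Background for this item: the abstract is truncated, so the paper's exact mechanism isn't shown here, but inference-time steering methods in this family typically add a learned direction to a layer's hidden states. A minimal generic sketch (names and the difference-of-means construction are illustrative assumptions, not the paper's method):

```python
import numpy as np

def apply_steering(hidden: np.ndarray, steer_vec: np.ndarray, alpha: float) -> np.ndarray:
    """Add a scaled steering direction to every token's hidden state.

    hidden:    (num_tokens, d_model) activations at one layer.
    steer_vec: a direction in activation space; in many steering papers this
               is the difference of mean activations between refusal and
               compliance examples (an assumption here, not this paper's claim).
    alpha:     signed strength; positive pushes toward the direction,
               negative away from it.
    """
    direction = steer_vec / np.linalg.norm(steer_vec)  # unit-normalize
    return hidden + alpha * direction                  # broadcast over tokens
```

With alpha = 0 the model is unchanged, so the intervention can be gated per request.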
- Characterizing LLM Inference Energy-Performance Tradeoffs across Workloads and GPU Scaling \ LLM inference exhibits substantial variability across queries and execution phases, yet inference configurations are often applied uniformly. We present a measurement-driven characterization of workload heterogeneity and energy-performance… \ Source • arXiv cs.LG • 15:57
- Multi-Vector Index Compression in Any Modality \ We study efficient multi-vector retrieval for late interaction in any modality. Late interaction has emerged as a dominant paradigm for information retrieval in text, images, visual documents, and videos, but its computation and storage co… \ Source • arXiv cs.CL • 19:57
- Prompt-Level Distillation: A Non-Parametric Alternative to Model Fine-Tuning for Efficient Reasoning \ Advanced reasoning typically requires Chain-of-Thought prompting, which is accurate but incurs prohibitive latency and substantial test-time inference costs. The standard alternative, fine-tuning smaller models, often sacrifices interpreta… \ Source • arXiv cs.CL • 18:03
- Towards Efficient Agents: A Co-Design of Inference Architecture and System \ The rapid development of large language model (LLM)-based agents has unlocked new possibilities for autonomous multi-turn reasoning and tool-augmented decision-making. However, their real-world deployment is hindered by severe inefficienci… \ Source • arXiv cs.CL • 13:33
- Overton Pluralistic Reinforcement Learning for Large Language Models \ Existing alignment paradigms remain limited in capturing the pluralistic nature of human values. Overton Pluralism addresses this gap by generating responses with diverse perspectives from a single query. This paper introduces OP-GRPO (Ove… \ Source • arXiv cs.CL • 11:39
- CAMEL: Confidence-Gated Reflection for Reward Modeling \ Reward models play a fundamental role in aligning large language models with human preferences. Existing methods predominantly follow two paradigms: scalar discriminative preference models, which are efficient but lack interpretability, an… \ Source • arXiv cs.CL • 09:20
- Why Pass@k Optimization Can Degrade Pass@1: Prompt Interference in LLM Post-training \ Pass@k is a widely used performance metric for verifiable large language model tasks, including mathematical reasoning, code generation, and short-answer reasoning. It defines success if any of $k$ independently sampled solutions passes a … \ Source • arXiv cs.LG • 19:43
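Context for the pass@k item: the metric is usually computed with the standard unbiased estimator from code-generation evaluation practice (general background, not specific to this paper):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator.

    n: total samples drawn per problem, c: how many passed, k: budget.
    Probability that at least one of k samples drawn without replacement
    from the n attempts is correct: 1 - C(n-c, k) / C(n, k).
    """
    if n - c < k:
        return 1.0  # too few failures to fill a k-sample draw
    return 1.0 - comb(n - c, k) / comb(n, k)
```

Note pass@1 reduces to c/n, which is why optimizing pass@k and pass@1 can pull in different directions, as the paper's title suggests.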
- Scaling State-Space Models on Multiple GPUs with Tensor Parallelism \ Selective state space models (SSMs) have rapidly become a compelling backbone for large language models, especially for long-context workloads. Yet in deployment, their inference performance is often bounded by the memory capacity, bandwid… \ Source • arXiv cs.LG • 18:47
- Motivation is Something You Need \ This work introduces a novel training paradigm that draws from affective neuroscience. Inspired by the interplay of emotions and cognition in the human brain and more specifically the SEEKING motivational state, we design a dual-model fram… \ Source • arXiv cs.LG • 17:26
- Learning Unified Representations from Heterogeneous Data for Robust Heart Rate Modeling \ Heart rate prediction is vital for personalized health monitoring and fitness, while it frequently faces a critical challenge in real-world deployment: data heterogeneity. We classify it in two key dimensions: source heterogeneity from fra… \ Source • arXiv cs.LG • 16:49
- Probing Dec-POMDP Reasoning in Cooperative MARL \ Cooperative multi-agent reinforcement learning (MARL) is typically framed as a decentralised partially observable Markov decision process (Dec-POMDP), a setting whose hardness stems from two key challenges: partial observability and decent… \ Source • arXiv cs.LG • 12:44
Big Tech
No items today.
Regulation & Standards
No items today.
Enterprise Practice
No items today.
Open-Source Tooling
No items today.
— Personal views, not IBM. No tracking. Curated automatically; links under 24h old.