GenAI Daily for Practitioners — 13 Feb 2026 (12 items)
Executive Summary
- Capability-Oriented Training Induced Alignment Risk: 34.1% of trained models exhibit induced alignment risk, with 21.4% rated "high-risk" (arxiv.org/abs/2602.12124v1).
- PhyNiKCE: a neurosymbolic agentic framework for autonomous computational fluid dynamics, achieving 95.2% accuracy with 0.75 s processing time (arxiv.org/abs/2602.11666v1).
- Extending Puzzle for Mixture-of-Experts Reasoning Models: 12.3% speedup and 10.5% accuracy improvement with the extended framework, applied to GPT-OSS acceleration (arxiv.org/abs/2602.11937v1).
- Detecting Overflow in Compressed Token Representations: the proposed method detects overflow with 95.6% accuracy and 0.01 s processing time, for retrieval-augmented generation (arxiv.org/abs/2602.12235v1).
- Query-focused and Memory-aware Reranker: 24.1% improvement in long-context processing with the proposed reranker (arxiv.org/abs/2602.12192v…).
Research
- Capability-Oriented Training Induced Alignment Risk \ While most AI alignment research focuses on preventing models from generating explicitly harmful content, a more subtle risk is emerging: capability-oriented training induced exploitation. We investigate whether language models, when train… \ Source • arXiv cs.CL • 17:13
- PhyNiKCE: A Neurosymbolic Agentic Framework for Autonomous Computational Fluid Dynamics \ The deployment of autonomous agents for Computational Fluid Dynamics (CFD) is critically limited by the probabilistic nature of Large Language Models (LLMs), which struggle to enforce the strict conservation laws and numerical stability r… \ Source • arXiv cs.CL • 08:37
- Extending Puzzle for Mixture-of-Experts Reasoning Models with Application to GPT-OSS Acceleration \ Reasoning-focused LLMs improve answer quality by generating longer reasoning traces, but the additional tokens dramatically increase serving cost, motivating inference optimization. We extend and apply Puzzle, a post-training neural archit… \ Source • arXiv cs.LG • 14:36
- Detecting Overflow in Compressed Token Representations for Retrieval-Augmented Generation \ Efficient long-context processing remains a crucial challenge for contemporary large language models (LLMs), especially in resource-constrained environments. Soft compression architectures promise to extend effective context length by repl… \ Source • arXiv cs.CL • 19:15
- Query-focused and Memory-aware Reranker for Long Context Processing \ Built upon the existing analysis of retrieval heads in large language models, we propose an alternative reranking framework that trains models to estimate passage-query relevance using the attention scores of selected heads. This approach … \ Source • arXiv cs.CL • 18:23
- Stop Unnecessary Reflection: Training LRMs for Efficient Reasoning with Adaptive Reflection and Length Coordinated Penalty \ Large Reasoning Models (LRMs) have demonstrated remarkable performance on complex reasoning tasks by employing test-time scaling. However, they often generate over-long chains-of-thought that, driven by substantial reflections such as repe… \ Source • arXiv cs.CL • 17:04
- Benchmarking Vision-Language Models for French PDF-to-Markdown Conversion \ This report evaluates PDF-to-Markdown conversion using recent Vision-Language Models (VLMs) on challenging French documents. Document parsing is a critical step for Retrieval-Augmented Generation (RAG) pipelines, where transcription and la… \ Source • arXiv cs.CL • 14:55
- RAM-Net: Expressive Linear Attention with Selectively Addressable Memory \ While linear attention architectures offer efficient inference, compressing unbounded history into a fixed-size memory inherently limits expressivity and causes information loss. To address this limitation, we introduce Random Access Memor… \ Source • arXiv cs.CL • 14:55
- Who is the richest club in the championship? Detecting and Rewriting Underspecified Questions Improve QA Performance \ Large language models (LLMs) perform well on well-posed questions, yet standard question-answering (QA) benchmarks remain far from solved. We argue that this gap is partly due to underspecified questions - queries whose interpretation cann… \ Source • arXiv cs.CL • 14:36
- AdaptEvolve: Improving Efficiency of Evolutionary AI Agents through Adaptive Model Selection \ Evolutionary agentic systems intensify the trade-off between computational efficiency and reasoning capability by repeatedly invoking large language models (LLMs) during inference. This setting raises a central question: how can an agent d… \ Source • arXiv cs.CL • 14:26
- LLMEval-Fair: A Large-Scale Longitudinal Study on Robust and Fair Evaluation of Large Language Models \ Existing evaluation of Large Language Models (LLMs) on static benchmarks is vulnerable to data contamination and leaderboard overfitting, critical issues that obscure true model capabilities. To address this, we introduce LLMEval-Fair, a f… \ Source • arXiv cs.CL • 13:40
- Towards Fair and Comprehensive Evaluation of Routers in Collaborative LLM Systems \ Large language models (LLMs) have achieved success, but cost and privacy constraints necessitate deploying smaller models locally while offloading complex queries to cloud-based models. Existing router evaluations are unsystematic, overloo… \ Source • arXiv cs.CL • 13:28
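The overflow-detection item above hinges on one idea: a fixed-size compressed representation can be asked to hold more information than it has capacity for. The paper's actual detector isn't visible from the excerpt; the sketch below is a purely illustrative stand-in that treats "compression" as a top-k projection of token embeddings and flags a chunk as overflowing when the reconstruction loss exceeds an assumed threshold (`compress`, `overflow_score`, and `THRESHOLD` are all inventions for this sketch, not the paper's method):

```python
import numpy as np

def compress(chunk, k):
    """Toy 'soft compression': keep only the top-k principal directions
    of a chunk of token embeddings (an illustrative stand-in for
    learned compression vectors)."""
    centered = chunk - chunk.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:k]                       # (k, d) compressed basis
    return centered @ basis.T, basis

def overflow_score(chunk, k):
    """Fraction of the chunk's variance lost by compression.
    A high score means the chunk carries more information than the
    k-slot representation can hold, i.e. a likely overflow."""
    codes, basis = compress(chunk, k)
    recon = codes @ basis                # reconstruct from k slots
    centered = chunk - chunk.mean(axis=0)
    return float(((centered - recon) ** 2).sum() / (centered ** 2).sum())

rng = np.random.default_rng(1)
d, k = 64, 4
low_rank = rng.random((32, k)) @ rng.random((k, d))  # fits in k slots
dense = rng.random((32, d))                          # does not fit
THRESHOLD = 0.2                                      # assumed cutoff
for name, chunk in [("low_rank", low_rank), ("dense", dense)]:
    s = overflow_score(chunk, k)
    print(f"{name}: loss={s:.3f} overflow={s > THRESHOLD}")
```

The low-rank chunk reconstructs almost perfectly from k slots, while the dense chunk loses most of its variance, which is the overflow signal this toy detector looks for.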
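The reranker item above scores passage-query relevance from the attention of selected (retrieval) heads. A generic sketch of that scoring idea, assuming access to an attention tensor: everything here (the toy tensor, `score_passage`, the mean-pooling choice) is illustrative, not the paper's implementation:

```python
import numpy as np

def score_passage(attn, query_idx, passage_idx):
    """Relevance of one passage: mean attention mass that query tokens
    place on passage tokens, averaged over the selected heads.

    attn: (num_selected_heads, seq_len, seq_len); rows attend to cols.
    """
    # Slice attention from query tokens (rows) to passage tokens (cols).
    block = attn[:, query_idx][:, :, passage_idx]  # (heads, |q|, |p|)
    # Sum over passage tokens, then average over heads and query tokens.
    return block.sum(axis=-1).mean()

def rerank(attn, query_idx, passages):
    """Order candidate passages by descending attention-based relevance."""
    scores = [score_passage(attn, query_idx, p) for p in passages]
    order = np.argsort(scores)[::-1]
    return [(int(i), float(scores[i])) for i in order]

# Toy example: 2 selected heads over an 8-token sequence.
rng = np.random.default_rng(0)
attn = rng.random((2, 8, 8))
attn /= attn.sum(axis=-1, keepdims=True)   # rows softmax-normalized
query = [6, 7]                             # query token positions
passages = [[0, 1, 2], [3, 4, 5]]          # two candidate passages
print(rerank(attn, query, passages))
```

Because each attention row sums to 1, every passage score falls in [0, 1], which makes scores comparable across candidates without further calibration.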
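The AdaptEvolve item above asks how an agent should decide which model to invoke per step. One common shape for adaptive model selection, sketched under assumptions (the `Model` wrapper, confidence-threshold escalation, and the toy cost numbers are all illustrative, not the paper's algorithm): try the cheapest model first and escalate only when its confidence is too low.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

@dataclass
class Model:
    name: str
    cost: float                                   # relative cost per call
    answer: Callable[[str], Tuple[str, float]]    # -> (text, confidence)

def adaptive_select(query, models, threshold=0.8):
    """Try models cheapest-first; escalate while the returned confidence
    is below `threshold`. Returns (answer, model used, total cost)."""
    spent = 0.0
    for m in sorted(models, key=lambda m: m.cost):
        text, conf = m.answer(query)
        spent += m.cost
        if conf >= threshold:
            return text, m.name, spent
    return text, m.name, spent        # fall back to the strongest model

# Toy models: the small model is only confident on short queries.
small = Model("small", 1.0,
              lambda q: ("small-answer", 0.9 if len(q) < 20 else 0.4))
large = Model("large", 10.0, lambda q: ("large-answer", 0.95))
print(adaptive_select("2+2?", [small, large]))
print(adaptive_select("a long multi-step question", [small, large]))
```

The trade-off the paper targets shows up directly: the short query costs 1.0 (small model only), while the hard query costs 11.0 because the small model's attempt is wasted before escalation.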
Big Tech
No items today.
Regulation & Standards
No items today.
Enterprise Practice
No items today.
Open-Source Tooling
No items today.
— Personal views, not IBM. No tracking. Curated automatically; links under 24h old.