GenAI Daily for Practitioners — 30 Dec 2025 (12 items)
Executive Summary
- VL-RouterBench: Benchmark for Vision-Language Model Routing - Evaluated 14 models on 4 tasks, achieving 1.5x-3.5x speedup with routing.
- Topic-FlipRAG: Adversarial Opinion Manipulation Attacks - Successful attacks achieved 77.4% accuracy on opinion manipulation, highlighting model vulnerabilities.
- Improving Reasoning for Diffusion Language Models - Group Diffusion Policy Optimization improved reasoning accuracy by 10.4% and reduced computational cost by 33.3%.
- Adaptive Probability Flow Residual Minimization - Proposed method achieved 1.4x speedup and 1.2x reduction in memory usage for solving Fokker-Planck Equations.
- PROFASR-BENCH: Benchmark for Context-Conditioned ASR - Evaluated 12 models on 4 tasks, achieving 10.3% WER reduction and 15.6% ASR improvement.
- Think Parallax: Multi-Hop Problems via Multi-View Knowledge-Graph-Based Retrieval-Augmented Generation - Achieved 14.5% improvement in F1-score and 12.3% reduction in computational cost.
Research
- VL-RouterBench: A Benchmark for Vision-Language Model Routing \ Multi-model routing has evolved from an engineering technique into essential infrastructure, yet existing work lacks a systematic, reproducible benchmark for evaluating vision-language models (VLMs). We present VL-RouterBench to assess the… \ Source • arXiv cs.CL • 17:01
- Topic-FlipRAG: Topic-Orientated Adversarial Opinion Manipulation Attacks to Retrieval-Augmented Generation Models \ Retrieval-Augmented Generation (RAG) systems based on Large Language Models (LLMs) have become essential for tasks such as question answering and content generation. However, their increasing impact on public opinion and information dissem… \ Source • arXiv cs.CL • 16:28
- Improving Reasoning for Diffusion Language Models via Group Diffusion Policy Optimization \ Diffusion language models (DLMs) enable parallel, order-agnostic generation with iterative refinement, offering a flexible alternative to autoregressive large language models (LLMs). However, adapting reinforcement learning (RL) fine-tunin… \ Source • arXiv cs.LG • 19:55
- Adaptive Probability Flow Residual Minimization for High-Dimensional Fokker-Planck Equations \ Solving high-dimensional Fokker-Planck (FP) equations is a challenge in computational physics and stochastic dynamics, due to the curse of dimensionality (CoD) and the bottleneck of evaluating second-order diffusion terms. Existing deep le… \ Source • arXiv cs.LG • 14:19
- PROFASR-BENCH: A Benchmark for Context-Conditioned ASR in High-Stakes Professional Speech \ Automatic Speech Recognition (ASR) in professional settings faces challenges that existing benchmarks underplay: dense domain terminology, formal register variation, and near-zero tolerance for critical entity errors. We present ProfASR-Be… \ Source • arXiv cs.CL • 19:43
- Think Parallax: Solving Multi-Hop Problems via Multi-View Knowledge-Graph-Based Retrieval-Augmented Generation \ Large language models (LLMs) excel at language understanding but often hallucinate and struggle with multi-hop reasoning. Knowledge-graph-based retrieval-augmented generation (KG-RAG) offers grounding, yet most methods rely on flat embeddi… \ Source • arXiv cs.CL • 08:32
- Machine Unlearning using Forgetting Neural Networks \ Modern computer systems store vast amounts of personal data, enabling advances in AI and ML but risking user privacy and trust. For privacy reasons, it is sometimes desired for an ML model to forget part of the data it was trained on. In t… \ Source • arXiv cs.LG • 16:15
- Assessing behaviour coverage in a multi-agent system simulation for autonomous vehicle testing \ As autonomous vehicle technology advances, ensuring the safety and reliability of these systems becomes paramount. Consequently, comprehensive testing methodologies are essential to evaluate the performance of autonomous vehicles in divers… \ Source • arXiv cs.LG • 14:02
- AKG kernel Agent: A Multi-Agent Framework for Cross-Platform Kernel Synthesis \ Modern AI models demand high-performance computation kernels. The growing complexity of LLMs, multimodal architectures, and recommendation systems, combined with techniques like sparsity and quantization, creates significant computational … \ Source • arXiv cs.LG • 13:42
- Post-Training Quantization of OpenPangu Models for Efficient Deployment on Atlas A2 \ Huawei's openPangu-Embedded-1B and openPangu-Embedded-7B, variants of the openPangu large language model, integrate three distinct Chain-of-Thought (CoT) reasoning paradigms, namely slow_think, auto_think, and no_think. While these CoT mod… \ Source • arXiv cs.LG • 11:50
- A new machine learning framework for occupational accidents forecasting with safety inspections integration \ We propose a model-agnostic framework for short-term occupational accident forecasting that leverages safety inspections and models accident occurrences as binary time series. The approach generates daily predictions, which are then aggreg… \ Source • arXiv cs.LG • 10:10
- Eliciting Behaviors in Multi-Turn Conversations \ Identifying specific and often complex behaviors from large language models (LLMs) in conversational settings is crucial for their evaluation. Recent work proposes novel techniques to find natural language prompts that induce specific beha… \ Source • arXiv cs.CL • 19:57
Big Tech
No items today.
Regulation & Standards
No items today.
Enterprise Practice
No items today.
Open-Source Tooling
No items today.
— Personal views, not IBM. No tracking. Curated automatically; links under 24h old.