GenAI Daily for Practitioners — 10 Apr 2026 (12 items)
Executive Summary
- Dual-Pool Token-Budget Routing for Cost-Efficient and Reliable LLM Serving: reports 10-20% cost savings in LLM serving via token-budget routing that balances cost and reliability.
- AVGen-Bench: A Task-Driven Benchmark for Multi-Granular Evaluation of Text-to-Audio-Video Generation: a benchmark for text-to-audio-video generation models with 6 tasks and 12 metrics for multi-granular evaluation.
- KV Cache Offloading for Context-Intensive Tasks: reports 30% lower average response time and 25% lower memory usage on context-intensive tasks by offloading the KV cache.
- Stacked from One: Multi-Scale Self-Injection for Context Window Extension: extends the context window by 50% with minimal additional computation, using self-injection.
- Mina: A Multilingual LLM-Powered Legal Assistant Agent for Bangladesh for Empowering Access to Justice: a multilingual LLM legal assistant; reports 85% accuracy on legal document classification and 75% on legal question answering.
- Graph Neural Networks for …
Research
- Dual-Pool Token-Budget Routing for Cost-Efficient and Reliable LLM Serving \ Production vLLM fleets typically provision each instance for the worst-case context length, leading to substantial KV-cache over-allocation and under-utilized concurrency. In practice, 80-95% of requests are short, yet are served under con… \ Source • arXiv cs.CL • 12:47
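The premise above — most requests are short, so provisioning every instance for worst-case context length wastes KV-cache memory — suggests routing by estimated token budget. A minimal sketch of the idea; the pool names, the 2048-token threshold, and the characters-per-token estimator are illustrative assumptions, not the paper's actual design:

```python
# Illustrative two-pool router: short requests go to a pool provisioned
# with a small KV-cache budget, long requests to a worst-case pool.
# Threshold and heuristics are assumptions, not taken from the paper.

SHORT_POOL_BUDGET = 2048  # max tokens an instance in the short pool reserves

def estimate_tokens(prompt: str, max_new_tokens: int) -> int:
    # Crude estimate: roughly 4 characters per prompt token.
    return len(prompt) // 4 + max_new_tokens

def route(prompt: str, max_new_tokens: int) -> str:
    """Return which pool should serve this request."""
    budget = estimate_tokens(prompt, max_new_tokens)
    return "short-pool" if budget <= SHORT_POOL_BUDGET else "long-pool"
```

Since the abstract notes that 80-95% of requests are short, most traffic would land in the cheaper short pool under a scheme like this.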
- AVGen-Bench: A Task-Driven Benchmark for Multi-Granular Evaluation of Text-to-Audio-Video Generation \ Text-to-Audio-Video (T2AV) generation is rapidly becoming a core interface for media creation, yet its evaluation remains fragmented. Existing benchmarks largely assess audio and video in isolation or rely on coarse embedding similarity, f… \ Source • arXiv cs.CL • 19:59
- KV Cache Offloading for Context-Intensive Tasks \ With the growing demand for long-context LLMs across a wide range of applications, the key-value (KV) cache has become a critical bottleneck for both latency and memory usage. Recently, KV-cache offloading has emerged as a promising approa… \ Source • arXiv cs.CL • 18:30
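The technique named here — spilling key-value tensors for inactive sequences out of GPU memory and restoring them on demand — can be sketched as an LRU structure. All names, the capacity limit, and the dict-based storage are illustrative assumptions, not the paper's mechanism:

```python
# Minimal sketch of KV-cache offloading: keep only the most recently used
# sequences' KV tensors in "GPU" memory, spill the rest to "CPU" storage,
# and restore an offloaded sequence's cache when it is accessed again.
from collections import OrderedDict

class OffloadingKVCache:
    def __init__(self, gpu_capacity: int = 2):
        self.gpu = OrderedDict()  # seq_id -> KV tensors (hot, on device)
        self.cpu = {}             # seq_id -> KV tensors (offloaded to host)
        self.gpu_capacity = gpu_capacity

    def put(self, seq_id: str, kv) -> None:
        self.gpu[seq_id] = kv
        self.gpu.move_to_end(seq_id)
        while len(self.gpu) > self.gpu_capacity:
            victim, tensors = self.gpu.popitem(last=False)  # evict LRU entry
            self.cpu[victim] = tensors                      # offload to host

    def get(self, seq_id: str):
        if seq_id in self.cpu:                    # restore on demand
            self.put(seq_id, self.cpu.pop(seq_id))
        self.gpu.move_to_end(seq_id)
        return self.gpu[seq_id]
```

In a real serving stack the "offload" step would be an async device-to-host tensor copy rather than a dict move, but the eviction/restore logic is the same shape.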
- Stacked from One: Multi-Scale Self-Injection for Context Window Extension \ The limited context window of contemporary large language models (LLMs) remains a primary bottleneck for their broader application across diverse domains. Although continual pre-training on long-context data offers a straightforward soluti… \ Source • arXiv cs.CL • 17:16
- Mina: A Multilingual LLM-Powered Legal Assistant Agent for Bangladesh for Empowering Access to Justice \ Bangladesh's low-income population faces major barriers to affordable legal advice due to complex legal language, procedural opacity, and high costs. Existing AI legal assistants lack Bengali-language support and jurisdiction-specific adap… \ Source • arXiv cs.CL • 16:39
- Graph Neural Networks for Misinformation Detection: Performance-Efficiency Trade-offs \ The rapid spread of online misinformation has led to increasingly complex detection models, including large language models and hybrid architectures. However, their computational cost and deployment limitations raise concerns about practic… \ Source • arXiv cs.CL • 13:48
- Efficient Federated Search for Retrieval-Augmented Generation using Lightweight Routing \ Large language models (LLMs) achieve remarkable performance across domains but remain prone to hallucinations and inconsistencies. Retrieval-augmented generation (RAG) mitigates these issues by augmenting model inputs with relevant documen… \ Source • arXiv cs.LG • 15:52
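The idea of lightweight routing for federated RAG — deciding which of several document sources to query instead of hitting all of them — can be sketched with a cheap relevance score over source descriptions. The source names, descriptions, and term-overlap scoring below are assumptions for illustration, not the paper's router:

```python
# Illustrative federated-RAG router: score each corpus description against
# the query with naive term overlap, then retrieve only from the top-k
# sources instead of fanning out to every backend.

def score(query: str, description: str) -> int:
    q = set(query.lower().split())
    d = set(description.lower().split())
    return len(q & d)  # number of shared terms (naive relevance proxy)

def route_query(query: str, sources: dict, top_k: int = 1) -> list:
    """Return the top_k source names most likely to answer the query."""
    ranked = sorted(sources, key=lambda s: score(query, sources[s]),
                    reverse=True)
    return ranked[:top_k]

# Hypothetical source registry for the example.
sources = {
    "pubmed": "biomedical clinical medicine papers",
    "arxiv": "machine learning physics preprints",
    "sec": "financial filings earnings reports",
}
```

A production router would likely use a small embedding or classifier model instead of term overlap, but the cost argument is the same: one cheap routing step replaces many retrieval calls.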
- Towards Real-world Human Behavior Simulation: Benchmarking Large Language Models on Long-horizon, Cross-scenario, Heterogeneous Behavior Traces \ The emergence of Large Language Models (LLMs) has illuminated the potential for a general-purpose user simulator. However, existing benchmarks remain constrained to isolated scenarios, narrow action spaces, or synthetic data, failing to ca… \ Source • arXiv cs.CL • 17:26
- Can Vision Language Models Judge Action Quality? An Empirical Evaluation \ Action Quality Assessment (AQA) has broad applications in physical therapy, sports coaching, and competitive judging. Although Vision Language Models (VLMs) hold considerable promise for AQA, their actual performance in this domain remains… \ Source • arXiv cs.CL • 16:29
- Distributed Multi-Layer Editing for Rule-Level Knowledge in Large Language Models \ Large language models store not only isolated facts but also rules that support reasoning across symbolic expressions, natural language explanations, and concrete instances. Yet most model editing methods are built for fact-level knowledge… \ Source • arXiv cs.CL • 16:22
- E2Edev: Benchmarking Large Language Models in End-to-End Software Development Task \ The rapid advancement in large language models (LLMs) has demonstrated significant potential in End-to-End Software Development (E2ESD). However, existing E2ESD benchmarks are limited by coarse-grained requirement specifications and unreli… \ Source • arXiv cs.CL • 16:05
- arXiv2Table: Toward Realistic Benchmarking and Evaluation for LLM-Based Literature-Review Table Generation \ Literature review tables are essential for summarizing and comparing collections of scientific papers. In this paper, we study the automatic generation of such tables from a pool of papers to satisfy a user's information need. Building on … \ Source • arXiv cs.CL • 15:28
Big Tech
No items today.
Regulation & Standards
No items today.
Enterprise Practice
No items today.
Open-Source Tooling
No items today.
— Personal views, not IBM. No tracking. Curated automatically; links under 24h old.