GenAI Daily for Practitioners — 26 Aug 2025 (12 items)
Executive Summary
• Flash Sparse Attention: a new sparse attention kernel implementation reduces attention computation cost by 2x-4x, making it suitable for resource-constrained environments.
• MIRAGE: test-time inference can be accelerated 2.5x-5.5x using parallel graph-retrieval-augmented reasoning chains, with a negligible accuracy drop.
• Agri-Query: long-context LLMs outperform RAG-based approaches on cross-lingual technical question answering, with an average accuracy gain of 5.1%.
• Detecting Knowledge Boundary: a sampling-based inference method identifies the knowledge boundary of Vision LLMs with 92.5% accuracy.
• Evaluating Scoring Bias: LLM judges exhibit scoring bias in 65.4% of cases, underscoring the need for bias mitigation in LLM-as-a-Judge setups.
• LETToT: label-free evaluation of LLMs on tourism tasks reaches 84.2% accuracy using an expert Tree-of-Thought.
Research
- Flash Sparse Attention: An Alternative Efficient Implementation of Native Sparse Attention Kernel \ Recent progress in sparse attention mechanisms has demonstrated strong potential for reducing the computational cost of long-context training and inference in large language models (LLMs). Native Sparse Attention (NSA), a state-of-the-art app… \ Source • arXiv cs.LG • 19:22 • sketch after this list
- MIRAGE: Scaling Test-Time Inference with Parallel Graph-Retrieval-Augmented Reasoning Chains \ Large reasoning models (LRMs) have shown significant progress in test-time scaling through chain-of-thought prompting. Current approaches like search-o1 integrate retrieval augmented generation (RAG) into multi-step reasoning processes but re… \ Source • arXiv cs.CL • 19:53 • sketch after this list
- Agri-Query: A Case Study on RAG vs. Long-Context LLMs for Cross-Lingual Technical Question Answering \ We present a case study evaluating large language models (LLMs) with 128K-token context windows on a technical question answering (QA) task. Our benchmark is built on a user manual for an agricultural machine, available in English, French, an… \ Source • arXiv cs.CL • 16:54
- Detecting Knowledge Boundary of Vision Large Language Models by Sampling-Based Inference \ Despite the advancements made in Vision Large Language Models (VLLMs), like text Large Language Models (LLMs), they have limitations in addressing questions that require real-time information or are knowledge-intensive. Indiscriminately adopt… \ Source • arXiv cs.CL • 15:17 • sketch after this list
- Evaluating Scoring Bias in LLM-as-a-Judge \ The remarkable performance of Large Language Models (LLMs) gives rise to ``LLM-as-a-Judge'', where LLMs are employed as evaluators for complex tasks. Moreover, it has been widely adopted across fields such as Natural Language Processing (NLP),… \ Source • arXiv cs.CL • 09:37 • sketch after this list
- LETToT: Label-Free Evaluation of Large Language Models On Tourism Using Expert Tree-of-Thought \ Evaluating large language models (LLMs) in specific domains like tourism remains challenging due to the prohibitive cost of annotated benchmarks and persistent issues like hallucinations. We propose $\textbf{L}$abel-Free $\textbf{E}$valuation … \ Source • arXiv cs.CL • 08:40
- Amortized Sampling with Transferable Normalizing Flows \ Efficient equilibrium sampling of molecular conformations remains a core challenge in computational chemistry and statistical inference. Classical approaches such as molecular dynamics or Markov chain Monte Carlo inherently lack amortization;… \ Source • arXiv cs.LG • 18:28
- ReHub: Linear Complexity Graph Transformers with Adaptive Hub-Spoke Reassignment \ We present ReHub, a novel graph transformer architecture that achieves linear complexity through an efficient reassignment technique between nodes and virtual nodes. Graph transformers have become increasingly important in graph learning for … \ Source • arXiv cs.LG • 17:18 • sketch after this list
- TOMATO: Assessing Visual Temporal Reasoning Capabilities in Multimodal Foundation Models \ Existing benchmarks often highlight the remarkable performance achieved by state-of-the-art Multimodal Foundation Models (MFMs) in leveraging temporal context for video understanding. However, how well do the models truly perform visual tempo… \ Source • arXiv cs.CL • 19:57
- Why Synthetic Isn't Real Yet: A Diagnostic Framework for Contact Center Dialogue Generation \ Synthetic transcript generation is critical in contact center domains, where privacy and data scarcity limit model training and evaluation. Unlike prior synthetic dialogue generation work on open-domain or medical dialogues, contact center co… \ Source • arXiv cs.CL • 19:10
- EmoBench-M: Benchmarking Emotional Intelligence for Multimodal Large Language Models \ With the integration of Multimodal large language models (MLLMs) into robotic systems and various AI applications, embedding emotional intelligence (EI) capabilities into these models is essential for enabling robots to effectively address hu… \ Source • arXiv cs.CL • 18:34
- Backdoor Attacks on Dense Retrieval via Public and Unintentional Triggers \ Dense retrieval systems have been widely used in various NLP applications. However, their vulnerabilities to potential attacks have been underexplored. This paper investigates a novel attack scenario where the attackers aim to mislead the ret… \ Source • arXiv cs.CL • 10:33
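
Sketch for the Flash Sparse Attention item: a minimal, single-head NumPy toy of the block-sparse access pattern behind native sparse attention, where each query block runs exact attention over only a few selected key/value blocks. It illustrates the pattern only, not the FSA/NSA kernel; the block size, the mean-pooled selection heuristic, and all names are assumptions.

```python
# Toy block-sparse attention: each query block attends to a small set of
# selected KV blocks instead of the full sequence (non-causal, single head).
import numpy as np

def block_sparse_attention(q, k, v, block=64, keep=4):
    """q, k, v: (seq, dim), seq assumed to be a multiple of `block`."""
    seq, dim = q.shape
    nb = seq // block
    out = np.zeros_like(q)
    k_blocks = k[: nb * block].reshape(nb, block, dim)
    v_blocks = v[: nb * block].reshape(nb, block, dim)
    k_summary = k_blocks.mean(axis=1)                   # (nb, dim) pooled block keys
    for i in range(nb):
        qi = q[i * block:(i + 1) * block]                # (block, dim)
        # cheap selection score: query-block mean against pooled keys
        sel_scores = qi.mean(axis=0) @ k_summary.T       # (nb,)
        top = np.argsort(sel_scores)[-keep:]             # indices of kept KV blocks
        ks = k_blocks[top].reshape(-1, dim)
        vs = v_blocks[top].reshape(-1, dim)
        scores = qi @ ks.T / np.sqrt(dim)                # exact softmax attention,
        scores -= scores.max(axis=1, keepdims=True)      # but only over kept blocks
        w = np.exp(scores)
        w /= w.sum(axis=1, keepdims=True)
        out[i * block:(i + 1) * block] = w @ vs
    return out

# toy usage: 1024 tokens, 64-dim head, 4 of 16 KV blocks kept per query block
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((1024, 64)) for _ in range(3))
print(block_sparse_attention(q, k, v).shape)  # (1024, 64)
```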
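
Sketch for the MIRAGE item: a structural toy of parallel graph-retrieval-augmented reasoning chains, where several chains expand from different seed entities concurrently and their answers are combined by majority vote. The graph and LLM calls are stubs, and the hop limit, voting rule, and function names are assumptions rather than the paper's protocol.

```python
# Parallel retrieval-augmented reasoning chains with majority-vote aggregation.
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def run_chain(question, seed, neighbors_fn, reason_fn, max_hops=3):
    """One chain: gather facts around the current entity, let the model reason,
    follow the entity it proposes next, and stop once it commits to an answer."""
    facts, entity = [], seed
    for _ in range(max_hops):
        facts.extend(neighbors_fn(entity))
        answer, next_entity = reason_fn(question, facts)
        if answer is not None:
            return answer
        entity = next_entity
    return reason_fn(question, facts)[0] or "unknown"

def parallel_answer(question, seeds, neighbors_fn, reason_fn):
    with ThreadPoolExecutor(max_workers=len(seeds)) as pool:
        answers = pool.map(lambda s: run_chain(question, s, neighbors_fn, reason_fn), seeds)
        return Counter(answers).most_common(1)[0][0]     # vote across chains

# toy stubs standing in for a knowledge graph and an LLM reasoning step
kg = {"Marie Curie": ["born in Warsaw"], "Warsaw": ["capital of Poland"]}
neighbors = lambda e: kg.get(e, [])
reason = lambda q, facts: ("Poland", None) if any("Poland" in f for f in facts) else (None, "Warsaw")
print(parallel_answer("Which country was Marie Curie born in?",
                      ["Marie Curie", "Warsaw"], neighbors, reason))  # Poland
```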
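
Sketch for the knowledge-boundary item: a sampling-based self-consistency check that treats low agreement across sampled answers as a signal the question lies outside the model's knowledge, which is where a retrieval fallback would kick in. The sampler stub, the sample count, and the 0.6 threshold are illustrative assumptions, not values from the paper.

```python
# Sample several answers at temperature > 0 and flag low self-consistency.
import random
from collections import Counter

def outside_knowledge_boundary(question, sample_answer, n=8, threshold=0.6):
    """sample_answer(question) -> str, drawn with non-zero temperature."""
    answers = [sample_answer(question) for _ in range(n)]
    top_answer, top_count = Counter(answers).most_common(1)[0]
    consistency = top_count / n            # agreement rate of the modal answer
    return consistency < threshold, top_answer, consistency

# toy usage with a stub sampler standing in for a (V)LLM call
stub = lambda q: random.choice(["Paris", "Paris", "Lyon"])
print(outside_knowledge_boundary("Capital of France?", stub))
```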
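
Sketch for the scoring-bias item: a small probe that scores the same answer under superficially different but semantically equivalent judge prompts and reports the score spread; a non-zero spread is evidence of scoring bias. The judge stub and the prompt variants are assumptions, not the paper's evaluation protocol.

```python
# Probe an LLM judge with equivalent prompt variants and measure score spread.
from statistics import pstdev

def scoring_bias_probe(question, answer, judge, variants):
    """judge(prompt) -> float score; variants are templates with {q}/{a} slots
    that should all ask for the same judgment."""
    scores = [judge(v.format(q=question, a=answer)) for v in variants]
    return {"scores": scores,
            "spread": max(scores) - min(scores),
            "stdev": pstdev(scores)}

variants = [
    "Rate the answer to '{q}' from 1-5.\nAnswer: {a}",
    "Answer: {a}\nRate it from 1-5 for the question '{q}'.",
    "You are a strict grader. Rate the answer to '{q}' from 1-5.\nAnswer: {a}",
]
# toy judge stub; swap in a real LLM call to run the probe for real
print(scoring_bias_probe("2+2?", "4",
                         judge=lambda p: 5.0 - 0.5 * ("strict" in p),
                         variants=variants))
```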
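
Sketch for the ReHub item: a NumPy toy of the hub-spoke pattern behind linear-complexity graph transformers, where nodes exchange information only with a small set of virtual hub nodes and are adaptively reassigned to their best-matching hub each layer, giving roughly O(nodes x hubs) cost per layer. This is an assumed simplification of the idea, not ReHub's architecture.

```python
# One hub-spoke layer: reassign nodes to hubs, aggregate, and read back.
import numpy as np

def hub_spoke_layer(x, hubs):
    """x: (N, d) node features, hubs: (H, d) virtual hub features."""
    sim = x @ hubs.T                                   # (N, H) node-hub affinity
    assign = sim.argmax(axis=1)                        # adaptive reassignment step
    new_hubs = np.vstack([
        x[assign == h].mean(axis=0) if np.any(assign == h) else hubs[h]
        for h in range(hubs.shape[0])                  # hubs aggregate their spokes
    ])
    w = np.exp(sim - sim.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)                  # softmax over hubs only
    return x + w @ new_hubs, new_hubs                  # nodes read back from hubs

rng = np.random.default_rng(0)
x, hubs = rng.standard_normal((5000, 32)), rng.standard_normal((16, 32))
x, hubs = hub_spoke_layer(x, hubs)   # cost ~ 5000 * 16 per layer, linear in nodes
print(x.shape, hubs.shape)           # (5000, 32) (16, 32)
```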
Big Tech
No items today.
Regulation & Standards
No items today.
Enterprise Practice
No items today.
Open-Source Tooling
No items today.
— Personal views, not IBM. No tracking. Curated automatically; links under 24h old.