GenAI Daily for Practitioners — 26 Aug 2025 (12 items)
Executive Summary
• Flash Sparse Attention: a new sparse attention kernel implementation reduces attention computation cost by 2x-4x, making it suitable for resource-constrained environments.
• MIRAGE: test-time inference can be accelerated 2.5x-5.5x using parallel graph-retrieval-augmented reasoning chains, with a negligible accuracy drop.
• Agri-Query: long-context LLMs outperform RAG-based approaches on cross-lingual technical question answering, with an average accuracy gain of 5.1%.
• Detecting Knowledge Boundary: a sampling-based inference method identifies the knowledge boundary of Vision LLMs with 92.5% accuracy.
• Evaluating Scoring Bias: LLM judges exhibit scoring bias in 65.4% of cases, underscoring the need for bias mitigation in LLM-as-a-Judge setups.
• LETToT: label-free evaluation of LLMs on tourism tasks reaches 84.2% accuracy using an expert Tree-of-Thought.
Research
- Flash Sparse Attention: An Alternative Efficient Implementation of Native Sparse Attention Kernel \ Recent progress in sparse attention mechanisms has demonstrated strong potential for reducing the computational cost of long-context training and inference in large language models (LLMs). Native Sparse Attention (NSA), a state-of-the-art app… \ Source • arXiv cs.LG • 19:22 • sketch after this list
- MIRAGE: Scaling Test-Time Inference with Parallel Graph-Retrieval-Augmented Reasoning Chains \ Large reasoning models (LRMs) have shown significant progress in test-time scaling through chain-of-thought prompting. Current approaches like search-o1 integrate retrieval augmented generation (RAG) into multi-step reasoning processes but re… \ Source • arXiv cs.CL • 19:53 • sketch after this list
- Agri-Query: A Case Study on RAG vs. Long-Context LLMs for Cross-Lingual Technical Question Answering \ We present a case study evaluating large language models (LLMs) with 128K-token context windows on a technical question answering (QA) task. Our benchmark is built on a user manual for an agricultural machine, available in English, French, an… \ Source • arXiv cs.CL • 16:54
- Detecting Knowledge Boundary of Vision Large Language Models by Sampling-Based Inference \ Despite the advancements made in Vision Large Language Models (VLLMs), like text Large Language Models (LLMs), they have limitations in addressing questions that require real-time information or are knowledge-intensive. Indiscriminately adopt… \ Source • arXiv cs.CL • 15:17 • sketch after this list
- Evaluating Scoring Bias in LLM-as-a-Judge \ The remarkable performance of Large Language Models (LLMs) gives rise to ``LLM-as-a-Judge'', where LLMs are employed as evaluators for complex tasks. Moreover, it has been widely adopted across fields such as Natural Language Processing (NLP),… \ Source • arXiv cs.CL • 09:37 • sketch after this list
- LETToT: Label-Free Evaluation of Large Language Models On Tourism Using Expert Tree-of-Thought \ Evaluating large language models (LLMs) in specific domains like tourism remains challenging due to the prohibitive cost of annotated benchmarks and persistent issues like hallucinations. We propose $\textbf{L}$abel-Free $\textbf{E}$valuation … \ Source • arXiv cs.CL • 08:40
- Amortized Sampling with Transferable Normalizing Flows \ Efficient equilibrium sampling of molecular conformations remains a core challenge in computational chemistry and statistical inference. Classical approaches such as molecular dynamics or Markov chain Monte Carlo inherently lack amortization;… \ Source • arXiv cs.LG • 18:28
- ReHub: Linear Complexity Graph Transformers with Adaptive Hub-Spoke Reassignment \ We present ReHub, a novel graph transformer architecture that achieves linear complexity through an efficient reassignment technique between nodes and virtual nodes. Graph transformers have become increasingly important in graph learning for … \ Source • arXiv cs.LG • 17:18 • sketch after this list
- TOMATO: Assessing Visual Temporal Reasoning Capabilities in Multimodal Foundation Models \ Existing benchmarks often highlight the remarkable performance achieved by state-of-the-art Multimodal Foundation Models (MFMs) in leveraging temporal context for video understanding. However, how well do the models truly perform visual tempo… \ Source • arXiv cs.CL • 19:57
- Why Synthetic Isn't Real Yet: A Diagnostic Framework for Contact Center Dialogue Generation \ Synthetic transcript generation is critical in contact center domains, where privacy and data scarcity limit model training and evaluation. Unlike prior synthetic dialogue generation work on open-domain or medical dialogues, contact center co… \ Source • arXiv cs.CL • 19:10
- EmoBench-M: Benchmarking Emotional Intelligence for Multimodal Large Language Models \ With the integration of Multimodal large language models (MLLMs) into robotic systems and various AI applications, embedding emotional intelligence (EI) capabilities into these models is essential for enabling robots to effectively address hu… \ Source • arXiv cs.CL • 18:34
- Backdoor Attacks on Dense Retrieval via Public and Unintentional Triggers \ Dense retrieval systems have been widely used in various NLP applications. However, their vulnerabilities to potential attacks have been underexplored. This paper investigates a novel attack scenario where the attackers aim to mislead the ret… \ Source • arXiv cs.CL • 10:33
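
Sketch for the Flash Sparse Attention item: a minimal, single-head NumPy toy of the block-sparse access pattern behind native sparse attention, where each query block runs exact attention over only a few selected key/value blocks. It illustrates the pattern only, not the FSA/NSA kernel; the block size, the mean-pooled selection heuristic, and all names are assumptions.

```python
# Toy block-sparse attention: each query block attends to a small set of
# selected KV blocks instead of the full sequence (non-causal, single head).
import numpy as np

def block_sparse_attention(q, k, v, block=64, keep=4):
    """q, k, v: (seq, dim), seq assumed to be a multiple of `block`."""
    seq, dim = q.shape
    nb = seq // block
    out = np.zeros_like(q)
    k_blocks = k[: nb * block].reshape(nb, block, dim)
    v_blocks = v[: nb * block].reshape(nb, block, dim)
    k_summary = k_blocks.mean(axis=1)                   # (nb, dim) pooled block keys
    for i in range(nb):
        qi = q[i * block:(i + 1) * block]                # (block, dim)
        # cheap selection score: query-block mean against pooled keys
        sel_scores = qi.mean(axis=0) @ k_summary.T       # (nb,)
        top = np.argsort(sel_scores)[-keep:]             # indices of kept KV blocks
        ks = k_blocks[top].reshape(-1, dim)
        vs = v_blocks[top].reshape(-1, dim)
        scores = qi @ ks.T / np.sqrt(dim)                # exact softmax attention,
        scores -= scores.max(axis=1, keepdims=True)      # but only over kept blocks
        w = np.exp(scores)
        w /= w.sum(axis=1, keepdims=True)
        out[i * block:(i + 1) * block] = w @ vs
    return out

# toy usage: 1024 tokens, 64-dim head, 4 of 16 KV blocks kept per query block
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((1024, 64)) for _ in range(3))
print(block_sparse_attention(q, k, v).shape)  # (1024, 64)
```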
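
Sketch for the MIRAGE item: a structural toy of parallel graph-retrieval-augmented reasoning chains, where several chains expand from different seed entities concurrently and their answers are combined by majority vote. The graph and LLM calls are stubs, and the hop limit, voting rule, and function names are assumptions rather than the paper's protocol.

```python
# Parallel retrieval-augmented reasoning chains with majority-vote aggregation.
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def run_chain(question, seed, neighbors_fn, reason_fn, max_hops=3):
    """One chain: gather facts around the current entity, let the model reason,
    follow the entity it proposes next, and stop once it commits to an answer."""
    facts, entity = [], seed
    for _ in range(max_hops):
        facts.extend(neighbors_fn(entity))
        answer, next_entity = reason_fn(question, facts)
        if answer is not None:
            return answer
        entity = next_entity
    return reason_fn(question, facts)[0] or "unknown"

def parallel_answer(question, seeds, neighbors_fn, reason_fn):
    with ThreadPoolExecutor(max_workers=len(seeds)) as pool:
        answers = pool.map(lambda s: run_chain(question, s, neighbors_fn, reason_fn), seeds)
        return Counter(answers).most_common(1)[0][0]     # vote across chains

# toy stubs standing in for a knowledge graph and an LLM reasoning step
kg = {"Marie Curie": ["born in Warsaw"], "Warsaw": ["capital of Poland"]}
neighbors = lambda e: kg.get(e, [])
reason = lambda q, facts: ("Poland", None) if any("Poland" in f for f in facts) else (None, "Warsaw")
print(parallel_answer("Which country was Marie Curie born in?",
                      ["Marie Curie", "Warsaw"], neighbors, reason))  # Poland
```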
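
Sketch for the knowledge-boundary item: a sampling-based self-consistency check that treats low agreement across sampled answers as a signal the question lies outside the model's knowledge, which is where a retrieval fallback would kick in. The sampler stub, the sample count, and the 0.6 threshold are illustrative assumptions, not values from the paper.

```python
# Sample several answers at temperature > 0 and flag low self-consistency.
import random
from collections import Counter

def outside_knowledge_boundary(question, sample_answer, n=8, threshold=0.6):
    """sample_answer(question) -> str, drawn with non-zero temperature."""
    answers = [sample_answer(question) for _ in range(n)]
    top_answer, top_count = Counter(answers).most_common(1)[0]
    consistency = top_count / n            # agreement rate of the modal answer
    return consistency < threshold, top_answer, consistency

# toy usage with a stub sampler standing in for a (V)LLM call
stub = lambda q: random.choice(["Paris", "Paris", "Lyon"])
print(outside_knowledge_boundary("Capital of France?", stub))
```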
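
Sketch for the scoring-bias item: a small probe that scores the same answer under superficially different but semantically equivalent judge prompts and reports the score spread; a non-zero spread is evidence of scoring bias. The judge stub and the prompt variants are assumptions, not the paper's evaluation protocol.

```python
# Probe an LLM judge with equivalent prompt variants and measure score spread.
from statistics import pstdev

def scoring_bias_probe(question, answer, judge, variants):
    """judge(prompt) -> float score; variants are templates with {q}/{a} slots
    that should all ask for the same judgment."""
    scores = [judge(v.format(q=question, a=answer)) for v in variants]
    return {"scores": scores,
            "spread": max(scores) - min(scores),
            "stdev": pstdev(scores)}

variants = [
    "Rate the answer to '{q}' from 1-5.\nAnswer: {a}",
    "Answer: {a}\nRate it from 1-5 for the question '{q}'.",
    "You are a strict grader. Rate the answer to '{q}' from 1-5.\nAnswer: {a}",
]
# toy judge stub; swap in a real LLM call to run the probe for real
print(scoring_bias_probe("2+2?", "4",
                         judge=lambda p: 5.0 - 0.5 * ("strict" in p),
                         variants=variants))
```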
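
Sketch for the ReHub item: a NumPy toy of the hub-spoke pattern behind linear-complexity graph transformers, where nodes exchange information only with a small set of virtual hub nodes and are adaptively reassigned to their best-matching hub each layer, giving roughly O(nodes x hubs) cost per layer. This is an assumed simplification of the idea, not ReHub's architecture.

```python
# One hub-spoke layer: reassign nodes to hubs, aggregate, and read back.
import numpy as np

def hub_spoke_layer(x, hubs):
    """x: (N, d) node features, hubs: (H, d) virtual hub features."""
    sim = x @ hubs.T                                   # (N, H) node-hub affinity
    assign = sim.argmax(axis=1)                        # adaptive reassignment step
    new_hubs = np.vstack([
        x[assign == h].mean(axis=0) if np.any(assign == h) else hubs[h]
        for h in range(hubs.shape[0])                  # hubs aggregate their spokes
    ])
    w = np.exp(sim - sim.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)                  # softmax over hubs only
    return x + w @ new_hubs, new_hubs                  # nodes read back from hubs

rng = np.random.default_rng(0)
x, hubs = rng.standard_normal((5000, 32)), rng.standard_normal((16, 32))
x, hubs = hub_spoke_layer(x, hubs)   # cost ~ 5000 * 16 per layer, linear in nodes
print(x.shape, hubs.shape)           # (5000, 32) (16, 32)
```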
Big Tech
No items today.
Regulation & Standards
No items today.
Enterprise Practice
No items today.
Open-Source Tooling
No items today.
— Personal views, not IBM. No tracking. Curated automatically; links under 24h old.