Richard G

September 5, 2025

GenAI Daily for Practitioners — 5 Sept 2025 (12 items)

Executive Summary

Concise, non-sensational takeaways for enterprise practitioners:

  • Spotlight Attention: Achieves a 10% speedup in LLM generation using non-linear hashing-based KV cache retrieval, with negligible accuracy loss (arxiv.org/abs/2508.19740v2).
  • The Telephone Game: Evaluates semantic drift in unified models, finding 20% average drift in semantic meaning over 10 iterations (arxiv.org/abs/2509.04438v1).
  • Learning an Efficient Multi-Turn Dialogue Evaluator: Trains a dialogue evaluator from multiple judges, achieving 85% accuracy with 5 judges and 92% with 10 judges (arxiv.org/abs/2508.00454v2).
  • Training LLMs to be Better Text Embedders: Improves text embedding quality using bidirectional reconstruction, with a 12% absolute improvement in nearest-neighbor accuracy (arxiv.org/abs/2509.03020v2).
  • AudioCodecBench: A comprehensive benchmark for audio codec evaluation, providing 10x faster testing and 20% more accurate results (arxiv.org/abs/2509.02349v2).
  • MiniCPM4: Achieves a 30x smaller model size and …

Research

  • Spotlight Attention: Towards Efficient LLM Generation via Non-linear Hashing-based KV Cache Retrieval \ Reducing the key-value (KV) cache burden in Large Language Models (LLMs) significantly accelerates inference. Dynamically selecting critical KV caches during decoding helps maintain performance. Existing methods use random linear hashing to i… \ Source • arXiv cs.CL • 11:08 • Sketch below.
  • The Telephone Game: Evaluating Semantic Drift in Unified Models \ Employing a single, unified model (UM) for both visual understanding (image-to-text: I2T) and visual generation (text-to-image: T2I) has opened a new direction in Visual Language Model (VLM) research. While UMs can also support broader un… \ Source • arXiv cs.CL • 19:53 • Sketch below.
  • Learning an Efficient Multi-Turn Dialogue Evaluator from Multiple Judges \ Evaluating the conversational abilities of large language models (LLMs) remains a challenging task. Current mainstream approaches primarily rely on the "LLM-as-a-judge" paradigm, where an LLM is prompted to serve as an evaluator to assess dia… \ Source • arXiv cs.CL • 13:57
  • Training LLMs to be Better Text Embedders through Bidirectional Reconstruction \ Large language models (LLMs) have increasingly been explored as powerful text embedders. Existing LLM-based text embedding approaches often leverage the embedding of the final token, typically a reserved special token such as [EOS]. However, … \ Source • arXiv cs.CL • 10:02
  • AudioCodecBench: A Comprehensive Benchmark for Audio Codec Evaluation \ Multimodal Large Language Models (MLLMs) have been widely applied in speech and music. This tendency has led to a focus on audio tokenization for Large Models (LMs). Unlike semantic-only text tokens, audio tokens must both capture global sema… \ Source • arXiv cs.LG • 16:25
  • MiniCPM4: Ultra-Efficient LLMs on End Devices \ This paper introduces MiniCPM4, a highly efficient large language model (LLM) designed explicitly for end-side devices. We achieve this efficiency through systematic innovation in four key dimensions: model architecture, training data, traini… \ Source • arXiv cs.CL • 18:23
  • Context Reasoner: Incentivizing Reasoning Capability for Contextualized Privacy and Safety Compliance via Reinforcement Learning \ While Large Language Models (LLMs) exhibit remarkable capabilities, they also introduce significant safety and privacy risks. Current mitigation strategies often fail to preserve contextual reasoning capabilities in risky scenarios. Instead, … \ Source • arXiv cs.CL • 13:31
  • SLM-Bench: A Comprehensive Benchmark of Small Language Models on Environmental Impacts--Extended Version \ Small Language Models (SLMs) offer computational efficiency and accessibility, yet a systematic evaluation of their performance and environmental impact remains lacking. We introduce SLM-Bench, the first benchmark specifically designed to ass… \ Source • arXiv cs.CL • 13:23
  • IPA: An Information-Preserving Input Projection Framework for Efficient Foundation Model Adaptation \ Parameter-efficient fine-tuning (PEFT) methods, such as LoRA, reduce adaptation cost by injecting low-rank updates into pretrained weights. However, LoRA's down-projection is randomly initialized and data-agnostic, discarding potentially usef… \ Source • arXiv cs.LG • 19:10 • Sketch below.
  • PagedEviction: Structured Block-wise KV Cache Pruning for Efficient Large Language Model Inference \ KV caching significantly improves the efficiency of Large Language Model (LLM) inference by storing attention states from previously processed tokens, enabling faster generation of subsequent tokens. However, as sequence length increases, the… \ Source • arXiv cs.LG • 18:40 • Sketch below.
  • PAK-UCB Contextual Bandit: An Online Learning Approach to Prompt-Aware Selection of Generative Models and LLMs \ Selecting a sample generation scheme from multiple prompt-based generative models, including large language models (LLMs) and prompt-guided image and video generation models, is typically addressed by choosing the model that maximizes an aver… \ Source • arXiv cs.LG • 15:51 • Sketch below.
  • DUDE: Diffusion-Based Unsupervised Cross-Domain Image Retrieval \ Unsupervised cross-domain image retrieval (UCIR) aims to retrieve images of the same category across diverse domains without relying on annotations. Existing UCIR methods, which align cross-domain features for the entire image, often struggle… \ Source • arXiv cs.LG • 15:15
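
A rough sketch of the retrieval pattern behind the Spotlight Attention item, assuming plain random-hyperplane (sign) hashing and a Hamming-distance shortlist; the paper's non-linear hashing scheme is more sophisticated, and all names and sizes below are illustrative:

```python
# Illustrative hash-bucketed KV retrieval for sparse attention (not the
# paper's method): shortlist cache entries whose binary hash codes are
# close to the query's, then attend only over that subset.
import numpy as np

rng = np.random.default_rng(0)
d, n_tokens, n_bits, top_k = 64, 512, 16, 32

keys = rng.standard_normal((n_tokens, d)).astype(np.float32)
values = rng.standard_normal((n_tokens, d)).astype(np.float32)
planes = rng.standard_normal((d, n_bits)).astype(np.float32)  # hash projections

key_codes = (keys @ planes) > 0  # (n_tokens, n_bits) binary signatures

def sparse_attention(query: np.ndarray) -> np.ndarray:
    """Attend over only the top_k keys whose hash codes best match the query's."""
    q_code = (query @ planes) > 0
    hamming = (key_codes != q_code).sum(axis=1)  # cheap similarity proxy
    idx = np.argsort(hamming)[:top_k]            # candidate KV entries
    scores = keys[idx] @ query / np.sqrt(d)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ values[idx]

print(sparse_attention(rng.standard_normal(d).astype(np.float32)).shape)  # (64,)
```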
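
For the Telephone Game item, a skeleton of the drift-measurement loop: alternate T2I and I2T with a unified model and track how far each round's caption embedding moves from the original prompt. Here generate_image, caption_image, and embed_text are hypothetical stand-ins (toy stubs), not the paper's models or its drift metric:

```python
# Skeleton of a "telephone game" semantic-drift measurement. The three
# model functions are toy stubs; swap in a real unified model and encoder.
import numpy as np

def generate_image(prompt: str) -> np.ndarray:   # hypothetical T2I call
    return np.frombuffer(prompt.encode()[:16].ljust(16), dtype=np.uint8).astype(float)

def caption_image(image: np.ndarray) -> str:     # hypothetical I2T call
    return " ".join(str(int(x)) for x in image[:4])

def embed_text(text: str) -> np.ndarray:         # hypothetical text encoder
    vec = np.zeros(64)
    for tok in text.split():
        vec[hash(tok) % 64] += 1.0
    return vec / (np.linalg.norm(vec) + 1e-9)

def semantic_drift(prompt: str, iterations: int = 10) -> list:
    """Cosine distance of each round's caption from the original prompt."""
    ref, text, drift = embed_text(prompt), prompt, []
    for _ in range(iterations):
        text = caption_image(generate_image(text))  # T2I, then I2T
        drift.append(1.0 - float(embed_text(text) @ ref))
    return drift

print(semantic_drift("a red bicycle leaning on a brick wall"))
```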
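
On the IPA item: one natural way to make a LoRA-style down-projection "information-preserving" is to initialize it from the principal directions of sampled activations instead of random Gaussians. This sketch shows that general pattern only; the paper's exact construction may differ:

```python
# Sketch: initialize the LoRA "A" (down-projection) matrix from the top
# principal components of sampled hidden states, so the projection keeps
# the highest-variance directions rather than random ones.
import numpy as np

def pca_down_projection(activations: np.ndarray, rank: int) -> np.ndarray:
    """Return a (rank, d) projection spanning the top-variance directions."""
    centered = activations - activations.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:rank]  # rows are principal directions

rng = np.random.default_rng(3)
acts = rng.standard_normal((1024, 256))   # sampled hidden states, d = 256
A = pca_down_projection(acts, rank=8)     # data-aware LoRA "A"
B = np.zeros((256, 8))                    # LoRA "B" starts at zero as usual
delta_w = B @ A                           # low-rank update, initially zero
print(A.shape, delta_w.shape)             # (8, 256) (256, 256)
```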
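
On the PagedEviction item: the general shape of block-wise KV pruning is to score fixed-size blocks of cached tokens and evict whole blocks, so eviction stays aligned with paged storage. The scoring rule below (summed attention mass) is a stand-in, not the paper's criterion:

```python
# Illustrative block-wise KV eviction: keep the highest-scoring blocks of
# the cache and drop the rest, preserving original token order.
import numpy as np

def evict_blocks(keys, values, attn_mass, block_size=16, budget_blocks=8):
    """Keep the budget_blocks blocks with the highest summed attention mass."""
    n_blocks = len(attn_mass) // block_size
    scores = attn_mass[: n_blocks * block_size].reshape(n_blocks, block_size).sum(axis=1)
    keep = np.sort(np.argsort(scores)[-budget_blocks:])  # block ids, in order
    token_idx = (keep[:, None] * block_size + np.arange(block_size)).ravel()
    return keys[token_idx], values[token_idx]

rng = np.random.default_rng(2)
k = rng.standard_normal((256, 64))
v = rng.standard_normal((256, 64))
mass = rng.random(256)  # per-token accumulated attention weight (stand-in score)
k_small, v_small = evict_blocks(k, v, mass)
print(k_small.shape)  # (128, 64)
```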
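
Finally, on the PAK-UCB item: routing prompts among candidate generative models is a contextual bandit problem. Below is a simplified linear-UCB sketch; PAK-UCB's actual estimator differs, but the skeleton (a per-model reward fit plus an exploration bonus, both conditioned on prompt features) is the same idea:

```python
# Simplified LinUCB-style routing among generative models. Per model we fit
# a linear reward estimate on prompt features and add an exploration bonus.
import numpy as np

class PromptAwareUCB:
    def __init__(self, n_models: int, dim: int, alpha: float = 1.0):
        self.alpha = alpha
        self.A = [np.eye(dim) for _ in range(n_models)]    # per-model Gram matrix
        self.b = [np.zeros(dim) for _ in range(n_models)]  # per-model reward sums

    def select(self, x: np.ndarray) -> int:
        """Pick the model with the highest upper confidence bound for features x."""
        ucbs = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                              # ridge reward estimate
            ucbs.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(ucbs))

    def update(self, arm: int, x: np.ndarray, reward: float) -> None:
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x

# Toy loop: route each prompt to whichever "model" looks best in context.
rng = np.random.default_rng(1)
bandit = PromptAwareUCB(n_models=3, dim=8)
for _ in range(100):
    x = rng.standard_normal(8)    # stand-in prompt features
    arm = bandit.select(x)
    reward = float(x[arm] > 0)    # stand-in quality score for the chosen model
    bandit.update(arm, x, reward)
```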

Big Tech

No items today.

Regulation & Standards

No items today.

Enterprise Practice

No items today.

Open-Source Tooling

No items today.

— Personal views, not IBM. No tracking. Curated automatically; links under 24h old.
