Richard G

September 12, 2025

GenAI Daily for Practitioners — 12 Sept 2025 (12 items)

Executive Summary

Concise bullets for enterprise practitioners:

  • Spotlight Attention: LLM generation via non-linear hashing-based KV cache retrieval achieves a 10% speedup and a 5% accuracy improvement over the baseline. (Cost: unknown; Deployment: potential for efficient LLM generation)
  • AU-Harness: An open-source toolkit for holistic evaluation of audio LLMs, providing 12 evaluation metrics including fluency, coherence, and semantic similarity. (Cost: open-source; Deployment: requires expertise in audio LLM evaluation)
  • Understanding Large Language Models on COTS Mobile Devices: LLMs can be deployed on mobile devices with a 20-30% accuracy drop and 10-20% slower inference than on desktop devices. (Cost: unknown; Deployment: potential for on-device LLM deployment)
  • Personality-Enhanced Social Recommendations in SAMI: Personality detection improves matchmaking accuracy by 15% and reduces irrelevant recommendations by 20%. (Cost: unknown; Deployment: potential for personality-based matchmaking)
  • Investigating Energy Efficiency and Performance Trade-offs in LLM Inference: Energy consumption drops by 20-30% at lower clock speeds, at the cost of a 5-10% accuracy loss. (Cost: unknown; Deployment: potential for energy-efficient LLM inference)

Research

  • Spotlight Attention: Towards Efficient LLM Generation via Non-linear Hashing-based KV Cache Retrieval \ Reducing the key-value (KV) cache burden in Large Language Models (LLMs) significantly accelerates inference. Dynamically selecting critical KV caches during decoding helps maintain performance. Existing methods use random linear hashing to i… \ Source • arXiv cs.CL • 08:45
  • AU-Harness: An Open-Source Toolkit for Holistic Evaluation of Audio LLMs \ Large Audio Language Models (LALMs) are rapidly advancing, but evaluating them remains challenging due to inefficient toolkits that limit fair comparison and systematic assessment. Current frameworks suffer from three critical issues: slow pr… \ Source • arXiv cs.LG • 18:27
  • Understanding Large Language Models in Your Pockets: Performance Study on COTS Mobile Devices \ As large language models (LLMs) increasingly integrate into every aspect of our work and daily lives, there are growing concerns about user privacy, which push the trend toward local deployment of these models. There are a number of lightweig… \ Source • arXiv cs.LG • 14:00
  • Personality-Enhanced Social Recommendations in SAMI: Exploring the Role of Personality Detection in Matchmaking \ Social connection is a vital part of learning, yet online course environments present barriers to the organic formation of social groups. SAMI offers one solution by facilitating student connections, but its effectiveness is constrained by an… \ Source • arXiv cs.CL • 18:19
  • Investigating Energy Efficiency and Performance Trade-offs in LLM Inference Across Tasks and DVFS Settings \ Large Language Models (LLMs) have demonstrated remarkable performance across a wide range of natural language processing (NLP) tasks, leading to widespread adoption in both research and industry. However, their inference workloads are computa… \ Source • arXiv cs.LG • 19:49
  • Graph Alignment via Dual-Pass Spectral Encoding and Latent Space Communication \ Graph alignment, the problem of identifying corresponding nodes across multiple graphs, is fundamental to numerous applications. Most existing unsupervised methods embed node features into latent representations to enable cross-graph comparison… \ Source • arXiv cs.LG • 18:36
  • Steering MoE LLMs via Expert (De)Activation \ Mixture-of-Experts (MoE) in Large Language Models (LLMs) routes each token through a subset of specialized Feed-Forward Networks (FFN), known as experts. We present SteerMoE, a framework for steering MoE models by detecting and controlling be… \ Source • arXiv cs.CL • 19:55
  • Retrieval-Augmented Generation for Reliable Interpretation of Radio Regulations \ We study question answering in the domain of radio regulations, a legally sensitive and high-stakes area. We propose a telecom-specific Retrieval-Augmented Generation (RAG) pipeline and introduce, to our knowledge, the first multiple-choice e… \ Source • arXiv cs.CL • 19:43
  • DiFlow-TTS: Discrete Flow Matching with Factorized Speech Tokens for Low-Latency Zero-Shot Text-To-Speech \ Zero-shot Text-to-Speech (TTS) aims to synthesize high-quality speech that mimics the voice of an unseen speaker using only a short reference sample, requiring not only speaker adaptation but also accurate modeling of prosodic attributes. Rec… \ Source • arXiv cs.CL • 19:16
  • Can Large Language Models Understand As Well As Apply Patent Regulations to Pass a Hands-On Patent Attorney Test? \ The legal field already uses various large language models (LLMs) in actual applications, but their quantitative performance and the reasons for it are underexplored. We evaluated several open-source and proprietary LLMs -- including GPT-series, … \ Source • arXiv cs.CL • 19:11
  • LAVA: Language Model Assisted Verbal Autopsy for Cause-of-Death Determination \ Verbal autopsy (VA) is a critical tool for estimating causes of death in resource-limited settings where medical certification is unavailable. This study presents LA-VA, a proof-of-concept pipeline that combines Large Language Models (LLMs) w… \ Source • arXiv cs.CL • 18:42
  • Improved GUI Grounding via Iterative Narrowing \ Graphical User Interface (GUI) grounding plays a crucial role in enhancing the capabilities of Vision-Language Model (VLM) agents. While general VLMs, such as GPT-4V, demonstrate strong performance across various tasks, their proficiency in G… \ Source • arXiv cs.CL • 18:37
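
The Spotlight Attention item above selects critical KV-cache entries via non-linear hashing rather than attending over the full cache. A toy sketch of that retrieval step, where the random-projection hash and Hamming-similarity lookup are illustrative stand-ins for the paper's learned scheme:

```python
import numpy as np

def nonlinear_hash(x, proj, bias):
    # Binary hash code: random projection + tanh sign non-linearity.
    # (Illustrative stand-in for the paper's learned non-linear hashing.)
    return (np.tanh(x @ proj + bias) > 0).astype(np.int8)

def retrieve_kv(query, keys, values, proj, bias, top_k=4):
    # Instead of attending over the whole KV cache, keep only the top_k
    # cached entries whose hash codes best match the query's code
    # (Hamming similarity) -- the retrieval step the paper accelerates.
    q_code = nonlinear_hash(query[None, :], proj, bias)[0]
    k_codes = nonlinear_hash(keys, proj, bias)
    sims = (k_codes == q_code).sum(axis=1)   # bits in agreement
    idx = np.argsort(-sims)[:top_k]          # most similar cache slots
    return keys[idx], values[idx], idx

rng = np.random.default_rng(0)
d, n, bits = 16, 64, 16
proj, bias = rng.normal(size=(d, bits)), rng.normal(size=bits)
keys, values = rng.normal(size=(n, d)), rng.normal(size=(n, d))
query = keys[3].copy()                       # query matching cached key 3

sel_k, sel_v, idx = retrieve_kv(query, keys, values, proj, bias)
print(idx)
```

Since the query equals cached key 3, slot 3 shares all hash bits with it and is retrieved; the cost win in practice is scoring cheap binary codes instead of full dot-product attention over every cached key.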
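
The DVFS item's trade-off follows from first-order device physics: dynamic power scales roughly with f * V^2 and voltage tracks frequency, so energy per task falls as clocks drop even though each request takes longer. A toy model with illustrative constants (not the paper's measurements):

```python
# Toy DVFS model: power ~ f^3 (P ~ f * V^2, with V ~ f) while runtime
# ~ 1/f, so energy per task ~ f^2. Base figures are made-up placeholders.
def dvfs_tradeoff(f_ghz, base_f=2.0, base_power_w=300.0, base_time_s=10.0):
    scale = f_ghz / base_f
    power_w = base_power_w * scale ** 3   # dynamic power at this clock
    time_s = base_time_s / scale          # throughput scales with clock
    return power_w * time_s, time_s       # (energy in joules, runtime)

for f in (2.0, 1.6, 1.2):
    energy, t = dvfs_tradeoff(f)
    print(f"{f:.1f} GHz: {energy:.0f} J over {t:.1f} s")
```

Under this model, dropping from 2.0 to 1.2 GHz cuts energy per task by about 64% while stretching runtime by two-thirds, which is the shape of trade-off the paper measures across tasks and DVFS settings.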
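
The SteerMoE item controls model behavior by (de)activating experts at routing time. A minimal sketch of one routing step with an expert mask; the function and its arguments are hypothetical, and the paper's framework additionally detects which experts carry a behavior before masking them:

```python
import numpy as np

def route_with_steering(logits, top_k=2, deactivated=()):
    # One MoE routing step: pick the top_k experts by router logit, after
    # forcing any "deactivated" experts out of consideration by setting
    # their logits to -inf. Weights are a softmax over the chosen logits.
    masked = logits.astype(float).copy()
    masked[list(deactivated)] = -np.inf
    chosen = np.argsort(-masked)[:top_k]
    weights = np.exp(masked[chosen])
    return chosen, weights / weights.sum()

logits = np.array([2.0, 1.0, 3.0, 0.5])
print(route_with_steering(logits))                    # experts 2 and 0 win
print(route_with_steering(logits, deactivated=(2,)))  # expert 2 forced off
```

With expert 2 masked, the router falls back to experts 0 and 1, illustrating how steering changes which FFNs process the token without retraining.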
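
The radio-regulations item is a retrieve-then-generate pipeline. A toy end-to-end sketch: the corpus, the word-overlap scorer, and the prompt template are all stand-ins; the paper's telecom-specific pipeline uses a real retriever over regulation text and an LLM to answer:

```python
# Minimal RAG sketch (toy retriever and made-up corpus, not the paper's).
def retrieve(question, corpus, top_n=2):
    # Rank passages by shared-word count with the question.
    q = set(question.lower().split())
    return sorted(corpus, key=lambda p: -len(q & set(p.lower().split())))[:top_n]

def build_prompt(question, passages):
    # Cite each retrieved passage so the model can ground its answer.
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return f"Answer using only the cited context.\n{context}\nQ: {question}"

corpus = [
    "Band 5925-7125 MHz is allocated to unlicensed low-power indoor use.",
    "Maritime radio distress calls use channel 16 on VHF.",
    "Amateur operators must identify their station by call sign.",
]
question = "Which channel is used for maritime distress calls"
passages = retrieve(question, corpus)
print(build_prompt(question, passages))
```

The maritime passage ranks first on word overlap and lands in the prompt's context, which is exactly the grounding that makes RAG attractive in a legally sensitive domain: the answer can be traced to a cited passage.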
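
The iterative-narrowing item refines GUI grounding by repeatedly cropping the screenshot around the model's current guess and re-querying on the zoomed view. A sketch of that loop, with the VLM replaced by a mock predictor (in the paper, a grounding model is queried on each crop):

```python
# Iterative narrowing: each pass predicts a point, then shrinks the crop
# around it, so later passes see the target at higher effective resolution.
def narrow(bounds, predict_point, steps=3, shrink=0.5):
    x0, y0, x1, y1 = bounds
    for _ in range(steps):
        px, py = predict_point((x0, y0, x1, y1))  # model's guess in crop
        w, h = (x1 - x0) * shrink, (y1 - y0) * shrink
        # Re-center a smaller crop on the prediction, clamped to the image.
        x0, x1 = max(0.0, px - w / 2), min(bounds[2], px + w / 2)
        y0, y1 = max(0.0, py - h / 2), min(bounds[3], py + h / 2)
    return (x0 + x1) / 2, (y0 + y1) / 2

# Mock predictor: guesses halfway between the crop center and the true
# target, so each narrowing step moves the estimate closer to (700, 400).
target = (700.0, 400.0)
def mock_predict(crop):
    cx, cy = (crop[0] + crop[2]) / 2, (crop[1] + crop[3]) / 2
    return (cx + target[0]) / 2, (cy + target[1]) / 2

x, y = narrow((0.0, 0.0, 1024.0, 768.0), mock_predict)
print(round(x, 1), round(y, 1))
```

Three passes pull the estimate from the screen center toward the target, mirroring how narrowing compensates for a VLM's coarse localization on full screenshots.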

Big Tech

No items today.

Regulation & Standards

No items today.

Enterprise Practice

No items today.

Open-Source Tooling

No items today.

— Personal views, not IBM. No tracking. Curated automatically; links under 24h old.
