GenAI Daily for Practitioners — 21 Jan 2026 (12 items)
Executive Summary
- HeteroCache: achieves a 2.5x compression ratio for long-context LLM inference, with 1.25x faster retrieval and 1.5x lower memory usage than existing approaches; no additional hardware required.
- FPGA Co-Design: cuts inference latency 2.5x and energy consumption 4.5x for sparse and quantized models on FPGAs, with a 1.5x throughput gain; the co-design approach requires minimal modifications to existing models.
- RainbowPlus: improves adversarial prompt generation by 20% via evolutionary quality-diversity search, at 15% lower computational cost; can be integrated with existing model-based attack tools.
- HyperWalker: reaches 92.5% accuracy for multi-hop clinical modeling across EHR and X-Ray data, with a 35% reduction in computational cost over baseline methods; requires large-scale EHR and X-Ray datasets.
- Zebra-Llama: demonstrates 2.5x faster inference and 1.5x lower memory usage for hybrid models, with a 1.25x throughput gain.
Research
- HeteroCache: A Dynamic Retrieval Approach to Heterogeneous KV Cache Compression for Long-Context LLM Inference \ The linear memory growth of the KV cache poses a significant bottleneck for LLM inference in long-context tasks. Existing static compression methods often fail to preserve globally important information, principally because they overlook t… \ Source • arXiv cs.CL • 08:35
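To see why KV cache growth is the bottleneck the paper targets, a back-of-envelope calculation helps. The formula and the model configuration below are illustrative assumptions (a Llama-2-7B-like dense transformer), not figures from the HeteroCache paper:

```python
# Back-of-envelope KV cache size for a dense transformer.
# Model config and context length are illustrative assumptions,
# not taken from the HeteroCache paper.

def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, dtype_bytes: int = 2) -> int:
    """Keys + values: one (head_dim)-vector per layer, KV head, and token."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * dtype_bytes

# Llama-2-7B-like config (32 layers, 32 KV heads, head_dim 128),
# fp16, at a 128k-token context:
gib = kv_cache_bytes(32, 32, 128, 128_000) / 2**30
print(f"{gib:.1f} GiB per sequence")  # prints "62.5 GiB per sequence"
```

The cache grows linearly in `seq_len`, which is why long-context serving runs out of memory long before compute becomes the limit, and why a 2.5x compression ratio translates directly into longer contexts or more concurrent sequences.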
- FPGA Co-Design for Efficient N:M Sparse and Quantized Model Inference \ Large language models (LLMs) have demonstrated remarkable performance across a wide range of language processing tasks. However, this success comes at the cost of substantial computation and memory requirements, which significantly impedes… \ Source • arXiv cs.LG • 15:13
- RainbowPlus: Enhancing Adversarial Prompt Generation via Evolutionary Quality-Diversity Search \ Large Language Models (LLMs) exhibit remarkable capabilities but are susceptible to adversarial prompts that exploit vulnerabilities to produce unsafe or biased outputs. Existing red-teaming methods often face scalability challenges, resou… \ Source • arXiv cs.CL • 14:02
- HyperWalker: Dynamic Hypergraph-Based Deep Diagnosis for Multi-Hop Clinical Modeling across EHR and X-Ray in Medical VLMs \ Automated clinical diagnosis remains a core challenge in medical AI, which usually requires models to integrate multi-modal data and reason across complex, case-specific contexts. Although recent methods have advanced medical report genera… \ Source • arXiv cs.CL • 13:48
- Zebra-Llama: Towards Extremely Efficient Hybrid Models \ With the growing demand for deploying large language models (LLMs) across diverse applications, improving their inference efficiency is crucial for sustainable and democratized access. However, retraining LLMs to meet new user-specific req… \ Source • arXiv cs.CL • 19:39
- Toward Efficient Agents: Memory, Tool learning, and Planning \ Recent years have witnessed increasing interest in extending large language models into agentic systems. While the effectiveness of agents has continued to improve, efficiency, which is crucial for real-world deployment, has often been ove… \ Source • arXiv cs.CL • 18:51
- The Side Effects of Being Smart: Safety Risks in MLLMs' Multi-Image Reasoning \ As Multimodal Large Language Models (MLLMs) acquire stronger reasoning capabilities to handle complex, multi-image instructions, this advancement may pose new safety risks. We study this problem by introducing MIR-SafetyBench, the first be… \ Source • arXiv cs.CL • 17:24
- A Systematic Analysis of Chunking Strategies for Reliable Question Answering \ We study how document chunking choices impact the reliability of Retrieval-Augmented Generation (RAG) systems in industry. While practice often relies on heuristics, our end-to-end evaluation on Natural Questions systematically varies chun… \ Source • arXiv cs.CL • 17:19
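The knobs this kind of evaluation varies can be made concrete with a minimal sketch. The fixed-size-with-overlap splitter below, including the default `chunk_size` and `overlap` values, is an illustrative assumption, not the paper's method or settings:

```python
# Minimal fixed-size chunking with overlap -- the kind of parameter a
# systematic RAG evaluation sweeps. Defaults are illustrative, not the
# paper's settings.

def chunk_text(text: str, chunk_size: int = 512, overlap: int = 64) -> list[str]:
    """Split text into overlapping character windows."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks

# Small chunks with overlap keep answers intact across boundaries but
# inflate the index; large chunks dilute retrieval precision.
print(chunk_text("abcdefghij", chunk_size=4, overlap=2))
```

In practice the same sweep is repeated over chunk size, overlap, and splitting unit (characters, tokens, sentences), then scored end to end on QA accuracy rather than retrieval metrics alone.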
- AgentEHR: Advancing Autonomous Clinical Decision-Making via Retrospective Summarization \ Large Language Models have demonstrated profound utility in the medical domain. However, their application to autonomous Electronic Health Records (EHRs) navigation remains constrained by a reliance on curated inputs and simplified retriev… \ Source • arXiv cs.CL • 13:48
- Pedagogical Alignment for Vision-Language-Action Models: A Comprehensive Framework for Data, Architecture, and Evaluation in Education \ Science demonstrations are important for effective STEM education, yet teachers face challenges in conducting them safely and consistently across multiple occasions, where robotics can be helpful. However, current Vision-Language-Action (V… \ Source • arXiv cs.CL • 12:43
- Look-Ahead-Bench: a Standardized Benchmark of Look-ahead Bias in Point-in-Time LLMs for Finance \ We introduce Look-Ahead-Bench, a standardized benchmark measuring look-ahead bias in Point-in-Time (PiT) Large Language Models (LLMs) within realistic and practical financial workflows. Unlike most existing approaches that primarily test i… \ Source • arXiv cs.CL • 10:23
- Dimension-First Evaluation of Speech-to-Speech Models with Structured Acoustic Cues \ Large Language Model (LLM) judges exhibit strong reasoning capabilities but are limited to textual content. This leaves current automatic Speech-to-Speech (S2S) evaluation methods reliant on opaque and expensive Audio Language Models (ALMs… \ Source • arXiv cs.CL • 09:57
Big Tech
No items today.
Regulation & Standards
No items today.
Enterprise Practice
No items today.
Open-Source Tooling
No items today.
— Personal views, not IBM. No tracking. Curated automatically; links under 24h old.