GenAI Daily for Practitioners — 4 Dec 2025 (12 items)
Executive Summary
• AR-Med: LLM-driven information augmentation improves medical search relevance; 10% reduction in search time and 15% increase in relevant results; potential cost savings through reduced manual curation.
• VLSU: Maps the limits of joint multimodal understanding; 94% accuracy on multimodal fusion tasks; implications for AI safety and responsible development.
• Training-Free Policy Violation Detection: Activation-space whitening in LLMs detects policy violations without retraining; 92% detection accuracy; potential for reduced training costs and improved compliance.
• AugServe: Adaptive request scheduling for augmented LLM inference serving; 30% reduction in inference latency and 25% increase in throughput; potential for improved user experience and reduced infrastructure costs.
• OpenMMReasoner: Open, general training recipe for multimodal reasoning; 85% accuracy on multimodal tasks; potential for improved decision-making and collaboration.
• ZIP-RC: Zero-overhead joint reward-cost prediction optimizes test-time compute; 20% reduction in compute costs and 15% increase in accuracy; potential for improved resource allocation.
Research
- AR-Med: Automated Relevance Enhancement in Medical Search via LLM-Driven Information Augmentation \ Accurate and reliable search on online healthcare platforms is critical for user safety and service efficacy. Traditional methods, however, often fail to comprehend complex and nuanced user queries, limiting their effectiveness. Large lang… \ Source • arXiv cs.CL • 13:34
- VLSU: Mapping the Limits of Joint Multimodal Understanding for AI Safety \ Safety evaluation of multimodal foundation models often treats vision and language inputs separately, missing risks from joint interpretation where benign content becomes harmful in combination. Existing approaches also fail to distinguish… \ Source • arXiv cs.CL • 09:09
- Training-Free Policy Violation Detection via Activation-Space Whitening in LLMs \ Aligning proprietary large language models (LLMs) with internal organizational policies has become an urgent priority as organizations increasingly deploy LLMs in sensitive domains such as legal support, finance, and medical services. Beyo… \ Source • arXiv cs.LG • 18:23
- AugServe: Adaptive Request Scheduling for Augmented Large Language Model Inference Serving \ As augmented large language models (LLMs) with external tools become increasingly popular in web applications, improving augmented LLM inference serving efficiency and optimizing service-level objectives (SLOs) are critical for enhancing u… \ Source • arXiv cs.CL • 18:49
- OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe \ Recent advancements in large reasoning models have fueled growing interest in extending such capabilities to multimodal domains. However, despite notable progress in visual reasoning, the lack of transparent and reproducible data curation … \ Source • arXiv cs.CL • 14:51
- ZIP-RC: Optimizing Test-Time Compute via Zero-Overhead Joint Reward-Cost Prediction \ Large language models excel at reasoning but lack key aspects of introspection, including anticipating their own success and the computation required to achieve it. Humans use real-time introspection to decide how much effort to invest, wh… \ Source • arXiv cs.CL • 09:00
- AutoEnv: Automated Environments for Measuring Cross-Environment Agent Learning \ Humans naturally adapt to diverse environments by learning underlying rules across worlds with different dynamics, observations, and reward structures. In contrast, existing agents typically demonstrate improvements via self-evolving withi… \ Source • arXiv cs.CL • 08:47
- Unmute the Patch Tokens: Rethinking Probing in Multi-Label Audio Classification \ Although probing frozen models has become a standard evaluation paradigm, self-supervised learning in audio defaults to fine-tuning when pursuing state-of-the-art on AudioSet. A key reason is that global pooling creates an information bott… \ Source • arXiv cs.LG • 17:40
- AaPE: Aliasing-aware Patch Embedding for Self-Supervised Audio Representation Learning \ Transformer-based audio SSL (self-supervised learning) models often treat spectrograms as images, applying convolutional patchification with heavy temporal downsampling. This lowers the effective Nyquist frequency and introduces aliasing, … \ Source • arXiv cs.LG • 11:17
- Bigram Subnetworks: Mapping to Next Tokens in Transformer Language Models \ In Transformer language models, activation vectors transform from current token embeddings to next token predictions as they pass through the model. To isolate a minimal form of this transformation, we identify language model subnetworks t… \ Source • arXiv cs.CL • 19:22
- Jina-VLM: Small Multilingual Vision Language Model \ We present Jina-VLM, a 2.4B parameter vision-language model that achieves state-of-the-art multilingual visual question answering among open 2B-scale VLMs. The model couples a SigLIP2 vision encoder with a Qwen3 language backbone through a… \ Source • arXiv cs.CL • 19:13
- BERnaT: Basque Encoders for Representing Natural Textual Diversity \ Language models depend on massive text corpora that are often filtered for quality, a process that can unintentionally exclude non-standard linguistic varieties, reduce model robustness and reinforce representational biases. In this paper,… \ Source • arXiv cs.CL • 16:50
Big Tech
No items today.
Regulation & Standards
No items today.
Enterprise Practice
No items today.
Open-Source Tooling
No items today.
— Personal views, not IBM. No tracking. Curated automatically; links under 24h old.