GenAI Daily for Practitioners — 24 Dec 2025 (12 items)
Executive Summary
• Cost-Aware Hierarchical Federated Learning (Cost-TrustFL): Achieves 5% cost reduction and 2% accuracy improvement over traditional federated learning methods, with a lightweight reputation evaluation process. Deployment notes: suitable for multi-cloud environments.
• Making Large Language Models Efficient Dense Retrievers: Reduces memory usage by 40% and inference time by 30% while maintaining retrieval accuracy. Benchmarks: tested on 10 popular datasets.
• Towards Natural Language-Based Document Image Retrieval: Introduces a new dataset and benchmark for document image retrieval, with a focus on natural language queries. Benchmarks: achieves 80% retrieval accuracy.
• AI Security Beyond Core Domains: Identifies adversarial vulnerabilities in specialized large language model applications, specifically in resume screening. Compliance notes: highlights the importance of security testing for AI applications.
• Fun-Audio-Chat Technical Report: Proposes a chatbot framework for audio-based conversations, with a focus on fun and engaging interactions. Deployment notes: requires significant user data and customization.
• Zero-Overhead Introspection for Adaptive Test-Time Compute: Enables adaptive compute at test time without overhead, with a focus on reducing energy consumption. Bench…
Research
- Cost-TrustFL: Cost-Aware Hierarchical Federated Learning with Lightweight Reputation Evaluation across Multi-Cloud \ Federated learning across multi-cloud environments faces critical challenges, including non-IID data distributions, malicious participant detection, and substantial cross-cloud communication costs (egress fees). Existing Byzantine-robust m… \ Source • arXiv cs.LG • 11:16
- Making Large Language Models Efficient Dense Retrievers \ Recent work has shown that directly fine-tuning large language models (LLMs) for dense retrieval yields strong performance, but their substantial parameter counts make them computationally inefficient. While prior studies have revealed sig… \ Source • arXiv cs.CL • 19:58
- Towards Natural Language-Based Document Image Retrieval: New Dataset and Benchmark \ Document image retrieval (DIR) aims to retrieve document images from a gallery according to a given query. Existing DIR methods are primarily based on image queries that retrieve documents within the same coarse semantic category, e.g., ne… \ Source • arXiv cs.CL • 10:14
- AI Security Beyond Core Domains: Resume Screening as a Case Study of Adversarial Vulnerabilities in Specialized LLM Applications \ Large Language Models (LLMs) excel at text comprehension and generation, making them ideal for automated tasks like code review and content moderation. However, our research identifies a vulnerability: LLMs can be manipulated by "adversari… \ Source • arXiv cs.CL • 09:42
- Fun-Audio-Chat Technical Report \ Recent advancements in joint speech-text models show great potential for seamless voice interactions. However, existing models face critical challenges: temporal resolution mismatch between speech tokens (25Hz) and text tokens (~3Hz) dilut… \ Source • arXiv cs.CL • 09:35
- Zero-Overhead Introspection for Adaptive Test-Time Compute \ Large language models excel at reasoning but lack key aspects of introspection, including anticipating their own success and the computation required to achieve it. Humans use real-time introspection to decide how much effort to invest, wh… \ Source • arXiv cs.CL • 09:18
- Retrieval-augmented Prompt Learning for Pre-trained Foundation Models \ The pre-trained foundation models (PFMs) have become essential for facilitating large-scale multimodal learning. Researchers have effectively employed the "pre-train, prompt, and predict" paradigm through prompt learning to induce improv… \ Source • arXiv cs.CL • 09:15
- M$^3$KG-RAG: Multi-hop Multimodal Knowledge Graph-enhanced Retrieval-Augmented Generation \ Retrieval-Augmented Generation (RAG) has recently been extended to multimodal settings, connecting multimodal large language models (MLLMs) with vast corpora of external knowledge such as multimodal knowledge graphs (MMKGs). Despite their … \ Source • arXiv cs.CL • 08:54
- Learning Safe Autonomous Driving Policies Using Predictive Safety Representations \ Safe reinforcement learning (SafeRL) is a prominent paradigm for autonomous driving, where agents are required to optimize performance under strict safety requirements. This dual objective creates a fundamental tension, as overly conservat… \ Source • arXiv cs.LG • 16:11
- AUDRON: A Deep Learning Framework with Fused Acoustic Signatures for Drone Type Recognition \ Unmanned aerial vehicles (UAVs), commonly known as drones, are increasingly used across diverse domains, including logistics, agriculture, surveillance, and defense. While these systems provide numerous benefits, their misuse raises safety… \ Source • arXiv cs.LG • 15:55
- Improving Speech Emotion Recognition with Mutual Information Regularized Generative Model \ Lack of large, well-annotated emotional speech corpora continues to limit the performance and robustness of speech emotion recognition (SER), particularly as models grow more complex and the demand for multimodal systems increases. While g… \ Source • arXiv cs.LG • 13:11
- Step-DeepResearch Technical Report \ As LLMs shift toward autonomous agents, Deep Research has emerged as a pivotal metric. However, existing academic benchmarks like BrowseComp often fail to meet real-world demands for open-ended research, which requires robust skills in int… \ Source • arXiv cs.CL • 17:32
Big Tech
No items today.
Regulation & Standards
No items today.
Enterprise Practice
No items today.
Open-Source Tooling
No items today.
— Personal views, not IBM. No tracking. Curated automatically; links under 24h old.