GenAI Daily for Practitioners — 21 Nov 2025 (12 items)
Executive Summary
- Text-based retrieval in multimodal RAG systems outperforms image-based retrieval in accuracy and efficiency, with a 10% improvement in MRR at a 1% increase in computational cost. (Item 1)
- XAI metrics can be unreliable and may lead to non-compliant AI systems; five reliable metrics, including accuracy and F1-score, are proposed to ensure explainability and compliance. (Item 2)
- The Oracle and Prism framework decouples explanation from generation in generative recommendation systems, achieving 25% better explanation quality at 15% lower computational cost. (Item 3)
- TurkColBERT achieves state-of-the-art results on Turkish information retrieval tasks, outperforming other models by 12-15% in MAP and MRR. (Item 4)
- Probing-based malicious input detection can fail to generalize due to overfitting; an alternative approach is proposed to improve robustness. (Item 5)
- AgentSwift's hierarchical search achieves 20% better efficiency and a 10% higher task completion rate than traditional LLM agent designs. (Item 6)
Research
- Comparison of Text-Based and Image-Based Retrieval in Multimodal Retrieval Augmented Generation Large Language Model Systems \ Recent advancements in Retrieval-Augmented Generation (RAG) have enabled Large Language Models (LLMs) to access multimodal knowledge bases containing both text and visual information such as charts, diagrams, and tables in financial docume… \ Source • arXiv cs.CL • 19:56
- Bridging the Gap in XAI-Why Reliable Metrics Matter for Explainability and Compliance \ Reliable explainability is not only a technical goal but also a cornerstone of private AI governance. As AI models enter high-stakes sectors, private actors such as auditors, insurers, certification bodies, and procurement agencies require… \ Source • arXiv cs.LG • 18:50
- The Oracle and The Prism: A Decoupled and Efficient Framework for Generative Recommendation Explanation \ The integration of Large Language Models (LLMs) into explainable recommendation systems often leads to a performance-efficiency trade-off in end-to-end architectures, where joint optimization of ranking and explanation can result in subopt… \ Source • arXiv cs.CL • 17:59
- TurkColBERT: A Benchmark of Dense and Late-Interaction Models for Turkish Information Retrieval \ Neural information retrieval systems excel in high-resource languages but remain underexplored for morphologically rich, lower-resource languages such as Turkish. Dense bi-encoders currently dominate Turkish IR, yet late-interaction models… \ Source • arXiv cs.CL • 17:42
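For readers unfamiliar with the "late-interaction" models the TurkColBERT item contrasts with dense bi-encoders: instead of comparing one vector per query and document, ColBERT-style scoring matches every query token embedding against its most similar document token embedding and sums the maxima (MaxSim). A minimal sketch of that scoring rule, with illustrative function names and toy embeddings rather than anything from the paper:

```python
import numpy as np

def l2_normalize(m):
    # Row-wise L2 normalization so dot products become cosine similarities.
    return m / np.linalg.norm(m, axis=1, keepdims=True)

def maxsim_score(query_tokens, doc_tokens):
    # Late-interaction (ColBERT-style) relevance: for each query token,
    # take the max similarity over all document tokens, then sum.
    sims = l2_normalize(query_tokens) @ l2_normalize(doc_tokens).T  # (Q, D)
    return sims.max(axis=1).sum()

# Toy example: both query tokens find an exact match among the doc tokens.
query = np.array([[1.0, 0.0], [0.0, 1.0]])
doc = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
score = maxsim_score(query, doc)  # 1.0 + 1.0 = 2.0
```

This per-token matching is what makes late interaction more expressive than a single-vector dot product, at the cost of storing one embedding per document token.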
- False Sense of Security: Why Probing-based Malicious Input Detection Fails to Generalize \ Large Language Models (LLMs) can comply with harmful instructions, raising serious safety concerns despite their impressive capabilities. Recent work has leveraged probing-based approaches to study the separability of malicious and benign … \ Source • arXiv cs.CL • 17:13
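The "probing-based" setup this paper critiques typically means training a small linear classifier on frozen hidden states to separate malicious from benign inputs. A minimal sketch of that standard setup (a hand-rolled logistic-regression probe on toy features; everything here is illustrative, not the paper's method) — the paper's point is that such a probe can latch onto dataset-specific surface patterns and fail to generalize off-distribution:

```python
import numpy as np

def fit_probe(hidden_states, labels, lr=0.5, epochs=300):
    # Logistic-regression probe on frozen representations, trained by
    # plain gradient descent on binary cross-entropy.
    X = np.asarray(hidden_states, dtype=float)
    y = np.asarray(labels, dtype=float)
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid
        grad = p - y                            # dBCE/dlogits
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return lambda Xn: (1.0 / (1.0 + np.exp(-(np.asarray(Xn) @ w + b))) > 0.5).astype(int)

# Linearly separable toy "hidden states": the probe fits them easily --
# which says nothing about how it behaves on a shifted distribution.
X = np.array([[1.0, 0.2], [2.0, -0.1], [-1.0, 0.3], [-2.0, 0.0]])
y = np.array([1, 1, 0, 0])
predict = fit_probe(X, y)
```

High in-distribution probe accuracy is exactly the "false sense of security" the title refers to.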
- AgentSwift: Efficient LLM Agent Design via Value-guided Hierarchical Search \ Large language model (LLM) agents have demonstrated strong capabilities across diverse domains, yet automated agent design remains a significant challenge. Current automated agent design approaches are often constrained by limited search s… \ Source • arXiv cs.CL • 16:55
- Contrastive vision-language learning with paraphrasing and negation \ Contrastive vision-language models continue to be the dominant approach for image and text retrieval. Contrastive Language-Image Pre-training (CLIP) trains two neural networks in contrastive manner to align their image and text embeddings … \ Source • arXiv cs.LG • 17:41
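The contrastive objective this item builds on is CLIP's symmetric InfoNCE loss: within a batch, the i-th image and i-th caption are a positive pair and every other pairing is a negative. A minimal NumPy sketch of that loss (illustrative names; real implementations use a learned temperature and autodiff):

```python
import numpy as np

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    # Symmetric InfoNCE as in CLIP. Inputs: L2-normalized rows, one
    # image/text pair per batch index.
    logits = img_emb @ txt_emb.T / temperature      # (B, B) similarity matrix
    targets = np.arange(logits.shape[0])            # positives on the diagonal

    def cross_entropy(l):
        l = l - l.max(axis=1, keepdims=True)        # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[targets, targets].mean()

    # Average the image->text and text->image directions.
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

The paper's concern with paraphrasing and negation fits here: this loss only sees which caption is paired with which image, so nothing forces the model to distinguish a caption from its negation if both land near the same image embedding.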
- Turning Up the Heat: Min-p Sampling for Creative and Coherent LLM Outputs \ Large Language Models (LLMs) generate text by sampling the next token from a probability distribution over the vocabulary at each decoding step. Popular sampling methods like top-p (nucleus sampling) often struggle to balance quality and d… \ Source • arXiv cs.CL • 19:07
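Min-p sampling is simple to state: instead of top-p's fixed cumulative-mass cutoff, the truncation threshold scales with the model's confidence — keep only tokens whose probability is at least `p_base` times the most likely token's probability, then renormalize and sample. A minimal sketch (parameter names are illustrative):

```python
import numpy as np

def min_p_sample(logits, p_base=0.1, temperature=1.0, rng=None):
    # Min-p sampling: confidence-scaled truncation. When the model is
    # confident (peaked distribution) the cutoff is high and few tokens
    # survive; when it is uncertain (flat distribution) the cutoff is
    # low and more diverse continuations remain.
    rng = rng or np.random.default_rng(0)
    z = np.asarray(logits, dtype=float) / temperature
    probs = np.exp(z - z.max())
    probs /= probs.sum()
    mask = probs >= p_base * probs.max()   # dynamic, confidence-scaled cutoff
    filtered = np.where(mask, probs, 0.0)
    filtered /= filtered.sum()
    return int(rng.choice(len(probs), p=filtered))
```

Because the threshold adapts per step, min-p can tolerate higher temperatures (more creativity) without admitting the long tail of incoherent tokens that top-p lets through on peaked distributions.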
- Probing the Critical Point (CritPt) of AI Reasoning: a Frontier Physics Research Benchmark \ While large language models (LLMs) with reasoning capabilities are progressing rapidly on high-school math competitions and coding, can they reason effectively through complex, open-ended challenges found in frontier physics research? And … \ Source • arXiv cs.CL • 19:01
- WER is Unaware: Assessing How ASR Errors Distort Clinical Understanding in Patient Facing Dialogue \ As Automatic Speech Recognition (ASR) is increasingly deployed in clinical dialogue, standard evaluations still rely heavily on Word Error Rate (WER). This paper challenges that standard, investigating whether WER or other common metrics c… \ Source • arXiv cs.CL • 17:59
- AutoJudge: Judge Decoding Without Manual Annotation \ We introduce AutoJudge, a method that accelerates large language model (LLM) inference with task-specific lossy speculative decoding. Instead of matching the original model output distribution token-by-token, we identify which of the gener… \ Source • arXiv cs.CL • 15:38
- One Pic is All it Takes: Poisoning Visual Document Retrieval Augmented Generation with a Single Image \ Retrieval-augmented generation (RAG) is instrumental for inhibiting hallucinations in large language models (LLMs) through the use of a factual knowledge base (KB). Although PDF documents are prominent sources of knowledge, text-based RAG … \ Source • arXiv cs.CL • 15:21
Big Tech
No items today.
Regulation & Standards
No items today.
Enterprise Practice
No items today.
Open-Source Tooling
No items today.
— Personal views, not IBM. No tracking. Curated automatically; links under 24h old.