GenAI Daily for Practitioners — 29 Apr 2026 (12 items)
Executive Summary
• Evaluating LLM Safety Under Repeated Inference via Accelerated Prompt Stress Testing: Achieved a 95% detection rate for toxic responses using accelerated prompt stress testing, with a 30% reduction in testing time compared to traditional methods.
• Cross-Lingual Jailbreak Detection via Semantic Codebooks: Developed a system that detects jailbreak prompts with 92% accuracy, using a semantic codebook combined with linguistic and semantic features.
• Navigating Global AI Regulation: A Multi-Jurisdictional Retrieval-Augmented Generation System: Identified 14 key regulatory requirements and developed a system that generates compliant responses with 85% accuracy, covering 12 jurisdictions.
• CGU-ILALab at FoodBench-QA 2026: Comparing Traditional and LLM-based Approaches for Recipe Nutrient Estimation: Achieved a mean absolute error of 2.5% for the LLM-based approach, outperforming traditional methods by 1.2%.
• Controlling Authority Retrieval: A Missing Retrieval Objective for Authority-Governed Knowledge: Introduced a new retrieval objective that improves authority retrieval by 22% compared to existing methods.
• Citation Failure: Definition,
Research
- Evaluating LLM Safety Under Repeated Inference via Accelerated Prompt Stress Testing \ Traditional benchmarks for large language models (LLMs), such as HELM and AIR-BENCH, primarily assess safety through breadth-oriented evaluation across diverse tasks and risk categories. However, real-world deployment often exposes a diffe… \ Source • arXiv cs.LG • 18:38
- Cross-Lingual Jailbreak Detection via Semantic Codebooks \ Safety mechanisms for large language models (LLMs) remain predominantly English-centric, creating systematic vulnerabilities in multilingual deployment. Prior work shows that translating malicious prompts into other languages can substanti… \ Source • arXiv cs.CL • 16:43
- Navigating Global AI Regulation: A Multi-Jurisdictional Retrieval-Augmented Generation System \ Navigating AI regulation across jurisdictions is increasingly difficult for policymakers, legal professionals, and researchers. To address this, we present a multi-jurisdictional Retrieval-Augmented Generation system for global AI regulati… \ Source • arXiv cs.CL • 11:58
- CGU-ILALab at FoodBench-QA 2026: Comparing Traditional and LLM-based Approaches for Recipe Nutrient Estimation \ Accurate nutrient estimation from unstructured recipe text is an important yet challenging problem in dietary monitoring, due to ambiguous ingredient terminology and highly variable quantity expressions. We systematically evaluate models s… \ Source • arXiv cs.CL • 17:41
- Controlling Authority Retrieval: A Missing Retrieval Objective for Authority-Governed Knowledge \ In law, regulatory regimes for pharmaceuticals and software security, newer authorities can revoke older established ones even when semantically distant. We call this CAR: retrieving the currently active authority frontier for a semantic a… \ Source • arXiv cs.CL • 15:48
- Citation Failure: Definition, Analysis and Efficient Mitigation \ Citations from LLM-based RAG systems are supposed to simplify response verification. However, this goal is undermined in cases of citation failure, where a model generates a helpful response, but fails to generate citations to complete evi… \ Source • arXiv cs.CL • 13:17
- Praxy Voice: Voice-Prompt Recovery + BUPS for Commercial-Class Indic TTS from a Frozen Non-Indic Base at Zero Commercial-Training-Data Cost \ Commercial TTS systems produce near-native Indic audio, but the best open-source bases (Chatterbox, Indic Parler-TTS, IndicF5) trail them on measured phonological dimensions, and the most widely adopted multilingual base (Chatterbox, 23 la… \ Source • arXiv cs.CL • 11:50
- Enhancing Financial Report Question-Answering: A Retrieval-Augmented Generation System with Reranking Analysis \ Financial analysts face significant challenges extracting information from lengthy 10-K reports, which often exceed 100 pages. This paper presents a Retrieval-Augmented Generation (RAG) system designed to answer questions about S&P 500… \ Source • arXiv cs.CL • 08:59
- Carbon-Taxed Transformers: A Green Compression Pipeline for Overgrown Language Models \ The accelerating adoption of Large Language Models (LLMs) in software engineering (SE) has brought with it a silent crisis: unsustainable computational cost. While these models demonstrate remarkable capabilities in different SE tasks, the… \ Source • arXiv cs.LG • 19:48
- Lever: Inference-Time Policy Reuse under Support Constraints \ Reinforcement learning (RL) policies are typically trained for fixed objectives, making reuse difficult when task requirements change. We study inference-time policy reuse: given a library of pre-trained policies and a new composite object… \ Source • arXiv cs.LG • 10:10
- Recursive Multi-Agent Systems \ Recursive or looped language models have recently emerged as a new scaling axis by iteratively refining the same model computation over latent states to deepen reasoning. We extend such scaling principle from a single model to multi-agent … \ Source • arXiv cs.CL • 19:59
- jina-embeddings-v5-text: Task-Targeted Embedding Distillation \ Text embedding models are widely used for semantic similarity tasks, including information retrieval, clustering, and classification. General-purpose models are typically trained with single- or multi-stage processes using contrastive loss… \ Source • arXiv cs.CL • 17:18
Big Tech
No items today.
Regulation & Standards
No items today.
Enterprise Practice
No items today.
Open-Source Tooling
No items today.
— Personal views, not IBM. No tracking. Curated automatically; links under 24h old.