GenAI Daily for Practitioners — 7 Nov 2025 (12 items)
Executive Summary
• RAGalyst: Automated Human-Aligned Agentic Evaluation for Domain-Specific RAG
  + Benchmarks: Evaluates RAG models using human-aligned evaluation metrics (e.g., human evaluation, ROUGE-L)
  + Cost: Not specified
  + Compliance: N/A
  + Deployment: Potential application in AI-powered writing assistants and content generation tools
• From Model to Breach: Towards Actionable LLM-Generated Vulnerabilities Reporting
Research
- RAGalyst: Automated Human-Aligned Agentic Evaluation for Domain-Specific RAG \ Retrieval-Augmented Generation (RAG) is a critical technique for grounding Large Language Models (LLMs) in factual evidence, yet evaluating RAG systems in specialized, safety-critical domains remains a significant challenge. Existing evaluati… \ Source • arXiv cs.CL • 17:22
- From Model to Breach: Towards Actionable LLM-Generated Vulnerabilities Reporting \ As the role of Large Language Models (LLM)-based coding assistants in software development becomes more critical, so does the role of the bugs they generate in the overall cybersecurity landscape. While a number of LLM code security benchmark… \ Source • arXiv cs.CL • 17:52
- Black-Box Guardrail Reverse-engineering Attack \ Large language models (LLMs) increasingly employ guardrails to enforce ethical, legal, and application-specific constraints on their outputs. While effective at mitigating harmful responses, these guardrails introduce a new class of vulnerabi… \ Source • arXiv cs.CL • 10:24
- RadZero: Similarity-Based Cross-Attention for Explainable Vision-Language Alignment in Chest X-ray with Zero-Shot Multi-Task Capability \ Recent advancements in multimodal models have significantly improved vision-language (VL) alignment in radiology. However, existing approaches struggle to effectively utilize complex radiology reports for learning and offer limited interpreta… \ Source • arXiv cs.CL • 10:22
- CancerGUIDE: Cancer Guideline Understanding via Internal Disagreement Estimation \ The National Comprehensive Cancer Network (NCCN) provides evidence-based guidelines for cancer treatment. Translating complex patient presentations into guideline-compliant treatment recommendations is time-intensive, requires specialized exp… \ Source • arXiv cs.LG • 19:38
- Approximate non-linear model predictive control with safety-augmented neural networks \ Model predictive control (MPC) achieves stability and constraint satisfaction for general nonlinear systems, but requires computationally expensive online optimization. This paper studies approximations of such MPC controllers via neural netw… \ Source • arXiv cs.LG • 17:33
- GASP: Efficient Black-Box Generation of Adversarial Suffixes for Jailbreaking LLMs \ LLMs have shown impressive capabilities across various natural language processing tasks, yet remain vulnerable to input prompts, known as jailbreak attacks, carefully designed to bypass safety guardrails and elicit harmful responses. Traditi… \ Source • arXiv cs.LG • 13:34
- When retrieval outperforms generation: Dense evidence retrieval for scalable fake news detection \ The proliferation of misinformation necessitates robust yet computationally efficient fact verification systems. While current state-of-the-art approaches leverage Large Language Models (LLMs) for generating explanatory rationales, these meth… \ Source • arXiv cs.CL • 19:35
- Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm \ The "Thinking with Text" and "Thinking with Images" paradigms significantly improve the reasoning ability of large language models (LLMs) and Vision Language Models (VLMs). However, these paradigms have inherent limitations. (1) Images capture onl… \ Source • arXiv cs.CL • 18:25
- Quamba2: A Robust and Scalable Post-training Quantization Framework for Selective State Space Models \ State Space Models (SSMs) are emerging as a compelling alternative to Transformers because of their consistent memory usage and high performance. Despite this, scaling up SSMs on cloud services or limited-resource devices is challenging due t… \ Source • arXiv cs.CL • 17:22
- LiveSearchBench: An Automatically Constructed Benchmark for Retrieval and Reasoning over Dynamic Knowledge \ Evaluating large language models (LLMs) on question answering often relies on static benchmarks that reward memorization and understate the role of retrieval, failing to capture the dynamic nature of world knowledge. We present LiveSearchBenc… \ Source • arXiv cs.CL • 16:57
- Benchmarking LLM Faithfulness in RAG with Evolving Leaderboards \ Retrieval-augmented generation (RAG) aims to reduce hallucinations by grounding responses in external context, yet large language models (LLMs) still frequently introduce unsupported information or contradictions even when provided with relev… \ Source • arXiv cs.CL • 16:46
Big Tech
No items today.
Regulation & Standards
No items today.
Enterprise Practice
No items today.
Open-Source Tooling
No items today.
— Personal views, not IBM. No tracking. Curated automatically; links under 24h old.