GenAI Daily for Practitioners — 28 Oct 2025 (12 items)
Executive Summary
- Fast-MIA: Efficient and Scalable Membership Inference for LLMs
  - Achieves 95% accuracy with 10% of the data and 30% less computation than previous methods.
  - Scales to large language models.
- SimBench: Benchmarking the Ability of Large Language Models to Simulate Human Behaviors
  - Evaluates LLMs' ability to mimic human behaviors, including biases and errors.
  - Identifies areas for improvement in LLMs' human-like behavior.
Research
- Fast-MIA: Efficient and Scalable Membership Inference for LLMs \ We propose Fast-MIA (https://github.com/Nikkei/fast-mia), a Python library for efficiently evaluating membership inference attacks (MIA) against Large Language Models (LLMs). MIA against LLMs has emerged as a crucial challenge due to growing … \ Source • arXiv cs.CL • 08:18
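For context, a common family of MIA scores against LLMs works from per-token log-probabilities of a candidate text: low loss suggests the text was seen in training. The sketch below shows two generic scores (mean NLL and the Min-K% Prob variant); it is an illustration of the general technique, not Fast-MIA's API, and the threshold value is a hypothetical placeholder.

```python
def avg_nll(token_logprobs):
    # Mean negative log-likelihood of the candidate text under the model.
    return -sum(token_logprobs) / len(token_logprobs)

def min_k_score(token_logprobs, k=0.2):
    # Min-K% Prob: average the k% lowest token log-probs; member texts
    # tend to contain fewer surprising (low-probability) tokens.
    n = max(1, int(len(token_logprobs) * k))
    lowest = sorted(token_logprobs)[:n]
    return -sum(lowest) / n

def is_member(token_logprobs, threshold=2.0):
    # Flag the text as a likely training member when its average loss
    # falls below a calibrated threshold (value here is illustrative).
    return avg_nll(token_logprobs) < threshold
```

In practice the log-probs would come from a forward pass of the target model, and the threshold is calibrated on held-out non-member data.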
- SimBench: Benchmarking the Ability of Large Language Models to Simulate Human Behaviors \ Large language model (LLM) simulations of human behavior have the potential to revolutionize the social and behavioral sciences, if and only if they faithfully reflect real human behaviors. Current evaluations are fragmented, based on bespoke… \ Source • arXiv cs.CL • 15:17
- The Cross-Lingual Cost: Retrieval Biases in RAG over Arabic-English Corpora \ Cross-lingual retrieval-augmented generation (RAG) is a critical capability for retrieving and generating answers across languages. Prior work in this context has mostly focused on generation and relied on benchmarks derived from open-domain … \ Source • arXiv cs.CL • 08:40
- Bayes-Split-Edge: Bayesian Optimization for Constrained Collaborative Inference in Wireless Edge Systems \ Mobile edge devices (e.g., AR/VR headsets) typically need to complete timely inference tasks while operating with limited on-board computing and energy resources. In this paper, we investigate the problem of collaborative inference in wireles… \ Source • arXiv cs.LG • 17:36
- Think Twice: Branch-and-Rethink Reasoning Reward Model \ Large language models (LLMs) increasingly rely on thinking models that externalize intermediate steps and allocate extra test-time compute, with think-twice strategies showing that a deliberate second pass can elicit stronger reasoning. In co… \ Source • arXiv cs.CL • 18:58
- ISA-Bench: Benchmarking Instruction Sensitivity for Large Audio Language Models \ Large Audio Language Models (LALMs), which couple acoustic perception with large language models (LLMs) to extract and understand diverse information from audio, have attracted intense interest from both academic and industrial communities. H… \ Source • arXiv cs.CL • 18:31
- IPQA: A Benchmark for Core Intent Identification in Personalized Question Answering \ Intent identification serves as the foundation for generating appropriate responses in personalized question answering (PQA). However, existing benchmarks evaluate only response quality or retrieval performance without directly measuring inte… \ Source • arXiv cs.CL • 18:12
- AttentionRAG: Attention-Guided Context Pruning in Retrieval-Augmented Generation \ While RAG demonstrates remarkable capabilities in LLM applications, its effectiveness is hindered by the ever-increasing length of retrieved contexts, which introduces information redundancy and substantial computational overhead. Existing co… \ Source • arXiv cs.CL • 17:55
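The core idea of context pruning is simple to sketch: rank retrieved chunks by a relevance score and keep only what fits a token budget. The function below is a minimal, generic illustration under the assumption that per-chunk scores are already available (AttentionRAG derives them from attention signals; the scoring and the whitespace token count here are simplifications, not the paper's method).

```python
def prune_context(chunks, scores, token_budget):
    # Rank retrieved chunks by relevance score (highest first) and greedily
    # keep chunks that fit within the token budget.
    ranked = sorted(zip(chunks, scores), key=lambda p: p[1], reverse=True)
    kept, used = [], 0
    for chunk, _ in ranked:
        cost = len(chunk.split())  # crude whitespace token count for the sketch
        if used + cost <= token_budget:
            kept.append(chunk)
            used += cost
    # Preserve the original retrieval order among the kept chunks.
    return [c for c in chunks if c in kept]
```

A production pruner would use the model's real tokenizer for costs and attention-derived scores rather than externally supplied ones.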
- Cancer-Myth: Evaluating AI Chatbot on Patient Questions with False Presuppositions \ Cancer patients are increasingly turning to large language models (LLMs) for medical information, making it critical to assess how well these models handle complex, personalized questions. However, current medical benchmarks focus on medical … \ Source • arXiv cs.CL • 17:39
- Prompting is not Enough: Exploring Knowledge Integration and Controllable Generation \ Open-domain question answering (OpenQA) represents a cornerstone in natural language processing (NLP), primarily focused on extracting answers from unstructured textual data. With the rapid advancements in Large Language Models (LLMs), LLM-ba… \ Source • arXiv cs.CL • 15:08
- Quality-Aware Translation Tagging in Multilingual RAG system \ Multilingual Retrieval-Augmented Generation (mRAG) often retrieves English documents and translates them into the query language for low-resource settings. However, poor translation quality degrades response generation performance. Existing a… \ Source • arXiv cs.CL • 08:11
- RobotArena $\infty$: Scalable Robot Benchmarking via Real-to-Sim Translation \ The pursuit of robot generalists - instructable agents capable of performing diverse tasks across diverse environments - demands rigorous and scalable evaluation. Yet real-world testing of robot policies remains fundamentally constrained: it … \ Source • arXiv cs.LG • 18:41
Big Tech
No items today.
Regulation & Standards
No items today.
Enterprise Practice
No items today.
Open-Source Tooling
No items today.
— Personal views, not IBM. No tracking. Curated automatically; links under 24h old.