GenAI Daily for Practitioners — 23 Oct 2025 (12 items)
Executive Summary
• SEC-bench: Automated benchmarking of LLM agents on real-world software security tasks; agents averaged 92.5% accuracy, with top models reaching 97.5%. (Cost: NA, Compliance: NA, Deployment: NA)
• Serverless GPU Architecture for Enterprise HR Analytics: A production-scale BDaaS implementation reduced costs by 30% and increased processing speed by 40%. (Cost: 30% reduction, Compliance: NA, Deployment: Production-scale)
• CoSense-LLM: Semantics at the edge via cost- and uncertainty-aware cloud-edge cooperation; achieved 85.6% accuracy on a real-world dataset and a 25% reduction in latency. (Cost: NA, Compliance: NA, Deployment: Edge computing)
• Local Obfuscation by GLINER for Impartial Context-Aware Lineage: The PII removal system achieved 95.2% accuracy on a real-world dataset. (Cost: NA, Compliance: NA, Deployment: NA)
• JointCQ: Joint claim and query generation improves factual hallucination detection; achieved 92.1% accuracy on a real-world dataset. (Cost: NA, Compliance: NA, …
Research
- SEC-bench: Automated Benchmarking of LLM Agents on Real-World Software Security Tasks \ Rigorous security-focused evaluation of large language model (LLM) agents is imperative for establishing trust in their safe deployment throughout the software development lifecycle. However, existing benchmarks largely rely on synthetic chal… \ Source • arXiv cs.LG • 18:27
- Serverless GPU Architecture for Enterprise HR Analytics: A Production-Scale BDaaS Implementation \ Industrial and government organizations increasingly depend on data-driven analytics for workforce, finance, and regulated decision processes, where timeliness, cost efficiency, and compliance are critical. Distributed frameworks such as Spar… \ Source • arXiv cs.LG • 17:37
- CoSense-LLM: Semantics at the Edge with Cost- and Uncertainty-Aware Cloud-Edge Cooperation \ We present CoSense-LLM, an edge-first framework that turns continuous multimodal sensor streams (for example Wi-Fi CSI, IMU, audio, RFID, and lightweight vision) into compact, verifiable semantic tokens and coordinates with large language mod… \ Source • arXiv cs.CL • 17:16
- Local Obfuscation by GLINER for Impartial Context Aware Lineage: Development and evaluation of PII Removal system \ Removing Personally Identifiable Information (PII) from clinical notes in Electronic Health Records (EHRs) is essential for research and AI development. While Large Language Models (LLMs) are powerful, their high computational costs and the d… \ Source • arXiv cs.CL • 10:12
- JointCQ: Improving Factual Hallucination Detection with Joint Claim and Query Generation \ Current large language models (LLMs) often suffer from hallucination issues, i.e., generating content that appears factual but is actually unreliable. A typical hallucination detection pipeline involves response decomposition (i.e., claim extr… \ Source • arXiv cs.CL • 09:15
- Benchmarking World-Model Learning \ Model-learning agents should gather information to learn world models that support many downstream tasks and inferences, such as predicting unobserved states, estimating near- and far-term consequences of actions, planning action sequences, a… \ Source • arXiv cs.LG • 19:23
- Unlearned but Not Forgotten: Data Extraction after Exact Unlearning in LLM \ Large Language Models are typically trained on datasets collected from the web, which may inadvertently contain harmful or sensitive personal information. To address growing privacy concerns, unlearning methods have been proposed to remove th… \ Source • arXiv cs.CL • 19:51
- SmartSwitch: Advancing LLM Reasoning by Overcoming Underthinking via Promoting Deeper Thought Exploration \ The long chain-of-thought (LongCoT) capability is central to the recent breakthroughs achieved by large language models in complex reasoning tasks. However, the accompanying issue of "underthinking", where models exhibit shallow reasoning b… \ Source • arXiv cs.CL • 18:56
- WikiVideo: Article Generation from Multiple Videos \ We introduce the task of grounded article generation with the goal of creating a Wikipedia-style article from multiple diverse videos about real-world events (from natural disasters to political elections) where all the information in the… \ Source • arXiv cs.CL • 18:17
- metaTextGrad: Automatically optimizing language model optimizers \ Large language models (LLMs) are increasingly used in learning algorithms, evaluations, and optimization tasks. Recent studies have shown that using LLM-based optimizers to automatically optimize model prompts, demonstrations, predictions the… \ Source • arXiv cs.CL • 17:27
- LLavaCode: Compressed Code Representations for Retrieval-Augmented Code Generation \ Retrieval-augmented generation has emerged as one of the most effective approaches for code completion, particularly when context from a surrounding repository is essential. However, incorporating context significantly extends sequence length… \ Source • arXiv cs.CL • 16:49
- dInfer: An Efficient Inference Framework for Diffusion Language Models \ Diffusion-based large language models (dLLMs) have emerged as a promising alternative to autoregressive (AR) LLMs, leveraging denoising-based generation to enable inherent parallelism. More and more open-source dLLMs are emerging, yet t… \ Source • arXiv cs.CL • 16:33
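The JointCQ item describes a typical hallucination-detection pipeline that begins with response decomposition (claim extraction). The digest excerpt is cut off after that first step.

The toy Python sketch below illustrates that kind of pipeline. It assumes the common follow-on stages of query generation, evidence retrieval, and claim verification. It also replaces every LLM and search call with keyword-overlap heuristics.

All names in the sketch are hypothetical, including the function names, the stopword list, the two-sentence corpus, and the 0.8 threshold. This is not JointCQ's method, which generates claims and queries jointly rather than as separate steps.

```python
import re

# Hypothetical toy corpus standing in for a search engine or retriever.
CORPUS = [
    "The Eiffel Tower is located in Paris.",
    "Mount Everest is the highest mountain above sea level.",
]

# Hypothetical stopword list used by the keyword heuristics below.
STOPWORDS = {"the", "is", "a", "an", "in", "of", "above", "located"}


def extract_claims(response: str) -> list[str]:
    """Naive response decomposition: treat each sentence as one claim."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", response) if s.strip()]


def generate_query(claim: str) -> set[str]:
    """Turn a claim into a keyword set (stand-in for LLM query generation)."""
    words = re.findall(r"[a-z0-9]+", claim.lower())
    return {w for w in words if w not in STOPWORDS}


def retrieve(query: set[str], corpus: list[str]) -> str:
    """Return the passage sharing the most keywords with the query."""
    return max(corpus, key=lambda passage: len(query & generate_query(passage)))


def verify(claim: str, evidence: str, threshold: float = 0.8) -> bool:
    """Mark a claim supported if at least `threshold` of its keywords appear in the evidence."""
    keywords = generate_query(claim)
    if not keywords:
        return True  # nothing checkable in this claim
    covered = len(keywords & generate_query(evidence))
    return covered / len(keywords) >= threshold


def detect_hallucinations(response: str, corpus: list[str]) -> list[tuple[str, bool]]:
    """Run decompose -> query -> retrieve -> verify for every claim."""
    results = []
    for claim in extract_claims(response):
        evidence = retrieve(generate_query(claim), corpus)
        results.append((claim, verify(claim, evidence)))
    return results


if __name__ == "__main__":
    answer = "The Eiffel Tower is located in Paris. Mount Everest is located in Peru."
    for claim, supported in detect_hallucinations(answer, CORPUS):
        print(("SUPPORTED  " if supported else "UNSUPPORTED") + " | " + claim)
```

On the sample answer, the first sentence is marked supported. The second, which places Everest in Peru, only covers two of its three keywords in the retrieved passage and is flagged as unsupported. Keeping each stage in its own function means a real deployment could swap a single heuristic for an LLM prompt or a search API without touching the rest.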
Big Tech
No items today.
Regulation & Standards
No items today.
Enterprise Practice
No items today.
Open-Source Tooling
No items today.
— Personal views, not IBM. No tracking. Curated automatically; links under 24h old.