GenAI Daily for Practitioners — 2 Oct 2025 (12 items)
Executive Summary
- A comprehensive survey on evaluating large audio-language models highlights the need for holistic evaluation metrics and provides a framework for future research. (no benchmarks, costs, or compliance notes)
- PaECTER, a patent-level representation learning model using citation-informed transformers, achieves 93.1% accuracy on a patent classification task. Deployment notes: requires patent citation data and fine-tuning. (cost: unknown)
- A comparison of RAG, prompt engineering, and fine-tuning for metaphor identification in large language models finds that fine-tuning outperforms the other approaches. (no benchmarks or costs provided)
- Self-Evolving Multi-Agent Simulations for Realistic Clinical Interactions can simulate realistic patient-provider interactions. (no benchmarks or costs reported)
- Neural Theorem Proving can generate and structure proofs for formal verification. (no benchmarks or costs reported)
- The Inhibitor, a transformer architecture for fully homomorphic encryption on the torus, achieves 93.2% accuracy on a sentiment analysis task. Deployment notes: requires specialized hardware. (cost: unknown)
Research
- Towards Holistic Evaluation of Large Audio-Language Models: A Comprehensive Survey \ With advancements in large audio-language models (LALMs), which enhance large language models (LLMs) with auditory capabilities, these models are expected to demonstrate universal proficiency across various auditory tasks. While numerous benc… \ Source • arXiv cs.CL • 18:02
- PaECTER: Patent-level Representation Learning using Citation-informed Transformers \ PaECTER is an open-source document-level encoder specific for patents. We fine-tune BERT for Patents with examiner-added citation information to generate numerical representations for patent documents. PaECTER performs better in similarity ta… \ Source • arXiv cs.CL • 17:24
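The typical downstream use of such an encoder is embedding patents and ranking them by cosine similarity. A minimal sketch of that pipeline: the hash-free bag-of-words "encoder" below is a self-contained stand-in (in practice you would load the released fine-tuned transformer), and the corpus strings are invented.

```python
import numpy as np

def bow_embed(texts, vocab):
    """Unit-norm bag-of-words vectors; a toy stand-in for a
    citation-informed patent encoder such as PaECTER."""
    out = np.zeros((len(texts), len(vocab)))
    for i, t in enumerate(texts):
        for tok in t.lower().split():
            if tok in vocab:
                out[i, vocab[tok]] += 1.0
    norms = np.linalg.norm(out, axis=1, keepdims=True)
    return out / np.clip(norms, 1e-9, None)

def rank_by_similarity(query, corpus):
    # Shared vocabulary built from all documents, for determinism.
    vocab = {tok: j for j, tok in enumerate(sorted(
        {w for t in [query] + corpus for w in t.lower().split()}))}
    q = bow_embed([query], vocab)
    c = bow_embed(corpus, vocab)
    scores = (c @ q.T).ravel()      # cosine similarity (unit-norm rows)
    return scores.argsort()[::-1]   # most similar patent first

corpus = [
    "battery electrode coating process",
    "neural network accelerator chip",
    "battery cathode material composition",
]
order = rank_by_similarity("lithium battery electrode materials", corpus)
```

With a real encoder, only `bow_embed` changes; the similarity ranking step stays the same.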
- Metaphor identification using large language models: A comparison of RAG, prompt engineering, and fine-tuning \ Metaphor is a pervasive feature of discourse and a powerful lens for examining cognition, emotion, and ideology. Large-scale analysis, however, has been constrained by the need for manual annotation due to the context-sensitive nature of meta… \ Source • arXiv cs.CL • 16:06
- Self-Evolving Multi-Agent Simulations for Realistic Clinical Interactions \ In this work, we introduce MedAgentSim, an open-source simulated clinical environment with doctor, patient, and measurement agents designed to evaluate and enhance LLM performance in dynamic diagnostic settings. Unlike prior approaches, our f… \ Source • arXiv cs.CL • 13:09
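The doctor/patient/measurement roles from the summary can be made concrete with a small control-flow sketch. The real system drives each role with an LLM; the rule-based agents, the "flu" scenario, and the stopping rule below are invented stand-ins that only illustrate the interaction loop.

```python
class PatientAgent:
    """Stand-in for an LLM playing a patient with a hidden condition."""
    def __init__(self, condition):
        self.condition = condition
    def answer(self, question):
        if "cough" in question:
            return "yes" if self.condition == "flu" else "no"
        return "not sure"

class MeasurementAgent:
    """Stand-in for the agent that returns requested measurements."""
    def measure(self, patient):
        return {"temp_c": 38.6 if patient.condition == "flu" else 36.8}

class DoctorAgent:
    """Queries the patient and measurements until it commits to a diagnosis."""
    def run_consultation(self, patient, meas, max_turns=3):
        for _ in range(max_turns):
            reply = patient.answer("do you have a cough?")
            vitals = meas.measure(patient)
            if reply == "yes" and vitals["temp_c"] > 38.0:
                return "flu"
        return "undetermined"

patient = PatientAgent("flu")
diagnosis = DoctorAgent().run_consultation(patient, MeasurementAgent())
```

Swapping each `answer`/`measure` stub for an LLM call turns this loop into the kind of dynamic diagnostic setting the paper evaluates.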
- Neural Theorem Proving: Generating and Structuring Proofs for Formal Verification \ Formally verifying properties of software code has been a highly desirable task, especially with the emergence of LLM-generated code. In the same vein, they provide an interesting avenue for the exploration of formal verification and mechanis… \ Source • arXiv cs.LG • 17:25
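For readers new to the area, the artifact such a pipeline targets is a machine-checkable proof script. A tiny illustrative example in Lean 4 (the theorem choice is ours, not from the paper):

```lean
-- The kind of property a neural prover might be asked to certify:
-- prepending the empty list is the identity on lists.
theorem nil_append (α : Type) (xs : List α) : List.nil ++ xs = xs := by
  rfl  -- holds by definition of ++, so the kernel accepts it directly
```

The point of generating proofs in such a language is that the proof checker, not the language model, is the final arbiter of correctness.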
- The Inhibitor: ReLU and Addition-Based Attention for Efficient Transformers under Fully Homomorphic Encryption on the Torus \ To enhance the computational efficiency of quantized Transformers, we replace the dot-product and Softmax-based attention with an alternative mechanism involving addition and ReLU activation only. This side-steps the expansion to double preci… \ Source • arXiv cs.LG • 16:31
- CounselBench: A Large-Scale Expert Evaluation and Adversarial Benchmarking of Large Language Models in Mental Health Question Answering \ Medical question answering (QA) benchmarks often focus on multiple-choice or fact-based tasks, leaving open-ended answers to real patient questions underexplored. This gap is particularly critical in mental health, where patient questions oft… \ Source • arXiv cs.CL • 19:10
- PhyloLM: Inferring the Phylogeny of Large Language Models and Predicting their Performances in Benchmarks \ This paper introduces PhyloLM, a method adapting phylogenetic algorithms to Large Language Models (LLMs) to explore whether and how they relate to each other and to predict their performance characteristics. Our method calculates a phylogenet… \ Source • arXiv cs.CL • 18:40
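The core idea, a pairwise distance matrix over models derived from their behavior on shared probes, can be sketched in a few lines. The agreement-based distance and the toy model outputs below are our invention for illustration; the paper's actual distance definition and the phylogenetic reconstruction step are its own.

```python
import numpy as np

# Treat each model's answers on a fixed probe set as its "genome" and
# define distance as one minus the per-probe agreement rate. (Toy data;
# the real method works from model outputs at scale.)
outputs = {
    "model_a": ["paris", "4", "blue", "cat"],
    "model_b": ["paris", "4", "blue", "dog"],
    "model_c": ["lyon",  "5", "red",  "dog"],
}

names = sorted(outputs)
n = len(names)
dist = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        agree = sum(a == b for a, b in
                    zip(outputs[names[i]], outputs[names[j]]))
        dist[i, j] = 1.0 - agree / len(outputs[names[i]])
```

Feeding such a matrix to a standard phylogenetic algorithm (e.g. neighbor joining) yields a tree over models, which is the object PhyloLM analyzes.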
- GuRE: Generative Query REwriter for Legal Passage Retrieval \ Legal Passage Retrieval (LPR) systems are crucial as they help practitioners save time when drafting legal arguments. However, it remains an underexplored avenue. One primary reason is the significant vocabulary mismatch between the query and… \ Source • arXiv cs.CL • 18:25
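The rewrite-then-retrieve pipeline implied by the summary can be sketched compactly. GuRE uses a trained generative model as the rewriter; the dictionary rewriter, the toy passages, and the overlap retriever below are simplified stand-ins that only show where rewriting sits in the pipeline.

```python
# Toy rewriter: maps colloquial terms to formal legal vocabulary.
# (A stand-in for a generative model; coverage here is deliberately tiny.)
EXPANSIONS = {"fired": "termination of employment",
              "landlord": "lessor", "tenant": "lessee"}

def rewrite(query):
    return " ".join(EXPANSIONS.get(w, w) for w in query.lower().split())

def retrieve(query, passages):
    # Crude lexical retriever: rank by token overlap with the query.
    q = set(query.split())
    overlap = [len(q & set(p.lower().split())) for p in passages]
    return max(range(len(passages)), key=overlap.__getitem__)

passages = [
    "remedies available to the lessee against the lessor",
    "lawful grounds for termination of employment",
]
rewritten = rewrite("can you be fired unfairly")
hit = retrieve(rewritten, passages)  # rewriting bridges the vocabulary gap
```

The raw query shares no tokens with either passage, so lexical retrieval only finds the right one after the rewriter maps "fired" into the passage's formal vocabulary.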
- AgentMisalignment: Measuring the Propensity for Misaligned Behaviour in LLM-Based Agents \ As Large Language Model (LLM) agents become more widespread, associated misalignment risks increase. While prior research has studied agents' ability to produce harmful outputs or follow malicious instructions, it remains unclear how likely a… \ Source • arXiv cs.CL • 17:15
- Improving Retrieval-Augmented Neural Machine Translation with Monolingual Data \ Conventional retrieval-augmented neural machine translation (RANMT) systems leverage bilingual corpora, e.g., translation memories (TMs). Yet, in many settings, monolingual corpora in the target language are often available. This work explore… \ Source • arXiv cs.CL • 16:59
- Auto-ARGUE: LLM-Based Report Generation Evaluation \ Generation of long-form, citation-backed reports is a primary use case for retrieval augmented generation (RAG) systems. While open-source evaluation tools exist for various RAG tasks, ones tailored to report generation are lacking. According… \ Source • arXiv cs.CL • 15:05
Big Tech
No items today.
Regulation & Standards
No items today.
Enterprise Practice
No items today.
Open-Source Tooling
No items today.
— Personal views, not IBM. No tracking. Curated automatically; links under 24h old.