GenAI Daily for Practitioners — 3 Oct 2025 (12 items)
Executive Summary
• AccurateRAG: Achieves 84.1% accuracy in question-answering tasks with retrieval-augmented models; provides a framework for building accurate applications.
• Do LLMs Really Forget?: Unlearning evaluation metrics show that LLMs can forget up to 20% of learned knowledge; introduces knowledge correlation and confidence awareness for more accurate evaluation.
• The Hidden Costs of Translation Accuracy: An analysis of distillation, quantization, and environmental impact reveals significant costs associated with achieving high translation accuracy; recommends cost-benefit analysis for translation projects.
• Comparing Contrastive and Triplet Loss: Study finds that contrastive loss outperforms triplet loss in audio-visual embedding tasks; analyzes intra-class variance and greediness.
• DrKGC: Achieves 92.1% accuracy in knowledge graph completion tasks across general and biomedical domains with dynamic subgraph retrieval-augmented LLMs.
• A Rigorous Benchmark for Deep Research Agents: Introduces a benchmark with multidimensional evaluation for deep research agents, covering both answers and reports.
Research
- AccurateRAG: A Framework for Building Accurate Retrieval-Augmented Question-Answering Applications \ We introduce AccurateRAG -- a novel framework for constructing high-performance question-answering applications based on retrieval-augmented generation (RAG). Our framework offers a pipeline for development efficiency with tools for raw datas… \ Source • arXiv cs.CL • 19:30
- Do LLMs Really Forget? Evaluating Unlearning with Knowledge Correlation and Confidence Awareness \ Machine unlearning techniques aim to mitigate unintended memorization in large language models (LLMs). However, existing approaches predominantly focus on the explicit removal of isolated facts, often overlooking latent inferential dependenci… \ Source • arXiv cs.CL • 16:15
- The Hidden Costs of Translation Accuracy: Distillation, Quantization, and Environmental Impact \ The rapid expansion of large language models (LLMs) has heightened concerns about their computational and environmental costs. This study investigates the trade-offs between translation quality and efficiency by comparing full-scale, distille… \ Source • arXiv cs.CL • 15:15
- Comparing Contrastive and Triplet Loss in Audio-Visual Embedding: Intra-Class Variance and Greediness Analysis \ Contrastive loss and triplet loss are widely used objectives in deep metric learning, yet their effects on representation quality remain insufficiently understood. We present a theoretical and empirical comparison of these losses, focusing on… \ Source • arXiv cs.LG • 18:11
- DrKGC: Dynamic Subgraph Retrieval-Augmented LLMs for Knowledge Graph Completion across General and Biomedical Domains \ Knowledge graph completion (KGC) aims to predict missing triples in knowledge graphs (KGs) by leveraging existing triples and textual information. Recently, generative large language models (LLMs) have been increasingly employed for graph tas… \ Source • arXiv cs.CL • 18:56
- A Rigorous Benchmark with Multidimensional Evaluation for Deep Research Agents: From Answers to Reports \ Artificial intelligence is undergoing the paradigm shift from closed language models to interconnected agent systems capable of external perception and information integration. As a representative embodiment, Deep Research Agents (DRAs) syste… \ Source • arXiv cs.CL • 18:40
- Stream RAG: Instant and Accurate Spoken Dialogue Systems with Streaming Tool Usage \ End-to-end speech-in speech-out dialogue systems are emerging as a powerful alternative to traditional ASR-LLM-TTS pipelines, generating more natural, expressive responses with significantly lower latency. However, these systems remain prone … \ Source • arXiv cs.CL • 16:18
- LLM-Based Multi-Task Bangla Hate Speech Detection: Type, Severity, and Target \ Online social media platforms are central to everyday communication and information seeking. While these platforms serve positive purposes, they also provide fertile ground for the spread of hate speech, offensive language, and bullying conte… \ Source • arXiv cs.CL • 15:17
- AlgoTune: Can Language Models Speed Up General-Purpose Numerical Programs? \ Despite progress in language model (LM) capabilities, evaluations have thus far focused on models' performance on tasks that humans have previously solved, including in programming (Jimenez et al., 2024) and mathematics (Glazer et al., 2024).… \ Source • arXiv cs.CL • 12:23
- Benchmarking Foundation Models with Retrieval-Augmented Generation in Olympic-Level Physics Problem Solving \ Retrieval-augmented generation (RAG) with foundation models has achieved strong performance across diverse tasks, but their capacity for expert-level reasoning, such as solving Olympiad-level physics problems, remains largely unexplored. Inspir… \ Source • arXiv cs.CL • 11:55
- BiasLab: Toward Explainable Political Bias Detection with Dual-Axis Annotations and Rationale Indicators \ We present BiasLab, a dataset of 300 political news articles annotated for perceived ideological bias. These articles were selected from a curated 900-document pool covering diverse political events and source biases. Each article is labeled … \ Source • arXiv cs.CL • 10:42
- Comparison of Unsupervised Metrics for Evaluating Judicial Decision Extraction \ The rapid advancement of artificial intelligence in legal natural language processing demands scalable methods for evaluating text extraction from judicial decisions. This study evaluates 16 unsupervised metrics, including novel formulations,… \ Source • arXiv cs.CL • 10:32
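Practitioner note on the contrastive vs. triplet loss item above: for readers who want the two objectives side by side, here is a minimal sketch of their common textbook forms in plain Python. It is not the paper's implementation. The function names, the Euclidean distance, and the default `margin=1.0` are all assumptions chosen for illustration.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(x1, x2, same_class, margin=1.0):
    """Pairwise contrastive loss (textbook form, illustrative only).

    Same-class pairs are penalized by their squared distance;
    different-class pairs are penalized only while closer than `margin`.
    """
    d = euclidean(x1, x2)
    if same_class:
        return d ** 2
    return max(0.0, margin - d) ** 2


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Triplet loss (textbook form, illustrative only).

    Zero once the negative is at least `margin` farther from the
    anchor than the positive is.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)
```

In these forms, the contrastive term for a same-class pair is nonzero until the two embeddings coincide, while the triplet term drops to zero as soon as the margin is satisfied. That difference is one way to read the item's focus on intra-class variance, though the truncated abstract does not confirm which formulations the authors use.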
Big Tech
No items today.
Regulation & Standards
No items today.
Enterprise Practice
No items today.
Open-Source Tooling
No items today.
— Personal views, not IBM. No tracking. Curated automatically; links under 24h old.