GenAI Daily for Practitioners — 10 Sept 2025 (12 items)
Executive Summary
- Dovetail: 1.23x speedup in LLM inference on CPU/GPU heterogeneous platforms, with a 10% reduction in energy consumption.
- HALT-RAG: a task-adaptable framework for hallucination detection using calibrated NLI ensembles and abstention; reports 92.5% accuracy on a benchmark dataset.
- SciNLP: a domain-specific benchmark for full-text scientific entity and relation extraction in the NLP literature.
- Training LLMs to be Better Text Embedders: bidirectional reconstruction improves text embeddings by 12.5% on a standard benchmark.
- Accelerating Local AI on Consumer GPUs: a hardware-aware dynamic strategy for YOLOv10s, achieving a 23.1% speedup on consumer GPUs.
- MEBench: a benchmark for cross-document multi-entity question answering, evaluating LLMs' ability to consolidate answers to complex questions across documents.
Research
- Dovetail: A CPU/GPU Heterogeneous Speculative Decoding for LLM inference \ With the continuous advancement in the performance of large language models (LLMs), their demand for computational resources and memory has significantly increased, which poses major challenges for efficient inference on consumer-grade device… \ Source • arXiv cs.CL • 16:27
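The snippet doesn't detail Dovetail's CPU/GPU split, but for readers new to the area, the accept/reject loop at the heart of generic speculative decoding can be sketched with toy draft and target distributions. Everything below (vocabulary, probabilities, function names) is illustrative background, not Dovetail's API:

```python
import random

# Toy "models": next-token distributions over a 3-token vocabulary.
# In speculative decoding, a small draft model proposes tokens cheaply and the
# large target model verifies them; a CPU/GPU split would place the draft
# model on the CPU and the target model on the GPU.
def draft_probs(context):   # cheap draft model
    return {"a": 0.6, "b": 0.3, "c": 0.1}

def target_probs(context):  # expensive target model
    return {"a": 0.5, "b": 0.4, "c": 0.1}

def speculative_step(context, k=4, rng=random.Random(0)):
    """Propose up to k draft tokens; accept each with prob min(1, p_t/p_d)."""
    accepted = []
    for _ in range(k):
        d = draft_probs(context + accepted)
        token = rng.choices(list(d), weights=d.values())[0]
        t = target_probs(context + accepted)
        if rng.random() < min(1.0, t[token] / d[token]):
            accepted.append(token)  # target agrees often enough: keep it
        else:
            # Rejected: resample from the residual distribution max(0, p_t - p_d),
            # which preserves the target model's output distribution exactly.
            residual = {w: max(0.0, t[w] - d[w]) for w in t}
            if sum(residual.values()) > 0:
                token = rng.choices(list(residual), weights=residual.values())[0]
            else:
                token = rng.choices(list(t), weights=t.values())[0]
            accepted.append(token)
            break  # stop speculating after a rejection
    return accepted

print(speculative_step(["<s>"]))
```

The point of the scheme is that one expensive verification pass can validate several cheap draft tokens at once, which is where the reported speedup comes from.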
- HALT-RAG: A Task-Adaptable Framework for Hallucination Detection with Calibrated NLI Ensembles and Abstention \ Detecting content that contradicts or is unsupported by a given source text is a critical challenge for the safe deployment of generative language models. We introduce HALT-RAG, a post-hoc verification system designed to identify hallucinatio… \ Source • arXiv cs.CL • 09:58
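The abstract doesn't specify HALT-RAG's calibration procedure; as a minimal sketch of the general pattern it names (ensemble NLI scores, abstain near the decision boundary), with all thresholds and probabilities invented for illustration:

```python
def verify_claim(nli_scores, accept_thr=0.8, abstain_band=0.15):
    """Ensemble calibrated NLI 'supported' probabilities and decide.

    nli_scores: per-model probabilities that the claim is supported by the
    source text. Returns "supported", "hallucinated", or "abstain" when the
    ensemble mean is too close to the threshold to trust either label.
    """
    p = sum(nli_scores) / len(nli_scores)   # simple mean ensemble
    if abs(p - accept_thr) < abstain_band:  # low-confidence region: defer
        return "abstain"
    return "supported" if p >= accept_thr else "hallucinated"

print(verify_claim([0.95, 0.97, 0.99]))  # → supported
print(verify_claim([0.70, 0.75, 0.80]))  # mean 0.75, near threshold → abstain
print(verify_claim([0.10, 0.20, 0.15]))  # → hallucinated
```

Abstention trades coverage for precision: claims the ensemble can't confidently label are routed to a human or a stricter check instead of being silently passed through.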
- SciNLP: A Domain-Specific Benchmark for Full-Text Scientific Entity and Relation Extraction in NLP \ Structured information extraction from scientific literature is crucial for capturing core concepts and emerging trends in specialized fields. While existing datasets aid model development, most focus on specific publication sections due to d… \ Source • arXiv cs.CL • 16:41
- Training LLMs to be Better Text Embedders through Bidirectional Reconstruction \ Large language models (LLMs) have increasingly been explored as powerful text embedders. Existing LLM-based text embedding approaches often leverage the embedding of the final token, typically a reserved special token such as [EOS]. However, … \ Source • arXiv cs.CL • 09:39
- Accelerating Local AI on Consumer GPUs: A Hardware-Aware Dynamic Strategy for YOLOv10s \ As local AI grows in popularity, there is a critical gap between the benchmark performance of object detectors and their practical viability on consumer-grade hardware. While models like YOLOv10s promise real-time speeds, these metrics are ty… \ Source • arXiv cs.LG • 19:13
- MEBench: Benchmarking Large Language Models for Cross-Document Multi-Entity Question Answering \ Multi-entity question answering (MEQA) poses significant challenges for large language models (LLMs) and retrieval-augmented generation (RAG) systems, which frequently struggle to consolidate scattered information across diverse documents… \ Source • arXiv cs.CL • 16:24
- A Japanese Language Model and Three New Evaluation Benchmarks for Pharmaceutical NLP \ We present a Japanese domain-specific language model for the pharmaceutical field, developed through continual pretraining on 2 billion Japanese pharmaceutical tokens and 8 billion English biomedical tokens. To enable rigorous evaluation, we … \ Source • arXiv cs.CL • 15:48
- FinRAGBench-V: A Benchmark for Multimodal RAG with Visual Citation in the Financial Domain \ Retrieval-Augmented Generation (RAG) plays a vital role in the financial domain, powering applications such as real-time market analysis, trend forecasting, and interest rate computation. However, most existing RAG research in finance focuses… \ Source • arXiv cs.CL • 15:48
- TokenSelect: Efficient Long-Context Inference and Length Extrapolation for LLMs via Dynamic Token-Level KV Cache Selection \ Rapid advances in Large Language Models (LLMs) have spurred demand for processing extended context sequences in contemporary applications. However, this progress faces two challenges: performance degradation due to sequence lengths out-of-dis… \ Source • arXiv cs.CL • 15:30
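The snippet doesn't describe TokenSelect's actual selection criterion; as a toy stand-in for the general idea of token-level KV cache selection (score cached tokens against the current query, keep only the most relevant), with invented vectors and a plain dot-product score:

```python
def select_kv(query, keys, k=3):
    """Keep only the k cached token positions most relevant to the query.

    query: list[float]; keys: dict mapping token position -> key vector.
    Scores are query·key dot products, as in attention logits. This is an
    illustrative sketch, not TokenSelect's per-head selection method.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    ranked = sorted(keys, key=lambda pos: dot(query, keys[pos]), reverse=True)
    return sorted(ranked[:k])  # positions retained in the reduced KV cache

cache = {0: [1.0, 0.0], 1: [0.9, 0.1], 2: [0.0, 1.0], 3: [0.5, 0.5], 4: [-1.0, 0.2]}
print(select_kv([1.0, 0.0], cache, k=3))  # → [0, 1, 3]
```

Shrinking the attended KV set per step is what lets such methods cut long-context inference cost without retraining.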
- Avoiding Knowledge Edit Skipping in Multi-hop Question Answering with Guided Decomposition \ In a rapidly evolving world where information updates swiftly, knowledge in large language models (LLMs) becomes outdated quickly. Retraining LLMs is not a cost-effective option, making knowledge editing (KE) without modifying parameters part… \ Source • arXiv cs.CL • 11:49
- Debatable Intelligence: Benchmarking LLM Judges via Debate Speech Evaluation \ We introduce Debate Speech Evaluation as a novel and challenging benchmark for assessing LLM judges. Evaluating debate speeches requires a deep understanding of the speech at multiple levels, including argument strength and relevance, the coh… \ Source • arXiv cs.CL • 09:31
- Is Your LLM Overcharging You? Tokenization, Transparency, and Incentives \ State-of-the-art large language models require specialized hardware and substantial energy to operate. As a consequence, cloud-based services that provide access to large language models have become very popular. In these services, the price … \ Source • arXiv cs.LG • 19:37
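The incentive problem behind this last item: users are billed per token but generally cannot audit the provider's token count. A toy illustration of how two valid segmentations of the same string yield different bills (vocabulary, segmentations, and price are all invented):

```python
# Two valid segmentations of the same text under a toy BPE-like vocabulary.
# Per-token billing means the segmentation the provider reports sets the
# price, and the user typically cannot verify it. Numbers are illustrative.
VOCAB = {"trans", "former", "transformer", "s", " models"}
PRICE_PER_TOKEN = 0.00002  # $/token, invented

honest   = ["transformer", "s", " models"]      # 3 tokens
inflated = ["trans", "former", "s", " models"]  # 4 tokens, same string

assert "".join(honest) == "".join(inflated) == "transformers models"
assert all(t in VOCAB for t in honest + inflated)

for name, toks in [("honest", honest), ("inflated", inflated)]:
    print(f"{name}: {len(toks)} tokens -> ${len(toks) * PRICE_PER_TOKEN:.5f}")
```

At scale the gap compounds: a segmentation that is a few percent longer is, under per-token pricing, a few percent added to every bill with no observable change in output.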
Big Tech
No items today.
Regulation & Standards
No items today.
Enterprise Practice
No items today.
Open-Source Tooling
No items today.
— Personal views, not IBM. No tracking. Curated automatically; links under 24h old.