GenAI Daily for Practitioners — 17 Sept 2025 (12 items)
Executive Summary
• HiChunk: Evaluating and Enhancing Retrieval-Augmented Generation with Hierarchical Chunking
  + Improved retrieval-augmented generation performance by 10-20% using hierarchical chunking.
  + No significant increase in computational cost or memory usage.
• The LLM Already Knows: Estimating LLM-Perceived Question Difficulty via Hidden Representations
  + Estimates question difficulty from the LLM's hidden representations with 85% accuracy.
  + Potential applications in question answering and dialogue systems.
Research
- HiChunk: Evaluating and Enhancing Retrieval-Augmented Generation with Hierarchical Chunking \ Retrieval-Augmented Generation (RAG) enhances the response capabilities of language models by integrating external knowledge sources. However, document chunking, an important part of RAG systems, often lacks effective evaluation tools. This p… \ Source • arXiv cs.CL • 14:36 • Sketch 1 below
- The LLM Already Knows: Estimating LLM-Perceived Question Difficulty via Hidden Representations \ Estimating the difficulty of input questions as perceived by large language models (LLMs) is essential for accurate performance evaluation and adaptive inference. Existing methods typically rely on repeated response sampling, auxiliary models… \ Source • arXiv cs.CL • 11:38 • Sketch 2 below
- Zero-shot Graph Reasoning via Retrieval Augmented Framework with LLMs \ We propose a new, training-free method, Graph Reasoning via Retrieval Augmented Framework (GRRAF), that harnesses retrieval-augmented generation (RAG) alongside the code-generation capabilities of large language models (LLMs) to address a wid… \ Source • arXiv cs.CL • 08:58 • Sketch 3 below
- Evaluating LLM Alignment on Personality Inference from Real-World Interview Data \ Large Language Models (LLMs) are increasingly deployed in roles requiring nuanced psychological understanding, such as emotional support agents, counselors, and decision-making assistants. However, their ability to interpret human personality… \ Source • arXiv cs.CL • 18:54 • Sketch 4 below
- From Understanding to Generation: An Efficient Shortcut for Evaluating Language Models \ Iterative evaluation of LLMs during training is essential to ensure expected capability development, but can be time- and compute-intensive. While NLU tasks, where the model selects from fixed answer choices, are cheap to evaluate, essential … \ Source • arXiv cs.CL • 15:10 • Sketch 5 below
- Conan-Embedding-v2: Training an LLM from Scratch for Text Embeddings \ Large language models (LLMs) have recently demonstrated excellent performance in text embedding tasks. Previous work usually uses LoRA to fine-tune existing LLMs, an approach limited by the data and training gap between LLMs and embedding models… \ Source • arXiv cs.CL • 11:48 • Sketch 6 below
- InfoGain-RAG: Boosting Retrieval-Augmented Generation via Document Information Gain-based Reranking and Filtering \ Retrieval-Augmented Generation (RAG) has emerged as a promising approach to address key limitations of Large Language Models (LLMs), such as hallucination, outdated knowledge, and lack of references. However, current RAG frameworks often strug… \ Source • arXiv cs.CL • 09:28 • Sketch 7 below
- Learning from Heterophilic Graphs: A Spectral Theory Perspective on the Impact of Self-Loops and Parallel Edges \ Graph heterophily poses a formidable challenge to the performance of Message-passing Graph Neural Networks (MP-GNNs). The familiar low-pass filters like Graph Convolutional Networks (GCNs) face performance degradation, which can be attributed… \ Source • arXiv cs.LG • 16:54 • Sketch 8 below
- EIFBENCH: Extremely Complex Instruction Following Benchmark for Large Language Models \ With the development and widespread application of large language models (LLMs), the new paradigm of "Model as Product" is rapidly evolving, and demands higher capabilities to address complex user needs, often requiring precise workflow execu… \ Source • arXiv cs.CL • 15:19
- Multi-Model Synthetic Training for Mission-Critical Small Language Models \ Large Language Models (LLMs) have demonstrated remarkable capabilities across many domains, yet their application to specialized fields remains constrained by the scarcity and complexity of domain-specific training data. We present a novel … \ Source • arXiv cs.CL • 15:04 • Sketch 9 below
- ToM-SSI: Evaluating Theory of Mind in Situated Social Interactions \ Most existing Theory of Mind (ToM) benchmarks for foundation models rely on variations of the Sally-Anne test, offering only a very limited perspective on ToM and neglecting the complexity of human social interactions. To address this gap, we… \ Source • arXiv cs.CL • 14:22
- Jailbreaking Large Language Models Through Content Concretization \ Large Language Models (LLMs) are increasingly deployed for task automation and content generation, yet their safety mechanisms remain vulnerable to circumvention through different jailbreaking techniques. In this paper, we introduce C… \ Source • arXiv cs.CL • 12:34
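Sketch 1 (HiChunk). Hierarchical chunking indexes a document at several granularities so retrieval can fall back to a coarser unit when its fine-grained children agree. A minimal sketch of that idea, assuming blank-line-delimited sections and paragraphs; the Chunk layout, overlap scoring, and merge rule are illustrative, not HiChunk's actual algorithm.

```python
# Minimal hierarchical-chunking sketch for RAG. Section/paragraph delimiters
# and the sibling-merge rule are assumptions, not the paper's method.
from dataclasses import dataclass, field

@dataclass
class Chunk:
    text: str
    level: int                      # 0 = document, 1 = section, 2 = paragraph
    children: list = field(default_factory=list)

def build_hierarchy(doc: str) -> Chunk:
    root = Chunk(doc, level=0)
    for section in doc.split("\n\n\n"):           # assumed section delimiter
        sec = Chunk(section, level=1)
        sec.children = [Chunk(p, level=2) for p in section.split("\n\n")]
        root.children.append(sec)
    return root

def overlap_score(query: str, text: str) -> float:
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / (len(q) or 1)

def retrieve(root: Chunk, query: str, k: int = 3) -> list[str]:
    # Score leaf paragraphs, then return the parent section when two or more
    # of its paragraphs match -- the coarse-to-fine idea behind hierarchy.
    paras = [(p, sec) for sec in root.children for p in sec.children]
    ranked = sorted(paras, key=lambda ps: overlap_score(query, ps[0].text),
                    reverse=True)
    out, seen = [], set()
    for p, sec in ranked[:k]:
        hits = sum(overlap_score(query, c.text) > 0 for c in sec.children)
        unit = sec if hits >= 2 else p          # merge up if siblings also hit
        if id(unit) not in seen:
            seen.add(id(unit))
            out.append(unit.text)
    return out
```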
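Sketch 2 (The LLM Already Knows). The core idea is to read perceived difficulty off the model's hidden states instead of sampling repeated responses. A minimal probe, assuming per-question difficulty labels are available; the model name (gpt2) and the linear probe are placeholders, not the paper's estimator.

```python
# Probe difficulty from the hidden state of the final prompt token.
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")          # placeholder model
model = AutoModel.from_pretrained("gpt2").eval()

def last_token_state(question: str) -> torch.Tensor:
    ids = tok(question, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids, output_hidden_states=True)
    return out.hidden_states[-1][0, -1]              # final layer, final token

# questions: list[str]; hard: list[int], 1 = model tends to answer it wrong.
def fit_difficulty_probe(questions, hard):
    X = torch.stack([last_token_state(q) for q in questions]).numpy()
    return LogisticRegression(max_iter=1000).fit(X, hard)
```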
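Sketch 3 (GRRAF). A sketch of the code-generation half of the idea: ask an LLM to write a NetworkX query for a graph question, then execute it. `llm` is a hypothetical completion function, and GRRAF's retrieval step is omitted.

```python
# Graph question answering via LLM-generated NetworkX code.
import networkx as nx

PROMPT = """You are given a networkx graph `G`.
Write Python that stores the answer to this question in `answer`:
{question}
Return only code."""

def graph_answer(G: nx.Graph, question: str, llm) -> object:
    code = llm(PROMPT.format(question=question))
    scope = {"G": G, "nx": nx, "answer": None}
    exec(code, scope)          # sandbox generated code in real deployments
    return scope["answer"]

# Example (my_completion_fn is hypothetical):
# graph_answer(nx.karate_club_graph(),
#              "What is the shortest path length from node 0 to node 33?",
#              llm=my_completion_fn)
```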
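Sketch 4 (personality inference). A minimal alignment check, assuming the LLM outputs Big Five scores per interviewee and questionnaire scores serve as ground truth; the paper's protocol and metrics may differ.

```python
# Per-trait Pearson correlation between LLM-predicted and ground-truth
# Big Five scores (scores assumed on a common numeric scale).
from statistics import correlation   # Python 3.10+

TRAITS = ["openness", "conscientiousness", "extraversion",
          "agreeableness", "neuroticism"]

def alignment(pred: list[dict], truth: list[dict]) -> dict:
    return {t: correlation([p[t] for p in pred], [g[t] for g in truth])
            for t in TRAITS}
```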
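Sketch 5 (NLU shortcut). The cheap-evaluation idea in its standard form: score fixed answer choices by log-likelihood instead of generating free text. This is generic choice scoring, not necessarily the paper's exact shortcut; gpt2 is a placeholder model.

```python
# Multiple-choice evaluation by summing log-probs over choice tokens.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")          # placeholder model
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def choice_logprob(prompt: str, choice: str) -> float:
    # Assumes the prompt's tokenization is a prefix of prompt + choice.
    ids = tok(prompt + choice, return_tensors="pt").input_ids
    n_prompt = len(tok(prompt).input_ids)
    with torch.no_grad():
        logits = lm(ids).logits[0, :-1].log_softmax(-1)
    targets = ids[0, 1:]
    token_lps = logits[torch.arange(len(targets)), targets]
    return token_lps[n_prompt - 1:].sum().item()     # choice tokens only

def pick(prompt: str, choices: list[str]) -> str:
    return max(choices, key=lambda c: choice_logprob(prompt, c))
```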
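Sketch 6 (Conan-Embedding-v2). The common recipe behind LLM embedding training: mean-pool token states and optimize an InfoNCE loss over in-batch negatives. A sketch of that recipe only, not the paper's exact objective or architecture.

```python
# Mean pooling + InfoNCE contrastive loss with in-batch negatives.
import torch
import torch.nn.functional as F

def mean_pool(hidden: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    # hidden: (B, T, D), mask: (B, T) -> (B, D)
    mask = mask.unsqueeze(-1).float()
    return (hidden * mask).sum(1) / mask.sum(1).clamp(min=1e-9)

def info_nce(q: torch.Tensor, d: torch.Tensor, tau: float = 0.05) -> torch.Tensor:
    # q, d: (B, D); row i of d is the positive document for query q[i].
    q, d = F.normalize(q, dim=-1), F.normalize(d, dim=-1)
    logits = q @ d.T / tau                       # off-diagonal = negatives
    return F.cross_entropy(logits, torch.arange(len(q)))
```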
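Sketch 7 (InfoGain-RAG). One way to operationalize document information gain: keep documents that raise the answer's log-likelihood over a no-context baseline. `answer_lp` is a hypothetical scorer (e.g., choice_logprob from Sketch 5); the paper's gain definition and thresholds will differ.

```python
# Rerank retrieved documents by likelihood gain on the answer.
def rerank_by_gain(question: str, answer: str, docs: list[str],
                   answer_lp, keep: int = 3, min_gain: float = 0.0) -> list[str]:
    base = answer_lp(question, answer)                 # no-context baseline
    gains = [(answer_lp(doc + "\n" + question, answer) - base, doc)
             for doc in docs]
    kept = [doc for g, doc in sorted(gains, reverse=True) if g > min_gain]
    return kept[:keep]
```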
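Sketch 8 (heterophilic graphs). A toy illustration of the spectral point: adding self-loops compresses the normalized-Laplacian spectrum, which changes how low-pass filters such as GCNs behave. Purely illustrative; the paper's analysis is more general.

```python
# Compare normalized-Laplacian eigenvalues with and without self-loops.
import numpy as np

def laplacian_spectrum(A: np.ndarray) -> np.ndarray:
    d = A.sum(1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    L = np.eye(len(A)) - D_inv_sqrt @ A @ D_inv_sqrt
    return np.linalg.eigvalsh(L)

A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], float)   # toy graph
print(laplacian_spectrum(A))                # eigenvalues span [0, 2]
print(laplacian_spectrum(A + np.eye(3)))    # self-loops shrink the range
```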
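Sketch 9 (multi-model synthetic training). A minimal multi-teacher generation loop with crude agreement filtering to approximate quality control; `teachers` are hypothetical completion functions, and the paper's filtering and training pipeline are more involved.

```python
# Generate fine-tuning data for a small model from several teacher LLMs.
import json

def synthesize(prompts: list[str], teachers: list) -> list[dict]:
    data = []
    for p in prompts:
        answers = [t(p) for t in teachers]
        # Keep a sample only if at least two teachers agree verbatim --
        # a crude stand-in for real consensus/quality filtering.
        for a in answers:
            if answers.count(a) >= 2:
                data.append({"prompt": p, "completion": a})
                break
    return data

# Usage (llm_a/llm_b/llm_c and domain_prompts are hypothetical):
# with open("synthetic.jsonl", "w") as f:
#     for ex in synthesize(domain_prompts, [llm_a, llm_b, llm_c]):
#         f.write(json.dumps(ex) + "\n")
```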
Big Tech
No items today.
Regulation & Standards
No items today.
Enterprise Practice
No items today.
Open-Source Tooling
No items today.
— Personal views, not IBM. No tracking. Curated automatically; links under 24h old.