Weekly GitHub Report for Llama.cpp: November 17, 2025 - November 24, 2025
Weekly GitHub Report for Llama.cpp
Thank you for subscribing to our weekly newsletter! Each week, we deliver a comprehensive summary of your GitHub project's latest activity right to your inbox, including an overview of your project's issues, pull requests, contributors, and commit activity.
Table of Contents
I. News
1.1 Recent Version Releases:
The current version of this repository is b4991
1.2 Version Information:
The version released on March 29, 2025, introduces key updates that enhance overall performance and stability, with notable improvements in user interface responsiveness and security features. This release reflects a continued focus on optimizing user experience and safeguarding data integrity.
II. Issues
2.1 Top 5 Active Issues:
We consider active issues to be issues that have been commented on most frequently within the last week. Bot comments are omitted.
- Eval bug: Vulkan - Gemma3n-E2B-Q4_K_M model crashes llama-cli during evaluation [Intel IGPU]: This issue reports a crash occurring when running the llama-cli tool with the Gemma3n E2B Q4_K_M model on Intel integrated graphics using the Vulkan backend, specifically during longer token generation tasks. The problem appears to be linked to Intel GPU drivers or hardware, as the crash does not occur on NVIDIA or AMD GPUs, and setting an environment variable to disable integer dot product operations mitigates the crash in some versions, suggesting a driver or Vulkan interaction bug.
- The discussion explores whether the crash is a GPU error or application fault, confirms it is Intel-specific, and rules out NVIDIA and AMD GPUs. Various versions and environment variable settings were tested, identifying a particular commit as a potential cause. Attempts to reproduce the issue on similar hardware yielded mixed results, with some systems stable and others crashing intermittently. Driver reinstallations and system cleanups were performed, and the issue remains open pending further investigation and potential fixes.
- Number of comments this week: 13
- Eval bug: Vulkan - Integer Dot is automatically disabled on the server docker image: This issue reports a problem where the integer dot product feature is automatically disabled when running the Vulkan backend inside the llama.cpp server Docker image, despite being enabled in official zip releases of the same version. The user traced the regression to changes in the Dockerfile around version 6999 and identified that the root cause is an outdated glslang compiler inside the Docker container, which lacks support for the required Vulkan extensions for integer dot product and bfloat16.
- The discussion focused on verifying build configurations and Vulkan device capabilities, confirming the issue only occurs in the Docker environment. Investigation revealed the Docker image uses an older glslang version missing necessary Vulkan extension support, and upgrading the base Ubuntu version in the Dockerfile to 26.04 resolved the problem by providing updated shader compiler tools with the required features.
- Number of comments this week: 13
- Eval bug: Vulkan regression with -fa 1: This issue reports a Vulkan backend regression triggered by using the `-fa 1` flag, which causes a crash with a `vk::DeviceLostError` on Intel integrated graphics when running a specific model benchmark. Additionally, after the crash was fixed, users observed a significant performance slowdown on Intel GPUs with `-fa 1` enabled, while Nvidia GPUs showed a speedup, and this slowdown was confirmed by comparing benchmark results across different software versions.
- The comments discuss a probable duplicate issue, confirm the crash fix, and highlight the unexpected performance degradation on Intel GPUs with `-fa 1` enabled, supported by detailed benchmark comparisons between versions; further testing on newer packages is suggested to verify if the problem persists.
- Number of comments this week: 8
- Eval bug: AMD driver 25.11.1 crashes with llamacpp built with Vulkan SDK 1.4.328.1: This issue reports a crash occurring with the AMD driver version 25.11.1 when running llamacpp built with Vulkan SDK 1.4.328.1 on Windows, specifically triggered by using a context size greater than 512. The user provides detailed logs showing that enabling Vulkan validation layers prevents the crash, suggesting the problem may be related to a race condition or timeout in the AMD driver when commands are submitted too quickly without validation.
- The discussion explores whether this is a regression and attempts to bisect the cause. It is noted that enabling Vulkan validation layers acts as a "speed bump" that prevents the crash by slowing command submission, implying a driver bug. The user confirms the issue occurs on Windows 11 with the latest Vulkan SDK and is asked if the problem persists on Vulkan SDK 26.04 packages, but no further resolution is provided.
- Number of comments this week: 5
- Refactor: reduce compile time of common/chat.cpp: This issue addresses the lengthy compile time of the `common/chat.cpp` file, which currently takes about 20 seconds to compile, causing a poor developer experience for contributors working on the chat parsing system. The author suggests refactoring approaches such as moving parsing functions to dedicated files or splitting `chat-template.hpp` into separate header and implementation files to reduce compile time or improve code maintainability.
- The comments discuss the possibility of splitting the Minja header and implementation files, with one user asking if this change would be accepted and another suggesting an optional split similar to another project. Additionally, a contributor expresses willingness to attempt the separation despite potential challenges.
- Number of comments this week: 3
2.2 Top 5 Stale Issues:
We consider stale issues to be issues that have had no activity within the last 30 days. The team should work together to get these issues resolved and closed as soon as possible.
- Kompute-based Vulkan backend shows an GGML_OP_GET_ROWS error: This issue reports an error related to the Kompute-based Vulkan backend, specifically a GGML_OP_GET_ROWS error that does not occur with the other Vulkan backend. The problem has been open for over 600 days, indicating a persistent and unresolved bug affecting this particular implementation.
- Question: How to generate an MPS gputrace: This issue is about a user seeking guidance on how to generate an MPS gputrace for the llama.cpp project during model inference, specifically to aid in improving the Metal backend. The user is looking for a documented or known method to produce debugger output similar to what is provided by Apple's Metal debugger in Xcode.
- common: download from URL, improve parallel download progress status: This issue addresses the problem of conflicting progress indicators when downloading multiple files in parallel for sharded models, which was introduced in a previous update. It proposes improving the implementation of the download progress status by properly utilizing the CURLOPT_NOPROGRESS option in libcurl to ensure accurate and non-conflicting progress reporting during parallel downloads (see the sketch after this list).
- kubernetes example: This issue discusses the creation of a Kubernetes example for deploying the `llama.cpp` server using a Helm chart, aiming to provide the community with a scalable and standardized deployment method. The original poster has made initial progress but is seeking additional contributions and support to continue development when time permits.
- Eval bug: microsoft/bitnet-b1.58-2B-4T-gguf: This issue reports a problem with loading the microsoft/bitnet-b1.58-2B-4T-gguf model using llama-cli on a Windows system with an NVIDIA GeForce RTX 3060 GPU. The error occurs because a tensor in the model file has a number of elements per row that is not a multiple of the expected block size, causing the model loader to fail when reading tensor information.
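As a rough sketch of the libcurl mechanism the download-progress issue above refers to (not the project's actual implementation), per-transfer progress reporting is enabled by clearing CURLOPT_NOPROGRESS and installing a transfer-info callback; the `per_file_progress` callback and `ProgressTag` struct below are hypothetical names used only to show how parallel downloads could report status without overwriting each other:

```cpp
#include <curl/curl.h>
#include <cstdio>

// Hypothetical per-transfer tag so parallel downloads can report separately.
struct ProgressTag { int file_index; };

// xferinfo callback: invoked by libcurl only when CURLOPT_NOPROGRESS is cleared.
static int per_file_progress(void *clientp, curl_off_t dltotal, curl_off_t dlnow,
                             curl_off_t /*ultotal*/, curl_off_t /*ulnow*/) {
    auto *tag = static_cast<ProgressTag *>(clientp);
    if (dltotal > 0) {
        std::fprintf(stderr, "file %d: %lld/%lld bytes\n",
                     tag->file_index, (long long) dlnow, (long long) dltotal);
    }
    return 0; // returning non-zero aborts the transfer
}

int main() {
    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURL *h = curl_easy_init();
    ProgressTag tag{0};
    curl_easy_setopt(h, CURLOPT_URL, "https://example.com/model-00001-of-00002.gguf");
    curl_easy_setopt(h, CURLOPT_NOPROGRESS, 0L);                 // 0 = progress reporting ON
    curl_easy_setopt(h, CURLOPT_XFERINFOFUNCTION, per_file_progress);
    curl_easy_setopt(h, CURLOPT_XFERINFODATA, &tag);
    curl_easy_perform(h);
    curl_easy_cleanup(h);
    curl_global_cleanup();
    return 0;
}
```

In a parallel setup each easy handle would carry its own tag, so the per-file status lines can be multiplexed or redrawn without conflicting.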
2.3 Open Issues
This section lists, groups, and then summarizes issues that were created within the last week in the repository.
Issues Opened This Week: 22
Summarized Issues:
- GPU Backend Crashes and Errors: Multiple issues report crashes and errors related to GPU backends including Vulkan, CUDA, SYCL, and ROCm on various hardware such as Intel integrated graphics, AMD Radeon GPUs, and NVIDIA Volta GPUs. These problems manifest as device lost errors, memory allocation failures, invalid device functions, and driver crashes, severely impacting model execution and stability.
- [issues/17312, issues/17336, issues/17364, issues/17366, issues/17389, issues/17429, issues/17432, issues/17438]
- llama-server Stability and API Issues: Several issues describe instability and bugs in the llama-server module, including HTTP 500 errors from inconsistent KV cache token positions, hanging POST requests, keep-alive connection header corruption, segmentation faults under heavy load, and crashes during image slice encoding. These problems cause server crashes, timeouts, and failed request handling, reducing reliability in production environments.
- [issues/17316, issues/17384, issues/17387, issues/17388, issues/17391, issues/17422]
- Model and Feature Malfunctions: Some issues highlight malfunctioning model features such as the Qwen3-vl-4B high-resolution image token parameter not working as intended and tool engagement failures in retrieval-augmented generation setups with gpt-oss:20b. These bugs result in incorrect model outputs or no answers being returned, limiting usability for specific tasks.
- [issues/17345, issues/17410]
- Build and Compilation Challenges: There are reports of long compile times due to large source files and the need for refactoring, as well as issues with Vulkan SDK and Docker environments causing feature regressions or driver crashes. These challenges affect developer experience and deployment consistency across platforms.
- [issues/17321, issues/17329, issues/17414]
- Grammar Parsing and JSON Schema Bugs: Issues include improper escaping of backslash characters in JSON schema literals causing parsing failures, and incorrect handling of grammar-based generation continuation in the WebUI, leading to malformed outputs. These bugs disrupt expected parsing and generation workflows.
- [issues/17306, issues/17435]
2.4 Closed Issues
This section lists, groups, and then summarizes issues that were closed within the last week in the repository. This section also links the associated pull requests if applicable.
Issues Closed This Week: 19
Summarized Issues:
- Vulkan Backend GPU Issues: Multiple issues report problems with the Vulkan backend on Intel GPUs, including severe slowdowns or hangs when using flash attention on Intel Iris Xe GPUs, gibberish output on Intel DG1 GPUs due to async tensor reads, and `vk::DeviceLostError` crashes during queue submission with flash attention enabled. These problems appear to be related to recent commits and driver or async code support, affecting stability and output correctness.
- issues/17297, issues/17302, issues/17334
- Model Loading and Multimodal Support Errors: Several issues describe failures in loading models due to missing tensors or incomplete files, particularly affecting multimodal projection tensors and causing errors in llama-server but not in llama-cli. Additionally, users face problems uploading images without specifying required multimodal GGUF files, highlighting the need for proper multimodal support and file inclusion.
- issues/17327, issues/17367, issues/17407
- Regression and Crash Bugs on CUDA and GPU Models: There are regression bugs causing crashes with Gemma 3n models on Nvidia RTX 4090/5090 GPUs using CUDA, and segmentation faults loading Gemma 2 series models on RTX 3060 GPUs, both resolved by disabling kernel fusion or reducing GPU layers. These issues indicate instability introduced in recent updates affecting CUDA backend performance and reliability.
- issues/17322, issues/17426
- Concurrency and Slot Initialization Behavior: The llama server incorrectly defaults to starting four parallel slots instead of one when launched with `-np 1`, due to an internal override that sets `n_parallel` to 4 unless disabled by `-kvu`. This behavior causes unexpected concurrency and resource usage unless explicitly configured.
- issues/17300
- Compilation and Build Failures: Multiple issues report compilation errors including undeclared variables causing build failures on Linux with CUDA and CPU backends, missing CURL development libraries causing build failures on Fedora, and ARM64 compilation failures due to unsupported SME CPU feature flags in GNU 13.3.0 compiler. These problems require code fixes, proper dependency installation, or compiler upgrades to resolve.
- issues/17341, issues/17372, issues/17403
- Script and Conversion Tool Errors: Conversion scripts fail due to missing or incorrectly handled fields, such as a TypeError from a None model directory in `convert_lora_to_gguf.py` and a KeyError in `convert_hf_to_gguf.py` caused by a fallback to a missing `num_hidden_layers` field instead of `num_layers`. These issues require improved input validation and support for newer model configurations.
- issues/17350, issues/17358
- Integer Overflow and Memory Allocation Vulnerabilities: Critical bugs in the grammar syntax parser and `llama-server` module involve unvalidated large numeric ranges causing excessive resource usage and signed integer overflow in memory allocation size calculations. These vulnerabilities can lead to denial-of-service or server errors and require safer numeric types and input validation to mitigate (a sketch of overflow-checked size arithmetic follows this list).
- issues/17352, issues/17355
- Vision-Language Model Evaluation Regressions: A regression introduced by commit 4db5641 causes incorrect and unrelated image descriptions during vision-language model evaluation on Android devices using OpenCL on Qualcomm Adreno GPUs, severely impacting the accuracy of image captioning with the Qwen2.5-VL-3B model.
- issues/17351
- Web UI Context Retention Bug: The new web UI for the Qwen3-30B-A3B-Thinking model fails to retain prior context when interrupting the model’s thinking process and sending a new prompt, causing loss of previously generated thought tokens. This contrasts with the old web UI which preserved and continued reasoning across interactions.
- issues/17430
- Build Process Inquiry: A user discusses the simplicity of building the Linux version via a bash script and inquires about the possibility of building a Windows version from within Linux, indicating interest in cross-platform build support.
- issues/17373
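As a hedged illustration of the mitigation described in the integer-overflow item above (the `checked_buffer_size` helper and the 16 GiB cap are assumptions for this sketch, not the project's actual code), size arithmetic on untrusted counts can be performed in an unsigned 64-bit type with explicit bound checks before any allocation happens:

```cpp
#include <cstdint>
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <vector>

// Hypothetical helper: compute rows * cols * elem_size without signed overflow,
// rejecting any request that exceeds a sanity cap before memory is touched.
static size_t checked_buffer_size(uint64_t rows, uint64_t cols, uint64_t elem_size) {
    const uint64_t cap = 1ull << 34; // illustrative 16 GiB limit, not a project constant
    if (rows != 0 && cols > std::numeric_limits<uint64_t>::max() / rows) {
        throw std::length_error("rows * cols overflows");
    }
    const uint64_t cells = rows * cols;
    if (elem_size != 0 && cells > std::numeric_limits<uint64_t>::max() / elem_size) {
        throw std::length_error("total byte size overflows");
    }
    const uint64_t bytes = cells * elem_size;
    if (bytes > cap) {
        throw std::length_error("requested buffer exceeds cap");
    }
    return static_cast<size_t>(bytes);
}

int main() {
    std::vector<uint8_t> buf(checked_buffer_size(4096, 4096, 4)); // 64 MiB, accepted
    // checked_buffer_size(1ull << 32, 1ull << 32, 4) would throw instead of silently overflowing.
    return buf.empty() ? 1 : 0;
}
```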
2.5 Issue Discussion Insights
This section will analyze the tone and sentiment of discussions within this project's open and closed issues that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.
Based on our analysis, there are no instances of toxic discussions in the project's open or closed issues from the past week.
III. Pull Requests
3.1 Open Pull Requests
This section provides a summary of pull requests that were opened in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.
Pull Requests Opened This Week: 37
Key Open Pull Requests
1. server/public_simplechat alternate web client ui update - uncompressed 300KB - built in client side tool calls with 0 setup - reasoning - vision - ai calling into itself: This pull request introduces a lightweight, alternate pure HTML/CSS/JS web client UI for the llama.cpp project that rebases to the latest master, cleans up the UI to free vertical space, supports multiple images per message, and integrates built-in client-side tool calling, reasoning, and vision features with zero setup, all within an uncompressed 300KB source size, aiming to provide a simple, flexible, and functional option alongside the default heavier Svelte-based web UI.
- URL: pull/17415
- Merged: No
- Associated Commits: 9953d, bb6b3, 8f460, 25b0e, 83786, d0156, 946af, 0a4d6, 2c69c, 1df42, 4276f, dfc31, 374af, a4721, 1e43c, b3187, 73fea, eaff5, 0273e, b97b3, 2894e, e29db, f52b9, 94a81, ee4b4, 7b9fd, d4350, d486e, 004b3, 83cfc, a4366, 48142, a40a8, 87726, af0b3, 1fa3c, ddc2e, 1ce5e, 69c6a, 22cca, 1e580, d865b, bbf6e, f9c62, 95f75, af38f, 64ccf, 5354a, 6a44d, d97f2, e15f5, d9b35, ce71a, c0b42, 00c3d, 3cbc9, 1452c, f77b2, 4db7b, 1c5fd, c0d7d, f20ac, 800d3, 48c82, 4ed20, b4a72, 95787, 4d9b9, b6660, a9a3f, 2e5d7, f8bce, af004, e173a, 7e9df, efb63, fe89c, e7167, 52f55, 921f5, 20a38, 27a9f, edaeb, 5fb94, 0f2e0, 308ee, 6d26b, 32b3c, 66804, 4292e, 5a60b, bb4f0, d2281, ce0e7, f1fdc, a3248, 10c72, 16499, 94f37, 64439, 4a071, cfee2, 361d6, 7c4d4, 40829, dfb15, 93057, 136dd, bd85d, 8650a, 6f52e, 21845, 6c50b, ac716, 6e1d0, 0d06d, 772d2, 50fa2, ed391, 9a8ff, bf890, 0ec8f, fc8f1, 714c1, 5ea26, 1cb0d, 22809, 2097c, 45a8c, c30a2, 14aa4, eee88, 619a9, 1efff, 84a83, ba68d, 44dfe, 37ddd, 389ab, 295df, 9d704, 11109, 24ea7, f64d3, 33370, 76fde, 8ef05, 0f876, b14c5, 65e0d, b780a, 06e88, 4f004, 9ee86, 34347, 067a1, e0274, 17748, 98674, e7888, ef326, d1886, 02fd8, e1880, 99f2b, ffb9f, 7987a, 4d0f0, 323c1, 16f43, 2c95f, 7a8cb, 42ce8, 2acc2, bd6fe, 3ada2, 7765c, 60fae, 09220, 7a640, ef5e7, 919ff, 9305e, 5d9a9, 5208e, 512b8, dc637, c8d33, 14e49, 9936c, c71cc, 6f2f7, 70fc4, 716a7, 818e2, 8f2df, 722e5, 6e5ea, 280f6, 705f7, 4b0f3, 317fc, 82187, cbd87, 1d911, 2175b, 4d62e, e518b, 854ab, 0d564, 62bb6, 9319e, c5c25, cc805, 202d4, 4a36e, 667b8, c0fac, 21520, 082a9, 61095, d11b7, 5830a, 9c652, 04be0, e5275, 7df43, 2c3a6, 3d1ee, 2ff76, bb4a3, b4ba7, 0e0ae, 34064, 3eda5, 1a824, f8c50, ea25b, c711b, 9fbd5, a709b, 48111, 06663, 533c8, a9c7f, b90dc, d6537, a04fe, 81106, 12295
2. mtmd: Add DeepSeekOCR Support: This pull request adds support for DeepSeekOCR to the llama.cpp project, including implementation of the DeepSeek3B-MoE-A570M language model component, vision model processing fixes, and integration of related image encoding and attention mechanisms.
- URL: pull/17400
- Merged: No
- Associated Commits: 43a13, b6b9f, 85c7c, 578c8, 2aab5, eab28, 76305, 2de34, e8b26, 97e09, 13dc6, b32bb, 790bb, cec9a, 8b3d3, 1e081, 331ce, 6c071, a65dd, 63a04, 89afd, 88032, 1268d, 68b20, 8bce6, 5e6cf, 7e9fb, 0f558, 7b8d7, 86f11, effe6, 3fcfc, 4cfa1
3. [model] Add support for Plamo3: This pull request adds support for the PLaMo-3 series of base models by integrating their hybrid architecture, which includes Sliding Window Attention and custom feed-forward network layouts, into llama.cpp to enable conversion of official checkpoints to GGUF format and compatibility with existing backends.
- URL: pull/17304
- Merged: No
- Associated Commits: f61ed, c3b61, ce7a9, 1ab3b, d9854, 8dbbe, 96781, 74fa9, 33910, 037d8, 4d0be, 80c34, 9cecb, 3873e, 8b928, 0df52, cdb1d, 527c6, 0f9d0, 5d52f, dab7a, 9bd33, 67a6d, d965f
Other Open Pull Requests
- Diffusion Language Model Support: This pull request adds support for the RND1 Diffusion Language Model to the llama.cpp project, including conversion to GGUF weights and implementation of diffusion-based text generation. It also provides detailed instructions and performance benchmarks across various hardware platforms.
- Vulkan Backend Enhancements: Multiple pull requests improve Vulkan backend functionality by implementing a top-k selection algorithm with workgroup-based sorting and filtering, adding support for the `GGML_OP_GET_REL_POS` operator for F16 and F32 data types, and introducing a precision fix in the compute shader. These changes include related support in the ggml and metal backends, cleanup, testing enhancements, and depend on prior testing improvements.
- Intel GPU Performance Optimization: A pull request proposes changing the default subgroup and block size from 32 to 16 specifically for Intel GPUs, resulting in a 1.2 to 1.5 times performance improvement on certain models and GPU generations. This includes several commits to implement, test, and refine the adjustment.
- RISC-V Vector Floating-Point Support: This pull request extends the existing RISC-V Vector (RVV) floating-point support in the ggml-cpu module by adding a BF16 RVV flag to enable the zvfbfwma extension and introducing six new floating-point kernels. The new kernels were functionally tested on QEMU across various vector lengths and input sizes.
- Window Operations and Relative Position Embeddings: Enhancements to window operations and relative position embeddings in CPU and CUDA backends include batching support, extended data type compatibility to F16/BF16, CUDA support, scaling for different query/key lengths, and new tests. These improvements are critical for SAM and DeepSeek-OCR functionality.
- CUDA Backend Copy Support and Refactoring: This pull request adds support for copying non-contiguous 32-bit integer data to 32-bit integer destinations in the CUDA backend, includes related tests, and refactors the copy function for better clarity.
- Server Code Refactoring: The server.cpp file is refactored by splitting its code into smaller components—server-common, server-task, and server-queue—to modularize utility functions, task serialization/deserialization, and mutex-related task queue management. This improves code structure and maintainability.
- Kimi-K2 Model Tool-Call Parsing Fixes: This draft pull request implements multiple fixes and tests addressing ongoing issues with tool-call parsing in the Kimi-K2 model. It remains in draft status to gather community bug reports and reproduction cases due to hardware limitations preventing full reproduction or testing by the author.
- Hexagon DSP Support and Backend Fixes: Initial support for Hexagon DSP versions v68 and v69 is introduced in the ggml backend, enabling model execution on these platforms despite slow performance. The pull request also fixes build errors, VTCM acquire failure checks, and adjusts for memory page size constraints.
- Android Binding Rewrite: The Android binding for llama.cpp is rewritten by removing the cpu_features dependency and implementing dynamic native library loading for advanced acceleration on Aarch64 and x86_64 architectures. The redesign includes a new C++ layer and JNI bridge supporting features like automatic message role formatting, system prompt injection, context overflow handling, batch decoding, engine state exposure, new APIs, GGUF metadata parsing utilities, and performance optimizations.
- Anthropic Messages API Integration: Support for the Anthropic Messages API is added to llama-server by implementing endpoints for chat completions and token counting with streaming, tool use, vision support, system prompts, and extended parameters. The pull request converts Anthropic's message format to an OpenAI-compatible internal format to reuse the existing inference pipeline.
- Hexagon Backend README Improvements: The Hexagon backend README is improved by adding a step to explicitly create the target directory on the Android device before pushing GGUF files and correcting the example command to properly escape double quotes around the prompt string. These changes enhance clarity and accuracy for deploying and running the software on Snapdragon-based Android devices.
- JSON Schema and Locale Fixes: Two pull requests fix issues related to data formatting: one corrects the JSON schema to properly escape the backslash character in literals, and another fixes locale-dependent float printing in GGUF metadata by replacing std::to_string with std::ostringstream using std::locale::classic() to ensure consistent decimal separators (see the sketch after this list).
- Eagle2-VL Multimodal Model Support: Initial support for the Eagle2-VL multimodal models (1B and 2B) is added to the MTMD pipeline with a dedicated converter, runtime builder, and loader enhancements for the Eagle2-VL vision tower and its 2-layer projector. This integration does not affect existing model architectures.
- Performance Graph and Test Configuration Updates: Updates to the worst-case performance graph for the unified cache address related issues and improve test configurations by disabling operation offload in certain tests.
- Metrics Endpoint Fix: The `/metrics` endpoint issue where Prometheus-format text was incorrectly JSON-escaped and wrapped in double quotes is fixed by adding and modifying an `ok()` method overload to ensure proper formatting for Prometheus parsing.
- Multiple Checkpoints in llama-server: This pull request introduces the creation of multiple checkpoints during prompt processing in llama-server to improve usability and prevent loss of progress, replacing the previous approach of generating only a single checkpoint after processing many tokens.
- s390x Architecture Fixes for convert_hf_to_gguf.py: The convert_hf_to_gguf.py script is fixed on s390x architecture by correctly handling byte order conversions, assuming little-endian model data and performing necessary byteswaps after reading. GGUFWriter is modified to accept tensors in native endianness to avoid redundant byteswaps, and inplace byteswap calls on lazy tensor and array wrappers are replaced with copying byteswaps.
- RDNA4 GPU Optimization: Support for the `mul_mat_f` operation is enabled for RDNA4 GPUs, and performance is optimized by moving workloads with `n >= 3` from the `mmvf` backend to the `mmf` backend. A rarely executed branch prompts the ROCm compiler to generate more efficient code, with performance improvements demonstrated through extensive benchmarking on an RX 9070 XT GPU.
- Core Scaling and Synchronization Improvements: A core scaling issue caused by cache-line contention when running llama-bench on large core count machines with small batch sizes is addressed by implementing synchronization and partitioning improvements. These changes result in throughput gains of 2% to 44% for the Qwen3 30B parameter model.
- Facebook Nougat OCR Model Support: Support for Facebook's Nougat OCR model is added by implementing the mBART encoder/decoder and Swin Transformer vision encoder architectures with cross-attention. The pull request includes model conversion scripts, a CLI tool for OCR processing with multiple output formats, GPU acceleration, and comprehensive documentation for academic document understanding including formulas, tables, and complex layouts.
- CANN Backend Enhancements: Optimized caching logic for rope_cache_init and support for mRoPE and i-mRoPE are added in the CANN backend, with notes on specific configuration requirements and ongoing investigations for Ascend 910B devices. Additionally, support for the out_prod operator handling both F32 and F16 floating-point product calculations is introduced.
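As a minimal sketch of the locale fix summarized in the JSON Schema and Locale Fixes item above (the `format_float` helper name is illustrative, not the PR's actual function), the key difference is that std::to_string honors the global locale, which can emit ',' as the decimal separator, while a std::ostringstream imbued with std::locale::classic() always emits '.':

```cpp
#include <locale>
#include <sstream>
#include <string>
#include <iostream>

// Locale-independent float -> string conversion along the lines the PR describes:
// std::to_string(3.14) can yield "3,14" under e.g. a de_DE global locale, whereas
// an ostringstream imbued with the classic "C" locale always yields "3.14".
static std::string format_float(double v) {
    std::ostringstream ss;
    ss.imbue(std::locale::classic()); // pin '.' as the decimal separator
    ss << v;
    return ss.str();
}

int main() {
    // Even if the process-wide locale uses ',' for decimals, the output stays portable.
    // std::locale::global(std::locale("de_DE.UTF-8")); // may throw if the locale is absent
    std::cout << format_float(3.14159) << "\n"; // prints 3.14159
    return 0;
}
```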
3.2 Closed Pull Requests
This section provides a summary of pull requests that were closed in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.
Pull Requests Closed This Week: 48
Key Closed Pull Requests
1. ggml-hexagon: fix swiglu failure at test-backend-ops: This pull request fixes failures in the Hexagon backend's swiglu and silu operations by adding overflow-guarded HVX primitives, improving NaN/Inf handling in exponential and inverse functions, and correcting implementation mistakes to enhance the accuracy and stability of these operations during testing.
- URL: pull/17344
- Merged: Yes
- Associated Commits: ab752, 5aa4a, a6415, a8cdb, ae42f, 39445, a589b, 6f57b, db9e9, ce48a, fc5f3, 57073, 54235, 38594, 014ad, 8c374, 33a05, 83884, f7662, 5f553, f6d7f, 37e9a, 185dc, 6d887, e07cb, 55cea
2. vulkan: implement ADD1, ARANGE, FILL, SOFTPLUS, STEP, ROUND, CEIL, FLOOR, TRUNC: This pull request implements several Vulkan operations including ADD1, ARANGE, FILL, SOFTPLUS, STEP, ROUND, CEIL, FLOOR, and TRUNC in the llama.cpp project, with mostly mechanical changes except for ROUND which lacks a direct Vulkan equivalent.
- URL: pull/17319
- Merged: Yes
3. common : more accurate sampling timing: This pull request improves the accuracy of timing measurements in the common sampling code by separately reporting the time spent in sampling, the llama_sampler, and unaccounted time, thereby providing more detailed performance insights.
- URL: pull/17382
- Merged: Yes
Other Closed Pull Requests
3.3 Pull Request Discussion Insights
This section will analyze the tone and sentiment of discussions within this project's open and closed pull requests that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.
Based on our analysis, there are no instances of toxic discussions in the project's open or closed pull requests from the past week.
IV. Contributors
4.1 Contributors
Active Contributors:
We consider an active contributor in this project to be any contributor who has made at least 1 commit, opened at least 1 issue, created at least 1 pull request, or made more than 2 comments in the last month.
If there are more than 10 active contributors, the list is truncated to the top 10 based on contribution metrics for better clarity.
| Contributor | Commits | Pull Requests | Issues | Comments |
|---|---|---|---|---|
| hanishkvc | 625 | 7 | 1 | 3 |
| hanyin-arm | 250 | 1 | 0 | 1 |
| ngxson | 116 | 10 | 4 | 66 |
| ggerganov | 89 | 18 | 2 | 82 |
| pwilkin | 46 | 5 | 3 | 81 |
| jeffbolznv | 39 | 11 | 0 | 68 |
| CISC | 25 | 2 | 0 | 78 |
| 0cc4m | 30 | 6 | 0 | 60 |
| aldehir | 63 | 1 | 1 | 26 |
| am17an | 30 | 5 | 1 | 46 |