Weekly GitHub Report for Llama.cpp: November 17, 2025 - November 24, 2025
Weekly GitHub Report for Llama.cpp
Thank you for subscribing to our weekly newsletter! Each week, we deliver a comprehensive summary of your GitHub project's latest activity right to your inbox, including an overview of your project's issues, pull requests, contributors, and commit activity.
Table of Contents
I. News
1.1 Recent Version Releases:
The current version of this repository is b4991
1.2 Version Information:
The version released on March 29, 2025, introduces key updates that enhance overall performance and stability, with notable improvements in user interface responsiveness and security features. This release reflects a continued focus on optimizing user experience and safeguarding data integrity.
II. Issues
2.1 Top 5 Active Issues:
We consider active issues to be issues that have been commented on most frequently within the last week. Bot comments are omitted.
- Eval bug: Vulkan - Gemma3n-E2B-Q4_K_M model crashes llama-cli during evaluation [Intel IGPU]: This issue reports a crash occurring when running the llama-cli tool with the Gemma3n E2B Q4_K_M model on Intel integrated graphics using the Vulkan backend, specifically during longer token generation tasks. The problem appears to be linked to Intel GPU drivers or hardware, as the crash does not occur on NVIDIA or AMD GPUs, and setting an environment variable to disable integer dot product operations mitigates the crash in some versions, suggesting a driver or Vulkan interaction bug.
- The discussion explores whether the crash is a GPU error or application fault, confirms it is Intel-specific, and rules out NVIDIA and AMD GPUs. Various versions and environment variable settings were tested, identifying a particular commit as a potential cause. Attempts to reproduce the issue on similar hardware yielded mixed results, with some systems stable and others crashing intermittently. Driver reinstallations and system cleanups were performed, and the issue remains open pending further investigation and potential fixes.
- Number of comments this week: 13
- Eval bug: Vulkan - Integer Dot is automatically disabled on the server docker image: This issue reports a problem where the integer dot product feature is automatically disabled when running the Vulkan backend inside the llama.cpp server Docker image, despite being enabled in official zip releases of the same version. The user traced the regression to changes in the Dockerfile around version 6999 and identified that the root cause is an outdated glslang compiler inside the Docker container, which lacks support for the required Vulkan extensions for integer dot product and bfloat16.
- The discussion focused on verifying build configurations and Vulkan device capabilities, confirming the issue only occurs in the Docker environment. Investigation revealed the Docker image uses an older glslang version missing necessary Vulkan extension support, and upgrading the base Ubuntu version in the Dockerfile to 26.04 resolved the problem by providing updated shader compiler tools with the required features.
- Number of comments this week: 13
- Eval bug: Vulkan regression with -fa 1: This issue reports a Vulkan backend regression triggered by using the `-fa 1` flag, which causes a crash with a `vk::DeviceLostError` on Intel integrated graphics when running a specific model benchmark. Additionally, after the crash was fixed, users observed a significant performance slowdown on Intel GPUs with `-fa 1` enabled, while Nvidia GPUs showed a speedup, and this slowdown was confirmed by comparing benchmark results across different software versions.
- The comments discuss a probable duplicate issue, confirm the crash fix, and highlight the unexpected performance degradation on Intel GPUs with `-fa 1` enabled, supported by detailed benchmark comparisons between versions; further testing on newer packages is suggested to verify if the problem persists.
- Number of comments this week: 8
- Eval bug: AMD driver 25.11.1 crashes with llamacpp built with Vulkan SDK 1.4.328.1: This issue reports a crash occurring with the AMD driver version 25.11.1 when running llamacpp built with Vulkan SDK 1.4.328.1 on Windows, specifically triggered by using a context size greater than 512. The user provides detailed logs showing that enabling Vulkan validation layers prevents the crash, suggesting the problem may be related to a race condition or timeout in the AMD driver when commands are submitted too quickly without validation.
- The discussion explores whether this is a regression and attempts to bisect the cause. It is noted that enabling Vulkan validation layers acts as a "speed bump" that prevents the crash by slowing command submission, implying a driver bug. The user confirms the issue occurs on Windows 11 with the latest Vulkan SDK and is asked if the problem persists on Vulkan SDK 26.04 packages, but no further resolution is provided.
- Number of comments this week: 5
- Refactor: reduce compile time of common/chat.cpp: This issue addresses the lengthy compile time of the `common/chat.cpp` file, which currently takes about 20 seconds to compile, causing a poor developer experience for contributors working on the chat parsing system. The author suggests refactoring approaches such as moving parsing functions to dedicated files or splitting `chat-template.hpp` into separate header and implementation files to reduce compile time or improve code maintainability.
- The comments discuss the possibility of splitting the Minja header and implementation files, with one user asking if this change would be accepted and another suggesting an optional split similar to another project. Additionally, a contributor expresses willingness to attempt the separation despite potential challenges.
- Number of comments this week: 3
2.2 Top 5 Stale Issues:
We consider stale issues to be issues that have had no activity within the last 30 days. The team should work together to get these issues resolved and closed as soon as possible.
- Kompute-based Vulkan backend shows an GGML_OP_GET_ROWS error: This issue reports an error related to the Kompute-based Vulkan backend, specifically a GGML_OP_GET_ROWS error that does not occur with the other Vulkan backend. The problem has been open for over 600 days, indicating a persistent and unresolved bug affecting this particular implementation.
- Question: How to generate an MPS gputrace: This issue is about a user seeking guidance on how to generate an MPS gputrace for the llama.cpp project during model inference, specifically to aid in improving the Metal backend. The user is looking for a documented or known method to produce debugger output similar to what is provided by Apple's Metal debugger in Xcode.
- common: download from URL, improve parallel download progress status: This issue addresses the problem of conflicting progress indicators when downloading multiple files in parallel for sharded models, which was introduced in a previous update. It proposes improving the implementation of the download progress status by properly utilizing the CURLOPT_NOPROGRESS option in libcurl to ensure accurate and non-conflicting progress reporting during parallel downloads (see the sketch after this list).
- kubernetes example: This issue discusses the creation of a Kubernetes example for deploying the `llama.cpp` server using a Helm chart, aiming to provide the community with a scalable and standardized deployment method. The original poster has made initial progress but is seeking additional contributions and support to continue development when time permits.
- Eval bug: microsoft/bitnet-b1.58-2B-4T-gguf: This issue reports a problem with loading the microsoft/bitnet-b1.58-2B-4T-gguf model using llama-cli on a Windows system with an NVIDIA GeForce RTX 3060 GPU. The error occurs because a tensor in the model file has a number of elements per row that is not a multiple of the expected block size, causing the model loader to fail when reading tensor information.
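As a rough sketch of the libcurl mechanism the download-progress issue above refers to (not the project's actual implementation), per-transfer progress reporting is enabled by clearing CURLOPT_NOPROGRESS and installing a transfer-info callback; the `per_file_progress` callback and `ProgressTag` struct below are hypothetical names used only to show how parallel downloads could report status without overwriting each other:

```cpp
#include <curl/curl.h>
#include <cstdio>

// Hypothetical per-transfer tag so parallel downloads can report separately.
struct ProgressTag { int file_index; };

// xferinfo callback: invoked by libcurl only when CURLOPT_NOPROGRESS is cleared.
static int per_file_progress(void *clientp, curl_off_t dltotal, curl_off_t dlnow,
                             curl_off_t /*ultotal*/, curl_off_t /*ulnow*/) {
    auto *tag = static_cast<ProgressTag *>(clientp);
    if (dltotal > 0) {
        std::fprintf(stderr, "file %d: %lld/%lld bytes\n",
                     tag->file_index, (long long) dlnow, (long long) dltotal);
    }
    return 0; // returning non-zero aborts the transfer
}

int main() {
    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURL *h = curl_easy_init();
    ProgressTag tag{0};
    curl_easy_setopt(h, CURLOPT_URL, "https://example.com/model-00001-of-00002.gguf");
    curl_easy_setopt(h, CURLOPT_NOPROGRESS, 0L);                 // 0 = progress reporting ON
    curl_easy_setopt(h, CURLOPT_XFERINFOFUNCTION, per_file_progress);
    curl_easy_setopt(h, CURLOPT_XFERINFODATA, &tag);
    curl_easy_perform(h);
    curl_easy_cleanup(h);
    curl_global_cleanup();
    return 0;
}
```

In a parallel setup each easy handle would carry its own tag, so the per-file status lines can be multiplexed or redrawn without conflicting.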
2.3 Open Issues
This section lists, groups, and then summarizes issues that were created within the last week in the repository.
Issues Opened This Week: 22
Summarized Issues:
- GPU Backend Crashes and Errors: Multiple issues report crashes and errors related to GPU backends including Vulkan, CUDA, SYCL, and ROCm on various hardware such as Intel integrated graphics, AMD Radeon GPUs, and NVIDIA Volta GPUs. These problems manifest as device lost errors, memory allocation failures, invalid device functions, and driver crashes, severely impacting model execution and stability.
- [issues/17312, issues/17336, issues/17364, issues/17366, issues/17389, issues/17429, issues/17432, issues/17438]
- llama-server Stability and API Issues: Several issues describe instability and bugs in the llama-server module, including HTTP 500 errors from inconsistent KV cache token positions, hanging POST requests, keep-alive connection header corruption, segmentation faults under heavy load, and crashes during image slice encoding. These problems cause server crashes, timeouts, and failed request handling, reducing reliability in production environments.
- [issues/17316, issues/17384, issues/17387, issues/17388, issues/17391, issues/17422]
- Model and Feature Malfunctions: Some issues highlight malfunctioning model features such as the Qwen3-vl-4B high-resolution image token parameter not working as intended and tool engagement failures in retrieval-augmented generation setups with gpt-oss:20b. These bugs result in incorrect model outputs or no answers being returned, limiting usability for specific tasks.
- [issues/17345, issues/17410]
- Build and Compilation Challenges: There are reports of long compile times due to large source files and the need for refactoring, as well as issues with Vulkan SDK and Docker environments causing feature regressions or driver crashes. These challenges affect developer experience and deployment consistency across platforms.
- [issues/17321, issues/17329, issues/17414]
- Grammar Parsing and JSON Schema Bugs: Issues include improper escaping of backslash characters in JSON schema literals causing parsing failures, and incorrect handling of grammar-based generation continuation in the WebUI, leading to malformed outputs. These bugs disrupt expected parsing and generation workflows.
- [issues/17306, issues/17435]
2.4 Closed Issues
This section lists, groups, and then summarizes issues that were closed within the last week in the repository. This section also links the associated pull requests if applicable.
Issues Closed This Week: 19
Summarized Issues:
- Vulkan Backend GPU Issues: Multiple issues report problems with the Vulkan backend on Intel GPUs, including severe slowdowns or hangs when using flash attention on Intel Iris Xe GPUs, gibberish output on Intel DG1 GPUs due to async tensor reads, and `vk::DeviceLostError` crashes during queue submission with flash attention enabled. These problems appear to be related to recent commits and driver or async code support, affecting stability and output correctness.
- issues/17297, issues/17302, issues/17334
- Model Loading and Multimodal Support Errors: Several issues describe failures in loading models due to missing tensors or incomplete files, particularly affecting multimodal projection tensors and causing errors in llama-server but not in llama-cli. Additionally, users face problems uploading images without specifying required multimodal GGUF files, highlighting the need for proper multimodal support and file inclusion.
- issues/17327, issues/17367, issues/17407
- Regression and Crash Bugs on CUDA and GPU Models: There are regression bugs causing crashes with Gemma 3n models on Nvidia RTX 4090/5090 GPUs using CUDA, and segmentation faults loading Gemma 2 series models on RTX 3060 GPUs, both resolved by disabling kernel fusion or reducing GPU layers. These issues indicate instability introduced in recent updates affecting CUDA backend performance and reliability.
- issues/17322, issues/17426
- Concurrency and Slot Initialization Behavior: The llama server incorrectly defaults to starting four parallel slots instead of one when launched with `-np 1`, due to an internal override that sets `n_parallel` to 4 unless disabled by `-kvu`. This behavior causes unexpected concurrency and resource usage unless explicitly configured.
- issues/17300
- Compilation and Build Failures: Multiple issues report compilation errors including undeclared variables causing build failures on Linux with CUDA and CPU backends, missing CURL development libraries causing build failures on Fedora, and ARM64 compilation failures due to unsupported SME CPU feature flags in GNU 13.3.0 compiler. These problems require code fixes, proper dependency installation, or compiler upgrades to resolve.
- issues/17341, issues/17372, issues/17403
- Script and Conversion Tool Errors: Conversion scripts fail due to missing or incorrectly handled fields, such as a TypeError from a None model directory in `convert_lora_to_gguf.py` and a KeyError in `convert_hf_to_gguf.py` caused by a fallback to a missing `num_hidden_layers` field instead of `num_layers`. These issues require improved input validation and support for newer model configurations.
- issues/17350, issues/17358
- Integer Overflow and Memory Allocation Vulnerabilities: Critical bugs in the grammar syntax parser and `llama-server` module involve unvalidated large numeric ranges causing excessive resource usage and signed integer overflow in memory allocation size calculations. These vulnerabilities can lead to denial-of-service or server errors and require safer numeric types and input validation to mitigate (a sketch of overflow-checked size arithmetic follows this list).
- issues/17352, issues/17355
- Vision-Language Model Evaluation Regressions: A regression introduced by commit 4db5641 causes incorrect and unrelated image descriptions during vision-language model evaluation on Android devices using OpenCL on Qualcomm Adreno GPUs, severely impacting the accuracy of image captioning with the Qwen2.5-VL-3B model.
- issues/17351
- Web UI Context Retention Bug: The new web UI for the Qwen3-30B-A3B-Thinking model fails to retain prior context when interrupting the model’s thinking process and sending a new prompt, causing loss of previously generated thought tokens. This contrasts with the old web UI which preserved and continued reasoning across interactions.
- issues/17430
- Build Process Inquiry: A user discusses the simplicity of building the Linux version via a bash script and inquires about the possibility of building a Windows version from within Linux, indicating interest in cross-platform build support.
- issues/17373
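As a hedged illustration of the mitigation described in the integer-overflow item above (the `checked_buffer_size` helper and the 16 GiB cap are assumptions for this sketch, not the project's actual code), size arithmetic on untrusted counts can be performed in an unsigned 64-bit type with explicit bound checks before any allocation happens:

```cpp
#include <cstdint>
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <vector>

// Hypothetical helper: compute rows * cols * elem_size without signed overflow,
// rejecting any request that exceeds a sanity cap before memory is touched.
static size_t checked_buffer_size(uint64_t rows, uint64_t cols, uint64_t elem_size) {
    const uint64_t cap = 1ull << 34; // illustrative 16 GiB limit, not a project constant
    if (rows != 0 && cols > std::numeric_limits<uint64_t>::max() / rows) {
        throw std::length_error("rows * cols overflows");
    }
    const uint64_t cells = rows * cols;
    if (elem_size != 0 && cells > std::numeric_limits<uint64_t>::max() / elem_size) {
        throw std::length_error("total byte size overflows");
    }
    const uint64_t bytes = cells * elem_size;
    if (bytes > cap) {
        throw std::length_error("requested buffer exceeds cap");
    }
    return static_cast<size_t>(bytes);
}

int main() {
    std::vector<uint8_t> buf(checked_buffer_size(4096, 4096, 4)); // 64 MiB, accepted
    // checked_buffer_size(1ull << 32, 1ull << 32, 4) would throw instead of silently overflowing.
    return buf.empty() ? 1 : 0;
}
```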
2.5 Issue Discussion Insights
This section will analyze the tone and sentiment of discussions within this project's open and closed issues that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.
Based on our analysis, there are no instances of toxic discussions in the project's open or closed issues from the past week.
III. Pull Requests
3.1 Open Pull Requests
This section provides a summary of pull requests that were opened in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.
Pull Requests Opened This Week: 37
Key Open Pull Requests
1. server/public_simplechat alternate web client ui update - uncompressed 300KB - built in client side tool calls with 0 setup - reasoning - vision - ai calling into itself: This pull request introduces a lightweight, alternate pure HTML/CSS/JS web client UI for the llama.cpp project that rebases to the latest master, cleans up the UI to free vertical space, supports multiple images per message, and integrates built-in client-side tool calling, reasoning, and vision features with zero setup, all within an uncompressed 300KB source size, aiming to provide a simple, flexible, and functional option alongside the default heavier Svelte-based web UI.
- URL: pull/17415
- Merged: No
- Associated Commits: 9953d, bb6b3, 8f460, 25b0e, 83786, d0156, 946af, 0a4d6, 2c69c, 1df42, 4276f, dfc31, 374af, a4721, 1e43c, b3187, 73fea, eaff5, 0273e, b97b3, 2894e, e29db, f52b9, 94a81, ee4b4, 7b9fd, d4350, d486e, 004b3, 83cfc, a4366, 48142, a40a8, 87726, af0b3, 1fa3c, ddc2e, 1ce5e, 69c6a, 22cca, 1e580, d865b, bbf6e, f9c62, 95f75, af38f, 64ccf, 5354a, 6a44d, d97f2, e15f5, d9b35, ce71a, c0b42, 00c3d, 3cbc9, 1452c, f77b2, 4db7b, 1c5fd, c0d7d, f20ac, 800d3, 48c82, 4ed20, b4a72, 95787, 4d9b9, b6660, a9a3f, 2e5d7, f8bce, af004, e173a, 7e9df, efb63, fe89c, e7167, 52f55, 921f5, 20a38, 27a9f, edaeb, 5fb94, 0f2e0, 308ee, 6d26b, 32b3c, 66804, 4292e, 5a60b, bb4f0, d2281, ce0e7, f1fdc, a3248, 10c72, 16499, 94f37, 64439, 4a071, cfee2, 361d6, 7c4d4, 40829, dfb15, 93057, 136dd, bd85d, 8650a, 6f52e, 21845, 6c50b, ac716, 6e1d0, 0d06d, 772d2, 50fa2, ed391, 9a8ff, bf890, 0ec8f, fc8f1, 714c1, 5ea26, 1cb0d, 22809, 2097c, 45a8c, c30a2, 14aa4, eee88, 619a9, 1efff, 84a83, ba68d, 44dfe, 37ddd, 389ab, 295df, 9d704, 11109, 24ea7, f64d3, 33370, 76fde, 8ef05, 0f876, b14c5, 65e0d, b780a, 06e88, 4f004, 9ee86, 34347, 067a1, e0274, 17748, 98674, e7888, ef326, d1886, 02fd8, e1880, 99f2b, ffb9f, 7987a, 4d0f0, 323c1, 16f43, 2c95f, 7a8cb, 42ce8, 2acc2, bd6fe, 3ada2, 7765c, 60fae, 09220, 7a640, ef5e7, 919ff, 9305e, 5d9a9, 5208e, 512b8, dc637, c8d33, 14e49, 9936c, c71cc, 6f2f7, 70fc4, 716a7, 818e2, 8f2df, 722e5, 6e5ea, 280f6, 705f7, 4b0f3, 317fc, 82187, cbd87, 1d911, 2175b, 4d62e, e518b, 854ab, 0d564, 62bb6, 9319e, c5c25, cc805, 202d4, 4a36e, 667b8, c0fac, 21520, 082a9, 61095, d11b7, 5830a, 9c652, 04be0, e5275, 7df43, 2c3a6, 3d1ee, 2ff76, bb4a3, b4ba7, 0e0ae, 34064, 3eda5, 1a824, f8c50, ea25b, c711b, 9fbd5, a709b, 48111, 06663, 533c8, a9c7f, b90dc, d6537, a04fe, 81106, 12295
2. mtmd: Add DeepSeekOCR Support: This pull request adds support for DeepSeekOCR to the llama.cpp project, including implementation of the DeepSeek3B-MoE-A570M language model component, vision model processing fixes, and integration of related image encoding and attention mechanisms.
- URL: pull/17400
- Merged: No
- Associated Commits: 43a13, b6b9f, 85c7c, 578c8, 2aab5, eab28, 76305, 2de34, e8b26, 97e09, 13dc6, b32bb, 790bb, cec9a, 8b3d3, 1e081, 331ce, 6c071, a65dd, 63a04, 89afd, 88032, 1268d, 68b20, 8bce6, 5e6cf, 7e9fb, 0f558, 7b8d7, 86f11, effe6, 3fcfc, 4cfa1
3. [model] Add support for Plamo3: This pull request adds support for the PLaMo-3 series of base models by integrating their hybrid architecture, which includes Sliding Window Attention and custom feed-forward network layouts, into llama.cpp to enable conversion of official checkpoints to GGUF format and compatibility with existing backends.
- URL: pull/17304
- Merged: No
- Associated Commits: f61ed, c3b61, ce7a9, 1ab3b, d9854, 8dbbe, 96781, 74fa9, 33910, 037d8, 4d0be, 80c34, 9cecb, 3873e, 8b928, 0df52, cdb1d, 527c6, 0f9d0, 5d52f, dab7a, 9bd33, 67a6d, d965f
Other Open Pull Requests
- Diffusion Language Model Support: This pull request adds support for the RND1 Diffusion Language Model to the llama.cpp project, including conversion to GGUF weights and implementation of diffusion-based text generation. It also provides detailed instructions and performance benchmarks across various hardware platforms.
- Vulkan Backend Enhancements: Multiple pull requests improve Vulkan backend functionality by implementing a top-k selection algorithm with workgroup-based sorting and filtering, adding support for the `GGML_OP_GET_REL_POS` operator for F16 and F32 data types, and introducing a precision fix in the compute shader. These changes include related support in the ggml and metal backends, cleanup, testing enhancements, and depend on prior testing improvements.
- Intel GPU Performance Optimization: A pull request proposes changing the default subgroup and block size from 32 to 16 specifically for Intel GPUs, resulting in a 1.2 to 1.5 times performance improvement on certain models and GPU generations. This includes several commits to implement, test, and refine the adjustment.
- RISC-V Vector Floating-Point Support: This pull request extends the existing RISC-V Vector (RVV) floating-point support in the ggml-cpu module by adding a BF16 RVV flag to enable the zvfbfwma extension and introducing six new floating-point kernels. The new kernels were functionally tested on QEMU across various vector lengths and input sizes.
- Window Operations and Relative Position Embeddings: Enhancements to window operations and relative position embeddings in CPU and CUDA backends include batching support, extended data type compatibility to F16/BF16, CUDA support, scaling for different query/key lengths, and new tests. These improvements are critical for SAM and DeepSeek-OCR functionality.
- CUDA Backend Copy Support and Refactoring: This pull request adds support for copying non-contiguous 32-bit integer data to 32-bit integer destinations in the CUDA backend, includes related tests, and refactors the copy function for better clarity.
- Server Code Refactoring: The server.cpp file is refactored by splitting its code into smaller components—server-common, server-task, and server-queue—to modularize utility functions, task serialization/deserialization, and mutex-related task queue management. This improves code structure and maintainability.
- Kimi-K2 Model Tool-Call Parsing Fixes: This draft pull request implements multiple fixes and tests addressing ongoing issues with tool-call parsing in the Kimi-K2 model. It remains in draft status to gather community bug reports and reproduction cases due to hardware limitations preventing full reproduction or testing by the author.
- Hexagon DSP Support and Backend Fixes: Initial support for Hexagon DSP versions v68 and v69 is introduced in the ggml backend, enabling model execution on these platforms despite slow performance. The pull request also fixes build errors, VTCM acquire failure checks, and adjusts for memory page size constraints.
- Android Binding Rewrite: The Android binding for llama.cpp is rewritten by removing the cpu_features dependency and implementing dynamic native library loading for advanced acceleration on Aarch64 and x86_64 architectures. The redesign includes a new C++ layer and JNI bridge supporting features like automatic message role formatting, system prompt injection, context overflow handling, batch decoding, engine state exposure, new APIs, GGUF metadata parsing utilities, and performance optimizations.
- Anthropic Messages API Integration: Support for the Anthropic Messages API is added to llama-server by implementing endpoints for chat completions and token counting with streaming, tool use, vision support, system prompts, and extended parameters. The pull request converts Anthropic's message format to an OpenAI-compatible internal format to reuse the existing inference pipeline.
- Hexagon Backend README Improvements: The Hexagon backend README is improved by adding a step to explicitly create the target directory on the Android device before pushing GGUF files and correcting the example command to properly escape double quotes around the prompt string. These changes enhance clarity and accuracy for deploying and running the software on Snapdragon-based Android devices.
- JSON Schema and Locale Fixes: Two pull requests fix issues related to data formatting: one corrects the JSON schema to properly escape the backslash character in literals, and another fixes locale-dependent float printing in GGUF metadata by replacing std::to_string with std::ostringstream using std::locale::classic() to ensure consistent decimal separators (see the sketch after this list).
- Eagle2-VL Multimodal Model Support: Initial support for the Eagle2-VL multimodal models (1B and 2B) is added to the MTMD pipeline with a dedicated converter, runtime builder, and loader enhancements for the Eagle2-VL vision tower and its 2-layer projector. This integration does not affect existing model architectures.
- Performance Graph and Test Configuration Updates: Updates to the worst-case performance graph for the unified cache address related issues and improve test configurations by disabling operation offload in certain tests.
- Metrics Endpoint Fix: The `/metrics` endpoint issue where Prometheus-format text was incorrectly JSON-escaped and wrapped in double quotes is fixed by adding and modifying an `ok()` method overload to ensure proper formatting for Prometheus parsing.
- Multiple Checkpoints in llama-server: This pull request introduces the creation of multiple checkpoints during prompt processing in llama-server to improve usability and prevent loss of progress, replacing the previous approach of generating only a single checkpoint after processing many tokens.
- s390x Architecture Fixes for convert_hf_to_gguf.py: The convert_hf_to_gguf.py script is fixed on s390x architecture by correctly handling byte order conversions, assuming little-endian model data and performing necessary byteswaps after reading. GGUFWriter is modified to accept tensors in native endianness to avoid redundant byteswaps, and inplace byteswap calls on lazy tensor and array wrappers are replaced with copying byteswaps.
- RDNA4 GPU Optimization: Support for the `mul_mat_f` operation is enabled for RDNA4 GPUs, and performance is optimized by moving workloads with `n >= 3` from the `mmvf` backend to the `mmf` backend. A rarely executed branch prompts the ROCm compiler to generate more efficient code, with performance improvements demonstrated through extensive benchmarking on an RX 9070 XT GPU.
- Core Scaling and Synchronization Improvements: A core scaling issue caused by cache-line contention when running llama-bench on large core count machines with small batch sizes is addressed by implementing synchronization and partitioning improvements. These changes result in throughput gains of 2% to 44% for the Qwen3 30B parameter model.
- Facebook Nougat OCR Model Support: Support for Facebook's Nougat OCR model is added by implementing the mBART encoder/decoder and Swin Transformer vision encoder architectures with cross-attention. The pull request includes model conversion scripts, a CLI tool for OCR processing with multiple output formats, GPU acceleration, and comprehensive documentation for academic document understanding including formulas, tables, and complex layouts.
- CANN Backend Enhancements: Optimized caching logic for rope_cache_init and support for mRoPE and i-mRoPE are added in the CANN backend, with notes on specific configuration requirements and ongoing investigations for Ascend 910B devices. Additionally, support for the out_prod operator handling both F32 and F16 floating-point product calculations is introduced.
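As a minimal sketch of the locale fix summarized in the JSON Schema and Locale Fixes item above (the `format_float` helper name is illustrative, not the PR's actual function), the key difference is that std::to_string honors the global locale, which can emit ',' as the decimal separator, while a std::ostringstream imbued with std::locale::classic() always emits '.':

```cpp
#include <locale>
#include <sstream>
#include <string>
#include <iostream>

// Locale-independent float -> string conversion along the lines the PR describes:
// std::to_string(3.14) can yield "3,14" under e.g. a de_DE global locale, whereas
// an ostringstream imbued with the classic "C" locale always yields "3.14".
static std::string format_float(double v) {
    std::ostringstream ss;
    ss.imbue(std::locale::classic()); // pin '.' as the decimal separator
    ss << v;
    return ss.str();
}

int main() {
    // Even if the process-wide locale uses ',' for decimals, the output stays portable.
    // std::locale::global(std::locale("de_DE.UTF-8")); // may throw if the locale is absent
    std::cout << format_float(3.14159) << "\n"; // prints 3.14159
    return 0;
}
```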
3.2 Closed Pull Requests
This section provides a summary of pull requests that were closed in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.
Pull Requests Closed This Week: 48
Key Closed Pull Requests
1. ggml-hexagon: fix swiglu failure at test-backend-ops: This pull request fixes failures in the Hexagon backend's swiglu and silu operations by adding overflow-guarded HVX primitives, improving NaN/Inf handling in exponential and inverse functions, and correcting implementation mistakes to enhance the accuracy and stability of these operations during testing.
- URL: pull/17344
- Merged: Yes
- Associated Commits: ab752, 5aa4a, a6415, a8cdb, ae42f, 39445, a589b, 6f57b, db9e9, ce48a, fc5f3, 57073, 54235, 38594, 014ad, 8c374, 33a05, 83884, f7662, 5f553, f6d7f, 37e9a, 185dc, 6d887, e07cb, 55cea
2. vulkan: implement ADD1, ARANGE, FILL, SOFTPLUS, STEP, ROUND, CEIL, FLOOR, TRUNC: This pull request implements several Vulkan operations including ADD1, ARANGE, FILL, SOFTPLUS, STEP, ROUND, CEIL, FLOOR, and TRUNC in the llama.cpp project, with mostly mechanical changes except for ROUND which lacks a direct Vulkan equivalent.
- URL: pull/17319
- Merged: Yes
3. common : more accurate sampling timing: This pull request improves the accuracy of timing measurements in the common sampling code by separately reporting the time spent in sampling, the llama_sampler, and unaccounted time, thereby providing more detailed performance insights.
- URL: pull/17382
- Merged: Yes
Other Closed Pull Requests
3.3 Pull Request Discussion Insights
This section will analyze the tone and sentiment of discussions within this project's open and closed pull requests that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.
Based on our analysis, there are no instances of toxic discussions in the project's open or closed pull requests from the past week.
IV. Contributors
4.1 Contributors
Active Contributors:
We consider an active contributor in this project to be any contributor who has made at least 1 commit, opened at least 1 issue, created at least 1 pull request, or made more than 2 comments in the last month.
If there are more than 10 active contributors, the list is truncated to the top 10 based on contribution metrics for better clarity.
| Contributor | Commits | Pull Requests | Issues | Comments |
|---|---|---|---|---|
| hanishkvc | 625 | 7 | 1 | 3 |
| hanyin-arm | 250 | 1 | 0 | 1 |
| ngxson | 116 | 10 | 4 | 66 |
| ggerganov | 89 | 18 | 2 | 82 |
| pwilkin | 46 | 5 | 3 | 81 |
| jeffbolznv | 39 | 11 | 0 | 68 |
| CISC | 25 | 2 | 0 | 78 |
| 0cc4m | 30 | 6 | 0 | 60 |
| aldehir | 63 | 1 | 1 | 26 |
| am17an | 30 | 5 | 1 | 46 |