Weekly GitHub Report for Llama.cpp: March 16, 2026 - March 23, 2026 (19:49:23)
Thank you for subscribing to our weekly newsletter! Each week, we deliver a comprehensive summary of your GitHub project's latest activity right to your inbox, including an overview of your project's issues, pull requests, contributors, and commit activity.
Table of Contents
I. News
1.1 Recent Version Releases:
The current version of this repository is b4991
1.2 Version Information:
The version released on March 29, 2025, introduces key updates that enhance overall performance and stability, with notable improvements in user interface responsiveness and security features. This release reflects a continued focus on optimizing user experience and safeguarding data integrity.
II. Issues
2.1 Top 5 Active Issues:
We consider active issues to be issues that have been commented on most frequently within the last week. Bot comments are omitted.
-
[BUG-UNCONFIRMED] Compile bug: c++: error: unrecognized command-line option ‘-fsycl’: This issue reports a compilation failure when building the project with the SYCL backend enabled, specifically an error where the compiler does not recognize the -fsycl command-line option. The user details their environment setup on Debian 13, installation of Intel oneAPI, and attempts to compile with the SYCL backend, but the build fails because the compiler is not set correctly and because of problems with GPU driver detection and configuration for SYCL support.
- The comments guide the user to explicitly set the SYCL compiler to icpx in the CMake configuration to resolve the unrecognized -fsycl option error. Further discussion reveals the user's GPU driver was not correctly installed or configured for SYCL, leading to runtime errors and an inability to detect a usable GPU; the user investigates switching from the default i915 driver to the xe driver on Debian, and the community advises verifying the driver installation and using diagnostic tools like sycl-ls to confirm proper SYCL GPU support.
- Number of comments this week: 10
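For readers hitting the same error, the fix discussed in the thread can be sketched as follows. This is a minimal, hedged illustration assuming a standard oneAPI installation under /opt/intel/oneapi; the exact paths and flags depend on the local setup and are not taken verbatim from the issue.

```shell
# Sketch: build the SYCL backend with Intel's oneAPI compilers.
# Source the oneAPI environment first so icx/icpx are on PATH.
source /opt/intel/oneapi/setvars.sh

# Point CMake explicitly at icpx so -fsycl is understood
# (the default g++ does not accept that option, hence the error).
cmake -B build -DGGML_SYCL=ON \
      -DCMAKE_C_COMPILER=icx \
      -DCMAKE_CXX_COMPILER=icpx
cmake --build build --config Release

# Confirm a usable SYCL GPU is visible before running, as advised in the comments:
sycl-ls
```

These commands require an installed oneAPI toolchain and a configured GPU driver, so they are shown as a configuration sketch rather than a runnable snippet.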
-
[BUG-UNCONFIRMED] Eval bug: DeepSeek V3.2 no longer reasons when using jinja template from DeepSeek V3.2 Exp and no reasoning_content is returned by the server: This issue reports a regression in DeepSeek V3.2 where the model no longer performs reasoning (indicated by the absence of the reasoning tag) when using a specific jinja chat template, resulting in no reasoning_content being returned by the server. The user observed that older builds of llama.cpp produced correct and longer reasoning outputs with near-perfect accuracy, while recent builds yield shorter, less accurate answers, suggesting the problem lies outside the model itself and may be related to template handling or server parameters.
- The comments discuss ruling out recent changes to the model loading architecture, testing different commits to isolate the regression, and identifying that reasoning is disabled due to template or server parameter issues. Suggestions include disabling CUDA fusion, correctly setting the thinking flag via server parameters, and merging related pull requests to improve reasoning detection, with users sharing evaluation scripts and expressing frustration over time spent investigating the issue.
- Number of comments this week: 10
-
[BUG-UNCONFIRMED] Misc. bug: TG performance degradation with mixed offload using fused up + gate models: This issue reports a token generation (TG) performance degradation when using mixed offload with fused up + gate models in the Qwen3.5 quantizations, where fused models run significantly slower than unfused ones when partially loaded across VRAM and RAM. The user investigates the problem by comparing fused and unfused models, testing patches, and analyzing tensor placement and buffer type handling, leading to a proposed code fix to improve performance.
- The comments discuss attempts to reproduce and address the issue, including testing existing patches, running with different flags, sharing debug logs, identifying a problem with tensor buffer overrides, and finally proposing a code patch that adjusts tensor pattern matching to fix the performance degradation.
- Number of comments this week: 10
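For context, tensor placement in mixed offload setups is typically steered with llama.cpp's --override-tensor (-ot) flag, which the tensor pattern matching discussed above feeds into. The following is a hypothetical invocation of the kind debated in the issue; the regex and tensor names are illustrative assumptions, not the actual patch.

```shell
# Sketch: force the fused up+gate expert tensors to stay in system RAM while
# offloading the remaining layers to the GPU. Real tensor names depend on the
# model's GGUF layout; this pattern is an illustration only.
llama-cli -m model.gguf -ngl 99 \
  -ot 'blk\.\d+\.ffn_(up|gate)_exps\.weight=CPU'
```

This requires a local model and GPU, so it is shown as a usage sketch rather than a runnable snippet.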
-
[BUG] Misc. bug: Getting 500 - Failed to parse input at pos x when tool calling: This issue reports a server error occurring when running tool calls in the llama-server, specifically a 500 error with the message "Failed to parse input at pos x" related to <tool_call> tags during code review tasks. The problem appears after a recent commit and affects multiple models, causing failures in parsing tool call inputs such as read_file_diff_md, which prevents successful processing of code review files, particularly XSLT files in this case.
- The comments discuss attempts to reproduce and isolate the issue, noting it happens consistently on all tool calls and across different models. Users share logs and test setups, and one contributor identifies the root cause as a bug triggered when tools have no required arguments, suggesting adding a required parameter like review_id to fix the parsing error.
- Number of comments this week: 9
-
Refactor: Tool registry on server: This issue proposes refactoring the tool registry on the server side of the llama-server project to centralize tool management, which currently resides only on the web UI client side, causing code duplication. The plan includes supporting built-in tools and MCP tools via a mcp.json file, enabling both CLI and web UI clients to access these tools uniformly, with a phased implementation involving adding built-in tools first, then MCP support, and finally integrating a C++ MCP client for the CLI.
- The comments discuss different approaches to unify tool listing and execution, the separation of skills from MCP tools, UI adaptations for tool selection, and a stepwise implementation plan; contributors share prototype code and agree on API designs, emphasizing simplicity in the initial version and deferring complex features like streaming and session management.
- Number of comments this week: 9
2.2 Top 5 Stale Issues:
We consider stale issues to be issues that have had no activity within the last 30 days. The team should work together to get these issues resolved and closed as soon as possible.
As of our latest update, there are no stale issues for the project this week.
2.3 Open Issues
This section lists, groups, and then summarizes issues that were created within the last week in the repository.
Issues Opened This Week: 64
Summarized Issues:
- Server 500 Errors and Tool Call Parsing Failures: Multiple issues report server errors with 500 responses caused by failures to parse input during tool calls in llama-server, affecting various models and tool usages. These errors disrupt normal operation and prevent successful tool invocation or web search functionality.
- issues/20650, issues/20761
- Model Loading and Symbol Lookup Errors: There are symbol lookup errors when loading specific models like Qwen3.5-35B-A3B-Q4_K_S.gguf due to undefined symbols in shared libraries, causing failures on Linux systems. This prevents proper model initialization and usage in llama-cli.
- issues/20658
- GPU Backend and Inference Issues: Several issues describe problems with GPU backends including static or incorrect outputs on CUDA, memory allocation failures on OpenVINO with Arc A750 GPUs, Vulkan device loss errors on NVIDIA GPUs, and performance regressions on Intel Arc GPUs. These issues cause crashes, hangs, or degraded inference performance.
- issues/20659, issues/20661, issues/20762, issues/20890, issues/20893, issues/20894
- Model-Specific Bugs and Output Errors: Bugs affecting specific models include repetitive or empty outputs on Apple Metal hardware with Mistral Small 4, tool call formatting errors causing generation to stop in Qwen3.5 9B, and reasoning failures in DeepSeek V3.2 with certain templates. These issues reduce model usability and output correctness.
- issues/20668, issues/20837, issues/20717
- Tool and MCP Integration Refactors: Multiple proposals and implementations focus on refactoring the llama-server and CLI to support centralized tool registries, MCP client integration, and unified tool management across server, CLI, and web UI. These aim to improve tool invocation, configuration, and proxying capabilities.
- issues/20673, issues/20675, issues/20677, issues/20769, issues/20770
- Feature Requests for Model and UI Enhancements: Requests include adding support for Yuan 3.0 models, video file support in the WebUI, export/import of WebUI settings, and toggling reasoning features live in the UI. These features aim to expand model compatibility and improve user experience.
- issues/20683, issues/20741, issues/20695, issues/20798
- Memory and Context Management Improvements: Issues report excessive VRAM usage due to incorrect tensor matching, propose disk-based context checkpoint offloading to reduce RAM usage, and describe bugs in context size calculation and prompt cache restoration performance. These affect memory efficiency and large context handling.
- issues/20703, issues/20697, issues/20878, issues/20854
- Compilation and Build Failures: Several issues describe compilation failures related to SYCL backend configuration, Vulkan shader generation errors due to insufficient memory, and build environment problems on Debian and Android Termux. These prevent successful builds and require environment or configuration fixes.
- issues/20702, issues/20868, issues/20812
- Performance Regressions and Optimization Needs: Reports include decoding slowdowns with flash attention enabled, GPU offloading parameter anomalies causing speed drops, and shader inefficiencies on AMD GPUs due to scalar loads and high register usage. Proposed optimizations target shader code and offload logic to improve throughput.
- issues/20710, issues/20714, issues/20846, issues/20848
- Inference and Model Execution Failures: Issues include segmentation faults in backend functions, indefinite newline generation in CLI with Docker images, and server crashes during warmup with flash attention on Mistral Small 4. These failures disrupt normal model execution and require debugging or workarounds.
- issues/20824, issues/20774, issues/20748
- Quantization and Cache Offloading Limitations: Problems with asymmetric key/value cache quantization not offloading to GPU cause fallback to slower CPU processing, and quantization process inefficiencies create bottlenecks. These issues impact prompt evaluation speed and quantization performance.
- issues/20866, issues/20829
- Grammar and Parsing Failures in Chat Completions: Fixed repetition thresholds cause grammar compilation failures leading to 500 errors during chat completions with tools, and malformed GGUF metadata causes assertion failures during file parsing. These parsing issues cause silent failures and crashes.
- issues/20867, issues/20873
- Web UI and Configuration Bugs: The web UI suffers from nested button element issues causing cursor and event problems, failure to load or accept JSON config files on Windows, and lack of JPEG EXIF orientation handling in vision models. These bugs degrade UI usability and image processing correctness.
- issues/20832, issues/20871, issues/20870
- Vulkan Backend Behavior and Stability Issues: Vulkan backend exhibits differing performance and behavior between row and layer split modes, device loss errors on AMD and NVIDIA GPUs, and crashes with flash attention enabled on certain hardware. These issues affect stability and performance consistency.
- issues/20862, issues/20889, issues/20762
- Model Compatibility and Support Inquiries: Users inquire about support for specific models like Qianfan-OCR and report failures running Qwen3.5 and Qwen3 models on dual AMD GPUs with Vulkan, indicating gaps in hardware or model compatibility.
- issues/20734, issues/20699
- Miscellaneous Issues with Input and Output Handling: Problems include invalid request errors when viewing images with codex and qwen3.5, connection failures registering with Microchip MCP Server, and gibberish output on non-Ubuntu Linux distributions using split-mode row. These affect input/output correctness and connectivity.
- issues/20663, issues/20666, issues/20815
2.4 Closed Issues
This section lists, groups, and then summarizes issues that were closed within the last week in the repository. This section also links the associated pull requests if applicable.
Issues Closed This Week: 36
Summarized Issues:
- Model Naming and Output Issues: Multiple issues report problems with model name ambiguity, incorrect output formatting, and unexpected token insertion. Users face confusion due to identical model names without aliases, models wrapping responses incorrectly in thinking blocks, and corrupted output caused by unwanted tags during resumed completions.
- [issues/20165, issues/20550, issues/20768]
- Server 500 Errors and Parsing Failures: Several issues describe server errors triggered by input parsing failures, autoparser bugs, or corrupted prompt caches. These errors cause crashes or failed completions during chat requests, JSON schema generation, or evaluation runs, often linked to recent commits affecting parsing or caching mechanisms.
- [issues/20281, issues/20344, issues/20532, issues/20814]
- Vulkan and GPU Backend Stability Problems: There are multiple reports of crashes, device lost errors, and performance regressions related to Vulkan backend usage on AMD GPUs. Issues include device lost errors with large contexts, garbage output on Vulkan after specific commits, and significant throughput drops on certain Radeon cards.
- [issues/20439, issues/20462, issues/20597, issues/20610, issues/20651]
- Embedding and Backend Assertion Failures: Bugs causing assertion failures and crashes occur when serving embedding models or handling specific tensor types on various hardware backends. These include failures in ggml_mul_mat during embedding serving and assertion errors with quantized models on aarch64 CPUs.
- [issues/20481, issues/20608, issues/20863, issues/20875]
- Caching and Tool Call Output Bugs: Problems with caching mechanisms lead to incorrect tool call outputs and duplicated chat completions. Disabling cache prompting temporarily resolves these issues but at the cost of performance or duplicated responses in the web UI.
- [issues/20614, issues/20568]
- Model Control and Feature Requests: Users request support for control vectors on Qwen3.5 variants and hardware/memory override features in router mode to optimize resource allocation and prevent crashes. Current attempts to use control vectors cause crashes or no effect, and router mode lacks per-model resource configuration.
- [issues/20541, issues/20851]
- Web UI and Server Functionality Bugs: The web user interface fails to load or behaves incorrectly on Windows and other environments, with JavaScript errors and missing persistence of settings. Additionally, the server mishandles health endpoint requests, causing liveness probe failures and pod restarts under load.
- [issues/20722, issues/20736, issues/20738, issues/20684]
- ROCm and HIP Backend Compatibility Issues: Several issues describe crashes and errors related to ROCm versions and missing support for specific AMD GPU architectures in the HIP backend. These cause users to rely on Vulkan despite HIP being preferable, with regressions introduced in recent ROCm releases.
- [issues/20564, issues/20839]
- Parsing and Reasoning Content Misclassification: The autoparser misclassifies reasoning content and tool calls, causing broken tool invocation and streaming errors. Workarounds involve disabling reasoning detection or excluding certain templates, with fixes applied to improve compatibility with OpenAI clients.
- [issues/20500, issues/20754, issues/20809]
- Multi-Image and Multi-Turn Processing Inefficiencies: The Qwen3.5 model reprocesses previous images during multi-turn interactions, causing slowdowns compared to other models that cache image processing. This inefficiency was identified and resolved in a specific branch update.
- [issues/20755, issues/20720]
- Compilation and Architecture-Specific Build Failures: Building on RISC-V without vector extensions fails due to missing declarations and backend errors introduced after certain commits, blocking compilation on that architecture.
- [issues/20669]
- Backend Operation and Precision Support Limitations: The SYCL backend lacks support for BF16 precision in MUL_MAT operations, causing warnings and suboptimal performance during model execution, which was later fixed.
- [issues/20713]
- Release Process and CI Runner Failures: Random failures in s390x self-hosted runners block release processes, with investigations pointing to communication and permission issues, prompting discussions about temporarily disabling affected jobs.
- [issues/20787]
2.5 Issue Discussion Insights
This section will analyze the tone and sentiment of discussions within this project's open and closed issues that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.
Based on our analysis, there are no instances of toxic discussions in the project's open or closed issues from the past week.
III. Pull Requests
3.1 Open Pull Requests
This section provides a summary of pull requests that were opened in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.
Pull Requests Opened This Week: 46
Key Open Pull Requests
1. ggml-webgpu: add vectorized flash attention: This pull request introduces a vectorized WebGPU implementation of flash attention (FLASH_ATTN_EXT) in the ggml-webgpu backend, featuring a split pipeline with optional mask tile classification, a vectorized attention kernel, and a merge path for multi-split execution to optimize performance.
- URL: pull/20709
- Associated Commits: 976eb, 94abb, 10330, c307a, f8e31, 52709, df6ef, 83830, d61ec, 042a1, b61e6, 36027, 356d6, 1ae04, 33a54, 638c4, 0abac, 83a42, 2595b, 25096, 3d6bf, 68fa2, 5065d, 03d06, 5dd2a, 1e0d8, 88bf3, 5c2fe, 59aa7, cac85
2. args: refactor mlock/mmap/directio into load-mode: This pull request refactors the three separate loading modes—mlock, mmap, and direct-io—into a single unified --load-mode option to simplify the logic by allowing only one loading mode at a time, deprecating the old flags and updating documentation and tests accordingly.
- URL: pull/20834
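If this PR is merged, usage would move from three separate, combinable flags to a single mutually exclusive option. A hypothetical before/after sketch; the accepted values are inferred from the three modes being unified and may differ in the final implementation:

```shell
# Before: separate flags for each loading behavior
llama-cli -m model.gguf --mlock

# After (per the PR description): one --load-mode option,
# allowing only a single loading mode at a time
llama-cli -m model.gguf --load-mode mlock
```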
3. server: add built-in tools backend support: This pull request adds backend support for built-in tools in the server, including enabling tools with the --tools all argument, handling permission dialogs for tools requiring write access, and updating the API and UI behavior to reflect whether built-in tools are enabled.
- URL: pull/20898
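Per the PR summary, enabling the new backend would look roughly like this; the flag is quoted from the PR description, while the surrounding invocation is an assumed minimal example:

```shell
# Sketch: start llama-server with all built-in tools enabled,
# as described in the PR. Tools needing write access would then
# prompt via the permission dialog mentioned in the summary.
llama-server -m model.gguf --tools all
```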
Other Open Pull Requests
- Hugging Face cache support and error handling: This topic includes pull requests that add standard Hugging Face cache support by using the Hugging Face API to locate files and migrate manifests, along with improvements in error handling, API error reporting, and fallback mechanisms for cached files. These changes enhance reliability and integration with Hugging Face infrastructure.
[pull/20775]
- Compilation fixes and header updates: Pull requests under this topic resolve compilation errors on Windows with Clang/LLVM by adding missing <chrono> headers and fixing pointer type mismatches in the BitNet implementation. These fixes ensure successful builds and correct std::chrono functionality across platforms.
[pull/20674]
- Shader and GPU performance optimizations: This topic covers adding support for the DP4A integer dot product instruction in Flash Attention shaders, improving GLSL vector macro consistency, and optimizing shader selection and memory indexing on GPUs supporting DP4A. These enhancements improve performance and correctness of GPU computations.
[pull/20797]
- RVV repacking and kernel support for quantization: Pull requests here extend RVV repacking and GEMM/GEMV kernels to support a wider range of vector lengths and introduce new kernels for Q5_K and MXFP4 quantization types. Functional testing and benchmarking on various hardware configurations accompany these additions.
[pull/20723]
- Build system improvements: This topic includes replacing make with the Ninja build system in CI builds to improve portability and achieve approximately 1.7x faster build times. The change relies on standard CMake commands for better cross-platform support.
[pull/20742]
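The switch described above corresponds to the standard generator-agnostic CMake workflow; a minimal sketch (requires CMake and Ninja to be installed locally):

```shell
# Configure with the Ninja generator instead of the default Makefiles,
# then build via the generator-agnostic cmake --build command.
cmake -B build -G Ninja
cmake --build build --config Release
```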
- Qwen3-TTS model support and tooling: Pull requests add initial support for the Qwen3-TTS model architecture, including multi-stage pipeline execution, new GGUF schema and conversion tools, C++ inference graph implementations, and CLI tools for text-to-speech and voice cloning. These additions enable advanced audio decoding and speaker encoding features.
[pull/20752]
- Web UI diagnostics and fixes: This topic covers adding diagnostic printouts and CORS warnings to the web UI for troubleshooting fetch errors while redacting sensitive tokens, as well as fixing the edit message form's <textarea> height to auto-resize and applying consistent height limits for better user experience.
[pull/20753, pull/20830]
- WebGPU and Dawn version updates: Pull requests update the Dawn version used in WebGPU CI from an eight-month-old release to the latest version, ensuring up-to-date dependencies and improved stability.
[pull/20784]
- CANN environment updates: This topic includes upgrading CANN-related docker images to stable version 8.5.0 and revising documentation to clarify device naming and add BF16 support without affecting functionality outside the CANN environment.
[pull/20801]
- Granite chat template enhancements: Pull requests introduce a new Granite 4.0 chat template alongside the existing 3.x version, adding an enum and handler to correctly map the assistant_tool_call role and fixing tool calling issues on the C++ template path without --jinja. Backward compatibility and comprehensive tests are maintained.
[pull/20804]
- Pull request template improvements: This topic adds a "Requirements" section to the PR template mandating acknowledgment of contributing guidelines, disclosure of AI usage, and providing an additional guide for AI agents.
[pull/20841]
- API conversion refactoring and testing: Pull requests unify and relocate conversion functions for Chat Completions, Responses, and Anthropic Messages into a common file, add tests for the Responses API, and fix issues with string-only message content.
[pull/20690]
- Shader library JIT compilation port: This topic ports the copy and glu pipelines of ggml-webgpu to a shader library with just-in-time compilation to enhance performance and flexibility.
[pull/20728]
- Multi-GPU pipeline synchronization fixes: Pull requests reintroduce careful synchronization between asynchronous memory copies and graph execution to fix bugs affecting multi-GPU pipeline parallelism while preserving single-GPU CUDA performance improvements.
[pull/20793]
- Router mode slot state persistence: This topic implements automatic saving and restoring of slot state including context checkpoints in router mode with
--models-max 1, enabling seamless model hot-swapping and drastically reducing prompt re-processing time.
[pull/20822]
- Machine translation usage prohibition: Pull requests explicitly prohibit the use of language models for machine translation in contributions to ensure authenticity and reliability, clarifying that machine-translated content is not permitted.
[pull/20838]
- Parser and sampler interaction improvements: This topic improves parser and sampler interaction by maintaining reasoning state to disable grammar triggers within tool calling sections and replacing premature end-of-generation markers with end-of-thinking markers for proper model responses.
[pull/20844]
- NVFP4 per-tensor scaling precision restoration: Pull requests restore precision of NVFP4 per-tensor scaling by applying it during matrix multiplication rather than after accumulation, improving numerical stability and accuracy on GPU, especially for Qwen3.5 and Nemotron models, and enabling basic CUDA support for NVFP4.
[pull/20845]
- Web UI HTML validity fixes: This topic fixes invalid nested <button> elements in the ModelsSelector.svelte dropdown by using bits-ui child snippet patterns, replacing inner buttons with spans, and removing duplicate disabled attributes to produce valid HTML.
[pull/20853]
- Vulkan/Linux build documentation update: Pull requests add mention of the ANV_SYS_MEM_LIMIT environment variable to the Vulkan/Linux section of build.md to document an out-of-memory issue with Intel integrated GPUs, aiding users configuring Intel Vulkan setups.
[pull/20670]
- Multi-Token Prediction (MTP) speculative decoding: This topic introduces MTP speculative decoding support for Qwen3.5 dense models with a full MTP attention head, FastMTP vocabulary trimming, and fixes for recurrent state handling and two-phase decode options to prevent state corruption.
[pull/20700]
- gguf division by zero fix: Pull requests address a division by zero error in the gguf component by preventing malformed data from causing crashes.
[pull/20716]
- Chat parser regression workaround: This topic implements a server-level workaround for a regression causing std::runtime_error on final parse attempts, catching exceptions and falling back to the last successful streaming results to ensure stable API behavior despite parser failures.
[pull/20729]
- oneAPI version upgrade: Pull requests upgrade the default oneAPI version in the Intel Dockerfile from 2025.2.2-0-devel-ubuntu24.04 to 2025.3.2-0-devel-ubuntu24.04, resulting in improved performance demonstrated by benchmarking.
[pull/20731]
3.2 Closed Pull Requests
This section provides a summary of pull requests that were closed in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.
Pull Requests Closed This Week: 129
Key Closed Pull Requests
1. agent: add .agent/skills as alternative skill discovery path: This pull request proposes adding .agent/skills/ as an alternative project-local directory for skill discovery alongside the existing .llama-agent/skills/ path, providing a vendor-neutral option for agent configuration with a defined priority order and updating relevant code and documentation to support this new skill search path.
- URL: pull/20691
- Associated Commits: 457fc, dcd28, 860ca, 4a30a, 47aaf, a5464, 5660b, 2d215, a707e, 9dda9, bb089, cc202, 29100, 265ba, 4fc32, cb5c1, 8c96b, 75f39, a8322, 075ea, 48caa, 09f3e, 4dc9b, 7a487, 983f5, eb1b5, 2cd3f, a4446, a226e, adbf4, 439bc, 1293f, 8a35d, a22bb, 4709f, 1003d, 5404a, ecc61, 5ad4a, 6bde0, 8c6e2, 4180c, e0efa, 02394, f47e1, 3a3a1, a7c52, e6f31, 15036, 79614, 95cdb, 19934, ad77f, 35458, b7bd7, eeafe, 09be5, 66e48, c42a7, e5038, 16a8e, 0aa3d, 14cab, d4161, 39e0a, 6ea11, 5f359, be5d7, dee7e, 1354d, 24395, faf3d, c7d23, 53419, 77812, ef2dd, 879a7, d3698, 5c70d, 8e2f4, 5996e, c9b70, cdb7a, 4d03c, d38f0, 18a5c, ee675, 5ad48, c80f9, 8a07d, 731e5, 25078, ecfb8, 86b6f, 07523, 2cd98, c5121, c9c73, defc5, 07453, 61a57, cbb43, af3cf, 8068b, 625f7, 7c048, 647e2, ce3d0, 98682, 83b2c, effd3, 3c6c6, b1954, d3c24, 42949, 01a09, 9bdb9, 84e0c, 5f290, 8357f, 0e8fd, bec97, de4d6, defff, 2b0b9, 4ddb4, 1efc8, 56375, 7370a, 2cc3d, a3918, 0de54, de9dd, 09fcb, a421f, 43128, 687b8, e6080, a2e77, 00569, 1870f, 8f62d, 35c6d
2. Add Hexagon Matrix Extensions (HMX) for Hexagon NPU backend: This pull request adds initial support for Hexagon Matrix Extensions (HMX) acceleration to the Hexagon NPU backend in the llama.cpp project, implementing runtime weight conversion during dequantization to maintain compatibility with the existing Q4_0 format, enabling FP16 matrix multiplication for improved performance on Hexagon v73+ (tested on v81), and includes extensive infrastructure, DMA handling, multithreading, and debugging enhancements while supporting Q4_0 and Q8_0 quantization formats.
- URL: pull/20693
- Associated Commits: cca1c, 7e641, 5863f, 2f628, 5426a, 56bbe, 1fa95, 6e53f, ae98c, 334bf, e3a11, a476c, f21a2, d1b53, f7b4c, ee5d6, 41069, 28df9, 76c53, 56c4d, 73efe, e23b3, c65f9, 53131, feb4a, 06ab7, 689f9, 77a16, bb12f, 90e45, 9b633, 48592, 8a653, b4a72, aae15, 1c961, 7c3e7, 6ecb0, 3478b, 95ca3, 8611e, f3a25, 19b68, 3645f, 03dfc, fb937, 3a1dc, f3741, fbeef, 1f0c4, 07e42, 769d3, cd0bb, b120f, 2b061
3. Add dynamic high-resolution image preprocessing for InternVL model: This pull request adds support for dynamic high-resolution image preprocessing tailored for the InternVL model, enabling improved handling of high-resolution tiles required by the Qianfan-OCR system within the llama.cpp framework.
- URL: pull/20847
- Associated Commits: bd4d9, d55ce, 0c47a, 7e08c, ae109, 2cf9a, 75c80, 8e660, b0b43, 109ad, c42b5, b260e, 5f675
Other Closed Pull Requests
3.3 Pull Request Discussion Insights
This section will analyze the tone and sentiment of discussions within this project's open and closed pull requests that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.
Based on our analysis, there are no instances of toxic discussions in the project's open or closed pull requests from the past week.
IV. Contributors
4.1 Contributors
Active Contributors:
We consider an active contributor in this project to be any contributor who has made at least 1 commit, opened at least 1 issue, created at least 1 pull request, or made more than 2 comments in the last month.
If there are more than 10 active contributors, the list is truncated to the top 10 based on contribution metrics for better clarity.
| Contributor | Commits | Pull Requests | Issues | Comments |
|---|---|---|---|---|
| ggerganov | 112 | 19 | 0 | 29 |
| CISC | 54 | 7 | 1 | 48 |
| ngxson | 54 | 7 | 6 | 41 |
| pwilkin | 66 | 9 | 1 | 12 |
| rodgerhubhay | 84 | 0 | 0 | 0 |
| 0cc4m | 55 | 4 | 0 | 10 |
| allozaur | 57 | 3 | 0 | 9 |
| No author found | 54 | 0 | 0 | 0 |
| matrixportalx | 54 | 0 | 0 | 0 |
| richarddd | 51 | 0 | 0 | 0 |