Weekly GitHub Report for Llama.cpp: March 31, 2025 - April 07, 2025
Thank you for subscribing to our weekly newsletter! Each week, we deliver a comprehensive summary of your GitHub project's latest activity right to your inbox, including an overview of your project's issues, pull requests, contributors, and commit activity.
Table of Contents
I. News
1.1 Recent Version Releases:
The current version of this repository is b4991
1.2 Version Information:
The version released on March 29, 2025 did not include release notes in the available data, so specific changes, notable highlights, and trends cannot be summarized here.
II. Issues
2.1 Top 5 Active Issues:
We consider active issues to be issues that have been commented on most frequently within the last week. Bot comments are omitted.
- Cannot compile SYCL backend (`SYCL_LIBRARY=SYCL_LIBRARY-NOTFOUND`) as per documentation: This issue involves a user encountering a compilation error when attempting to compile the SYCL backend with the Intel oneAPI Base Toolkit version 2025.1.0: the SYCL library is not found, contrary to the documentation. The user followed the documented steps for Windows, but the process failed due to missing SYCL library support, leading to errors in the CMake configuration related to the `IntelSYCL` package.
- The comments identify the issue as specific to oneAPI 2025.1, with suggestions to use oneAPI 2025.0 as a workaround or to apply a patch to the `IntelSYCLConfig.cmake` file. Users share their experiences and solutions, including modifying the `SYCL_FEATURE_TEST_EXTRACT` function and addressing performance issues. The conversation also touches on potential environment-related causes and the need for a clean build or a CMake reinstallation to resolve the problem.
- Number of comments this week: 11
- When will llama.cpp's vulkan provide support for Intel Arc's matrix core?: This issue is about a user inquiring when the llama.cpp project will provide Vulkan support for Intel Arc's matrix core, highlighting the current lack of support and performance issues with Intel's implementation. The discussion reveals challenges with the `VK_KHR_cooperative_matrix` extension on Intel hardware, which currently results in reduced performance and incorrect outputs, and mentions ongoing efforts and limitations in addressing these issues.
- The comments discuss the performance and implementation challenges of the `VK_KHR_cooperative_matrix` extension on Intel hardware, with users sharing their experiences and testing results. There is mention of driver issues and incomplete support in the kernel, with some users trying alternative drivers without success. The conversation also touches on the hope for broader adoption of improved extensions like coopmat2, which are currently vendor-specific.
- Number of comments this week: 8
- Feature Request: llama 4: This issue is a feature request for the integration of Llama 4, a newly released multimodal large language model (LLM), into the ggml-org/llama.cpp project. The request highlights the potential benefits of using Llama 4, such as its improved multimodal capabilities and the availability of its technical details and weights.
- The comments discuss the differences between Llama 4 and its predecessor, Llama 3.3, including architectural changes and performance improvements. There is anticipation for more details to be revealed at an upcoming event, LLAMACon. Some comments provide technical insights into the model's architecture, such as interleaved attention layers and chunked attention, while others share links to related resources and forks.
- Number of comments this week: 7
- Compile bug: compilation warnings (clang) Introduced in #10558: This issue reports compilation warnings generated by the MUSA backend when compiled with Clang, specifically in the file `ssm-conv.cu`, which were introduced in a previous commit. The warnings include casts from `const float *` to `char *` that drop the `const` qualifier, as well as unused parameters, neither of which NVCC reports by default.
- The comments discuss the origin of the issue, with a plan to fix it later in the week. There is a discussion about whether the warnings are specific to the MUSA backend, concluding that they are likely compiler-related, since Clang reports these warnings while NVCC does not. A request is made to test a branch for warnings before submitting a fix, and instructions are provided for verifying the issue using a Docker container.
- Number of comments this week: 6
- Eval bug: Jinja not replacing `date_string`: This issue reports a bug in the Llama project where the Jinja template engine does not replace the `date_string` variable as expected. The problem occurs when running the Llama server with specific configurations, and the user suggests using the `strftime_now` function to address the issue, while also discussing potential workarounds and improvements.
- The comments discuss the inability to pass variables simply, suggesting the use of `strftime_now` for date replacement. There is a conversation about whether this feature will be supported, with a contributor expressing interest in making the time overridable and synced with `strftime_now`. The discussion also touches on the use of variables in other models and the limitations of `strftime_now` with certain date formats.
- Number of comments this week: 5
2.2 Top 5 Stale Issues:
We consider stale issues to be issues that have had no activity within the last 30 days. The team should work together to get these issues resolved and closed as soon as possible.
- Kompute-based Vulkan backend shows a `GGML_OP_GET_ROWS` error: This issue pertains to a problem with the Kompute-based Vulkan backend in a GitHub project, where it triggers a `GGML_OP_GET_ROWS` error. The error does not occur with other Vulkan backends, indicating a compatibility or implementation issue specific to the Kompute-based approach.
- Feature Request: Task Cancellation on Client Disconnection: This issue is a feature request for the current embedding server setup, aiming to implement task cancellation when a client disconnects to prevent unnecessary processing of queued tasks, which can lead to inefficiencies and potential server overload. The request highlights the need for the server to terminate task processing upon request cancellation, ensuring that new requests are processed promptly without delay, especially during high-load scenarios.
- Question: How to generate an MPS gputrace: This issue is about a user seeking guidance on how to generate a Metal Performance Shaders (MPS) gputrace for the llama.cpp project during model inference. The user is working on improving the Metal backend for a project and is looking for documented methods or known practices to obtain debugger output similar to what is provided by the Metal Debugger in Xcode.
- common: download from URL, improve parallel download progress status: This issue addresses the need to improve the progress status display for parallel downloads when retrieving sharded models, as the current implementation causes conflicts between the progression indicators. The proposed solution involves properly implementing the `CURLOPT_NOPROGRESS` option to ensure accurate, non-conflicting progress updates during the download process (a minimal libcurl sketch follows this list).
- Prompt eval is 5x slower than in Ollama and maxes out the CPU: This issue highlights a significant performance discrepancy between the `llama.cpp` and `ollama` implementations when running the same Q4_K_M model on similar hardware, with `ollama` achieving a prompt evaluation rate five times faster than `llama.cpp`. The user notes that despite both implementations utilizing the GPU, `llama.cpp` maxes out CPU usage during prompt evaluation, and differences in buffer sizes and graph splits may contribute to the performance gap.
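To make the `CURLOPT_NOPROGRESS` fix concrete: libcurl only invokes a transfer's progress callback when that option is cleared, so a parallel downloader must manage the flag and a per-transfer callback itself. Below is a minimal standalone sketch of that wiring, not llama.cpp's actual downloader; the URL and shard label are placeholders.

```cpp
#include <curl/curl.h>
#include <cstdio>

// Per-transfer progress callback; clientp carries a label so concurrent
// shard downloads can report without clobbering each other's status.
static int progress_cb(void *clientp, curl_off_t dltotal, curl_off_t dlnow,
                       curl_off_t /*ultotal*/, curl_off_t /*ulnow*/) {
    const char *label = static_cast<const char *>(clientp);
    if (dltotal > 0) {
        std::fprintf(stderr, "\r%s: %3d%%", label, (int)(100 * dlnow / dltotal));
    }
    return 0; // returning non-zero aborts the transfer
}

int main() {
    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURL *h = curl_easy_init();
    if (!h) return 1;
    curl_easy_setopt(h, CURLOPT_URL, "https://example.com/model-00001-of-00002.gguf");
    curl_easy_setopt(h, CURLOPT_XFERINFOFUNCTION, progress_cb);
    curl_easy_setopt(h, CURLOPT_XFERINFODATA, (void *)"shard 1/2");
    curl_easy_setopt(h, CURLOPT_NOPROGRESS, 0L); // 0 enables the callback; 1 suppresses it
    const CURLcode rc = curl_easy_perform(h);
    curl_easy_cleanup(h);
    curl_global_cleanup();
    return rc == CURLE_OK ? 0 : 1;
}
```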
2.3 Open Issues
This section lists, groups, and then summarizes issues that were created within the last week in the repository.
Issues Opened This Week: 25
Summarized Issues:
- Compilation Errors and Warnings: Users have reported various compilation issues across different environments and compilers. These include errors with g++ on Linux due to invalid type conversions, Clang warnings in the MUSA backend, and MSVC 2022 issues with `char8_t` conversions. Additionally, there are problems with Intel's oneAPI and the Accelerate backend on Mac, causing build failures.
- Feature Requests for Model Support: There are multiple requests for adding support for new models in the llama.cpp project. These include the StarVector-8b/1b model, Qwen2.5-Omni model, and Llama 4, each highlighting the need for enhanced capabilities in handling different data types and improved processing efficiency.
- Performance and Optimization Issues: Users have encountered performance degradation in various scenarios, such as the RDNA4 prefill process and Q4_K model weight repacking. These issues are often linked to specific hardware or software configurations, with suggestions for optimization and parallelization to improve speed.
- Bugs in Execution and Functionality: Several bugs have been reported affecting the execution of models and functions. These include issues with the `llama_tokenize` function, Vulkan backend memory preferences, and the `llama-quantize` module causing crashes. Additionally, there are problems with the `trim` method and the Jinja template engine not functioning as expected.
- GPU and Hardware Compatibility Issues: Users have faced challenges with GPU memory usage and compatibility, such as excessive GPU memory usage with Vulkan and CUDA errors on specific GPUs. These issues often require workarounds or hardware-specific solutions to resolve.
- Backend and Platform Support Issues: There are issues related to backend support and platform compatibility, such as the lack of support for Mac Catalyst and Intel Arc's matrix core. These issues highlight the need for broader platform support and improved backend implementations.
- Model Execution and Loading Errors: Users have reported errors related to model execution and loading, such as the Qwerky 72B model failing to load with specific options and the llama-server model insisting on GPU usage. These issues often require adjustments in execution parameters or configurations.
- Feature Requests for API Enhancements: There are requests for enhancements in the LLAVA_API, such as methods to return image token counts, which are crucial for managing complexity in multimodal models. These requests aim to improve the usability and functionality of the API.
- Execution and Performance Bugs: Bugs affecting execution and performance, such as system hangs with long prompts and performance issues with specific settings, have been reported. These issues often require detailed investigation and potential codebase changes to resolve.
2.4 Closed Issues
This section lists, groups, and then summarizes issues that were closed within the last week in the repository. This section also links the associated pull requests if applicable.
Issues Closed This Week: 14
Summarized Issues:
- Segmentation Fault and Runtime Errors in Qwen2-VL Models: Issues have been reported regarding segmentation faults and runtime errors when using the Qwen2-VL models on different platforms. On a Mac with an M3 Max processor, a segmentation fault occurs with the Metal backend, while a runtime error is encountered due to missing metadata in the GGUF format model file.
- Bugs in GGML Backend and Llama-CLI Tool: The GGML backend and llama-cli tool have several bugs affecting output and performance. On Mac, the output is a repetitive sequence of '88888888', and on another occasion, repeated log messages indicate issues with KV cache updates, impacting model performance.
- Model Loading and Tensor Shape Mismatch Errors: Loading models on different systems has led to errors, such as a tensor shape mismatch in the Qwerky QwQ 32B model on Windows with CUDA, which was resolved by reconverting the model.
- Feature Requests and Activation Functions: A feature request has been made to support Scaled ReLU or SwiGLU activation functions in the DeepSeek-V3 model. The lack of these functions causes script failures and is believed to enhance model accuracy.
- Build Failures and Configuration Issues: Build failures and configuration issues have been reported, such as RISCV cross-compile warnings requiring a GCC upgrade and a CMake configuration failure with the SYCL backend due to filesystem mount options.
- Runtime and Performance Issues on ARM and Vulkan: Runtime issues on ARM processors and performance regressions in Vulkan have been noted. The Q4_0 quantized models fail on ARM, and Vulkan's token processing speed decreased on Iris Xe graphics.
- Bugs in LlamaSharp and Vulkan Buffer Allocation: LlamaSharp software has a bug in ubatch preparation on Windows with CUDA, and Vulkan faces buffer allocation failures due to device memory limits, affecting model execution.
- Command Option Bugs in Llama.cpp: The `examples/gguf-split` command has a bug where the `--merge` operation does not respect the `--dry-run` option, unlike the `--split` operation, leading to inconsistencies in command execution.
- Tokenization and Special Token Handling: Slow tokenization times in the Gemma 3 model are due to inefficient handling of special tokens, which can be improved by sorting tokens or applying a patch to reduce execution time.
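The tokenization item above describes a classic hot spot: testing every special token at every text position. A hypothetical sketch of the ordering idea, bucketing tokens by first byte and sorting each bucket longest-first, is shown below; it illustrates the general technique only and is not the actual llama.cpp patch.

```cpp
#include <algorithm>
#include <array>
#include <string>
#include <string_view>
#include <vector>

// Instead of trying all special tokens at every byte offset, keep them
// bucketed by first byte; only plausible candidates are tested, and sorting
// each bucket longest-first makes the first hit the longest match.
struct special_matcher {
    std::array<std::vector<std::string>, 256> buckets;

    explicit special_matcher(std::vector<std::string> tokens) {
        for (auto &t : tokens) {
            if (!t.empty()) buckets[(unsigned char)t[0]].push_back(std::move(t));
        }
        for (auto &b : buckets) {
            std::sort(b.begin(), b.end(), [](const std::string &a, const std::string &c) {
                return a.size() > c.size();
            });
        }
    }

    // Longest special token starting at text[pos] (pos < text.size()),
    // or an empty view if none matches.
    std::string_view match(std::string_view text, size_t pos) const {
        for (const auto &t : buckets[(unsigned char)text[pos]]) {
            if (text.compare(pos, t.size(), t) == 0) return t;
        }
        return {};
    }
};
```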
2.5 Issue Discussion Insights
This section will analyze the tone and sentiment of discussions within this project's open and closed issues that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.
Based on our analysis, there are no instances of toxic discussions in the project's open or closed issues from the past week.
III. Pull Requests
3.1 Open Pull Requests
This section provides a summary of pull requests that were opened in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.
Pull Requests Opened This Week: 18
Key Open Pull Requests
1. DeepSeek V2/V3 with `-mla` option (final): This pull request introduces the final version of DeepSeek V2/V3 support with the `-mla` option, addressing issues related to tensor separation for `attn_k_b_trans` and `attn_v_b`. It includes various fixes and optimizations, such as renaming variables, improving code tidiness, and ensuring compatibility with different attention mechanisms; the author expresses a desire to conclude their involvement after extensive testing and development efforts.
- URL: pull/12772
- Merged: No
- Associated Commits: b4c16, 10207, ea3c0, 1f604, 1de07, 7f92e, 319e3, ee4b3, c00cd, 55ad3, 0c86f, b0c8a, 8c329, 68302, 937a4, 1fd0a, 4fb43, f9a0e, 5fe40, 9b862, 8e23e, b3840, 5dbf9, 01a61, c0ffe, 8d12c, 997a4
2. WIP: Add support for CogAgent: This pull request introduces support for CogAgent, a visual model designed for GUI recognition and visual grounding, by integrating two CLIP encoders—one for standard vision tasks and another for high-resolution images—into the existing infrastructure, while awaiting the completion of a new vision infrastructure to finalize its implementation.
- URL: pull/12679
- Merged: No
- Associated Commits: 2a458, 0a810, 6cabd, d0068, 4a7ab, 431bb, bd071, ad38e, 32daa, 9716c, ba489, c0d93, 8586d, 25a97, c3a65, b986a, b72d7, 0959c, 90eef, e884d, 07f58, 4c7ac, c4cf4, 5c19d, 1343d, b5184
3. cmake : enable curl by default: This pull request proposes enabling `curl` by default in the `llama.cpp` project, as it has become integral to the user experience in examples and is already included in most pre-built versions, including Docker images and release binaries, reflecting a shift from the initial decision to keep it disabled due to the potential absence of `libcurl` on target systems.
- URL: pull/12761
- Merged: No
- Associated Commits: 6080f, 64557, 2cc89, 79307, 2238e, 707f2, 79509, 21c42, 9bf42, a8a7e, a9637, 04edd, 64faa, 1c1c2
Other Open Pull Requests
- Enhancements to the `llama_tensor_get_type` function: This topic involves modifications to the `llama_tensor_get_type` function in `llama-quant.cpp` to improve compatibility with DeepSeek models. The changes focus on optimizing performance for models with varying numbers of experts and improving perplexity metrics.
- Support for gguf models from ModelScope: This topic covers the addition of support for downloading and using gguf models from the ModelScope community on multiple platforms. The pull request includes successful tests for Hugging Face and ModelScope downloads, along with various code improvements and fixes.
- Introduction of a `--show-statistics` option: This pull request introduces a new `--show-statistics` option for the imatrix tool. It generates a detailed report on the importance-score statistics of tensors and layers, aiding in layer-wise quantization analysis.
- Refactoring CPU operations and CUDA/MUSA checks: This topic involves refactoring CPU operations by moving operators into a separate C++ file and addressing warnings. It also includes improvements to the Arm fp16 CPU logic and reintroduces CUDA/MUSA checks.
- Chat memory interface implementation: This pull request proposes a proof of concept for a chat memory interface inspired by ChatGPT's memory feature. It aims to integrate a simple key/value store for session-specific memory management with minimal code changes; a minimal interface sketch appears after this list.
- Integration of Ultravox audio input: This topic covers the integration of Ultravox audio input using a Whisper encoder and a vanilla Llama 3.2 1B model. The goal is to enable an efficient audio-to-summary pipeline, although the current implementation produces incorrect output.
- Update to the `rope_multi` function: This pull request proposes an update to the `rope_multi` function by introducing an in-place version called `ggml_rope_multi_inplace`. It also replaces a hardcoded value with `GGML_MROPE_SECTIONS`.
- Resolution of Android file access issues: This pull request addresses file access permission problems causing abnormal exits on Android devices. The issue is resolved as detailed in a specific commit.
- Refactoring of CANN component: This topic involves refactoring the CANN component to minimize duplicate code. The pull request is open for review and aims to streamline the codebase.
- Removal of redundant memory copy operation: This pull request proposes the removal of a redundant memory copy operation in the `ggml_backend_sycl_buffer_set_tensor` function. The change aligns its logic with the default ggml backend and ggml-cann.
- Enhancement of Docker GPU images for CPU compatibility: This topic addresses issue #12500 by adding all CPU variants to Docker GPU images. The enhancement resolves compatibility issues with 'token_embd.weight' processing on CPUs.
- Improved identification of Adreno GPUs: This pull request enhances the identification of Adreno GPUs by checking for "Qualcomm" in the device name. It ensures the complete device name is accurately captured.
- Removal of unused 'min_compute_capability' code: This pull request proposes the removal of the unused 'min_compute_capability' code from the SYCL component. The code is not utilized anywhere in the codebase.
- Performance improvement with direct accumulation: This pull request replaces the traditional accumulate-to-zero pattern with direct accumulation into the output register, yielding a roughly 12% speedup in prompt evaluation performance on an AMD Ryzen 9 9950X platform; a scalar sketch of the pattern change follows this list.
- Resolution of Android continuous integration issue: This pull request addresses a long-standing continuous integration issue in the Android build. The issue was potentially introduced by a previously approved pull request and has been verified through a specific GitHub Actions run.
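For the chat memory proof of concept above, a session-scoped key/value store can be very small. The class and method names below are hypothetical, sketched for illustration rather than taken from the PR's actual interface.

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical session-scoped memory: each session id maps to its own
// key/value facts, kept separate from every other conversation.
class chat_memory {
public:
    void remember(const std::string &session, const std::string &key, const std::string &value) {
        store_[session][key] = value; // e.g. remember("s1", "user_name", "Ada")
    }

    std::optional<std::string> recall(const std::string &session, const std::string &key) const {
        auto s = store_.find(session);
        if (s == store_.end()) return std::nullopt;
        auto kv = s->second.find(key);
        if (kv == s->second.end()) return std::nullopt;
        return kv->second;
    }

    void forget(const std::string &session) { store_.erase(session); } // drop a whole session

private:
    std::map<std::string, std::map<std::string, std::string>> store_;
};
```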
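And for the direct-accumulation item, the scalar analogue of the change is shown below. The actual PR targets SIMD kernels, so this is only a sketch of the idea: seeding the accumulator with the first product removes one operation from the dependency chain of every dot product.

```cpp
#include <cstddef>

// Before: the accumulator starts at zero, so the first iteration computes 0 + x*y.
float dot_accumulate_to_zero(const float *x, const float *y, size_t n) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i) acc += x[i] * y[i];
    return acc;
}

// After: accumulate directly from the first element; no zero-initialized
// register and one fewer dependent add per dot product.
float dot_direct_accumulate(const float *x, const float *y, size_t n) {
    if (n == 0) return 0.0f;
    float acc = x[0] * y[0];
    for (size_t i = 1; i < n; ++i) acc += x[i] * y[i];
    return acc;
}
```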
3.2 Closed Pull Requests
This section provides a summary of pull requests that were closed in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.
Pull Requests Closed This Week: 59
Key Closed Pull Requests
1. ci: add Linux cross-compile build: This pull request introduces a cross-compile build process targeting RISC-V architecture on Linux, aiming to minimize regression issues related to cross-compiling and providing a guide for cross-compilation using Ubuntu, with potential future updates to store artifacts for broader hardware compatibility.
- URL: pull/12428
- Merged: 2025-04-04T17:05:13Z
- Associated Commits: b437b, 6a447, ddd7b, d14ed, 10edb, 344bd, ce528, ce6e5, b52bb, 829bc, d2ac2, d8d3b, 7e276, ef737, 05bfb, 2ec6e, bb935, 6140b, 16aef, ee518, f9235
2. clip : refactor clip_init, add tests: This pull request refactors the `clip_init` function by introducing a `clip_model_loader`, adds a testing script `llava/tests.sh` for evaluating multiple models, implements an enum `patch_merge_type` to replace string comparisons, and removes the `bool has_(tensor name)` pattern. It also includes various code improvements and fixes, such as style adjustments, logging-system refactoring, and model-specific updates, with successful test results for several models.
- URL: pull/12757
- Merged: 2025-04-05T15:17:40Z
- Associated Commits: 44adf, 79c56, dd508, 7b9e7, ee1fa, b41ac, 6fe68, eeea3, 17be2, 85370, 376f8, 84b35, 88aec, c4bb0, 9d4ba, 13b2d
3. Fix clang warning in MUSA compiler: This pull request attempts to fix warnings generated by the MUSA compiler in the project, specifically those highlighted in a previous pull request (#12685), and includes various commits such as optimizing the `ssm_scan` function, removing unused comments, applying clang formatting, and modifying unnecessary calculations.
- URL: pull/12703
- Merged: No
Other Closed Pull Requests
- Downloading System Refactor: This topic involves refactoring the downloading system by removing JSON usage, adding a `--mmproj-url` option, and simplifying model path handling. These changes aim to improve usability and address multi-shard download issues and platform compatibility.
- KV Cache Refactor: The refactoring of the KV cache guard mechanism simplifies its operation and prepares for a separate recurrent cache implementation. It ensures `llama_decode` returns `1` when a batch cannot fit and restores the KV cache state upon failure; a usage sketch appears after this list.
- Web UI Package Upgrade: This upgrade involves updating daisyui and tailwindcss packages in the server web UI, with code fixes and multiple commits. The changes include switching themes, reverting changes, updating formatting, and adding an index.html.gz file.
- SYCL Component Changes: The removal of the `ggml_sycl_op_flatten` function from the SYCL component is part of a series of changes. These include removing trailing whitespace, fixing the L2 norm, and adding a try-catch block for `sycl::exception`.
- Custom Chat Template Support: This pull request introduces support for a custom chat template to accommodate Yandex's upcoming 8B instruct model. The changes ensure compatibility and functionality, verified by local testing.
- CANN Backend Optimization: The optimization of the `get_rows` and `dup` operators in the CANN backend replaces the AscendC implementation with the aclnn library. This results in improved performance metrics, such as reduced sampling and evaluation times.
- Upstream Synchronization: This pull request synchronizes changes from an upstream repository, including file renaming and code modifications. It addresses compatibility issues with the Cosmo STL and adds new files, although it was not merged.
- Sesame Support Draft: This draft pull request adds Sesame support by translating safetensor models to gguf format. It includes scripts for splitting and converting models, with translation accuracy still being verified.
- Trillion-7B-preview Model Support: Support for the Trillion-7B-preview model is added, a large language model supporting multiple languages. Changes are primarily made to the tokenizer within the Llama architecture.
- BailingMoE Support: This pull request adds support for BailingMoE, including links to various models on Hugging Face. The Ling-plus model remains untested due to its size, and YaRN is not currently supported.
- Quantifier Reversion: Issues caused by possessive quantifiers are addressed by reverting them to greedy quantifiers. The pull request includes changing quantifiers, adding tokenizer test files, and deleting specific vocabulary files.
- CANN Backend Memory Fixes: This pull request resolves backend operation failures and memory inefficiencies in the CANN component. It includes fixes for memory waste, backend operation failures, and code formatting improvements.
- OpenCL Documentation Update: The documentation for the OpenCL backend is updated by adding OpenCL information to `build.md`. It refines tool requirements for Windows 11 arm64 and includes a link to `OPENCL.md`.
- FA Kernel Typedef Fix: The use of `constexpr` in FA kernels and a typedef issue are addressed. This pull request was successfully merged on March 30, 2025.
- Vulkan Cooperative Matrix Support: Synchronization of the 'ggml' component includes improvements to CMake configuration for better Vulkan cooperative matrix support checks. Minor adjustments like fixing whitespace issues are also made.
- CANN Backend Operator Optimization: The optimization of the `sin`, `cos`, and `argmax` operators in the CANN backend uses the aclnn library. It ensures all tests pass successfully and includes code-style adjustments.
- Custom Hugging Face Endpoints: Support for specifying custom Hugging Face endpoints via the `HF_ENDPOINT` environment variable is introduced. This allows users to configure endpoints similarly to the huggingface-cli; a small resolution sketch appears after this list.
- ggml-sycl Backend Configuration: The configuration and compilation of the ggml-sycl backend as a Visual Studio project/solution on Windows is enabled. It ensures compatibility with the Intel official compiler and has been tested on Windows 10.
- Vulkan Flash Attention Optimization: The "split_k" feature for cooperative matrix flash attention in Vulkan is implemented. It optimizes performance by distributing work across streaming multiprocessors, benefiting models with large KV caches.
- gguf-split Tool Update: The `gguf-split` tool is updated to respect the `--dry-run` option during merge operations. This pull request includes commits for implementing this feature and removing a trailing space.
- Clang Compiler Warning Fix: A Clang compiler warning in the `gguf_check_reserved_keys` function is addressed by properly handling the parameter `val`, as detected by the in-house CI for the MUSA backend.
- BailingMoE Bug Fix: A bug fix in the BailingMoE module corrects the qkv split logic when the head_dim is zero. The Ling-lite-base model remains broken until a related pull request is merged.
- FA Kernel Precision Update: Issue #12441 is addressed by updating FA kernels to use F32 precision in the Metal backend. There is no observed performance impact on the M2 Studio.
- JSON Dependency Removal: The `#include "json.hpp"` directive is removed from `common.cpp`, and the `common_grammar_trigger::from/to_json` functionality is relocated to the `server` module. This is part of a broader effort to eliminate JSON dependencies.
- MUSA Compiler Warning Resolution: MUSA compiler warnings are resolved by replacing `(void)` casts with the `GGML_UNUSED` macro, as illustrated below. This pull request was successfully merged on April 3, 2025.
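The pattern change in the MUSA warning fix is tiny. Assuming ggml's conventional macro definition (`#define GGML_UNUSED(x) (void)(x)` in ggml.h, reproduced inline here so the sketch stands alone), both forms silence unused-parameter warnings; the macro simply documents intent and is easy to grep for.

```cpp
// Assumed definition, mirroring ggml.h's convention.
#define GGML_UNUSED(x) (void)(x)

static void op_fallback(int dst_id, int src_id) {
    (void)dst_id;        // before: bare cast-to-void
    GGML_UNUSED(src_id); // after: project-standard macro
}

int main() {
    op_fallback(0, 1);
    return 0;
}
```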
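For the KV cache refactor listed earlier, the caller-facing contract is the interesting part. A hedged usage sketch follows, assuming the return-code semantics the summary describes: `0` on success, `1` when the batch cannot fit (with the cache state restored), and a negative value on hard errors. Details in the real API may differ.

```cpp
#include "llama.h"

// Returns true once the batch is decoded. A return of 1 is not a hard error:
// the guard restored the KV cache, so the caller can make room (for example
// by evicting old sequences or shrinking the batch) and retry.
static bool decode_or_make_room(llama_context * ctx, llama_batch batch) {
    for (int attempt = 0; attempt < 2; ++attempt) {
        const int32_t ret = llama_decode(ctx, batch);
        if (ret == 0) return true;  // decoded
        if (ret < 0)  return false; // hard error
        // ret == 1: batch did not fit; application-specific cleanup goes here
        // before the retry.
    }
    return false;
}
```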
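And for the custom Hugging Face endpoint item, resolution typically reduces to one environment lookup with a default fallback. The sketch below shows that behavior; the exact precedence and normalization logic inside llama.cpp may differ.

```cpp
#include <cstdlib>
#include <string>

// Honor HF_ENDPOINT when set, mirroring huggingface-cli; otherwise fall back
// to the public hub.
static std::string hf_endpoint() {
    const char * env = std::getenv("HF_ENDPOINT");
    std::string base = (env && *env) ? env : "https://huggingface.co";
    if (base.back() != '/') base += '/'; // normalize for path concatenation
    return base; // e.g. HF_ENDPOINT=https://hf-mirror.com -> "https://hf-mirror.com/"
}
```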
3.3 Pull Request Discussion Insights
This section will analyze the tone and sentiment of discussions within this project's open and closed pull requests that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.
Based on our analysis, there are no instances of toxic discussions in the project's open or closed pull requests from the past week.
IV. Contributors
4.1 Contributors
Active Contributors:
We consider an active contributor in this project to be any contributor who has made at least 1 commit, opened at least 1 issue, created at least 1 pull request, or made more than 2 comments in the last month.
If there are more than 10 active contributors, the list is truncated to the top 10 based on contribution metrics for better clarity.
| Contributor | Commits | Pull Requests | Issues | Comments |
|---|---|---|---|---|
| ngxson | 125 | 8 | 0 | 60 |
| ggerganov | 99 | 9 | 2 | 64 |
| zhouwg | 94 | 4 | 1 | 37 |
| ochafik | 75 | 2 | 0 | 23 |
| BradHutchings | 79 | 1 | 0 | 0 |
| CISC | 34 | 8 | 0 | 25 |
| jukofyork | 39 | 2 | 0 | 3 |
| 0cc4m | 13 | 3 | 0 | 27 |
| EAddario | 40 | 3 | 0 | 0 |
| bandoti | 30 | 2 | 1 | 9 |