Weekly Project News

December 1, 2025

Weekly GitHub Report for XLA: November 24, 2025 - December 01, 2025

Weekly GitHub Report for XLA

Thank you for subscribing to our weekly newsletter! Each week, we deliver a comprehensive summary of your GitHub project's latest activity right to your inbox, including an overview of your project's issues, pull requests, contributors, and commit activity.


Table of Contents

  • I. News
    • 1.1. Recent Version Releases
    • 1.2. Other Noteworthy Updates
  • II. Issues
    • 2.1. Top 5 Active Issues
    • 2.2. Top 5 Stale Issues
    • 2.3. Open Issues
    • 2.4. Closed Issues
    • 2.5. Issue Discussion Insights
  • III. Pull Requests
    • 3.1. Open Pull Requests
    • 3.2. Closed Pull Requests
    • 3.3. Pull Request Discussion Insights
  • IV. Contributors
    • 4.1. Contributors

I. News

1.1 Recent Version Releases:

No recent version releases were found.

1.2 Other Noteworthy Updates:

No other noteworthy updates were found this week.

II. Issues

2.1 Top 5 Active Issues:

We consider active issues to be issues that have been commented on most frequently within the last week. Bot comments are omitted.

  1. Proposal: Use CCCL from GitHub instead of from the CUDA Toolkit: This issue proposes that the XLA project should source the CUDA Core Compute Libraries (CCCL) directly from GitHub rather than relying on the version bundled with the CUDA Toolkit. This change aims to improve compatibility across multiple CUDA Toolkit versions and better align with RAPIDS dependencies, enabling more flexible and up-to-date builds without being tightly coupled to specific CUDA Toolkit releases.

    • The comment agrees that switching to the upstream CCCL will decouple CUDA version selection from CCCL and RAPIDS version constraints, though RAPIDS version selection will still be limited. It also highlights that even without upgrading RAPIDS or CCCL, this change would allow quicker adoption of upstream bug fixes, which is difficult when using the CUDA Toolkit’s bundled headers.
    • Number of comments this week: 1

Since there were fewer than 5 active issues, all of the active issues have been listed above.

2.2 Top 5 Stale Issues:

We consider stale issues to be issues that have had no activity within the last 30 days. The team should work together to get these issues resolved and closed as soon as possible.

  1. New nvshmem rule breaks the build: This issue reports a build failure caused by a new nvshmem rule introduced in a recent update, which leads to an error related to the absence of a getenv method in the repository_ctx object during the CUDA configuration step. The reporter is seeking guidance on whether they need to make changes on their side to resolve this problem or if it requires a fix within the openxla project, specifically regarding the timing and details of addressing the cuda_configure rule.
  2. Failed to Parse MLIR generated by Torchax: This issue describes a problem encountered when exporting a PyTorch model to MLIR using the torch-xla torchax export API, where the generated MLIR fails to parse due to an unregistered operation 'vhlo.rsqrt_v2' in the 'vhlo' dialect. The user is attempting to compile the exported MLIR into an XLA binary using XLA AOT compilation but faces deserialization errors with StableHLO, despite using compatible versions of torch, torchxla, and building XLA from the corresponding commit.
  3. Gpu collective performance model bug: This issue concerns a bug in the gpu_collective_performance model where an update to the lowLatencyBandwidth for AMD links was made without corresponding changes to the CUDA section. As a result, invoking the gpu_collective_performance model with H100 settings leads to a failure, indicating incomplete or inconsistent updates within the performance model code.
  4. Cross compile to ARM with custom gcc: This issue concerns difficulties encountered when attempting to cross-compile the XLA project from an x86 architecture to ARM64 using a custom GCC compiler. The user reports that despite using the --config=cross_compile_linux_arm64 flag in the Bazel build system, the build process persistently tries to generate an x86 binary, indicating a possible misconfiguration or missing step in the cross-compilation setup.

Since there were fewer than 5 stale issues, all of the stale issues have been listed above.

2.3 Open Issues

This section lists, groups, and then summarizes issues that were created within the last week in the repository.

Issues Opened This Week: 2

Summarized Issues:

  • CUDA Core Compute Libraries (CCCL) sourcing and compatibility: This issue proposes sourcing the CUDA Core Compute Libraries directly from its GitHub repository rather than the bundled CUDA Toolkit version to improve compatibility across multiple CUDA Toolkit versions. This change would also enable easier integration with RAPIDS dependencies and allow more flexible and up-to-date management of CCCL versions independent of CUDA Toolkit releases.
  • issues/34357
  • Dynamic slicing and scatter operation failure in pinned host memory: This issue describes a failure occurring when using dynamic slicing with a scatter operation on an array in pinned host memory, causing a detailed error in the XLA compiler. In contrast, a fixed slice update works correctly, highlighting a specific problem with dynamic slicing in this context.
  • issues/34565
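
For context, here is a minimal JAX sketch of the failing pattern described above, assuming an array placed in pinned host memory via a "pinned_host" memory-kind sharding and a dynamically indexed update; the shapes, values, and function names are illustrative, not taken from the issue:

```python
import jax
import jax.numpy as jnp

# Illustrative repro sketch (hypothetical shapes/values, not from the issue).
dev = jax.devices()[0]
pinned = jax.sharding.SingleDeviceSharding(dev, memory_kind="pinned_host")

x = jax.device_put(jnp.zeros((16, 16)), pinned)  # array in pinned host memory

@jax.jit
def dynamic_update(x, i):
    # Dynamic start index: lowers to a dynamic-update-slice/scatter, the
    # operation the issue reports failing on pinned host memory.
    return jax.lax.dynamic_update_slice(x, jnp.ones((4, 4)), (i, 0))

@jax.jit
def static_update(x):
    # Fixed slice update: reported to work correctly.
    return x.at[3:7, 0:4].set(1.0)

print(static_update(x))                  # OK per the issue
print(dynamic_update(x, jnp.int32(3)))   # would trigger the reported error
```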

2.4 Closed Issues

This section lists, groups, and then summarizes issues that were closed within the last week in the repository. This section also links the associated pull requests if applicable.

Issues Closed This Week: 1

Summarized Issues:

  • XLA CPU backend constant-folding bug: The XLA CPU backend has a bug where constant-folding involving gather and add operations incorrectly changes the memory layout of an array from row-major to column-major. This causes unexpected results when JAX JIT-compiles a function using these operations, affecting the correctness of compiled code.
  • issues/34260
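
As a rough illustration of the reported pattern, here is a hedged JAX sketch of a gather-plus-add over constants that the CPU backend could constant-fold under jit; the actual arrays and indices from the issue are not shown, so treat this as a hypothetical shape of the bug:

```python
import jax
import jax.numpy as jnp
import numpy as np

# Hypothetical constants standing in for those in the issue report.
table = jnp.arange(12.0).reshape(3, 4)
idx = jnp.array([2, 0, 1])

@jax.jit
def f():
    # Gather (table[idx]) followed by add, on constants only: eligible for
    # constant-folding, where the issue says the result layout flips from
    # row-major to column-major on the CPU backend.
    return table[idx] + 1.0

expected = np.arange(12.0).reshape(3, 4)[[2, 0, 1]] + 1.0
print(np.allclose(f(), expected))  # False would indicate the reported bug
```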

2.5 Issue Discussion Insights

This section will analyze the tone and sentiment of discussions within this project's open and closed issues that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.

Based on our analysis, there are no instances of toxic discussions in the project's open or closed issues from the past week.


III. Pull Requests

3.1 Open Pull Requests

This section provides a summary of pull requests that were opened in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.

Pull Requests Opened This Week: 10

Key Open Pull Requests

1. [ROCm] Use rocminfo instead of lspci as it will report all connected gpus ev…: This pull request updates the GPU detection method in the ROCm environment by replacing lspci with rocminfo to accurately report all connected GPUs, including those visible inside Docker containers, thereby ensuring valid GPU availability information for test synchronization.

  • URL: pull/34520
  • Merged: No
  • Associated Commits: bfd4a, ae087, bb218, b4e87

2. Migrate from native built-ins to Starlark rule definitions: This pull request migrates all native Bazel rule definitions to their Starlark equivalents and adds the necessary load statements to replace implicit native rules, preparing the project for Bazel 9 compatibility.

  • URL: pull/34320
  • Merged: No
  • Associated Commits: 08d48, 3dc79, 37cd6

3. [ROCM] Added command buffers support for convolutions 2nd attempt: This pull request adds support for command buffers for convolution operations in the ROCm backend, introducing a new flag to enable convolution graph capture (disabled by default to prevent graph fragmentation), along with new unit tests and fixes following a previously reverted attempt.

  • URL: pull/34572
  • Merged: No
  • Associated Commits: a409c, 86cd6, f6386

Other Open Pull Requests

  • NCCL all-to-all integration: This pull request integrates the NCCL all-to-all API into XLA to simplify the logic by leveraging NCCL's native support since version 2.28U2. It includes existing all-to-all tests to ensure identical behavior and adds execution tests to validate the implementation; a brief JAX sketch of the collective involved appears after this list.
  • pull/34493
  • GPU autotuner and FP8 support fixes: This pull request addresses a bug in the cuBLASLt implementation by adding support for FP8 matrix multiplication operations with swapped operands in the GPU autotuner. It ensures correct scaling factor handling and updates the autotuner test suite, CUDA BLAS Lt matmul code, and build dependencies.
  • pull/34499
  • Platform-specific test fixes: This pull request fixes the failing TritonEmitterTest/RocmWarpSizeIsSetCorrectly by defining valid tile parameters and non-zero shared memory for AMD GPU architectures. Additionally, it modifies the TmaPTX tests to be CUDA-specific by skipping them on the ROCm platform, addressing platform-specific test execution bugs.
  • pull/34300, pull/34533
  • SPIRV extension filtering: This pull request introduces functionality to block unsupported SPIRV extensions by filtering them out from the complete set of valid extensions for the default SPIRV target triple.
  • pull/34356
  • oneDNN convolution weight prepacking: This pull request adds weight prepacking support for oneDNN convolutions to achieve full functional feature parity with oneDNN matrix multiplications.
  • pull/34420
  • Build and CI updates: This pull request enables the layering_check feature in XLA's Bazel builds and updates the XLA GPU GitHub Actions build to use a specific branch of rules_ml_toolchain to ensure compatibility.
  • pull/34543
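
For readers unfamiliar with the collective touched by the NCCL all-to-all item above, here is a hedged JAX sketch of an all-to-all exchange as XLA sees it; the device count and shapes are illustrative, and nothing here is taken from the pull request itself:

```python
from functools import partial

import jax
import jax.numpy as jnp

n = jax.device_count()  # needs more than one device for a real exchange

@partial(jax.pmap, axis_name="i")
def shuffle(x):
    # Each device splits its block along axis 0 and exchanges chunks with
    # every other device; XLA lowers this to an all-to-all collective, which
    # the PR routes through NCCL's native all-to-all API on GPU.
    return jax.lax.all_to_all(x, "i", split_axis=0, concat_axis=0)

x = jnp.arange(n * n * 4.0).reshape(n, n, 4)
print(shuffle(x).shape)  # (n, n, 4): same shape, data redistributed
```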

3.2 Closed Pull Requests

This section provides a summary of pull requests that were closed in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.

Pull Requests Closed This Week: 15

Key Closed Pull Requests

1. [ROCm] Add support for rocm tar/wheels in hermetic builds: This pull request aims to add support for ROCm tarballs and Python wheels as hermetic dependencies in the build process to ensure JAX can correctly match and set up its ROCm-related dependencies.

  • URL: pull/34049
  • Merged: No
  • Associated Commits: a568e, 66b21, ec2a8, 378ba, c42ca, a8671, 898ea, 3b3f2, 882c2, 21fdb, 1dbd2, f44de, b78af, 45023, 47489, 27807, 6ed67

2. [ROCm] Include multigpu tests: This pull request proposes including multigpu tests in the ROCm continuous integration command to enhance testing coverage for multi-GPU setups.

  • URL: pull/34112
  • Merged: No
  • Associated Commits: 39c0a, 3c23b

3. [ROCm] Add missing dependencies to header file: This pull request addresses a build failure by adding the missing rocm_config dependency to the rocm_headers Bazel target, thereby ensuring that source files including rocm/rocm_config.h have the correct build dependencies and removing redundant independent dependencies from BUILD files.

  • URL: pull/34156
  • Merged: No
  • Associated Commits: 9ec19, 03c29

Other Closed Pull Requests

  • ROCm build and dependency fixes: Multiple pull requests address build breaks and dependency updates in the ROCm and AMD GPU toolchains. These include fixing missing system dependencies in clang++ toolchain files, switching to newer rocm_device_libs to resolve missing symbols, and adding ROCm 7.11 as a hermetic dependency for heterogeneous GPU support in remote build environments.
  • pull/34372, pull/34296, pull/34438, pull/34467
  • cuDNN backend updates and fixes: Several pull requests focus on improving the cuDNN backend by updating the GEMM backend to handle dot algorithms, removing outdated convolution workarounds, and upgrading the cuDNN version to 9.10 to fix execution errors and support block scaled dot operations. These changes aim to enhance performance and compatibility with modern cuDNN versions.
  • pull/34163, pull/34227, pull/34309
  • Host offloading and benchmarking additions: Two pull requests propose adding new HLO benchmark files for large models with host-offloading features to facilitate testing of performance improvements. These benchmarks address the lack of host offloading tests in the existing benchmark directory.
  • pull/34333, pull/34335
  • Code clarity and documentation improvements: One pull request refactors the max unroll factor heuristic to improve code clarity without changing functionality, while another adds a link to optimization level documentation in GPU flag guidance to help users configure flags more conveniently.
  • pull/34228, pull/34362
  • Memory allocation profiling: A pull request adds debug code to profile memory allocation types used by a command buffer, aiming to gather data that can improve the persistence of memory allocations.
  • pull/34316

3.3 Pull Request Discussion Insights

This section will analyze the tone and sentiment of discussions within this project's open and closed pull requests that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.

Based on our analysis, there are no instances of toxic discussions in the project's open or closed pull requests from the past week.


IV. Contributors

4.1 Contributors

Active Contributors:

We consider an active contributor in this project to be any contributor who has made at least 1 commit, opened at least 1 issue, created at least 1 pull request, or made more than 2 comments in the last month.

If there are more than 10 active contributors, the list is truncated to the top 10 based on contribution metrics for better clarity.

Contributor       Commits   Pull Requests   Issues   Comments
alekstheod        43        11              0        8
Copilot           0         0               0        11
shawnwang18       8         2               0        0
sergachev         6         3               0        0
terryysun         6         3               0        0
dimitar-asenov    0         0               0        8
pemeliya          5         2               0        0
mfrancepillois    4         3               0        0
Tixxx              3         2               0        1
sfvaroglu         2         2               0        1
