Weekly GitHub Report for Xla: July 28, 2025 - August 04, 2025
Thank you for subscribing to our weekly newsletter! Each week, we deliver a comprehensive summary of your GitHub project's latest activity right to your inbox, including an overview of your project's issues, pull requests, contributors, and commit activity.
Table of Contents
I. News
1.1 Recent Version Releases:
No recent version releases were found.
1.2 Version Information:
No version information is available for this reporting period.
II. Issues
2.1 Top 5 Active Issues:
We consider active issues to be issues that have been commented on most frequently within the last week. Bot comments are omitted.
- Correctness error for scatter: This issue reports a correctness error related to the scatter operation in the stablehlo dialect, where the variadic scatter appears to incorrectly share buffers between operands, leading to wrong results in some cases. The problem is demonstrated with test code showing that expected outputs do not match actual outputs, particularly highlighting discrepancies across different backends like Interpreter, GPU, CPU, and TPU.
- The comments discuss the nature of the bug, suggesting that the scatter operation shares buffers in-place which causes aliasing issues for multiple operands. It is proposed that inserting copies for the second operand might fix the problem, and observations are shared about inconsistent behavior across different execution backends, with some producing correct results and others failing.
- Number of comments this week: 4
Since there were fewer than 5 active issues, all of them have been listed above.
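The buffer-sharing hypothesis raised in the comments can be illustrated with a small, self-contained sketch. This is plain Python, not the actual stablehlo implementation; the function names and semantics are simplified assumptions meant only to show why sharing one buffer across variadic scatter operands corrupts results, and why inserting copies (as proposed in the discussion) would fix it:

```python
def scatter_add(operand, indices, updates):
    """Scatter with an additive combiner into a fresh copy of the operand."""
    out = list(operand)
    for i, u in zip(indices, updates):
        out[i] += u
    return out

def variadic_scatter(operands, indices, updates_list):
    """Correct behavior: each operand is scattered into its own buffer."""
    return [scatter_add(op, indices, ups)
            for op, ups in zip(operands, updates_list)]

def variadic_scatter_shared(operands, indices, updates_list):
    """Buggy behavior described in the issue: one shared buffer stands in
    for every operand, so earlier operands' results leak into later ones."""
    shared = list(operands[0])
    results = []
    for ups in updates_list:
        for i, u in zip(indices, ups):
            shared[i] += u
        results.append(list(shared))
    return results

ops = [[0, 0, 0], [0, 0, 0]]
good = variadic_scatter(ops, [0, 2], [[1, 1], [2, 2]])         # [[1,0,1], [2,0,2]]
bad = variadic_scatter_shared(ops, [0, 2], [[1, 1], [2, 2]])   # [[1,0,1], [3,0,3]]
```

The second operand's result in the shared-buffer version accumulates the first operand's updates, matching the kind of cross-operand corruption the issue reports.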
2.2 Top 5 Stale Issues:
We consider stale issues to be issues that have had no activity within the last 30 days. The team should work together to get these issues resolved and closed as soon as possible.
- New `nvshmem` rule breaks the build: This issue reports a build failure caused by a new `nvshmem` rule introduced in a recent pull request, which leads to an error about a missing `getenv` method on the `repository_ctx` object during the CUDA configuration step. The reporter is seeking guidance on whether they need to update their side to resolve this error, particularly in relation to changes mentioned for JAX, or if the fix must come from the open_xla project, along with an estimated timeline for addressing the problem.
- Failed to Parse MLIR generated by Torchax: This issue describes a problem encountered when exporting a PyTorch model to MLIR using the torch-xla torchax export API, where the generated MLIR fails to parse due to an unregistered operation 'vhlo.rsqrt_v2' in the VHLO dialect. The user is attempting to compile the exported model with XLA AOT but faces deserialization errors with StableHLO, despite using compatible versions of torch, torchxla, and building XLA from the corresponding commit, and has provided code snippets and bytecode samples to help diagnose the issue.
- support bazel modules: This issue is about inquiring whether there are plans to adopt Bazel modules within the project, as Bazel modules have seen significant adoption. The user points out that XLA is currently the only package in their Bazel build that does not support Bazel modules, highlighting a potential gap in compatibility.
- Gpu collective performance model bug: This issue addresses a bug in the gpu_collective_performance model where a recent update correctly adjusts the lowLatencyBandwidth for AMD links but fails to apply the corresponding update to the CUDA section. As a result, invoking the gpu_collective_performance model with H100 GPU settings fails, indicating incomplete handling of bandwidth parameters across different GPU architectures.

Since there were fewer than 5 stale issues, all of them have been listed above.
2.3 Open Issues
This section lists, groups, and then summarizes issues that were created within the last week in the repository.
Issues Opened This Week: 2
Summarized Issues:
- Scatter Operation Correctness: This issue highlights a correctness error in the variadic scatter operation implementation in stablehlo, where sharing the buffer between multiple scatter operands causes incorrect results. The problem is particularly evident in failing test cases on certain backends such as the interpreter and CPU.
- issues/29362
- Fusion Regression on Nvidia GH200 GPUs: A regression in the fusion behavior of the XLA CUDA backend on Nvidia GH200 GPUs causes the creation of unnecessarily large temporary tensors, leading to a significant increase in memory footprint and runtime. This issue is observed in JAX versions after 0.4.36 and is demonstrated with a reproducible Python example and detailed memory usage comparisons.
- issues/29620
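The memory impact of a lost fusion can be estimated back-of-the-envelope: when an elementwise producer is materialized instead of being fused into its consumer, the backend allocates a temporary the size of the full intermediate shape. Below is a minimal sketch of that arithmetic; the shapes are illustrative, not taken from the issue's reproducer. (In recent JAX versions, `jax.jit(f).lower(*args).compile().memory_analysis()` reports the actual temporary allocation sizes for a compiled function.)

```python
from math import prod

def tensor_bytes(shape, dtype_bytes=4):
    """Size in bytes of a dense tensor with the given shape (f32 by default)."""
    return prod(shape) * dtype_bytes

# If a (4096,) vector broadcast against a (4096, 4096) matrix stays fused,
# no intermediate is materialized; if the fusion regresses, the backend may
# allocate the full broadcasted temporary instead.
fused_extra = 0
unfused_extra = tensor_bytes((4096, 4096))   # 67108864 bytes = 64 MiB of f32

print(unfused_extra - fused_extra)  # 67108864
```

Comparing such estimates against the compiler's reported temp sizes is one way to confirm a fusion regression like the one described above.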
2.4 Closed Issues
This section lists, groups, and then summarizes issues that were closed within the last week in the repository. This section also links the associated pull requests if applicable.
Issues Closed This Week: 0
Summarized Issues:
As of our latest update, there were no issues closed in the project this week.
2.5 Issue Discussion Insights
This section will analyze the tone and sentiment of discussions within this project's open and closed issues that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.
Based on our analysis, there are no instances of toxic discussions in the project's open or closed issues from the past week.
III. Pull Requests
3.1 Open Pull Requests
This section provides a summary of pull requests that were opened in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.
Pull Requests Opened This Week: 6
Key Open Pull Requests
1. SPMD Dot Tests: This pull request adds end-to-end tests for the Single Program Multiple Data (SPMD) partitioning of dot operations to ensure their correct functionality.
- URL: pull/29511
- Merged: No
2. [XLA:CPU][oneDNN] Refactor OneDnnThreadPool to support asynchronous execution using ParallelLoopRunner: This pull request refactors the `OneDnnThreadPool` to support asynchronous execution by integrating a `ParallelLoopRunner`, adding a new constructor, updating tests, and implementing asynchronous flags and wait functionality to enable future asynchronous oneDNN custom calls within XLA's async runtime.
- URL: pull/29459
- Merged: No
- Associated Commits: 0a431
3. bazel: fix apple_support usage in bazelrc: This pull request aims to fix the usage of apple_support in the bazelrc configuration file to address the issue reported in https://github.com/openxla/xla/issues/27099.
- URL: pull/29475
- Merged: No
- Associated Commits: 26257
Other Open Pull Requests
- oneDNN backend improvements: These pull requests focus on enhancing the oneDNN integration in the XLA backend. One PR refines transpose folding conditions to avoid performance regressions, while another updates the aarch64 oneDNN builds to support the Graph API and adds aarch64 to the supported platforms list.
- [pull/29506, pull/29646]
- Code readability enhancements: This pull request improves code clarity by renaming leftover cudnn sdpa tensor variables in the XLA GPU backend, making the codebase easier to understand and maintain.
- [pull/29707]
3.2 Closed Pull Requests
This section provides a summary of pull requests that were closed in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.
Pull Requests Closed This Week: 12
Key Closed Pull Requests
1. [Feature]: [JAX/XLA] hipblaslt support on gfx11* jaxlib-v.0.6.0: This pull request proposes adding hipBLASLt support for gfx11 GPUs in the jaxlib version 0.6.0, aiming to extend the ROCm backend capabilities by integrating this library for improved GPU-accelerated linear algebra operations, although it was not merged.
- URL: pull/29675
- Merged: No
- Associated Commits: 89ba5, 4ad91, 86b90, 1ef70, 7444f, 70ea1, 681df, ab2a8, 10a74, d6704, 23108, 89d05, b91a9, ab568, a7ce4, a755d, 6b54b, 924ab, 62191, fda2b, 3c938, 5493d, 25d9a, 915d5, 241ab, d6fee, 633a9, a6376, 5420a, 03723, 380d4, e7ff3, 3408a, 849c1, 3cec5, f782c, 96904, 3735a, 3727a, 73d46, 0016f, 232ce, cdf0a, 753e8, d36a0, 08ccf, 3289a, 2b816, 80d4a, 51e6f, e097b, 7b193, 91b29, 24406, 6de30, 7b6aa, 468aa, 42ac7, f23df, ad937, c23af, 805e8, ca6ea, 58b2a, 9048b, 842db, c8272, e23cc, dbc4b, 72eb3, 42a06, feff4, da8a2, 05f3a, b347e, 73cd0, 7943b, da6ba, 56410, b62e6, 41163, a7a81, c2b39, 1dd1c, 24236, 8c212, ba2ad, 87c2d, fe43e, c6294, 4ccec, 8cb80, 29640, 0f5d7, b30a7, 46cff, 9e8a3, 1e689, c601b, e3821, c674f, 2fa60, 9904e, 1f038, 9f046, cb390, c7c61, e8ec0, a37a0, e8b30, 889bb, d537a, 1def8, 08c8d, 952e1, ee3f5, f8dad, 01369, 13aa1, f16bb, a7e1a, 1fc58, 24a71, 37c91, f2969, 8d71d, 18de4, bee28, 57517, 420d6, dc11a, 98631, f9e8d, 8a6e5, b91d4, 5701b, 4d8fc, 54b1b, e03f3, 7676e, 8131e, 3a102, 27c76, 762ad, 175b6, 28d02, baaed, d4273, e5dea, b1198, da4bd, 3566a, f567c, 50bff, 18ffd, 185f7, 9b6ce, 4f583, 7b9d3, 65fdf, 853b9, e278c, 8d3ee, eb774, 7a1c2, 80c36, 635f3, 04130, 255d6, 65069, 390a3, e71c2, b0a16, 9623a, 88cbd, 03f74, f0787, 6dc13, 7c16d, f6351, 17f42, 1ff12, 0afdf, 73058, b17b1, 0628c, f0859, fd06a, 40292, 09fce, d27ba, 077d4, dd90b, 53a03, 45cad, 092c2, 88d52, 2012b, ec7fc, 797d3, 951b7, a165e, 38346, 69d9c, c0763, f9c50, bd907, 32187, 86ebe, b5636, cb142, 8188c, a6613, 537ab, b34c1, 4f352, 16bed, 43a5e, 9f8cc, 95bc1, 82354, c5743, 6bac9, fb728, 3eb0b, 53014, 8e52f, 18290, 26c26, 967ce, 91607, 4b883, fab16, e89be, c396a, f8b3e, 407e4, c5966, 699d2, 830e7, a4fe4, bce92, ce393, 7da51, 35691, 4632f, d2d9e, 60502, 8a46e, 6aaec
2. [Feature]: [JAX/XLA] hipblaslt support on gfx11* jaxlib-v0.6.0: This pull request proposes adding hipBLASLt support for the gfx11 GPU architecture in the jaxlib-v0.6.0 ROCm backend, aiming to enhance GPU acceleration capabilities for AMD hardware.
- URL: pull/29676
- Merged: No
- Associated Commits: 13d28, a7bb0, 08c9b, 9b74a, b03cd, c62e4, 53126, 4667e, ae2d3, d3f94, d0c29, 5be95, 497cf, 25316, 5b460, 109e1, 2e04d, 76eb7, 28f10, 6484d, ea4cd, 5c042, 55a8c, 7df1e, 7566a, 84d14, d13b3, 7bf45, 6ac14, de95c, 74854, f88a7, 1add4, 0cb54, 8fc19, b854d, b1f3e, 74101, 2fe5c, e8112, 68b4b, e03a8, 32eaf, c165e, 32e0c, d8c44, 5e7b4, cf65a, 60eb5, 7b708, 6fa7f, 3ed77, 10f52, 375a9, 6a540, 44f7d, 97dd5, 28b2d, 5af59, 76ece
3. Fix the oneAPI Presubmit failure: This pull request addresses the failure of the XLA Linux x86 GPU ONEAPI pre-submit by fixing issues related to certain flags being overridden, which were preventing the hermetic build from running successfully.
- URL: pull/29342
- Merged: No
- Associated Commits: cedc8, abb8a, d2dc3, 6e604, 33a3c, 329b7, 253cd, 326fb, 25816, fea05, e841d, 9d6e9
Other Closed Pull Requests
- API Extension for PJRT C Extension: This pull request adds an API to the PJRT C extension for registering Foreign Function Interface (FFI) call handlers, enhancing the PJRT API's extensibility by allowing dynamic registration of platform-specific custom calls. This improvement facilitates more flexible and customizable platform integrations.
- Bug Fixes in FP16 and oneDNN Post-Operations: These pull requests fix issues related to FP16 matmul tests by ensuring the SUM post-operation works correctly for FP16 data types and address a bug in the XLA CPU backend by correcting the population of oneDNN post-op arguments to prevent incorrect argument usage. Both changes improve the correctness and stability of computations involving oneDNN and FP16 data.
- Build and Linking Fixes: This pull request resolves a presubmit failure in the XLA Linux x86 GPU ONEAPI build caused by an invalid linker error related to the '-fuse-ld=lld' argument, enabling the hermetic build to complete successfully. It addresses critical build infrastructure issues that block successful compilation.
- CUDA and Related Package Version Updates: This pull request updates the `cuda_redist_versions.bzl` file to add support for CUDA versions 12.9.0 and 12.9.1; cuDNN versions 9.9.0, 9.10.0, 9.10.1, and 9.10.2; and NVSHMEM version 3.3.9, and upgrades the `nvidia-nccl-cu12` package to version 2.27.5. These updates ensure compatibility with the latest CUDA ecosystem components.
- NVTX Upgrade and Kernel Annotation: This pull request upgrades NVTX to version 3.2.1 and annotates the outputs of cuBLAS and cuDNN to mark them as initialized, aiming to prevent false positive initcheck failures reported by compute-sanitizer for kernels using TMA. This enhances debugging and validation accuracy for GPU kernels.
- CUDA Platform Registration Fix in Tests: This pull request fixes a bug in the GpuAotCompilationTest where the CUDA platform was not properly registered, causing test failures. It modifies the BUILD file to ensure the CUDA platform is linked and registered before test execution, improving test reliability.
- ROCm Device Description Cleanup: This pull request proposes cleaning improvements to the ROCm device description in the project but was not merged. The changes aimed to improve code clarity and maintainability for ROCm support.
- Documentation Heading Level Adjustment: This pull request proposes changing the heading level for the TPU XLA flags section to match that of the GPU XLA flags section, ensuring it appears correctly in the right-hand navigation, but it was not merged. This change was intended to improve documentation navigation consistency.
3.3 Pull Request Discussion Insights
This section will analyze the tone and sentiment of discussions within this project's open and closed pull requests that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.
Based on our analysis, there are no instances of toxic discussions in the project's open or closed pull requests from the past week.
IV. Contributors
4.1 Contributors
Active Contributors:
We consider an active contributor in this project to be any contributor who has made at least 1 commit, opened at least 1 issue, created at least 1 pull request, or made more than 2 comments in the last month.
If there are more than 10 active contributors, the list is truncated to the top 10 based on contribution metrics for better clarity.
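The activity criterion above can be expressed as a simple filter. This is a sketch with hypothetical field names (the report generator's actual schema is not shown); the first record mirrors penpornk's row from the table below, while "lurker" is an invented inactive example:

```python
def is_active(c):
    """Active: at least 1 commit, issue, or pull request,
    or more than 2 comments in the last month."""
    return (c["commits"] >= 1 or c["issues"] >= 1
            or c["pull_requests"] >= 1 or c["comments"] > 2)

contributors = [
    {"name": "penpornk", "commits": 0, "pull_requests": 0,
     "issues": 0, "comments": 12},   # active via comment count alone
    {"name": "lurker", "commits": 0, "pull_requests": 0,
     "issues": 0, "comments": 1},    # hypothetical: below every threshold
]
active = [c["name"] for c in contributors if is_active(c)]
print(active)  # ['penpornk']
```

Note that exactly 2 comments with no other activity does not qualify, since the comment threshold is strictly more than 2.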
Contributor | Commits | Pull Requests | Issues | Comments |
---|---|---|---|---|
othakkar | 7 | 4 | 0 | 8 |
mraunak | 14 | 2 | 0 | 0 |
penpornk | 0 | 0 | 0 | 12 |
terryysun | 6 | 3 | 0 | 1 |
frgossen | 0 | 0 | 0 | 10 |
shawnwang18 | 6 | 3 | 0 | 0 |
hugomano | 4 | 2 | 1 | 2 |
alekstheod | 6 | 2 | 0 | 0 |
hmonishN | 6 | 1 | 0 | 0 |
philipphack | 3 | 2 | 0 | 2 |