Weekly GitHub Report for Xla: September 29, 2025 - October 06, 2025 (12:01:39)

Thank you for subscribing to our weekly newsletter! Each week, we deliver a comprehensive summary of your GitHub project's latest activity right to your inbox, including an overview of your project's issues, pull requests, contributors, and commit activity.

Table of Contents

I. News
1.1. Recent Version Releases
1.2. Other Noteworthy Updates

II. Issues
2.1. Top 5 Active Issues
2.2. Top 5 Stale Issues
2.3. Open Issues
2.4. Closed Issues
2.5. Issue Discussion Insights

III. Pull Requests
3.1. Open Pull Requests
3.2. Closed Pull Requests
3.3. Pull Request Discussion Insights

IV. Contributors
4.1. Contributors

I. News
1.1 Recent Version Releases:
No recent version releases were found.
1.2 Other Noteworthy Updates:
No other noteworthy updates were reported this week.

II. Issues
2.1 Top 5 Active Issues:
We consider active issues to be issues that have been commented on most frequently within the last week. Bot comments are omitted.
As of our latest update, there are no active issues with ongoing comments this week. 
2.2 Top 5 Stale Issues:
We consider stale issues to be issues that have had no activity within the last 30 days. The team should work together to get these issues resolved and closed as soon as possible.

New nvshmem rule breaks the build: This issue reports a build failure caused by a new nvshmem rule introduced in a recent pull request. During CUDA configuration, the build fails because the repository_ctx object has no getenv method. The reporter asks whether they need to make changes on their side, in light of related changes mentioned for JAX, or whether the fix must come from the open_xla project, and requests an estimated timeline for a resolution.
Failed to Parse MLIR generated by Torchax: This issue describes a failure when exporting a PyTorch model to MLIR with the torch-xla torchax export API: the generated MLIR cannot be parsed because the operation 'vhlo.rsqrt_v2' is not registered in the VHLO dialect. The user is trying to compile the exported model with XLA AOT but hits StableHLO deserialization errors, despite using compatible versions of torch and torchxla and building XLA from the corresponding commit. Code snippets and bytecode samples are provided to help with troubleshooting.
support bazel modules: This issue proposes adopting Bazel modules (Bzlmod) in the project, noting that Bazel modules have gained significant usage. It points out that XLA is currently the only package in the user's Bazel build that does not support Bazel modules, which suggests a need for compatibility improvements.
Gpu collective performance model bug: This issue reports a bug in the gpu_collective_performance model: a recent update correctly adjusts lowLatencyBandwidth for AMD links but does not apply the corresponding update to the CUDA section. As a result, invoking the model with H100 GPU settings fails, indicating that bandwidth parameters are not handled consistently across GPU architectures.
Cross compile to ARM with custom gcc: This issue concerns difficulties cross-compiling XLA from x86 to ARM64 with a custom GCC compiler. The user reports that even with the --config=cross_compile_linux_arm64 flag, the Bazel build keeps producing an x86 binary, which suggests a misconfiguration or a missing step in the cross-compilation setup.

2.3 Open Issues
This section lists, groups, and then summarizes issues that were created within the last week in the repository. 
Issues Opened This Week: 1
Summarized Issues:

HLO verifier and compilation failures: A newly introduced HLO verifier, run between the pre-scheduling and post-scheduling stages, causes compilation failures for collective-permute operations with mixed-precision data types. Because these operations previously compiled successfully, this raises the question of whether the new behavior is intentional or a regression.
issues/32222

2.4 Closed Issues
This section lists, groups, and then summarizes issues that were closed within the last week in the repository. This section also links the associated pull requests if applicable. 
Issues Closed This Week: 0
Summarized Issues:
As of our latest update, there were no issues closed in the project this week.
2.5 Issue Discussion Insights
This section analyzes the tone and sentiment of discussions within this project's open and closed issues that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.
Based on our analysis, there are no instances of toxic discussions in the project's open or closed issues from the past week. 

III. Pull Requests
3.1 Open Pull Requests
This section provides a summary of pull requests that were opened in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.

Pull Requests Opened This Week: 14
Key Open Pull Requests
1. Support building with Bzlmod: This pull request introduces support for building the project using Bzlmod, including various fixes, updates to dependencies, and improvements to build tools and configurations.

URL: pull/32055

Merged: No

Associated Commits: 73e8f, a4f8e, aeb4e, a59c9, 3a17e, 70d8c, 00100, ef0f8, eea96, 7afa1, 8da1c, 55098, b5e46, ddf7a, 8e43d, e808c

2. [Refactor] Completely Remove AsyncStreamKind: This pull request completely removes the AsyncStreamKind type and all its usages, replacing them with ExecutionStreamId-based logic to determine operation stream placement, while replicating previous behaviors through GetStreamIdOverride and adding new end-to-end execution tests as part of ongoing multi-stream collective work.

URL: pull/32217

Merged: No

Associated Commits: db259, 90e6e, 480c7, fb15f, 1c85c, 34b55, 1221a, d7549, fddd4, 73493, 9fc52, c8d1a, ba9c0, f9db6, ec58a, 785ad

3. [ROCm] fix rocm build xla tools hlo runner: This pull request aims to fix the ROCm build process for XLA tools by avoiding hardcoded shared object versions and resolving the build error of the multihost_hlo_runner component.

URL: pull/32002

Merged: No

Associated Commits: e6aab, f39a0, 14f39

Other Open Pull Requests

Convolution command buffer support in ROCm backend: This pull request adds command buffer support for convolution operations in the ROCm backend. To reduce graph fragmentation, graph capture is enabled only for explicitly listed convolution custom call targets. It also includes new unit tests and improvements to execution graph management.

pull/32053

Caching and reuse of communicators for GPU cross-process transfers: These pull requests propose using the AcquireCollectiveCliques mechanism and modifying the PjRt API to cache and reuse communicators for cross-process device-puts on GPUs. The changes aim to improve performance by reducing redundant communicator creation during multiple transfers between the same device sets and by including global device ID information in key functions.

pull/32076, pull/32074

Code organization and maintainability improvements: These pull requests improve code clarity and maintainability by moving computation simplification methods out of the command buffer scheduling component into a new library, and by merging multiple methods that query the fusion kind in the GPU codebase, which reduces redundancy.

pull/31994, pull/32003

Documentation enhancements for tiling: This pull request expands the tiling documentation by adding a Motivation section and details on tiling formats to improve clarity and completeness.

pull/32107

CUDA and PTX version updates: This pull request updates the project to support PTX version 9.0 starting with CUDA 13.0 and includes a slight refactoring of the code.

pull/32187

Platform-specific test and capability adjustments: These pull requests relax error specifications to enable the BitcastReduceWithStride1Tiling test to pass on the Spark platform, update compute capabilities to differentiate between Blackwell Edge GPUs, remove the IsAtLeastBlackwellPro method, and skip latency estimator tests on Edge GPUs to avoid crashes caused by the collective performance model.

pull/32226, pull/32229

Forward convolution with dilation and heuristic improvements: This pull request introduces support for forward convolution operations with dilation and implements a basic heuristic to differentiate between forward and backward convolutions, resulting in significant performance improvements across various dilation rates.

pull/32231

Gloo build compatibility fix: This pull request updates Gloo to use a specific commit that fixes build compatibility issues with GCC 15.

pull/32240

3.2 Closed Pull Requests
This section provides a summary of pull requests that were closed in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.
Pull Requests Closed This Week: 6
Key Closed Pull Requests
1. Execute rbe tests locally: This pull request proposes switching the execution of remote build execution (RBE) tests to run locally because the current RBE solution is not yet ready to execute tests remotely.

URL: pull/32049

Merged: No

Associated Commits: 13d28, a7bb0, 08c9b, 9b74a, b03cd, c62e4, 53126, 4667e, ae2d3, d3f94, d0c29, 5be95, 497cf, 25316, 5b460, 109e1, 2e04d, 76eb7, 28f10, 6484d, ea4cd, 5c042, 55a8c, 7df1e, 7566a, 84d14, d13b3, 7bf45, 6ac14, de95c, 74854, f88a7, 1add4, 0cb54, 8fc19, b854d, b1f3e, 74101, 2fe5c, e8112, 68b4b, e03a8, 32eaf, c165e, 32e0c, d8c44, 5e7b4, cf65a, 60eb5, 7b708, 6fa7f, 3ed77, 10f52, 375a9, 6a540, 44f7d, 97dd5, 28b2d, 5af59, 07dce, 87e78, deadc, 2e279, f814b, 510ea, fa40e, fc9e3, 50860, efa9d, c424a, fb6dd, ff879, 8526f, e53f8, ea6de, d63b3, 3c534, 85114, b2a42, 683db, 51a7f, 13c3d, 6d8c7, 1851b, 8513f, 9c9ad, 7775d, d0ac0, f3e17, 9cfa7, 0015d, 4be9c, c8154, 4d3e0, cdfdf, 3b077, 85548, 1a0db, fe042, edab8, 09aee, fb048, 59501, 5597c, 9d358, 910f1, d65c2, ff74b, 0be8d, 67988, e36e9

2. [ROCm] make toolchain hermetic compatible with rbe for rocm CI: This pull request aims to make the hipcc toolchain hermetic and compatible with remote build execution (RBE) workflows for ROCm continuous integration, enabling distributed builds by using local toolchain files and relative paths.

URL: pull/32139

Merged: No

Associated Commits: c84a1, fa951, eb5a4

3. [XLA:CPU] Make rendezvous timeouts configurable via flags: This pull request proposes adding configurable flags to set rendezvous timeouts and warning delays for parallel CPU workloads in XLA, aligning default timeout values with those used for GPUs to better accommodate longer and unevenly distributed tasks.

URL: pull/32115

Merged: No

Associated Commits: baf8d, e6f28

Other Closed Pull Requests

ROCm Build Fixes: Two pull requests address build errors and linking issues specific to the ROCm platform. They fix problems related to cupti_tracer being unavailable and to multihost_hlo_runner build errors, so that XLA components compile and run successfully in ROCm environments.
pull/32009, pull/31990

Flexible Configuration Size in Custom Call Tests: This pull request removes the hardcoded configuration size in the GetSupportedConfigsFromCublasCustomCall function. By requiring the size to be at least 2 instead of a fixed value, the test now supports environments with different sizes, such as Spark's size of 9.
pull/32185

3.3 Pull Request Discussion Insights
This section analyzes the tone and sentiment of discussions within this project's open and closed pull requests that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.
Based on our analysis, there are no instances of toxic discussions in the project's open or closed pull requests from the past week. 

IV. Contributors
4.1 Contributors
Active Contributors:
We consider an active contributor in this project to be any contributor who has made at least 1 commit, opened at least 1 issue, created at least 1 pull request, or made more than 2 comments in the last month. 
If there are more than 10 active contributors, the list is truncated to the top 10 based on contribution metrics for better clarity.

Contributor	Commits	Pull Requests	Issues	Comments
meteorcloudy	28	3	0	2
othakkar	15	6	0	9
alekstheod	21	4	0	0
amd-songpiao	7	4	1	2
terryysun	11	1	0	0
athurdekoos	9	2	0	0
sergachev	6	3	0	1
draganmladjenovic	5	3	0	2
ScXfjiang	7	2	0	0
rao-ashish	3	3	0	2
