Weekly GitHub Report for XLA: August 11, 2025 - August 18, 2025 (12:01:18)
Thank you for subscribing to our weekly newsletter! Each week, we deliver a comprehensive summary of your GitHub project's latest activity right to your inbox, including an overview of your project's issues, pull requests, contributors, and commit activity.
Table of Contents
I. News
1.1 Recent Version Releases:
No recent version releases were found.
1.2 Version Information:
No version information was available to summarize for this period.
II. Issues
2.1 Top 5 Active Issues:
We consider active issues to be issues that have been commented on most frequently within the last week. Bot comments are omitted.
As of our latest update, there are no active issues with ongoing comments this week.
2.2 Top 5 Stale Issues:
We consider stale issues to be issues that have had no activity within the last 30 days. The team should work together to get these issues resolved and closed as soon as possible.
- New nvshmem rule breaks the build: This issue reports a build failure caused by a new nvshmem rule introduced in a recent pull request, which leads to an error related to the absence of the getenv method on the repository_ctx object during CUDA configuration. The reporter is seeking guidance on whether they need to update their side to resolve this error, particularly in relation to changes mentioned for JAX, or if the fix must come from the open_xla project, along with an estimated timeline for such a resolution.
- Failed to Parse MLIR generated by Torchax: This issue describes a problem encountered when exporting a PyTorch model using the torch-xla torchax export API to MLIR format, where the resulting MLIR fails to parse due to an unregistered operation 'vhlo.rsqrt_v2' in the VHLO dialect. The user is attempting to compile the exported MLIR into an XLA binary using XLA AOT compilation but faces deserialization errors with StableHLO, despite using compatible versions of torch, torchxla, and building XLA from the corresponding commit.
- support bazel modules: This issue requests the adoption of Bazel modules within the project, highlighting that Bazel modules have gained significant usage and support in the community. The reporter points out that XLA is currently the only package in their Bazel build that does not support these modules, implying a need for compatibility improvements.
- Gpu collective performance model bug: This issue addresses a bug in the gpu_collective_performance model where the recent update to lowLatencyBandwidth for AMD links was not applied to the CUDA section, causing failures when using H100 settings. As a result, the model call with these settings does not function correctly, indicating an inconsistency in how bandwidth parameters are handled across different GPU architectures.

Since there were fewer than 5 open issues, all of the open issues have been listed above.
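The bug above (a bandwidth constant updated for one vendor's links but not the other's) is the kind of inconsistency a single shared lookup table avoids. A hypothetical sketch, with made-up bandwidth values and names that do not come from the actual XLA model:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical sketch: keeping low-latency bandwidth (GB/s, values
// invented for illustration) in one table keyed by (vendor, link)
// forces every architecture through the same lookup path, so an
// update to one entry cannot silently miss a parallel code branch.
double LowLatencyBandwidthGBs(const std::string& vendor,
                              const std::string& link) {
  static const std::map<std::pair<std::string, std::string>, double> kTable = {
      {{"amd", "xgmi"}, 50.0},
      {{"nvidia", "nvlink"}, 40.0},  // e.g. an H100-class link
  };
  auto it = kTable.find({vendor, link});
  return it == kTable.end() ? 0.0 : it->second;  // 0.0 = unknown link
}
```

With this shape, updating the AMD entry and forgetting NVIDIA would still leave both vendors going through the same function rather than diverging code paths.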
2.3 Open Issues
This section lists, groups, and then summarizes issues that were created within the last week in the repository.
Issues Opened This Week: 1
Summarized Issues:
- Bazel Build Integration: This topic covers the request to contribute the hwloc Bazel build files to the Bazel Central Registry (BCR) to improve accessibility. The issue suggests that either the current maintainers submit the files or the issue author offers to do so if there are no objections, highlighting collaboration and contribution processes.
- issues/30098
2.4 Closed Issues
This section lists, groups, and then summarizes issues that were closed within the last week in the repository. This section also links the associated pull requests if applicable.
Issues Closed This Week: 1
Summarized Issues:
- HloEvaluator crashes with dynamic loops: The HloEvaluator encounters crashes when processing dynamic loops, particularly in scenarios involving nested while loops with dynamic loop bounds and accumulation within the HLO module. This issue highlights instability in handling dynamic control flow constructs during evaluation.
- issues/30134
2.5 Issue Discussion Insights
This section will analyze the tone and sentiment of discussions within this project's open and closed issues that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.
Based on our analysis, there are no instances of toxic discussions in the project's open or closed issues from the past week.
III. Pull Requests
3.1 Open Pull Requests
This section provides a summary of pull requests that were opened in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.
Pull Requests Opened This Week: 8
Key Open Pull Requests
1. [ROCM] Splitting gridDim.x for very large reductions / loop fusion kernels, added grid-stride loops for BufferComparator/RedzoneChecker: This pull request proposes splitting the gridDim.x dimension into gridDim.x and gridDim.y for very large reduction and loop fusion kernels on ROCm to address failures caused by large grid sizes, adds grid-stride loops to the BufferComparator and RedzoneChecker kernels, extends related tests, and improves error logging for kernel launch failures.
- URL: pull/30127
- Merged: No
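The grid-stride loop technique this pull request applies can be illustrated with a CPU-side sketch (the kernel and helper names here are invented for illustration; the actual ROCm kernels differ). Each "thread" starts at its global index and advances by the total number of threads in the grid, so a fixed-size grid covers an arbitrarily large buffer without oversized launch dimensions:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// CPU simulation of a grid-stride loop. Returns true if every element
// of an n-element buffer is visited exactly once by a grid of
// grid_dim blocks with block_dim threads each.
bool CoversAllExactlyOnce(int n, int grid_dim, int block_dim) {
  std::vector<int> visits(n, 0);
  const int64_t stride = int64_t{grid_dim} * block_dim;  // total threads
  for (int block = 0; block < grid_dim; ++block) {
    for (int thread = 0; thread < block_dim; ++thread) {
      // Grid-stride loop: start at the global thread id, step by the
      // grid width until the end of the buffer.
      for (int64_t i = int64_t{block} * block_dim + thread; i < n;
           i += stride) {
        ++visits[i];
      }
    }
  }
  for (int v : visits) {
    if (v != 1) return false;
  }
  return true;
}
```

The point of the pattern is that a grid far smaller than the buffer (say 256 simulated threads over 1000 elements) still partitions the work cleanly, which is why it helps kernels that would otherwise need very large grid sizes.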
2. [XLA:GPU] Improve Flag Handling while Linking for oneAPI: This pull request improves the handling of linker flags in the XLA GPU backend for oneAPI by using --whole-archive and --no-whole-archive flags to include all symbols from object files, thereby preventing command-line overflow errors that cause linking failures, and adds support for a VERBOSE=1 environment variable to aid in debugging the compiler invocation.
- URL: pull/30072
- Merged: No
3. [GPU] Tweak NVML library loading error message wording: This pull request modifies the NVML library loading error message by removing the word "Error" to reduce confusion during debugging in non-MNNVL clusters, changing it to a warning instead.
- URL: pull/30239
- Merged: No
Other Open Pull Requests
- Bazel command update in XLA Linux x86 GPU oneAPI presubmit job: This pull request changes the Bazel command from 'bazel test' to 'bazel build' to fix failures caused by the lack of test targets and the need for Intel hardware to run tests. The update ensures the presubmit job runs successfully without requiring unavailable test targets.
- pull/30342
- CUDA 13 API compatibility update: This pull request modifies the XLA code to use cuGraphAddNode_v2 to handle the API change introduced in CUDA 13, maintaining compatibility with CUDA 12.3 and later. It also notes that supporting earlier CUDA versions would require conditional compilation.
- pull/30179
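Conditional compilation of the kind this pull request mentions is usually done by gating on the CUDA runtime version macro. A minimal sketch, assuming a version-gated dispatch (the macro value and the return codes here are stand-ins for illustration, not XLA's actual code):

```cpp
#include <cassert>

// CUDART_VERSION normally comes from the CUDA headers; we define a
// stand-in value here so the sketch compiles without CUDA installed.
#ifndef CUDART_VERSION
#define CUDART_VERSION 13000  // pretend CUDA 13 for this sketch
#endif

// Sketch of version-gated API selection: newer toolkits take the
// _v2 entry point, older ones the original signature.
int SelectGraphAddNodeVariant() {
#if CUDART_VERSION >= 12030
  return 2;  // would call cuGraphAddNode_v2 (available from CUDA 12.3)
#else
  return 1;  // would call the pre-12.3 cuGraphAddNode
#endif
}
```

This is the shape of the "conditional compilation" the pull request says would be needed to support CUDA versions earlier than 12.3.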
- ROCm executor bugfix for peer-to-peer access: This pull request fixes a bug in the ROCm executor by properly enabling peer-to-peer access, which resolves a previously failing unit test for all_reduce operations. The fix improves the reliability of the ROCm backend.
- pull/30276
- Triton dot fusion emitter performance and stability improvements: This pull request enables block_n=8 in the Triton dot fusion emitter in the XLA GPU backend, significantly improving performance for shapes with small non-contracting dimensions. It also updates the TritonDotFusionSearchSpace to avoid runtime failures caused by certain configuration parameters.
- pull/30317
- XLA CPU backend dot to oneDNN Matmul rewrite criteria update: This pull request updates the criteria for rewriting Dot operations to oneDNN Matmul in the XLA CPU backend, focusing on data types, tensor shapes, and canonical ordering. These changes refine the conditions under which the rewrite occurs to improve correctness and performance.
- pull/30328
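An eligibility check of the kind this pull request refines can be sketched as a predicate over data types and shapes. The allow-list, struct, and function names below are hypothetical, not the actual XLA criteria:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical description of a Dot operation for this sketch.
struct DotShape {
  std::string dtype;              // element type of both operands
  std::vector<int64_t> lhs, rhs;  // operand dimensions
};

// Sketch of a rewrite-eligibility predicate: supported element types,
// canonical 2D shapes, and matching contraction dimensions. The real
// oneDNN rewriter checks considerably more than this.
bool EligibleForOneDnnMatmul(const DotShape& dot) {
  if (dot.dtype != "f32" && dot.dtype != "bf16") return false;  // dtype gate
  if (dot.lhs.size() != 2 || dot.rhs.size() != 2) return false;  // rank gate
  return dot.lhs[1] == dot.rhs[0];  // contraction dims must agree
}
```

Keeping all the conditions in one predicate makes it easy to tighten or relax individual criteria, which appears to be what this pull request does for the real rewriter.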
3.2 Closed Pull Requests
This section provides a summary of pull requests that were closed in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.
Pull Requests Closed This Week: 11
Key Closed Pull Requests
1. [XLA:GPU] Add parent pointer for command buffer to track CommandBuffer/NestedCommandBuffer: This pull request introduces a parent pointer in the CudaCommandBuffer to link nested command buffers with their parent, enabling the graph executor to traverse from nested to top-level command buffers for explicit updates of nested command sequences.
- URL: pull/30036
- Merged: No
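The parent-pointer idea can be shown with a minimal structure (the types here are invented for illustration; the actual CudaCommandBuffer API is different). Each nested buffer records its parent, so the executor can walk from any nested buffer up to the top-level one that owns the executable graph:

```cpp
#include <cassert>

// Minimal sketch of a command buffer with a parent link.
// A null parent marks a top-level buffer.
struct CommandBuffer {
  CommandBuffer* parent = nullptr;
};

// Walk the parent chain until reaching the top-level buffer.
const CommandBuffer* TopLevel(const CommandBuffer* cb) {
  while (cb->parent != nullptr) cb = cb->parent;
  return cb;
}
```

This traversal is what lets an update targeted at a nested command sequence locate the top-level graph that must be re-instantiated.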
2. [XLA:GPU] Add ChildCmd to command buffer cmd.: This pull request proposes adding a new ChildCmd command type to the command buffer to enable hierarchical construction of command sequences, implements move semantics for CUDA child nodes to maintain valid graph handles, and renames related functions for clarity and consistency.
- URL: pull/30045
- Merged: No
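"Move semantics to maintain valid graph handles" can be sketched with a move-only wrapper: moving transfers the handle and nulls the source, so exactly one object ever refers to a live node even as containers reallocate. The class below is a hypothetical illustration, not the real CUDA child-node type:

```cpp
#include <cassert>
#include <utility>

// Move-only wrapper around an opaque graph-node handle. Copies are
// deleted; a move leaves the source holding nullptr, so the one live
// handle stays valid and is never referenced twice.
class ChildNode {
 public:
  explicit ChildNode(void* handle) : handle_(handle) {}
  ChildNode(ChildNode&& other) noexcept
      : handle_(std::exchange(other.handle_, nullptr)) {}
  ChildNode& operator=(ChildNode&& other) noexcept {
    handle_ = std::exchange(other.handle_, nullptr);
    return *this;
  }
  ChildNode(const ChildNode&) = delete;
  ChildNode& operator=(const ChildNode&) = delete;
  void* handle() const { return handle_; }

 private:
  void* handle_;  // non-owning in this sketch
};
```

The same pattern appears whenever a C API hands back raw handles that must not be duplicated or destroyed twice.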
3. Force command buffer usage on all compatible custom calls: This pull request enforces the wrapping of all registered compatible custom calls in command buffers regardless of the number of other operations available for combination, implementing Proposal A to ensure consistent command buffer usage.
- URL: pull/30183
- Merged: No
Other Closed Pull Requests
- CUDA version updates for XLA GPU backend: Multiple pull requests propose updating the CUDA build version for the XLA GPU backend to newer releases, specifically versions 12.9 and 12.9.1. These updates aim to keep the backend current but none of these changes have been merged yet.
- pull/30044, pull/30062
- Command buffer enhancements in XLA GPU backend: Several pull requests focus on enabling and configuring command buffers for improved GPU operations, including enabling command buffers for block scaled dot calls in the cuDNN backend and setting the command buffer to use the default left-hand side topology. These changes are intended to optimize GPU execution workflows.
- pull/30070, pull/30132
- Nvshmem communicator team change for NVIDIA GPU communication: One pull request proposes changing the communicator team from "node" to "shared" to allow collective operations across a larger nvlink domain, enhancing performance for NVIDIA GPU communication. This change supports more efficient communication beyond physical node boundaries.
- pull/30077
- Cudnn-frontend version update for CUDA 13 compatibility: A pull request suggests updating the cudnn-frontend to version 1.13.0 to ensure compatibility with CUDA 13, reflecting necessary backend adjustments for newer CUDA versions.
- pull/30165
- Removal of deprecated XLA GPU graph level flags: One pull request proposes removing deprecated graph level flags in favor of using the xla_gpu_enable_command_buffer flag, aiming to streamline flag usage in the XLA GPU backend. This change was not merged.
- pull/30187
- Fix for Bazel query pre-submit error related to oneAPI on XLA:GPU: A pull request attempts to fix a Bazel query pre-submit error encountered in the Google OpenXLA repository related to oneAPI on the XLA:GPU platform, but this fix was not merged.
- pull/30249
3.3 Pull Request Discussion Insights
This section will analyze the tone and sentiment of discussions within this project's open and closed pull requests that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.
Based on our analysis, there are no instances of toxic discussions in the project's open or closed pull requests from the past week.
IV. Contributors
4.1 Contributors
Active Contributors:
We consider an active contributor in this project to be any contributor who has made at least 1 commit, opened at least 1 issue, created at least 1 pull request, or made more than 2 comments in the last month.
If there are more than 10 active contributors, the list is truncated to the top 10 based on contribution metrics for better clarity.
Contributor | Commits | Pull Requests | Issues | Comments |
---|---|---|---|---|
shawnwang18 | 32 | 10 | 0 | 0 |
mraunak | 14 | 3 | 0 | 1 |
Copilot | 0 | 0 | 0 | 11 |
Arech8 | 4 | 1 | 0 | 5 |
terryysun | 4 | 1 | 0 | 2 |
pemeliya | 6 | 1 | 0 | 0 |
chaserileyroberts | 4 | 2 | 0 | 0 |
othakkar | 3 | 1 | 0 | 2 |
Zoey-Cheng | 5 | 1 | 0 | 0 |
beckerhe | 2 | 0 | 0 | 4 |