Weekly GitHub Report for Xla: November 03, 2025 - November 10, 2025 (12:01:58)
Thank you for subscribing to our weekly newsletter! Each week, we deliver a comprehensive summary of your GitHub project's latest activity right to your inbox, including an overview of your project's issues, pull requests, contributors, and commit activity.
I. News
1.1 Recent Version Releases:
No recent version releases were found.
1.2 Version Information:
No version release information was available to summarize for this period.
II. Issues
2.1 Top 5 Active Issues:
We consider active issues to be issues that have been commented on most frequently within the last week. Bot comments are omitted.
- add argmax and argmin to stablehlo: This issue requests adding argmax and argmin operations to the StableHLO operation set and asks the community to evaluate the feasibility and benefits of the enhancement. The original poster references a related issue in the LLVM project.
- The comment raises concerns about the cost of adding new operations. It asks for a detailed justification covering two points: why existing operations are insufficient, and how argmax and argmin would significantly benefit a wide range of problems.
- Number of comments this week: 1
Since there were fewer than 5 open issues, all of the open issues have been listed above.
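The reviewer asks why existing operations are insufficient. For context, one common way to express argmax with existing operations is a variadic reduce over (value, index) pairs. Its combiner keeps the larger value and breaks ties toward the smaller index. The pure-Python sketch below emulates that combiner. It is not StableHLO code, the function names are illustrative, and NaN handling is deliberately left out.

```python
from functools import reduce
from typing import Sequence, Tuple

Pair = Tuple[float, int]  # (value, index)


def argmax_combiner(a: Pair, b: Pair) -> Pair:
    """Keep the larger value; on ties, keep the smaller index."""
    (va, ia), (vb, ib) = a, b
    if va > vb:
        return a
    if vb > va:
        return b
    return a if ia < ib else b


def argmin_combiner(a: Pair, b: Pair) -> Pair:
    """Keep the smaller value; on ties, keep the smaller index."""
    (va, ia), (vb, ib) = a, b
    if va < vb:
        return a
    if vb < va:
        return b
    return a if ia < ib else b


def _reduce_index(values: Sequence[float], combiner) -> int:
    # Pair each value with its position, then fold the pairs with the combiner.
    if not values:
        raise ValueError("argmax/argmin of an empty sequence")
    pairs = [(v, i) for i, v in enumerate(values)]
    return reduce(combiner, pairs)[1]


def argmax(values: Sequence[float]) -> int:
    return _reduce_index(values, argmax_combiner)


def argmin(values: Sequence[float]) -> int:
    return _reduce_index(values, argmin_combiner)
```

For non-NaN inputs, each combiner selects a single winner under a total order: value first, then index. That makes it associative and commutative, so a parallel reduce can combine the pairs in any order and still get the same answer. Whether a dedicated op would be cheaper or clearer than this pattern is exactly the question raised in the issue.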
2.2 Top 5 Stale Issues:
We consider stale issues to be issues that have had no activity within the last 30 days. The team should work together to get these issues resolved and closed as soon as possible.
- New nvshmem rule breaks the build: This issue reports a build failure caused by a new nvshmem rule introduced in a recent update. During the CUDA configuration step, the build fails with an error about the absence of a getenv method on the repository_ctx object. The reporter asks whether they need to change something on their side or whether the problem requires a fix in the openxla project, and when and how the cuda_configure settings might be updated.
- Failed to Parse MLIR generated by Torchax: This issue describes a problem encountered when exporting a Torch model to MLIR using the Torchax export API. The generated MLIR fails to parse because of an unregistered operation, 'vhlo.rsqrt_v2', in the VHLO dialect. The user is trying to compile the exported MLIR into an XLA binary with XLA AOT compilation but hits StableHLO deserialization errors. This happens even though they use matching versions of Torch and TorchXLA and built XLA from the corresponding commit.
- support bazel modules: This issue discusses the potential adoption of Bazel modules within the project, highlighting that Bazel modules have gained significant usage. The reporter notes that XLA is currently the only package in their Bazel build that does not support Bazel modules and inquires about plans to implement this support.
- Gpu collective performance model bug: This issue addresses a bug in the gpu_collective_performance model. A recent update correctly adjusts the lowLatencyBandwidth for AMD links but does not apply the corresponding update to the CUDA section. As a result, invoking the gpu_collective_performance model with H100 settings fails, which indicates incomplete handling of bandwidth parameters across GPU architectures.
- Cross compile to ARM with custom gcc: This issue concerns difficulties encountered when attempting to cross-compile the XLA project from x86 to ARM64 using a custom GCC compiler. The user reports that despite using the --config=cross_compile_linux_arm64 flag, the Bazel build system continues to produce an x86 binary. This suggests a misconfiguration or a missing step in the cross-compilation process.
2.3 Open Issues
This section lists, groups, and then summarizes issues that were created within the last week in the repository.
Issues Opened This Week: 3
Summarized Issues:
- Operation Enhancements: This topic covers requests for new operations and features to improve the stablehlo operation set and CPU PJRT capabilities. One issue discusses adding argmax and argmin operations to stablehlo to evaluate their necessity and benefits, while another requests the implementation of collective_broadcast for CPU PJRT to support distributed training without multi-device hardware.
- [issues/33449, issues/33502]
- Build and Compilation Errors: This topic addresses problems encountered during the build process of ROCm device libraries, specifically errors related to copying input directories into the sandbox. The issue includes warnings about unsound dependency checking and a file existence error that causes compilation failure.
- [issues/33589]
2.4 Closed Issues
This section lists, groups, and then summarizes issues that were closed within the last week in the repository. This section also links the associated pull requests if applicable.
Issues Closed This Week: 1
Summarized Issues:
- MLIR-HLO Build Issues: This topic covers problems and ambiguities in the MLIR-HLO build instructions, including missing or unclear LLVM commit references and build configuration errors. It also questions whether the project should remain self-contained or update to use LLVM from the third_party directory, noting that MLIR-HLO is intended to be eventually removed.
- issues/33472
2.5 Issue Discussion Insights
This section analyzes the tone and sentiment of discussions within this project's open and closed issues that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.
Based on our analysis, there are no instances of toxic discussions in the project's open or closed issues from the past week.
III. Pull Requests
3.1 Open Pull Requests
This section provides a summary of pull requests that were opened in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.
Pull Requests Opened This Week: 7
Key Open Pull Requests
1. [WIP ROCm] Introduce rocm_ci github action: This pull request introduces a new GitHub action for the ROCm continuous integration (CI) job to replace the existing Jenkins job, aiming to improve response times on pull requests.
- URL: pull/33531
- Merged: No
2. [ROCm] Fix rocm device lib failure due to clash in include dirs: This pull request addresses a build failure in the ROCm device library by resolving a conflict caused by including the same directory twice in the include paths of a genrule, thereby fixing the issue reported in GitHub issue #33589.
- URL: pull/33656
- Merged: No
3. [DOC] added memory space identifiers: This pull request proposes adding a new subsection on memory space identifiers along with a brief mention of memory space in the introduction to enhance the project's documentation.
- URL: pull/33448
- Merged: No
- Associated Commits: e4ca0
Other Open Pull Requests
- ROCm Configuration and Dependency Management: Two pull requests improve ROCm support. One unifies build configurations to use only Clang and removes unused GCC Bazel settings. The other simplifies dependency management by switching from Debian packages to hermetic tarball distributions. Together, these changes aim to reduce configuration complexity, minimize OS-specific variations, and make it easier to support newer ROCm versions.
- pull/33534, pull/33714
- Performance Optimization for NCCL Collectives: One pull request introduces a mechanism to configure NCCL to minimize streaming multiprocessor utilization while multiple asynchronous collective operations are in flight. It improves overlap by offloading collective traffic to copy engines or networking hardware. The pull request reports an approximately 7% speedup on H100 GPUs with NCCL 2.28.
- pull/33613
- ROCm Trace Event Configuration: A pull request adds a configurable flag to set the maximum number of ROCm trace events in XLA, defaulting to 4 million, and triggers a warning when the trace event count reaches this limit. This feature helps manage trace event volume and maintain performance monitoring effectiveness.
- pull/33666
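The trace-event cap in pull/33666 can be pictured as a bounded collector that warns when the event count reaches the limit. The sketch below is hypothetical. The class name, the warning text, and the choice to drop events beyond the cap are assumptions rather than the pull request's actual behavior. Only the 4 million default comes from the summary above.

```python
import warnings
from typing import Any, List

# Default stated in the pull request summary; everything else is illustrative.
DEFAULT_MAX_TRACE_EVENTS = 4_000_000


class BoundedTraceCollector:
    """Hypothetical collector that caps the number of stored trace events.

    Assumptions: events past the cap are dropped (and counted), and one
    warning is emitted the first time the cap is reached.
    """

    def __init__(self, max_events: int = DEFAULT_MAX_TRACE_EVENTS) -> None:
        self.max_events = max_events
        self.events: List[Any] = []
        self.dropped = 0
        self._warned = False

    def record(self, event: Any) -> bool:
        """Store the event if under the cap; return whether it was stored."""
        if len(self.events) >= self.max_events:
            self.dropped += 1
            return False
        self.events.append(event)
        if len(self.events) == self.max_events and not self._warned:
            warnings.warn(
                f"trace event limit of {self.max_events} reached; "
                "further events will be dropped"
            )
            self._warned = True
        return True
```

Warning once, instead of on every dropped event, keeps logs readable when a long profiling session overruns the limit. The dropped counter still records how much was lost.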
3.2 Closed Pull Requests
This section provides a summary of pull requests that were closed in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.
Pull Requests Closed This Week: 11
Key Closed Pull Requests
1. [WIP do not merge] Ci fix jax build test failures: This pull request is a work-in-progress effort aimed at fixing continuous integration test failures related to JAX unit tests by addressing build and dependency issues, particularly for the ROCm platform, although it was not merged.
- URL: pull/33510
- Merged: No
- Associated Commits: 10464, c61d2, 9c4cf, 31124, 492d1, 30c09, 3febc, 35e0c, 56968, 16ec7, 1944e, 74b4b, 06103, bae38, c01b3, 2e597, 73cb0, 64a37, 86d4a, 52ecd, be9da, 69b5e, eade4, aa5c0
2. Add a new version of ShapeUtil::ByteStrides() and fix its use in the GPU compiler.: This pull request introduces a new version of ShapeUtil::ByteStrides() and corrects its use in the GPU compiler. Improper stride calculations had allowed incorrect GPU copy fusions on packed sub-byte types. The change disables those fusions, which fixes the GPU DynamicMemcpyFusion issue. It also adds UnpackedByteStrides() to preserve compatibility with existing PJRT behavior.
- URL: pull/33464
- Merged: No
3. [XLA:GPU] Dump command buffer contents to folder specified by --xla-dump-to through dump.h: This pull request proposes migrating the dumping of command buffer contents to use dump.h, enabling the contents to be saved to a folder specified by the --xla-dump-to flag.
- URL: pull/33505
- Merged: No
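The following sketch illustrates why byte strides go wrong for packed sub-byte types. In a row-major array of 4-bit elements, the innermost stride is half a byte, and an integer byte stride cannot represent that. The code compares strides for packed storage (elements share bytes) with "unpacked" storage, which it assumes means each element is rounded up to whole bytes. That reading of "unpacked" is an assumption. The function names are hypothetical and do not reproduce ShapeUtil::ByteStrides() or UnpackedByteStrides().

```python
from fractions import Fraction
from typing import List, Sequence


def element_strides(dims: Sequence[int]) -> List[int]:
    """Strides in elements for a dense row-major layout (last dim fastest)."""
    strides = [0] * len(dims)
    stride = 1
    for axis in reversed(range(len(dims))):
        strides[axis] = stride
        stride *= dims[axis]
    return strides


def packed_byte_strides(dims: Sequence[int], bits: int) -> List[Fraction]:
    """Exact byte strides when elements are packed bit-by-bit (may be fractional)."""
    return [Fraction(s * bits, 8) for s in element_strides(dims)]


def unpacked_byte_strides(dims: Sequence[int], bits: int) -> List[int]:
    """Byte strides when each element is rounded up to whole bytes (assumption)."""
    bytes_per_element = (bits + 7) // 8
    return [s * bytes_per_element for s in element_strides(dims)]


def has_whole_byte_strides(dims: Sequence[int], bits: int) -> bool:
    """True if every packed stride is an integer number of bytes."""
    return all(s.denominator == 1 for s in packed_byte_strides(dims, bits))
```

For a [2, 3] array of 32-bit floats, both functions give [12, 4]. For a [2, 3] array of 4-bit integers, the packed strides are 3/2 and 1/2 bytes, while the unpacked strides are [3, 1]. Code that treats one as the other, such as a memcpy-style fusion, would then copy the wrong bytes.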
Other Closed Pull Requests
- ROCm and Triton compilation fixes: Two pull requests address ROCm and Triton compilation. One fixes the Triton compilation pipeline by adding a missing pass needed for thread dimension extraction. The other restores missing dependencies and makes hipblaslt a required dependency to fix hermetic ROCm builds.
- pull/33671, pull/33681
- CUDA stream and NCCL kernel priority improvements: Two pull requests focus on CUDA stream and NCCL kernel priority handling. One refactors code to unify priority setting for CUDA streams and graph nodes, improving code reuse, and the other changes the default NCCL kernel priority in CUDA graph command buffers to the highest level to enhance performance.
- pull/33533, pull/33655
- Code safety and concurrency improvements: A pull request adds a mutex guard to the GetReadyFuture function to prevent race conditions and crashes caused by concurrent access. This change enhances the stability of the code under concurrent execution scenarios.
- pull/33489
- API and code clarity refactoring: One pull request refactors the PJRT C API header by redefining C structs with typedefs for better compatibility and clarity. Another improves error message clarity by formatting memory byte counts with commas and human-readable units without changing program behavior.
- pull/33470, pull/33504
- Documentation updates: A pull request updates documentation to note the deprecation of the mlir-hlo project, helping users avoid confusion by clearly indicating its deprecated status.
- pull/33513
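pull/33504 formats memory byte counts in error messages with thousands separators and human-readable units. The following is a minimal Python sketch of that kind of formatting. The output layout, the binary (1024-based) units, and the two-decimal precision are assumptions, not XLA's actual message format.

```python
def format_bytes(num_bytes: int) -> str:
    """Format a byte count as e.g. '1,536 bytes (1.50 KiB)'.

    Assumes 1024-based units; counts below 1 KiB are shown without a unit.
    """
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]
    value = float(num_bytes)
    for unit in units:
        # Stop at the first unit where the value drops below 1024,
        # or at the largest unit available.
        if abs(value) < 1024 or unit == units[-1]:
            break
        value /= 1024
    if unit == "B":
        return f"{num_bytes:,} bytes"
    return f"{num_bytes:,} bytes ({value:.2f} {unit})"


print(format_bytes(1073741824))  # 1,073,741,824 bytes (1.00 GiB)
```

Keeping the exact comma-separated count next to the rounded value lets readers compare two allocation sizes precisely while still seeing the magnitude at a glance.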
3.3 Pull Request Discussion Insights
This section analyzes the tone and sentiment of discussions within this project's open and closed pull requests that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.
Based on our analysis, there are no instances of toxic discussions in the project's open or closed pull requests from the past week.
IV. Contributors
4.1 Contributors
Active Contributors:
We consider an active contributor in this project to be any contributor who has made at least 1 commit, opened at least 1 issue, created at least 1 pull request, or made more than 2 comments in the last month.
If there are more than 10 active contributors, the list is truncated to the top 10 based on contribution metrics for better clarity.
| Contributor | Commits | Pull Requests | Issues | Comments |
|---|---|---|---|---|
| alekstheod | 56 | 13 | 0 | 2 |
| rao-ashish | 2 | 1 | 0 | 17 |
| shawnwang18 | 11 | 8 | 0 | 0 |
| emilyfertig | 0 | 0 | 0 | 12 |
| mingxu1067 | 6 | 2 | 0 | 3 |
| mtsokol | 6 | 4 | 0 | 0 |
| sergachev | 5 | 2 | 0 | 3 |
| akuegel | 0 | 0 | 0 | 9 |
| Tixxx | 2 | 2 | 0 | 4 |
| mmakevic-amd | 7 | 0 | 0 | 0 |
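The active-contributor rule above reduces to a simple predicate. The sketch below applies it to the counts in the table. It checks only the thresholds, since the one-month window cannot be reconstructed from the table. Ranking by the unweighted sum of the four columns reproduces the table's order. That the report actually ranks this way is an inference from the numbers, not a documented rule.

```python
from typing import List, NamedTuple


class Contributor(NamedTuple):
    name: str
    commits: int
    pull_requests: int
    issues: int
    comments: int


def is_active(c: Contributor) -> bool:
    # Rule stated in the report: at least 1 commit, issue, or pull request,
    # or more than 2 comments.
    return c.commits >= 1 or c.issues >= 1 or c.pull_requests >= 1 or c.comments > 2


def total(c: Contributor) -> int:
    # Assumed ranking metric: unweighted sum of the four columns.
    return c.commits + c.pull_requests + c.issues + c.comments


def top_active(contributors: List[Contributor], limit: int = 10) -> List[Contributor]:
    active = [c for c in contributors if is_active(c)]
    # sorted() stays stable with reverse=True, so ties keep their input order.
    return sorted(active, key=total, reverse=True)[:limit]


TABLE = [
    Contributor("alekstheod", 56, 13, 0, 2),
    Contributor("rao-ashish", 2, 1, 0, 17),
    Contributor("shawnwang18", 11, 8, 0, 0),
    Contributor("emilyfertig", 0, 0, 0, 12),
    Contributor("mingxu1067", 6, 2, 0, 3),
    Contributor("mtsokol", 6, 4, 0, 0),
    Contributor("sergachev", 5, 2, 0, 3),
    Contributor("akuegel", 0, 0, 0, 9),
    Contributor("Tixxx", 2, 2, 0, 4),
    Contributor("mmakevic-amd", 7, 0, 0, 0),
]
```

The sums run 71, 20, 19, 12, 11, 10, 10, 9, 8, 7. That non-increasing sequence is why a sum-based ranking matches the table row for row.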