Weekly Project News


Weekly GitHub Report for XLA: September 29, 2025 - October 06, 2025 (12:01:39)

Thank you for subscribing to our weekly newsletter! Each week, we deliver a comprehensive summary of your GitHub project's latest activity right to your inbox, including an overview of your project's issues, pull requests, contributors, and commit activity.


Table of Contents

  • I. News
    • 1.1. Recent Version Releases
    • 1.2. Version Information
  • II. Issues
    • 2.1. Top 5 Active Issues
    • 2.2. Top 5 Stale Issues
    • 2.3. Open Issues
    • 2.4. Closed Issues
    • 2.5. Issue Discussion Insights
  • III. Pull Requests
    • 3.1. Open Pull Requests
    • 3.2. Closed Pull Requests
    • 3.3. Pull Request Discussion Insights
  • IV. Contributors
    • 4.1. Contributors

I. News

1.1 Recent Version Releases:

No recent version releases were found.

1.2 Version Information:

No version information is available to summarize this week.

II. Issues

2.1 Top 5 Active Issues:

We consider active issues to be the issues that have been commented on most frequently within the last week. Bot comments are omitted.

As of our latest update, there are no active issues with ongoing comments this week.

2.2 Top 5 Stale Issues:

We consider stale issues to be issues that have had no activity within the last 30 days. The team should work together to get these issues resolved and closed as soon as possible.

  1. New nvshmem rule breaks the build: This issue reports a build failure caused by a new nvshmem rule introduced in a recent pull request, which leads to an error about a missing getenv method on the repository_ctx object during CUDA configuration. The reporter asks whether they need to update their own setup to resolve the error, particularly in relation to changes mentioned for JAX, or whether the fix must come from the OpenXLA project, and requests an estimated timeline for a resolution.
  2. Failed to Parse MLIR generated by Torchax: This issue describes a problem encountered when exporting a PyTorch model to MLIR using the torch-xla torchax export API, where the generated MLIR fails to parse due to an unregistered operation 'vhlo.rsqrt_v2' in the VHLO dialect. The user is attempting to compile the exported model with XLA AOT but faces deserialization errors with StableHLO, despite using compatible versions of torch, torchxla, and building XLA from the corresponding commit, and has provided code snippets and bytecode samples to assist in troubleshooting.
  3. support bazel modules: This issue discusses the potential adoption of Bazel modules within the project, highlighting that Bazel modules have gained significant usage. It specifically points out that XLA is currently the only package in the user's Bazel build that does not support Bazel modules, suggesting a need for compatibility improvements.
  4. Gpu collective performance model bug: This issue addresses a bug in the gpu_collective_performance model where the recent update correctly adjusts the lowLatencyBandwidth for AMD links but fails to apply the corresponding update to the CUDA section. As a result, invoking the gpu_collective_performance model with H100 GPU settings leads to a failure, indicating incomplete handling of bandwidth parameters across different GPU architectures.
  5. Cross compile to ARM with custom gcc: This issue concerns difficulties encountered when attempting to cross-compile the XLA project from an x86 architecture to ARM64 using a custom GCC compiler. The user reports that despite using the --config=cross_compile_linux_arm64 flag in the Bazel build system, the build process persistently tries to generate an x86 binary, indicating a possible misconfiguration or missing step in the cross-compilation setup.
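
The 30-day staleness criterion stated at the top of this section can be sketched as a small predicate. This is a minimal illustration only, not the report generator's actual code: the function name, the timestamps, the UTC time zone, and the treatment of an issue last touched exactly 30 days ago are all assumptions.

```python
from datetime import datetime, timedelta, timezone

# Threshold stated in the report: no activity within the last 30 days.
STALE_AFTER = timedelta(days=30)


def is_stale(last_activity: datetime, now: datetime) -> bool:
    """Return True when the most recent activity is more than 30 days old.

    Assumption: an issue last touched exactly 30 days ago still counts as
    active; the report does not say how the boundary is handled.
    """
    return now - last_activity > STALE_AFTER


# Report timestamp from the title line, assumed here to be UTC.
report_time = datetime(2025, 10, 6, 12, 1, 39, tzinfo=timezone.utc)

# Hypothetical last-activity timestamps, for illustration only.
quiet_issue = datetime(2025, 8, 20, tzinfo=timezone.utc)
recent_issue = datetime(2025, 9, 30, tzinfo=timezone.utc)

print(is_stale(quiet_issue, report_time))   # more than 30 days of silence
print(is_stale(recent_issue, report_time))  # touched within the last week
```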

2.3 Open Issues

This section lists, groups, and then summarizes issues that were created within the last week in the repository.

Issues Opened This Week: 1

Summarized Issues:

  • HLO verifier and compilation failures: The newly introduced HLO verifier between pre-scheduling and post-scheduling stages causes compilation failures specifically for collective-permute operations involving mixed precision data types. These operations previously compiled successfully, raising concerns about whether this new behavior is intentional or a regression.
    issues/32222

2.4 Closed Issues

This section lists, groups, and then summarizes issues that were closed within the last week in the repository. This section also links the associated pull requests if applicable.

Issues Closed This Week: 0

Summarized Issues:

As of our latest update, there were no issues closed in the project this week.

2.5 Issue Discussion Insights

This section analyzes the tone and sentiment of discussions within this project's open and closed issues that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.

Based on our analysis, there are no instances of toxic discussions in the project's open or closed issues from the past week.


III. Pull Requests

3.1 Open Pull Requests

This section provides a summary of pull requests that were opened in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.
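
The selection rule described above (rank by commit count, highlight the top three, show at most 25) might look roughly like the sketch below. The names `PullRequest`, `split_key_and_other`, and the `hypothetical/a` entry are illustrative assumptions. The three commit counts are tallied from the Associated Commits lists in this report. Applying the display limit before the key/other split, and breaking ties by input order, are also assumptions.

```python
from typing import NamedTuple


class PullRequest(NamedTuple):
    ref: str
    commits: int


KEY_COUNT = 3       # "top three" highlighted as key pull requests
DISPLAY_LIMIT = 25  # at most 25 pull requests are displayed


def split_key_and_other(prs):
    """Rank by commit count (descending) and split into key and other PRs.

    sorted() is stable, so pull requests with equal commit counts keep
    their input order (an assumed tie-break; the report names none).
    """
    ranked = sorted(prs, key=lambda pr: pr.commits, reverse=True)
    shown = ranked[:DISPLAY_LIMIT]
    return shown[:KEY_COUNT], shown[KEY_COUNT:]


prs = [
    PullRequest("hypothetical/a", 1),  # made-up entry, for illustration only
    PullRequest("pull/32002", 3),
    PullRequest("pull/32055", 16),
    PullRequest("pull/32217", 16),
]

key, other = split_key_and_other(prs)
print([pr.ref for pr in key])
print([pr.ref for pr in other])
```

The grouping of the remaining pull requests by similar characteristics is not modeled here.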

Pull Requests Opened This Week: 14

Key Open Pull Requests

1. Support building with Bzlmod: This pull request introduces support for building the project using Bzlmod, including various fixes, updates to dependencies, and improvements to build tools and configurations.

  • URL: pull/32055
  • Merged: No
  • Associated Commits: 73e8f, a4f8e, aeb4e, a59c9, 3a17e, 70d8c, 00100, ef0f8, eea96, 7afa1, 8da1c, 55098, b5e46, ddf7a, 8e43d, e808c

2. [Refactor] Completely Remove AsyncStreamKind: This pull request completely removes the AsyncStreamKind type and all its usages, replacing them with ExecutionStreamId-based logic to determine operation stream placement, while replicating previous behaviors through GetStreamIdOverride and adding new end-to-end execution tests as part of ongoing multi-stream collective work.

  • URL: pull/32217
  • Merged: No
  • Associated Commits: db259, 90e6e, 480c7, fb15f, 1c85c, 34b55, 1221a, d7549, fddd4, 73493, 9fc52, c8d1a, ba9c0, f9db6, ec58a, 785ad

3. [ROCm] fix rocm build xla tools hlo runner: This pull request aims to fix the ROCm build process for XLA tools by avoiding hardcoded shared object versions and resolving the build error of the multihost_hlo_runner component.

  • URL: pull/32002
  • Merged: No
  • Associated Commits: e6aab, f39a0, 14f39

Other Open Pull Requests

  • Convolution command buffer support in ROCm backend: This pull request adds support for command buffers specifically for convolution operations in the ROCm backend to reduce graph fragmentation by enabling graph capture only for explicitly listed convolution custom call targets. It also includes new unit tests and improvements to execution graph management.
    pull/32053
  • Caching and reuse of communicators for GPU cross-process transfers: These pull requests propose using the AcquireCollectiveCliques mechanism and modifying the PjRt API to cache and reuse communicators for cross-process device-puts on GPUs. The changes aim to improve performance by reducing redundant communicator creation during multiple transfers between the same device sets and by including global device ID information in key functions.
    pull/32076, pull/32074
  • Code organization and maintainability improvements: These pull requests focus on improving code clarity and maintainability by moving computation simplification methods from the command buffer scheduling component to a new library and merging multiple methods that query the fusion kind in the GPU codebase. These changes help streamline the codebase and reduce redundancy.
    pull/31994, pull/32003
  • Documentation enhancements for tiling: This pull request expands the tiling documentation by adding a Motivation section and details on tiling formats to improve clarity and completeness.
    pull/32107
  • CUDA and PTX version updates: This pull request updates the project to support PTX version 9.0 starting with CUDA 13.0 and includes a slight refactoring of the code.
    pull/32187
  • Platform-specific test and capability adjustments: These pull requests relax error specifications to enable the BitcastReduceWithStride1Tiling test to pass on the Spark platform, update compute capabilities to differentiate between Blackwell Edge GPUs, remove the IsAtLeastBlackwellPro method, and skip latency estimator tests on Edge GPUs to avoid crashes caused by the collective performance model.
    pull/32226, pull/32229
  • Forward convolution with dilation and heuristic improvements: This pull request introduces support for forward convolution operations with dilation and implements a basic heuristic to differentiate between forward and backward convolutions, resulting in significant performance improvements across various dilation rates.
    pull/32231
  • Gloo build compatibility fix: This pull request updates Gloo to use a specific commit that fixes build compatibility issues with GCC 15.
    pull/32240

3.2 Closed Pull Requests

This section provides a summary of pull requests that were closed in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.

Pull Requests Closed This Week: 6

Key Closed Pull Requests

1. Execute rbe tests locally: This pull request proposes switching the execution of remote build execution (RBE) tests to run locally because the current RBE solution is not yet ready to execute tests remotely.

  • URL: pull/32049
  • Merged: No
  • Associated Commits: 13d28, a7bb0, 08c9b, 9b74a, b03cd, c62e4, 53126, 4667e, ae2d3, d3f94, d0c29, 5be95, 497cf, 25316, 5b460, 109e1, 2e04d, 76eb7, 28f10, 6484d, ea4cd, 5c042, 55a8c, 7df1e, 7566a, 84d14, d13b3, 7bf45, 6ac14, de95c, 74854, f88a7, 1add4, 0cb54, 8fc19, b854d, b1f3e, 74101, 2fe5c, e8112, 68b4b, e03a8, 32eaf, c165e, 32e0c, d8c44, 5e7b4, cf65a, 60eb5, 7b708, 6fa7f, 3ed77, 10f52, 375a9, 6a540, 44f7d, 97dd5, 28b2d, 5af59, 07dce, 87e78, deadc, 2e279, f814b, 510ea, fa40e, fc9e3, 50860, efa9d, c424a, fb6dd, ff879, 8526f, e53f8, ea6de, d63b3, 3c534, 85114, b2a42, 683db, 51a7f, 13c3d, 6d8c7, 1851b, 8513f, 9c9ad, 7775d, d0ac0, f3e17, 9cfa7, 0015d, 4be9c, c8154, 4d3e0, cdfdf, 3b077, 85548, 1a0db, fe042, edab8, 09aee, fb048, 59501, 5597c, 9d358, 910f1, d65c2, ff74b, 0be8d, 67988, e36e9

2. [ROCm] make toolchain hermetic compatible with rbe for rocm CI: This pull request aims to make the hipcc toolchain hermetic and compatible with remote build execution (rbe) workflows for ROCm continuous integration, enabling support for distributed builds by using local toolchain files and relative paths.

  • URL: pull/32139
  • Merged: No
  • Associated Commits: c84a1, fa951, eb5a4

3. [XLA:CPU] Make rendezvous timeouts configurable via flags: This pull request proposes adding configurable flags to set rendezvous timeouts and warning delays for parallel CPU workloads in XLA, aligning default timeout values with those used for GPUs to better accommodate longer and unevenly distributed tasks.

  • URL: pull/32115
  • Merged: No
  • Associated Commits: baf8d, e6f28

Other Closed Pull Requests

  • ROCm Build Fixes: Multiple pull requests address build errors and linking issues specific to the ROCm platform. These changes ensure successful compilation and execution of XLA components on ROCm environments by fixing problems related to cupti_tracer unavailability and multihost_hlo_runner build errors.
    pull/32009, pull/31990
  • Flexible Configuration Size in Custom Call Tests: One pull request removes the hardcoded configuration size in the GetSupportedConfigsFromCublasCustomCall function. This update allows the test to support different environment sizes, such as Spark's size of 9, by requiring the size to be at least 2 instead of a fixed value.
    pull/32185

3.3 Pull Request Discussion Insights

This section analyzes the tone and sentiment of discussions within this project's open and closed pull requests that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.

Based on our analysis, there are no instances of toxic discussions in the project's open or closed pull requests from the past week.


IV. Contributors

4.1 Contributors

Active Contributors:

We consider an active contributor in this project to be any contributor who has made at least 1 commit, opened at least 1 issue, created at least 1 pull request, or made more than 2 comments in the last month.

If there are more than 10 active contributors, the list is truncated to the top 10 based on contribution metrics for better clarity.

Contributor          Commits  Pull Requests  Issues  Comments
meteorcloudy         28       3              0       2
othakkar             15       6              0       9
alekstheod           21       4              0       0
amd-songpiao         7        4              1       2
terryysun            11       1              0       0
athurdekoos          9        2              0       0
sergachev            6        3              0       1
draganmladjenovic    5        3              0       2
ScXfjiang            7        2              0       0
rao-ashish           3        3              0       2
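
As a rough sketch, the "active contributor" definition given above reduces to a single predicate over the four columns of the table. The `Contributor` type, the `is_active` name, and the `hypothetical-commenter` row are assumptions made for illustration. The three real rows are copied from the table. The ranking used to truncate the list to 10 is not specified in the report, so it is not modeled.

```python
from typing import NamedTuple


class Contributor(NamedTuple):
    name: str
    commits: int
    pull_requests: int
    issues: int
    comments: int


def is_active(c: Contributor) -> bool:
    """At least 1 commit, 1 issue, or 1 pull request, or more than 2 comments."""
    return (
        c.commits >= 1
        or c.issues >= 1
        or c.pull_requests >= 1
        or c.comments > 2
    )


# Rows copied from the table above.
rows = [
    Contributor("meteorcloudy", 28, 3, 0, 2),
    Contributor("othakkar", 15, 6, 0, 9),
    Contributor("rao-ashish", 3, 3, 0, 2),
]

# Hypothetical contributor with only two comments: below the comment threshold.
commenter = Contributor("hypothetical-commenter", 0, 0, 0, 2)

print(all(is_active(c) for c in rows))
print(is_active(commenter))
```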
