Weekly Project News


Weekly GitHub Report for Xla: September 15, 2025 - September 22, 2025 (12:02:31)


Thank you for subscribing to our weekly newsletter! Each week, we deliver a comprehensive summary of your GitHub project's latest activity right to your inbox, including an overview of your project's issues, pull requests, contributors, and commit activity.


Table of Contents

  • I. News
    • 1.1. Recent Version Releases
    • 1.2. Other Noteworthy Updates
  • II. Issues
    • 2.1. Top 5 Active Issues
    • 2.2. Top 5 Stale Issues
    • 2.3. Open Issues
    • 2.4. Closed Issues
    • 2.5. Issue Discussion Insights
  • III. Pull Requests
    • 3.1. Open Pull Requests
    • 3.2. Closed Pull Requests
    • 3.3. Pull Request Discussion Insights
  • IV. Contributors
    • 4.1. Contributors

I. News

1.1 Recent Version Releases:

No recent version releases were found.

1.2 Other Noteworthy Updates:

No other noteworthy updates were found this week.

II. Issues

2.1 Top 5 Active Issues:

We consider active issues to be the issues that have been commented on most frequently within the last week. Bot comments are omitted.

  1. No SoL config found for device NVIDIA H200.: This issue reports warning messages appearing after upgrading to jax version 0.7.2, indicating that no Speed-of-Light (SoL) configuration is found for the NVIDIA H200 device, causing XLA to use a default configuration. The user inquires whether it is normal for XLA to lack this specific config and seeks guidance on how to handle or customize the warnings related to the analytical GPU cost/latency model.

    • The maintainers clarify that the warning is expected because internal data for the H200 device is unavailable, so the system falls back to H100 parameters, which may slightly underestimate networking speed but is generally acceptable. They suggest lowering the warning level, disabling the new cost model, or tuning the NIC speed via environment variables, and the user follows up asking for the appropriate NIC speed value to set for an 8-GPU H200 node connected through NVLink at 900 Gbps.
    • Number of comments this week: 2

Since there were fewer than 5 open issues, all of the open issues have been listed above.
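
The logging workaround mentioned in that discussion can be sketched as follows. This is a minimal illustration assuming the standard TF_CPP_MIN_LOG_LEVEL control used by XLA/TensorFlow C++ logging; the thread does not name the exact variables for disabling the cost model or setting the NIC speed, so those are not reproduced here.

```python
import os

# TF_CPP_MIN_LOG_LEVEL is the standard XLA/TensorFlow C++ logging
# threshold: 0 = all messages, 1 = filter INFO, 2 = filter WARNING,
# 3 = filter ERROR. Setting it to "2" before importing jax should
# hide the SoL fallback warning.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"
```

Note that the variable must be set before the first import of jax (or before the process starts), since the C++ runtime reads it at initialization.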

2.2 Top 5 Stale Issues:

We consider stale issues to be issues that have had no activity within the last 30 days. The team should work together to get these issues resolved and closed as soon as possible.

  1. New nvshmem rule breaks the build: This issue reports a build failure caused by a new nvshmem rule introduced in a recent update, which leads to an error related to the absence of a getenv method in the repository_ctx object during the CUDA configuration step. The reporter is seeking guidance on whether they need to make changes on their side to resolve this problem or if it requires a fix within the open_xla project, specifically regarding the timing and details of addressing the cuda_configure rule.
  2. Failed to Parse MLIR generated by Torchax: This issue describes a problem encountered when exporting a PyTorch model to MLIR using the torch-xla torchax export API, where the generated MLIR fails to parse due to an unregistered operation 'vhlo.rsqrt_v2' in the VHLO dialect. The user is attempting to compile the exported MLIR into an XLA binary using XLA AOT compilation but faces deserialization errors with StableHLO, despite using compatible versions of torch, torchxla, and building XLA from the corresponding commit.
  3. support bazel modules: This issue requests the adoption of Bazel modules within the project, highlighting that Bazel modules have seen significant usage and support in the community. The reporter points out that XLA is currently the only package in their Bazel build that does not support Bazel modules, implying a need for compatibility improvements.
  4. Gpu collective performance model bug: This issue addresses a bug in the gpu_collective_performance model where the recent update to the lowLatencyBandwidth for AMD links was not consistently applied to the CUDA section, causing failures when the model is called with H100 settings. Specifically, the change multiplies the per_link bandwidth by eight for AMD but neglects to make a corresponding update for CUDA, leading to incorrect performance modeling in that context.

Since there were fewer than 5 stale issues, all of the stale issues have been listed above.

2.3 Open Issues

This section lists, groups, and then summarizes issues that were created within the last week in the repository.

Issues Opened This Week: 2

Summarized Issues:

  • Device Configuration Warnings: Users upgrading to jax version 0.7.2 encounter warning messages indicating that no Speed-of-Light (SoL) configuration is found for the NVIDIA H200 device in XLA, causing the system to fall back to default or H100 configurations. The issue seeks clarification on whether this fallback behavior is expected and how to properly configure or suppress these warnings.
  • issues/31481
  • Command-line Help Text Truncation: The help text output for XLA command-line options in certain tools is being truncated unexpectedly, which prevents users from seeing the full list of available options. It was identified that changing the logging method in the code can restore the complete output, resolving the truncation problem.
  • issues/31486

2.4 Closed Issues

This section lists, groups, and then summarizes issues that were closed within the last week in the repository. This section also links the associated pull requests if applicable.

Issues Closed This Week: 1

Summarized Issues:

  • ROCm Test Failures and Deadlocks: This issue involves a test failure and hanging problem on ROCm 7.0.2 during a distributed layer normalization MLP test on the TransformerEngine branch release_v2.1_rocm. The problem is caused by a CUDA error related to peer access already being enabled, resulting in a deadlock that cannot be interrupted by Ctrl-C, while the test passes on ROCm 6.4.1.
  • issues/31344

2.5 Issue Discussion Insights

This section will analyze the tone and sentiment of discussions within this project's open and closed issues that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.

Based on our analysis, there are no instances of toxic discussions in the project's open or closed issues from the past week.


III. Pull Requests

3.1 Open Pull Requests

This section provides a summary of pull requests that were opened in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.

Pull Requests Opened This Week: 13

Key Open Pull Requests

1. Support building with Bzlmod: This pull request adds support for building the XLA project using Bzlmod by introducing common Bazel dependencies as modules, incorporating some C++ dependencies with module extensions under third_party/extensions, and enabling most tests to pass with the bazel test --config=bzlmod //xla/tests:all command.

  • URL: pull/31477
  • Merged: No
  • Associated Commits: 3f94b, dba73, 2dfd2, 76b96, 2fdae

2. [DOC] operation_semantics HLO specifications update: This pull request updates the operation semantics HLO specifications by reorganizing operations alphabetically to align with StableHLO, adding new operations consistent with hlo_opcode.h, refining links to StableHLO variants and relevant headers, removing certain grouped sections in favor of individual headings with detailed semantics, and improving formatting, spelling, and table consistency for enhanced documentation clarity.

  • URL: pull/31550
  • Merged: No
  • Associated Commits: 6b53b, 1e7b6, ea263, fc298, 3427f

3. [XLA:GPU] Adding more debug support for command_buffer_conversion_pass: This pull request adds enhanced debug support to the command_buffer_conversion_pass in the XLA GPU backend to help identify which thunks are not lowered to command buffers and understand the reasons behind it.

  • URL: pull/31472
  • Merged: No
  • Associated Commits: 8f21d, e7c45, ff0bf

Other Open Pull Requests

  • ROCm support and fixes: Multiple pull requests improve ROCm platform support by fixing build issues, correcting hardcoded properties, removing unnecessary reserved memory allocations, and restoring hipblaslt support in GEMM autotuning. These changes enhance ROCm compatibility and maintainability, including updates to test scripts and error handling for successful test execution.
  • pull/31348, pull/31374, pull/31386, pull/31387, pull/31409, pull/31493
  • GPU backend and collective operations enhancements: Updates to the XLA GPU backend include adding NVLink domain connectivity checks for NVSHMEM backend usage and optimizing GPU collectives by replacing sub-byte type operations with signed 8-bit integer operations. These improvements reduce data movement and extend backend functionality to multi-device NVLink domains.
  • pull/31375, pull/31525
  • oneDNN Convolution support in XLA CPU runtime: This pull request enables oneDNN Convolution operations by updating the thunk emitter to generate OneDnnOpThunk and adding execution support for these operations. This integration allows the XLA:CPU runtime to leverage oneDNN custom call rewrites for convolution workloads.
  • pull/31511
  • Thread safety fix for Literal::Storage: A race condition is addressed by synchronizing access to the data_ field of Literal::Storage, preventing concurrent thread conflicts during data manipulation. This fix improves the stability and correctness of data handling in multi-threaded contexts.
  • pull/31533
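
The pattern behind that thread-safety fix can be sketched in Python. This is an illustrative analogue only (the actual change is to XLA's C++ Literal::Storage class): a single lock guards every read and write of the shared field, so no thread observes a half-finished mutation.

```python
import threading

class Storage:
    """Illustrative analogue of synchronizing access to a shared
    data_ field with a mutex (the real fix is in C++)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = []

    def append(self, value):
        # All mutation happens under the lock.
        with self._lock:
            self._data.append(value)

    def snapshot(self):
        # Reads take the same lock, so concurrent writers can
        # never race with a reader mid-update.
        with self._lock:
            return list(self._data)
```

In the C++ original the same idea is typically expressed with an absl::Mutex and lock guards around accesses to data_.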

3.2 Closed Pull Requests

This section provides a summary of pull requests that were closed in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.

Pull Requests Closed This Week: 6

Key Closed Pull Requests

1. [GPU] Move bitcast-convert expansion past layout assignment.: This pull request proposes moving the bitcast-convert expansion step to occur after layout assignment in GPU code, enabling the post-layout assignment algebraic simplifier to first replace no-op bitcast-converts with simpler bitcasts.

  • URL: pull/31340
  • Merged: No
  • Associated Commits: 4a757

2. [XLA:GPU] command buffer ChildCmd and WhileCmd support multiple device: This pull request addresses a bug fix by modifying the command buffer implementation to support multiple devices for ChildCmd and WhileCmd in the SPMD configuration, ensuring that the child_command_buffer_ object is maintained per device.

  • URL: pull/31341
  • Merged: No
  • Associated Commits: 915f2

3. [XLA:GPU] Add some VLOG debug prints for the summary results of command_buffer_scheduling pass: This pull request proposes adding VLOG debug print statements to provide summary results for the command_buffer_scheduling pass in the XLA:GPU component, aiming to enhance debugging without including unit tests or documentation changes.

  • URL: pull/31354
  • Merged: No
  • Associated Commits: acd13

Other Closed Pull Requests

  • Documentation updates for operation_semantics: This pull request enhances the documentation of operation_semantics by updating tables and signatures for multiple operations including AllGather, AllReduce, and others. It adds detailed information such as shard count, layout, use_global_device_ids, and links to the StableHLO specification to improve clarity and completeness.
  • pull/31366
  • Layout assignment for bitcast-convert operands: This pull request fixes layout assignment issues by ensuring consistent layout between input and output for bitcast-convert operands, especially when type width conversions add or remove a dimension without transposing preserved dimensions. It addresses a previously unhandled case in output to input layout propagation.
  • pull/31395
  • Suppress NVML library loading error message: This pull request improves user experience by suppressing the default NVML library loading error message when not running on MNNVL nodes. It also adds actionable suggestions to help users resolve the error, reducing confusion.
  • pull/31457

3.3 Pull Request Discussion Insights

This section will analyze the tone and sentiment of discussions within this project's open and closed pull requests that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.

Based on our analysis, there are no instances of toxic discussions in the project's open or closed pull requests from the past week.


IV. Contributors

4.1 Contributors

Active Contributors:

We consider an active contributor in this project to be any contributor who has made at least 1 commit, opened at least 1 issue, created at least 1 pull request, or made more than 2 comments in the last month.

If there are more than 10 active contributors, the list is truncated to the top 10 based on contribution metrics for better clarity.

Contributor        Commits  Pull Requests  Issues  Comments
shawnwang18        21       4              0       3
othakkar           14       4              0       7
athurdekoos        18       3              0       0
sergachev          13       5              0       1
mraunak            8        1              0       0
sergey-kozub       6        2              0       0
penpornk           1        0              0       7
sfvaroglu          4        3              0       0
mgoldfarb-nvidia   7        0              0       0
ScXfjiang          4        3              0       0
