Weekly Project News

Archives
November 24, 2025

Weekly GitHub Report for XLA: November 17, 2025 - November 24, 2025 (12:01:38)

Weekly GitHub Report for XLA

Thank you for subscribing to our weekly newsletter! Each week, we deliver a comprehensive summary of your GitHub project's latest activity right to your inbox, including an overview of your project's issues, pull requests, contributors, and commit activity.


Table of Contents

  • I. News
    • 1.1. Recent Version Releases
    • 1.2. Other Noteworthy Updates
  • II. Issues
    • 2.1. Top 5 Active Issues
    • 2.2. Top 5 Stale Issues
    • 2.3. Open Issues
    • 2.4. Closed Issues
    • 2.5. Issue Discussion Insights
  • III. Pull Requests
    • 3.1. Open Pull Requests
    • 3.2. Closed Pull Requests
    • 3.3. Pull Request Discussion Insights
  • IV. Contributors
    • 4.1. Contributors

I. News

1.1 Recent Version Releases:

No recent version releases were found.

1.2 Other Noteworthy Updates:

No other noteworthy updates were reported this week.

II. Issues

2.1 Top 5 Active Issues:

We consider active issues to be issues that have been commented on most frequently within the last week. Bot comments are omitted.

As of our latest update, there are no active issues with ongoing comments this week.

2.2 Top 5 Stale Issues:

We consider stale issues to be issues that have had no activity within the last 30 days. The team should work together to get these issues resolved and closed as soon as possible.

  1. New nvshmem rule breaks the build: This issue reports a build failure caused by a new nvshmem rule introduced in a recent update, which leads to an error related to the absence of a getenv method in the repository_ctx object during the CUDA configuration step. The reporter is seeking guidance on whether any changes are needed on their side to resolve this problem, particularly in relation to recent pull requests affecting JAX, and is also inquiring about the timeline for a fix from the OpenXLA project if the issue originates there.
  2. Failed to Parse MLIR generated by Torchax: This issue describes a problem encountered when exporting a PyTorch model to MLIR using the torch-xla torchax export API, where the generated MLIR fails to parse due to an unregistered operation 'vhlo.rsqrt_v2' in the VHLO dialect. The user is attempting to compile the exported MLIR into an XLA binary using XLA AOT compilation but faces deserialization errors with StableHLO, despite using compatible versions of torch, torchxla, and building XLA from the corresponding commit.
  3. support bazel modules: This issue requests the adoption of Bazel modules within the project, highlighting that Bazel modules have gained significant usage and support. The reporter points out that XLA is currently the only package in their Bazel build that does not support these modules, implying a need for compatibility improvements.
  4. Gpu collective performance model bug: This issue addresses a bug in the gpu_collective_performance model where the recent update to lowLatencyBandwidth for AMD links was not applied to the CUDA section, causing failures when the model is called with H100 settings. Specifically, the inconsistency in bandwidth configuration leads to errors in performance modeling for GPU collectives on CUDA-enabled devices.
  5. Cross compile to ARM with custom gcc: This issue concerns difficulties encountered when attempting to cross-compile the XLA project from an x86 architecture to ARM64 using a custom GCC compiler. The user reports that despite using the --config=cross_compile_linux_arm64 flag, the Bazel build system continues to produce an x86 binary, indicating a possible misconfiguration or missing step in the cross-compilation process.
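For context on item 3, adopting Bazel modules means declaring dependencies in a MODULE.bazel file instead of WORKSPACE macros. The sketch below is purely illustrative of that mechanism; the module name and dependency versions are hypothetical and not taken from the issue.

```starlark
# MODULE.bazel -- hypothetical sketch of Bazel module adoption.
# A downstream project declares itself and its dependencies as modules;
# a package that does not publish module metadata cannot be pulled in
# this way, which is the gap the issue describes.
module(
    name = "my_project",
    version = "0.1.0",
)

# Dependencies resolved from a Bazel registry (names/versions illustrative).
bazel_dep(name = "rules_cc", version = "0.0.9")
```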

2.3 Open Issues

This section lists, groups, and then summarizes issues that were created within the last week in the repository.

Issues Opened This Week: 2

Summarized Issues:

  • XLA Backend Bugs: This topic covers issues related to incorrect behavior in the XLA CPU backend, including a bug where constant-folding involving gather and add operations changes the memory layout from row-major to column-major, leading to unexpected results in JAX's compiled functions. These bugs affect the correctness of computations and the expected memory layout during execution.
  • issues/34260
  • Performance Issues in JAX Convolution: This topic addresses significant performance degradation in JAX's convolution operation, which is reported to be over 30 times slower than PyTorch due to XLA selecting a suboptimal convolution implementation. The issue includes detailed reproduction code and benchmarks highlighting the performance discrepancy.
  • issues/34273

2.4 Closed Issues

This section lists, groups, and then summarizes issues that were closed within the last week in the repository. This section also links the associated pull requests if applicable.

Issues Closed This Week: 0

Summarized Issues:

As of our latest update, there were no issues closed in the project this week.

2.5 Issue Discussion Insights

This section analyzes the tone and sentiment of discussions within this project's open and closed issues that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.

Based on our analysis, there are no instances of toxic discussions in the project's open or closed issues from the past week.


III. Pull Requests

3.1 Open Pull Requests

This section provides a summary of pull requests that were opened in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.

Pull Requests Opened This Week: 20

Key Open Pull Requests

1. [ROCm] Add support for rocm tar/wheels in hermetic builds: This pull request adds support for Python wheels as a hermetic ROCm dependency in hermetic builds to enable JAX to match the wheels when setting up its dependencies.

  • URL: pull/34049
  • Merged: No
  • Associated Commits: a568e, 66b21, ec2a8, 378ba, c42ca, a8671, 898ea, 3b3f2, 882c2, 21fdb, 1dbd2, f44de, b78af, 45023, 47489, 27807, 6ed67

2. Terryysun/a2a s curve: This pull request adds all-to-all communication support to the S-curve model, resulting in an 11.78% performance improvement for models with cross-NVL domain all-to-all, and includes new unit and execution tests to ensure accuracy and proper communication-compute overlap.

  • URL: pull/34143
  • Merged: No
  • Associated Commits: 794ef, 4f85d, 1dc94, 38d02, 2329f, 13d44

3. [ROCm] Include multigpu tests: This pull request aims to include multigpu tests in the ROCm continuous integration command to enhance testing coverage for multi-GPU setups.

  • URL: pull/34112
  • Merged: No
  • Associated Commits: 39c0a, 3c23b

Other Open Pull Requests

  • GPU backend improvements and fixes: Multiple pull requests enhance GPU backend functionality by introducing SYCL stream support for the XLA GPU oneAPI backend and fixing AMD GPU stack allocation errors through correct address space usage and optimization of allocas. Additionally, a deadlock issue in NVIDIA GPU communication is resolved by adjusting communicator initialization behavior to prevent hangs during collective operations.
    pull/34137, pull/34196, pull/34215
  • cuDNN and CUDA related enhancements: Several pull requests upgrade cuDNN support by enabling fusion for GEMM operations with double-precision and complex types, adding runtime version checks, and proposing an upgrade to cuDNN frontend version 1.16.0. A new C API plugin attribute cuda_version is introduced to query the CUDA runtime version, and a workaround for convolution graphs in cuDNN is removed to improve performance.
    pull/34035, pull/34047, pull/34110, pull/34227
  • Build and test environment fixes: Fixes include adding a missing rocm_config dependency to the Bazel build configuration to resolve build errors and modifying the lit test execution environment to include runfiles, ensuring ROCm libraries are available during testing and preventing gpu_test_correctness failures.
    pull/34156, pull/34150
  • Kernel and performance optimizations: A pull request introduces grid-stride loops to buffer_comparator and redzone checker kernels to fix kernel launch failures on large inputs, removes ROCm-specific hacks, and addresses an ASAN issue. Another PR renames warp to shmem_group in PackedTranspose to fix AMD GPU performance regressions by recalculating thread counts and updating tests accordingly.
    pull/34247, pull/34173
  • Bug fixes in memory and tiling logic: Fixes include correcting an ASAN heap-buffer-overflow error on ROCm caused by invalid literal copying with dynamic shapes and resolving a bug in the TryFindBestTilingForFusion function by skipping tiles with infinite runtime to avoid suboptimal tile size selection and register spilling.
    pull/34246, pull/34250
  • Host offload utility enhancements: Two new utility functions are added to host_offload_utils to detect dynamic slice operations in host offload patterns, specifically identifying MoveToHost feeding DynamicUpdateSlice and MoveToDevice consuming DynamicSlice, accompanied by unit tests.
    pull/34118
  • cuDNN GEMM backend update: The cuDNN GEMM backend is updated to recognize and handle dot algorithms, which previously caused failures when non-default algorithms were requested, with unit tests added to verify this behavior.
    pull/34163
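The grid-stride loop pattern mentioned in the kernel-optimization item above lets a fixed-size kernel launch cover arbitrarily large inputs: each thread starts at its global index and advances by the total thread count. The following host-side Python simulation sketches that indexing scheme; all names and sizes are illustrative and not taken from the PR.

```python
# Host-side sketch of grid-stride indexing: a "launch" of
# grid_dim * block_dim threads covers n > grid_dim * block_dim elements,
# each element exactly once.
def grid_stride_indices(block, thread, grid_dim, block_dim, n):
    start = block * block_dim + thread   # this thread's global index
    stride = grid_dim * block_dim        # total threads in the launch
    return list(range(start, n, stride))

n, grid_dim, block_dim = 1000, 4, 64     # n exceeds the 256-thread launch
touched = [0] * n
for block in range(grid_dim):
    for thread in range(block_dim):
        for i in grid_stride_indices(block, thread, grid_dim, block_dim, n):
            touched[i] += 1

assert all(c == 1 for c in touched)      # full coverage, no duplicates
print("covered", n, "elements with", grid_dim * block_dim, "threads")
```

This is why the pattern fixes launch failures on large inputs: the launch configuration no longer has to scale with the input size.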

3.2 Closed Pull Requests

This section provides a summary of pull requests that were closed in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.

Pull Requests Closed This Week: 5

Key Closed Pull Requests

1. Fix comments for xla/hlo/ir/hlo_input_output_alias_config.h: This pull request aims to correct a typo in the comments for the xla::HloInputOutputAliasConfig::AliasKind within the xla/hlo/ir/hlo_input_output_alias_config.h file to improve documentation clarity.

  • URL: pull/34042
  • Merged: No
  • Associated Commits: 0193b

2. [GPU] Fix layout assignment of bitcast-converts.: This pull request addresses a bug fix for GPU by ensuring that "mandatory" compatible layouts are assigned simultaneously to both operands and outputs in bitcast-convert operations to prevent invalid layout changes during subsequent layout propagation.

  • URL: pull/34103
  • Merged: No
  • Associated Commits: f0ff6

3. [XLA:GPU] Fix cublas fallback test on Thor GPU (sm_110): This pull request aims to fix the failing StatelessAutotunerTest.CublasFallbackForBf16Bf16F32Algorithm test on Jetson Thor GPUs with compute capability 11.0.

  • URL: pull/34107
  • Merged: No
  • Associated Commits: 2385b

Other Closed Pull Requests

  • Device Specification Additions: This topic covers pull requests that propose adding new device specifications to the project. One such pull request introduces a specification for the Blackwell Ultra (B300) device, although it has not been merged.
  • pull/34116
  • Continuous Integration Enhancements: This topic includes pull requests aimed at improving the CI process through GitHub actions. A notable pull request proposes a main GitHub action to be used as a continuous integration gate, but it was not merged.
  • pull/34157

3.3 Pull Request Discussion Insights

This section analyzes the tone and sentiment of discussions within this project's open and closed pull requests that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.

Based on our analysis, there are no instances of toxic discussions in the project's open or closed pull requests from the past week.


IV. Contributors

4.1 Contributors

Active Contributors:

We consider an active contributor in this project to be any contributor who has made at least 1 commit, opened at least 1 issue, created at least 1 pull request, or made more than 2 comments in the last month.

If there are more than 10 active contributors, the list is truncated to the top 10 based on contribution metrics for better clarity.

Contributor      Commits   Pull Requests   Issues   Comments
alekstheod       55        14              0        9
rao-ashish       2         1               0        17
sergachev        9         5               0        2
emilyfertig      0         0               0        12
mingxu1067       6         2               0        3
Copilot          0         0               0        11
shawnwang18      8         2               0        0
Tixxx            3         2               0        4
dimitar-asenov   0         0               0        8
dimvar           4         3               0        0

Don't miss what's next. Subscribe to Weekly Project News:
Powered by Buttondown, the easiest way to start and grow your newsletter.