Weekly Project News

Archives

Weekly GitHub Report for XLA: February 15, 2026 - February 22, 2026 (14:50:47)

Weekly GitHub Report for XLA

Thank you for subscribing to our weekly newsletter! Each week, we deliver a comprehensive summary of your GitHub project's latest activity right to your inbox, including an overview of your project's issues, pull requests, contributors, and commit activity.


Table of Contents

  • I. News
    • 1.1. Recent Version Releases
    • 1.2. Other Noteworthy Updates
  • II. Issues
    • 2.1. Top 5 Active Issues
    • 2.2. Top 5 Stale Issues
    • 2.3. Open Issues
    • 2.4. Closed Issues
    • 2.5. Issue Discussion Insights
  • III. Pull Requests
    • 3.1. Open Pull Requests
    • 3.2. Closed Pull Requests
    • 3.3. Pull Request Discussion Insights
  • IV. Contributors
    • 4.1. Contributors

I. News

1.1 Recent Version Releases:

No recent version releases were found.

1.2 Other Noteworthy Updates:

No other noteworthy updates were found this week.

II. Issues

2.1 Top 5 Active Issues:

We consider active issues to be issues that have been commented on most frequently within the last week. Bot comments are omitted.

As of our latest update, there are no active issues with ongoing comments this week.

2.2 Top 5 Stale Issues:

We consider stale issues to be issues that have had no activity within the last 30 days. The team should work together to get these issues resolved and closed as soon as possible.

As of our latest update, there are no stale issues for the project this week.

2.3 Open Issues

This section lists, groups, and then summarizes issues that were created within the last week in the repository.

Issues Opened This Week: 1

Summarized Issues:

  • GPU hang with warp specialization: A GPU hang bug occurs when running a Triton kernel with warp specialization enabled on an RTX5090, specifically triggered by a loop upper-bound of 2 combined with the warp specialization attribute. The hang does not occur if either the loop upper-bound or the warp specialization attribute is changed or removed, indicating a precise condition for the issue.
  • issues/38082

2.4 Closed Issues

This section lists, groups, and then summarizes issues that were closed within the last week in the repository. This section also links the associated pull requests if applicable.

Issues Closed This Week: 2

Summarized Issues:

  • Segmentation Faults and Crashes: Multiple JAX tests are experiencing segmentation faults linked to a specific pull request, with crashes occurring due to FFI type handling issues in the GPU transpose plan cache and execution state destructors. These faults disrupt the stability of GPU operations and complicate debugging efforts.
  • issues/37752
  • GPU Autotuning Cache Persistence: The xla_gpu_per_fusion_autotune_cache_dir option fails to persist autotune computations, resulting in non-deterministic compilation outcomes compared to the file-based alternative. This inconsistency undermines reproducibility in GPU autotuning processes.
  • issues/37902

2.5 Issue Discussion Insights

This section analyzes the tone and sentiment of discussions within this project's open and closed issues from the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.

Based on our analysis, there are no instances of toxic discussions in the project's open or closed issues from the past week.


III. Pull Requests

3.1 Open Pull Requests

This section provides a summary of pull requests that were opened in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.

Pull Requests Opened This Week: 12

Key Open Pull Requests

1. Upgrade to bazel 8 and turn on Bzlmod by default: This pull request upgrades the project to Bazel 8, enables Bzlmod by default, and includes various fixes and updates to support the new build system and toolchains.

  • URL: pull/37923
  • Associated Commits: 38db4, 062ce, 064fa, a8e5e, a345d, aee69, fbdc7, e44ea, 6a62e, 7b472, 774d6, 9aec2, 9a98c
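A Bzlmod migration like the one described in this pull request generally centers on declaring external dependencies in a MODULE.bazel file instead of WORKSPACE repository rules. The fragment below is a generic illustration of that shape only; the module name and dependency versions are invented, not taken from the actual PR:

```starlark
# Generic MODULE.bazel sketch: with Bzlmod enabled (the default in
# Bazel 8), external dependencies become bazel_dep entries resolved
# from a registry. All names and versions here are illustrative.
module(
    name = "my_project",
    version = "0.1.0",
)

bazel_dep(name = "rules_cc", version = "0.1.1")
bazel_dep(name = "platforms", version = "0.0.11")
```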

2. [ROCm] Support hipblaslt group-gemm: This pull request adds support for HipBlasLT-Ext GroupedGemm in the RaggedDot operation for matrices with and without batch dimensions, implements three ragged modes, extends autotuner capabilities for group-gemm configurations, and includes performance improvements and unit tests to validate correctness and efficiency on AMD ROCm hardware.

  • URL: pull/38088
  • Associated Commits: 1771c, 85916, 01d07, 29249

3. [xla:gpu] Use Command::Walk APIs to collect buffer uses and command properties: This pull request updates the XLA GPU backend to use the Command::Walk APIs for collecting buffer uses and command properties, ensuring consistent semantics for buffer_uses in preparation for unifying the Command and Thunk components.

  • URL: pull/37903
  • Associated Commits: de2b0

Other Open Pull Requests

  • BoringSSL library update: This pull request updates the BoringSSL library to the latest release version 0.20260211 to resolve build issues caused by the previously obsolete version. The update addresses problems reported by the ArchLinux community.
    pull/37924
  • GPU executable optimization and clique cache fixes: These pull requests optimize GPU executable performance by reducing rendezvous operations and fix deadlock issues in the GPU clique cache by evicting sibling sub-cliques when a parent communicator mismatch occurs. Together, they improve synchronization efficiency and maintain symmetric cache state for proper collective participation.
    pull/37936, pull/38105
  • ROCm autotune search space correction: This pull request fixes invalid split_k and block_k configurations in the ROCm autotune search space by correcting the maximum split_k calculation and skipping block_k values that are too large. The fix addresses issues observed on AMD GPUs in CPX mode and is verified with added regression tests.
    pull/37992
  • GitHub Actions workflow refactor: This automated pull request refactors the project's GitHub Actions workflow to comply with the latest internal standards specified in guideline b/485167538. The upgrade is managed by the GHSS team to ensure continued compliance and maintainability.
    pull/38001
  • XLA PJRT GPU backend network topology update: This pull request updates the XLA PJRT GPU backend to pass network nodes to the LocalTopologyProto, enabling the coordinator process to perform network-topology-optimized global device assignment. This enhancement improves device assignment efficiency based on network topology.
    pull/38009
  • XLA GPU output memory space fix: This pull request updates the collective_ops_ffi_test in the XLA GPU code to ensure output results are returned in the default device memory space rather than the collective memory space. This change aligns the output behavior with user-facing expectations.
    pull/38107
  • Host runtime debuggability improvements: This pull request enhances the debuggability of the host runtime in the XLA:GPU project by adding TraceMe annotations compatible with XLA_LOG_DEVICE in both the CommonPjRtClient and the XLA:GPU-specific StreamExecutor client. These annotations embed crucial metadata into each trace for better runtime analysis.
    pull/38110
  • AsyncWorkRunner API unification: This pull request unifies the AsyncWorkRunner API with the existing tsl::Executor by migrating PJRT to use the standard API. This enables a single implementation of ExecuteWhenReady that leverages the common RunWhenReady method, improving consistency and facilitating future cleanup.
    pull/38135
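The AsyncWorkRunner unification above hinges on expressing ExecuteWhenReady in terms of a single generic run-when-ready primitive. The sketch below illustrates that general pattern with hypothetical names; it is not the actual tsl::Executor or PJRT code:

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical sketch of the run-when-ready pattern: a callback
// fires exactly once, after all of its dependencies have signalled
// readiness. The class name and API are illustrative only.
class ReadyBarrier {
 public:
  ReadyBarrier(int count, std::function<void()> callback)
      : pending_(count), callback_(std::move(callback)) {
    if (count == 0) callback_();  // No dependencies: run immediately.
  }

  // Called by each dependency as it becomes ready; the last caller
  // runs the callback on its own thread.
  void Notify() {
    if (pending_.fetch_sub(1) == 1) callback_();
  }

 private:
  std::atomic<int> pending_;
  std::function<void()> callback_;
};
```

With one shared primitive like this, an ExecuteWhenReady-style helper reduces to constructing the barrier and registering Notify on each pending input, which is the consolidation the pull request describes.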

3.2 Closed Pull Requests

This section provides a summary of pull requests that were closed in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.

Pull Requests Closed This Week: 49

Key Closed Pull Requests

1. [xla:gpu] Add a HangWatchdog to detect deadlocks in XLA:GPU execution: This pull request introduces a HangWatchdog safety mechanism to detect and abort deadlocked XLA:GPU executions, including additional checks, fixes for race conditions, and stress tests to ensure robustness.

  • URL: pull/37789
  • Associated Commits: 75c9a, 6da6d, 96f17, 711d9

2. [XLA:GPU][oneAPI] Fix platform error in stream executor tests with SYCL backend: This pull request addresses and fixes the platform error encountered in stream executor tests when using the SYCL backend by ensuring the TENSORFLOW_USE_SYCL macro is properly applied in the hermetic build environment, thereby resolving the issue of the missing registered platform named "cuda" during test execution.

  • URL: pull/37235
  • Associated Commits: d18e8, 202ee, 32cc6

3. [ROCm] Add option to disable automatic solib and rocm rpaths adding to the final library: This pull request introduces a new toolchain feature called no_solib that allows disabling the automatic addition of solib and ROCm runtime library rpaths to the final library, which is necessary for building ROCm JAX plugins with custom rpath settings for the wheels.

  • URL: pull/37513
  • Associated Commits: d87f5, a36a5, 2d35d

Other Closed Pull Requests

  • GitHub Actions automated refactor: Multiple pull requests propose an automated refactor of the project's GitHub Actions workflows to align with the latest standards specified in the internal guideline b/485167538. These changes aim to facilitate an upgrade process that may be force merged by the GHSS team if not accepted voluntarily, ensuring standardized and up-to-date workflow configurations.
  • [pull/37944, pull/37945, pull/37946, pull/37948, pull/37949, pull/37951, pull/37952, pull/37953, pull/37955, pull/37956, pull/37957]
  • ROCm platform improvements and fixes: Several pull requests enhance ROCm support by suppressing redundant CUDA-related error messages, enabling native FP8 Triton-generated matrix multiplications, improving register spill information propagation for autotuning, fixing compilation pipeline pass nesting, and adding ROCm-specific device configurations to tests. These changes collectively improve ROCm compatibility, performance, and test coverage while resolving related bugs and warnings.
  • [pull/36882, pull/37573, pull/37629, pull/37732, pull/37854]
  • Collective operations and device communication enhancements: Pull requests address issues in collective pipeline parallelism tests, increase the maximum device communication buffer size to prevent future compilation failures, and optimize global device ID assignment by incorporating network topology information. These updates improve correctness, scalability, and efficiency of collective operations on GPU platforms.
  • [pull/36700, pull/37876, pull/37906]
  • Code and build process improvements: One pull request introduces building and using the xxd tool from source to remove a non-hermetic, non-POSIX dependency in the PJRT plugins build process, enabling fully remote and hermetic builds on a vanilla Debian Docker image. Another pull request improves logging readability in the xla:gpu HangWatchdog component without functional changes.
  • [pull/37816, pull/37876]
  • DynamicMemcpyFusion and FFI handler state management: A pull request adds support for sub-byte types in DynamicMemcpyFusion on XLA:GPU by enabling memcpy operations with byte-aligned strides and includes tests for validation. Another pull request adds functionality to set and get state in all execution stages of the FFI handler, supporting both shared per-instance and per-execution state lifetimes.
  • [pull/37769, pull/37783]
  • New container implementation: A pull request introduces tsl::UniqueAny, an std::any-like container designed to support move-only types, accompanied by benchmark results demonstrating its performance.
  • [pull/37821]
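The pull request above describes tsl::UniqueAny only at a high level. As a rough illustration of the underlying idea, here is a minimal, hypothetical move-only any-like container built on type erasure; it is not the actual tsl::UniqueAny code, and the class name is invented:

```cpp
#include <memory>
#include <type_traits>
#include <typeinfo>
#include <utility>

// Hypothetical sketch of a move-only, std::any-like container.
// std::any requires copyable payloads; disabling copies here lets it
// hold move-only types such as std::unique_ptr.
class UniqueAnySketch {
 public:
  UniqueAnySketch() = default;

  template <typename T,
            typename = std::enable_if_t<
                !std::is_same_v<std::decay_t<T>, UniqueAnySketch>>>
  explicit UniqueAnySketch(T&& value)
      : holder_(std::make_unique<Holder<std::decay_t<T>>>(
            std::forward<T>(value))) {}

  UniqueAnySketch(UniqueAnySketch&&) = default;
  UniqueAnySketch& operator=(UniqueAnySketch&&) = default;
  UniqueAnySketch(const UniqueAnySketch&) = delete;
  UniqueAnySketch& operator=(const UniqueAnySketch&) = delete;

  bool has_value() const { return holder_ != nullptr; }

  // Returns a pointer to the stored value if it has type T, else null.
  template <typename T>
  T* get() {
    if (holder_ && holder_->type() == typeid(T)) {
      return &static_cast<Holder<T>*>(holder_.get())->value;
    }
    return nullptr;
  }

 private:
  struct HolderBase {
    virtual ~HolderBase() = default;
    virtual const std::type_info& type() const = 0;
  };
  template <typename T>
  struct Holder : HolderBase {
    explicit Holder(T v) : value(std::move(v)) {}
    const std::type_info& type() const override { return typeid(T); }
    T value;
  };
  std::unique_ptr<HolderBase> holder_;
};
```

The single-pointer layout keeps moves cheap; the real implementation may differ (for example, with a small-object optimization), which the PR's benchmarks would be measuring.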

3.3 Pull Request Discussion Insights

This section analyzes the tone and sentiment of discussions within this project's open and closed pull requests from the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.

Based on our analysis, there are no instances of toxic discussions in the project's open or closed pull requests from the past week.


IV. Contributors

4.1 Contributors

Active Contributors:

We consider an active contributor in this project to be any contributor who has made at least 1 commit, opened at least 1 issue, created at least 1 pull request, or made more than 2 comments in the last month.

If there are more than 10 active contributors, the list is truncated to the top 10 based on contribution metrics for better clarity.

Contributor         Commits  Pull Requests  Issues  Comments
ezhulenev                49             10       0         0
alekstheod               48              4       0         0
benknutson-google        29              0       0         0
google-admin              0             29       0         0
meteorcloudy             13              1       0         0
leo-amd                  12              1       0         0
nurmukhametov             7              3       0         0
mfrancepillois            6              2       0         0
terryysun                 5              2       0         0

Access Last Week's Newsletter:

  • Link