Weekly GitHub Report for Xla: February 16, 2026 - February 23, 2026 (17:34:20)
Thank you for subscribing to our weekly newsletter! Each week, we deliver a comprehensive summary of your GitHub project's latest activity right to your inbox, including an overview of your project's issues, pull requests, contributors, and commit activity.
Table of Contents
I. News
1.1 Recent Version Releases:
No recent version releases were found.
1.2 Version Information:
No version information is available to summarize this week.
II. Issues
2.1 Top 5 Active Issues:
We consider active issues to be issues that have been commented on most frequently within the last week. Bot comments are omitted.
- [GPU] [STAT:AWAITING RESPONSE FROM CONTRIBUTOR] GPU hang with Triton kernel using Warp Specialization: This issue describes a GPU hang occurring when running a Triton kernel that uses warp specialization, specifically triggered by a loop with an upper bound of 2 in the kernel code. The user provides a minimal reproducible example and notes that the hang disappears if the loop upper bound is reduced to 1 or if the warp specialization attribute is removed, highlighting a potential bug in the Triton kernel implementation on the GPU backend.
- The comments include a request for reproduction steps, which the user provides in detail, outlining how to clone the repository, checkout the relevant branch, run a Docker container, configure the build, and execute the test that triggers the hang, confirming the issue is reproducible but the test never completes.
- Number of comments this week: 2
Since there were fewer than 5 open issues, all of the open issues have been listed above.
2.2 Top 5 Stale Issues:
We consider stale issues to be issues that have had no activity within the last 30 days. The team should work together to get these issues resolved and closed as soon as possible.
As of our latest update, there are no stale issues for the project this week.
2.3 Open Issues
This section lists, groups, and then summarizes issues that were created within the last week in the repository.
Issues Opened This Week: 1
Summarized Issues:
- GPU hang with Triton kernel warp specialization: A GPU hang bug occurs when running a Triton kernel with warp specialization enabled, specifically triggered by a certain loop upper-bound and warp specialization attribute. This issue does not reproduce with the Python version of Triton or when these conditions are changed, indicating a narrow and specific cause for the hang.
- issues/38082
2.4 Closed Issues
This section lists, groups, and then summarizes issues that were closed within the last week in the repository. This section also links the associated pull requests if applicable.
Issues Closed This Week: 2
Summarized Issues:
- Segmentation Faults and Crashes: Multiple JAX tests are experiencing segmentation faults linked to a specific pull request, with crashes occurring due to FFI type handling issues in the GPU transpose plan cache and execution state destructors. These faults disrupt normal GPU operations and indicate underlying stability problems in the system's memory management during execution.
- issues/37752
- GPU Autotuning Cache Persistence: The xla_gpu_per_fusion_autotune_cache_dir option fails to persist autotune computations, causing non-deterministic compilation results that differ from the file-based alternative. This inconsistency impacts reproducibility and reliability in GPU autotuning processes.
- issues/37902
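For context, per-fusion autotune caching of this kind is typically switched on through the XLA_FLAGS environment variable; a minimal sketch (the flag name is taken from the issue, the cache path is illustrative):

```shell
# Point the per-fusion autotune cache at a directory of your choice
# before running the XLA-backed program (path is an example).
export XLA_FLAGS="--xla_gpu_per_fusion_autotune_cache_dir=/tmp/xla_autotune_cache"
echo "$XLA_FLAGS"
```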
2.5 Issue Discussion Insights
This section will analyze the tone and sentiment of discussions within this project's open and closed issues that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.
Based on our analysis, there are no instances of toxic discussions in the project's open or closed issues from the past week.
III. Pull Requests
3.1 Open Pull Requests
This section provides a summary of pull requests that were opened in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.
Pull Requests Opened This Week: 13
Key Open Pull Requests
1. Upgrade to bazel 8 and turn on Bzlmod by default: This pull request upgrades the project to Bazel 8, enables Bzlmod by default, and includes various fixes and updates to support the new build system and toolchains.
- URL: pull/37923
- Associated Commits: 38db4, 062ce, 064fa, a8e5e, a345d, aee69, fbdc7, e44ea, 6a62e, 7b472, 774d6, 9aec2, 9a98c
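As background, Bzlmod replaces WORKSPACE-based dependency declarations with a MODULE.bazel file; a minimal sketch of the style Bazel 8 enables by default (the module and dependency names here are illustrative, not taken from the pull request):

```starlark
# Hypothetical MODULE.bazel using Bzlmod-style dependency declarations.
module(name = "my_project", version = "0.1.0")

bazel_dep(name = "rules_cc", version = "0.1.1")
bazel_dep(name = "abseil-cpp", version = "20250127.0")
```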
2. [ROCm] Support hipblaslt group-gemm: This pull request adds support for HipBlasLT-Ext GroupedGemm in the RaggedDot operation for matrices with and without batch dimensions across three ragged modes. It also extends the autotuner to handle group-gemm configurations with consistent group-size inputs, includes performance improvements demonstrated by benchmarks on MI300 hardware, and provides a user flag to enable or disable the feature given current limitations in datatype support and optimization.
- URL: pull/38088
3. [xla:gpu] Use Command::Walk APIs to collect buffer uses and command properties: This pull request updates the XLA GPU backend to use the Command::Walk APIs for collecting buffer uses and command properties, ensuring consistent semantics for buffer_uses in preparation for unifying the Command and Thunk components.
- URL: pull/37903
- Associated Commits: de2b0
Other Open Pull Requests
- BoringSSL Dependency Update: This pull request updates the BoringSSL dependency to the latest release version 0.20260211 to resolve build issues caused by the previously obsolete version. The update addresses problems reported by the ArchLinux community to ensure smoother builds.
pull/37924
- GPU Executable Optimization and Deadlock Fixes: These pull requests optimize GPU executable performance by reducing unnecessary synchronization after initial NCCL initialization and fix a deadlock issue in the GPU clique cache by evicting sibling sub-cliques to maintain symmetric cache state. Together, they improve GPU runtime efficiency and stability during collective operations.
pull/37936, pull/38105
- ROCm Autotune and Test Enhancements: This pull request fixes invalid split_k and block_k configurations in the ROCm autotune search space and adds a regression test to ensure compatibility constraints are met. Another pull request improves ROCm test coverage by including ROCm-specific parameters and removing hardcoded CUDA expectations.
pull/37992, pull/38156
- GitHub Actions Workflow Refactor: This automated pull request refactors the project's GitHub Actions workflow to comply with the latest internal standards, facilitating an upgrade managed by the GHSS team. The update ensures the CI/CD pipeline remains current and maintainable.
pull/38001
- XLA PJRT GPU Backend Network Topology Update: This pull request updates the XLA PJRT GPU backend to pass network nodes to the LocalTopologyProto, enabling the coordinator process to perform network-topology-optimized global device assignment. This enhancement improves device assignment efficiency based on network topology.
pull/38009
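As a toy illustration of the idea (this is not XLA's actual algorithm or API), topology-aware assignment amounts to giving devices that share a network node contiguous global IDs, so collectives within a node avoid crossing spine switches:

```python
def assign_global_ids(local_topologies):
    """Toy global device ID assignment grouped by network node.

    local_topologies: list of (network_node, [device_names]) pairs,
    a stand-in for the network-node data carried in LocalTopologyProto.
    """
    assignment = {}
    next_id = 0
    # Visit network nodes in a deterministic order and number each
    # node's devices contiguously.
    for _, devices in sorted(local_topologies, key=lambda t: t[0]):
        for dev in devices:
            assignment[dev] = next_id
            next_id += 1
    return assignment
```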
- Memory Space Correction in GPU Collective Ops Test: This pull request updates the collective_ops_ffi_test to ensure output results are returned in the default device memory space rather than the collective memory space. This prevents users from receiving results in an incorrect memory space, improving test accuracy.
pull/38107
- Host Runtime Debugging Enhancements: This pull request adds TraceMe annotations formatted for compatibility with XLA_LOG_DEVICE within the CommonPjRtClient and the XLA:GPU-specific StreamExecutor client. These annotations embed important metadata into each trace, enhancing debuggability of the host runtime.
pull/38110
- AsyncWorkRunner API Unification: This pull request unifies the AsyncWorkRunner API with the existing tsl::Executor by migrating PJRT to use the standard API. This enables a single implementation of ExecuteWhenReady that leverages the common RunWhenReady method, improving consistency and enabling future cleanup.
pull/38135
3.2 Closed Pull Requests
This section provides a summary of pull requests that were closed in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.
Pull Requests Closed This Week: 43
Key Closed Pull Requests
1. [XLA:GPU][oneAPI] Fix platform error in stream executor tests with SYCL backend: This pull request addresses and fixes the platform error encountered in stream executor tests when using the SYCL backend by ensuring the TENSORFLOW_USE_SYCL macro is properly applied in the hermetic build environment, thereby resolving the issue of the missing registered platform named "cuda" during test execution.
- URL: pull/37235
2. [ROCm] suppress cuda error messages on ROCm when doing training: This pull request aims to suppress harmless CUDA-related error messages that appear during training on ROCm platforms by preventing redundant plugin registration warnings, aligning ROCm's behavior with the existing suppression on CUDA.
- URL: pull/36882
3. Build and use xxd from source: This pull request introduces building and using the xxd tool from source to eliminate the only non-hermetic, non-POSIX dependency in the PJRT plugins build process, thereby enabling fully remote and hermetic builds on a vanilla Debian Docker image.
- URL: pull/37816
Other Closed Pull Requests
- GitHub Actions Workflow Refactor: Multiple pull requests propose an automated refactor of the project's GitHub Actions workflow to align with the latest standards specified in the internal guideline b/485167538. These changes aim to facilitate an upgrade process that may be force merged by the GHSS team if not accepted voluntarily, ensuring compliance and modernization of the CI/CD pipeline.
- pull/37944, pull/37945, pull/37946, pull/37948, pull/37949, pull/37951, pull/37952, pull/37953, pull/37955, pull/37957, pull/37958, pull/37959, pull/37960, pull/37961, pull/37962, pull/37963, pull/37956
- ROCm GPU Backend Enhancements: Several pull requests improve ROCm GPU support by enabling native FP8 Triton-generated GEMM operations on AMD MI300/MI355 GPUs, propagating per-kernel register spill information to the autotuner, and adding platform detection with ROCm-specific device configurations to expand unit test coverage. These changes enhance performance, compatibility, and testing robustness for ROCm platforms.
- pull/37573, pull/37629, pull/37732
- XLA:GPU Feature Improvements: Pull requests introduce support for sub-byte types in DynamicMemcpyFusion by enabling memcpy operations with byte-aligned strides and add functionality to support setting and getting state in all execution stages of the FFI handler. These enhancements improve memory operation flexibility and state management across XLA executions.
- pull/37769, pull/37783
- New Container and Benchmarking: A pull request introduces tsl::UniqueAny, a std::any-like container designed to support move-only types, accompanied by benchmark results demonstrating its performance. This addition provides a new utility container for improved type handling in the codebase.
- pull/37821
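To make the motivation concrete, here is a minimal sketch of a move-only any-style container; std::any requires its contents to be copyable, so types like std::unique_ptr cannot be stored in it. This is illustrative only, and tsl::UniqueAny's actual interface may differ:

```cpp
#include <cassert>
#include <memory>
#include <typeinfo>
#include <utility>

// Minimal move-only "any" container sketch. Unlike std::any, it never
// copies the stored value, so move-only types are supported.
class UniqueAny {
 public:
  UniqueAny() = default;
  UniqueAny(UniqueAny&&) = default;
  UniqueAny& operator=(UniqueAny&&) = default;
  UniqueAny(const UniqueAny&) = delete;             // move-only
  UniqueAny& operator=(const UniqueAny&) = delete;

  template <typename T>
  explicit UniqueAny(T value)
      : storage_(std::make_unique<Holder<T>>(std::move(value))) {}

  bool has_value() const { return storage_ != nullptr; }

  // Returns a pointer to the stored value if it has type T, else nullptr.
  template <typename T>
  T* get_if() {
    if (storage_ && storage_->type() == typeid(T)) {
      return &static_cast<Holder<T>*>(storage_.get())->value;
    }
    return nullptr;
  }

 private:
  struct HolderBase {
    virtual ~HolderBase() = default;
    virtual const std::type_info& type() const = 0;
  };
  template <typename T>
  struct Holder : HolderBase {
    explicit Holder(T v) : value(std::move(v)) {}
    const std::type_info& type() const override { return typeid(T); }
    T value;
  };
  std::unique_ptr<HolderBase> storage_;
};
```

A type-erased holder behind a unique_ptr is the simplest way to get this behavior; a production version would likely add small-buffer optimization, which is presumably what the PR's benchmarks measure.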
- GPU Collectives and Distributed Topology Optimization: One pull request proposes increasing the maximum device communication buffer size in GPU collectives to prevent future compilation failures, while another optimizes global device ID assignment in distributed XLA by incorporating network topology awareness to reduce spine switch traffic and cleans up related compilation warnings. These changes aim to improve scalability and efficiency in distributed GPU operations.
- pull/37893, pull/37906
3.3 Pull Request Discussion Insights
This section will analyze the tone and sentiment of discussions within this project's open and closed pull requests that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.
Based on our analysis, there are no instances of toxic discussions in the project's open or closed pull requests from the past week.
IV. Contributors
4.1 Contributors
Active Contributors:
We consider an active contributor in this project to be any contributor who has made at least 1 commit, opened at least 1 issue, created at least 1 pull request, or made more than 2 comments in the last month.
If there are more than 10 active contributors, the list is truncated to the top 10 based on contribution metrics for better clarity.
| Contributor | Commits | Pull Requests | Issues | Comments |
|---|---|---|---|---|
| ezhulenev | 44 | 8 | 0 | 0 |
| alekstheod | 48 | 3 | 0 | 0 |
| benknutson-google | 29 | 0 | 0 | 0 |
| google-admin | 0 | 29 | 0 | 0 |
| meteorcloudy | 13 | 1 | 0 | 0 |
| leo-amd | 12 | 1 | 0 | 0 |
| nurmukhametov | 7 | 3 | 0 | 0 |
| mfrancepillois | 6 | 2 | 0 | 0 |
| Eetusjo | 5 | 3 | 0 | 0 |
| terryysun | 5 | 2 | 0 | 0 |
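The active-contributor rule above can be expressed as a simple predicate; a sketch using illustrative rows (the "lurker" entry is hypothetical, added to show a non-active case):

```python
def is_active(contributor):
    """Active-contributor rule from this report: at least one commit,
    pull request, or issue, or more than two comments in the last month."""
    return (
        contributor["commits"] >= 1
        or contributor["pull_requests"] >= 1
        or contributor["issues"] >= 1
        or contributor["comments"] > 2
    )

# Example rows shaped like the table above.
rows = [
    {"name": "ezhulenev", "commits": 44, "pull_requests": 8, "issues": 0, "comments": 0},
    {"name": "lurker", "commits": 0, "pull_requests": 0, "issues": 0, "comments": 1},
]
active = [r["name"] for r in rows if is_active(r)]
```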