Weekly GitHub Report for Xla: February 16, 2026 - February 23, 2026 (17:34:20)
Thank you for subscribing to our weekly newsletter! Each week, we deliver a comprehensive summary of your GitHub project's latest activity right to your inbox, including an overview of your project's issues, pull requests, contributors, and commit activity.
Table of Contents
I. News
1.1 Recent Version Releases:
No recent version releases were found.
1.2 Version Information:
No version information is available to summarize this week.
II. Issues
2.1 Top 5 Active Issues:
We consider active issues to be issues that have been commented on most frequently within the last week. Bot comments are omitted.
- [GPU] [STAT:AWAITING RESPONSE FROM CONTRIBUTOR] GPU hang with Triton kernel using Warp Specialization: This issue describes a GPU hang occurring when running a Triton kernel that uses warp specialization, specifically triggered by a loop with an upper bound of 2 in the kernel code. The user provides a minimal reproducible example and notes that the hang disappears if the loop upper bound is reduced to 1 or if the warp specialization attribute is removed, highlighting a potential bug in the Triton kernel implementation on the GPU backend.
- The comments include a request for reproduction steps, which the user provides in detail, outlining how to clone the repository, checkout the relevant branch, run a Docker container, configure the build, and execute the test that triggers the hang, confirming the issue is reproducible but the test never completes.
- Number of comments this week: 2
Since there were fewer than 5 open issues, all of the open issues have been listed above.
2.2 Top 5 Stale Issues:
We consider stale issues to be issues that have had no activity within the last 30 days. The team should work together to get these issues resolved and closed as soon as possible.
As of our latest update, there are no stale issues for the project this week.
2.3 Open Issues
This section lists, groups, and then summarizes issues that were created within the last week in the repository.
Issues Opened This Week: 1
Summarized Issues:
- GPU hang with Triton kernel warp specialization: A GPU hang bug occurs when running a Triton kernel with warp specialization enabled, specifically triggered by a certain loop upper-bound and warp specialization attribute. This issue does not reproduce with the Python version of Triton or when these conditions are changed, indicating a narrow and specific cause for the hang.
- issues/38082
2.4 Closed Issues
This section lists, groups, and then summarizes issues that were closed within the last week in the repository. This section also links the associated pull requests if applicable.
Issues Closed This Week: 2
Summarized Issues:
- Segmentation Faults and Crashes: Multiple JAX tests are experiencing segmentation faults linked to a specific pull request, with crashes occurring due to FFI type handling issues in the GPU transpose plan cache and execution state destructors. These faults disrupt normal GPU operations and indicate underlying stability problems in the system's memory management during execution.
- issues/37752
- GPU Autotuning Cache Persistence: The xla_gpu_per_fusion_autotune_cache_dir option fails to persist autotune computations, causing non-deterministic compilation results that differ from the file-based alternative. This inconsistency impacts reproducibility and reliability in GPU autotuning processes.
- issues/37902
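For context, per-fusion autotune caching of this kind is typically switched on through the XLA_FLAGS environment variable; a minimal sketch (the flag name is taken from the issue, the cache path is illustrative):

```shell
# Point the per-fusion autotune cache at a directory of your choice
# before running the XLA-backed program (path is an example).
export XLA_FLAGS="--xla_gpu_per_fusion_autotune_cache_dir=/tmp/xla_autotune_cache"
echo "$XLA_FLAGS"
```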
2.5 Issue Discussion Insights
This section will analyze the tone and sentiment of discussions within this project's open and closed issues that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.
Based on our analysis, there are no instances of toxic discussions in the project's open or closed issues from the past week.
III. Pull Requests
3.1 Open Pull Requests
This section provides a summary of pull requests that were opened in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.
Pull Requests Opened This Week: 13
Key Open Pull Requests
1. Upgrade to bazel 8 and turn on Bzlmod by default: This pull request upgrades the project to Bazel 8, enables Bzlmod by default, and includes various fixes and updates to support the new build system and toolchains.
- URL: pull/37923
- Associated Commits: 38db4, 062ce, 064fa, a8e5e, a345d, aee69, fbdc7, e44ea, 6a62e, 7b472, 774d6, 9aec2, 9a98c
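As background, Bzlmod replaces WORKSPACE-based dependency declarations with a MODULE.bazel file; a minimal sketch of the style Bazel 8 enables by default (the module and dependency names here are illustrative, not taken from the pull request):

```starlark
# Hypothetical MODULE.bazel using Bzlmod-style dependency declarations.
module(name = "my_project", version = "0.1.0")

bazel_dep(name = "rules_cc", version = "0.1.1")
bazel_dep(name = "abseil-cpp", version = "20250127.0")
```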
2. [ROCm] Support hipblaslt group-gemm: This pull request adds support for HipBlasLT-Ext GroupedGemm in the RaggedDot operation for matrices with and without batch dimensions across three ragged modes. It also extends the autotuner to handle group-gemm configurations with consistent group-size inputs, includes performance improvements demonstrated by benchmarks on MI300 hardware, and provides a user flag to enable or disable the feature given current limitations in datatype support and optimization.
- URL: pull/38088
3. [xla:gpu] Use Command::Walk APIs to collect buffer uses and command properties: This pull request updates the XLA GPU backend to use the Command::Walk APIs for collecting buffer uses and command properties, ensuring consistent semantics for buffer_uses in preparation for unifying the Command and Thunk components.
- URL: pull/37903
- Associated Commits: de2b0
Other Open Pull Requests
- BoringSSL Dependency Update: This pull request updates the BoringSSL dependency to the latest release version 0.20260211 to resolve build issues caused by the previously obsolete version. The update addresses problems reported by the ArchLinux community to ensure smoother builds.
pull/37924
- GPU Executable Optimization and Deadlock Fixes: These pull requests optimize GPU executable performance by reducing unnecessary synchronization after initial NCCL initialization and fix a deadlock issue in the GPU clique cache by evicting sibling sub-cliques to maintain symmetric cache state. Together, they improve GPU runtime efficiency and stability during collective operations.
pull/37936, pull/38105
- ROCm Autotune and Test Enhancements: This pull request fixes invalid split_k and block_k configurations in the ROCm autotune search space and adds a regression test to ensure compatibility constraints are met. Another pull request improves ROCm test coverage by including ROCm-specific parameters and removing hardcoded CUDA expectations.
pull/37992, pull/38156
- GitHub Actions Workflow Refactor: This automated pull request refactors the project's GitHub Actions workflow to comply with the latest internal standards, facilitating an upgrade managed by the GHSS team. The update ensures the CI/CD pipeline remains current and maintainable.
pull/38001
- XLA PJRT GPU Backend Network Topology Update: This pull request updates the XLA PJRT GPU backend to pass network nodes to the LocalTopologyProto, enabling the coordinator process to perform network-topology-optimized global device assignment. This enhancement improves device assignment efficiency based on network topology.
pull/38009
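As a toy illustration of the idea (this is not XLA's actual algorithm or API), topology-aware assignment amounts to giving devices that share a network node contiguous global IDs, so collectives within a node avoid crossing spine switches:

```python
def assign_global_ids(local_topologies):
    """Toy global device ID assignment grouped by network node.

    local_topologies: list of (network_node, [device_names]) pairs,
    a stand-in for the network-node data carried in LocalTopologyProto.
    """
    assignment = {}
    next_id = 0
    # Visit network nodes in a deterministic order and number each
    # node's devices contiguously.
    for _, devices in sorted(local_topologies, key=lambda t: t[0]):
        for dev in devices:
            assignment[dev] = next_id
            next_id += 1
    return assignment
```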
- Memory Space Correction in GPU Collective Ops Test: This pull request updates the collective_ops_ffi_test to ensure output results are returned in the default device memory space rather than the collective memory space. This prevents users from receiving results in an incorrect memory space, improving test accuracy.
pull/38107
- Host Runtime Debugging Enhancements: This pull request adds TraceMe annotations formatted for compatibility with XLA_LOG_DEVICE within the CommonPjRtClient and the XLA:GPU-specific StreamExecutor client. These annotations embed important metadata into each trace, enhancing debuggability of the host runtime.
pull/38110
- AsyncWorkRunner API Unification: This pull request unifies the AsyncWorkRunner API with the existing tsl::Executor by migrating PJRT to use the standard API. This enables a single implementation of ExecuteWhenReady that leverages the common RunWhenReady method, improving consistency and enabling future cleanup.
pull/38135
3.2 Closed Pull Requests
This section provides a summary of pull requests that were closed in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.
Pull Requests Closed This Week: 43
Key Closed Pull Requests
1. [XLA:GPU][oneAPI] Fix platform error in stream executor tests with SYCL backend: This pull request addresses and fixes the platform error encountered in stream executor tests when using the SYCL backend by ensuring the TENSORFLOW_USE_SYCL macro is properly applied in the hermetic build environment, thereby resolving the issue of the missing registered platform named "cuda" during test execution.
- URL: pull/37235
2. [ROCm] suppress cuda error messages on ROCm when doing training: This pull request aims to suppress harmless CUDA-related error messages that appear during training on ROCm platforms by preventing redundant plugin registration warnings, aligning ROCm's behavior with the existing suppression on CUDA.
- URL: pull/36882
3. Build and use xxd from source: This pull request introduces building and using the xxd tool from source to eliminate the only non-hermetic, non-POSIX dependency in the PJRT plugins build process, thereby enabling fully remote and hermetic builds on a vanilla Debian Docker image.
- URL: pull/37816
Other Closed Pull Requests
- GitHub Actions Workflow Refactor: Multiple pull requests propose an automated refactor of the project's GitHub Actions workflow to align with the latest standards specified in the internal guideline b/485167538. These changes aim to facilitate an upgrade process that may be force merged by the GHSS team if not accepted voluntarily, ensuring compliance and modernization of the CI/CD pipeline.
- pull/37944, pull/37945, pull/37946, pull/37948, pull/37949, pull/37951, pull/37952, pull/37953, pull/37955, pull/37957, pull/37958, pull/37959, pull/37960, pull/37961, pull/37962, pull/37963, pull/37956
- ROCm GPU Backend Enhancements: Several pull requests improve ROCm GPU support by enabling native FP8 Triton-generated GEMM operations on AMD MI300/MI355 GPUs, propagating per-kernel register spill information to the autotuner, and adding platform detection with ROCm-specific device configurations to expand unit test coverage. These changes enhance performance, compatibility, and testing robustness for ROCm platforms.
- pull/37573, pull/37629, pull/37732
- XLA:GPU Feature Improvements: Pull requests introduce support for sub-byte types in DynamicMemcpyFusion by enabling memcpy operations with byte-aligned strides and add functionality to support setting and getting state in all execution stages of the FFI handler. These enhancements improve memory operation flexibility and state management across XLA executions.
- pull/37769, pull/37783
- New Container and Benchmarking: A pull request introduces tsl::UniqueAny, a std::any-like container designed to support move-only types, accompanied by benchmark results demonstrating its performance. This addition provides a new utility container for improved type handling in the codebase.
- pull/37821
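To make the motivation concrete, here is a minimal sketch of a move-only any-style container; std::any requires its contents to be copyable, so types like std::unique_ptr cannot be stored in it. This is illustrative only, and tsl::UniqueAny's actual interface may differ:

```cpp
#include <cassert>
#include <memory>
#include <typeinfo>
#include <utility>

// Minimal move-only "any" container sketch. Unlike std::any, it never
// copies the stored value, so move-only types are supported.
class UniqueAny {
 public:
  UniqueAny() = default;
  UniqueAny(UniqueAny&&) = default;
  UniqueAny& operator=(UniqueAny&&) = default;
  UniqueAny(const UniqueAny&) = delete;             // move-only
  UniqueAny& operator=(const UniqueAny&) = delete;

  template <typename T>
  explicit UniqueAny(T value)
      : storage_(std::make_unique<Holder<T>>(std::move(value))) {}

  bool has_value() const { return storage_ != nullptr; }

  // Returns a pointer to the stored value if it has type T, else nullptr.
  template <typename T>
  T* get_if() {
    if (storage_ && storage_->type() == typeid(T)) {
      return &static_cast<Holder<T>*>(storage_.get())->value;
    }
    return nullptr;
  }

 private:
  struct HolderBase {
    virtual ~HolderBase() = default;
    virtual const std::type_info& type() const = 0;
  };
  template <typename T>
  struct Holder : HolderBase {
    explicit Holder(T v) : value(std::move(v)) {}
    const std::type_info& type() const override { return typeid(T); }
    T value;
  };
  std::unique_ptr<HolderBase> storage_;
};
```

A type-erased holder behind a unique_ptr is the simplest way to get this behavior; a production version would likely add small-buffer optimization, which is presumably what the PR's benchmarks measure.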
- GPU Collectives and Distributed Topology Optimization: One pull request proposes increasing the maximum device communication buffer size in GPU collectives to prevent future compilation failures, while another optimizes global device ID assignment in distributed XLA by incorporating network topology awareness to reduce spine switch traffic and cleans up related compilation warnings. These changes aim to improve scalability and efficiency in distributed GPU operations.
- pull/37893, pull/37906
3.3 Pull Request Discussion Insights
This section will analyze the tone and sentiment of discussions within this project's open and closed pull requests that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.
Based on our analysis, there are no instances of toxic discussions in the project's open or closed pull requests from the past week.
IV. Contributors
4.1 Contributors
Active Contributors:
We consider an active contributor in this project to be any contributor who has made at least 1 commit, opened at least 1 issue, created at least 1 pull request, or made more than 2 comments in the last month.
If there are more than 10 active contributors, the list is truncated to the top 10 based on contribution metrics for better clarity.
| Contributor | Commits | Pull Requests | Issues | Comments |
|---|---|---|---|---|
| ezhulenev | 44 | 8 | 0 | 0 |
| alekstheod | 48 | 3 | 0 | 0 |
| benknutson-google | 29 | 0 | 0 | 0 |
| google-admin | 0 | 29 | 0 | 0 |
| meteorcloudy | 13 | 1 | 0 | 0 |
| leo-amd | 12 | 1 | 0 | 0 |
| nurmukhametov | 7 | 3 | 0 | 0 |
| mfrancepillois | 6 | 2 | 0 | 0 |
| Eetusjo | 5 | 3 | 0 | 0 |
| terryysun | 5 | 2 | 0 | 0 |
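The active-contributor rule above can be expressed as a simple predicate; a sketch using illustrative rows (the "lurker" entry is hypothetical, added to show a non-active case):

```python
def is_active(contributor):
    """Active-contributor rule from this report: at least one commit,
    pull request, or issue, or more than two comments in the last month."""
    return (
        contributor["commits"] >= 1
        or contributor["pull_requests"] >= 1
        or contributor["issues"] >= 1
        or contributor["comments"] > 2
    )

# Example rows shaped like the table above.
rows = [
    {"name": "ezhulenev", "commits": 44, "pull_requests": 8, "issues": 0, "comments": 0},
    {"name": "lurker", "commits": 0, "pull_requests": 0, "issues": 0, "comments": 1},
]
active = [r["name"] for r in rows if is_active(r)]
```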