Weekly GitHub Report for XLA: February 01, 2026 - February 08, 2026
Thank you for subscribing to our weekly newsletter! Each week, we deliver a comprehensive summary of your GitHub project's latest activity right to your inbox, including an overview of your project's issues, pull requests, contributors, and commit activity.
Table of Contents
I. News
1.1 Recent Version Releases:
No recent version releases were found.
1.2 Version Information:
No version information was available to summarize this week.
II. Issues
2.1 Top 5 Active Issues:
We consider active issues to be issues that have been commented on most frequently within the last week. Bot comments are omitted.
- [STAT:AWAITING OPENXLA-ENG] [ERR:PERFORMANCE] ConvertElementType bool-to-int returns incorrect result for nonstandard booleans: This issue reports a problem in the XLA compiler where converting nonstandard boolean values to integers using ConvertElementType returns incorrect results, violating the StableHLO specification's requirement that true convert to one and false to zero. The discussion highlights challenges in fixing this behavior across different backends, the impact on related NumPy operations like view, and the potential need to extend XLA's BitcastConvertType to support boolean inputs and outputs to fully resolve the issue.
- The comments reveal attempts to fix the conversion logic in the CPU and GPU backends while noting difficulties with the interpreter and compiler optimizations; they also discuss related test failures due to differences between ConvertElementType and BitcastConvertType, consider the implications for JAX's numpy.view function, and explore extending XLA to support boolean types in bitcast conversions as a possible solution (a minimal sketch of the reported pattern follows below).
- Number of comments this week: 5
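As a rough reproduction sketch of the reported pattern, assuming JAX as the frontend (the array values and the use of NumPy's view to manufacture a nonstandard boolean are our illustration, not taken from the issue):

```python
import numpy as np
import jax.numpy as jnp

# Manufacture a "nonstandard" boolean on the host: the underlying byte is 2
# rather than the canonical 0 or 1, but NumPy still treats it as True.
raw = np.array([2], dtype=np.uint8).view(np.bool_)

# Per the StableHLO spec, ConvertElementType must map any true value to
# exactly 1; the issue reports that some backends return the raw byte instead.
result = jnp.asarray(raw).astype(jnp.int32)
print(result)  # spec-compliant output: [1]; the reported bug yields [2]
```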
Since there were fewer than 5 active issues this week, all of them are listed above.
2.2 Top 5 Stale Issues:
We consider stale issues to be issues that have had no activity within the last 30 days. The team should work together to get these issues resolved and closed as soon as possible.
As of our latest update, there are no stale issues for the project this week.
2.3 Open Issues
This section lists, groups, and then summarizes issues that were created within the last week in the repository.
Issues Opened This Week: 3
Summarized Issues:
- Conversion and Type Handling Issues: The ConvertElementType operation in XLA incorrectly handles nonstandard boolean values when converting to integers, failing to comply with the StableHLO specification. This results in improper conversion behavior that deviates from expected standards.
- issues/37159
- Precision and Compliance Bugs: There is a precision bug in XLA on GPU where float32 division does not produce exact IEEE 754 results, because the current implementation uses a fast but approximate division method lacking full rounding support. This causes expressions like a/a to not equal 1.0, violating expected floating-point behavior (a sketch of the violated identity follows below).
- issues/37181
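A minimal sketch of the identity the issue says is violated, assuming JAX on GPU (the input value is illustrative, not from the issue):

```python
import jax
import jax.numpy as jnp

# IEEE 754 requires correctly rounded division, so x / x must be exactly 1.0
# for any finite nonzero x. The issue reports that the GPU backend's fast,
# approximate division can break this identity for some inputs.
@jax.jit
def self_div(x):
    return x / x

x = jnp.float32(1.5e38)  # illustrative input; the issue concerns float32 on GPU
print(self_div(x) == jnp.float32(1.0))  # expected under exact IEEE division: True
```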
- Tool Usage and TPU Support: Users have inquired about the possibility of running the tools `run_hlo_module` and `hlo-opt` on Google TPUs and asked for instructions, seeking guidance on TPU compatibility and usage. No explicit resolution or instructions are provided in the data.
- issues/37277
2.4 Closed Issues
This section lists, groups, and then summarizes issues that were closed within the last week in the repository. This section also links the associated pull requests if applicable.
Issues Closed This Week: 0
Summarized Issues:
As of our latest update, there were no issues closed in the project this week.
2.5 Issue Discussion Insights
This section will analyze the tone and sentiment of discussions within this project's open and closed issues that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.
Based on our analysis, there are no instances of toxic discussions in the project's open or closed issues from the past week.
III. Pull Requests
3.1 Open Pull Requests
This section provides a summary of pull requests that were opened in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.
Pull Requests Opened This Week: 10
Key Open Pull Requests
1. [XLA:GPU][oneAPI] Fix platform error in stream executor tests with SYCL backend: This pull request addresses and fixes the platform error "Could not find registered platform with name: 'cuda'" encountered when running specific stream executor tests with the SYCL backend by ensuring the TENSORFLOW_USE_SYCL macro is properly applied in the hermetic build environment using rules_ml_toolchain.
- URL: pull/37235
2. [GPU][NFC] Rename H100 and B200 test data for clarity: This pull request renames the H100 and B200 test data to improve clarity by correctly distinguishing them from RTX models, and also removes an unused dependency.
- URL: pull/37309
3. add conv fusion rewriter: This pull request introduces a new convolution fusion rewriter that refactors XLA's cuDNN convolution operations to enable several fusion types (epilogue activation fusion, integer convolution fusion, and FP8 convolution fusion), enforces the NHWC layout, supports float normalization to avoid unnecessary type conversions, constructs backward-convolution windows consistent with forward convolutions, normalizes layouts, and reuses existing unit tests while adapting to the new cuDNN frontend constraints.
- URL: pull/37311
Other Open Pull Requests
- Swish Activation Fusion Logic Update: This pull request modifies the swish activation fusion logic to skip fusion when the pre-activation output is used, addressing the lack of SILU_AUX support in hipBLASLt. It also prevents redundant GEMM operations by introducing a function to count final users rather than direct users (the skipped pattern is sketched below).
- pull/37287
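As an illustrative sketch of the pattern now skipped, in JAX-style Python (the function, names, and shapes are ours, not from the PR):

```python
import jax
import jax.numpy as jnp

# If the pre-activation GEMM output `y` has users besides the swish itself,
# fusing the activation into the GEMM epilogue would require SILU_AUX
# (emitting both y and swish(y)), which hipBLASLt lacks, so fusion is skipped.
def layer(x, w):
    y = x @ w                     # pre-activation GEMM output
    out = y * jax.nn.sigmoid(y)   # swish/SiLU activation
    return out, y                 # y is reused, so epilogue fusion is not viable

x = jnp.ones((4, 8))
w = jnp.ones((8, 16))
out, y = layer(x, w)
```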
- Compiler Flag and Build Configuration Enhancements: These pull requests add the missing `--offload-compress` clang flag to the `hipcc` compiler to reduce binary size and fix a linking error, and enable the `--config=warnings` option as the default build configuration to treat compiler warnings as errors. Both changes improve build reliability and error detection during development.
- pull/37370, pull/37430
- Strongly-Typed Identifiers Consistency: This pull request updates the PjRt component to use the same strongly-typed identifiers—ProcessId, DeviceId, and ChipId—as those in the xla/runtime, ensuring consistency across the codebase. This alignment helps maintain type safety and uniformity in identifier usage (a rough analogue is sketched below).
- pull/37395
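As a rough Python analogue of what strongly-typed identifiers provide (the real definitions are C++ types in xla/runtime; only the three names come from the PR summary):

```python
from typing import NewType

# Distinct wrapper types keep one kind of ID from silently standing in for
# another: a type checker rejects a ProcessId where a DeviceId is expected.
ProcessId = NewType("ProcessId", int)
DeviceId = NewType("DeviceId", int)
ChipId = NewType("ChipId", int)

def memory_on(device: DeviceId) -> int:
    return 0  # placeholder body

memory_on(DeviceId(3))      # OK
# memory_on(ProcessId(3))   # a type checker flags this mix-up
```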
- CollectiveMemory API Usage and Testing: These pull requests implement the use of CollectiveMemory in testing the CollectiveKernel to verify correct memory space assignment and introduce APIs to use CollectiveMemory for acquiring peer addresses in the xla:gpu module. Both ensure consistency and correctness in handling collective memory requests and peer address acquisition.
- pull/37414, pull/37490
- Concurrency API Enhancement: This pull request introduces a new API called Future::Flatten() to the project, enhancing its concurrency capabilities. This addition aims to improve the handling of asynchronous operations (the flattening semantics are modeled below).
- pull/37473
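A Python asyncio model of the flattening semantics (the PR's `Future::Flatten()` is C++; the helper below is our analogue, not the actual API):

```python
import asyncio

# Flatten collapses a future-of-a-future into a single future: resolving the
# outer value yields an inner awaitable, which is then resolved in turn.
async def flatten(outer):
    inner = await outer
    return await inner

async def main():
    inner = asyncio.sleep(0, result=42)      # resolves to 42
    outer = asyncio.sleep(0, result=inner)   # resolves to the inner awaitable
    print(await flatten(outer))              # 42

asyncio.run(main())
```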
3.2 Closed Pull Requests
This section provides a summary of pull requests that were closed in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.
Pull Requests Closed This Week: 27
Key Closed Pull Requests
1. [Test only do not merge] Test explicit tests for mgpu: This pull request is a test-only submission intended to add explicit tests for multi-GPU (mgpu) functionality, without merging into the main codebase.
- URL: pull/37214
- Associated Commits: 45517, b7b4e, d9942, 568bb, a5cee, 57ccf, b806e, 454ff, 3e57d, 65f9f, 488a8, 6f043, 56b54, 3cd22, 527df, 65d86
2. [ROCm] Enable backends/gpu/autotuner unit tests on ROCm: This pull request makes several autotuner unit tests platform-independent by removing dependencies on NVPTXCompiler and enables these tests on the ROCm platform to expand ROCm test coverage, ensuring all tests either pass or are skipped when not relevant.
- URL: pull/36553
3. [xla:gpu] Clean up execution stream assignment: This pull request attempts to simplify and clean up the execution stream assignment for GPU operations by organizing it around execution scopes, addressing asynchronous instruction handling, adding support for pipelined send/receive operations, and fixing a crash caused by out-of-order instruction processing, although it was not merged.
- URL: pull/37389
Other Closed Pull Requests
- Communicator and Device Group Improvements: This set of pull requests fixes deadlocks caused by improper use of communicator split APIs and renames `participant_groups` to `device_groups` to better reflect their usage during communicator acquisition. It also enforces explicit checks for rank participation and enhances logging with debug information and timing for communicator initialization.
- Collective Memory and Multicast Support: These pull requests introduce support for multicast memory in `CollectiveMemory` and `CollectiveMemoryRequests` as an optimized alternative to `CollectiveMultimem`, with plans to migrate existing users. Additionally, `CollectiveMultimem` support is removed from the FFI to fully replace it with the new collective memory APIs.
- Execution Stream and Instruction Processing Enhancements: This pull request restructures execution stream assignment around execution scopes to simplify the XLA GPU backend and fixes a crash caused by out-of-order instruction processing.
- Intra-process Data Transfer Bug Fix: A bug fix modifies the timing of event allocation and recording for intra-process data transfers to occur immediately after enqueueing dependencies, preventing host thread blocking during dependent executable launches.
- ROCm Backend Fixes and Improvements: These pull requests fix duplicated function issues in the ROCm backend by sharing AsBlasLtEpilogue with added SILU support, update expected error values for ROCm-specific numerical precision mismatches, enable building the xla-opt binary with necessary visibility patches, add an experimental flag for autotuning fusions with Triton, and clean up ROCm build tag filters.
- NCCL DevComm and Symmetric Memory Tests: These pull requests add end-to-end tests for NCCL DevComm on GPUs, including symmetric memory allocation and management during executable initialization, and verify that the same physical memory allocation can be mapped to multiple symmetric memory regions across different communicator sets.
- Test Environment and Script Fixes: This pull request fixes the parallel_gpu_execute script to allow running tests on machines without GPUs, addressing failures in GPU-less test environments.
- Reduce Scatter Creator Bug Fix: A bug fix improves the matching logic between all-reduce (AR) and Slice operations in the reduce scatter creator to increase replacement opportunities, reducing communication overhead in 3D parallel workloads, and includes new unit tests for verification (the targeted rewrite is sketched below).
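An illustrative JAX sketch of the rewrite this pass targets (the names, shapes, and pmap harness are ours, not from the PR):

```python
import jax
import jax.numpy as jnp

# An all-reduce whose result is immediately sliced per rank is semantically a
# reduce-scatter, which moves less data; the pass rewrites the former into
# the latter when the pattern matches.
def ar_then_slice(x):
    full = jax.lax.psum(x, axis_name="d")    # all-reduce (AR)
    rank = jax.lax.axis_index("d")
    shard = x.shape[0] // jax.device_count()
    return jax.lax.dynamic_slice_in_dim(full, rank * shard, shard)

xs = jnp.ones((jax.device_count(), 8))
print(jax.pmap(ar_then_slice, axis_name="d")(xs).shape)
```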
- Convolution Fusion Rewriter Enhancements: This pull request introduces a new convolution kind assignment pass to determine convolution types as part of splitting the convolution fusion rewriter, accompanied by unit tests for the pass and related utilities.
- PJRT Receive Callback Bug Fix: This pull request fixes a bug in PJRT where receive callbacks were incorrectly handled using the size of send callbacks, causing segmentation faults during Rust bindings development.
- XLA API Standardization: This pull request renames identifiers from `node_id` to `process_id` and from `nodes` to `processes` to better represent multiple communicating processes on the same host, as a non-functional API standardization.
- Thunk and Command Unification: This pull request proposes unifying the handling of buffer_uses and buffers between Thunk and Command components in the XLA GPU backend as a step towards consolidating commands and thunks.
- FloatNormalization Autotuner Fix for ROCm: This pull request changes the FloatNormalization autotuner to use `gpu_compute_capability()` instead of `cuda_compute_capability()` when constructing `GpuFloatSupport`, addressing potential ROCm platform issues.
- Open Source Layering Check Restoration: This pull request patches brotli and riegeli to restore the `--features=layering_check` functionality in the open-source build, enabling compile-time errors for missing dependencies and improving OSS contributor productivity.
- JoinFutures Functionality Enhancements: These pull requests implement JoinFutures functionality for Future types carrying payloads and for statically known types, enabling the combination of multiple futures into a single future with resolved values, and fix the handling of futures that complete with errors (the join semantics are modeled below).
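A Python asyncio model of the join semantics described (the C++ `JoinFutures` API is not shown here; `asyncio.gather` merely stands in for it):

```python
import asyncio

# Joining combines several payload-carrying futures into a single future
# whose value holds all resolved payloads; per the PR, an error in any input
# future must propagate to the joined future (here, gather raises it).
async def main():
    futures = [asyncio.sleep(0, result=i) for i in range(3)]
    print(await asyncio.gather(*futures))  # [0, 1, 2]

asyncio.run(main())
```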
- XLA:FFI Header Cleanup: This pull request prevents runtime crashes caused by duplicate static registries by clarifying that `ffi_api.h` is internal and that XLA:FFI users should only depend on `ffi.h`, marking an initial step towards header dependency cleanup.
3.3 Pull Request Discussion Insights
This section will analyze the tone and sentiment of discussions within this project's open and closed pull requests that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.
Based on our analysis, there are no instances of toxic discussions in the project's open or closed pull requests from the past week.
IV. Contributors
4.1 Contributors
Active Contributors:
We consider an active contributor in this project to be any contributor who has made at least 1 commit, opened at least 1 issue, created at least 1 pull request, or made more than 2 comments in the last month.
If there are more than 10 active contributors, the list is truncated to the top 10 based on contribution metrics for better clarity.
| Contributor | Commits | Pull Requests | Issues | Comments |
|---|---|---|---|---|
| ezhulenev | 74 | 18 | 1 | 5 |
| alekstheod | 54 | 6 | 0 | 0 |
| bhavani-subramanian | 13 | 1 | 0 | 3 |
| leo-amd | 15 | 1 | 0 | 0 |
| nurmukhametov | 8 | 4 | 0 | 0 |
| terryysun | 7 | 1 | 0 | 0 |
| mdfaijul | 6 | 2 | 0 | 0 |
| Eetusjo | 6 | 1 | 0 | 0 |
| Tixxx | 6 | 0 | 0 | 0 |
| pavithraes | 6 | 0 | 0 | 0 |