Weekly Project News

Weekly GitHub Report for XLA: February 01, 2026 - February 08, 2026 (15:58:20)

Weekly GitHub Report for XLA

Thank you for subscribing to our weekly newsletter! Each week, we deliver a comprehensive summary of your GitHub project's latest activity right to your inbox, including an overview of your project's issues, pull requests, contributors, and commit activity.


Table of Contents

  • I. News
    • 1.1. Recent Version Releases
    • 1.2. Other Noteworthy Updates
  • II. Issues
    • 2.1. Top 5 Active Issues
    • 2.2. Top 5 Stale Issues
    • 2.3. Open Issues
    • 2.4. Closed Issues
    • 2.5. Issue Discussion Insights
  • III. Pull Requests
    • 3.1. Open Pull Requests
    • 3.2. Closed Pull Requests
    • 3.3. Pull Request Discussion Insights
  • IV. Contributors
    • 4.1. Contributors

I. News

1.1 Recent Version Releases:

No recent version releases were found.

1.2 Other Noteworthy Updates:

No other noteworthy updates were found for the project this week.

II. Issues

2.1 Top 5 Active Issues:

We consider active issues to be issues that have been commented on most frequently within the last week. Bot comments are omitted.

  1. [STAT:AWAITING OPENXLA-ENG] [ERR:PERFORMANCE] ConvertElementType bool-to-int returns incorrect result for nonstandard booleans: This issue addresses a problem in the XLA compiler where converting nonstandard boolean values to integers using ConvertElementType returns incorrect results, violating the StableHLO specification that true should convert to one and false to zero. The discussion highlights challenges in fixing this behavior across different backends, the impact on related NumPy operations like view, and the potential need to extend XLA's BitcastConvertType to support boolean inputs and outputs to fully resolve the issue.
    • The comments reveal attempts to fix the conversion logic in CPU/GPU backends while noting difficulties with the interpreter and compiler optimizations; they also discuss related test failures due to differences between ConvertElementType and BitcastConvertType, consider the implications for JAX's numpy.view function, and explore extending XLA to support boolean types in bitcast conversions as a possible solution. A minimal reproduction sketch appears below.
    • Number of comments this week: 5

Since there were fewer than 5 active issues this week, all of them are listed above.
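To make the issue above concrete, here is a minimal sketch (Python, using NumPy and JAX) of what a "nonstandard" boolean is and where the reported misbehavior would show up. The input value and the jitted conversion are illustrative, and whether the raw byte survives the host-to-device transfer depends on the backend.

```python
import numpy as np
import jax
import jax.numpy as jnp

# Construct a "nonstandard" boolean: dtype bool, but the underlying byte is 2.
nonstd = np.array([2], dtype=np.uint8).view(np.bool_)

# StableHLO requires bool-to-int conversion to yield 1 for any true value.
# The issue reports that XLA's ConvertElementType can leak the raw byte instead.
to_int = jax.jit(lambda x: x.astype(jnp.int32))
print(to_int(nonstd))  # spec says [1]; affected builds reportedly return the raw byte
```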

2.2 Top 5 Stale Issues:

We consider stale issues to be issues that have had no activity within the last 30 days. The team should work together to get these issues resolved and closed as soon as possible.

As of our latest update, there are no stale issues for the project this week.

2.3 Open Issues

This section lists, groups, and then summarizes issues that were created within the last week in the repository.

Issues Opened This Week: 3

Summarized Issues:

  • Conversion and Type Handling Issues: The ConvertElementType operation in XLA incorrectly handles nonstandard boolean values when converting to integers, failing to comply with the StableHLO specification. This results in improper conversion behavior that deviates from expected standards.
  • issues/37159
  • Precision and Compliance Bugs: There is a precision bug in XLA on GPU where float32 division does not produce exact IEEE 754 results, as the current implementation uses a fast but approximate division method lacking full rounding support. This can cause expressions like a/a to differ from 1.0, violating expected floating-point behavior. A reproduction sketch follows this list.
  • issues/37181
  • Tool Usage and TPU Support: Users have asked whether and how the tools run_hlo_module and hlo-opt can be run on Google TPUs, seeking guidance on TPU compatibility and usage. No resolution or instructions have been provided yet.
  • issues/37277
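As referenced above, here is a hedged reproduction sketch for the float32 division issue. Which inputs actually trigger the mismatch depends on the GPU and build flags, so the value below is illustrative only.

```python
import jax
import jax.numpy as jnp

@jax.jit
def self_div(x):
    # IEEE 754 requires x / x == 1.0 exactly for any finite, nonzero x.
    return x / x

# The issue reports that XLA:GPU may lower f32 division to a fast
# reciprocal-multiply approximation without full rounding support, in which
# case self_div can return a value other than 1.0 for some inputs.
x = jnp.float32(1.0000001)
print(self_div(x))
```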

2.4 Closed Issues

This section lists, groups, and then summarizes issues that were closed within the last week in the repository. This section also links the associated pull requests if applicable.

Issues Closed This Week: 0

Summarized Issues:

As of our latest update, there were no issues closed in the project this week.

2.5 Issue Discussion Insights

This section will analyze the tone and sentiment of discussions within this project's open and closed issues that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.

Based on our analysis, there are no instances of toxic discussions in the project's open or closed issues from the past week.


III. Pull Requests

3.1 Open Pull Requests

This section provides a summary of pull requests that were opened in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.

Pull Requests Opened This Week: 10

Key Open Pull Requests

1. [XLA:GPU][oneAPI] Fix platform error in stream executor tests with SYCL backend: This pull request fixes the platform error "Could not find registered platform with name: 'cuda'" encountered when running certain stream executor tests with the SYCL backend, by ensuring the TENSORFLOW_USE_SYCL macro is properly applied in the hermetic build environment using rules_ml_toolchain.

  • URL: pull/37235
  • Associated Commits: d18e8, 202ee, 32cc6

2. [GPU][NFC] Rename H100 and B200 test data for clarity.: This pull request renames the H100 and B200 test data to improve clarity by correctly distinguishing them from RTX models, and also removes an unused dependency.

  • URL: pull/37309
  • Associated Commits: 93a4f, b0b5c

3. add conv fusion rewriter: This pull request introduces a new convolution fusion rewriter that refactors XLA cuDNN convolution operations to enable several fusion types, including epilogue activation fusion and integer and FP8 convolution fusion. It also enforces NHWC layout, supports float normalization to avoid unnecessary type conversions, constructs backward convolution windows consistently with forward convolutions, normalizes layouts, and reuses existing unit tests adapted to the new cuDNN frontend constraints.

  • URL: pull/37311
  • Associated Commits: b7231, bae0b

Other Open Pull Requests

  • Swish Activation Fusion Logic Update: This pull request modifies the swish activation fusion logic to skip fusion when the pre-activation output is used, addressing the lack of SILU_AUX support in Hipblaslt. It also prevents redundant GEMM operations by introducing a function to count final users rather than direct users. A sketch of the fusion constraint follows this list.
  • pull/37287
  • Compiler Flag and Build Configuration Enhancements: These pull requests add the missing --offload-compress clang flag to the hipcc compiler to reduce binary size and fix a linking error, and enable the --config=warnings option as the default build configuration to treat compiler warnings as errors. Both changes improve build reliability and error detection during development.
  • pull/37370, pull/37430
  • Strongly-Typed Identifiers Consistency: This pull request updates the PjRt component to use the same strongly-typed identifiers—ProcessId, DeviceId, and ChipId—as those in the xla/runtime, ensuring consistency across the codebase. This alignment helps maintain type safety and uniformity in identifier usage. A Python analogue of the strong-typing idea follows this list.
  • pull/37395
  • CollectiveMemory API Usage and Testing: These pull requests implement the use of CollectiveMemory in testing the CollectiveKernel to verify correct memory space assignment and introduce APIs to use CollectiveMemory for acquiring peer addresses in the xla:gpu module. Both ensure consistency and correctness in handling collective memory requests and peer address acquisition.
  • pull/37414, pull/37490
  • Concurrency API Enhancement: This pull request introduces a new API called Future::Flatten() to the project, enhancing its concurrency capabilities. This addition aims to improve the handling of asynchronous operations.
  • pull/37473
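As a conceptual illustration of the Future::Flatten() semantics in the item above, the following Python asyncio analogue collapses a future of a future into a single future. The real API is C++; this sketch only mirrors the intended behavior.

```python
import asyncio

# Conceptual analogue of Flatten: Future[Future[T]] -> Future[T].
async def flatten(outer: asyncio.Future) -> object:
    inner = await outer   # resolve the outer future to obtain the inner future
    return await inner    # resolve the inner future to the final value

async def demo():
    loop = asyncio.get_running_loop()
    inner, outer = loop.create_future(), loop.create_future()
    outer.set_result(inner)
    inner.set_result(42)
    print(await flatten(outer))  # 42

asyncio.run(demo())
```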
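The swish fusion constraint from earlier in this list can also be sketched. Swish (SiLU) is x * sigmoid(x), and the activation can fuse into the preceding GEMM as an epilogue only when the pre-activation output has no other users; the model function and shapes below are illustrative.

```python
import jax

def model(a, b):
    y = a @ b                   # GEMM: pre-activation output
    z = y * jax.nn.sigmoid(y)   # swish / SiLU epilogue candidate
    # Returning y as well gives the pre-activation value a second user, so on
    # backends without SILU_AUX support (e.g., hipBLASLt, per the PR) the
    # epilogue fusion must be skipped rather than recomputing the GEMM.
    return z, y
```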
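Finally, the strongly-typed identifier change can be illustrated with a Python analogue. The real ProcessId, DeviceId, and ChipId are C++ types in xla/runtime; typing.NewType here only mimics the idea.

```python
from typing import NewType

# Distinct wrapper types prevent accidentally passing one kind of ID where
# another is expected; a static type checker flags the mix-up.
ProcessId = NewType("ProcessId", int)
DeviceId = NewType("DeviceId", int)
ChipId = NewType("ChipId", int)

def device_of(pid: ProcessId) -> DeviceId:
    return DeviceId(int(pid))  # explicit conversion keeps intent visible
```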

3.2 Closed Pull Requests

This section provides a summary of pull requests that were closed in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.

Pull Requests Closed This Week: 27

Key Closed Pull Requests

1. [Test only do not merge] Test explicit tests for mgpu: This pull request is a test-only submission intended to add explicit tests for multi-GPU (mgpu) functionality, without merging into the main codebase.

  • URL: pull/37214
  • Associated Commits: 45517, b7b4e, d9942, 568bb, a5cee, 57ccf, b806e, 454ff, 3e57d, 65f9f, 488a8, 6f043, 56b54, 3cd22, 527df, 65d86

2. [ROCm] Enable backends/gpu/autotuner unit tests on ROCm: This pull request makes several autotuner unit tests platform-independent by removing dependencies on NVPTXCompiler and enables these tests on the ROCm platform to expand ROCm test coverage, ensuring all tests either pass or are skipped when not relevant.

  • URL: pull/36553
  • Associated Commits: 08bc5, 995d7, 7fcfb, e25cc, 27b72, d501c

3. [xla:gpu] Clean up execution stream assignment: This pull request attempts to simplify and clean up the execution stream assignment for GPU operations by organizing it around execution scopes, addressing asynchronous instruction handling, adding support for pipelined send/receive operations, and fixing a crash caused by out-of-order instruction processing, although it was not merged.

  • URL: pull/37389
  • Associated Commits: 62aad, 8d8fc, 1daac

Other Closed Pull Requests

  • Communicator and Device Group Improvements: This pull request fixes deadlocks caused by improper use of communicator split APIs and renames participant_groups to device_groups to better reflect their usage during communicator acquisition. It also enforces explicit checks for rank participation and enhances logging with debug information and timing for communicator initialization.
    • pull/36981
  • Collective Memory and Multicast Support: These pull requests introduce support for multicast memory in CollectiveMemory and CollectiveMemoryRequests as an optimized alternative to CollectiveMultimem, with plans to migrate existing users. Additionally, CollectiveMultimem support is removed from the FFI to fully replace it with the new collective memory APIs.
    • pull/37179, pull/37408
  • Execution Stream and Instruction Processing Enhancements: This pull request restructures execution stream assignment around execution scopes to simplify the XLA GPU backend and fixes a crash caused by out-of-order instruction processing.
    • pull/37247
  • Intra-process Data Transfer Bug Fix: A bug fix modifies the timing of event allocation and recording for intra-process data transfers to occur immediately after enqueueing dependencies, preventing host thread blocking during dependent executable launches.
    • pull/35456
  • ROCm Backend Fixes and Improvements: These pull requests fix duplicated function issues in the ROCm backend by sharing AsBlasLtEpilogue with added SILU support, update expected error values for ROCm-specific numerical precision mismatches, enable building the xla-opt binary with necessary visibility patches, add an experimental flag for autotuning fusions with Triton, and clean up ROCm build tag filters.
    • pull/36963, pull/37010, pull/37157, pull/37160, pull/37179
  • NCCL DevComm and Symmetric Memory Tests: These pull requests add end-to-end tests for NCCL DevComm on GPUs, including symmetric memory allocation and management during executable initialization, and verify that the same physical memory allocation can be mapped to multiple symmetric memory regions across different communicator sets.
    • pull/37024, pull/37082
  • Test Environment and Script Fixes: This pull request fixes the parallel_gpu_execute script to allow running tests on machines without GPUs, addressing failures in GPU-less test environments.
    • pull/37064
  • Reduce Scatter Creator Bug Fix: A bug fix improves matching logic between AR and Slice operations in the reduce scatter creator to increase replacement opportunities, reducing communication overhead in 3D parallel workloads, and includes new unit tests for verification. A conceptual sketch of the rewrite follows this list.
    • pull/37081
  • Convolution Fusion Rewriter Enhancements: This pull request introduces a new convolution kind assignment pass to determine convolution types as part of splitting the convolution fusion rewriter, accompanied by unit tests for the pass and related utilities.
    • pull/37101
  • PJRT Receive Callback Bug Fix: This pull request fixes a bug in PJRT where receive callbacks were incorrectly handled using the size of send callbacks, causing segmentation faults during Rust bindings development.
    • pull/37129
  • XLA API Standardization: This pull request renames identifiers from node_id to process_id and from nodes to processes to better represent multiple communicating processes on the same host, as a non-functional API standardization.
    • pull/37250
  • Thunk and Command Unification: This pull request proposes unifying the handling of buffer_uses and buffers between Thunk and Command components in the XLA GPU backend as a step towards consolidating commands and thunks.
    • pull/37252
  • FloatNormalization Autotuner Fix for ROCm: This pull request changes the FloatNormalization autotuner to use gpu_compute_capability() instead of cuda_compute_capability() when constructing GpuFloatSupport, addressing potential ROCm platform issues.
    • pull/37289
  • Open Source Layering Check Restoration: This pull request patches brotli and riegeli to restore the --features=layering_check functionality in the open-source build, enabling compile-time errors for missing dependencies and improving OSS contributor productivity.
    • pull/37300
  • JoinFutures Functionality Enhancements: These pull requests implement JoinFutures functionality for Future types carrying payloads and for statically known types, enabling combination of multiple futures into a single future with resolved values, and fix handling of futures completing with errors. A conceptual sketch follows this list.
    • pull/37313, pull/37411, pull/37420
  • XLA:FFI Header Cleanup: This pull request prevents runtime crashes caused by duplicate static registries by clarifying that ffi_api.h is internal and that XLA:FFI users should only depend on ffi.h, marking an initial step towards header dependency cleanup.
    • pull/37470
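The reduce scatter creator rewrite referenced above replaces an all-reduce whose result each device then slices with a single reduce-scatter. A conceptual JAX sketch, assuming a mapped axis named "i" of size n (run under pmap or shard_map; names and shapes are illustrative):

```python
import jax

def all_reduce_then_slice(x, n):
    # AllReduce: every device materializes the full reduced array...
    full = jax.lax.psum(x, axis_name="i")
    i = jax.lax.axis_index("i")
    shard = x.shape[0] // n
    # ...and then keeps only its own slice of it.
    return jax.lax.dynamic_slice_in_dim(full, i * shard, shard)

def reduce_scatter(x):
    # Equivalent result with less communication: each device receives only its
    # reduced shard, which is what the creator pass rewrites the pattern into.
    return jax.lax.psum_scatter(x, axis_name="i", tiled=True)
```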
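For the JoinFutures items, the following Python asyncio analogue mirrors the described semantics; the real API is C++, and asyncio.gather stands in for the join, propagating the first error by default.

```python
import asyncio

# Conceptual analogue: combine futures carrying payloads into a single future
# that resolves to all payloads, failing fast if any input future errors.
async def join_futures(futures):
    return await asyncio.gather(*futures)

async def demo():
    async def value(v, delay):
        await asyncio.sleep(delay)
        return v
    print(await join_futures([value(1, 0.01), value(2, 0.02)]))  # [1, 2]

asyncio.run(demo())
```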

3.3 Pull Request Discussion Insights

This section will analyze the tone and sentiment of discussions within this project's open and closed pull requests that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.

Based on our analysis, there are no instances of toxic discussions in the project's open or closed pull requests from the past week.


IV. Contributors

4.1 Contributors

Active Contributors:

We consider an active contributor in this project to be any contributor who has made at least 1 commit, opened at least 1 issue, created at least 1 pull request, or made more than 2 comments in the last month.

If there are more than 10 active contributors, the list is truncated to the top 10 based on contribution metrics for better clarity.

Contributor           Commits   Pull Requests   Issues   Comments
ezhulenev                  74              18        1          5
alekstheod                 54               6        0          0
bhavani-subramanian        13               1        0          3
leo-amd                    15               1        0          0
nurmukhametov               8               4        0          0
terryysun                   7               1        0          0
mdfaijul                    6               2        0          0
Eetusjo                     6               1        0          0
Tixxx                       6               0        0          0
pavithraes                  6               0        0          0

Access Last Week's Newsletter:

  • Link