Weekly GitHub Report for Xla: January 25, 2026 - February 01, 2026 (21:35:24)
Thank you for subscribing to our weekly newsletter! Each week, we deliver a comprehensive summary of your GitHub project's latest activity right to your inbox, including an overview of your project's issues, pull requests, contributors, and commit activity.
I. News
1.1 Recent Version Releases:
No recent version releases were found.
1.2 Version Information:
No version information was available to summarize for this period.
II. Issues
2.1 Top 5 Active Issues:
We consider active issues to be issues that have been commented on most frequently within the last week. Bot comments are omitted.
- [STAT:AWAITING RESPONSE FROM CONTRIBUTOR] [ERR:BUILD] macos cross compile eigen unary build failure: This issue reports a build failure when cross-compiling the eigen unary component on macOS, specifically due to errors in the Eigen library's PacketMath.h file related to type mismatches during compilation with clang. The user provided detailed build logs showing the compilation error involving vector type initialization and binding issues in the Eigen source code.
- The comment requests additional information from the user, including reproduction steps, a minimal code snippet, and platform details, to assist in diagnosing the build failure.
- Number of comments this week: 1
Since there were fewer than 5 open issues, all of the open issues have been listed above.
2.2 Top 5 Stale Issues:
We consider stale issues to be issues that have had no activity within the last 30 days. The team should work together to get these issues resolved and closed as soon as possible.
As of our latest update, there are no stale issues for the project this week.
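The staleness rule above reduces to a date comparison. A minimal sketch in Python, assuming each issue carries a last-activity timestamp; the function name `is_stale` and the sample dates are hypothetical and are not taken from the tooling that produces this report:

```python
from datetime import datetime, timedelta, timezone

# Threshold stated in this report: no activity within the last 30 days.
STALE_AFTER = timedelta(days=30)

def is_stale(last_activity: datetime, now: datetime) -> bool:
    """Return True when the most recent activity is more than 30 days old."""
    return now - last_activity > STALE_AFTER

# Hypothetical sample timestamps, for illustration only.
now = datetime(2026, 2, 1, tzinfo=timezone.utc)
recent_activity = datetime(2026, 1, 28, tzinfo=timezone.utc)
old_activity = datetime(2025, 12, 1, tzinfo=timezone.utc)

print(is_stale(recent_activity, now))
print(is_stale(old_activity, now))
```

Whether an issue last touched exactly 30 days ago counts as stale is not specified in the report; this sketch treats it as not stale.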
2.3 Open Issues
This section lists, groups, and then summarizes issues that were created within the last week in the repository.
Issues Opened This Week: 4
Summarized Issues:
- Build and Compilation Errors: The macOS cross-compilation of eigen_unary.cc fails due to clang compiler errors related to type mismatches in Eigen's PacketMath.h when compiling with specific vector types. This issue highlights problems in the build process that prevent successful compilation on certain platforms.
- issues/36849
- Debugging and Metadata Display: There is a request to add a flag in XLA to enable inline display of stack frame information within HLO output metadata, improving debugging clarity by avoiding the need to cross-reference compact stack metadata. This enhancement aims to make debugging more straightforward and efficient.
- issues/36953
- Numerical Computation Errors: The CPU runtime exhibits incorrect behavior when multiplying an array by the maximum fp64 value, resulting in infinities, while multiplying a single element by the same value yields a finite result. This indicates that the multiplication is not performed elementwise as expected, causing inconsistent numerical results.
- issues/37116
- Instruction Fusion Logic Flaws: The InstructionFusion component's IsAlwaysDuplicable function violates the intended semantics of the may_duplicate_ flag by allowing duplication of instructions even when may_duplicate_ is false. This flaw ignores important index-computation and register-pressure costs, complicates maintenance, and undermines consistent fusion policy enforcement.
- issues/37117
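The numerical report (issues/37116) rests on the expectation that multiplying by the maximum fp64 value overflows only for elements whose magnitude exceeds 1, regardless of whether they are processed alone or as part of an array. A minimal sketch of that expectation in plain Python, whose floats are IEEE 754 doubles; it does not reproduce the XLA CPU behavior, and the input values and the helper name `scale_elementwise` are illustrative assumptions:

```python
import math
import sys

FP64_MAX = sys.float_info.max  # largest finite double-precision value

def scale_elementwise(values, factor):
    """Multiply each element by factor independently of its neighbors."""
    return [v * factor for v in values]

# Hypothetical inputs: the first three should stay finite, the last overflows.
values = [0.25, 0.5, 1.0, 2.0]
scaled = scale_elementwise(values, FP64_MAX)

for v, s in zip(values, scaled):
    status = "finite" if math.isfinite(s) else "overflow"
    print(f"{v} * FP64_MAX -> {status}")
```

Under elementwise semantics, an infinity in one position says nothing about the others, so an all-infinite result for inputs of magnitude at most 1 would point to a problem in how the product is computed.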
2.4 Closed Issues
This section lists, groups, and then summarizes issues that were closed within the last week in the repository. This section also links the associated pull requests if applicable.
Issues Closed This Week: 0
Summarized Issues:
As of our latest update, there were no issues closed in the project this week.
2.5 Issue Discussion Insights
This section will analyze the tone and sentiment of discussions within this project's open and closed issues that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.
Based on our analysis, there are no instances of toxic discussions in the project's open or closed issues from the past week.
III. Pull Requests
3.1 Open Pull Requests
This section provides a summary of pull requests that were opened in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.
Pull Requests Opened This Week: 23
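The selection rule described above (top three by commit count as 'key', at most 25 shown) can be sketched as follows. This is an illustration only: the function name, the dictionary layout, and the sample data are hypothetical, and applying the 25-item cap before ranking is an assumption, since the report does not state the order of the two steps.

```python
def select_key_pull_requests(pull_requests, key_count=3, display_limit=25):
    """Split pull requests into 'key' (most commits) and 'other' groups."""
    # Assumption: the display cap is applied before ranking.
    displayed = pull_requests[:display_limit]
    # sorted() is stable, so ties keep their original relative order.
    ranked = sorted(displayed, key=lambda pr: pr["commits"], reverse=True)
    key = ranked[:key_count]
    key_ids = {pr["id"] for pr in key}
    others = [pr for pr in displayed if pr["id"] not in key_ids]
    return key, others

# Hypothetical sample data; these are not commit counts from the report.
sample = [
    {"id": "a", "commits": 1},
    {"id": "b", "commits": 5},
    {"id": "c", "commits": 3},
    {"id": "d", "commits": 5},
    {"id": "e", "commits": 2},
]
key, others = select_key_pull_requests(sample)
print([pr["id"] for pr in key])
print([pr["id"] for pr in others])
```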
Key Open Pull Requests
1. [ROCm] CI: Add ROCm CI support to GitHub Actions workflow: This pull request adds AMD ROCm GPU continuous integration support to the GitHub Actions workflow and Python build system by configuring sequential single and multi-GPU test suites, migrating build logic from shell scripts to Python, defining new build types targeting modern AMD architectures, and enabling flexible test skipping, thereby ensuring OpenXLA commits are validated on ROCm platforms with consolidated and consistent infrastructure.
- URL: pull/36893
- Associated Commits: 0160f, 33627, 5fd0b, 6c682, f8ef7, 90316, cfd64, dc80f, 5ca07, c0cc5, 64281, 424e2
2. [ROCm] Switch to k8s workers for rocm CI: This pull request switches the ROCm continuous integration (CI) workflow to use Kubernetes (k8s) workers, aiming to improve the infrastructure for running tests and benchmarks related to ROCm workloads.
- URL: pull/36924
3. [ROCm] Distinguish infra errors vs test failures: This pull request introduces a mechanism to distinguish between infrastructure errors and test failures in the ROCm testing pipelines to reduce noise and false positives caused by flaky infrastructure.
- URL: pull/36881
Other Open Pull Requests
- ROCm and GPU Backend Improvements: Multiple pull requests focus on enhancing ROCm backend stability and performance, including adding a flag to disable hipblaslt swish activation fusion to address LLM model performance issues, fixing duplicated function issues to restore epilogue support, and re-enabling miopen autotune for better convolution support. These changes collectively improve ROCm's functionality and compatibility with GPU workloads.
- [pull/36826, pull/36963, pull/36845, pull/37064]
- Deadlock and Communicator Split Fixes: Several pull requests address deadlocks and synchronization issues in GPU communicator splitting by enforcing participation checks, invalidating cached cliques, and improving logging and timing for communicator initialization. These fixes prevent indefinite waits and improve multi-GPU collective operation reliability.
- [pull/36981, pull/37024, pull/36882]
- Performance and Integration Enhancements: A pull request introduces a new XLA flag for asynchronous collective multi-streaming on GPUs, demonstrating a 16% performance improvement on a 1024-GPU run, alongside minor integration tweaks and added collective tests. Another PR adds an end-to-end test for NCCL DevComm with symmetric memory management to optimize collective communication performance.
- [pull/36839, pull/37010]
- Testing and CI Pipeline Additions: New testing infrastructure includes a ROCm JAX continuous integration pipeline to test JAX against the latest XLA changes, a test verifying multiple symmetric memory mappings, and a fix to allow running GPU tests on machines without GPUs. These efforts enhance test coverage and CI robustness across different environments.
- [pull/37056, pull/37081, pull/37063, pull/37026]
- LLVM and SPIRV Backend Fixes: Fixes include adding support for range metadata in the LLVM-SPIRV backend to prevent runtime crashes and applying zero initializers for scalar or empty global constants on the SYCL platform to avoid test failures. These changes ensure valid IR and stable execution on SPIRV backends.
- [pull/36845, pull/37026]
- Bug Fixes and Workarounds: Pull requests implement a workaround for Thor devices by hardcoding memory parameters due to driver issues, update expected error values to fix numerical precision mismatches in ROCm tests, and fix a Use-After-Free race condition in the CPU JIT compiler by introducing a task lifetime indirection. These fixes address hardware quirks, test stability, and compiler reliability.
- [pull/36963, pull/36970, pull/37101]
- Code Refactoring and New Passes: Refactoring efforts include improving the OneDnnResources component for better resource management and introducing a convolution kind assignment pass with unit tests to support convolution fusion rewriting. These changes improve code maintainability and add new functionality for convolution handling.
- [pull/36806, pull/37101]
- Error Message Suppression: A pull request suppresses harmless CUDA-related error messages on ROCm platforms caused by double plugin registration, aligning ROCm behavior with CUDA and reducing noise during training.
- [pull/36882]
- Communication Performance Optimization: Fixes to matching logic in reduce scatter creators address missing replacement opportunities, resulting in significant communication performance improvements in 3D parallel workloads, supported by new unit tests.
- [pull/37074]
3.2 Closed Pull Requests
This section provides a summary of pull requests that were closed in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.
Pull Requests Closed This Week: 12
Key Closed Pull Requests
1. [WIP] test with more UTs: This pull request aims to increase test coverage on the ROCm side by adding more unit tests and improving the CI/CD pipeline to better handle multi-GPU testing and test execution for the current pull request.
- URL: pull/36513
- Associated Commits: fb121, 9ba2a, bd739, a6a5f, 2d4e3, 11388, 5f064, e0164, 5f9cb, b8a6d, 9627f, 06687, 43611, 6eaa2, d9088, c8951, 27e75
2. Sync gha docker image with rbe docker image: This pull request aims to synchronize the GitHub Actions (GHA) Docker image with the Remote Build Execution (RBE) Docker image to ensure that the local build and run environment matches the RBE executor environment, thereby preventing inconsistencies and errors during the build process.
- URL: pull/36815
3. [xla:gpu] Use command buffer resources to track command executor record state: This pull request proposes using command buffer resources to track the command executor record state, adds a record ID to distinguish multiple recordings of the same executor, and updates the related documentation.
- URL: pull/36521
Other Closed Pull Requests
- Bug fixes and typo corrections in parsing and documentation: Multiple pull requests address errors in code and documentation, including fixing a typo in the HLO parser to correctly parse the rhs_dilate field and correcting a dtype reference from 'bf316' to 'bf16' in the shapes documentation. These fixes ensure accurate parsing and clearer documentation for users and developers.
- [pull/36803, pull/36798]
- GPU backend improvements and test coverage: Several pull requests enhance GPU-related functionality by fixing build breaks in GPU buffer tests and extending test coverage to all supported GPU architectures, removing unused state in GPU command initialization to prepare for thunk and command unification, and adding support for NCCL scalable initialization to handle multiple clique IDs and improve logging and deadlock handling. These changes improve stability, scalability, and maintainability of GPU operations in XLA.
- [pull/36599, pull/37029, pull/36901]
- Debugging enhancements: Pull requests focus on improving debugging capabilities by limiting verbosity and standardizing logging for collective GPU operations, improving rendezvous debugging to identify stuck threads, and introducing a new debug option to print inline stack frames with source file and line number information in HLO instructions. These improvements facilitate easier diagnosis of issues during development and runtime.
- [pull/36846, pull/36975]
- Memory handling fix for host offloaded outputs: One pull request fixes the handling of memory space information in entry_computation_layout for host offloaded outputs in the compute_on custom call, enabling workloads like MaxText to correctly offload computations to host memory and adding new unit tests to verify this behavior. This fix ensures correct memory management for specific offloading scenarios.
- [pull/36525]
3.3 Pull Request Discussion Insights
This section will analyze the tone and sentiment of discussions within this project's open and closed pull requests that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.
Based on our analysis, there are no instances of toxic discussions in the project's open or closed pull requests from the past week.
IV. Contributors
4.1 Contributors
Active Contributors:
We consider an active contributor in this project to be any contributor who has made at least 1 commit, opened at least 1 issue, created at least 1 pull request, or made more than 2 comments in the last month.
If there are more than 10 active contributors, the list is truncated to the top 10 based on contribution metrics for better clarity.
| Contributor | Commits | Pull Requests | Issues | Comments |
|---|---|---|---|---|
| ezhulenev | 63 | 7 | 1 | 16 |
| alekstheod | 37 | 7 | 0 | 7 |
| leo-amd | 15 | 1 | 0 | 1 |
| mwhittaker | 0 | 0 | 0 | 13 |
| terryysun | 7 | 1 | 0 | 3 |
| bhavani-subramanian | 10 | 0 | 0 | 0 |
| mdfaijul | 6 | 2 | 0 | 0 |
| Eetusjo | 6 | 1 | 0 | 0 |
| Tixxx | 6 | 0 | 0 | 0 |
| nurmukhametov | 5 | 1 | 0 | 0 |
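The activity rule stated above the table can be written as a single predicate. A minimal sketch, assuming the four counts are already aggregated per contributor for the last month; the function name `is_active` is hypothetical, and the two sample rows are copied from the table:

```python
def is_active(commits: int, pull_requests: int, issues: int, comments: int) -> bool:
    """Apply the report's rule: any commit, issue, or PR, or more than 2 comments."""
    return commits >= 1 or issues >= 1 or pull_requests >= 1 or comments > 2

# Rows taken from the table: (commits, pull requests, issues, comments).
mwhittaker = (0, 0, 0, 13)  # qualifies through comments alone
tixxx = (6, 0, 0, 0)        # qualifies through commits alone

print(is_active(*mwhittaker))
print(is_active(*tixxx))
```

The ranking used to truncate the list to ten entries is described only as "contribution metrics", so it is left out of this sketch.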