Weekly GitHub Report for XLA: October 20, 2025 - October 27, 2025 (12:01:21)
Weekly GitHub Report for XLA
Thank you for subscribing to our weekly newsletter! Each week, we deliver a comprehensive summary of your GitHub project's latest activity right to your inbox, including an overview of your project's issues, pull requests, contributors, and commit activity.
Table of Contents
I. News
1.1 Recent Version Releases:
No recent version releases were found.
1.2 Version Information:
No version release information was available to summarize for this period.
II. Issues
2.1 Top 5 Active Issues:
We consider active issues to be issues that have been commented on most frequently within the last week. Bot comments are omitted.
- WhileThunk may block in async mode: This issue addresses a problem where the WhileThunk operation in the XLA CPU backend may block for a long time during asynchronous execution if the number of iterations is large, despite the loop body being fast enough to run synchronously. The reporter suggests potential solutions such as using a cost model to decide when to execute asynchronously or implementing a timeout to switch to asynchronous execution after a threshold, and provides a detailed test case and logs illustrating dependency and buffer usage issues that might contribute to the blocking behavior.
- The discussion includes attempts to reproduce the issue with a simplified test case involving a large matrix row summation using a while loop, observations about dependency overlaps in the execution graph, and detailed logs of thunk execution and buffer usage; contributors request more information and the reporter shares extensive code and diagnostic output while continuing to investigate the root cause of the blocking in asynchronous mode.
- Number of comments this week: 5
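The reporter's simplified test case is summarized above rather than reproduced; the following is only a rough sketch of that kind of workload, assuming jax.lax.while_loop on the CPU backend lowers to a WhileThunk, with illustrative sizes rather than the reporter's exact values.

```python
# Rough sketch (not the reporter's exact repro): a while loop with many cheap
# iterations that sums the rows of a large matrix, run on the CPU backend.
# Assumption: jax.lax.while_loop lowers to a WhileThunk in XLA:CPU.
import jax
import jax.numpy as jnp

def sum_rows(matrix):
    n_rows = matrix.shape[0]

    def cond(state):
        i, _ = state
        return i < n_rows

    def body(state):
        i, acc = state
        # Each iteration does very little work, so dispatching the loop
        # asynchronously adds overhead without hiding any latency.
        return i + 1, acc + matrix[i]

    init = (jnp.int32(0), jnp.zeros(matrix.shape[1], dtype=matrix.dtype))
    _, total = jax.lax.while_loop(cond, body, init)
    return total

# Many iterations with a tiny body: the regime described in the issue.
matrix = jnp.ones((100_000, 8), dtype=jnp.float32)
print(jax.jit(sum_rows)(matrix))
```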
Since there were fewer than 5 open issues, all of the open issues have been listed above.
2.2 Top 5 Stale Issues:
We consider stale issues to be issues that have had no activity within the last 30 days. The team should work together to get these issues resolved and closed as soon as possible.
- New nvshmem rule breaks the build: This issue reports a build failure caused by a new nvshmem rule introduced in a recent pull request, which leads to an error related to the absence of a getenv method in the repository_ctx object during the CUDA configuration step. The reporter is seeking guidance on whether they need to update their side to resolve this error, particularly in relation to changes mentioned for JAX, or if the fix must come from the open_xla project, along with an estimated timeline for addressing the problem.
- Failed to Parse MLIR generated by Torchax: This issue describes a problem encountered when exporting a PyTorch model to MLIR using the Torchax export API, where the generated MLIR fails to parse due to an unregistered operation 'vhlo.rsqrt_v2' in the VHLO dialect. The user is attempting to compile the exported model with XLA AOT but faces deserialization errors with StableHLO, despite using compatible versions of Torch, TorchXLA, and building XLA from the corresponding commit, and has provided code snippets and bytecode samples to assist in troubleshooting.
- support bazel modules: This issue requests the adoption of Bazel modules within the project, highlighting that Bazel modules have seen significant usage and benefits. It specifically points out that XLA is currently the only package in the user's Bazel build that does not support these modules, suggesting a need for compatibility improvements.
- Gpu collective performance model bug: This issue addresses a bug in the gpu_collective_performance model where the update to lowLatencyBandwidth for AMD links was correctly applied as 8 times the per_link bandwidth, but the corresponding update for the CUDA section was omitted. As a result, invoking the gpu_collective_performance model with H100 GPU settings leads to a failure, indicating incomplete handling of bandwidth parameters across different GPU architectures. A small numeric sketch of the intended aggregation appears after this list.
- Cross compile to ARM with custom gcc: This issue concerns difficulties encountered when attempting to cross-compile the XLA project from an x86 architecture to ARM64 using a custom GCC compiler. The user is unable to prevent the Bazel build system from producing an x86 binary despite using the --config=cross_compile_linux_arm64 flag and is seeking guidance on the correct approach to achieve successful cross-compilation.
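For the collective performance model item above, the described fix amounts to applying the same per-link aggregation on both vendor code paths. A rough numeric sketch, using hypothetical names and figures rather than XLA's actual gpu_collective_performance constants:

```python
# Hypothetical names and numbers, not XLA's real constants: the reported bug
# is that the 8x per-link aggregation was applied on the AMD path but omitted
# from the CUDA path, so H100 settings hit an unhandled case.
def low_latency_bandwidth(per_link_gbps: float, links_per_gpu: int = 8) -> float:
    """Aggregate low-latency bandwidth across all links of one GPU."""
    return per_link_gbps * links_per_gpu

# Example: a hypothetical 50 GB/s per link across 8 links gives 400 GB/s.
print(low_latency_bandwidth(50.0))  # 400.0
```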
2.3 Open Issues
This section lists, groups, and then summarizes issues that were created within the last week in the repository.
Issues Opened This Week: 2
Summarized Issues:
- Performance issues with WhileThunk in XLA CPU backend: The WhileThunk operation can cause significant blocking during asynchronous execution when loop iterations are large, leading to performance degradation even if the loop body is fast enough for synchronous execution. Potential solutions discussed include using a cost model or timeout to better decide when to run asynchronously.
- issues/33048
- Compilation failures with dynamic shapes in XLA CPU C API: Running JAX-exported stablehlo modules with dynamic or bounded dynamic shapes fails due to unbounded dynamism being disabled and missing implementations for certain custom calls. The user requests guidance on enabling support for dynamic batch sizes as outlined in the stablehlo documentation.
- issues/33092
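The dynamic-shape issue above involves StableHLO exported from JAX with a dynamic batch dimension. A minimal sketch of producing such a module, assuming the jax.export API available in recent JAX releases (details vary across versions):

```python
# Sketch only: assumes a recent JAX with the jax.export module. It produces a
# StableHLO module whose leading (batch) dimension is symbolic, i.e. dynamic,
# which is the kind of module the issue tries to run through the XLA CPU C API.
import jax
import jax.numpy as jnp
from jax import export

def model(x):
    # x has shape (batch, 8); the batch size is left symbolic below.
    return jnp.tanh(x @ jnp.ones((8, 4), dtype=x.dtype))

# "b" stands in for the runtime batch size.
input_spec = jax.ShapeDtypeStruct(export.symbolic_shape("b, 8"), jnp.float32)
exported = export.export(jax.jit(model))(input_spec)

stablehlo_text = exported.mlir_module()  # StableHLO text with a dynamic dim
print(stablehlo_text[:400])
```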
2.4 Closed Issues
This section lists, groups, and then summarizes issues that were closed within the last week in the repository. This section also links the associated pull requests if applicable.
Issues Closed This Week: 0
Summarized Issues:
As of our latest update, there were no issues closed in the project this week.
2.5 Issue Discussion Insights
This section will analyze the tone and sentiment of discussions within this project's open and closed issues that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.
Based on our analysis, there are no instances of toxic discussions in the project's open or closed issues from the past week.
III. Pull Requests
3.1 Open Pull Requests
This section provides a summary of pull requests that were opened in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.
Pull Requests Opened This Week: 14
Key Open Pull Requests
1. Execute all xla tests in one command: This pull request aims to adapt the ROCm continuous integration workflow by enabling the execution of all XLA unit tests in a single command, including forcing multi-GPU tests to run locally, thereby improving test coverage and efficiency without introducing functional changes.
- URL: pull/33078
- Merged: No
2. [XLA:GPU][oneAPI] SYCL memcpy functions and tests: This pull request implements SYCL memcpy functions along with corresponding tests and introduces miscellaneous features in the sycl_gpu_runtime such as functions to retrieve SYCL frequency and synchronize streams, enhancing GPU support for oneAPI within the XLA project.
- URL: pull/32921
- Merged: No
3. [XLA:CPU][oneDNN] Delete oneDNN code corresponding to the legacy runtime: This pull request proposes deleting the existing oneDNN code in the XLA:CPU component that corresponds to the legacy runtime.
- URL: pull/32926
- Merged: No
- Associated Commits: 01513
Other Open Pull Requests
- oneDNN support and fixes: Multiple pull requests enhance oneDNN integration by fixing failing tests and F16 regressions through expanded type promotion conditions and enabling float support for custom calls and graph execution. Additionally, quantization support is introduced in the XLA CPU backend using oneDNN, following a specified RFC design.
- [pull/32934, pull/32959]
- NCCL performance and configuration improvements: Several pull requests focus on optimizing NCCL behavior for GPUs by setting the maximum channels to 32 for Blackwell GPUs to avoid regressions, prioritizing NCCL collective commands in the GPU command buffer to improve compute-communication overlap, and adding a warm-up iteration in the GPU backend to handle unsupported CUDA graph API calls during NCCL setup. A brief sketch of the channel-limit knob involved appears after this list.
- [pull/32970, pull/32993, pull/33073]
- ROCm backend updates and fixes: Updates to the ROCm platform include bumping the ROCm version to 7.0.2 for better features and performance, fixing scheduling logic by removing forced delays to align with earliest scheduling intentions, and relaxing spawn strategies for remote build execution to enable remote-only test execution.
- [pull/33016, pull/33076, pull/33085]
- cuDNN and CUDA graph compatibility fixes: A pull request adds a NumericOptions field to restrict cuDNN plan selection to those supporting CUDA graphs, fixing test failures caused by fallback to cuBLAS without CUDA graph support, and adjusts note placements in cuda_dnn.cc to ensure proper effect.
- [pull/33106]
- SYCL memory management implementation: One pull request implements SYCL memory management functions and adds tests for the XLA GPU oneAPI backend, expanding support for SYCL in XLA.
- [pull/32918]
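The NCCL pull requests referenced above tune XLA internals (channel limits, command-buffer priorities, a warm-up iteration) and require no user action. Purely as an illustration of the channel-limit knob involved, NCCL exposes the NCCL_MAX_NCHANNELS environment variable, which caps how many channels NCCL may use; a hedged sketch of setting it from Python before any collectives run:

```python
# Illustration only: the cited PR changes XLA's internal defaults for
# Blackwell GPUs rather than requiring this. NCCL_MAX_NCHANNELS is a standard
# NCCL environment variable; it must be set before NCCL initializes, so it is
# exported here before importing JAX or running any collective.
import os

os.environ.setdefault("NCCL_MAX_NCHANNELS", "32")

import jax
import jax.numpy as jnp

# Any multi-GPU collective from here on is limited to at most 32 NCCL channels.
x = jnp.ones((1024,))
print(jax.device_count(), float(x.sum()))
```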
3.2 Closed Pull Requests
This section provides a summary of pull requests that were closed in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.
Pull Requests Closed This Week: 5
Key Closed Pull Requests
1. [ROCm] Refactor testing scripts: This pull request proposes a refactor of ROCm-specific testing scripts by partially upstreaming changes from previous ROCm xla contributions to improve internal CI validation pipelines, while skipping some asan/tsan modifications for now.
- URL: pull/32960
- Merged: No
2. [ROCm] Introduce pool name for rbe: This pull request proposes introducing a separate pool name for GPU tests execution in the ROCm RBE configuration to support better management of GPU test jobs in the continuous integration system.
- URL: pull/32954
- Merged: No
3. Allow mixed precision operands for async collective permute: This pull request proposes allowing mixed precision operands for asynchronous collective permute operations in the verifier to fix issue #32845, includes tests to ensure verifier compatibility, and has been manually tested with a JAX repro.
- URL: pull/32905
- Merged: No
- Associated Commits: f44fa
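The mixed-precision collective-permute change above was reportedly validated with a JAX repro that is not included in the report. The following is only a generic sketch of the underlying operation on a multi-device setup; whether mixed-precision operands actually appear in the resulting async collective-permute HLO depends on XLA's lowering and is not controlled by this user-level code.

```python
# Generic sketch only: not the PR author's repro. Rotates each device's bf16
# shard to the next device with a collective permute under pmap.
import jax
import jax.numpy as jnp

n = jax.device_count()

def rotate(x):
    # Send each device's shard to the next device, cyclically.
    perm = [(i, (i + 1) % n) for i in range(n)]
    return jax.lax.ppermute(x, axis_name="d", perm=perm)

shards = jnp.arange(n * 4).reshape(n, 4).astype(jnp.bfloat16)
print(jax.pmap(rotate, axis_name="d")(shards))
```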
Other Closed Pull Requests
- ROCm CI and Bazel Configuration Updates: These pull requests focus on improving ROCm internal CI validation pipelines by fixing test scripts and adding a bazel disk cache to speed up builds and tests. Additionally, a CI-specific bazelrc file is introduced as a temporary workaround to manage dependencies on existing ROCm bazelrc files until the split logic in CI is removed.
- pull/32951, pull/33008
3.3 Pull Request Discussion Insights
This section will analyze the tone and sentiment of discussions within this project's open and closed pull requests that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.
Based on our analysis, there are no instances of toxic discussions in the project's open or closed pull requests from the past week.
IV. Contributors
4.1 Contributors
Active Contributors:
We consider an active contributor in this project to be any contributor who has made at least 1 commit, opened at least 1 issue, created at least 1 pull request, or made more than 2 comments in the last month.
If there are more than 10 active contributors, the list is truncated to the top 10 based on contribution metrics for better clarity.
| Contributor | Commits | Pull Requests | Issues | Comments |
|---|---|---|---|---|
| alekstheod | 39 | 5 | 0 | 3 |
| meteorcloudy | 17 | 1 | 0 | 0 |
| dimvar | 7 | 5 | 0 | 0 |
| rao-ashish | 3 | 3 | 0 | 5 |
| sergachev | 6 | 3 | 0 | 0 |
| shawnwang18 | 5 | 4 | 0 | 0 |
| mmakevic-amd | 7 | 2 | 0 | 0 |
| hsharsha | 6 | 1 | 0 | 1 |
| draganmladjenovic | 3 | 3 | 0 | 0 |
| mtsokol | 2 | 2 | 0 | 2 |