Weekly GitHub Report for XLA: October 13, 2025 - October 20, 2025 (12:01:32)
Thank you for subscribing to our weekly newsletter! Each week, we deliver a comprehensive summary of your GitHub project's latest activity right to your inbox, including an overview of your project's issues, pull requests, contributors, and commit activity.
Table of Contents
I. News
1.1 Recent Version Releases:
No recent version releases were found.
1.2 Version Information:
No version information was found for this period.
II. Issues
2.1 Top 5 Active Issues:
We consider active issues to be issues that have been commented on most frequently within the last week. Bot comments are omitted.
As of our latest update, there are no active issues with ongoing comments this week.
2.2 Top 5 Stale Issues:
We consider stale issues to be issues that have had no activity within the last 30 days. The team should work together to get these issues resolved and closed as soon as possible.
- New nvshmem rule breaks the build: This issue reports a build failure caused by a new nvshmem rule introduced in a recent pull request, which leads to an error where the repository_ctx object lacks the expected getenv method during the CUDA configuration step. The reporter is seeking guidance on whether they need to update their side to resolve this problem or if the fix must come from the openxla project, along with an estimated timeline for addressing the issue.
- Failed to Parse MLIR generated by Torchax: This issue describes a problem encountered when exporting a PyTorch model to MLIR using the Torchax export API, where the generated MLIR fails to parse due to an unregistered operation 'vhlo.rsqrt_v2' in the VHLO dialect. The user is attempting to compile the exported model with XLA AOT but faces deserialization errors with StableHLO, despite using compatible versions of torch, torch_xla, and building XLA from the corresponding commit, and has provided code snippets and bytecode samples to assist in troubleshooting.
- support bazel modules: This issue requests the adoption of Bazel modules within the project, highlighting that Bazel modules have gained significant usage. The reporter notes that XLA is currently the only package in their Bazel build that lacks support for these modules and inquires about any plans to implement this feature.
- Gpu collective performance model bug: This issue addresses a bug in the gpu_collective_performance model where the recent update correctly adjusts the lowLatencyBandwidth for AMD links but fails to apply the same update to the CUDA section. As a result, invoking the gpu_collective_performance model with H100 GPU settings leads to a failure, indicating incomplete handling of bandwidth parameters across different GPU architectures.
- Cross compile to ARM with custom gcc: This issue concerns difficulties encountered when attempting to cross-compile the XLA project from an x86 architecture to ARM64 using a custom GCC compiler. The user reports that despite using the --config=cross_compile_linux_arm64 flag in the Bazel build system, the build process persistently produces an x86 binary, indicating a possible misconfiguration or missing step in the cross-compilation setup.
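A quick way to confirm the symptom described in the cross-compilation issue above is to inspect the produced binary's ELF header directly rather than trusting the build configuration. A minimal sketch (the binary path in the comment is hypothetical; this is not part of the Bazel toolchain):

```python
import struct

EM_X86_64, EM_AARCH64 = 0x3E, 0xB7  # ELF e_machine codes

def elf_machine(path: str) -> int:
    """Return the e_machine field of an ELF binary (u16 at offset 18)."""
    with open(path, "rb") as f:
        header = f.read(20)
    if header[:4] != b"\x7fELF":
        raise ValueError(f"{path} is not an ELF file")
    endian = "<" if header[5] == 1 else ">"  # EI_DATA: 1 = little-endian
    return struct.unpack_from(endian + "H", header, 18)[0]

# e.g. elf_machine("bazel-bin/some/target") == EM_AARCH64 for a genuine
# ARM64 cross-build, but EM_X86_64 if the toolchain silently fell back to x86.
```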
2.3 Open Issues
This section lists, groups, and then summarizes issues that were created within the last week in the repository.
Issues Opened This Week: 3
Summarized Issues:
- Dynamic shape support and optimization: This topic covers the reliance of XLA on runtime just-in-time compilation to generate concrete kernels based on actual input shapes, which raises questions about reducing runtime compilation overhead. It also explores efforts to enable more static-graph-like optimizations for partially dynamic shapes to improve performance.
- issues/32619
- Multi-backend partitioning and cost-based compilation: This topic involves developing a PJRT plugin for a custom accelerator to support cost-based partitioning of XLA/HLO modules across multiple devices like CPU, GPU, and the accelerator. It discusses the need for exposing device cost constraints during compilation and deciding whether to implement partitioning via an XLA pass or an external orchestration layer.
- issues/32677
- Verifier failures in mixed precision collective operations: This topic addresses a verifier failure during collective permute operations on multi-GPU setups when mixed floating point precisions are used. The issue suggests that adding kCollectivePermute to the allowed opcode list could fix the problem by permitting mixed precision operands.
- issues/32845
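The runtime-recompilation concern raised in the dynamic-shape issue above is commonly mitigated by bucketing dynamic dimensions to a small set of static sizes, so a shape-specializing JIT only ever compiles a handful of kernels. A minimal sketch of the idea (the helper below is illustrative, not an XLA API):

```python
def bucket(n: int) -> int:
    """Round a dynamic dimension up to the next power of two, so a
    shape-specialized JIT compiles O(log n) kernels instead of O(n)."""
    size = 1
    while size < n:
        size *= 2
    return size

# Inputs of length 1..1000 map onto only 11 distinct compiled shapes
# (1, 2, 4, ..., 1024); inputs are padded up to their bucket size.
distinct = {bucket(n) for n in range(1, 1001)}
```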
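The allow-list fix suggested for the collective-permute verifier failure can be pictured as follows; the names and structure here are a hypothetical model, not the actual XLA verifier code (only kCollectivePermute comes from the issue itself):

```python
# Hypothetical model of a verifier check that rejects mixed-precision
# operands unless the opcode is on an explicit allow-list.
MIXED_PRECISION_ALLOWED = {
    "kAllReduce",           # illustrative placeholder entry
    "kCollectivePermute",   # the proposed fix: add this opcode
}

def check_operand_precisions(opcode: str, operand_types: list[str]) -> bool:
    """Return True if the operand precisions are acceptable for this op."""
    uniform = len(set(operand_types)) <= 1
    return uniform or opcode in MIXED_PRECISION_ALLOWED
```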
2.4 Closed Issues
This section lists, groups, and then summarizes issues that were closed within the last week in the repository. This section also links the associated pull requests if applicable.
Issues Closed This Week: 1
Summarized Issues:
- MLIR Emitter Crash Due to Symbol Removal: The issue involves a crash in the MLIR emitters during the compilation of a 4D convolution HLO module caused by the premature removal of unused symbols in the indexing map. This removal leads to a segmentation fault later in the code, which assumes those symbols still exist, resulting in a failure during compilation.
- issues/32635
2.5 Issue Discussion Insights
This section will analyze the tone and sentiment of discussions within this project's open and closed issues that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.
Based on our analysis, there are no instances of toxic discussions in the project's open or closed issues from the past week.
III. Pull Requests
3.1 Open Pull Requests
This section provides a summary of pull requests that were opened in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.
Pull Requests Opened This Week: 15
Key Open Pull Requests
1. [ROCm] Fix hermetic build for rocm: This pull request fixes the hermetic build for ROCm by introducing the missing hipblaslt dependency, correcting invalid library linkings, and aligning data directories to ensure proper build functionality.
- URL: pull/32782
- Merged: No
2. [XLA:GPU] Enable command buffer DynamicSliceCopyFusion command unrolling: This pull request enables the command buffer DynamicSliceCopyFusion command to be recorded into an unrolled CUDA graph when surrounded by WhileCmd, facilitating full command buffer WhileCmd unrolling into CUDA graphs.
- URL: pull/32688
- Merged: No
3. [ROCm] Fix convolution fp16 performance drop on gfx11xx, gfx12xx: This pull request addresses a performance regression in fp16 precision convolution on gfx11xx and gfx12xx GPUs by removing the hardcoded NHWC convolution layout, resulting in significant throughput improvements as demonstrated by profiling benchmarks.
- URL: pull/32773
- Merged: No
Other Open Pull Requests
- Documentation improvements: Multiple pull requests enhance project documentation by adding new guides and detailed error code pages. These updates provide users with better resources for performance optimization and error handling within the project.
pull/32872, pull/32628
- Performance and scalability enhancements: Several pull requests improve performance and scalability, including updating communication types for multi-node NVLink topologies, enabling cuDNN GEMM autotuning for scaled dot fusion on GPUs, and fixing scalability issues in the asynchronous gRPC profiling client by consolidating CompletionQueues. These changes optimize model dispatching, GPU operation tuning, and profiling reliability in high-concurrency environments.
pull/32836, pull/32738, pull/32645
- Command buffer and CUDA graph support: One pull request enables recording the DynamicSliceFusion command into unrolled CUDA graphs when surrounded by WhileCmd, facilitating full command buffer WhileCmd support in CUDA graphs. This improves the flexibility and capability of command buffer execution on CUDA.
pull/32719
- Testing and platform support updates: Updates include selectively disabling failing test cases while enabling others on the B200 platform, and adding support for the RISC-V 64 architecture by updating build, code generation, and packaging infrastructure. These changes improve test reliability and expand platform compatibility.
pull/32724, pull/32812
- Codebase modernization and bug fixes: Pull requests remove legacy proto workarounds following thunk-based execution adoption, fix a bug in family-conditional logic for architecture fallback, and enable verifier support for mixed precision operands in collective permute operations. These updates improve code correctness, maintainability, and support for new features.
pull/32800, pull/32838, pull/32846
3.2 Closed Pull Requests
This section provides a summary of pull requests that were closed in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.
Pull Requests Closed This Week: 5
Key Closed Pull Requests
1. [XLA:GPU] add conv fusion support in cudnn fusion compiler: This pull request introduces convolution fusion support in the cuDNN fusion compiler for XLA on GPU by adding convolution types to the fusion configuration, implementing a dimension adapter for logical layout generation, and defining translation rules from XLA convolution operations to the cuDNN frontend graph API, aiming to replace the existing convolution custom call with fused convolutions.
- URL: pull/32718
- Merged: No
2. [ROCm] Use working sha256 for latest ROCm 7.0 docker image and fix test scripts: This pull request aims to update the ROCm 7.0 docker image with a correct sha256 checksum to prevent CI failures caused by a malformed image and to fix test scripts by passing the ROCM_PATH environment variable to the bazel sandbox, ensuring continued successful CI runs.
- URL: pull/32678
- Merged: No
3. [ROCm] Fix rocm build, oneapi deps: This pull request aims to fix the ROCm build process by excluding targets marked as oneAPI or CUDA-only from compilation, thereby resolving build break issues in the ROCm continuous integration job.
- URL: pull/32740
- Merged: No
Other Closed Pull Requests
- Build and CI Fixes: Multiple pull requests address issues related to the build and continuous integration processes. One PR fixes the invalid run_under script in the ROCm CI job to properly handle AddressSanitizer ignore files, resolving build issues related to asan settings, while another adds a dummy file to the project and was closed without being merged.
pull/32642, pull/32636
3.3 Pull Request Discussion Insights
This section will analyze the tone and sentiment of discussions within this project's open and closed pull requests that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.
Based on our analysis, there are no instances of toxic discussions in the project's open or closed pull requests from the past week.
IV. Contributors
4.1 Contributors
Active Contributors:
We consider an active contributor in this project to be any contributor who has made at least 1 commit, opened at least 1 issue, created at least 1 pull request, or made more than 2 comments in the last month.
If there are more than 10 active contributors, the list is truncated to the top 10 based on contribution metrics for better clarity.
| Contributor | Commits | Pull Requests | Issues | Comments |
|---|---|---|---|---|
| alekstheod | 31 | 5 | 0 | 0 |
| meteorcloudy | 24 | 2 | 0 | 2 |
| sergachev | 6 | 3 | 0 | 18 |
| rao-ashish | 3 | 3 | 0 | 5 |
| athurdekoos | 7 | 3 | 0 | 0 |
| othakkar | 5 | 3 | 0 | 2 |
| dimvar | 6 | 4 | 0 | 0 |
| terryysun | 7 | 2 | 0 | 0 |
| Cjkkkk | 4 | 1 | 0 | 4 |
| mtsokol | 3 | 3 | 0 | 2 |