Weekly GitHub Report for XLA: August 04, 2025 - August 11, 2025 (22:39:30)
Weekly GitHub Report for XLA
Thank you for subscribing to our weekly newsletter! Each week, we deliver a comprehensive summary of your GitHub project's latest activity right to your inbox, including an overview of your project's issues, pull requests, contributors, and commit activity.
Table of Contents
I. News
1.1 Recent Version Releases:
No recent version releases were found.
1.2 Version Information:
No version information was available to summarize for this period.
II. Issues
2.1 Top 5 Active Issues:
We consider active issues to be issues that have been commented on most frequently within the last week. Bot comments are omitted.
As of our latest update, there are no active issues with ongoing comments this week.
2.2 Top 5 Stale Issues:
We consider stale issues to be issues that have had no activity within the last 30 days. The team should work together to get these issues resolved and closed as soon as possible.
- New `nvshmem` rule breaks the build: This issue reports a build failure caused by a new `nvshmem` rule introduced in a recent pull request, which leads to an error related to the `repository_ctx` object lacking a `getenv` method during the CUDA configuration step. The reporter is seeking guidance on whether they need to update their side to resolve this error, particularly in relation to changes mentioned for JAX, or if the fix must come from the open_xla project, along with an estimated timeline for addressing the problem.
- Failed to Parse MLIR generated by Torchax: This issue describes a problem encountered when exporting a PyTorch model to MLIR using the torch-xla torchax export API, where the generated MLIR fails to parse due to an unregistered operation 'vhlo.rsqrt_v2' in the VHLO dialect. The user is attempting to compile the exported model with XLA AOT but faces deserialization errors with StableHLO, despite using compatible versions of torch, torchxla, and building XLA from the corresponding commit, and has provided code snippets and bytecode samples to assist in troubleshooting.
- support bazel modules: This issue discusses the potential adoption of Bazel modules within the project, highlighting that Bazel modules have seen significant adoption in the community. The reporter points out that XLA is currently the only package in their Bazel build that does not support Bazel modules and inquires about any plans to integrate this support.
- Gpu collective performance model bug: This issue addresses a bug in the gpu_collective_performance model where the recent update to lowLatencyBandwidth for AMD links was not applied to the CUDA section, causing failures when using H100 settings. As a result, the model call with these settings does not function correctly, indicating an inconsistency in how bandwidth parameters are handled across different GPU architectures.
- Possibility to specify strides when sending the data from buffer to host: This issue addresses the limitation in specifying byte strides when transferring data from a PJRT Buffer back to the host, particularly for data originally in column-major format. It highlights that while the `byte_strides` argument facilitates this conversion when creating the buffer, a similar mechanism is not generally supported for the reverse operation due to inconsistent plugin support, and requests the addition of a `byte_strides` field to enable this functionality (see the sketch after this list).
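To make the byte-strides request above concrete, here is a minimal numpy sketch (ours, not taken from the issue) of what byte strides encode for a column-major array; the issue notes that PJRT accepts such strides when creating a buffer and asks for the symmetric option when copying data back to the host.

```python
import numpy as np

# A (2, 3) float32 array stored in column-major (Fortran) order.
a = np.asfortranarray(np.arange(6, dtype=np.float32).reshape(2, 3))

# Byte strides give the number of bytes between consecutive elements
# along each dimension. Column-major (2, 3) float32 -> (4, 8): stepping
# one row moves 4 bytes, stepping one column moves 8 bytes.
print(a.strides)                        # (4, 8)

# The same shape in row-major order has strides (12, 4) instead, which
# is the layout a host copy produces when strides cannot be specified.
print(np.ascontiguousarray(a).strides)  # (12, 4)
```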
2.3 Open Issues
This section lists, groups, and then summarizes issues that were created within the last week in the repository.
Issues Opened This Week: 2
Summarized Issues:
- Performance Optimization Techniques: This topic covers efforts to improve computational efficiency in XLA operations, including requests for benchmark data on GPU MFU performance for Llama models and optimization of loop operations for partial prefix sums. The issues highlight challenges such as replacing an inefficient ReduceWindow with sliding-window techniques (see the sketch after this list) and exploring parallelization of WhileOp to improve performance.
- [issues/29836, issues/29857]
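As a purely illustrative sketch, and not code from either issue, the following JAX snippet shows one common way a sliding-window (ReduceWindow-style) sum can be recast as a single cumulative sum, which is the general flavor of rewrite the prefix-sum issue is exploring:

```python
import jax.numpy as jnp

def sliding_window_sum(x, window):
    # Sum over each length-`window` window ending at position i (with
    # implicit zero padding on the left), expressed through one
    # cumulative sum instead of a separate reduction per window.
    c = jnp.cumsum(x)
    padded = jnp.concatenate([jnp.zeros(window, x.dtype), c])
    return padded[window:] - padded[:-window]

x = jnp.arange(8.0)
print(sliding_window_sum(x, 3))  # [ 0.  1.  3.  6.  9. 12. 15. 18.]
```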
2.4 Closed Issues
This section lists, groups, and then summarizes issues that were closed within the last week in the repository. This section also links the associated pull requests if applicable.
Issues Closed This Week: 0
Summarized Issues:
As of our latest update, there were no issues closed in the project this week.
2.5 Issue Discussion Insights
This section will analyze the tone and sentiment of discussions within this project's open and closed issues that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.
Based on our analysis, there are no instances of toxic discussions in the project's open or closed issues from the past week.
III. Pull Requests
3.1 Open Pull Requests
This section provides a summary of pull requests that were opened in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.
Pull Requests Opened This Week: 6
Key Open Pull Requests
1. [XLA:GPU] Add nested command buffer support: This pull request proposes adding support for nested command buffers to the XLA GPU backend, aiming to enhance command buffer management and execution.
- URL: pull/29787
- Merged: No
2. while_loop_analysis supports module that has been parsed by command buffer rewriter (has nested call): This pull request enhances the while_loop_analysis to support modules that have been parsed by the command buffer rewriter, including those with nested calls, by introducing a new HloModule clone API, adding a call inliner, and performing various related fixes and build system cleanups.
- URL: pull/29854
- Merged: No
3. Communication Fusion via Nvshmem: allreduce softmax: This pull request introduces a communication fusion mechanism using Nvshmem to optimize the allreduce softmax operation, including the addition of an ar-softmax fusion pass, integration of ar in optimization, TTIR graph corrections, implementation of the nvshmemx API call, and setting up the nvshmem linker.
- URL: pull/30028
- Merged: No
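The allreduce-softmax fusion in pull/30028 is summarized above only at a high level. The JAX sketch below is our own illustration, under assumed shard layout and names, of why an allreduce and a softmax end up adjacent in distributed workloads: a softmax over a sharded axis interleaves two cross-device reductions (max and sum) with elementwise work. It is not the pull request's implementation.

```python
import jax
import jax.numpy as jnp

def distributed_softmax(x_local, axis_name):
    # Softmax over an axis that is sharded across devices: both the max
    # and the normalizing sum require a cross-device reduction.
    m = jax.lax.pmax(jnp.max(x_local), axis_name)  # all-reduce (max)
    e = jnp.exp(x_local - m)
    z = jax.lax.psum(jnp.sum(e), axis_name)        # all-reduce (sum)
    return e / z

# One shard of 4 elements per visible device.
x = jnp.arange(4.0 * jax.device_count()).reshape(jax.device_count(), 4)
y = jax.pmap(lambda s: distributed_softmax(s, "i"), axis_name="i")(x)
print(y)
```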
Other Open Pull Requests
- bf16 Support and ROCm Device Optimization: This pull request adds support for bf16 starting from gfx11, fixes bugs, and optimizes the RocmComputeCapability in device_description.h. It also enables the ALG_DOT_BF16 operator on ROCm hardware that supports it.
- pull/29766
- Integration of rocprofiler-sdk and roctracer for GPU Profiling: This pull request integrates rocprofiler-sdk (v3) and roctracer (v1) into the XLA project to replace older profiling tools. It enables improved GPU event profiling on AMD GPUs with support for both time-based and step-based profiling, conditional compilation based on ROCm version, and includes new unit tests for ROCm version 6.3 and above.
- pull/29769
- ScopedClonedModuleCallInliner for Loop Analysis: This pull request introduces the ScopedClonedModuleCallInliner class in the call_inliner module, which clones a target module and performs inlining during initialization. This addresses limitations with the while_loop_analysis pass on modules parsed by the command buffer rewriter by enabling loop analysis on modules that cannot be modified directly.
- pull/29884
3.2 Closed Pull Requests
This section provides a summary of pull requests that were closed in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.
Pull Requests Closed This Week: 7
Key Closed Pull Requests
1. [XLA:CPU][oneDNN] Add build flag to enable asynchronous support in oneDNN: This pull request proposes adding a build flag to optionally enable asynchronous execution support in the oneDNN library for XLA on CPU, allowing users to compile oneDNN with this feature.
- URL: pull/28883
- Merged: No
2. [GPU] Bubble up mismatched buffer color from donation: This pull request aims to improve error handling in XLA by bubbling up buffer assignment check messages when users specify donation of an input buffer with a mismatched output memory space via `out_shardings`, thereby preventing silent failures and providing clear feedback in cases where buffer donation is not possible.
- URL: pull/29270
- Merged: No
3. SPMD Dot Tests: This pull request proposes adding end-to-end tests for the Single Program Multiple Data (SPMD) partitioning of dot operations to ensure correctness and reliability, although it has not been merged.
- URL: pull/29511
- Merged: No
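For the donation change in pull/29270, it helps to have the user-facing pattern in view. The JAX sketch below is our own minimal illustration (the mesh setup and function name are hypothetical) of donating an input while requesting an out_sharding, which is the situation in which the pull request makes a mismatched memory space surface as a clear buffer-assignment message rather than a silent failure:

```python
from functools import partial

import jax
import jax.numpy as jnp
import numpy as np
from jax.sharding import Mesh, NamedSharding, PartitionSpec

# Illustrative one-axis mesh over whatever devices are visible.
mesh = Mesh(np.array(jax.devices()), axis_names=("x",))
out_sharding = NamedSharding(mesh, PartitionSpec("x"))

# donate_argnums asks XLA to reuse the input buffer for the output;
# that only works when the donated buffer is compatible with the
# output's memory space, which is what the check reports on.
@partial(jax.jit, donate_argnums=0, out_shardings=out_sharding)
def scale(a):
    return a * 2.0

x = jnp.ones((2 * jax.device_count(),))  # evenly shardable along "x"
print(scale(x))
```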
Other Closed Pull Requests
- Stream management improvements: These pull requests focus on enhancing stream handling within the XLA framework. One introduces a round-robin stream assignment algorithm for asynchronous collective operations on NVIDIA GPUs as a preparatory step for future pipeline integration, while another removes the Stream ID from the command buffer implementation due to the adoption of a DAG for dependency specification, making the Stream ID redundant.
- pull/28919, pull/29204
- HloModule cloning API enhancement: This pull request adds a new `CloneWithContext` API to `HloModule`, allowing users to retrieve the mapped instruction or computation within the cloned module. The change is primarily a refactor of existing code and does not introduce new tests.
- pull/29852
- AUTHORS file update attempt: This pull request proposes adding NVIDIA Corporation to the AUTHORS file but was ultimately not merged.
- pull/29894
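The round-robin stream assignment mentioned for pull/28919 is an algorithmic idea that is easy to show in isolation. The sketch below is our own schematic Python, not XLA's implementation, and the operation names are made up:

```python
from itertools import cycle

def assign_streams(collective_ops, num_streams):
    # Hand out stream ids in round-robin order so that independent
    # asynchronous collectives can overlap instead of serializing on a
    # single stream.
    streams = cycle(range(num_streams))
    return {op: next(streams) for op in collective_ops}

ops = ["all-reduce.0", "all-gather.1", "reduce-scatter.2", "all-reduce.3"]
print(assign_streams(ops, num_streams=2))
# {'all-reduce.0': 0, 'all-gather.1': 1, 'reduce-scatter.2': 0, 'all-reduce.3': 1}
```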
3.3 Pull Request Discussion Insights
This section will analyze the tone and sentiment of discussions within this project's open and closed pull requests that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.
Based on our analysis, there are no instances of toxic discussions in the project's open or closed pull requests from the past week.
IV. Contributors
4.1 Contributors
Active Contributors:
We consider an active contributor in this project to be any contributor who has made at least 1 commit, opened at least 1 issue, created at least 1 pull request, or made more than 2 comments in the last month.
If there are more than 10 active contributors, the list is truncated to the top 10 based on contribution metrics for better clarity.
| Contributor | Commits | Pull Requests | Issues | Comments |
| --- | --- | --- | --- | --- |
| shawnwang18 | 18 | 6 | 0 | 2 |
| othakkar | 5 | 2 | 0 | 5 |
| mraunak | 11 | 0 | 0 | 0 |
| Copilot | 0 | 0 | 0 | 11 |
| Arech8 | 4 | 1 | 0 | 5 |
| frgossen | 0 | 0 | 0 | 10 |
| penpornk | 0 | 0 | 0 | 8 |
| philipphack | 3 | 2 | 0 | 2 |
| jaro-sevcik | 2 | 2 | 1 | 1 |
| Zoey-Cheng | 5 | 1 | 0 | 0 |