Weekly GitHub Report for Xla: August 18, 2025 - August 25, 2025
Thank you for subscribing to our weekly newsletter! Each week, we deliver a comprehensive summary of your GitHub project's latest activity right to your inbox, including an overview of your project's issues, pull requests, contributors, and commit activity.
Table of Contents
I. News
1.1 Recent Version Releases:
No recent version releases were found.
1.2 Version Information:
No version release information was available to summarize this week.
II. Issues
2.1 Top 5 Active Issues:
We consider active issues to be issues that have been commented on most frequently within the last week. Bot comments are omitted.
- The new (?) `PJRT_ProcessInfo` struct in `pjrt_c_api.h` is missing a `size_t struct_size;` field: This issue points out that the newly introduced `PJRT_ProcessInfo` struct in the PJRT C API lacks the `size_t struct_size;` field, which is present in other structs, presumably to manage compatibility. The user suspects this omission was unintentional and seeks confirmation or correction regarding the inconsistency.
  - A single comment tags the author of the commit that introduced the struct, implicitly requesting their input or clarification on the missing field.
  - Number of comments this week: 1
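The `struct_size` convention at issue can be illustrated with a small sketch. The struct layout below is hypothetical; it mirrors the pattern described in the issue, not the actual PJRT definitions. The caller sets `struct_size` to the size of the struct it was compiled against, so a newer runtime can detect which trailing fields the caller knows about:

```python
import ctypes

# Hypothetical stand-in for a PJRT-style versioned struct; the real
# PJRT_ProcessInfo layout may differ -- this only illustrates the
# struct_size convention the issue describes.
class ProcessInfo(ctypes.Structure):
    _fields_ = [
        ("struct_size", ctypes.c_size_t),  # set by the caller
        ("process_id", ctypes.c_int),
        ("num_processes", ctypes.c_int),
    ]

def caller_knows_field(info, field_name):
    """True if the caller's compiled-in struct is large enough to
    contain the named field, i.e. the caller was built against
    headers that already declared it."""
    field = getattr(ProcessInfo, field_name)
    return info.struct_size >= field.offset + field.size

info = ProcessInfo()
info.struct_size = ctypes.sizeof(ProcessInfo)  # what a current caller would do
print(caller_knows_field(info, "num_processes"))  # True
```

Without a `struct_size` field there is no such handshake, which is why its absence from one struct is flagged as an API inconsistency.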
Since there were fewer than 5 open issues, all of the open issues have been listed above.
2.2 Top 5 Stale Issues:
We consider stale issues to be issues that have had no activity within the last 30 days. The team should work together to get these issues resolved and closed as soon as possible.
- New `nvshmem` rule breaks the build: This issue reports a build failure caused by a new `nvshmem` rule introduced in a recent pull request, which leads to an error about the `repository_ctx` object lacking a `getenv` method during the CUDA configuration step. The reporter is seeking guidance on whether they need to update their side to resolve this error, particularly in relation to changes mentioned for JAX, or if the fix must come from the OpenXLA project, along with an estimated timeline for addressing the problem.
- Failed to Parse MLIR generated by Torchax: This issue describes a problem encountered when exporting a PyTorch model to MLIR using the torch-xla torchax export API, where the generated MLIR fails to parse due to an unregistered operation `vhlo.rsqrt_v2` in the VHLO dialect. The user reports that this error occurs during deserialization of the StableHLO artifact and provides detailed reproduction steps, including code snippets and environment details, seeking assistance to resolve the incompatibility.
- support bazel modules: This issue requests the adoption of Bazel modules within the project, highlighting that Bazel modules have seen significant usage and benefits. It specifically points out that XLA is currently the only package in the user's Bazel build that lacks support for these modules, suggesting a need for compatibility improvements.
- Gpu collective performance model bug: This issue addresses a bug in the gpu_collective_performance model where the recent update to lowLatencyBandwidth for AMD links was not applied to the CUDA section, causing failures when the model is called with H100 settings. As a result, the performance model does not correctly handle CUDA configurations, leading to errors during execution.
- Cross compile to ARM with custom gcc: This issue concerns difficulties encountered when attempting to cross-compile the XLA project from an x86 architecture to ARM64 using a custom GCC compiler. The user reports that despite using the `--config=cross_compile_linux_arm64` flag in the Bazel build system, the build process continues to produce an x86 binary, indicating a possible misconfiguration or missing step in the cross-compilation setup.
2.3 Open Issues
This section lists, groups, and then summarizes issues that were created within the last week in the repository.
Issues Opened This Week: 2
Summarized Issues:
- PJRT C API Struct Compatibility: The newly introduced `PJRT_ProcessInfo` struct in `pjrt_c_api.h` is missing the `size_t struct_size;` field, which is present in other PJRT C API structs to ensure compatibility control. This omission is suspected to be unintentional and may affect the consistency of the API.
  - issues/30439
- Compiler Optimization Concerns: There is a question about whether a straightforward compiler optimization has been overlooked in the project, as discussed in the linked JAX GitHub repository. This suggests potential missed opportunities for improving compiler efficiency.
- issues/30552
2.4 Closed Issues
This section lists, groups, and then summarizes issues that were closed within the last week in the repository. This section also links the associated pull requests if applicable.
Issues Closed This Week: 1
Summarized Issues:
- Advanced Profiler Options for AWS Trainium: This issue discusses the need to add support for advanced profiler options tailored to AWS Trainium hardware in JAX. It also seeks clarification on the validation rules for the key-value pairs that can be passed as these advanced options, highlighting the need for clear guidelines.
- issues/30465
2.5 Issue Discussion Insights
This section will analyze the tone and sentiment of discussions within this project's open and closed issues that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.
Based on our analysis, there are no instances of toxic discussions in the project's open or closed issues from the past week.
III. Pull Requests
3.1 Open Pull Requests
This section provides a summary of pull requests that were opened in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.
Pull Requests Opened This Week: 8
Key Open Pull Requests
1. [XLA:CPU][oneDNN] Add Base Implementation of oneDNN Thunk via Custom Call FFI: This pull request introduces the foundational implementation of the `OneDnnThunk` in XLA:CPU, enabling execution of oneDNN-based operations via a typed FFI custom call interface, including new source files, resource management extensions, unit tests, and updated build rules.
- URL: pull/30562
- Merged: No
2. [XLA:GPU] Refactor dynamic slice fusion lowering code to reduce the API calls: This pull request aims to refactor the dynamic slice fusion lowering code in the XLA GPU backend to reduce redundant API calls, specifically minimizing the number of calls to `GetLoopInductionVarTupleIdx`.
- URL: pull/30397
- Merged: No
3. [XLA:GPU][oneAPI] Add sycl_timer component and test: This pull request introduces the sycl_timer component along with its corresponding test to the XLA GPU oneAPI codebase, enhancing timing support for SYCL operations.
- URL: pull/30555
- Merged: No
Other Open Pull Requests
- Documentation on HLO Dumps and Debugging: This pull request introduces a new documentation page that explains how to obtain HLO Dumps in various formats across different environments. It also adds instructions on filtering, transforming, and replaying these dumps, along with a new "Debugging" section in the sidebar to organize this and future debugging guides.
- pull/30414
- GPU Backend Scheduling and Naming Updates: These pull requests propose moving the command buffer LHS scheduling to the execution graph within the XLA GPU backend and rename the SM number for Thor from sm_101 to sm_110 to align with CUDA 13 changes. The renaming reflects that sm_101 will be unused and support for Thor will start only on CUDA 13 onward.
- pull/30472, pull/30514
- Test Suite Filtering for Multi-GPU: This pull request filters out unit tests tagged with `multi_gpu` from the single-GPU test suite to ensure these tests run exclusively under the multi-GPU test script located at `build_tools/rocm/run_xla_multi_gpu.sh`. This separation improves test accuracy and environment specificity.
  - pull/30544
- SYCL Support and Error Handling Enhancements: This pull request adds support for the sycl_event component with tests, updates the sycl_library wrapper to include SYCL-specific build and linking options, and implements error handling by catching SYCL synchronous exceptions and returning an error status. These changes facilitate easier maintenance and addition of SYCL targets.
- pull/30507
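The multi-GPU change above amounts to partitioning a test suite by tag. A toy sketch of the idea follows; the test names are invented, and in practice Bazel handles this natively via `--test_tag_filters`:

```python
# Toy model of splitting a test suite by tag, as the multi_gpu change does.
# Test names here are illustrative, not real XLA targets.
tests = [
    {"name": "collective_permute_test", "tags": {"multi_gpu"}},
    {"name": "convolution_test", "tags": set()},
]

def partition_by_tag(tests, tag):
    """Split tests into (without_tag, with_tag) lists."""
    without = [t for t in tests if tag not in t["tags"]]
    with_ = [t for t in tests if tag in t["tags"]]
    return without, with_

single_gpu, multi_gpu = partition_by_tag(tests, "multi_gpu")
print([t["name"] for t in single_gpu])  # ['convolution_test']
print([t["name"] for t in multi_gpu])   # ['collective_permute_test']
```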
3.2 Closed Pull Requests
This section provides a summary of pull requests that were closed in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.
Pull Requests Closed This Week: 8
Key Closed Pull Requests
1. [XLA:GPU]Improve Flag Handling while Linking for oneAPI: This pull request aims to resolve a linking failure caused by command-line flag overflow during the linking stage by improving the handling of whole-archive object files with specific linker flags and adding support for a VERBOSE=1 environment variable to aid in debugging and verifying the compiler invocation.
- URL: pull/30072
- Merged: No
2. [GPU] Tweak NVML library loading error message wording: This pull request proposes changing the NVML library loading error message wording by removing the term "Error" and downgrading it to a warning to reduce confusion during debugging in non-MNNVL clusters that do not require certain NVML libraries.
- URL: pull/30239
- Merged: No
3. [XLA:GPU] Update Bazel command to run presubmit for oneAPI: This pull request proposes changing the Bazel command from "bazel test" to "bazel build" for the XLA Linux x86 GPU oneAPI presubmit job to address failures caused by the lack of test targets and the requirement of Intel hardware for running tests.
- URL: pull/30342
- Merged: No
Other Closed Pull Requests
- CUDA 13 API updates: Multiple pull requests address compatibility with CUDA 13 by updating XLA to use new or modified CUDA API functions. These changes include switching to cuGraphAddNode_v2 to maintain consistent signatures and adding cuEventElapsedTime_v2 to support building JAX with CUDA 13.
- pull/30179, pull/30427
- GPU initialization and runtime fixes: Pull requests focus on improving GPU runtime behavior by fixing peer-to-peer access bugs in the ROCm executor and moving cuDNN handle initialization to an earlier phase to ensure proper setup across multiple GPUs. These changes enhance stability and correctness in multi-GPU environments.
- pull/30276, pull/30491
- Code refactoring: One pull request proposes refactoring the DynamicSliceFusion lowering code in the XLA GPU backend, but this change was not merged.
- pull/30396
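The CUDA 13 updates above follow the driver API's pattern of suffixing revised entry points with `_v2` (as with `cuGraphAddNode_v2` and `cuEventElapsedTime_v2`). A hedged sketch of resolving such symbols with a fallback; the dicts stand in for a loaded library's symbol table, and this loader is illustrative, not XLA's actual mechanism:

```python
def resolve_versioned(symbols, name):
    """Prefer the _v2 revision of a symbol when the library exports
    it, falling back to the original entry point otherwise."""
    for candidate in (name + "_v2", name):
        if candidate in symbols:
            return symbols[candidate]
    raise KeyError(f"no symbol named {name} or {name}_v2")

# Illustrative symbol tables, not real driver handles:
cuda13 = {"cuEventElapsedTime": "old-signature",
          "cuEventElapsedTime_v2": "new-signature"}
cuda12 = {"cuEventElapsedTime": "old-signature"}

print(resolve_versioned(cuda13, "cuEventElapsedTime"))  # new-signature
print(resolve_versioned(cuda12, "cuEventElapsedTime"))  # old-signature
```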
3.3 Pull Request Discussion Insights
This section will analyze the tone and sentiment of discussions within this project's open and closed pull requests that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.
Based on our analysis, there are no instances of toxic discussions in the project's open or closed pull requests from the past week.
IV. Contributors
4.1 Contributors
Active Contributors:
We consider an active contributor in this project to be any contributor who has made at least 1 commit, opened at least 1 issue, created at least 1 pull request, or made more than 2 comments in the last month.
If there are more than 10 active contributors, the list is truncated to the top 10 based on contribution metrics for better clarity.
| Contributor | Commits | Pull Requests | Issues | Comments |
|---|---|---|---|---|
| shawnwang18 | 34 | 6 | 0 | 5 |
| mraunak | 12 | 2 | 0 | 1 |
| othakkar | 8 | 2 | 0 | 2 |
| Arech8 | 4 | 1 | 0 | 5 |
| Copilot | 0 | 0 | 0 | 9 |
| SandSnip3r | 0 | 0 | 0 | 9 |
| dimvar | 4 | 3 | 0 | 0 |
| pemeliya | 6 | 1 | 0 | 0 |
| Zoey-Cheng | 5 | 1 | 0 | 0 |
| beckerhe | 2 | 0 | 0 | 4 |
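The activity criteria stated above can be expressed as a simple predicate. This is a sketch of the stated rule, not the report generator's actual code:

```python
def is_active(contributor):
    """Active means: at least 1 commit, at least 1 opened issue,
    at least 1 pull request, or more than 2 comments in the last
    month (per the definition above)."""
    return (contributor["commits"] >= 1
            or contributor["pull_requests"] >= 1
            or contributor["issues"] >= 1
            or contributor["comments"] > 2)

# The Copilot row from the table: no commits, PRs, or issues, but 9 comments.
copilot = {"commits": 0, "pull_requests": 0, "issues": 0, "comments": 9}
print(is_active(copilot))  # True
```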