Weekly GitHub Report for Xla: June 23, 2025 - June 30, 2025 (23:00:15)
Thank you for subscribing to our weekly newsletter! Each week, we deliver a comprehensive summary of your GitHub project's latest activity right to your inbox, including an overview of your project's issues, pull requests, contributors, and commit activity.
I. News
1.1 Recent Version Releases:
No recent version releases were found.
1.2 Version Information:
No version information was available to summarize for this period.
II. Issues
2.1 Top 5 Active Issues:
We consider active issues to be issues that have been commented on most frequently within the last week. Bot comments are omitted.
- Profiling lots of events causes integer overflow on 64 bit systems: This issue involves an integer overflow on 64-bit systems when profiling a large number of events, which corrupts the gzipped JSON trace data files. The overflow is caused by a problematic cast in the code once the data size exceeds 2GB, and it affects the xyz.trace.json.gz file generated by the jax.profiler.stop_trace() function.
- A comment links to a specific commit as a potential fix, indicating that a solution may have been implemented or proposed.
- Number of comments this week: 1
- How can cross-architecture operator libraries be applied in JAX, such as cuBLAS?: This issue is about integrating a custom cross-architecture operator library, similar to cuBLAS, into JAX, with a focus on understanding the configurable paths for JAX to call cuBLAS and whether PJRT or XLA CustomCall should be used. The user is seeking clarification on the recommended extension path and examples of PJRT plugin operator replacements to confirm the officially recommended access path.
- The comment section includes a user expressing interest in understanding how to call the cuBLAS API within XLA, indicating a need for further clarification on the integration process.
- Number of comments this week: 1
Since there were fewer than 5 open issues, all of the open issues have been listed above.
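The overflow described in the first issue above can be illustrated with a small sketch. The issue summary mentions only "a problematic cast" and a 2GB threshold; the assumption here is that a 64-bit byte count is narrowed to a signed 32-bit integer, which the summary does not confirm. The helper names narrowing_cast_int32 and checked_int32 are hypothetical and do not come from the XLA or JAX code.

```python
import ctypes

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold


def narrowing_cast_int32(n: int) -> int:
    """Mimic a C-style cast of a 64-bit size to a signed 32-bit int (assumed failure mode)."""
    # ctypes integer types do no overflow checking, so the value wraps like a C cast.
    return ctypes.c_int32(n).value


def checked_int32(n: int) -> int:
    """Hypothetical guard: refuse sizes that do not fit instead of wrapping silently."""
    if not 0 <= n <= INT32_MAX:
        raise OverflowError(f"size {n} does not fit in a signed 32-bit integer")
    return n


just_under = 2**31 - 1   # still representable
just_over = 2**31 + 100  # slightly more than 2 GiB of trace data

print(narrowing_cast_int32(just_under))  # 2147483647
print(narrowing_cast_int32(just_over))   # -2147483548, a negative "size"
```

A negative or truncated length passed on to a compression or write routine would be one plausible way to end up with a corrupted .json.gz file, though the actual code path is not described in the summary.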
2.2 Top 5 Stale Issues:
We consider stale issues to be issues that have had no activity within the last 30 days. The team should work together to get these issues resolved and closed as soon as possible.
As of our latest update, there are no stale issues for the project this week.
2.3 Open Issues
This section lists, groups, and then summarizes issues that were created within the last week in the repository.
Issues Opened This Week: 4
Summarized Issues:
- Integer Overflow on 64-bit Systems: This issue involves an integer overflow when profiling a large number of events on 64-bit systems. The overflow leads to corrupted gzipped JSON trace files due to a problematic cast in the code, especially when the data size exceeds 2GB.
- XLA Targets Failure on ARM Architectures: Certain XLA targets fail on ARM architectures due to incorrect passing of nvcc-only options to clang. This problem arose after recent hermetic changes, requiring specific build configurations to resolve the errors.
- Integration of Custom Cross-Architecture Operator Library: There is a need to integrate a custom cross-architecture operator library, similar to cuBLAS, into JAX. The issue seeks clarification on whether the integration should be through PJRT or if XLA CustomCall can be used, and inquires about recommended extension paths and examples for PJRT plugin operator replacements.
- Availability of Fuzzer Tools for HLO Text IR: A query has been raised about whether fuzzer tools exist that can generate HLO text IR. The question is discussed in the GitHub project linked in the issue.
2.4 Closed Issues
This section lists, groups, and then summarizes issues that were closed within the last week in the repository. This section also links the associated pull requests if applicable.
Issues Closed This Week: 1
Summarized Issues:
- Inference Latency Comparison: The issue discusses the relative inference latency of PjRt versus Local Client for CPU and GPU. It questions the current validity of an older discussion that suggested PjRt is faster for CPU inference while Local Client is faster for GPU inference. A comment clarifies that PjRt is now considered the most stable and recommended API, without addressing GPU performance.
2.5 Issue Discussion Insights
This section will analyze the tone and sentiment of discussions within this project's open and closed issues that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.
Based on our analysis, there are no instances of toxic discussions in the project's open or closed issues from the past week.
III. Pull Requests
3.1 Open Pull Requests
This section provides a summary of pull requests that were opened in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.
Pull Requests Opened This Week: 7
Key Open Pull Requests
1. [XLA:GPU] Update ONEAPI crosstool compiler wrapper: This pull request updates the Crosstool wrapper compiler template in the XLA:GPU project to invoke the ICPX (oneAPI C++ Compiler) for device-side code and use the host compiler (such as Clang or GCC) for host-side code, while configuring the wrapper to efficiently delegate between device and host compilers and adding necessary compilation and linking flags for proper integration with DPC++.
- URL: pull/28257
- Merged: No
2. [ROCm] Enable mx data type for ROCm: This pull request introduces support for MX datatypes on the ROCm platform by utilizing hipBLASLt for implementation, including updates to the BlockScalingRewriter and CublasLtMatmulThunk to facilitate the new custom call, while noting that the GemmAlgorithmPicker is not yet enabled for these datatypes.
- URL: pull/28173
- Merged: No
- Associated Commits: 63469
3. Extend WhileLoopAllReduceCodeMotion pass with a new pattern (DUS): This pull request extends the WhileLoopAllReduceCodeMotion pass by introducing a new pattern that optimizes the Llama3 model (fp8) by matching all-reduces scattered into the loop output using the "dynamic-update-slice" operation with the loop induction variable as the index parameter, addressing performance slowdowns when many devices participate.
- URL: pull/28184
- Merged: No
- Associated Commits: 13ae3
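The idea behind the third pull request above can be sketched in plain Python. This is not the pass's implementation; it only illustrates why moving an all-reduce out of a loop can preserve results when each iteration writes its reduced slice into the output at the loop index. It assumes a sum all-reduce, a zero-initialized output buffer, and that every slice is written exactly once. Devices are simulated as lists, and the function names are hypothetical.

```python
def all_reduce(per_device):
    """Sum element-wise across devices; every device receives the same result."""
    total = [sum(vals) for vals in zip(*per_device)]
    return [list(total) for _ in per_device]


def loop_with_all_reduce(inputs, steps, width):
    """One all-reduce per iteration, written into the output at index i."""
    num_devices = len(inputs)
    out = [[0] * (steps * width) for _ in range(num_devices)]
    for i in range(steps):
        reduced = all_reduce([inputs[d][i] for d in range(num_devices)])
        for d in range(num_devices):
            # Stands in for dynamic-update-slice with the induction variable as index.
            out[d][i * width:(i + 1) * width] = reduced[d]
    return out


def loop_with_hoisted_all_reduce(inputs, steps, width):
    """Write local slices inside the loop, then all-reduce the whole buffer once."""
    num_devices = len(inputs)
    out = [[0] * (steps * width) for _ in range(num_devices)]
    for i in range(steps):
        for d in range(num_devices):
            out[d][i * width:(i + 1) * width] = inputs[d][i]
    return all_reduce(out)


# Two simulated devices, three loop steps, slices of width two.
inputs = [
    [[1, 2], [3, 4], [5, 6]],        # device 0
    [[10, 20], [30, 40], [50, 60]],  # device 1
]
per_step = loop_with_all_reduce(inputs, steps=3, width=2)
hoisted = loop_with_hoisted_all_reduce(inputs, steps=3, width=2)
print(per_step == hoisted)  # True
```

The first version issues one collective per iteration, while the second issues a single collective after the loop. That reduction in the number of collectives is the kind of saving the pull request description points to when many devices participate.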
Other Open Pull Requests
- NVSHMEM allreduce workaround: This pull request introduces a workaround for performing out-of-place allreduce operations using NVSHMEM for NVIDIA GPUs. It addresses the limitation of NVSHMEM's default algorithm requiring separate input and output buffers, ahead of planned changes in version 3.3.
- Memory space enforcement relaxation for tuple shapes: The pull request proposes to relax memory space enforcement for tuple shapes in the GPU context. This change prevents potential crashes by excusing tuple shapes from the memory space check, as they do not require layout.
- Allreduce kernel registration for ROCm: This pull request addresses the addition of a missing allreduce kernel registration for ROCm. It resolves failures in the AllReduceTest suite within the collective_ops_e2e_test and is currently awaiting review.
- Transition from AsyncStreamKind to stream id: The pull request aims to transition the codebase from using AsyncStreamKind to stream id for GPU collectives. This is part of a broader effort to deprecate AsyncStreamKind, allowing both to coexist temporarily due to extensive downstream dependencies.
3.2 Closed Pull Requests
This section provides a summary of pull requests that were closed in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.
Pull Requests Closed This Week: 5
Key Closed Pull Requests
1. Sync branch 6 24 2025: This pull request bundles multiple updates to the XLA project, including documentation updates for Docker images, enhancements to GPU and ROCm support, new methods and dependencies, and various code refactoring and optimizations. It was closed without being merged.
- URL: pull/28177
- Merged: No
- Associated Commits: ecf69, 89ba5, 4ad91, 86b90, 1ef70, 7444f, 70ea1, 681df, ab2a8, 10a74, d6704, 23108, 89d05, b91a9, ab568, a7ce4, a755d, 6b54b, 924ab, 62191, fda2b, 3c938, 5493d, 25d9a, 915d5, 241ab, d6fee, 633a9, a6376, 5420a, 03723, 380d4, e7ff3, 3408a, 849c1, 3cec5, f782c, 96904, 3735a, 3727a, 73d46, 0016f, 232ce, cdf0a, 753e8, d36a0, 08ccf, 3289a, 2b816, 80d4a, 51e6f, e097b, 7b193, 91b29, 24406, 6de30, 7b6aa, 468aa, 42ac7, f23df, ad937, c23af, 805e8, ca6ea, 58b2a, 9048b, 842db, c8272, e23cc, dbc4b, 72eb3, 42a06, feff4, da8a2, 05f3a, b347e, 73cd0, 7943b, da6ba, 56410, b62e6, 41163, a7a81, c2b39, 1dd1c, 24236, 8c212, ba2ad, 87c2d, fe43e, c6294, 4ccec, 8cb80, 29640, 0f5d7, b30a7, 46cff, 9e8a3, 1e689, c601b, e3821, c674f, 2fa60, 9904e, 1f038, 9f046, cb390, c7c61, e8ec0, a37a0, e8b30, 889bb, d537a, 1def8, 08c8d, 952e1, ee3f5, f8dad, 01369, 13aa1, f16bb, a7e1a, 1fc58, 24a71, 37c91, f2969, 8d71d, 18de4, bee28, 57517, 420d6, dc11a, 98631, f9e8d, 8a6e5, b91d4, 5701b, 4d8fc, 54b1b, e03f3, 7676e, 8131e, 3a102, 27c76, 762ad, 175b6, 28d02, baaed, d4273, e5dea, b1198, da4bd, 3566a, f567c, 50bff, 18ffd, 185f7, 9b6ce, 4f583, 7b9d3, 65fdf, 853b9, e278c, 8d3ee, eb774, 7a1c2, 80c36, 635f3, 04130, 255d6, 65069, 390a3, e71c2, b0a16, 9623a, 88cbd, 03f74, f0787, 6dc13, 7c16d, f6351, 17f42, 1ff12, 0afdf, 73058, b17b1, 0628c, f0859, fd06a, 40292, 09fce, d27ba, 077d4, dd90b, 53a03, 45cad, 092c2, 88d52, 2012b, ec7fc, 797d3, 951b7, a165e, 38346, 69d9c, c0763, f9c50, bd907, 32187, 86ebe, b5636, cb142, 8188c, a6613, 537ab, b34c1, 4f352, 16bed, 43a5e, 9f8cc, 95bc1, 82354, c5743, 6bac9, fb728, 3eb0b, 53014, 8e52f, 18290, 26c26, 967ce, 91607, 4b883, fab16, e89be, c396a, f8b3e, 407e4, c5966, 699d2, 830e7, a4fe4, bce92, ce393, 7da51, 35691, 4632f, d2d9e, 60502, 8a46e
2. Fixed ppc64le onednn build issue: This pull request addresses a build issue specific to the ppc64le architecture for the oneDNN library in the openxla/xla project, but it was ultimately not merged into the main codebase.
- URL: pull/28108
- Merged: No
- Associated Commits: 31a1d
3. [ROCm] Introduce rocm6.4.1 hermetic dependency: This pull request introduces a hermetic dependency on ROCm version 6.4.1 to the XLA project. It was closed without being merged.
- URL: pull/28161
- Merged: No
- Associated Commits: 9bbbd
Other Closed Pull Requests
- Memory alignment for NVIDIA GPUs: This topic covers the introduction of intermediate copies for collective memory operations on NVIDIA GPUs. The pull request ensures compatibility with allocators like ncclMemAlloc or nvshmemAlloc to prevent runtime errors related to invalid permissions or memory issues.
- ROCm 7 environment compatibility: This topic involves addressing an issue with the ROCm 7 environment by removing support for outdated ROCm versions. The pull request ensures compatibility with the correct rocblas library version, although it was ultimately not merged.
3.3 Pull Request Discussion Insights
This section will analyze the tone and sentiment of discussions within this project's open and closed pull requests that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.
Based on our analysis, there are no instances of toxic discussions in the project's open or closed pull requests from the past week.
IV. Contributors
4.1 Contributors
Active Contributors:
We consider an active contributor in this project to be any contributor who has made at least 1 commit, opened at least 1 issue, created at least 1 pull request, or made more than 2 comments in the last month.
If there are more than 10 active contributors, the list is truncated to the top 10 based on contribution metrics for better clarity.
Contributor | Commits | Pull Requests | Issues | Comments |
---|---|---|---|---|
Google-ML-Automation | 57 | 0 | 0 | 0 |
mraunak | 33 | 3 | 0 | 0 |
alekstheod | 16 | 4 | 0 | 1 |
amd-songpiao | 15 | 4 | 0 | 0 |
akuegel | 12 | 0 | 0 | 0 |
WillFroom | 9 | 0 | 0 | 0 |
beckerhe | 6 | 0 | 0 | 2 |
loislo | 8 | 0 | 0 | 0 |
bchetioui | 8 | 0 | 0 | 0 |
yliu120 | 6 | 1 | 0 | 0 |