Weekly GitHub Report for Xla: July 07, 2025 - July 14, 2025 (12:01:03)
Thank you for subscribing to our weekly newsletter! Each week, we deliver a comprehensive summary of your GitHub project's latest activity right to your inbox, including an overview of your project's issues, pull requests, contributors, and commit activity.
Table of Contents
I. News
1.1 Recent Version Releases:
No recent version releases were found.
1.2 Version Information:
No version information was available for this reporting period.
II. Issues
2.1 Top 5 Active Issues:
We consider active issues to be issues that have been commented on most frequently within the last week. Bot comments are omitted.
- Poor cuDNN kernel selection: This issue discusses a problem with the selection of cuDNN kernels in the context of XLA, where the user suspects that the kernel selection might not be optimal for their specific hardware configuration. The user provides detailed information about the HLO module and the cuDNN version they are using, and they are seeking insights into whether the issue might be related to how XLA requests kernels from cuDNN or if there is a misrecognition of their GPU's capabilities.
- The comments explore the possibility of a bug in XLA's requests to cuDNN, with users sharing HLO details and cuDNN versions. One user reports that cuDNN selects an SM100 kernel for bf16 on their setup, while another user notes that all kernels during autotuning are sm80, questioning if the GPU is misrecognized. The discussion includes traces and comparisons with PyTorch, which selects a faster bf16 kernel, raising questions about cuDNN's kernel selection logic.
- Number of comments this week: 5
- xla/service/gpu/kernels/cutlass_gemm_kernel_bf16xbf16_to_bf16.cu.cc trouble compiling cutlass: This issue involves a compilation error encountered when building JAX with CUDA on Windows, specifically related to the file `cutlass_gemm_kernel_bf16xbf16_to_bf16.cu.cc`. The user is seeking advice on whether a specific NVCC flag is needed to resolve an error in `uint128.h`, which is hindering the porting of JAX CUDA to Windows.
- The comments discuss a potential solution of switching the compiler from `clang-cl.exe` to `cl.exe`, which reduced the number of errors significantly. Another comment suggests that the issue might be due to an incompatibility between Cutlass and MSVC, and recommends trying CUDA 12.9, which includes a workaround for the `dim3` structure that might resolve the issue.
- Number of comments this week: 2
Since there were fewer than 5 open issues, all of the open issues have been listed above.
2.2 Top 5 Stale Issues:
We consider stale issues to be issues that have had no activity within the last 30 days. The team should work together to get these issues resolved and closed as soon as possible.
- New `nvshmem` rule breaks the build: This issue pertains to a build failure encountered by the PyTorch/XLA team while attempting to update their xla pin, which is linked to a new `nvshmem` rule that seems to be causing errors in the `cuda_configure` script. The error message indicates a problem with the `repository_ctx` object, specifically the absence of a `getenv` method, and the team is seeking guidance on whether they need to make updates on their end or if the issue requires a fix from the open_xla project, particularly in the `cuda_configure` settings.

Since there were fewer than 5 open issues, all of the open issues have been listed above.
2.3 Open Issues
This section lists, groups, and then summarizes issues that were created within the last week in the repository.
Issues Opened This Week: 5
Summarized Issues:
- CUDA Toolkit Path Issues: The problem arises when building TensorFlow with a custom CUDA installation, where the error "Can't find libdevice directory" occurs due to the TF_CUDA_TOOLKIT_PATH being unset. This may be caused by a bug in the Bazel file that fails to pass the determined path, leading to build failures.
- Kernel Selection in XLA: In the XLA framework, there is an issue with the selection of cuDNN kernels, where the system defaults to sm80 kernels instead of potentially faster sm100 kernels for bf16 operations. This could be due to a misrecognition of the GPU type or limitations in kernel availability for certain GPU models.
- JAX Compilation on Windows: A compilation error is encountered when building the JAX library with CUDA on Windows, specifically related to the `cutlass_gemm_kernel_bf16xbf16_to_bf16.cu.cc` file. The issue may stem from an incompatibility between Cutlass and MSVC, with a potential workaround available in CUDA 12.9.
- Cross-Compiling XLA for ARM64: Users face difficulties when cross-compiling XLA from x86 to ARM64 using a custom GCC compiler, as the Bazel build system defaults to x86 binaries. This occurs despite using the `--config=cross_compile_linux_arm64` configuration, hindering the cross-compilation process.
- Data Transfer from PJRT Buffer: There is a need for a `byte_strides` field when transferring data from a PJRT Buffer back to the host, as the current method using the `host_layout` field is not universally supported. This is particularly problematic for data in column-major format, affecting compatibility with various plugins.
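The `byte_strides` request above can be illustrated with NumPy, whose arrays expose exactly this kind of per-dimension stride information. This is a minimal sketch of why a row-major-only host transfer misreads column-major data; the shapes and values here are illustrative, not taken from the issue.

```python
import numpy as np

# A 3x4 float32 array in row-major (C) order, and the same logical
# data laid out column-major (Fortran order), as a column-major
# device buffer might hold it.
row_major = np.arange(12, dtype=np.float32).reshape(3, 4)
col_major = np.asfortranarray(row_major)

# Per-dimension byte strides differ even though the logical values
# are identical: float32 is 4 bytes, so row-major (3, 4) strides are
# (16, 4) while column-major strides are (4, 12).
print(row_major.strides)  # (16, 4)
print(col_major.strides)  # (4, 12)

# A host-side copy that assumes row-major layout would scramble the
# column-major bytes; honoring the byte strides reconstructs the
# logical values correctly.
flat = np.frombuffer(col_major.tobytes(order="A"), dtype=np.float32)
restored = np.lib.stride_tricks.as_strided(
    flat, shape=col_major.shape, strides=col_major.strides)
assert np.array_equal(restored, row_major)
```

Carrying strides alongside the raw bytes, rather than assuming one canonical layout, is what makes such a transfer work across plugins with different native layouts.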
2.4 Closed Issues
This section lists, groups, and then summarizes issues that were closed within the last week in the repository. This section also links the associated pull requests if applicable.
Issues Closed This Week: 1
Summarized Issues:
- Compilation Issues with PJRT Plugin for ROCm: Users encountered problems compiling a PJRT plugin for ROCm due to an ambiguous overloaded function call error. This issue was introduced by a specific commit and was resolved by adding an explicit type annotation, highlighting a potential compatibility issue with the GCC version being used.
2.5 Issue Discussion Insights
This section will analyze the tone and sentiment of discussions within this project's open and closed issues that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.
Based on our analysis, there are no instances of toxic discussions in the project's open or closed issues from the past week.
III. Pull Requests
3.1 Open Pull Requests
This section provides a summary of pull requests that were opened in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.
Pull Requests Opened This Week: 11
Key Open Pull Requests
1. Add Nvidia benchmarks: This pull request aims to introduce Nvidia benchmarks to the project, as evidenced by multiple commits that include adding the benchmarks, updating benchmark numbers, fixing XLA flags format, and updating configuration names, although it has not yet been merged.
- URL: pull/28728
- Merged: No
2. Moved InstructionVerifier class declaration from implementation file to header file: This pull request involves refactoring by moving the `InstructionVerifier` class declaration from the implementation file to a header file, which enhances code organization, facilitates potential reuse across multiple compilation units, and allows external users, such as the team, to utilize the `InstructionVerifier` class.
- URL: pull/28685
- Merged: No
3. [XLA:CPU][oneDNN] Implement oneDNN primitives for custom calls in Thunk runtime: This pull request introduces the implementation of oneDNN primitives intended for use with custom calls in the Thunk runtime, serving as a foundational stub for future integration of oneDNN-backed custom calls, with the core logic for various oneDNN operations being established to facilitate easier review and integration in subsequent updates.
- URL: pull/28615
- Merged: No
Other Open Pull Requests
- Fabric Info Test Compatibility: This pull request modifies the fabric info test to ensure compatibility with lower CUDA driver versions. It removes the assumption of a minimum version of 550 and eliminates the requirement for empty fabric info on Hopper.
- XLA:GPU Enhancements: Several pull requests focus on enhancing the XLA:GPU backend. One lowers the dynamic update slice operation into a command buffer when dependent on loop iteration, while another introduces a new `sycl_kernel` component with its corresponding test.
- ROCm Framework Configuration: This pull request involves relocating the multigpu settings to a specific bazelrc file within the ROCm framework. The change is indicated by the commit message and the associated GitHub link.
- Operation Fusion in Models: This pull request enables the fusion of matmul, bias addition, and addition operations in certain models. It adjusts the binary operands' shapes to ensure matching element counts and includes a test to verify this functionality.
- CUDA Graph Concurrent Mode: This pull request aims to enable the CUDA graph concurrent mode by default in the XLA:GPU project. The change is indicated by the commit message and the title of the pull request.
- NVTX and CUDA Updates: Two pull requests focus on updating NVTX and CUDA components. One upgrades NVTX to version 3.2.1 and annotates outputs to prevent false positives, while the other updates the `cuda_redist_versions.bzl` file to include new versions of CUDA, cuDNN, and NVSHMEM.
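The Operation Fusion bullet above describes combining a matmul, a bias addition, and a further addition into one fused computation. A minimal NumPy sketch of that pattern follows; the shapes and variable names are hypothetical, chosen only to show why the binary operands' element counts must match (the bias is broadcast up to the matmul's shape) before the ops can be treated as one fusion.

```python
import numpy as np

# Hypothetical shapes: a batch of 2 inputs, 3 features -> 4 outputs.
x = np.ones((2, 3), dtype=np.float32)
w = np.full((3, 4), 0.5, dtype=np.float32)
bias = np.arange(4, dtype=np.float32)          # shape (4,)
residual = np.ones((2, 4), dtype=np.float32)   # shape (2, 4)

# The three ops a fusion pass would combine: matmul, bias add, add.
# Broadcasting the (4,) bias to (2, 4) gives every binary operand the
# same element count, the shape adjustment the pull request describes.
bias_2d = np.broadcast_to(bias, (2, 4))
fused_result = (x @ w) + bias_2d + residual

# Equivalent unfused computation, step by step, for comparison.
step1 = x @ w              # matmul
step2 = step1 + bias       # bias addition (implicit broadcast)
step3 = step2 + residual   # final addition
assert np.array_equal(fused_result, step3)
```

In a compiler such as XLA, recognizing this chain as one fusion avoids materializing the intermediate `step1` and `step2` arrays.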
3.2 Closed Pull Requests
This section provides a summary of pull requests that were closed in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.
Pull Requests Closed This Week: 2
Key Closed Pull Requests
1. Add nvidia benchmarks: This pull request aimed to introduce NVIDIA benchmarks to the project, as indicated by the commit messages, but it was ultimately closed without being merged.
- URL: pull/28715
- Merged: No
2. Test pipeline changes: This pull request, titled "Test pipeline changes," involves the creation of a test file as part of a proposed modification to the project's testing pipeline, but it was ultimately not merged into the main codebase.
- URL: pull/28755
- Merged: No
- Associated Commits: 62a80
3.3 Pull Request Discussion Insights
This section will analyze the tone and sentiment of discussions within this project's open and closed pull requests that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.
Based on our analysis, there are no instances of toxic discussions in the project's open or closed pull requests from the past week.
IV. Contributors
4.1 Contributors
Active Contributors:
We consider an active contributor in this project to be any contributor who has made at least 1 commit, opened at least 1 issue, created at least 1 pull request, or made more than 2 comments in the last month.
If there are more than 10 active contributors, the list is truncated to the top 10 based on contribution metrics for better clarity.
| Contributor | Commits | Pull Requests | Issues | Comments |
| --- | --- | --- | --- | --- |
| Google-ML-Automation | 57 | 0 | 0 | 0 |
| mraunak | 33 | 3 | 0 | 0 |
| alekstheod | 15 | 3 | 0 | 0 |
| akuegel | 12 | 0 | 0 | 0 |
| beckerhe | 6 | 0 | 0 | 4 |
| WillFroom | 9 | 0 | 0 | 0 |
| allanrenucci | 6 | 0 | 0 | 3 |
| amd-songpiao | 6 | 2 | 0 | 0 |
| terryysun | 4 | 3 | 0 | 1 |
| loislo | 8 | 0 | 0 | 0 |