Weekly GitHub Report for Xla: July 07, 2025 - July 14, 2025 (12:01:03)
Thank you for subscribing to our weekly newsletter! Each week, we deliver a comprehensive summary of your GitHub project's latest activity right to your inbox, including an overview of your project's issues, pull requests, contributors, and commit activity.
Table of Contents
I. News
1.1 Recent Version Releases:
No recent version releases were found.
1.2 Version Information:
No version information was available for this reporting period.
II. Issues
2.1 Top 5 Active Issues:
We consider active issues to be issues that have been commented on most frequently within the last week. Bot comments are omitted.
- Poor cuDNN kernel selection: This issue discusses a problem with the selection of cuDNN kernels in the context of XLA, where the user suspects that the kernel selection might not be optimal for their specific hardware configuration. The user provides detailed information about the HLO module and the cuDNN version they are using, and they are seeking insights into whether the issue might be related to how XLA requests kernels from cuDNN or if there is a misrecognition of their GPU's capabilities.
- The comments explore the possibility of a bug in XLA's requests to cuDNN, with users sharing HLO details and cuDNN versions. One user reports that cuDNN selects an SM100 kernel for bf16 on their setup, while another user notes that all kernels during autotuning are sm80, questioning if the GPU is misrecognized. The discussion includes traces and comparisons with PyTorch, which selects a faster bf16 kernel, raising questions about cuDNN's kernel selection logic.
- Number of comments this week: 5
- xla/service/gpu/kernels/cutlass_gemm_kernel_bf16xbf16_to_bf16.cu.cc trouble compiling cutlass: This issue involves a compilation error encountered when building JAX with CUDA on Windows, specifically related to the file `cutlass_gemm_kernel_bf16xbf16_to_bf16.cu.cc`. The user is seeking advice on whether a specific NVCC flag is needed to resolve an error in `uint128.h`, which is hindering the porting of JAX CUDA to Windows.
- The comments discuss a potential solution of switching the compiler from `clang-cl.exe` to `cl.exe`, which reduced the number of errors significantly. Another comment suggests that the issue might be due to an incompatibility between Cutlass and MSVC, and recommends trying CUDA 12.9, which includes a workaround for the `dim3` structure that might resolve the issue.
- Number of comments this week: 2
Since there were fewer than 5 open issues, all of the open issues have been listed above.
2.2 Top 5 Stale Issues:
We consider stale issues to be issues that have had no activity within the last 30 days. The team should work together to get these issues resolved and closed as soon as possible.
- New `nvshmem` rule breaks the build: This issue pertains to a build failure encountered by the PyTorch/XLA team while attempting to update their xla pin, which is linked to a new `nvshmem` rule that seems to be causing errors in the `cuda_configure` script. The error message indicates a problem with the `repository_ctx` object, specifically the absence of a `getenv` method, and the team is seeking guidance on whether they need to make updates on their end or if the issue requires a fix from the open_xla project, particularly in the `cuda_configure` settings.

Since there were fewer than 5 open issues, all of the open issues have been listed above.
2.3 Open Issues
This section lists, groups, and then summarizes issues that were created within the last week in the repository.
Issues Opened This Week: 5
Summarized Issues:
- CUDA Toolkit Path Issues: The problem arises when building TensorFlow with a custom CUDA installation, where the error "Can't find libdevice directory" occurs due to the TF_CUDA_TOOLKIT_PATH being unset. This may be caused by a bug in the Bazel file that fails to pass the determined path, leading to build failures.
- Kernel Selection in XLA: In the XLA framework, there is an issue with the selection of cuDNN kernels, where the system defaults to sm80 kernels instead of potentially faster sm100 kernels for bf16 operations. This could be due to a misrecognition of the GPU type or limitations in kernel availability for certain GPU models.
- JAX Compilation on Windows: A compilation error is encountered when building the JAX library with CUDA on Windows, specifically related to the `cutlass_gemm_kernel_bf16xbf16_to_bf16.cu.cc` file. The issue may stem from an incompatibility between Cutlass and MSVC, with a potential workaround available in CUDA 12.9.
- Cross-Compiling XLA for ARM64: Users face difficulties when cross-compiling XLA from x86 to ARM64 using a custom GCC compiler, as the Bazel build system defaults to x86 binaries. This occurs despite using the `--config=cross_compile_linux_arm64` configuration, hindering the cross-compilation process.
- Data Transfer from PJRT Buffer: There is a need for a `byte_strides` field when transferring data from a PJRT Buffer back to the host, as the current method using the `host_layout` field is not universally supported. This is particularly problematic for data in column-major format, affecting compatibility with various plugins.
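The `byte_strides` request above can be illustrated with NumPy, whose arrays expose exactly this kind of per-dimension stride information. This is a minimal sketch of why a row-major-only host transfer misreads column-major data; the shapes and values here are illustrative, not taken from the issue.

```python
import numpy as np

# A 3x4 float32 array in row-major (C) order, and the same logical
# data laid out column-major (Fortran order), as a column-major
# device buffer might hold it.
row_major = np.arange(12, dtype=np.float32).reshape(3, 4)
col_major = np.asfortranarray(row_major)

# Per-dimension byte strides differ even though the logical values
# are identical: float32 is 4 bytes, so row-major (3, 4) strides are
# (16, 4) while column-major strides are (4, 12).
print(row_major.strides)  # (16, 4)
print(col_major.strides)  # (4, 12)

# A host-side copy that assumes row-major layout would scramble the
# column-major bytes; honoring the byte strides reconstructs the
# logical values correctly.
flat = np.frombuffer(col_major.tobytes(order="A"), dtype=np.float32)
restored = np.lib.stride_tricks.as_strided(
    flat, shape=col_major.shape, strides=col_major.strides)
assert np.array_equal(restored, row_major)
```

Carrying strides alongside the raw bytes, rather than assuming one canonical layout, is what makes such a transfer work across plugins with different native layouts.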
2.4 Closed Issues
This section lists, groups, and then summarizes issues that were closed within the last week in the repository. This section also links the associated pull requests if applicable.
Issues Closed This Week: 1
Summarized Issues:
- Compilation Issues with PJRT Plugin for ROCm: Users encountered problems compiling a PJRT plugin for ROCm due to an ambiguous overloaded function call error. This issue was introduced by a specific commit and was resolved by adding an explicit type annotation, highlighting a potential compatibility issue with the GCC version being used.
2.5 Issue Discussion Insights
This section will analyze the tone and sentiment of discussions within this project's open and closed issues that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.
Based on our analysis, there are no instances of toxic discussions in the project's open or closed issues from the past week.
III. Pull Requests
3.1 Open Pull Requests
This section provides a summary of pull requests that were opened in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.
Pull Requests Opened This Week: 11
Key Open Pull Requests
1. Add Nvidia benchmarks: This pull request aims to introduce Nvidia benchmarks to the project, as evidenced by multiple commits that include adding the benchmarks, updating benchmark numbers, fixing XLA flags format, and updating configuration names, although it has not yet been merged.
- URL: pull/28728
- Merged: No
2. Moved InstructionVerifier class declaration from implementation file to header file: This pull request involves refactoring by moving the `InstructionVerifier` class declaration from the implementation file to a header file, which enhances code organization, facilitates potential reuse across multiple compilation units, and allows external users, such as the team, to utilize the `InstructionVerifier` class.
- URL: pull/28685
- Merged: No
3. [XLA:CPU][oneDNN] Implement oneDNN primitives for custom calls in Thunk runtime: This pull request introduces the implementation of oneDNN primitives intended for use with custom calls in the Thunk runtime, serving as a foundational stub for future integration of oneDNN-backed custom calls, with the core logic for various oneDNN operations being established to facilitate easier review and integration in subsequent updates.
- URL: pull/28615
- Merged: No
Other Open Pull Requests
- Fabric Info Test Compatibility: This pull request modifies the fabric info test to ensure compatibility with lower CUDA driver versions. It removes the assumption of a minimum version of 550 and eliminates the requirement for empty fabric info on Hopper.
- XLA:GPU Enhancements: Several pull requests focus on enhancing the XLA:GPU backend. One lowers the dynamic update slice operation into a command buffer when dependent on loop iteration, while another introduces a new `sycl_kernel` component with its corresponding test.
- ROCm Framework Configuration: This pull request involves relocating the multigpu settings to a specific bazelrc file within the ROCm framework. The change is indicated by the commit message and the associated GitHub link.
- Operation Fusion in Models: This pull request enables the fusion of matmul, bias addition, and addition operations in certain models. It adjusts the binary operands' shapes to ensure matching element counts and includes a test to verify this functionality.
- CUDA Graph Concurrent Mode: This pull request aims to enable the CUDA graph concurrent mode by default in the XLA:GPU project. The change is indicated by the commit message and the title of the pull request.
- NVTX and CUDA Updates: Two pull requests focus on updating NVTX and CUDA components. One upgrades NVTX to version 3.2.1 and annotates outputs to prevent false positives, while the other updates the `cuda_redist_versions.bzl` file to include new versions of CUDA, cuDNN, and NVSHMEM.
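The Operation Fusion bullet above describes combining a matmul, a bias addition, and a further addition into one fused computation. A minimal NumPy sketch of that pattern follows; the shapes and variable names are hypothetical, chosen only to show why the binary operands' element counts must match (the bias is broadcast up to the matmul's shape) before the ops can be treated as one fusion.

```python
import numpy as np

# Hypothetical shapes: a batch of 2 inputs, 3 features -> 4 outputs.
x = np.ones((2, 3), dtype=np.float32)
w = np.full((3, 4), 0.5, dtype=np.float32)
bias = np.arange(4, dtype=np.float32)          # shape (4,)
residual = np.ones((2, 4), dtype=np.float32)   # shape (2, 4)

# The three ops a fusion pass would combine: matmul, bias add, add.
# Broadcasting the (4,) bias to (2, 4) gives every binary operand the
# same element count, the shape adjustment the pull request describes.
bias_2d = np.broadcast_to(bias, (2, 4))
fused_result = (x @ w) + bias_2d + residual

# Equivalent unfused computation, step by step, for comparison.
step1 = x @ w              # matmul
step2 = step1 + bias       # bias addition (implicit broadcast)
step3 = step2 + residual   # final addition
assert np.array_equal(fused_result, step3)
```

In a compiler such as XLA, recognizing this chain as one fusion avoids materializing the intermediate `step1` and `step2` arrays.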
3.2 Closed Pull Requests
This section provides a summary of pull requests that were closed in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.
Pull Requests Closed This Week: 2
Key Closed Pull Requests
1. Add nvidia benchmarks: This pull request aimed to introduce NVIDIA benchmarks to the project, as indicated by the commit messages, but it was ultimately closed without being merged.
- URL: pull/28715
- Merged: No
2. Test pipeline changes: This pull request, titled "Test pipeline changes," involves the creation of a test file as part of a proposed modification to the project's testing pipeline, but it was ultimately not merged into the main codebase.
- URL: pull/28755
- Merged: No
- Associated Commits: 62a80
3.3 Pull Request Discussion Insights
This section will analyze the tone and sentiment of discussions within this project's open and closed pull requests that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.
Based on our analysis, there are no instances of toxic discussions in the project's open or closed pull requests from the past week.
IV. Contributors
4.1 Contributors
Active Contributors:
We consider an active contributor in this project to be any contributor who has made at least 1 commit, opened at least 1 issue, created at least 1 pull request, or made more than 2 comments in the last month.
If there are more than 10 active contributors, the list is truncated to the top 10 based on contribution metrics for better clarity.
| Contributor | Commits | Pull Requests | Issues | Comments |
| --- | --- | --- | --- | --- |
| Google-ML-Automation | 57 | 0 | 0 | 0 |
| mraunak | 33 | 3 | 0 | 0 |
| alekstheod | 15 | 3 | 0 | 0 |
| akuegel | 12 | 0 | 0 | 0 |
| beckerhe | 6 | 0 | 0 | 4 |
| WillFroom | 9 | 0 | 0 | 0 |
| allanrenucci | 6 | 0 | 0 | 3 |
| amd-songpiao | 6 | 2 | 0 | 0 |
| terryysun | 4 | 3 | 0 | 1 |
| loislo | 8 | 0 | 0 | 0 |