Weekly Project News

July 21, 2025

Weekly GitHub Report for XLA: July 14, 2025 - July 21, 2025

Weekly GitHub Report for XLA

Thank you for subscribing to our weekly newsletter! Each week, we deliver a comprehensive summary of your GitHub project's latest activity right to your inbox, including an overview of your project's issues, pull requests, contributors, and commit activity.


Table of Contents

  • I. News
    • 1.1. Recent Version Releases
    • 1.2. Other Noteworthy Updates
  • II. Issues
    • 2.1. Top 5 Active Issues
    • 2.2. Top 5 Stale Issues
    • 2.3. Open Issues
    • 2.4. Closed Issues
    • 2.5. Issue Discussion Insights
  • III. Pull Requests
    • 3.1. Open Pull Requests
    • 3.2. Closed Pull Requests
    • 3.3. Pull Request Discussion Insights
  • IV. Contributors
    • 4.1. Contributors

I. News

1.1 Recent Version Releases:

No recent version releases were found.

1.2 Other Noteworthy Updates:

No other noteworthy updates were reported this week.

II. Issues

2.1 Top 5 Active Issues:

We consider active issues to be issues that have been commented on most frequently within the last week. Bot comments are omitted.

  1. Missing CUDNN 9.10.2 for json hermetic, will CUDA 12.9.1 also be missing?: This issue highlights the absence of CUDNN 9.10.2 in the JSON hermetic files for building JAX from source, and raises concerns about the potential unavailability of CUDA 12.9.1, which is crucial for resolving numerous Windows compile errors. The user expresses frustration with the lack of support for a non-WSL Windows CUDA build of JAX/XLA, emphasizing the need for one given Amazon's better Windows API support compared to Linux.

    • The comments reveal that the mirror URLs for the newer versions of CUDNN and CUDA do not exist yet, indicating that the TensorFlow team has not mirrored these JSON files. A script is shared to test the availability of these files, but it encounters certificate revocation check errors, confirming the absence of the required files.
    • Number of comments this week: 2
  2. Build error while trying to build algorithm.cc: This issue involves a build error encountered while attempting to compile the algorithm.cc file in the XLA project using Bazel, specifically related to a static assertion failure indicating that the result type must be constructible from the value type of the input range. The error appears to be linked to the use of STL algorithms that require a copy constructor, which is not available for the AllocationValue class, leading to the build failure.

    • The commenter suggests that the issue arises from STL algorithms using the copy constructor, causing the error. They resolved it by implementing a move constructor and a move assignment operator for the AllocationValue class and deleting the copy constructor to force the use of the move constructor. They also express surprise that others have not encountered this issue.
    • Number of comments this week: 1
  3. GPU mocking hangs the compiler (in autotuning): This issue involves a problem with the GPU mocking feature in a GitHub project, where using GPU mocking to simulate multiple processes causes the autotuner sharding to hang when attempting to retrieve autotuning results from the mock processes. The provided JAX code snippet demonstrates the issue, which occurs during the compilation process when the autotuner is engaged.

    • The comment section provides a stack trace of the hang and suggests that the issue might be related to the use of a key-value store with GPU mocking. A potential solution is proposed, which involves avoiding the use of the key-value store in this context, and a related pull request is mentioned.
    • Number of comments this week: 1
  4. [Proposal] Extract the Profiling Subsystem into a Dedicated OpenXLA Repository: This issue proposes extracting the XLA profiling subsystem into a dedicated repository within the OpenXLA organization to make it more accessible to other projects without requiring a dependency on the entire XLA monorepo. The proposal highlights the subsystem's modular and extensible design, which could benefit the broader machine learning ecosystem by providing a focused repository for high-performance C++ tracing infrastructure.

    • A commenter expressed interest in working on the proposal and requested guidance from an experienced member of the OpenXLA community.
    • Number of comments this week: 1

Since there were fewer than 5 active issues, all of the active issues have been listed above.

2.2 Top 5 Stale Issues:

We consider stale issues to be issues that have had no activity within the last 30 days. The team should work together to get these issues resolved and closed as soon as possible.

  1. New nvshmem rule breaks the build: This issue involves a build failure in the PyTorchXLA project due to a new nvshmem rule, which is causing errors related to the cuda_configure rule and the repository_ctx object lacking a getenv method. The problem appears to be linked to a recent pull request in the openxla repository, and the user is seeking guidance on whether updates are needed on their end or if a fix is required from the openxla side, particularly concerning the cuda_configure settings.
  2. Failed to Parse MLIR generated by Torchax: This issue involves a problem with parsing MLIR generated by the Torchax export API when attempting to export a Torch model and compile it into an XLA binary. The user encounters errors related to an unregistered operation 'vhlo.rsqrt_v2' in the 'vhlo' dialect and a failure to deserialize a portable artifact using StableHLO_v1.9.5, despite using the specified versions of torch and torchxla and building the XLA repository from the same commit as torchxla 2.7.
  3. upgrade protobuf dependency: This issue involves the need to upgrade the protobuf dependency in the project, as the current version is outdated. The outdated dependency is causing compatibility issues, specifically preventing the use of XLA with other libraries within the same Bazel WORKSPACE.
  4. support bazel modules: This issue is about the request for the adoption of Bazel modules in the XLA package, as it is currently the only package in the user's Bazel build that lacks support for these modules. The user highlights the widespread adoption of Bazel modules and inquires if there are any plans to integrate them into the project.
  5. Gpu collective performance model bug: This issue pertains to a bug in the gpu_collective_performance model file, where a recent change updated the lowLatencyBandwidth for AMD links but failed to make corresponding updates in the CUDA section. As a result, invoking the gpu_collective_performance model with H100 settings leads to a failure, indicating a discrepancy in the model's handling of different GPU configurations.

2.3 Open Issues

This section lists, groups, and then summarizes issues that were created within the last week in the repository.

Issues Opened This Week: 7

Summarized Issues:

  • TFRT GPU Client Design Choice: The design choice in the TFRT GPU client implementation raises questions about its use of only one stream for handling GPU operations. This approach potentially misses the opportunity for parallel processing of memory copy and computation tasks.
    • issues/28859
  • XLA/JAX and Shader Binding Table: A developer seeks guidance on using XLA/JAX to generate modules in the shader binding table for ray tracing code. They question the feasibility of lowering kernels to PTX either ahead-of-time (AOT) or just-in-time (JIT) within the constraints of XLA, while using optixLaunch(...) to synchronize tensors.
    • issues/28893
  • Build Error in XLA Project: A build error occurs while compiling the algorithm.cc file in the XLA project using Bazel, related to a static assertion failure. The issue is potentially due to the use of STL algorithms that require a copy constructor, which was resolved by implementing a move constructor and move assignment operator for the AllocationValue class and deleting its copy constructor.
    • issues/28905
  • GPU Mocking and Autotuner Hang: Using GPU mocking to simulate multiple processes causes the autotuner in the JAX compiler to hang. This occurs when attempting to retrieve autotuning results from the mock processes, as demonstrated by a provided Python script during a matrix multiplication operation.
    • issues/28959
  • Missing JSON URLs for CUDNN and CUDA: The absence of JSON URLs for CUDNN 9.10.2 and CUDA 12.9.1 in the GitHub hermetic repository poses challenges for building JAX from source on Windows. The TensorFlow team has not yet mirrored these versions on storage.googleapis.com, causing significant compilation challenges.
    • issues/28989
  • XLA Profiling Subsystem Extraction: There is a proposal to extract the XLA profiling subsystem into a dedicated OpenXLA repository. This would make its powerful, generic, and modular tracing components more accessible to the broader machine learning ecosystem without requiring dependency on the entire XLA framework.
    • issues/29007
  • MHLO to XLA HLO Conversion Failure: A failure occurs in converting the mhlo.dynamic_broadcast_in_dim operation from MHLO to XLA HLO using the mlir::ConvertMlirHloToHlo API. This is potentially due to missing or incorrect mlir::MlirToHloConversionOptions, resulting in an error during the execution of a JAX-based script that utilizes shape polymorphism and dynamic broadcasting.
    • issues/29030

2.4 Closed Issues

This section lists, groups, and then summarizes issues that were closed within the last week in the repository. This section also links the associated pull requests if applicable.

Issues Closed This Week: 0

Summarized Issues:

As of our latest update, there were no issues closed in the project this week.

2.5 Issue Discussion Insights

This section analyzes the tone and sentiment of discussions within this project's open and closed issues that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.

Based on our analysis, there are no instances of toxic discussions in the project's open or closed issues from the past week.


III. Pull Requests

3.1 Open Pull Requests

This section provides a summary of pull requests that were opened in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.

Pull Requests Opened This Week: 8

Key Open Pull Requests

1. [XLA:CPU][oneDNN] Add build flag to enable asynchronous support in oneDNN: This pull request introduces a build flag to optionally enable asynchronous execution support in oneDNN for the XLA:CPU backend, enhancing performance capabilities by allowing the use of an asynchronous version of oneDNN.

  • URL: pull/28883
  • Merged: No
  • Associated Commits: a30f8, 47c79

2. [Perf] Add expensive AllGather cost adjustment to default GPU Scheduler: This pull request proposes a performance enhancement to the default GPU Scheduler by adjusting the cost of AllGather operations to reflect their relative expense compared to AllReduce operations, as detailed in the commits and described in the pull request body.

  • URL: pull/28997
  • Merged: No
  • Associated Commits: f744b, a799a

3. [NVIDIA GPU] [XLA_GPU_MS_COLLECTIVE] Round-robin stream assignment for async communications: This pull request introduces a round-robin stream assignment algorithm for asynchronous collectives as part of the ongoing efforts to revive a previous pull request, with plans to integrate this algorithm into the pipeline following a refactoring of the stream assignment mechanism.

  • URL: pull/28919
  • Merged: No
  • Associated Commits: 84ccb

Other Open Pull Requests

  • HLO Representation Dumping: This pull request enables the dumping of optimized High-Level Optimizer (HLO) representations when deserializing. It makes these dumps accessible when utilizing the JAX compilation cache.
    • pull/28928
  • Reduce Scatter Subtraction Pattern: A subtraction pattern is introduced to the reduce scatter creator in the openxla/xla project. This pull request is currently not merged.
    • pull/28929
  • ROCm Device Description Improvements: Improvements are made to the ROCm device description by cleaning up the existing code. This pull request is open and not yet merged.
    • pull/28936
  • KV Store Null Setting for Mocked GPU Processes: This pull request addresses an issue by setting the KV store to null when GPU processes are mocked. It prevents hanging during operations like sharding autotuning by ensuring there are no other processes to communicate with.
    • pull/28962
  • SPMD Partitioning for Custom Calls: SPMD partitioning is introduced for custom calls handling block-scaled dot operations based on microscaling (MX) formats. This extends existing partitioning rules without affecting convolutions and dots not using MX types.
    • pull/29073

3.2 Closed Pull Requests

This section provides a summary of pull requests that were closed in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.

Pull Requests Closed This Week: 3

Key Closed Pull Requests

1. [XLA]Clamp num_workers to avoid partition overflow: This pull request addresses an issue in the XLA project where the number of worker threads in the parallel_for() function could exceed the number of tasks, potentially causing out-of-bounds access, by clamping the number of workers to ensure alignment between worker threads and task partitions, thereby preventing undefined behavior.

  • URL: pull/28877
  • Merged: No
  • Associated Commits: d72f4

2. Merging XLA extension BUILD file: This pull request involves the integration of an XLA extension BUILD file from the Elixir-NX repository into the OpenXLA project, as indicated by the reference to the original file's URL and the commit message, although it was ultimately not merged.

  • URL: pull/28980
  • Merged: No
  • Associated Commits: 04d78

3. [XLA:GPU] Add shared_memory_per_block_optin device info member: This pull request introduces a new device information member, shared_memory_per_block_optin, to the XLA:GPU project, enabling the querying of available shared memory in JAX to enhance the code generation process for custom kernels, although it was ultimately not merged.

  • URL: pull/28985
  • Merged: No
  • Associated Commits: 62d77

3.3 Pull Request Discussion Insights

This section analyzes the tone and sentiment of discussions within this project's open and closed pull requests that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.

  1. Add subtraction pattern to reduce scatter creator
    • Toxicity Score: 0.55 (Defensive responses, Lack of resolution, Perceived dismissiveness)
    • This GitHub conversation involves username1 and username2, where username1 initially provides a solution that username2 critiques for not addressing the issue effectively. Username1 responds with a defensive tone, expressing frustration over the feedback. The conversation continues with username2 maintaining a critical stance, which seems to trigger further tension as username1 perceives the comments as dismissive. The interaction is marked by a lack of resolution and increasing defensiveness from both parties.

IV. Contributors

4.1 Contributors

Active Contributors:

We consider an active contributor in this project to be any contributor who has made at least 1 commit, opened at least 1 issue, created at least 1 pull request, or made more than 2 comments in the last month.

If there are more than 10 active contributors, the list is truncated to the top 10 based on contribution metrics for better clarity.

Contributor           Commits  Pull Requests  Issues  Comments
Google-ML-Automation  57       0              0       0
mraunak               27       2              0       0
akuegel               12       0              0       0
terryysun             5        4              0       1
beckerhe              6        0              0       4
othakkar              4        2              0       4
WillFroom             9        0              0       0
allanrenucci          6        0              0       3
bchetioui             8        0              0       1
alekstheod            7        1              0       0
