Weekly Project News

July 21, 2025

Weekly GitHub Report for XLA: July 14, 2025 - July 21, 2025

Weekly GitHub Report for XLA

Thank you for subscribing to our weekly newsletter! Each week, we deliver a comprehensive summary of your GitHub project's latest activity right to your inbox, including an overview of your project's issues, pull requests, contributors, and commit activity.


Table of Contents

  • I. News
    • 1.1. Recent Version Releases
    • 1.2. Other Noteworthy Updates
  • II. Issues
    • 2.1. Top 5 Active Issues
    • 2.2. Top 5 Stale Issues
    • 2.3. Open Issues
    • 2.4. Closed Issues
    • 2.5. Issue Discussion Insights
  • III. Pull Requests
    • 3.1. Open Pull Requests
    • 3.2. Closed Pull Requests
    • 3.3. Pull Request Discussion Insights
  • IV. Contributors
    • 4.1. Contributors

I. News

1.1 Recent Version Releases:

No recent version releases were found.

1.2 Other Noteworthy Updates:

No other noteworthy updates were reported this week.

II. Issues

2.1 Top 5 Active Issues:

We consider active issues to be issues that have been commented on most frequently within the last week. Bot comments are omitted.

  1. Missing CUDNN 9.10.2 for json hermetic, will CUDA 12.9.1 also be missing?: This issue highlights the absence of CUDNN 9.10.2 in the JSON hermetic files for building JAX from source, and raises concerns about the potential unavailability of CUDA 12.9.1, which is crucial for resolving numerous Windows compile errors. The user expresses frustration with the lack of support for a non-WSL Windows CUDA build of JAX/XLA, emphasizing the need for one given Amazon's better Windows API support compared to Linux.

    • The comments reveal that the mirror URLs for the newer versions of CUDNN and CUDA do not exist yet, indicating that the TensorFlow team has not mirrored these JSON files. A script is shared to test the availability of these files, but it encounters certificate revocation check errors, confirming the absence of the required files.
    • Number of comments this week: 2
  2. Build error while trying to build algorithm.cc: This issue involves a build error encountered while attempting to compile the algorithm.cc file in the XLA project using Bazel, specifically related to a static assertion failure indicating that the result type must be constructible from the value type of the input range. The error appears to be linked to the use of STL algorithms that require a copy constructor, which is not available for the AllocationValue class, leading to the build failure.

    • The commenter suggests that the issue arises from STL algorithms using the copy constructor, causing the error. They resolved it by implementing a move constructor and a move assignment operator for the AllocationValue class and deleting the copy constructor to force the use of the move constructor. They also express surprise that others have not encountered this issue.
    • Number of comments this week: 1
  3. GPU mocking hangs the compiler (in autotuning): This issue involves a problem with the GPU mocking feature in a GitHub project, where using GPU mocking to simulate multiple processes causes the autotuner sharding to hang when attempting to retrieve autotuning results from the mock processes. The provided JAX code snippet demonstrates the issue, which occurs during the compilation process when the autotuner is engaged.

    • The comment section provides a stack trace of the hang and suggests that the issue might be related to the use of a key-value store with GPU mocking. A potential solution is proposed, which involves avoiding the use of the key-value store in this context, and a related pull request is mentioned.
    • Number of comments this week: 1
  4. [Proposal] Extract the Profiling Subsystem into a Dedicated OpenXLA Repository: This issue proposes extracting the XLA profiling subsystem into a dedicated repository within the OpenXLA organization to make it more accessible to other projects without requiring a dependency on the entire XLA monorepo. The proposal highlights the subsystem's modular and extensible design, which could benefit the broader machine learning ecosystem by providing a focused repository for high-performance C++ tracing infrastructure.

    • A commenter expressed interest in working on the proposal and requested guidance from an experienced member of the OpenXLA community.
    • Number of comments this week: 1

Since there were fewer than 5 active issues, all of the active issues have been listed above.

2.2 Top 5 Stale Issues:

We consider stale issues to be issues that have had no activity within the last 30 days. The team should work together to get these issues resolved and closed as soon as possible.

  1. New nvshmem rule breaks the build: This issue involves a build failure in the PyTorchXLA project due to a new nvshmem rule, which is causing errors related to the cuda_configure rule and the repository_ctx object lacking a getenv method. The problem appears to be linked to a recent pull request in the openxla repository, and the user is seeking guidance on whether updates are needed on their end or if a fix is required from the openxla side, particularly concerning the cuda_configure settings.
  2. Failed to Parse MLIR generated by Torchax: This issue involves a problem with parsing MLIR generated by the Torchax export API when attempting to export a Torch model and compile it into an XLA binary. The user encounters errors related to an unregistered operation 'vhlo.rsqrt_v2' in the 'vhlo' dialect and a failure to deserialize a portable artifact using StableHLO_v1.9.5, despite using the specified versions of torch and torchxla and building the XLA repository from the same commit as torchxla 2.7.
  3. upgrade protobuf dependency: This issue involves the need to upgrade the protobuf dependency in the project, as the current version is outdated. The outdated dependency is causing compatibility issues, specifically preventing the use of XLA with other libraries within the same Bazel WORKSPACE.
  4. support bazel modules: This issue is about the request for the adoption of Bazel modules in the XLA package, as it is currently the only package in the user's Bazel build that lacks support for these modules. The user highlights the widespread adoption of Bazel modules and inquires if there are any plans to integrate them into the project.
  5. Gpu collective performance model bug: This issue pertains to a bug in the gpu_collective_performance model file, where a recent change updated the lowLatencyBandwidth for AMD links but failed to make corresponding updates in the CUDA section. As a result, invoking the gpu_collective_performance model with H100 settings leads to a failure, indicating a discrepancy in the model's handling of different GPU configurations.

2.3 Open Issues

This section lists, groups, and then summarizes issues that were created within the last week in the repository.

Issues Opened This Week: 7

Summarized Issues:

  • TFRT GPU Client Design Choice: The design choice in the TFRT GPU client implementation raises questions about its use of only one stream for handling GPU operations. This approach potentially misses the opportunity for parallel processing of memory copy and computation tasks.
    • issues/28859
  • XLA/JAX and Shader Binding Table: A developer seeks guidance on using XLA/JAX to generate modules in the shader binding table for ray tracing code. They question the feasibility of lowering kernels to PTX either ahead-of-time (AOT) or just-in-time (JIT) within the constraints of XLA, while using optixLaunch(...) to synchronize tensors.
    • issues/28893
  • Build Error in XLA Project: A build error occurs while compiling the algorithm.cc file in the XLA project using Bazel, related to a static assertion failure. The issue is potentially due to the use of STL algorithms that require a copy constructor, which was resolved by implementing a move constructor and move assignment operator for the AllocationValue class and deleting its copy constructor.
    • issues/28905
  • GPU Mocking and Autotuner Hang: Using GPU mocking to simulate multiple processes causes the autotuner in the JAX compiler to hang. This occurs when attempting to retrieve autotuning results from the mock processes, as demonstrated by a provided Python script during a matrix multiplication operation.
    • issues/28959
  • Missing JSON URLs for CUDNN and CUDA: The absence of JSON URLs for CUDNN 9.10.2 and CUDA 12.9.1 in the GitHub hermetic repository poses challenges for building JAX from source on Windows. The TensorFlow team has not yet mirrored these versions on storage.googleapis.com, causing significant compilation challenges.
    • issues/28989
  • XLA Profiling Subsystem Extraction: There is a proposal to extract the XLA profiling subsystem into a dedicated OpenXLA repository. This would make its powerful, generic, and modular tracing components more accessible to the broader machine learning ecosystem without requiring dependency on the entire XLA framework.
    • issues/29007
  • MHLO to XLA HLO Conversion Failure: A failure occurs in converting the mhlo.dynamic_broadcast_in_dim operation from MHLO to XLA HLO using the mlir::ConvertMlirHloToHlo API. This is potentially due to missing or incorrect mlir::MlirToHloConversionOptions, resulting in an error during the execution of a JAX-based script that utilizes shape polymorphism and dynamic broadcasting.
    • issues/29030

2.4 Closed Issues

This section lists, groups, and then summarizes issues that were closed within the last week in the repository. This section also links the associated pull requests if applicable.

Issues Closed This Week: 0

Summarized Issues:

As of our latest update, there were no issues closed in the project this week.

2.5 Issue Discussion Insights

This section analyzes the tone and sentiment of discussions within this project's open and closed issues that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.

Based on our analysis, there are no instances of toxic discussions in the project's open or closed issues from the past week.


III. Pull Requests

3.1 Open Pull Requests

This section provides a summary of pull requests that were opened in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.

Pull Requests Opened This Week: 8

Key Open Pull Requests

1. [XLA:CPU][oneDNN] Add build flag to enable asynchronous support in oneDNN: This pull request introduces a build flag to optionally enable asynchronous execution support in oneDNN for the XLA:CPU backend, enhancing performance capabilities by allowing the use of an asynchronous version of oneDNN.

  • URL: pull/28883
  • Merged: No
  • Associated Commits: a30f8, 47c79

2. [Perf] Add expensive AllGather cost adjustment to default GPU Scheduler: This pull request proposes a performance enhancement to the default GPU Scheduler by adjusting the cost of AllGather operations to reflect their relative expense compared to AllReduce operations, as detailed in the commits and described in the pull request body.

  • URL: pull/28997
  • Merged: No
  • Associated Commits: f744b, a799a

3. [NVIDIA GPU] [XLA_GPU_MS_COLLECTIVE] Round-robin stream assignment for async communications: This pull request introduces a round-robin stream assignment algorithm for asynchronous collectives as part of the ongoing efforts to revive a previous pull request, with plans to integrate this algorithm into the pipeline following a refactoring of the stream assignment mechanism.

  • URL: pull/28919
  • Merged: No
  • Associated Commits: 84ccb

Other Open Pull Requests

  • HLO Representation Dumping: This pull request enables the dumping of optimized High-Level Optimizer (HLO) representations when deserializing. It makes these dumps accessible when utilizing the JAX compilation cache.
    • pull/28928
  • Reduce Scatter Subtraction Pattern: A subtraction pattern is introduced to the reduce scatter creator in the openxla/xla project. This pull request is currently not merged.
    • pull/28929
  • ROCm Device Description Improvements: Improvements are made to the ROCm device description by cleaning up the existing code. This pull request is open and not yet merged.
    • pull/28936
  • KV Store Null Setting for Mocked GPU Processes: This pull request addresses an issue by setting the KV store to null when GPU processes are mocked. It prevents hanging during operations like sharding autotuning by ensuring there are no other processes to communicate with.
    • pull/28962
  • SPMD Partitioning for Custom Calls: SPMD partitioning is introduced for custom calls handling block-scaled dot operations based on microscaling (MX) formats. This extends existing partitioning rules without affecting convolutions and dots not using MX types.
    • pull/29073

3.2 Closed Pull Requests

This section provides a summary of pull requests that were closed in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.

Pull Requests Closed This Week: 3

Key Closed Pull Requests

1. [XLA]Clamp num_workers to avoid partition overflow: This pull request addresses an issue in the XLA project where the number of worker threads in the parallel_for() function could exceed the number of tasks, potentially causing out-of-bounds access, by clamping the number of workers to ensure alignment between worker threads and task partitions, thereby preventing undefined behavior.

  • URL: pull/28877
  • Merged: No
  • Associated Commits: d72f4

2. Merging XLA extension BUILD file: This pull request involves the integration of an XLA extension BUILD file from the Elixir-NX repository into the OpenXLA project, as indicated by the reference to the original file's URL and the commit message, although it was ultimately not merged.

  • URL: pull/28980
  • Merged: No
  • Associated Commits: 04d78

3. [XLA:GPU] Add shared_memory_per_block_optin device info member: This pull request introduces a new device information member, shared_memory_per_block_optin, to the XLA:GPU project, enabling the querying of available shared memory in JAX to enhance the code generation process for custom kernels, although it was ultimately not merged.

  • URL: pull/28985
  • Merged: No
  • Associated Commits: 62d77

3.3 Pull Request Discussion Insights

This section analyzes the tone and sentiment of discussions within this project's open and closed pull requests that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.

  1. Add subtraction pattern to reduce scatter creator
    • Toxicity Score: 0.55 (Defensive responses, Lack of resolution, Perceived dismissiveness)
    • This GitHub conversation involves username1 and username2, where username1 initially provides a solution that username2 critiques for not addressing the issue effectively. Username1 responds with a defensive tone, expressing frustration over the feedback. The conversation continues with username2 maintaining a critical stance, which seems to trigger further tension as username1 perceives the comments as dismissive. The interaction is marked by a lack of resolution and increasing defensiveness from both parties.

IV. Contributors

4.1 Contributors

Active Contributors:

We consider an active contributor in this project to be any contributor who has made at least 1 commit, opened at least 1 issue, created at least 1 pull request, or made more than 2 comments in the last month.

If there are more than 10 active contributors, the list is truncated to the top 10 based on contribution metrics for better clarity.

Contributor           Commits  Pull Requests  Issues  Comments
Google-ML-Automation  57       0              0       0
mraunak               27       2              0       0
akuegel               12       0              0       0
terryysun             5        4              0       1
beckerhe              6        0              0       4
othakkar              4        2              0       4
WillFroom             9        0              0       0
allanrenucci          6        0              0       3
bchetioui             8        0              0       1
alekstheod            7        1              0       0
