Weekly Project News


Weekly GitHub Report for Xla: September 22, 2025 - September 29, 2025 (12:01:28)


Thank you for subscribing to our weekly newsletter! Each week, we deliver a comprehensive summary of your GitHub project's latest activity right to your inbox, including an overview of your project's issues, pull requests, contributors, and commit activity.


Table of Contents

  • I. News
    • 1.1. Recent Version Releases
    • 1.2. Other Noteworthy Updates
  • II. Issues
    • 2.1. Top 5 Active Issues
    • 2.2. Top 5 Stale Issues
    • 2.3. Open Issues
    • 2.4. Closed Issues
    • 2.5. Issue Discussion Insights
  • III. Pull Requests
    • 3.1. Open Pull Requests
    • 3.2. Closed Pull Requests
    • 3.3. Pull Request Discussion Insights
  • IV. Contributors
    • 4.1. Contributors

I. News

1.1 Recent Version Releases:

No recent version releases were found.

1.2 Other Noteworthy Updates:

No version release information was available for this period.

II. Issues

2.1 Top 5 Active Issues:

We consider active issues to be issues that have been commented on most frequently within the last week. Bot comments are omitted.

  1. XLA:TPU mhlo.acosh / mhlo.acos cannot be translated to XLA HLO: This issue reports that the mhlo.acosh and mhlo.acos operations cannot be translated to XLA HLO after a recent XLA version update, causing errors during continuous integration. The problem does not occur on CPU or CUDA backends and appears to stem from missing lowering implementations for these operations in the TPU backend.

    • The single comment suggests that the issue is likely caused by a recent change that made acosh a real HLO operation. This requires a new lowering implementation, whereas previously chlo.acosh was simply expanded into other HLO ops.
    • Number of comments this week: 1

Since fewer than 5 issues were active this week, all of the active issues have been listed above.

2.2 Top 5 Stale Issues:

We consider stale issues to be issues that have had no activity within the last 30 days. The team should work together to get these issues resolved and closed as soon as possible.

  1. New nvshmem rule breaks the build: This issue reports a build failure caused by a new nvshmem rule introduced in a recent pull request, which fails during CUDA configuration because the repository_ctx object has no getenv method. The reporter asks whether they need to update their own setup (in light of the changes mentioned for JAX) or whether the fix must come from the OpenXLA project, and requests an estimated timeline for a resolution.
  2. Failed to Parse MLIR generated by Torchax: This issue describes a problem encountered when exporting a PyTorch model to MLIR with the torch_xla torchax export API: the generated MLIR fails to parse because the operation 'vhlo.rsqrt_v2' is not registered in the VHLO dialect. The user is trying to compile the exported MLIR into an XLA binary with XLA AOT compilation but hits StableHLO deserialization errors, even though they are using compatible versions of torch and torch_xla and built XLA from the corresponding commit.
  3. support bazel modules: This issue requests the adoption of Bazel modules within the project, highlighting that Bazel modules have seen significant adoption in the community. The user points out that XLA is currently the only package in their Bazel build that does not support these modules, implying a need for compatibility and modernization.
  4. Gpu collective performance model bug: This issue addresses a bug in the gpu_collective_performance model where the update to lowLatencyBandwidth for AMD links was applied without corresponding changes to the CUDA section. As a result, invoking the gpu_collective_performance model with H100 settings leads to a failure, indicating incomplete or inconsistent performance modeling for GPU collectives.
  5. Cross compile to ARM with custom gcc: This issue concerns difficulties encountered when attempting to cross-compile the XLA project from an x86 architecture to ARM64 using a custom GCC compiler. The user is unable to prevent the Bazel build system from defaulting to building an x86 binary despite using the --config=cross_compile_linux_arm64 flag and is seeking guidance on the correct approach.

2.3 Open Issues

This section lists, groups, and then summarizes issues that were created within the last week in the repository.

Issues Opened This Week: 2

Summarized Issues:

  • Profiling granularity on GPU: The issue highlights a limitation in jax.profiler where only the duration of fused HLO operations is reported, rather than individual fine-grained HLO ops like add.3.3. This raises concerns about whether GPU asynchronous execution inherently restricts profiling to fused operations, preventing detailed timing analysis of each operation.
  • issues/31669
  • TPU backend compilation errors: The problem involves mhlo.acosh and mhlo.acos operations in MLIR failing to translate to XLA HLO on TPU backends after a recent update, causing compilation errors. These errors do not occur on CPU or CUDA backends, indicating a backend-specific translation issue.
  • issues/31796

2.4 Closed Issues

This section lists, groups, and then summarizes issues that were closed within the last week in the repository. This section also links the associated pull requests if applicable.

Issues Closed This Week: 0

Summarized Issues:

As of our latest update, there were no issues closed in the project this week.

2.5 Issue Discussion Insights

This section will analyze the tone and sentiment of discussions within this project's open and closed issues that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.

Based on our analysis, there are no instances of toxic discussions in the project's open or closed issues from the past week.


III. Pull Requests

3.1 Open Pull Requests

This section provides a summary of pull requests that were opened in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.

Pull Requests Opened This Week: 15

Key Open Pull Requests

1. Support building with Bzlmod (native rules_python toolchain): This pull request aims to add support for building the project using Bzlmod with the native rules_python toolchain, including fixes for numpy headers, remote build execution configuration, shell rules, and continuous integration setup.

  • URL: pull/31718
  • Merged: No
  • Associated Commits: dee16, f96e8, 825b8, 17a4d, 2ef13, 5af8f, e725f

2. Fix libdevice search: This pull request improves the search mechanism for the CUDA libdevice path by preventing empty or invalid paths from being added, correctly handling runtime folder indices, and including the $CUDA_HOME environment variable as a valid search location to ensure compatibility with non-standard CUDA installations and multiple CUDA versions on HPC systems.

  • URL: pull/31886
  • Merged: No
  • Associated Commits: 01788, 90071, 905d0, 23eb5

3. [XLA:CPU][oneDNN] Enable oneDNN Layer-Norm Custom Calls in Thunk Runtime: This pull request enables support for oneDNN Layer-Norm operations in the XLA:CPU Thunk runtime by updating the thunk emitter, re-enabling the custom call rewrite in the oneDNN ops rewriter, and adding execution support for the oneDNN Layer-Norm operation.

  • URL: pull/31707
  • Merged: No
  • Associated Commits: 35147, 52660
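The libdevice change in pull/31886 (item 2 above) is summarized only at a high level. As a rough illustration of the idea of skipping empty entries and adding $CUDA_HOME as a search root, here is a minimal Python sketch. The function name libdevice_search_dirs and its inputs are hypothetical; the actual fix lives in XLA's C++ code, also handles runtime folder indices (not modeled here), and may differ in detail.

```python
import os


def libdevice_search_dirs(configured_dirs, env=None):
    """Hypothetical sketch of building libdevice search dirs (not XLA's code)."""
    env = os.environ if env is None else env
    candidates = list(configured_dirs)
    # Treat $CUDA_HOME as an additional CUDA root when it is set.
    cuda_home = env.get("CUDA_HOME", "")
    if cuda_home:
        candidates.append(cuda_home)
    dirs = []
    seen = set()
    for root in candidates:
        root = root.strip()
        # Skip empty/whitespace-only entries and duplicates; otherwise
        # os.path.join("", ...) would yield a bogus relative path.
        if not root or root in seen:
            continue
        seen.add(root)
        # In a standard CUDA toolkit layout, libdevice lives under nvvm/libdevice.
        dirs.append(os.path.join(root, "nvvm", "libdevice"))
    return dirs


print(libdevice_search_dirs(["", "/usr/local/cuda", "  "], {"CUDA_HOME": "/opt/cuda-12.4"}))
```

Passing env explicitly keeps the sketch testable without touching the real process environment.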

Other Open Pull Requests

  • oneDNN CPU backend improvements: Multiple pull requests enhance the oneDNN integration in the XLA CPU backend by making the threadpool interface asynchronous, enabling support for Softmax operations, and modifying the dot operation rewriting pass to use DotLibraryRewriter when oneDNN is enabled. These changes improve efficiency, add new operation support, and refine the rewriting logic for better performance and compatibility.
  • pull/31747, pull/31745, pull/31810
  • Bug fixes related to GPU and parallel compute: Several pull requests fix bugs including an assertion error when no stream borrower is created for parallel computes, a crash caused by tuple color assignment in GPU code, escaping MLIR in error messages during Triton parsing failures, and skipping a flaky test in the ROCm environment to ensure CI stability. These fixes address stability and correctness issues in GPU execution and testing environments.
  • pull/31783, pull/31795, pull/31902, pull/31965
  • Documentation updates: Documentation improvements include adding a new "hlo_pass" overview file and updating the operation_semantics.md to better align Gather and Scatter operation descriptions with current code and StableHLO specifications. These updates enhance clarity and maintain alignment between documentation and implementation.
  • pull/31855, pull/31688
  • Triton GEMM emitter enhancements: A pull request updates the legacy Triton GEMM emitter to support block scaled dot fusions by enabling fusions around kScaledDot and emitting Triton's DotScaled operation for MXFP8. This allows the autotuner to select the faster scaled dot implementation between Triton and cuDNN, improving performance.
  • pull/31679
  • oneDNN graph rewrite support for dot operations: One pull request enables the oneDNN graph rewrite to support dot operations where operand dimension ranks are mismatched, expanding the applicability of oneDNN optimizations.
  • pull/31708
  • TextLiteralReader parsing bug fix: A pull request improves the parsing logic in TextLiteralReader by ignoring empty or whitespace-only lines, enforcing colon separators, and adding unit tests, fixing a bug identified by OSS-Fuzz. This ensures robust and correct parsing of text literals.
  • pull/31888

3.2 Closed Pull Requests

This section provides a summary of pull requests that were closed in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.

Pull Requests Closed This Week: 6

Key Closed Pull Requests

1. [DOC] operation_semantics formatting issues fix: This pull request addresses various formatting issues in the operation_semantics documentation, including adding missing periods, removing double spaces, fixing code block and unordered list display problems caused by spacing, and making spelling and grammar improvements.

  • URL: pull/31741
  • Merged: No
  • Associated Commits: 05eb8

2. Change the padded index value to be invalid in prefix scan of scatter_determinism_expander: This pull request fixes a critical bug in the ScatterDeterminismExpander by changing the padded index value used in the prefix scan algorithm from zero to an invalid value derived from the operand tensor size. This prevents padded entries from falsely matching the valid index 0 and ensures correct accumulation in scatter operations, especially for scatter_set and scatter_add cases with duplicate or zero indices.

  • URL: pull/31746
  • Merged: No
  • Associated Commits: 093f6

3. [ROCm] Fix CI build break: This pull request addresses a CI build failure on ROCm by moving the nvptx_backend dependency behind a conditional guard that checks if CUDA is configured, thereby preventing compilation errors related to missing CUDA headers.

  • URL: pull/31764
  • Merged: No
  • Associated Commits: c4332
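Why a padded index of 0 is dangerous (item 2 above, pull/31746) can be seen in a simplified, sequential Python model. This is not XLA's ScatterDeterminismExpander, and it uses a plain loop rather than a prefix scan. It assumes that indices and updates are padded to a fixed length before a stable sort, that padded updates are zero, that scatter_set keeps the last update in each run of equal indices, and that out-of-range indices are dropped. The function name deterministic_scatter_set and its parameters are hypothetical.

```python
def deterministic_scatter_set(operand, indices, updates, padded_len, pad_index):
    """Simplified model of a sort-based deterministic scatter_set (hypothetical, not XLA code)."""
    n_pad = padded_len - len(indices)
    # Assumption: fixed-shape compilation requires padding to padded_len.
    idx = list(indices) + [pad_index] * n_pad
    upd = list(updates) + [0.0] * n_pad
    # Stable sort by index so duplicate indices become adjacent
    # while keeping their original relative order.
    order = sorted(range(len(idx)), key=lambda k: idx[k])
    idx = [idx[k] for k in order]
    upd = [upd[k] for k in order]
    result = list(operand)
    for pos, i in enumerate(idx):
        # The last element of each run of equal indices wins (scatter_set).
        is_last_in_run = pos + 1 == len(idx) or idx[pos + 1] != i
        # Assumption: out-of-range indices are dropped, so an out-of-range
        # pad value never touches the operand.
        if is_last_in_run and 0 <= i < len(result):
            result[i] = upd[pos]
    return result


operand = [10.0, 20.0, 30.0]
# Padding with 0 collides with the real index 0 and overwrites its update.
print(deterministic_scatter_set(operand, [0, 2], [1.0, 3.0], 4, pad_index=0))
# → [0.0, 20.0, 3.0]
# Padding with len(operand) is out of range, so it cannot collide with a valid index.
print(deterministic_scatter_set(operand, [0, 2], [1.0, 3.0], 4, pad_index=len(operand)))
# → [1.0, 20.0, 3.0]
```

With a zero pad, the padded entries sort into the same run as the real index 0 and the last of them wins, which matches the "false matches with valid index 0" described in the pull request summary.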

Other Closed Pull Requests

  • Fixes for class vs struct declaration inconsistencies: These pull requests resolve compilation and linker errors caused by mismatched declarations of classes and structs in the codebase. They ensure consistent use of struct and class keywords to fix build failures and linker issues on specific platforms and compiler versions.
  • pull/31770, pull/31804
  • Documentation improvements for HLO Passes: This pull request proposed new general overview documentation for HLO Passes to improve project documentation. It was closed without being merged.
  • pull/31799

3.3 Pull Request Discussion Insights

This section will analyze the tone and sentiment of discussions within this project's open and closed pull requests that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.

Based on our analysis, there are no instances of toxic discussions in the project's open or closed pull requests from the past week.


IV. Contributors

4.1 Contributors

Active Contributors:

We consider an active contributor in this project to be any contributor who has made at least 1 commit, opened at least 1 issue, created at least 1 pull request, or made more than 2 comments in the last month.

If there are more than 10 active contributors, the list is truncated to the top 10 based on contribution metrics for better clarity.

Contributor           Commits   Pull Requests   Issues   Comments
othakkar                   16               7        0          9
athurdekoos                21               5        0          0
shawnwang18                16               1        0          0
meteorcloudy               12               2        0          2
sergachev                  10               2        0          1
mraunak                     8               1        0          0
draganmladjenovic           3               3        0          2
sergey-kozub                4               3        0          0
ScXfjiang                   4               3        0          0
penpornk                    0               0        0          7
