Weekly Project News


Weekly GitHub Report for XLA: October 06, 2025 - October 13, 2025

Weekly GitHub Report for XLA

Thank you for subscribing to our weekly newsletter! Each week, we deliver a comprehensive summary of your GitHub project's latest activity right to your inbox, including an overview of your project's issues, pull requests, contributors, and commit activity.


Table of Contents

  • I. News
    • 1.1. Recent Version Releases
    • 1.2. Other Noteworthy Updates
  • II. Issues
    • 2.1. Top 5 Active Issues
    • 2.2. Top 5 Stale Issues
    • 2.3. Open Issues
    • 2.4. Closed Issues
    • 2.5. Issue Discussion Insights
  • III. Pull Requests
    • 3.1. Open Pull Requests
    • 3.2. Closed Pull Requests
    • 3.3. Pull Request Discussion Insights
  • IV. Contributors
    • 4.1. Contributors

I. News

1.1 Recent Version Releases:

No recent version releases were found.

1.2 Other Noteworthy Updates:

No other noteworthy updates were found this week.

II. Issues

2.1 Top 5 Active Issues:

We consider active issues to be issues that have been commented on most frequently within the last week. Bot comments are omitted.

As of our latest update, there are no active issues with ongoing comments this week.

2.2 Top 5 Stale Issues:

We consider stale issues to be issues that have had no activity within the last 30 days. The team should work together to get these issues resolved and closed as soon as possible.

  1. New nvshmem rule breaks the build: This issue reports a build failure caused by a new nvshmem rule introduced in a recent pull request, which leads to an error where the repository_ctx object lacks a getenv method during the CUDA configuration step. The reporter is asking whether the fix belongs on their side or in the OpenXLA project, specifically regarding the timing and details of a potential correction to the cuda_configure settings.
  2. Failed to Parse MLIR generated by Torchax: This issue describes a problem encountered when exporting a PyTorch model to MLIR using the torch-xla torchax export API, where the generated MLIR fails to parse due to an unregistered operation 'vhlo.rsqrt_v2' in the 'vhlo' dialect. The user is attempting to compile the exported model with XLA AOT but faces deserialization errors with StableHLO v1.9.5, despite using compatible versions of torch and torch-xla and building XLA from the corresponding commit, and has provided code snippets and bytecode samples to assist in troubleshooting.
  3. support bazel modules: This issue requests the adoption of Bazel modules within the project, highlighting that Bazel modules have seen significant adoption in the community. The user points out that XLA is currently the only package in their Bazel build that does not support Bazel modules, implying a need for compatibility improvements.
  4. Gpu collective performance model bug: This issue addresses a bug in the gpu_collective_performance model where the recent update to lowLatencyBandwidth for AMD links was not consistently applied to the CUDA section. As a result, invoking the gpu_collective_performance model with H100 GPU settings leads to a failure, indicating incomplete or inconsistent parameter updates within the model.
  5. Cross compile to ARM with custom gcc: This issue concerns difficulties encountered when attempting to cross-compile the XLA project from an x86 architecture to ARM64 using a custom GCC compiler. The user reports that despite using the --config=cross_compile_linux_arm64 flag in the Bazel build system, the build process continues to produce an x86 binary, indicating a possible misconfiguration or missing step in the cross-compilation setup.

2.3 Open Issues

This section lists, groups, and then summarizes issues that were created within the last week in the repository.

Issues Opened This Week: 5

Summarized Issues:

  • Bazel build and rule execution issues: There is a problem with the Bazel pywrap_library rule being executed both before and after select statements, causing incorrect success checks for None values wrapped in select. Additionally, a build failure occurs in TensorFlow 2.19 due to missing dependency declarations in the Bazel build rule for grappler devices, leading to undeclared inclusion errors for protobuf headers related to CUDA and XLA.
  • issues/32358, issues/32497
  • JAX and CUDA backend runtime errors: The jax.nn.dot_product_attention function fails with an XlaRuntimeError on CUDA 13 using the cudnn backend in JAX 0.7.2, producing internal errors about no valid execution plans. This issue affects multiple GPU models and does not occur with the xla implementation or on CUDA 12.
  • issues/32385
  • Python callback threading regression: A regression causes Python callbacks, which should run synchronously in the calling thread, to sometimes execute in different threads. This behavior can block the main thread and potentially lead to deadlocks when multiple callbacks are staged asynchronously.
  • issues/32426
  • GPU backend fusion code generation inquiry: There is a question about whether the XLA GPU project plans to support fusion region code generation using alternative backends like cuTile or Pallas instead of the current Triton backend.
  • issues/32512

2.4 Closed Issues

This section lists, groups, and then summarizes issues that were closed within the last week in the repository. This section also links the associated pull requests if applicable.

Issues Closed This Week: 0

Summarized Issues:

As of our latest update, there were no issues closed in the project this week.

2.5 Issue Discussion Insights

This section analyzes the tone and sentiment of discussions within this project's open and closed issues that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.

Based on our analysis, there are no instances of toxic discussions in the project's open or closed issues from the past week.


III. Pull Requests

3.1 Open Pull Requests

This section provides a summary of pull requests that were opened in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.

Pull Requests Opened This Week: 12

Key Open Pull Requests

1. Update gemma2 keras benchmark script - fix ttft, and use tokenizer: This pull request updates the gemma2 Keras benchmark script by fixing the calculation of time to first token (TTFT) to measure the time to the first generated token rather than the prompt token, and replaces word counting based on spaces with token counting using a tokenizer to improve accuracy.

  • URL: pull/32357
  • Merged: No
  • Associated Commits: 25178, 9b20e

2. [ROCm] Prepare asan builds to be rbe compatible, include sanitizer ignore lists as data dependency: This pull request prepares AddressSanitizer (asan) builds to be compatible with remote build execution (RBE) by making them hermetic and including sanitizer ignore lists as a data dependency in the run_under script to ensure their availability on the RBE worker.

  • URL: pull/32475
  • Merged: No
  • Associated Commits: cae2e, 53bbb

3. [ROCm] Change misleading method name RocmComputeCapability::has_amd_matrix_core(): This pull request renames the method RocmComputeCapability::has_amd_matrix_core() to has_amd_mat_acc_instructions() to more accurately reflect that gfx11xx GPUs do not have matrix cores but do support the WMMA matrix acceleration instruction set.

  • URL: pull/32283
  • Merged: No
  • Associated Commits: 53fc5

Other Open Pull Requests

  • PjRt API modifications: This set of pull requests moves CrossHostSendBuffers and CrossHostReceiveBuffers into the PjRtClient as virtual functions to enable client-specific implementations. These changes simplify cross-process GPU transfers and prepare the system for future performance improvements like communicator caching and transfer aggregation with ncclGroup calls.
  • pull/32295
  • XLA GPU memory optimization and backend improvements: These pull requests introduce host offloading support to the collective pipeliner in XLA:GPU, enabling asynchronous offloading of intermediate results to host memory and dynamic variable detection for better memory management. Additionally, the SyclStreamPool class is implemented for the XLA GPU oneAPI backend, and fixes are made to enable AMDGPU backend features by default to improve independence from filesystem layout.
  • pull/32297, pull/32305, pull/32439
  • GPU collective operations enhancements: This pull request adds support for collectives with a non-minor-most last dimension in the sub-byte collective normalization pass on GPU. This aims to improve performance by enabling more efficient collectives without requiring type conversion.
  • pull/32388
  • Bug fix in XLA CPU oneDNN integration: This pull request fixes a bug in onednn_contraction_rewriter.cc that could cause unsafe subtraction from unsigned values, introduced when the method for obtaining dimension sizes changed from a signed to an unsigned return type.
  • pull/32378
  • Debug options consistency: This pull request ensures that all components within the compiler properly respect the debug options override set by users, enhancing consistency in debugging behavior.
  • pull/32454
  • Removal of non-functional rocm_diagnostics module: This pull request deletes the rocm_diagnostics.cc file and removes the rocm_diagnostics module because it never functioned properly and did not provide meaningful information to users, serving as a cleanup contribution.
  • pull/32504
  • oneDNN fusion updates on XLA CPU backend: This pull request updates the fusion of dot and elementwise operations for oneDNN by adding constraints to align with current oneDNN capabilities. It also includes a refactor of the oneDNN fusion tests and support code.
  • pull/32521

3.2 Closed Pull Requests

This section provides a summary of pull requests that were closed in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.

Pull Requests Closed This Week: 13

Key Closed Pull Requests

1. Introduce rocm specific bazelrc: This pull request proposes the introduction of a ROCm-specific bazelrc configuration intended for use in ROCm continuous integration jobs to prepare for remote build execution and enable running all tests with the ROCm configuration during CI checks.

  • URL: pull/32272
  • Merged: No
  • Associated Commits: e3136, 8b85a, b342d, d8adb, 24942, b6cd9

2. [GPU] Use intrinsics to accelerate f4e2m1fn conversions on Blackwell.: This pull request proposes using GPU intrinsics to accelerate f4e2m1fn type conversions on Blackwell hardware, resulting in approximately a twofold performance improvement on microbenchmarks, and includes unit and execution tests to validate the changes.

  • URL: pull/32430
  • Merged: No
  • Associated Commits: d40e8, a0e9e, c8bb0

3. [ROCm] Fix build files leading to rocm rbe platform: This pull request aims to fix the build files related to the ROCm platform to resolve issues with the ROCm RBE build environment, although it was not merged.

  • URL: pull/32405
  • Merged: No
  • Associated Commits: 8e002, 0bdc1

Other Closed Pull Requests

  • Build system improvements and fixes for ROCm and CUDA compatibility: Multiple pull requests address build issues and compatibility improvements for ROCm and CUDA environments. These include moving dependencies to correct targets, adding missing dependencies, tagging CUDA-only libraries to avoid ROCm builds, and ensuring compatibility with CUDA 13.1, all aimed at stabilizing and improving the build process across different hardware configurations.
  • [pull/32336, pull/32459, pull/32460, pull/32488]
  • Support for external OpenSSL/boringssl in XLA builds: One pull request introduces support for building XLA with user- or system-provided versions of boringssl/OpenSSL. This change allows unbundling OpenSSL code from the jaxlib package, enabling upgrades of OpenSSL without needing to rebuild jaxlib.
  • [pull/32300]
  • Enhancements and bug fixes in GPU backend operations: Several pull requests propose performance improvements and bug fixes in the GPU backend. These include adding a global scale parameter to the block scaled dot custom call for better fusion and performance, fixing the collective pipeliner to respect opt-barriers in NVIDIA GPU backend, and modifying the BlockScalingRewriter pass to handle older cuDNN versions correctly.
  • [pull/32366, pull/32389, pull/32525]
  • Improvements to rocm_device_libs and build system robustness: One pull request replaces a problematic hack in rocm_device_libs by introducing a robust method to include clang headers compatible with both traditional WORKSPACE and Bzlmod build systems. This avoids manual path crafting that breaks with Bzlmod's canonical repository naming.
  • [pull/32416]
  • Preparation for sanitizer ignore lists integration in GPU run scripts: A pull request prepares the parallel GPU run script to integrate sanitizer ignore lists as a dependency, enabling their delivery to the remote build execution (rbe) worker.
  • [pull/32406]

3.3 Pull Request Discussion Insights

This section analyzes the tone and sentiment of discussions within this project's open and closed pull requests that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.

Based on our analysis, there are no instances of toxic discussions in the project's open or closed pull requests from the past week.


IV. Contributors

4.1 Contributors

Active Contributors:

We consider an active contributor in this project to be any contributor who has made at least 1 commit, opened at least 1 issue, created at least 1 pull request, or made more than 2 comments in the last month.

If there are more than 10 active contributors, the list is truncated to the top 10 based on contribution metrics for better clarity.

Contributor          Commits   Pull Requests   Issues   Comments
alekstheod           29        7               0        1
meteorcloudy         29        4               0        2
sergachev            10        5               0        1
othakkar             8         5               0        2
draganmladjenovic    7         5               0        2
amd-songpiao         7         2               1        2
athurdekoos          9         2               0        0
rao-ashish           3         3               0        5
dimvar               5         4               0        0
terryysun            7         1               0        0
