Weekly GitHub Report for PyTorch: March 03, 2025 - March 10, 2025
Thank you for subscribing to our weekly newsletter! Each week, we deliver a comprehensive summary of your GitHub project's latest activity right to your inbox, including an overview of your project's issues, pull requests, contributors, and commit activity.
Table of Contents
I. News
1.1 Recent Version Releases:
The current version of this repository is v2.6.0
1.2 Version Information:
Released on January 29, 2025, PyTorch 2.6 introduces significant updates, including support for `torch.compile` with Python 3.13, a new performance-related feature `torch.compiler.set_stance`, and FP16 support on x86 CPUs. Notably, the release marks a shift away from publishing on Conda in favor of official wheel packages, and introduces a backward-compatibility-breaking change by setting `weights_only=True` as the default for `torch.load`.
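As a quick illustration of how these two changes surface in user code, here is a minimal sketch assuming PyTorch 2.6 or later; the checkpoint filename and tensors are placeholders, and the old loading behavior should only be re-enabled for checkpoints you trust.

```python
import torch

# torch.load now defaults to weights_only=True, so checkpoints containing
# arbitrary pickled Python objects may fail to load unless the old behavior
# is requested explicitly.
torch.save({"w": torch.randn(3)}, "checkpoint.pt")
state = torch.load("checkpoint.pt")                       # weights_only=True by default
legacy = torch.load("checkpoint.pt", weights_only=False)  # opt back into full unpickling

# torch.compiler.set_stance adjusts how torch.compile behaves at runtime,
# e.g. skipping compilation entirely while debugging.
torch.compiler.set_stance("force_eager")
compiled_fn = torch.compile(lambda x: x * 2)
print(compiled_fn(torch.ones(3)))  # runs eagerly under the "force_eager" stance
torch.compiler.set_stance("default")
```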
II. Issues
2.1 Top 5 Active Issues:
We consider active issues to be issues that have been commented on most frequently within the last week. Bot comments are omitted.
As of our latest update, there are no active issues with ongoing comments this week.
2.2 Top 5 Stale Issues:
We consider stale issues to be issues that have had no activity within the last 30 days. The team should work together to get these issues resolved and closed as soon as possible.
As of our latest update, there are no stale issues for the project this week.
2.3 Open Issues
This section lists, groups, and then summarizes issues that were created within the last week in the repository.
Issues Opened This Week: 0
Summarized Issues:
As of our latest update, there are no open issues for the project this week.
2.4 Closed Issues
This section lists, groups, and then summarizes issues that were closed within the last week in the repository. This section also links the associated pull requests if applicable.
Issues Closed This Week: 41
Summarized Issues:
- Real-Tensor Tracing and Dynamic Shapes in PyTorch: This issue involves a failure in real-tensor tracing during the export of dynamic shapes in PyTorch, caused by a division-by-zero error in the `_reshape_view_helper` function. The problem highlights the challenges of reconciling fake-tensor and real-tensor tracing paths and suggests potential solutions such as modifying metas for real-tensor tracing or disabling size-oblivious guards.
- Errors in PyTorch's Singular Value Decomposition and Convolution Functions: An `INTERNAL ASSERT FAILED` error occurs in PyTorch's `torch.svd` function when performing singular value decomposition on a very large matrix, likely due to the use of a 32-bit LAPACK API. Additionally, a bug in the `torch.nn.functional.conv1d` function with specific input parameters can lead to a "Floating point exception (core dumped)" error, related to a known issue with oneDNN.
- Integration and Test Failures in PyTorch and Triton: Test failures occur in the FlexDecoding component of the Triton project when integrated with PyTorch, related to an assertion error indicating an invalid stage for an operation. Additionally, a segmentation fault occurs in the cpp_wrapper component of the Triton upstream project when running a specific unit test on ROCm, highlighting compatibility issues across different hardware platforms.
- Numerical Stability and NCCL Version Errors in PyTorch: A proposal to improve numerical stability in the `torch.linalg.eigh` function on GPUs involves adding a small epsilon to the denominator in the backward computation (see the sketch after this list). Meanwhile, errors related to mismatched NCCL version information and a CUDA function failure with an "invalid argument" error occur after installing PyTorch with CUDA 12.8.
- Sharding Strategy and Export Errors in PyTorch: The need to register a sharding strategy for the `aten.amax.default` operator in DTensor addresses errors encountered with float8 rowwise scaling. Additionally, a bug in the `torch.export.export` function prevents successful exporting of a convolutional neural network with a batch normalization layer on a GPU.
- GPU Compute and Build Failures in PyTorch: A runtime error occurs when performing GPU compute tasks using PyTorch with the ROCm/HIP backend on an AMD Radeon RX 7600 XT, with the error "HIP error: invalid device function" during tensor allocation. Additionally, a build failure of `torch_cuda.dll` in a PyTorch project is due to an unresolved external symbol error related to `_cudnn_attention_forward`.
- ONNX Export and Documentation Errors in PyTorch: Adding a `_capture_strategy` field to record the strategy used in creating an ONNX program helps in guarding tests and identifying regressions. A documentation error in PyTorch's `register_forward_hook` method needs correction to accurately reflect the code structure and functionality.
- Stream Management and Segmentation Faults in PyTorch: A feature request for a stream management API in PyTorch's ProcessGroupNCCL addresses asynchronous communication challenges. Meanwhile, a segmentation fault occurs in the cpp_wrapper component of the Triton upstream project when running a specific unit test on ROCm.
- Accuracy Problems and Docker Builds in PyTorch: Accuracy problems in the unit tests for the `quantile` operation on ROCm occur when updating Triton for version 3.3. Additionally, transitioning the project's Docker builds to utilize public Amazon Elastic Container Registry (ECR) images is motivated by Docker Hub's impending rate limiting policy changes.
- Test Failures and Function Identifier Reuse in PyTorch: A test named 'test_custom_hook_custom_stream' was disabled due to failures on the main branch, related to a 'HIP error: invalid device ordinal' on ROCm platforms. Additionally, a bug in the PyTorch project involves decorators like `torch.compiler.allow_in_graph` not properly handling the reuse of function identifiers.
- Gradient and Index Errors in PyTorch: A bug in the PyTorch library involves incorrect gradients of the `torch.nn.functional.hardswish` function at boundary points (a worked boundary check appears after this list). Additionally, an "IndexError: tuple index out of range" occurs when running a Python script to load and generate text from a model in vLLM.
- Inconsistent Outputs and Memory Layout in PyTorch: A bug in the `fx_graph_runnable.py` file involves the `opt_output` being inconsistent with the actual output during the `run_repro(acc=True)` test. Additionally, a bug in the PyTorch library involves the `torch.cdist` function producing inconsistent results when run in eager mode compared to other backends.
- ARM64 Support and Bitshift Errors in PyTorch: The integration and support of Triton for aarch64 and SBSA architectures are motivated by the availability of Linux ARM64 hosted runners on GitHub. Additionally, a bug in the PyTorch MPS backend involves the bitshift `<<` operation producing incorrect results.
- ONNX Export and Type Errors in PyTorch: A bug in PyTorch 2.6.0 involves an error when exporting a model using ONNX that includes a slice operation on complex tensors. Additionally, a "TypeError" occurs when attempting to export an ONNX model to an IO buffer using `torch.onnx.export` with `external_data` set to True.
- Data Type and Backend Inconsistencies in PyTorch: A bug causes specifying a `float16` data type for `torch.arange` to produce an invalid ONNX graph due to a type constraint in the ONNX `Range` operator. Additionally, a bug in the PyTorch library involves the combination of `nn.Tanhshrink` and `atan2` functions producing inconsistent output when using the CPP backend.
- Profiling and Checkpointing Issues in PyTorch: PyTorch's profiling tool is unable to capture the runtime of the ATen Scaled Dot Product Attention (SDPA) kernel when using `torch.compile`. Additionally, a problem with the PyTorch distributed checkpointer involves the absence of a `strict` parameter in the `dcp.load()` function.
- LazyConvTranspose1d and Sparse Tensor Errors in PyTorch: A bug in the PyTorch library involves using `torch.nn.LazyConvTranspose1d` with an excessively large stride value, resulting in a `Floating point exception (core dumped)` error. Additionally, using the `torch.sparse.sum` function with a specific sparse tensor input can lead to a segmentation fault.
- NamedTensorUtils and Autograd Errors in PyTorch: An "INTERNAL ASSERT FAILED" error occurs in PyTorch, specifically in the NamedTensorUtils.cpp file, when attempting to create a tensor with an empty list of names. Additionally, a redundant try block in the `backward()` function within the `autograd.py` file should be addressed by checking if `info._backward_fn` is `None`.
- Compilation and Quantization Errors in PyTorch: Excessive memory usage and out-of-memory (OOM) errors occur during the startup of compilation when using `torch.compile` with in-place operations on a large tensor. Additionally, a bug in PyTorch's Dynamo module involves handling list comparisons during the export process for training.
- HPU Profiling and Matmul Errors in PyTorch: Adding a profiler activity specifically for HPU devices facilitates accurate profiling by distinguishing them from other devices. Additionally, a bug in the PyTorch library involves the `torch.matmul()` function failing to handle dynamic shapes correctly during ONNX model export.
- Symbol Exposure and Test Failures in PyTorch: A regression in the PyTorch library involves missing symbols, specifically functions from `python_arg_parser.h`, in the `torch_python` DLL after a recent update. Additionally, a test failure in the PyTorch project, specifically the `test_reference_numerics_normal` test, encounters a `TypeError` due to the unsupported use of the numpy boolean negative operator.
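As a companion to the `torch.linalg.eigh` numerical-stability item above, the sketch below only reproduces the symptom the proposal targets: the backward pass divides by pairwise eigenvalue differences, so a near-degenerate spectrum yields very large gradients. The matrix is a made-up example, and the epsilon fix itself is not implemented here.

```python
import torch

# Symmetric matrix with two nearly identical eigenvalues (illustrative example).
A = torch.tensor([[1.0, 1e-9],
                  [1e-9, 1.0 + 1e-9]], dtype=torch.float64, requires_grad=True)

w, v = torch.linalg.eigh(A)  # eigendecomposition of a symmetric matrix
v.sum().backward()           # any scalar function of the eigenvectors

# The backward formula divides by (lambda_j - lambda_i) for i != j, so the
# gradient entries grow as the eigenvalue gap shrinks; the proposal adds a
# small epsilon to that denominator to keep the result finite and stable.
print(w)       # eigenvalues, almost equal
print(A.grad)  # very large entries due to the tiny eigenvalue gap
```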
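Similarly, for the `torch.nn.functional.hardswish` gradient item, here is a small worked check of the one-sided derivatives at the boundary points; the reference formula is derived by hand and is illustrative only, not PyTorch's definition.

```python
import torch
import torch.nn.functional as F

# hardswish(x) = x * relu6(x + 3) / 6, so away from the kinks the derivative is
#   0 for x < -3,   (2x + 3) / 6 for -3 < x < 3,   1 for x > 3.
# At x = -3 the one-sided derivatives are 0 and -0.5, and at x = 3 they are 1.5
# and 1, so whatever autograd reports at those exact points is a convention.
x = torch.tensor([-3.0, 3.0], requires_grad=True)
F.hardswish(x).sum().backward()
print(x.grad)  # the boundary-point values the issue questions
```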
2.5 Issue Discussion Insights
This section will analyze the tone and sentiment of discussions within this project's open and closed issues that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.
Based on our analysis, there are no instances of toxic discussions in the project's open or closed issues from the past week.
III. Pull Requests
3.1 Open Pull Requests
This section provides a summary of pull requests that were opened in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.
Pull Requests Opened This Week: 0
As of our latest update, there are no open pull requests for the project this week.
3.2 Closed Pull Requests
This section provides a summary of pull requests that were closed in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.
Pull Requests Closed This Week: 86
Key Closed Pull Requests
1. cpp_wrapper: reduce memory usage by removing unneeded temporaries: This pull request aims to reduce memory usage in the `cpp_wrapper` mode of a PyTorch project by refactoring `reinterpret_view` calls to return temporary RAII tensor objects, eliminating unnecessary temporary tensor handles, and deleting input tensor lists after casting, thereby aligning its memory efficiency with the default inductor mode.
- URL: pull/147403
- Merged: No
- Associated Commits: 01424, 67582, eb4f8, 20c1a, a6f57, 1c1b4, aae0f, 3806a, ebf67, 91ceb, 4d5ed, 35f9d, b6bf5, 7deb4, 4ebac, d0da4, e9873, b7ec2
2. [WIP][XPU][Inductor] Update Intel triton for release 2.7.: This pull request aimed to update the Intel Triton component for the XPU Inductor in preparation for the release of version 2.7, as part of a stack of related changes, but it was ultimately not merged.
- URL: pull/147727
- Merged: No
3. Make record/storage alignment in torch.save configurable: This pull request aims to make the record and storage alignment in the `torch.save` function configurable, allowing users to adjust how data is stored and aligned when saving PyTorch models, although it was ultimately not merged into the main project.
- URL: pull/147788
- Merged: No
Other Closed Pull Requests
- Delayed Compilation Enhancements: This topic covers the introduction of a new `eager_then_compile` stance and an experimental attempt to introduce a delayed compilation feature in the PyTorch project. The `eager_then_compile` stance aims to reduce compile times and improve usability by initially running in eager mode, while the experimental feature involved multiple updates and discussions but was ultimately closed without being merged (see the sketch after this list).
- FakeTensorMode Improvements: These pull requests focus on enhancing the `torch.load` function under `FakeTensorMode` by adding checkpoint offsets to untyped storages and ensuring correct device loading for FakeTensors. Although the checkpoint offsets feature was not merged, the device loading improvements addressed issues in specific functions.
- PyTorch Serialization Enhancements: This topic includes efforts to enhance the `torch.serialization.skip_data` functionality with the `torch.load` method and eliminate unbacked renamings in the export process. The serialization enhancement was not merged, while the unbacked renamings pull request introduced a new pass to recompute unbacked bindings.
- Optimization and Refactoring: These pull requests aim to optimize the PyTorch codebase by removing unnecessary tensor clones and redundant variable calls, and address a regression issue in the `evaluate_expr` function. The optimization pull request was not merged, while the regression fix included refactoring and simplification of function calls.
- XPU and OneDNN Enhancements: This topic covers enabling SDPA on the XPU backend as part of the OneDNN Upstreaming plan and adding support for the XPU device to LayerNormKernel devices. The SDPA enhancement involved adding new files and modifying test cases, while the LayerNormKernel support was not merged.
- Sharding and DeviceMesh Improvements: These pull requests focus on refactoring sharding propagation to handle cross-mesh computation and enhancing the `DeviceMesh.get_group` method. The sharding refactor allows more flexibility for operators, while the DeviceMesh improvement includes adding tests and updating relevant files.
- Documentation and Testing Updates: This topic includes adding a note to the PyTorch documentation about Intel® Deep Learning Essentials runtime packages and renaming a test file from "test_graph_break_messages" to "test_error_messages". The documentation update cautions against standalone oneAPI installation, while the test file renaming involves multiple commits and notifications.
- Backend and Kernel Enhancements: These pull requests introduce support for rowwise scaling in scaled GEMM operations and enable XPU support for the Inductor MM Triton Kernel Benchmark. The GEMM enhancement includes various fixes and new unit tests, while the XPU support addresses a test case regression issue.
- Dynamo Component Improvements: This topic covers addressing the issue of exceeding the recompile limit in the Dynamo component and removing internal stack traces for graph breaks. The recompile limit fix implements a recursive mechanism, while the stack trace removal was not merged.
- Miscellaneous Enhancements: These pull requests address issues with the `gather_object` and `scatter_object_list` functions and fix the decomposition for the `linspace` function. The gather and scatter fix ensures correct rank usage, while the linspace fix prevents non-functional operations for functional operators.
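For the delayed-compilation item above, here is a hedged sketch of how the `eager_then_compile` stance would be used, assuming the stance string landed as described in the pull request summary; on builds without it, `torch.compiler.set_stance` rejects the name, and stances such as "force_eager" or "default" show the same calling pattern.

```python
import torch

@torch.compile
def f(x):
    return torch.nn.functional.relu(x) * 2

# The stance name below is an assumption based on the PR summary: run early
# calls in eager mode, then compile once real inputs have been observed.
torch.compiler.set_stance("eager_then_compile")
print(f(torch.randn(4)))  # first call may run eagerly, gathering real inputs
print(f(torch.randn(4)))  # later calls can be served by the compiled graph
torch.compiler.set_stance("default")
```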
3.3 Pull Request Discussion Insights
This section will analyze the tone and sentiment of discussions within this project's open and closed pull requests that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.
- [wip][aot] annotated fwd graph dynamic tensor outputs with mark_dynamic
- Toxicity Score: 0.55 (Frustration expressed, defensive responses, mediation attempts, escalating dissatisfaction.)
- This GitHub conversation involves several users discussing a pull request, with username1 expressing frustration over the lack of progress and username2 responding defensively. The tone shifts from collaborative to tense as username3 attempts to mediate, but username1's continued dissatisfaction escalates the situation.
IV. Contributors
4.1 Contributors
Active Contributors:
We consider an active contributor in this project to be any contributor who has made at least 1 commit, opened at least 1 issue, created at least 1 pull request, or made more than 2 comments in the last month.
If there are more than 10 active contributors, the list is truncated to the top 10 based on contribution metrics for better clarity.
Contributor | Commits | Pull Requests | Issues | Comments |
---|---|---|---|---|
mikaylagawarecki | 80 | 4 | 1 | 12 |
williamwen42 | 61 | 5 | 2 | 18 |
BoyuanFeng | 80 | 2 | 0 | 2 |
zou3519 | 38 | 7 | 4 | 35 |
clee2000 | 62 | 3 | 3 | 0 |
jansel | 23 | 4 | 0 | 38 |
bobrenjc93 | 47 | 6 | 0 | 10 |
malfet | 38 | 2 | 1 | 19 |
oulgen | 52 | 2 | 0 | 5 |
justinchuby | 31 | 4 | 1 | 21 |