Weekly GitHub Report for PyTorch: March 03, 2025 - March 10, 2025
Thank you for subscribing to our weekly newsletter! Each week, we deliver a comprehensive summary of your GitHub project's latest activity right to your inbox, including an overview of your project's issues, pull requests, contributors, and commit activity.
Table of Contents
I. News
1.1 Recent Version Releases:
The current version of this repository is v2.6.0
1.2 Version Information:
Released on January 29, 2025, PyTorch 2.6 introduces significant updates, including support for `torch.compile` with Python 3.13, a new performance-related feature `torch.compiler.set_stance`, and FP16 support on x86 CPUs. Notably, the release marks a shift away from publishing on Conda in favor of official wheel packages, and introduces a backward-compatibility-breaking change by setting `weights_only=True` as the default for `torch.load`.
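As a quick illustration of how these two changes surface in user code, here is a minimal sketch assuming PyTorch 2.6 or later; the checkpoint filename and tensors are placeholders, and the old loading behavior should only be re-enabled for checkpoints you trust.

```python
import torch

# torch.load now defaults to weights_only=True, so checkpoints containing
# arbitrary pickled Python objects may fail to load unless the old behavior
# is requested explicitly.
torch.save({"w": torch.randn(3)}, "checkpoint.pt")
state = torch.load("checkpoint.pt")                       # weights_only=True by default
legacy = torch.load("checkpoint.pt", weights_only=False)  # opt back into full unpickling

# torch.compiler.set_stance adjusts how torch.compile behaves at runtime,
# e.g. skipping compilation entirely while debugging.
torch.compiler.set_stance("force_eager")
compiled_fn = torch.compile(lambda x: x * 2)
print(compiled_fn(torch.ones(3)))  # runs eagerly under the "force_eager" stance
torch.compiler.set_stance("default")
```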
II. Issues
2.1 Top 5 Active Issues:
We consider active issues to be issues that have been commented on most frequently within the last week. Bot comments are omitted.
As of our latest update, there are no active issues with ongoing comments this week.
2.2 Top 5 Stale Issues:
We consider stale issues to be issues that have had no activity within the last 30 days. The team should work together to get these issues resolved and closed as soon as possible.
As of our latest update, there are no stale issues for the project this week.
2.3 Open Issues
This section lists, groups, and then summarizes issues that were created within the last week in the repository.
Issues Opened This Week: 0
Summarized Issues:
As of our latest update, there are no open issues for the project this week.
2.4 Closed Issues
This section lists, groups, and then summarizes issues that were closed within the last week in the repository. This section also links the associated pull requests if applicable.
Issues Closed This Week: 41
Summarized Issues:
- Real-Tensor Tracing and Dynamic Shapes in PyTorch: This issue involves a failure in real-tensor tracing during the export of dynamic shapes in PyTorch, caused by a division-by-zero error in the `_reshape_view_helper` function. The problem highlights the challenges of reconciling fake-tensor and real-tensor tracing paths and suggests potential solutions such as modifying metas for real-tensor tracing or disabling size-oblivious guards.
- Errors in PyTorch's Singular Value Decomposition and Convolution Functions: An `INTERNAL ASSERT FAILED` error occurs in PyTorch's `torch.svd` function when performing singular value decomposition on a very large matrix, likely due to the use of a 32-bit LAPACK API. Additionally, a bug in the `torch.nn.functional.conv1d` function with specific input parameters can lead to a "Floating point exception (core dumped)" error, related to a known issue with oneDNN.
- Integration and Test Failures in PyTorch and Triton: Test failures occur in the FlexDecoding component of the Triton project when integrated with PyTorch, related to an assertion error indicating an invalid stage for an operation. Additionally, a segmentation fault occurs in the cpp_wrapper component of the Triton upstream project when running a specific unit test on ROCm, highlighting compatibility issues across different hardware platforms.
- Numerical Stability and NCCL Version Errors in PyTorch: A proposal to improve numerical stability in the `torch.linalg.eigh` function on GPUs involves adding a small epsilon to the denominator in the backward computation (see the sketch after this list). Meanwhile, errors related to mismatched NCCL version information and a CUDA function failure with an "invalid argument" error occur after installing PyTorch with CUDA 12.8.
- Sharding Strategy and Export Errors in PyTorch: The need to register a sharding strategy for the `aten.amax.default` operator in DTensor addresses errors encountered with float8 rowwise scaling. Additionally, a bug in the `torch.export.export` function prevents successful exporting of a convolutional neural network with a batch normalization layer on a GPU.
- GPU Compute and Build Failures in PyTorch: A runtime error occurs when performing GPU compute tasks using PyTorch with the ROCm/HIP backend on an AMD Radeon RX 7600 XT, with the error "HIP error: invalid device function" during tensor allocation. Additionally, a build failure of `torch_cuda.dll` in a PyTorch project is due to an unresolved external symbol error related to `_cudnn_attention_forward`.
- ONNX Export and Documentation Errors in PyTorch: Adding a `_capture_strategy` field to record the strategy used in creating an ONNX program helps in guarding tests and identifying regressions. A documentation error in PyTorch's `register_forward_hook` method needs correction to accurately reflect the code structure and functionality.
- Stream Management and Segmentation Faults in PyTorch: A feature request for a stream management API in PyTorch's ProcessGroupNCCL addresses asynchronous communication challenges. Meanwhile, a segmentation fault occurs in the cpp_wrapper component of the Triton upstream project when running a specific unit test on ROCm.
- Accuracy Problems and Docker Builds in PyTorch: Accuracy problems in the unit tests for the `quantile` operation on ROCm occur when updating Triton for version 3.3. Additionally, transitioning the project's Docker builds to utilize public Amazon Elastic Container Registry (ECR) images is motivated by Docker Hub's impending rate limiting policy changes.
- Test Failures and Function Identifier Reuse in PyTorch: A test named 'test_custom_hook_custom_stream' was disabled due to failures on the main branch, related to a 'HIP error: invalid device ordinal' on ROCm platforms. Additionally, a bug in the PyTorch project involves decorators like `torch.compiler.allow_in_graph` not properly handling the reuse of function identifiers.
- Gradient and Index Errors in PyTorch: A bug in the PyTorch library involves incorrect gradients of the `torch.nn.functional.hardswish` function at boundary points (a worked boundary check appears after this list). Additionally, an "IndexError: tuple index out of range" occurs when running a Python script to load and generate text from a model in vLLM.
- Inconsistent Outputs and Memory Layout in PyTorch: A bug in the `fx_graph_runnable.py` file involves the `opt_output` being inconsistent with the actual output during the `run_repro(acc=True)` test. Additionally, a bug in the PyTorch library involves the `torch.cdist` function producing inconsistent results when run in eager mode compared to other backends.
- ARM64 Support and Bitshift Errors in PyTorch: The integration and support of Triton for aarch64 and SBSA architectures are motivated by the availability of Linux ARM64 hosted runners on GitHub. Additionally, a bug in the PyTorch MPS backend involves the bitshift `<<` operation producing incorrect results.
- ONNX Export and Type Errors in PyTorch: A bug in PyTorch 2.6.0 involves an error when exporting a model using ONNX that includes a slice operation on complex tensors. Additionally, a "TypeError" occurs when attempting to export an ONNX model to an IO buffer using `torch.onnx.export` with `external_data` set to True.
- Data Type and Backend Inconsistencies in PyTorch: A bug causes specifying a `float16` data type for `torch.arange` to produce an invalid ONNX graph due to a type constraint in the ONNX `Range` operator. Additionally, a bug in the PyTorch library involves the combination of `nn.Tanhshrink` and `atan2` functions producing inconsistent output when using the CPP backend.
- Profiling and Checkpointing Issues in PyTorch: PyTorch's profiling tool is unable to capture the runtime of the ATen Scaled Dot Product Attention (SDPA) kernel when using `torch.compile`. Additionally, a problem with the PyTorch distributed checkpointer involves the absence of a `strict` parameter in the `dcp.load()` function.
- LazyConvTranspose1d and Sparse Tensor Errors in PyTorch: A bug in the PyTorch library involves using `torch.nn.LazyConvTranspose1d` with an excessively large stride value, resulting in a `Floating point exception (core dumped)` error. Additionally, using the `torch.sparse.sum` function with a specific sparse tensor input can lead to a segmentation fault.
- NamedTensorUtils and Autograd Errors in PyTorch: An "INTERNAL ASSERT FAILED" error occurs in PyTorch, specifically in the NamedTensorUtils.cpp file, when attempting to create a tensor with an empty list of names. Additionally, a redundant try block in the `backward()` function within the `autograd.py` file should be addressed by checking if `info._backward_fn` is `None`.
- Compilation and Quantization Errors in PyTorch: Excessive memory usage and out-of-memory (OOM) errors occur during the startup of compilation when using `torch.compile` with in-place operations on a large tensor. Additionally, a bug in PyTorch's Dynamo module involves handling list comparisons during the export process for training.
- HPU Profiling and Matmul Errors in PyTorch: Adding a profiler activity specifically for HPU devices facilitates accurate profiling by distinguishing them from other devices. Additionally, a bug in the PyTorch library involves the `torch.matmul()` function failing to handle dynamic shapes correctly during ONNX model export.
- Symbol Exposure and Test Failures in PyTorch: A regression in the PyTorch library involves missing symbols, specifically functions from `python_arg_parser.h`, in the `torch_python` DLL after a recent update. Additionally, a test failure in the PyTorch project, specifically the `test_reference_numerics_normal` test, encounters a `TypeError` due to the unsupported use of the numpy boolean negative operator.
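As a companion to the `torch.linalg.eigh` numerical-stability item above, the sketch below only reproduces the symptom the proposal targets: the backward pass divides by pairwise eigenvalue differences, so a near-degenerate spectrum yields very large gradients. The matrix is a made-up example, and the epsilon fix itself is not implemented here.

```python
import torch

# Symmetric matrix with two nearly identical eigenvalues (illustrative example).
A = torch.tensor([[1.0, 1e-9],
                  [1e-9, 1.0 + 1e-9]], dtype=torch.float64, requires_grad=True)

w, v = torch.linalg.eigh(A)  # eigendecomposition of a symmetric matrix
v.sum().backward()           # any scalar function of the eigenvectors

# The backward formula divides by (lambda_j - lambda_i) for i != j, so the
# gradient entries grow as the eigenvalue gap shrinks; the proposal adds a
# small epsilon to that denominator to keep the result finite and stable.
print(w)       # eigenvalues, almost equal
print(A.grad)  # very large entries due to the tiny eigenvalue gap
```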
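Similarly, for the `torch.nn.functional.hardswish` gradient item, here is a small worked check of the one-sided derivatives at the boundary points; the reference formula is derived by hand and is illustrative only, not PyTorch's definition.

```python
import torch
import torch.nn.functional as F

# hardswish(x) = x * relu6(x + 3) / 6, so away from the kinks the derivative is
#   0 for x < -3,   (2x + 3) / 6 for -3 < x < 3,   1 for x > 3.
# At x = -3 the one-sided derivatives are 0 and -0.5, and at x = 3 they are 1.5
# and 1, so whatever autograd reports at those exact points is a convention.
x = torch.tensor([-3.0, 3.0], requires_grad=True)
F.hardswish(x).sum().backward()
print(x.grad)  # the boundary-point values the issue questions
```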
2.5 Issue Discussion Insights
This section will analyze the tone and sentiment of discussions within this project's open and closed issues that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.
Based on our analysis, there are no instances of toxic discussions in the project's open or closed issues from the past week.
III. Pull Requests
3.1 Open Pull Requests
This section provides a summary of pull requests that were opened in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.
Pull Requests Opened This Week: 0
As of our latest update, there are no open pull requests for the project this week.
3.2 Closed Pull Requests
This section provides a summary of pull requests that were closed in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.
Pull Requests Closed This Week: 86
Key Closed Pull Requests
1. cpp_wrapper: reduce memory usage by removing unneeded temporaries: This pull request aims to reduce memory usage in the `cpp_wrapper` mode of a PyTorch project by refactoring `reinterpret_view` calls to return temporary RAII tensor objects, eliminating unnecessary temporary tensor handles, and deleting input tensor lists after casting, thereby aligning its memory efficiency with the default inductor mode.
- URL: pull/147403
- Merged: No
- Associated Commits: 01424, 67582, eb4f8, 20c1a, a6f57, 1c1b4, aae0f, 3806a, ebf67, 91ceb, 4d5ed, 35f9d, b6bf5, 7deb4, 4ebac, d0da4, e9873, b7ec2
2. [WIP][XPU][Inductor] Update Intel triton for release 2.7.: This pull request aimed to update the Intel Triton component for the XPU Inductor in preparation for the release of version 2.7, as part of a stack of related changes, but it was ultimately not merged.
- URL: pull/147727
- Merged: No
3. Make record/storage alignment in torch.save configurable: This pull request aims to make the record and storage alignment in the `torch.save` function configurable, allowing users to adjust how data is stored and aligned when saving PyTorch models, although it was ultimately not merged into the main project.
- URL: pull/147788
- Merged: No
Other Closed Pull Requests
- Delayed Compilation Enhancements: This topic covers the introduction of a new `eager_then_compile` stance and an experimental attempt to introduce a delayed compilation feature in the PyTorch project. The `eager_then_compile` stance aims to reduce compile times and improve usability by initially running in eager mode, while the experimental feature involved multiple updates and discussions but was ultimately closed without being merged (see the sketch after this list).
- FakeTensorMode Improvements: These pull requests focus on enhancing the `torch.load` function under `FakeTensorMode` by adding checkpoint offsets to untyped storages and ensuring correct device loading for FakeTensors. Although the checkpoint offsets feature was not merged, the device loading improvements addressed issues in specific functions.
- PyTorch Serialization Enhancements: This topic includes efforts to enhance the `torch.serialization.skip_data` functionality with the `torch.load` method and eliminate unbacked renamings in the export process. The serialization enhancement was not merged, while the unbacked renamings pull request introduced a new pass to recompute unbacked bindings.
- Optimization and Refactoring: These pull requests aim to optimize the PyTorch codebase by removing unnecessary tensor clones and redundant variable calls, and address a regression issue in the `evaluate_expr` function. The optimization pull request was not merged, while the regression fix included refactoring and simplification of function calls.
- XPU and OneDNN Enhancements: This topic covers enabling SDPA on the XPU backend as part of the OneDNN Upstreaming plan and adding support for the XPU device to LayerNormKernel devices. The SDPA enhancement involved adding new files and modifying test cases, while the LayerNormKernel support was not merged.
- Sharding and DeviceMesh Improvements: These pull requests focus on refactoring sharding propagation to handle cross-mesh computation and enhancing the `DeviceMesh.get_group` method. The sharding refactor allows more flexibility for operators, while the DeviceMesh improvement includes adding tests and updating relevant files.
- Documentation and Testing Updates: This topic includes adding a note to the PyTorch documentation about Intel® Deep Learning Essentials runtime packages and renaming a test file from "test_graph_break_messages" to "test_error_messages". The documentation update cautions against standalone oneAPI installation, while the test file renaming involves multiple commits and notifications.
- Backend and Kernel Enhancements: These pull requests introduce support for rowwise scaling in scaled GEMM operations and enable XPU support for the Inductor MM Triton Kernel Benchmark. The GEMM enhancement includes various fixes and new unit tests, while the XPU support addresses a test case regression issue.
- Dynamo Component Improvements: This topic covers addressing the issue of exceeding the recompile limit in the Dynamo component and removing internal stack traces for graph breaks. The recompile limit fix implements a recursive mechanism, while the stack trace removal was not merged.
- Miscellaneous Enhancements: These pull requests address issues with the `gather_object` and `scatter_object_list` functions and fix the decomposition for the `linspace` function. The gather and scatter fix ensures correct rank usage, while the linspace fix prevents non-functional operations for functional operators.
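For the delayed-compilation item above, here is a hedged sketch of how the `eager_then_compile` stance would be used, assuming the stance string landed as described in the pull request summary; on builds without it, `torch.compiler.set_stance` rejects the name, and stances such as "force_eager" or "default" show the same calling pattern.

```python
import torch

@torch.compile
def f(x):
    return torch.nn.functional.relu(x) * 2

# The stance name below is an assumption based on the PR summary: run early
# calls in eager mode, then compile once real inputs have been observed.
torch.compiler.set_stance("eager_then_compile")
print(f(torch.randn(4)))  # first call may run eagerly, gathering real inputs
print(f(torch.randn(4)))  # later calls can be served by the compiled graph
torch.compiler.set_stance("default")
```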
3.3 Pull Request Discussion Insights
This section will analyze the tone and sentiment of discussions within this project's open and closed pull requests that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.
- [wip][aot] annotated fwd graph dynamic tensor outputs with mark_dynamic
- Toxicity Score: 0.55 (Frustration expressed, defensive responses, mediation attempts, escalating dissatisfaction.)
- This GitHub conversation involves several users discussing a pull request, with username1 expressing frustration over the lack of progress and username2 responding defensively. The tone shifts from collaborative to tense as username3 attempts to mediate, but username1's continued dissatisfaction escalates the situation.
IV. Contributors
4.1 Contributors
Active Contributors:
We consider an active contributor in this project to be any contributor who has made at least 1 commit, opened at least 1 issue, created at least 1 pull request, or made more than 2 comments in the last month.
If there are more than 10 active contributors, the list is truncated to the top 10 based on contribution metrics for better clarity.
Contributor | Commits | Pull Requests | Issues | Comments |
---|---|---|---|---|
mikaylagawarecki | 80 | 4 | 1 | 12 |
williamwen42 | 61 | 5 | 2 | 18 |
BoyuanFeng | 80 | 2 | 0 | 2 |
zou3519 | 38 | 7 | 4 | 35 |
clee2000 | 62 | 3 | 3 | 0 |
jansel | 23 | 4 | 0 | 38 |
bobrenjc93 | 47 | 6 | 0 | 10 |
malfet | 38 | 2 | 1 | 19 |
oulgen | 52 | 2 | 0 | 5 |
justinchuby | 31 | 4 | 1 | 21 |