Weekly GitHub Report for PyTorch: March 10, 2025 - March 17, 2025
Thank you for subscribing to our weekly newsletter! Each week, we deliver a comprehensive summary of your GitHub project's latest activity right to your inbox, including an overview of your project's issues, pull requests, contributors, and commit activity.
Table of Contents
I. News
1.1 Recent Version Releases:
The current version of this repository is v2.6.0
1.2 Version Information:
The PyTorch 2.6 release, created on January 29, 2025, introduces significant updates including support for `torch.compile` with Python 3.13, a new performance-related feature `torch.compiler.set_stance`, and enhancements to AOTInductor. Notable changes include the deprecation of PyTorch's official Anaconda channel, the introduction of FP16 support on X86 CPUs, and a backward compatibility-breaking change in the default behavior of `torch.load`.
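A minimal sketch of the new stance API, assuming the names documented for the 2.6 release (it can also be used as a decorator):

```python
import torch

@torch.compile
def f(x):
    return x * 2 + 1

f(torch.randn(4))  # compiled execution

# Temporarily force eager execution without discarding compiled artifacts.
with torch.compiler.set_stance("force_eager"):
    f(torch.randn(4))
```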
II. Issues
2.1 Top 5 Active Issues:
We consider active issues to be issues that have been commented on most frequently within the last week. Bot comments are omitted.
As of our latest update, there are no active issues with ongoing comments this week.
2.2 Top 5 Stale Issues:
We consider stale issues to be issues that have had no activity within the last 30 days. The team should work together to get these issues resolved and closed as soon as possible.
As of our latest update, there are no stale issues for the project this week.
2.3 Open Issues
This section lists, groups, and then summarizes issues that were created within the last week in the repository.
Issues Opened This Week: 0
Summarized Issues:
As of our latest update, there are no open issues for the project this week.
2.4 Closed Issues
This section lists, groups, and then summarizes issues that were closed within the last week in the repository. This section also links the associated pull requests if applicable.
Issues Closed This Week: 10
Summarized Issues:
- PyTorch Dynamo Bugs: Issues in PyTorch's Dynamo component include a bug where constant tensors created with `torch.tensor` do not recompile correctly with device guards when the ambient device changes, leading to CUDA device mismatches (see the repro sketch after this list). Another issue involves the `SETUP_WITH` implementation deviating from CPython documentation, causing crashes due to incorrect stack handling during exception unwinding.
- Gradient and Functionality Issues in PyTorch: PyTorch faces challenges with gradient support and function accuracy, such as the lack of gradient support for `residuals` in `torch.linalg.lstsq`, which affects computational efficiency. Additionally, the `torch.nn.functional.hardswish` function has incorrect gradient calculations at boundary points, leading to unexpected results (sketches of both appear after this list).
- Torchinductor Backend and Dtype Support: The torchinductor backend in PyTorch struggles with dtype support, particularly with `torch.float8_e8m0fnu`, where the current implementation fails to return tensors correctly, impacting MX workflows. Furthermore, there is a need for roundtrip casting support between float32 or bfloat16 and the e8m0 format to enable efficient operations without errors (see the roundtrip sketch after this list).
- Model Accuracy and Backend Errors: Certain models like `DebertaV2ForMaskedLM` and `eca_halonext26ts` experience accuracy failures during the `max_autotune` process due to a suspected commit causing a `LoweringException` error. This results in an `AssertionError` related to the `View` operation in the PyTorch Inductor backend, affecting model performance.
- CUDA Backend Discrepancies: The `nn.MultiheadAttention` module in PyTorch shows significant output discrepancies when using the CUDA backend with the Triton compiler compared to the CPU, particularly when applying the `torch.reciprocal` function. This suggests a potential bug or tolerance issue in the CUDA implementation that needs addressing (a comparison sketch follows this list).
- FSDP2 Module and Parameter Management: The `fully_shard` function in PyTorch's FSDP2 module has unclear behavior regarding the `ignored_params` feature, leading to errors during forward computation due to mixed device types. This raises questions about the management and device movement of `buffers` and `ignored_params`.
- Project Setup and Compatibility: Updating the `setup.py` file to use the recursive glob feature for the `package_data` field in setuptools v62.3.0 is necessary to simplify header file inclusion. This update ensures compatibility with the project's minimum supported Python version, streamlining the setup process (see the `package_data` sketch after this list).
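Below are a few illustrative sketches for the issues summarized above. First, a hypothetical repro for the Dynamo device-guard bug: the constant created by `torch.tensor` inside the compiled function is baked in on the ambient device at trace time, and the report indicates no guard triggers a recompile when that device changes. The exact reported code may differ, and a CUDA build is required.

```python
import torch

@torch.compile
def f(x):
    return x + torch.tensor([1.0])  # constant captured at trace time

torch.set_default_device("cpu")
f(torch.ones(2))   # compiles with the constant on CPU
torch.set_default_device("cuda")
f(torch.ones(2))   # reported: no device guard, so a CUDA/CPU mismatch surfaces
```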
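Next, the `torch.linalg.lstsq` gap: the `residuals` field is not differentiable, so one workaround (shown here for illustration, not taken from the issue) is to recompute the residual norm from the differentiable `solution`:

```python
import torch

A = torch.randn(5, 3, requires_grad=True)  # full rank almost surely
B = torch.randn(5, 2)

solution = torch.linalg.lstsq(A, B).solution       # differentiable
residuals = ((A @ solution - B) ** 2).sum(dim=0)   # manual, differentiable recompute
residuals.sum().backward()
print(A.grad.shape)  # torch.Size([5, 3])
```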
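The `hardswish` issue concerns the kinks of its piecewise definition, x * relu6(x + 3) / 6, whose analytic derivative is discontinuous at x = ±3; a minimal check of what PyTorch returns at those boundary points:

```python
import torch
import torch.nn.functional as F

x = torch.tensor([-3.0, 3.0], requires_grad=True)
F.hardswish(x).sum().backward()
print(x.grad)  # values at the boundary points questioned by the issue
```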
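For the e8m0 request, a roundtrip-cast sketch, assuming a build where `torch.float8_e8m0fnu` is exposed (e.g. a recent nightly); per the issue, the first cast may currently misbehave:

```python
import torch

x = torch.tensor([1.0, 2.0, 4.0], dtype=torch.float32)
scales = x.to(torch.float8_e8m0fnu)  # reported to return incorrect tensors
back = scales.to(torch.float32)      # roundtrip needed for MX scaling workflows
print(back)
```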
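For the `nn.MultiheadAttention` discrepancy, a CPU-versus-CUDA comparison sketch; the shapes and the use of `torch.reciprocal` follow the issue description, but the exact repro is an assumption:

```python
import torch

torch.manual_seed(0)
mha = torch.nn.MultiheadAttention(embed_dim=16, num_heads=4, batch_first=True)
x = torch.reciprocal(torch.rand(2, 8, 16) + 1.0)  # apply torch.reciprocal as in the issue

out_cpu, _ = mha(x, x, x)
if torch.cuda.is_available():
    out_cuda, _ = mha.to("cuda")(x.cuda(), x.cuda(), x.cuda())
    print((out_cpu - out_cuda.cpu()).abs().max())  # reported to exceed tolerance
```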
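Finally, the `setup.py` change in miniature: setuptools 62.3.0 added recursive `**` globs for `package_data`, which replaces hand-enumerated header lists (the package name and patterns here are illustrative):

```python
from setuptools import setup

setup(
    name="example",
    packages=["example"],
    # requires setuptools >= 62.3.0 for recursive glob support
    package_data={"example": ["include/**/*.h"]},
)
```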
2.5 Issue Discussion Insights
This section will analyze the tone and sentiment of discussions within this project's open and closed issues that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.
Based on our analysis, there are no instances of toxic discussions in the project's open or closed issues from the past week.
III. Pull Requests
3.1 Open Pull Requests
This section provides a summary of pull requests that were opened in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.
Pull Requests Opened This Week: 0
As of our latest update, there are no open pull requests for the project this week.
3.2 Closed Pull Requests
This section provides a summary of pull requests that were closed in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.
Pull Requests Closed This Week: 24
Key Closed Pull Requests
1. [CUDAGraph] Graph Partition: This pull request implements a CUDA graph partitioning feature in PyTorch, building on a previous inductor graph partitioning effort, to enable more efficient execution by allowing CUDA graphs to be used even when CPU operations are present, as demonstrated through a Python example and various code improvements and tests included in the commits (a workload sketch appears after these key pull requests).
- URL: pull/147648
- Merged: No
- Associated Commits: bcf8c, 8eaf0, 0f84d, 4c9b7, fc377, b4756, d81f1, b552e, 011ad, 3577e, 87825, ef010, 86daf, 6d450, 4f057, 5e44a, dc329, b7cd3, 7d749, d4415, 24991, 914db, cda3a, dc7ad, 1c44f, f2e4f, e2c61, 58815, 6552f, 3aef7, 9461c, 4719f, e1cce, 49d14, 77704, 877e9, 5e34e, c70c4, da3d8, ed1ce, bb180, 091bd, d7db6, 73213, 7749a, 6f2e2, dd113, 8b2c6, 0881a, 47504, 85a4f, dde1c, 70215, eeeeb, 5aade, 8026a, 45bec, 139bb
2. Force build to conform C++ standard on windows by adding /permissive- flag: This pull request addresses the need for the PyTorch project to conform to the C++ standard on Windows by adding the `/permissive-` flag to `torch_compile_options`, which resolves issues such as error C2440 when converting string literals to non-const pointers and ensures compatibility with Visual Studio's default settings for new projects.
- URL: pull/147367
- Merged: No
- Associated Commits: 983bd, 28fef, 4ac80, ac342, 53d1b, e69e1, 43e17, cea07, 30c92, fe650, 6e4a6, c7cf5, 59531, f9413
3. [fx] Move Node._prepend/Node._remove_from_list to C++: This pull request moves the methods `Node._prepend` and `Node._remove_from_list` from Python to C++ in the PyTorch project, resulting in improved performance as demonstrated by a microbenchmark that shows a reduction in function calls and execution time (a benchmark sketch appears after these key pull requests).
- URL: pull/148261
- Merged: No
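A sketch of the workload class the graph-partition pull request targets: a CPU op interleaved with GPU compute, which normally forces inductor to skip CUDA graphs entirely. The `graph_partition` config name is inferred from the PR description and should be treated as an assumption.

```python
import torch
import torch._inductor.config as inductor_config

inductor_config.graph_partition = True  # assumed experimental flag from the PR

@torch.compile(mode="reduce-overhead")  # mode that enables CUDA graphs
def f(x):
    y = x * 2
    z = y.cpu() + 1      # CPU op: previously disabled CUDA graphs for the whole graph
    return z.cuda() * 3  # with partitioning, the GPU segments can still be graphed

if torch.cuda.is_available():
    print(f(torch.ones(4, device="cuda")))
```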
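And a microbenchmark sketch in the spirit of the `Node._prepend` pull request, timing the public `Node.prepend` path on a small FX graph (the benchmark in the PR itself may differ):

```python
import timeit
import torch
import torch.fx

g = torch.fx.Graph()
x = g.placeholder("x")
nodes = [g.call_function(torch.relu, (x,)) for _ in range(100)]
g.output(nodes[-1])

# Repeatedly splice one node before another; this exercises the doubly
# linked list operations the PR moves to C++.
print(timeit.timeit(lambda: nodes[-1].prepend(nodes[0]), number=10_000))
```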
Other Closed Pull Requests
- Performance Enhancements by Moving Functions to C++: Several pull requests focus on improving the performance of the PyTorch library by moving functions like `Node._update_args_kwargs` and `map_aggregate` to C++. These changes aim to reduce function calls and execution time during symbolic tracing, as demonstrated by microbenchmarking results, although not all were merged.
- Distributed Job Stability: A pull request addresses potential hangs in distributed jobs by banning compiler-driven recomputation of collectives. It ensures consistent decisions across ranks and proposes future enhancements like an `spmd_mode` flag for safe collective recomputation.
- Backend and Build Process Improvements: Multiple pull requests aim to enhance the PyTorch build process and backend functionality. These include using CK as the backend for memory-efficient attention on ROCm, ensuring the CK submodule's `config.h` file is used, and enabling XPU builds to use Visual Studio 2019.
- Meta Functions and Backend Modifications: A pull request adds meta functions for "out" variants of certain `aten` functions, addressing a specific issue. Another modifies the Cutlass backend by removing an assertion that prevented self-multiplication.
- Compatibility and Compilation Adjustments: Several pull requests focus on compatibility and compilation issues, such as fixing atomic operations on ARMv8-A architecture and ensuring correct parsing of OpenMP flags by clang-cl on Windows.
- Gradient and Indexing Enhancements: Pull requests address issues in gradient computation for `torch.nn.functional.hardswish` and enhance backwards-indexing functionality on ROCm when the stride is not equal to one.
- Continuous Integration and Dispatch Logic Updates: A pull request proposes changes to the continuous integration process to prevent workspace cleaning, while another updates dispatch logic for `linear` layers using BF16 to utilize oneDNN for better performance (see the BF16 sketch after this list).
- Quantization and Setuptools Enhancements: A pull request enables a fast path for statically quantized matrix multiplications on AArch64, integrating the Arm Compute Library for significant performance improvements. Another aims to enhance the build process by adding recursive glob support to setuptools.
- Unimplemented Function Replacement: A pull request replaces the `unimplemented` function with `unimplemented_v2` in a specific file as part of addressing an issue, although it has not been merged yet.
- SymPy Library and Torch Compile Adjustments: Pull requests address an issue with floating-point number printing in the SymPy library and ensure `torch.compile` respects the `priority_order` setting of `sdpa_kernel` (see the attention sketch after this list).
- Drafting and Stride Consistency: A pull request drafts a stable version of the Torch library, while another ensures stride consistency in a while loop for the `body_fn` function, although neither was merged.
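As referenced above, a minimal sketch of the path the BF16 dispatch change affects: a bfloat16 `nn.Linear` on CPU, whose underlying matmul the PR routes through oneDNN:

```python
import torch

linear = torch.nn.Linear(64, 64).to(torch.bfloat16)
x = torch.randn(8, 64, dtype=torch.bfloat16)
with torch.no_grad():
    y = linear(x)  # on CPU, this is the matmul the PR dispatches to oneDNN
print(y.dtype)     # torch.bfloat16
```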
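And a sketch of the behavior the `sdpa_kernel` pull request makes `torch.compile` respect: a priority-ordered backend list for scaled dot-product attention. The `set_priority` keyword reflects recent `torch.nn.attention` documentation; treat it as an assumption.

```python
import torch
from torch.nn.attention import SDPBackend, sdpa_kernel

q = k = v = torch.randn(2, 4, 8, 16)

@torch.compile
def attn(q, k, v):
    return torch.nn.functional.scaled_dot_product_attention(q, k, v)

# Backends listed in priority order; set_priority is an assumed keyword.
with sdpa_kernel([SDPBackend.FLASH_ATTENTION, SDPBackend.MATH], set_priority=True):
    out = attn(q, k, v)  # compiled code should honor the priority order
print(out.shape)
```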
3.3 Pull Request Discussion Insights
This section will analyze the tone and sentiment of discussions within this project's open and closed pull requests that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.
Based on our analysis, there are no instances of toxic discussions in the project's open or closed pull requests from the past week.
IV. Contributors
4.1 Contributors
Active Contributors:
We consider an active contributor in this project to be any contributor who has made at least 1 commit, opened at least 1 issue, created at least 1 pull request, or made more than 2 comments in the last month.
If there are more than 10 active contributors, the list is truncated to the top 10 based on contribution metrics for better clarity.
| Contributor | Commits | Pull Requests | Issues | Comments |
|---|---|---|---|---|
| mikaylagawarecki | 80 | 0 | 1 | 7 |
| williamwen42 | 61 | 2 | 2 | 11 |
| zou3519 | 38 | 7 | 4 | 27 |
| clee2000 | 62 | 3 | 3 | 0 |
| BoyuanFeng | 61 | 1 | 0 | 2 |
| malfet | 38 | 1 | 1 | 19 |
| justinchuby | 31 | 4 | 1 | 21 |
| jansel | 23 | 4 | 0 | 28 |
| oulgen | 52 | 0 | 0 | 0 |
| bobrenjc93 | 47 | 0 | 0 | 3 |