Weekly GitHub Report for PyTorch: March 10, 2025 - March 17, 2025
Thank you for subscribing to our weekly newsletter! Each week, we deliver a comprehensive summary of your GitHub project's latest activity right to your inbox, including an overview of your project's issues, pull requests, contributors, and commit activity.
Table of Contents
I. News
1.1 Recent Version Releases:
The current version of this repository is v2.6.0
1.2 Version Information:
The PyTorch 2.6 release, created on January 29, 2025, introduces significant updates including support for `torch.compile` with Python 3.13, a new performance-related feature `torch.compiler.set_stance`, and enhancements to AOTInductor. Notable changes include the deprecation of PyTorch's official Anaconda channel, the introduction of FP16 support on X86 CPUs, and a backward compatibility-breaking change in the default behavior of `torch.load`.
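A minimal sketch of the new stance API, assuming the names documented for the 2.6 release (it can also be used as a decorator):

```python
import torch

@torch.compile
def f(x):
    return x * 2 + 1

f(torch.randn(4))  # compiled execution

# Temporarily force eager execution without discarding compiled artifacts.
with torch.compiler.set_stance("force_eager"):
    f(torch.randn(4))
```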
II. Issues
2.1 Top 5 Active Issues:
We consider active issues to be issues that have been commented on most frequently within the last week. Bot comments are omitted.
As of our latest update, there are no active issues with ongoing comments this week.
2.2 Top 5 Stale Issues:
We consider stale issues to be issues that have had no activity within the last 30 days. The team should work together to get these issues resolved and closed as soon as possible.
As of our latest update, there are no stale issues for the project this week.
2.3 Open Issues
This section lists, groups, and then summarizes issues that were created within the last week in the repository.
Issues Opened This Week: 0
Summarized Issues:
As of our latest update, there are no open issues for the project this week.
2.4 Closed Issues
This section lists, groups, and then summarizes issues that were closed within the last week in the repository. This section also links the associated pull requests if applicable.
Issues Closed This Week: 10
Summarized Issues:
- PyTorch Dynamo Bugs: Issues in PyTorch's Dynamo component include a bug where constant tensors created with `torch.tensor` do not recompile correctly with device guards when the ambient device changes, leading to CUDA device mismatches (see the repro sketch after this list). Another issue involves the `SETUP_WITH` implementation deviating from CPython documentation, causing crashes due to incorrect stack handling during exception unwinding.
- Gradient and Functionality Issues in PyTorch: PyTorch faces challenges with gradient support and function accuracy, such as the lack of gradient support for `residuals` in `torch.linalg.lstsq`, which affects computational efficiency. Additionally, the `torch.nn.functional.hardswish` function has incorrect gradient calculations at boundary points, leading to unexpected results (sketches of both appear after this list).
- Torchinductor Backend and Dtype Support: The torchinductor backend in PyTorch struggles with dtype support, particularly with `torch.float8_e8m0fnu`, where the current implementation fails to return tensors correctly, impacting MX workflows. Furthermore, there is a need for roundtrip casting support between float32 or bfloat16 and the e8m0 format to enable efficient operations without errors (see the roundtrip sketch after this list).
- Model Accuracy and Backend Errors: Certain models like `DebertaV2ForMaskedLM` and `eca_halonext26ts` experience accuracy failures during the `max_autotune` process due to a suspected commit causing a `LoweringException` error. This results in an `AssertionError` related to the `View` operation in the PyTorch Inductor backend, affecting model performance.
- CUDA Backend Discrepancies: The `nn.MultiheadAttention` module in PyTorch shows significant output discrepancies when using the CUDA backend with the Triton compiler compared to the CPU, particularly when applying the `torch.reciprocal` function. This suggests a potential bug or tolerance issue in the CUDA implementation that needs addressing (a comparison sketch follows this list).
- FSDP2 Module and Parameter Management: The `fully_shard` function in PyTorch's FSDP2 module has unclear behavior regarding the `ignored_params` feature, leading to errors during forward computation due to mixed device types. This raises questions about the management and device movement of `buffers` and `ignored_params`.
- Project Setup and Compatibility: Updating the `setup.py` file to use the recursive glob feature for the `package_data` field in setuptools v62.3.0 is necessary to simplify header file inclusion. This update ensures compatibility with the project's minimum supported Python version, streamlining the setup process (see the `package_data` sketch after this list).
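Below are a few illustrative sketches for the issues summarized above. First, a hypothetical repro for the Dynamo device-guard bug: the constant created by `torch.tensor` inside the compiled function is baked in on the ambient device at trace time, and the report indicates no guard triggers a recompile when that device changes. The exact reported code may differ, and a CUDA build is required.

```python
import torch

@torch.compile
def f(x):
    return x + torch.tensor([1.0])  # constant captured at trace time

torch.set_default_device("cpu")
f(torch.ones(2))   # compiles with the constant on CPU
torch.set_default_device("cuda")
f(torch.ones(2))   # reported: no device guard, so a CUDA/CPU mismatch surfaces
```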
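Next, the `torch.linalg.lstsq` gap: the `residuals` field is not differentiable, so one workaround (shown here for illustration, not taken from the issue) is to recompute the residual norm from the differentiable `solution`:

```python
import torch

A = torch.randn(5, 3, requires_grad=True)  # full rank almost surely
B = torch.randn(5, 2)

solution = torch.linalg.lstsq(A, B).solution       # differentiable
residuals = ((A @ solution - B) ** 2).sum(dim=0)   # manual, differentiable recompute
residuals.sum().backward()
print(A.grad.shape)  # torch.Size([5, 3])
```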
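The `hardswish` issue concerns the kinks of its piecewise definition, x * relu6(x + 3) / 6, whose analytic derivative is discontinuous at x = ±3; a minimal check of what PyTorch returns at those boundary points:

```python
import torch
import torch.nn.functional as F

x = torch.tensor([-3.0, 3.0], requires_grad=True)
F.hardswish(x).sum().backward()
print(x.grad)  # values at the boundary points questioned by the issue
```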
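For the e8m0 request, a roundtrip-cast sketch, assuming a build where `torch.float8_e8m0fnu` is exposed (e.g. a recent nightly); per the issue, the first cast may currently misbehave:

```python
import torch

x = torch.tensor([1.0, 2.0, 4.0], dtype=torch.float32)
scales = x.to(torch.float8_e8m0fnu)  # reported to return incorrect tensors
back = scales.to(torch.float32)      # roundtrip needed for MX scaling workflows
print(back)
```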
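For the `nn.MultiheadAttention` discrepancy, a CPU-versus-CUDA comparison sketch; the shapes and the use of `torch.reciprocal` follow the issue description, but the exact repro is an assumption:

```python
import torch

torch.manual_seed(0)
mha = torch.nn.MultiheadAttention(embed_dim=16, num_heads=4, batch_first=True)
x = torch.reciprocal(torch.rand(2, 8, 16) + 1.0)  # apply torch.reciprocal as in the issue

out_cpu, _ = mha(x, x, x)
if torch.cuda.is_available():
    out_cuda, _ = mha.to("cuda")(x.cuda(), x.cuda(), x.cuda())
    print((out_cpu - out_cuda.cpu()).abs().max())  # reported to exceed tolerance
```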
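Finally, the `setup.py` change in miniature: setuptools 62.3.0 added recursive `**` globs for `package_data`, which replaces hand-enumerated header lists (the package name and patterns here are illustrative):

```python
from setuptools import setup

setup(
    name="example",
    packages=["example"],
    # requires setuptools >= 62.3.0 for recursive glob support
    package_data={"example": ["include/**/*.h"]},
)
```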
2.5 Issue Discussion Insights
This section will analyze the tone and sentiment of discussions within this project's open and closed issues that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.
Based on our analysis, there are no instances of toxic discussions in the project's open or closed issues from the past week.
III. Pull Requests
3.1 Open Pull Requests
This section provides a summary of pull requests that were opened in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.
Pull Requests Opened This Week: 0
As of our latest update, there are no open pull requests for the project this week.
3.2 Closed Pull Requests
This section provides a summary of pull requests that were closed in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.
Pull Requests Closed This Week: 24
Key Closed Pull Requests
1. [CUDAGraph] Graph Partition: This pull request implements a CUDA graph partitioning feature in PyTorch, building on a previous inductor graph partitioning effort, to enable more efficient execution by allowing CUDA graphs to be used even when CPU operations are present, as demonstrated through a Python example and various code improvements and tests included in the commits (a workload sketch appears after these key pull requests).
- URL: pull/147648
- Merged: No
- Associated Commits: bcf8c, 8eaf0, 0f84d, 4c9b7, fc377, b4756, d81f1, b552e, 011ad, 3577e, 87825, ef010, 86daf, 6d450, 4f057, 5e44a, dc329, b7cd3, 7d749, d4415, 24991, 914db, cda3a, dc7ad, 1c44f, f2e4f, e2c61, 58815, 6552f, 3aef7, 9461c, 4719f, e1cce, 49d14, 77704, 877e9, 5e34e, c70c4, da3d8, ed1ce, bb180, 091bd, d7db6, 73213, 7749a, 6f2e2, dd113, 8b2c6, 0881a, 47504, 85a4f, dde1c, 70215, eeeeb, 5aade, 8026a, 45bec, 139bb
2. Force build to conform C++ standard on windows by adding /permissive- flag: This pull request addresses the need for the PyTorch project to conform to the C++ standard on Windows by adding the `/permissive-` flag to `torch_compile_options`, which resolves issues such as error C2440 when converting string literals to non-const pointers and ensures compatibility with Visual Studio's default settings for new projects.
- URL: pull/147367
- Merged: No
- Associated Commits: 983bd, 28fef, 4ac80, ac342, 53d1b, e69e1, 43e17, cea07, 30c92, fe650, 6e4a6, c7cf5, 59531, f9413
3. [fx] Move Node._prepend/Node._remove_from_list to C++: This pull request moves the methods `Node._prepend` and `Node._remove_from_list` from Python to C++ in the PyTorch project, resulting in improved performance as demonstrated by a microbenchmark that shows a reduction in function calls and execution time (a benchmark sketch appears after these key pull requests).
- URL: pull/148261
- Merged: No
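A sketch of the workload class the graph-partition pull request targets: a CPU op interleaved with GPU compute, which normally forces inductor to skip CUDA graphs entirely. The `graph_partition` config name is inferred from the PR description and should be treated as an assumption.

```python
import torch
import torch._inductor.config as inductor_config

inductor_config.graph_partition = True  # assumed experimental flag from the PR

@torch.compile(mode="reduce-overhead")  # mode that enables CUDA graphs
def f(x):
    y = x * 2
    z = y.cpu() + 1      # CPU op: previously disabled CUDA graphs for the whole graph
    return z.cuda() * 3  # with partitioning, the GPU segments can still be graphed

if torch.cuda.is_available():
    print(f(torch.ones(4, device="cuda")))
```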
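And a microbenchmark sketch in the spirit of the `Node._prepend` pull request, timing the public `Node.prepend` path on a small FX graph (the benchmark in the PR itself may differ):

```python
import timeit
import torch
import torch.fx

g = torch.fx.Graph()
x = g.placeholder("x")
nodes = [g.call_function(torch.relu, (x,)) for _ in range(100)]
g.output(nodes[-1])

# Repeatedly splice one node before another; this exercises the doubly
# linked list operations the PR moves to C++.
print(timeit.timeit(lambda: nodes[-1].prepend(nodes[0]), number=10_000))
```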
Other Closed Pull Requests
- Performance Enhancements by Moving Functions to C++: Several pull requests focus on improving the performance of the PyTorch library by moving functions like `Node._update_args_kwargs` and `map_aggregate` to C++. These changes aim to reduce function calls and execution time during symbolic tracing, as demonstrated by microbenchmarking results, although not all were merged.
- Distributed Job Stability: A pull request addresses potential hangs in distributed jobs by banning compiler-driven recomputation of collectives. It ensures consistent decisions across ranks and proposes future enhancements like an `spmd_mode` flag for safe collective recomputation.
- Backend and Build Process Improvements: Multiple pull requests aim to enhance the PyTorch build process and backend functionality. These include using CK as the backend for memory-efficient attention on ROCm, ensuring the CK submodule's `config.h` file is used, and enabling XPU builds to use Visual Studio 2019.
- Meta Functions and Backend Modifications: A pull request adds meta functions for "out" variants of certain `aten` functions, addressing a specific issue. Another modifies the Cutlass backend by removing an assertion that prevented self-multiplication.
- Compatibility and Compilation Adjustments: Several pull requests focus on compatibility and compilation issues, such as fixing atomic operations on ARMv8-A architecture and ensuring correct parsing of OpenMP flags by clang-cl on Windows.
- Gradient and Indexing Enhancements: Pull requests address issues in gradient computation for `torch.nn.functional.hardswish` and enhance backwards-indexing functionality on ROCm when the stride is not equal to one.
- Continuous Integration and Dispatch Logic Updates: A pull request proposes changes to the continuous integration process to prevent workspace cleaning, while another updates dispatch logic for `linear` layers using BF16 to utilize oneDNN for better performance (see the BF16 sketch after this list).
- Quantization and Setuptools Enhancements: A pull request enables a fast path for statically quantized matrix multiplications on AArch64, integrating the Arm Compute Library for significant performance improvements. Another aims to enhance the build process by adding recursive glob support to setuptools.
- Unimplemented Function Replacement: A pull request replaces the `unimplemented` function with `unimplemented_v2` in a specific file as part of addressing an issue, although it has not been merged yet.
- SymPy Library and Torch Compile Adjustments: Pull requests address an issue with floating-point number printing in the SymPy library and ensure `torch.compile` respects the `priority_order` setting of `sdpa_kernel` (see the attention sketch after this list).
- Drafting and Stride Consistency: A pull request drafts a stable version of the Torch library, while another ensures stride consistency in a while loop for the `body_fn` function, although neither was merged.
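As referenced above, a minimal sketch of the path the BF16 dispatch change affects: a bfloat16 `nn.Linear` on CPU, whose underlying matmul the PR routes through oneDNN:

```python
import torch

linear = torch.nn.Linear(64, 64).to(torch.bfloat16)
x = torch.randn(8, 64, dtype=torch.bfloat16)
with torch.no_grad():
    y = linear(x)  # on CPU, this is the matmul the PR dispatches to oneDNN
print(y.dtype)     # torch.bfloat16
```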
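And a sketch of the behavior the `sdpa_kernel` pull request makes `torch.compile` respect: a priority-ordered backend list for scaled dot-product attention. The `set_priority` keyword reflects recent `torch.nn.attention` documentation; treat it as an assumption.

```python
import torch
from torch.nn.attention import SDPBackend, sdpa_kernel

q = k = v = torch.randn(2, 4, 8, 16)

@torch.compile
def attn(q, k, v):
    return torch.nn.functional.scaled_dot_product_attention(q, k, v)

# Backends listed in priority order; set_priority is an assumed keyword.
with sdpa_kernel([SDPBackend.FLASH_ATTENTION, SDPBackend.MATH], set_priority=True):
    out = attn(q, k, v)  # compiled code should honor the priority order
print(out.shape)
```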
3.3 Pull Request Discussion Insights
This section will analyze the tone and sentiment of discussions within this project's open and closed pull requests that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.
Based on our analysis, there are no instances of toxic discussions in the project's open or closed pull requests from the past week.
IV. Contributors
4.1 Contributors
Active Contributors:
We consider an active contributor in this project to be any contributor who has made at least 1 commit, opened at least 1 issue, created at least 1 pull request, or made more than 2 comments in the last month.
If there are more than 10 active contributors, the list is truncated to the top 10 based on contribution metrics for better clarity.
| Contributor | Commits | Pull Requests | Issues | Comments |
|---|---|---|---|---|
| mikaylagawarecki | 80 | 0 | 1 | 7 |
| williamwen42 | 61 | 2 | 2 | 11 |
| zou3519 | 38 | 7 | 4 | 27 |
| clee2000 | 62 | 3 | 3 | 0 |
| BoyuanFeng | 61 | 1 | 0 | 2 |
| malfet | 38 | 1 | 1 | 19 |
| justinchuby | 31 | 4 | 1 | 21 |
| jansel | 23 | 4 | 0 | 28 |
| oulgen | 52 | 0 | 0 | 0 |
| bobrenjc93 | 47 | 0 | 0 | 3 |