Weekly GitHub Report for Pytorch: April 14, 2025 - April 21, 2025 (12:02:13)
Weekly GitHub Report for Pytorch
Thank you for subscribing to our weekly newsletter! Each week, we deliver a comprehensive summary of your GitHub project's latest activity right to your inbox, including an overview of your project's issues, pull requests, contributors, and commit activity.
Table of Contents
I. News
1.1 Recent Version Releases:
The current version of this repository is v2.6.0
1.2 Version Information:
Released on January 29, 2025, PyTorch 2.6 introduces significant updates, including support for Python 3.13 with `torch.compile`, a new `torch.compiler.set_stance` feature for dynamic compilation control, and FP16 support on X86 CPUs. Notably, the release also marks a shift away from publishing on Conda, focuses on Manylinux 2.28 for Linux builds, and introduces a backward-incompatible change by setting `weights_only=True` as the default for `torch.load`.
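For readers upgrading, here is a minimal sketch of what the new default means in practice (the checkpoint file name is an arbitrary assumption):

```python
import torch

state = {"weight": torch.randn(3, 3)}
torch.save(state, "checkpoint.pt")

# In PyTorch 2.6, torch.load defaults to weights_only=True, which restricts
# unpickling to tensors and other allow-listed types.
loaded = torch.load("checkpoint.pt")

# Loading arbitrary pickled Python objects now requires opting out explicitly
# (only do this for checkpoints from a trusted source):
loaded = torch.load("checkpoint.pt", weights_only=False)
```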
II. Issues
2.1 Top 5 Active Issues:
We consider active issues to be issues that have been commented on most frequently within the last week. Bot comments are omitted.
- index_select performance: This issue discusses the performance differences between the `index_select` and `gather` functions in PyTorch, with benchmarking results indicating that `gather` is consistently faster and more expressive than `index_select`. The issue suggests that if further benchmarking supports these findings, the `gather` function should be expanded to cover all data types supported by `index_select`, and the `index_select` kernels should be removed in favor of calling `gather` (a minimal equivalence sketch follows this list).
  - The comments discuss various aspects of the performance differences, including specific benchmarking results, the impact of recent code optimizations, and the historical context of `index_select` as a legacy function. Participants also explore the potential for `index_select` to call `gather` directly and clarify the differences between various indexing methods in PyTorch.
  - Number of comments this week: 12
- [ONNX] exported nodes of Multi-head attention can be simplified: This issue involves the export of the `nn.MultiheadAttention` layer from PyTorch to ONNX, where the user observes unexpected additional operations in the exported model. The user is questioning whether these additional operations are a bug or a feature of the export process.
  - The comments discuss the user's expectations and provide a code snippet of the wrapper used around `nn.MultiheadAttention`. A request is made for a reproducible script, which the user provides via a GitHub repository. It is explained that the additional operations are due to PyTorch's implementation, and suggestions are made for potential optimizations, including using `torch.onnx.export(..., dynamo=True)` and considering graph rewrite rules. An optimized version is shared in the user's repository.
  - Number of comments this week: 7
- Sparse tensor conversion performance issues (CPU/GPU): This issue highlights performance concerns related to the conversion of sparse tensors in PyTorch, specifically when converting from dense to sparse formats like COO and CSR on both CPU and GPU. The user reports significant differences in memory usage and processing time between these two conversion methods, with CSR showing unexpectedly high memory consumption and time delays.
  - The comments discuss potential causes for the performance spikes, with contributors suggesting optimizations and sharing code modifications to address the issues. A proposed fix involves optimizing memory usage by altering the code responsible for generating indices, and there is a consensus that the memory characteristics of COO and CSR conversions should be comparable. A write-up on the topic has been shared for further discussion.
  - Number of comments this week: 7
- Compatibility with SymPy 1.14.0: This issue is about ensuring compatibility between the new prerelease SymPy 1.14.0rc1 and the current release of PyTorch, specifically torch==2.6.0, to prevent any potential problems when the final version of SymPy 1.14.0 is released. The user is seeking confirmation on whether the new SymPy version will cause any issues with PyTorch and is asking whether PyTorch's continuous integration (CI) tests this prerelease version.
  - The comments discuss running tests to check compatibility, with initial tests passing locally and in CI. There is a concern about potential issues with the current release torch==2.6.0, but further tests indicate no significant problems. The discussion also mentions the upcoming release of torch==2.7.0, which is expected to be similar to the main branch, providing confidence in compatibility.
  - Number of comments this week: 6
- [export] Warn users when 0/1 specialization happens: This issue addresses the confusion users experience when an axis specified as dynamic is unexpectedly specialized, particularly when a dynamic batch size is intended but an example with batch_size=1 is provided. The proposal is to emit a warning suggesting users change the example dimension size to greater than one to avoid this specialization.
  - The comments discuss the use of `Dim.AUTO` versus `Dim.DYNAMIC` in ONNX export, with `Dim.AUTO` being used due to constraints with `Dim.DYNAMIC`. There is a conversation about whether falling back to static dimensions can sometimes allow a model to export correctly, and it is noted that users might set more axes to dynamic than necessary.
  - Number of comments this week: 6
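As a rough illustration of the equivalence discussed in the index_select issue above, here is a minimal sketch (tensor shapes are arbitrary assumptions) of expressing a 1-D `index_select` along dim 0 with `gather`:

```python
import torch

x = torch.randn(1000, 64)
idx = torch.randint(0, 1000, (256,))

a = torch.index_select(x, 0, idx)
# gather requires an index tensor with the same number of dimensions as x,
# so the 1-D index is expanded across the non-indexed dimension
b = torch.gather(x, 0, idx.unsqueeze(1).expand(-1, x.size(1)))
assert torch.equal(a, b)
```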
2.2 Top 5 Stale Issues:
We consider stale issues to be issues that have had no activity within the last 30 days. The team should work together to get these issues resolved and closed as soon as possible.
- ImportError: cannot import name 'triton_key' from 'triton.compiler.compiler': This issue involves an ImportError encountered when attempting to import 'triton_key' from 'triton.compiler.compiler', which is causing a backend compiler failure in a PyTorch environment. The error occurs within a Python script that utilizes the OotdPipeline and attempts to compile certain components with Torch's compile function, specifically when using the 'inductor' backend.
- Alternate algorithm for computing MaxPool2D under specific condition: This issue proposes an alternative algorithm for computing the MaxPool2D operation in PyTorch when the stride is equal to 1, suggesting that a kernel size of 5 can be represented by two MaxPool2D operations with a kernel size of 3, and similarly for other kernel sizes. The motivation behind this approach is to reduce computational costs on the CPU by modifying the MaxPool2D layer directly, as demonstrated by testing code that shows a significant speedup in execution time (a small demonstration follows this list).
- cuda_utils.so: failed to map segment from shared object: This issue involves a bug encountered when running a script in a Docker environment with a `tmpfs` permission set to `1777`, where the execution of a cached `cuda_utils.so` file in the `/tmp` directory fails due to the absence of the execution bit, despite the directories having the correct permissions. The error occurs during the execution of a PyTorch model, specifically when attempting to map a segment from the shared object, resulting in an `ImportError` and a `BackendCompilerFailed` exception, which suggests a problem with the execution rights of the compiled CUDA utilities.
- Enable UFMT on all files in PyTorch: This issue involves enabling uniform formatting (UFMT) across all files in the PyTorch codebase, as currently approximately 1,500 files are excluded from this formatting process. The task requires removing file names from the `exclude_patterns` in the `UFMT` section of the `.lintrunner.toml` file and running a specific command to apply the formatting, with additional preparatory work needed to address known issues such as import cycles and misplaced annotations before the UFMT changes are committed.
- [JIT archive] Add a flag to not include debug files: This issue proposes the addition of a flag to the `torch.jit.save()` function in PyTorch to exclude `.debug_pkl` files, which are primarily used for debugging purposes and can significantly increase the file size of TorchScript models compared to ONNX models. The motivation behind this feature request is to reduce the size of JIT archives, particularly for small models with quantization, to facilitate more efficient deployment on mobile devices where storage space is limited.
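As a quick check of the MaxPool2D decomposition proposed above (the input shape is an arbitrary assumption): with stride 1, a 5x5 max window is the max of overlapping 3x3 maxima, so two stacked 3x3 pools reproduce a single 5x5 pool:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)

pool5 = nn.MaxPool2d(kernel_size=5, stride=1)
pool3 = nn.MaxPool2d(kernel_size=3, stride=1)

# with stride 1, max over a 5x5 window equals max over 3x3 windows of 3x3 maxima
assert torch.equal(pool5(x), pool3(pool3(x)))
```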
2.3 Open Issues
This section lists, groups, and then summarizes issues that were created within the last week in the repository.
Issues Opened This Week: 106
Summarized Issues:
- Compilation and Execution Errors: Compilation and execution errors are prevalent in PyTorch, affecting various functionalities. Users report issues such as assertion errors, runtime errors, and incorrect outputs when using features like `torch.compile`, `torch.export`, and `torch.vmap`, often due to device mismatches or unsupported operations. These errors hinder the successful execution of models and require workarounds or fixes to ensure compatibility across different backends and devices (a minimal `torch.vmap` sketch follows this list).
- Backend and Device Discrepancies: Discrepancies between different backends and devices are a common issue in PyTorch, leading to inconsistent results and errors. Users experience problems with functions like `torch.nn.PairwiseDistance`, `torch.outer`, and `torch.linalg.inv`, where outputs vary across backends such as Triton, CPP, and Inductor, often due to precision differences or unsupported operations.
- Export and Serialization Issues: Exporting models and handling serialized data across different architectures and formats pose challenges in PyTorch. Users report issues with `torch.export`, ONNX export, and TorchScript models, where metadata is not preserved and models fail to load correctly on different architectures, leading to errors and incorrect outputs.
- Performance and Optimization Concerns: Performance issues and optimization challenges are frequently reported in PyTorch, affecting both training and inference. Users encounter problems with high memory usage, slow processing times, and performance regressions, particularly when using features like `torch.vmap`, `torch.compile`, and dynamic shapes, prompting requests for optimizations and improvements.
- Documentation and Usability Enhancements: Users frequently request improvements in PyTorch's documentation and usability, highlighting unclear or incorrect information. Issues include misleading documentation for functions like `torch.nn.utils.clip_grads_with_norm_` and `torch.export`, as well as requests for better error messages and user guidance to enhance the overall user experience.
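For context on the `torch.vmap` reports grouped above, a minimal sketch of the API (the function and shapes are illustrative assumptions):

```python
import torch

def dot(a, b):
    return (a * b).sum()

# vmap maps dot over the leading batch dimension of both inputs
batched_dot = torch.vmap(dot)
x = torch.randn(10, 3)
y = torch.randn(10, 3)
print(batched_dot(x, y).shape)  # torch.Size([10])
```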
2.4 Closed Issues
This section lists, groups, and then summarizes issues that were closed within the last week in the repository. This section also links the associated pull requests if applicable.
Issues Closed This Week: 42
Summarized Issues:
- PyTorch Functionality Issues: This category includes various issues related to the functionality of PyTorch features and operations. For instance, the `torch.set_flush_denormal` function does not affect float16 on x86_64 architecture, raising questions about its limitations. Additionally, the `torch.compile` function fails to handle certain operations and data types, such as `quantize_activation` on CPU and `rms_norm` on MPS, leading to errors and incorrect outputs.
- Error Messages and Debugging: Several issues highlight the need for improved error messages and debugging support in PyTorch. For example, vague error messages when running ROCm-specific tests on non-ROCm machines and unclear dynamic shape constraint violations make troubleshooting difficult. Enhancements in error clarity and guidance are suggested to aid users in resolving these problems.
- Backend and Device Compatibility: Issues in this category focus on compatibility problems with different backends and devices. The MPS backend has several bugs, such as incorrect dtype handling in `torch.isin()` and unimplemented operators like `aten::_linalg_solve_ex.result`. Additionally, there are concerns about CUDA version support and device compatibility, such as the lack of support for CUDA 12.1 in PyTorch 2.6.0.
- Compilation and Export Issues: This group includes issues related to the compilation and export processes in PyTorch. Problems such as the failure of `torch.onnx.export` to handle dynamic input sizes and the inability to export certain models due to dtype mismatches are highlighted. These issues suggest the need for improvements in the export functionality to handle various scenarios more robustly.
- Performance and Optimization: Performance-related issues are also prevalent, such as the significant degradation in Triton operator performance for `scaled_dot_product_attention`. Suggestions include using alternative operators to improve execution times, particularly in cross-attention scenarios, indicating a need for optimization in PyTorch's compilation strategies (a minimal cross-attention call is sketched after this list).
- Testing and Validation: Issues in this category focus on the need for better testing and validation processes. For instance, the disabling of certain tests due to gradient accuracy issues and the migration of ONNX exporter tests to a new framework highlight the ongoing efforts to ensure the reliability and correctness of PyTorch's features.
- Documentation Discrepancies: Several issues point out discrepancies between PyTorch's documentation and its actual behavior. These include the behavior of functions like `torch.cdist()` and `torch.nn.Upsample()`, where the documentation does not accurately reflect the implementation, leading to confusion and the need for documentation updates.
- Infrastructure and Build Concerns: This category includes issues related to the infrastructure and build processes of PyTorch. Problems such as build errors with specific compilers and the use of deprecated RPATH in the aarch64 CPU wheel suggest the need for updates and maintenance in PyTorch's build system to ensure compatibility and stability.
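For reference, the cross-attention call pattern named in the performance item above, sketched with illustrative sizes:

```python
import torch
import torch.nn.functional as F

# cross-attention: query and key/value sequence lengths differ
q = torch.randn(2, 8, 128, 64)   # (batch, heads, q_len, head_dim)
k = torch.randn(2, 8, 512, 64)   # (batch, heads, kv_len, head_dim)
v = torch.randn(2, 8, 512, 64)

out = F.scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([2, 8, 128, 64])
```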
2.5 Issue Discussion Insights
This section will analyze the tone and sentiment of discussions within this project's open and closed issues that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.
Based on our analysis, there are no instances of toxic discussions in the project's open or closed issues from the past week.
III. Pull Requests
3.1 Open Pull Requests
This section provides a summary of pull requests that were opened in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.
Pull Requests Opened This Week: 175
Key Open Pull Requests
1. Implement avg_pool3d for MPS backend: This pull request implements the `avg_pool3d` operation for the MPS backend in PyTorch using a custom Metal shader, enabling users with Apple Silicon GPUs to perform 3D average pooling without falling back to the CPU. It includes a C++ interface, support for forward and backward passes, comprehensive test cases, and fixes for issues related to Metal command buffer handling and non-contiguous tensors, addressing issues #141287 and #141044 (a basic usage sketch follows this subsection).
- URL: pull/151742
- Merged: No
- Associated Commits: 795a7, 4b6c1, 1295e, 82ec0, 45378, 1cd86, 24dd4, 97e0f, e6830, abdad, e18f8, d7bcd, fcea8, fa0cb, 8e959, e45e6, 62abb, 42217, 760f5, c96cc, 9320d, 50975, a835c, 7dcf3, ca868, e6541, b915a, af53d, aed0a, f8ea1, 00eed, 5651b, 25e76, b060f, 47f18, adf75, 598ab
2. [Easy] Fix the function signature of torch.Event: This pull request addresses a discrepancy between the declaration and implementation of the `torch.Event` function signature in the PyTorch library, proposing a decision on whether to set the `enable_timing` parameter to `False` for consistency with `torch.cuda.Event` or to `True` to avoid breaking backward compatibility.
- URL: pull/151221
- Merged: No
- Associated Commits: 25eb3, b2acb, 2fcd3, e5a14, 0f9d1, 0fa2b, be90c, 7cfc4, 9e957, e2f73, 5f5c4, 0df2b, dabbf, bf985
3. Broken Links GHA: This pull request introduces a GitHub Action that runs monthly to check for broken links within the repository, and if any are found, it automatically creates an issue listing the problematic links.
- URL: pull/151454
- Merged: No
- Associated Commits: 12318, 22c4f, 6cb4b, 8887f, 73a25, 49982, 261ab, 41ccd, f1f70, c6d60, 7618b, 5329e, d59d2, 00b3f
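For context on the first pull request above, a minimal `avg_pool3d` call; whether it runs natively on MPS depends on the PR, so the MPS line is hypothetical:

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 4, 8, 16, 16)  # (batch, channels, depth, height, width)
out = F.avg_pool3d(x, kernel_size=2)
print(out.shape)  # torch.Size([1, 4, 4, 8, 8])

# With the PR applied, Apple Silicon users could run the same op on MPS
# (hypothetical until merged):
# out = F.avg_pool3d(x.to("mps"), kernel_size=2)
```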
Other Open Pull Requests
- Documentation Enhancements for torch.Event: This topic involves improving the documentation of `torch.Event` by adding detailed function or class signatures and correcting the display of `torch.Event.wait` and `torch.Event.record`. The pull request aims to fix and enhance the documentation to provide clearer and more accurate information for users.
- Performance Improvements for CK Gemm on ROCm: The pull request introduces initial changes to improve the performance of CK Gemm on ROCm by reorganizing the CK Gemm code into a dedicated folder. It also adds logic to call CK Gemm with specific templates based on input tensor sizes, adapting the gemm selection logic from the FBGEMM project.
- Dynamic Shapes and Symbolic Shapes Enhancements: This topic covers enhancements to the PyTorch project by experimenting with the `bound_sympy` tool to enable size-oblivious maximum reasoning for dynamic shapes. It addresses compile-time regressions and involves multiple updates to the `symbolic_shapes.py` file.
- Test Skipping Decorators and Class-Level Support: The pull requests address issues with the `skipIfXpu` and `skipIfHpu` decorators incorrectly disabling tests when applied to a class. They enhance the functionality by enabling class-level skipping, as part of a series of changes tracked through the ghstack tool.
- Infrastructure for Built-in Operations: This topic involves reapplying a previous update to implement infrastructure for handling built-in operations such as `min`, `max`, and `math.pow`. The pull request is part of a stack of changes managed by ghstack, with multiple updates and revisions to ensure non-strict behavior for these operations.
- Deprecation of Legacy Host Allocator APIs: The pull requests aim to deprecate the legacy host allocator APIs in favor of a unified API, `getHostAllocator(device_type)`, providing a more streamlined and consistent interface for memory allocation and management tasks. They also plan to move the `is_pinned` function from `AcceleratorHookInterface` to `HostAllocator` and deprecate `getPinnedMemoryAllocator`.
- CUDAAllocator Simplification: This pull request simplifies and reduces redundancy in the `CUDAAllocator` by removing the custom `raw_alloc` and `raw_delete` methods. It uses the existing `raw_allocate` and `raw_deallocate` methods from `c10::Allocator`, which are now virtual to allow for customization by other allocators.
- Cutlass Component Enhancements: The pull requests address fixes for end-to-end (e2e) compilation issues related to argument rendering in the Cutlass component. They also enhance the Cutlass library by adding epilogue inputs and outputs to the `def_kernel` function, as part of a series of related updates tracked through the ghstack tool.
- Caching and Fake Tensors: This pull request introduces a feature for caching fake tensors when the output is None, as part of a series of changes in the PyTorch project. It includes multiple updates and commits refining the implementation.
- torch.arange() Precision Fix: This pull request addresses a corner case in the `torch.arange()` function where casting start, end, or step values to `int64_t` could lead to precision loss. It implements a workaround using double arithmetic for values within the exact representable range of double, for consistency across devices (a small illustration follows this list).
- MixtureSameFamily Distribution Bug Fix: This pull request addresses a bug in the PyTorch library related to the `MixtureSameFamily` distribution by ensuring that sample validation occurs after padding in the `log_prob` method. It corrects the support to match the component distribution with the first event dimension removed.
- ROCm CI Environment Upgrade: This pull request aims to upgrade the ROCm Continuous Integration (CI) environment to ROCm version 6.4. It involves updates to all ROCm GitHub workflows to use the Jammy distribution and modifications to the `install_rocm.sh` script.
- Test Skipping and SM89 Tests: This pull request addresses the need to skip Triton tests for MPS and modifies the reason for skipping SM89 tests to not rely on the IS_BIG_GPU condition. It combines improvements from two previous pull requests.
- Guard Checking Logic Refactor: This pull request refactors the guard checking logic by lifting it into the AOTAutogradCache. It involves creating a new GuardedCache class and adding a `check_guard_hit` lambda to FXGraphCache.
- Generalized Installation Process: This pull request aims to generalize the installation process to accommodate inputs that are neither explicitly defined nor capable of being flattened by pytree. It is part of a series of updates and commits in the PyTorch project.
- Enhancements for Static Value Detection: This pull request introduces enhancements to the PyTorch project by allowing the use of `statically_known_true` in user code. It adds a new function `has_static_value` to determine whether an input has a static boolean, float, or integer value.
- Compile-Time Traces for invoke_subgraph: This pull request introduces compile-time traces for the "invoke_subgraph" feature in the PyTorch project. It is part of a series of related changes managed through the ghstack tool.
- DTensor HOP Dispatch Feature: This pull request introduces a feature called "DTensor HOP dispatch" to the PyTorch project. It involves multiple updates and is part of a stack of related changes, with testing focused on distributed tensor attention functionality.
- Autocast Context Manager Handling: This pull request addresses the handling of the autocast context manager within the hierarchical compile process in the PyTorch project. It is part of a series of related changes tracked by ghstack.
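As a small illustration of the `torch.arange()` corner case described above (the values are arbitrary assumptions), the element count follows ceil((end - start) / step), the quantity the fix computes in double arithmetic:

```python
import math
import torch

start, end, step = 0.0, 10.0, 0.3
t = torch.arange(start, end, step)

# the length matches the double-precision computation of the bound
assert len(t) == math.ceil((end - start) / step)  # 34 elements
```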
3.2 Closed Pull Requests
This section provides a summary of pull requests that were closed in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.
Pull Requests Closed This Week: 238
Key Closed Pull Requests
1. Implement fexp for avx2 and avx512: This pull request implements a fast exponential computation (fexp) for AVX2 and AVX512 architectures to optimize flash attention on X86 with FP16 support, based on a 2015 paper by Malossi et al. It achieves up to 20% faster performance for mixed-precision flash attention compared to the current implementation, with precision valid in hybrid mode (fp32 -> fp16) due to casting during store operations, as demonstrated by benchmarks on a Xeon 6972P machine.
- URL: pull/151434
- Merged: No
- Associated Commits: ffbd8, b5fa1, a52fd, f1ab9, df83f, fed42, 6f522, 72bcf, 3ef90, ddc41, 9503d, e30d7, 72e61, 756ed, 86968, 15111, b8ac9, 063c9, ad359, 23ba1, 63ff9, baed5, 20ad3, 767b6, 992b2, b74dd, 32a7e, 555e7, 7b71e, af779, 17fba, db9fa, d4955, 144fd, 3de4b, b3b50, 5920b, 17b07, 76a16, cfba6, 100d1, 3ce4f, ba7e3, 3beb3, 9db67, 24ead, 601f9, 4c7c6, 7eb42, 1a71d, 05a99, 11c49, 710a9, 37d34, efadc, 89359, d4e86, 2b00c, 84107, a8ec5, cd7ab, 95fcf, e11f7, cce99, 56bc3, ad432, 433f1, 7ec0e, 33439, 767a1, f7eb1, 93fb6, 913a7, 504b4, dca15, dc656, 20771, 34daf, 40ba5, 1eec6, 23612, 9f133, adf1a, 14f2b, 4dc1e, 34046, 32207, 12bc1, 0bb49, 27a22, de67c, 92aad, a63a2, b9a66, b4da5, 9a918, d5213, 757ae, f2f52, 90b50, fbb55, 13a85, b6387, b303d, 40535, d7d47, 34a56, 4f4a8, 4ed7e, 2412a, 0ec38, d6ec8, 80c60, 0316d, 82566, 6e949, 8c787, 29fe9, dd59e, 51449, 2f66c, b7ec7, 3dcb5, 69507, 5b445, 836bd, 36c5d, ba255, 2ff8b, fd214, 8d740, 2265b, 71670, f361b, ea571, 83b88, 73ccc, 3301c, c9d43, e15c8, 5e8bb, 46fc5, ef561, 8447e, 4b3b8, 4318e, c4b1b, 6aaeb, 20cfc, 7a592, 9f0cb, f4f6e, d4eb2, 4874c, 3c33f, 95fe6, a9274, 81dd4, d388f, 2b90f, 84764, f5fd7, c1983, 9b183, 382e1, a3ce8, 714df, 5578b, 3c46a, 43c32, 042fc, cbdd4, 96280, 9bb11, 11904, 27354, 54295, 815a3, 9b64e, 16b07, 11c0c, e3bc4, 1d3ae, db0af, 17e36, b4ce4, f413a, 3aafc, 837ed, d2b00, 950de, cf99c, 52142, 19dd7, bcc56, 157c7, 9f200, 7be16, c7962, 4a017, 1380d, 2536b, 848c7, 69508, 85cf8, a46fe, 60469, 9553e, 18ada, 2b229, ddeaf, 94479, 7c71f, 74797, dc848, 0351c, 328a0, 9b7d8, b9865, b679c, fc5f8, 03818, 41ecf, 7d4be, 2c780, 4e4fc, 07c9b, 43f92, 89bb4, 58af4, 5600b, ed199, 433d7, ba083, f0ced, 2e490, 91661, b6536, 8d1e5, 24a75, 3d3bb, 40868, c8a27, fc72f, 8cc68, 5c334, 8875d, 88b9f, 3b768, f1612
2. Maxpool Perf Improvement targeting resnet scenarios: This pull request aims to improve the performance of the max pooling operation, specifically targeting scenarios involving ResNet architectures.
- URL: pull/151720
- Merged: No
- Associated Commits: e1306, 4a3da, ba275, cea56, 612fc, 828d6, 6b14e, 9e315, 98870, 813e0, a889c, 4b5bf, 1db2a, 39209, 19544, b7de7, fb276, becdf, ffed7, c5e52, dd732, 4e6a9, 7a007, 7c550, 6e867, 9b80d, 4b030, cbe47, 2e2c0, 17157, ecd33, c2578, 39641, 6101a, 24bd8, aa574, bc421, 051df, 550ed, 57717, d80f5, 69ed7, 70298, 17d25, 058d3, 8a71e, 8af31, f8c4c, 3a541, 1a0b1, 0b45a, 0b1b6, 1de13, 53752, 783a6, 119e7, 417a0, 32f58, 7d26c, 4be8e, ff940, f6ad6, da863, 50eb2, bb7fd, 8fa58, d8002, ba7c2, 00314, dc956, b253b, 757cb, 8a12b, db943, 5e412, 6efce, 79740, b654f, 362be, 2acd2, aaa31, ed475, 09af6, 088b8, dca35, 02220, 6df27, dca53, 08c07, d88da, feade, 5b76f, 8ec01, 06b6a, f0fb4, 2096c, 75e26, 0e2b4, 9ba9a, 6a4c4, 17dc4, 49c8b, 72908, 7d7ec, 5d212, c3ba1, f0927, abbfe, f069c, f8544, 1ed41, 826ee, b0ea6, e814e, f6389, 78a47, f0207, c5667, d8a7a, 45946, 61ba0, 01137, 0a0be, 78426, a4935, f929e, 33911, 4bed2, 60cb6, dacf5, ff48a, 80f18, 47074, b5380, 5d018, 6a281, 8b752, 8b59e, 38c82, a6f13, 4b515, 8e47c, 23d1a, f27cc, a07b6, 80e18, 1550e, 8ecf0, 4ed5c, 595a2, 8b7ad, 58e54, a02ca, 0e782, c040e, dda59, e481a, a1efa, 4b826, 5eaa4, 0e8bd, 2b906, 01a0b, fb716, 94e61, 94412, fe82c, 973af, cb954, 5a980, 45e62, f2ca4, a92b4, b2e4f, 0119c, 6d856, a0985, a0aaf, 5ca0b, 64033, fc456, 84209, 6c84d, 74fb9
3. Propagate callable parameter types using ParamSpec (#142306): This pull request aims to enhance the PyTorch codebase by propagating callable parameter types using `ParamSpec`, addressing partial issues related to type annotations and mypy compatibility. It includes various commits that reorder function parameters, adjust return types, and make formatting changes to satisfy linters and avoid type errors (a minimal `ParamSpec` sketch follows this subsection).
- URL: pull/151014
- Merged: No
- Associated Commits: b83b6, f7218, ce461, fccfb, 5af2d, 9bdb2, 7a6f6, f36a4, 0a002, ebef9, 42666, 27d84, 65c28, 81d59, ef2dc, 003b8
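For readers unfamiliar with the typing feature used in the third pull request, a minimal self-contained `ParamSpec` sketch (the decorator is a hypothetical example, not PyTorch code); it requires Python 3.10+:

```python
from typing import Callable, ParamSpec, TypeVar

P = ParamSpec("P")
R = TypeVar("R")

def logged(fn: Callable[P, R]) -> Callable[P, R]:
    # the wrapper advertises the same parameter types as the wrapped function,
    # so mypy can check call sites of the decorated function precisely
    def wrapper(*args: P.args, **kwargs: P.kwargs) -> R:
        print(f"calling {fn.__name__}")
        return fn(*args, **kwargs)
    return wrapper

@logged
def scale(x: float, factor: float = 2.0) -> float:
    return x * factor

print(scale(3.0))  # prints "calling scale", then 6.0
```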
Other Closed Pull Requests
- Event Handling Enhancements: This topic covers improvements in PyTorch's event handling, addressing issues with `event_id` always being 0 and adding checks for `elapsedTime`. These changes aim to reduce user confusion and ensure robustness through additional tests.
- GEMM and Matrix Operations: Enhancements in matrix operations include support for submatrices in GEMM and ScaledGEMM within the ROCm framework and an Aten GEMM overload for FP32 output from FP16/BF16 inputs. These updates improve functionality and efficiency, although the latter was not merged.
- Memory Management and Optimization: The introduction of a `HostAllocator` class standardizes host memory management across backends, while optimizations in graph partitioning and dispatching mechanisms enhance performance. These changes aim to improve maintainability and reduce computational overhead.
- Symbolic Shape Handling: The addition of `sym_and` and `sym_or` functions allows for variadic arguments, simplifying symbolic expressions. This enhancement preserves symbolic expressions for better runtime assertions and branch preservation.
- Compile Time and Tracing Improvements: Enhancements in compile time tracing within the AOT autograd component include logging and timing mechanisms. These changes aim to optimize the compilation process by addressing significant missing gaps.
- Tiling and Kernel Optimization: The removal of unnecessary singleton tiling splits optimizes Triton kernel generation by eliminating superfluous dimensions. This change reduces computational overhead and improves kernel fusion efficiency.
- CI and Testing Enhancements: Continuous integration for the "openreg" component is enabled by relocating test files and updating documentation. These changes ensure better testing coverage and integration.
- Compilation Warnings and Functionality: Addressing compilation warnings in the BlasKernel component involves removing unused functions, while changes to avoid specializing min/max functions improve code quality. These updates enhance maintainability and functionality.
- User Experience and Error Handling: Enhancements in error messages for relaxed constraints and validation of `inputs` in `torch.autograd.backward` improve user experience. These changes involve multiple updates and contributions (a short usage sketch follows this list).
- Inductor and Libdevice Operations: The removal of unnecessary libdevice operations in the inductor component optimizes code generation. This change eliminates extra operations that were previously needed for dispatching.
- ONNX Export and Data Type Handling: Fixes in the ONNX export process address incorrect conversion of bfloat16 initializers. These changes ensure proper handling of bfloat16 data types in PyTorch models.
- No-Operation Elimination: Enhancements in noop elimination for `slice` and `slice_scatter` operations improve efficiency. These changes include tests and improvements, although they were not merged.
- Standalone Compile Function Improvements: Updates in the `standalone_compile` function ensure correct handling of multiple return values and prevent mutations. These changes enhance the function's reliability and integration with custom backends.
- Flash Attention and Tensor Handling: Saving Q, K, V tensors in flash attention processes involves debugging and adding annotated kernels. These changes focus on improving the attention mechanism's efficiency.
- Tensor Release and Memory Management: Addressing tensor release issues with `pin_memory` involves multiple updates. These changes improve memory management and are supported by related pull requests.
- MPS Backend Benchmarking: Initiating benchmarking for the MPS backend assesses compile results for pass rates and speedup. These changes involve a series of commits and discussions to enhance performance evaluation.
- ROCm and CI/CD Process: The creation of ROCm 6.4 images as part of CI/CD omits the magma tarball and includes updates like switching to Ubuntu 22.04. These changes aim to streamline the integration and delivery process, although they were not merged.
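For context on the `inputs` validation item above, a minimal sketch of the argument being validated (the tensors are illustrative):

```python
import torch

x = torch.randn(3, requires_grad=True)
y = torch.randn(3, requires_grad=True)
loss = (x * y).sum()

# restrict gradient accumulation to x; y.grad stays None
torch.autograd.backward(loss, inputs=[x])
print(x.grad is not None, y.grad is None)  # True True
```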
3.3 Pull Request Discussion Insights
This section will analyze the tone and sentiment of discussions within this project's open and closed pull requests that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.
- [WIP] Move is_pinned to host allocator
- Toxicity Score: 0.55 (Frustration expressed, Defensive tone, Unresolved tension.)
- This GitHub conversation involves multiple users discussing a work-in-progress pull request. User1 initially provides a solution, but User2 expresses frustration over its ineffectiveness, leading to a tense exchange. User3 attempts to mediate by suggesting alternative approaches, but User1's defensive tone exacerbates the situation. The conversation remains unresolved, with underlying tension due to differing opinions on the implementation strategy.
-
- Toxicity Score: 0.55 (Defensive responses, Frustration expressed, Lack of clarity)
- This GitHub conversation involves username1 and username2, where username1 initially provides feedback on a proposed change, and username2 responds with a defensive tone. The conversation escalates as username1 expresses frustration over the lack of clarity in username2's explanations, leading to a tense exchange.
- [fake tensor cache] Support index with non bool/int8 indices
- Toxicity Score: 0.55 (Frustration expressed, defensive responses, mediation attempts, unresolved dissatisfaction.)
- This GitHub conversation involves several users discussing a proposed change, with username1 expressing frustration over the lack of progress and username2 responding defensively. The tone shifts from collaborative to tense as username3 attempts to mediate, but username1's continued dissatisfaction triggers further tension.
IV. Contributors
4.1 Contributors
Active Contributors:
We consider an active contributor in this project to be any contributor who has made at least 1 commit, opened at least 1 issue, created at least 1 pull request, or made more than 2 comments in the last month.
If there are more than 10 active contributors, the list is truncated to the top 10 based on contribution metrics for better clarity.
Contributor | Commits | Pull Requests | Issues | Comments |
---|---|---|---|---|
malfet | 188 | 27 | 12 | 139 |
FFFrog | 128 | 15 | 0 | 14 |
anijain2305 | 106 | 19 | 18 | 10 |
mlazos | 126 | 18 | 0 | 4 |
pianpwk | 99 | 20 | 2 | 25 |
laithsakka | 85 | 17 | 7 | 32 |
guilhermeleobas | 118 | 13 | 1 | 0 |
guangyey | 91 | 9 | 0 | 31 |
justinchuby | 49 | 6 | 6 | 69 |
StrongerXi | 81 | 6 | 16 | 23 |