Weekly GitHub Report for PyTorch: April 21, 2025 - April 28, 2025 (12:01:41)
Thank you for subscribing to our weekly newsletter! Each week, we deliver a comprehensive summary of your GitHub project's latest activity right to your inbox, including an overview of your project's issues, pull requests, contributors, and commit activity.
Table of Contents
I. News
1.1 Recent Version Releases:
The current version of this repository is v2.6.0
1.2 Version Information:
Released on January 29, 2025, PyTorch 2.6 introduces significant updates, including support for `torch.compile` with Python 3.13, a new performance-related feature `torch.compiler.set_stance`, and FP16 support on X86 CPUs. Notably, the release marks a shift away from publishing on Conda, with a focus on official wheel packages, and introduces a backward-incompatible change by setting `weights_only=True` as the default for `torch.load`, enhancing security.
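As a quick illustration of the `weights_only` change, a minimal sketch (the file path and model here are placeholders):

```python
import torch

# Sketch of the PyTorch 2.6 default change; "weights.pt" is a placeholder path.
model = torch.nn.Linear(4, 2)
torch.save(model.state_dict(), "weights.pt")

# weights_only=True is now the default: only tensors and allowlisted types
# are unpickled, reducing the risk of arbitrary code execution.
state = torch.load("weights.pt")

# Loading arbitrary pickled objects requires an explicit opt-out, which
# should only be used with trusted checkpoints.
state = torch.load("weights.pt", weights_only=False)
model.load_state_dict(state)
```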
II. Issues
2.1 Top 5 Active Issues:
We consider active issues to be issues that have been commented on most frequently within the last week. Bot comments are omitted.
- constructing DTensor on a 2D device mesh SIGTERMs: This issue involves a bug encountered while constructing a DTensor on a 2D device mesh using PyTorch, which results in SIGTERM errors during execution. The problem arises specifically when running a script on a multi-GPU setup, where the last example in a tutorial fails due to segmentation faults, potentially linked to the interaction between `init_process_group` and `init_device_mesh`.
- The comments discuss attempts to reproduce the error, with some users unable to replicate it locally. Suggestions include checking CUDA device settings and using `torch.cuda.set_device`. The issue seems related to device initialization, with a proposed solution to avoid using both `init_process_group` and `init_device_mesh` simultaneously; a sketch of this pattern appears after this list. Debugging efforts include using gdb to capture stack traces, and there is a consensus that the problem might be due to a conflict in device handling between the two initialization methods.
- Number of comments this week: 19
- RFC: The State of Custom CUDA extensions in PyTorch: This issue discusses the current state and challenges of implementing custom CUDA extensions in PyTorch, highlighting the trade-offs between different methods such as `torch.utils.cpp_extension.load_inline()` and the proposed `torch.cuda._compile_kernel()`, which aims to significantly reduce compilation times by leveraging `nvrtc`. The document also explores potential improvements and integrations with other PyTorch components, such as AOTInductor, and considers the future direction of CUDA kernel development in the context of emerging Pythonic DSLs for GPU programming.
- The comments discuss the need for higher-level APIs to improve usability, the potential integration of `torch.cuda._compile_kernel` with `torch.compile` and AOTInductor, and the importance of supporting complex data structures and headers for performance. There is also a focus on leveraging `cuda-python` for better access to device libraries and improving tensor marshaling, with examples provided to illustrate potential implementations; a minimal `load_inline` sketch follows this list.
- Number of comments this week: 9
- Inconsistent sum/dot/norm behavior: This issue highlights the inconsistent behavior of the `sum`, `dot`, and `norm` functions in PyTorch when dealing with large `float32` arrays, where `torch.sum` is noted for its precision, while `torch.linalg.norm` is slower and less accurate, especially when using multiple CPU threads. The user is seeking clarification on how to achieve consistent results across different PyTorch versions and whether there is a way to normalize these discrepancies.
- The comments discuss the expected numerical inaccuracies due to the precision limits of `float32`, the impact of operations like square root and squaring on precision, and the potential reasons for the observed discrepancies, such as rounding errors and the use of different algorithms like Kahan summation. There is also a mention of the lack of documentation on these behaviors and a suggestion to generalize existing tooling to improve consistency, with an invitation for contributions to the codebase; a small comparison sketch appears after this list.
- Number of comments this week: 8
- [rfc][c10d] RDMA APIs (read/write, rkey): This issue discusses the need for RDMA-like APIs in PyTorch that allow communication without requiring both sides to initiate the process, which is beneficial for distributed inference, checkpointing, and other advanced use cases. The proposal includes two design options: using the ProcessGroup API for manual tensor operations and creating a "Ghost" Tensor subclass for more abstracted tensor registration and exchange.
- The comments explore the implications of the proposed designs, debating serialization and implicit transfers, and suggest alternatives like a collective handle creation. They discuss the value of using side channels for handle transmission, compare RMA and PGAS models, and address technical details like file descriptor handling in symmetric memory.
- Number of comments this week: 7
- Expanding subset of tensor reads wrong memory: This issue describes a bug in PyTorch where expanding a subset of a tensor on a CUDA device results in incorrect memory reads, producing unexpected output when the function is compiled with `torch.compile()`. The problem does not occur when using a CPU, not compiling the function, or when cloning the tensor elements, and it is influenced by the use of different data types and the `dynamic` parameter.
- The comments discuss attempts to reproduce the issue with different configurations, noting that using `dynamic=True` affects the output, and highlight the importance of using different values for `n` to trigger the bug. There is also a discussion on how the output varies with different data types, and a suggestion to clone tensor elements to achieve the expected results; an illustrative sketch of the pattern appears after this list.
- Number of comments this week: 6
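For the DTensor issue above, a minimal sketch of the initialization pattern suggested in the comments, assuming a `torchrun` launch that sets `LOCAL_RANK` and a 4-GPU setup; this is illustrative, not the issue's exact reproducer:

```python
import os
import torch
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor import distribute_tensor, Shard

# Pin each rank to its GPU first, as suggested in the comments; assumes a
# torchrun launch (e.g. torchrun --nproc-per-node=4) that sets LOCAL_RANK.
torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

# Let init_device_mesh create the default process group itself rather than
# also calling init_process_group, avoiding the conflict discussed above.
mesh = init_device_mesh("cuda", (2, 2), mesh_dim_names=("dp", "tp"))

# Construct a DTensor sharded across both mesh dimensions.
dt = distribute_tensor(torch.randn(8, 8), mesh, placements=[Shard(0), Shard(1)])
```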
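For the custom CUDA extensions RFC, a minimal sketch of one of the existing approaches it weighs, `torch.utils.cpp_extension.load_inline` (the module name and trivial operator are illustrative; a CUDA toolchain is required):

```python
import torch
from torch.utils.cpp_extension import load_inline

# Declaration goes in cpp_sources; the definition is compiled as CUDA source.
cpp_src = "torch::Tensor add_one(torch::Tensor x);"
cuda_src = """
torch::Tensor add_one(torch::Tensor x) {
    return x + 1;  // a real extension would launch a hand-written kernel here
}
"""

# load_inline compiles and binds the listed functions at runtime; the RFC's
# proposed torch.cuda._compile_kernel() aims to make this much faster via nvrtc.
ext = load_inline(name="demo_ext", cpp_sources=cpp_src,
                  cuda_sources=cuda_src, functions=["add_one"])
print(ext.add_one(torch.ones(3, device="cuda")))
```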
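For the `sum`/`dot`/`norm` issue, a small sketch of how such `float32` discrepancies can be observed against a `float64` reference (the array size is arbitrary):

```python
import torch

# Compare float32 reductions against a float64 reference; differences of
# this kind reflect float32 rounding and reduction order, not a bug.
x = torch.rand(10_000_000)  # float32 by default
ref = x.double()

print(abs(x.sum().item() - ref.sum().item()))
print(abs(torch.dot(x, x).item() - torch.dot(ref, ref).item()))
print(abs(torch.linalg.norm(x).item() - torch.linalg.norm(ref).item()))
```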
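For the expand-subset issue, a hypothetical sketch of the reported pattern; the function, shapes, and `dynamic=True` flag here are illustrative stand-ins, not the exact reproducer from the issue:

```python
import torch

def f(t, n):
    row = t[0]                # a view into t, not a copy
    return row.expand(n, -1)  # broadcast the view to n rows

t = torch.arange(12.0, device="cuda").reshape(3, 4)
eager = f(t, 5)
compiled = torch.compile(f, dynamic=True)(t, 5)
print(torch.equal(eager, compiled))  # reported False in affected configurations

def f_workaround(t, n):
    return t[0].clone().expand(n, -1)  # cloning first gives expected results
```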
2.2 Top 5 Stale Issues:
We consider stale issues to be issues that have had no activity within the last 30 days. The team should work together to get these issues resolved and closed as soon as possible.
- ImportError: cannot import name 'triton_key' from 'triton.compiler.compiler': This issue involves an ImportError encountered when attempting to import 'triton_key' from 'triton.compiler.compiler', which is causing a backend compiler failure in a PyTorch environment using the 'inductor' backend. The problem arises during the execution of a Python script that utilizes the OotdPipeline and attempts to compile certain components with Torch's compile function, specifically affecting users working with PyTorch version 2.4.0.dev20240330+cu121 on an Ubuntu 22.04.3 LTS system with CUDA 12.1.
- Alternate algorithm for computing MaxPool2D under specific condition: This issue proposes an alternative algorithm for computing MaxPool2D in PyTorch when the stride is equal to 1, suggesting that a kernel size of 5 can be represented by two MaxPool2D operations with a kernel size of 3, and similarly for other sizes, to reduce computational cost on the CPU. The approach aims to optimize performance by decreasing the computation for each cell, and testing has shown a speedup of approximately 1.293 times compared to the traditional method; a quick numerical check of the equivalence appears after this list.
- cuda_utils.so: failed to map segment from shared object: This issue involves a bug encountered when running a PyTorch model within a Docker container, where the execution of a cached `cuda_utils.so` file in the `/tmp` directory fails due to a missing execution permission, despite the directory having the correct permissions. The error occurs specifically when using a tmpfs with a permission setting of `1777`, and the problem persists even when the script is executed with root privileges, which should theoretically allow full execution rights.
- Enable UFMT on all files in PyTorch: This issue addresses the need to apply uniform formatting (UFMT) to approximately 1,500 files in the PyTorch codebase that are currently exempt from this formatting standard. The process involves removing file names from the `exclude_patterns` in the `UFMT` section of the `.lintrunner.toml` file and running a specific command to ensure all files adhere to the desired formatting, with additional preparatory work required to resolve known issues in certain files before applying the UFMT changes.
- [JIT archive] Add a flag to not include debug files: This issue proposes the addition of a flag to the `torch.jit.save()` function in PyTorch to exclude `.debug_pkl` files, which are primarily used for debugging purposes and can significantly increase the file size of TorchScript models compared to ONNX models. The motivation behind this feature request is to reduce the size of JIT archives, particularly for small models with quantization, to facilitate more efficient deployment on mobile devices by eliminating unnecessary debug files that do not affect the model's functionality.
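For the MaxPool2D item above, a quick numerical check of the proposed decomposition (input shape chosen arbitrarily):

```python
import torch
import torch.nn.functional as F

# With stride 1, the max over a 5-wide window equals the max of overlapping
# 3-wide window maxima, so one 5x5 pool matches two chained 3x3 pools.
x = torch.randn(1, 1, 16, 16)
direct = F.max_pool2d(x, kernel_size=5, stride=1)
chained = F.max_pool2d(F.max_pool2d(x, kernel_size=3, stride=1),
                       kernel_size=3, stride=1)
assert torch.equal(direct, chained)  # identical values and output shape
```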
2.3 Open Issues
This section lists, groups, and then summarizes issues that were created within the last week in the repository.
Issues Opened This Week: 110
Summarized Issues:
- Inconsistent behavior in PyTorch functions: This issue highlights the inconsistent behavior and precision discrepancies in PyTorch's `sum`, `dot`, and `norm` functions when handling large `float32` arrays. It notes that `torch.sum` is the most precise, while `torch.linalg.norm` is slower and less accurate, and seeks guidance on achieving consistent results across different versions and hardware configurations.
- ONNX export failures and feature requests: Several issues discuss failures and feature requests related to ONNX export in PyTorch. These include the failure of ONNX export with `enabled_gqa` and `scaled_dot_product_attention`, the need for a decomposition for the `searchsorted` function, and the unsupported operator 'aten::lift_fresh' in opset version 17.
- Bugs in PyTorch's tensor operations and compilation: Various issues report bugs in PyTorch's tensor operations and compilation processes. These include a bug with the `.t()` method on a tensor subclass, inefficiencies in `__torch_function__` handling, and incorrect memory reads on CUDA devices.
- Feature requests for PyTorch functions: There are requests for new features in PyTorch functions, such as an option to disable gradient caching in `torch.func.jvp` and enhancements to the `torch.distributed.tensor.debug.visualize_sharding` function.
- Bugs and inefficiencies in PyTorch's export and deserialization: Issues highlight bugs in PyTorch's export and deserialization processes, such as incorrect clamping of ShapeEnv range information and the lack of support for certain operations on the MPS backend.
- Performance and profiling issues in PyTorch: Several issues report performance and profiling problems in PyTorch, including inefficiencies in printing SymPy expressions, unexpected profiling results, and slow `torch.bmm` operations with BF16 tensors.
- Bugs in PyTorch's distributed and parallel processing: Issues describe bugs in PyTorch's distributed and parallel processing, such as errors with DTensor on a 2D device mesh and unexpected profiling results with `torch.add`.
- Documentation and implementation discrepancies in PyTorch: Several issues highlight discrepancies between PyTorch's documentation and implementation, such as the `CosineAnnealingLR` scheduler formula and the `torch.bernoulli()` function signature; a short sketch of the documented schedule follows this list.
- Bugs in PyTorch's autograd and gradient computation: Issues report bugs in PyTorch's autograd and gradient computation, such as memory leaks during backward passes and incorrect gradient results with `torch.log1p`.
- Bugs in PyTorch's compilation and execution: Various issues describe bugs in PyTorch's compilation and execution, such as segmentation faults with `torch.fliplr` and incorrect iterator behavior in `torch.compile`.
- Bugs in PyTorch's memory management and usage: Issues highlight problems with PyTorch's memory management, such as unexpected peak memory usage with FSDP and out-of-memory errors during training.
- Bugs in PyTorch's sharding and distributed operations: Issues report bugs in PyTorch's sharding and distributed operations, such as the lack of a sharding strategy for `aten.masked_fill_.Scalar` and inefficiencies in NCCL backend operations.
- Bugs in PyTorch's function implementations: Various issues describe bugs in PyTorch's function implementations, such as incorrect behavior with `torch.flipud` and runtime errors with `torch.compile`.
- Bugs in PyTorch's testing and CI processes: Issues highlight bugs in PyTorch's testing and CI processes, such as segmentation faults during builds and disabled tests on ROCm platforms.
- Bugs in PyTorch's attention and sharding mechanisms: Issues report bugs in PyTorch's attention and sharding mechanisms, such as discrepancies in attention module outputs and the lack of support for scaled dot product attention.
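For the `CosineAnnealingLR` discrepancy noted above, a short sketch comparing the scheduler against the closed-form formula from its documentation (the hyperparameters are arbitrary):

```python
import math
import torch

# The documented closed form is
#   eta_t = eta_min + (eta_max - eta_min) * (1 + cos(pi * t / T_max)) / 2;
# the issue concerns how this relates to the recursive implementation.
param = torch.zeros(1, requires_grad=True)
opt = torch.optim.SGD([param], lr=0.1)
sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=10, eta_min=0.001)

for t in range(10):
    closed_form = 0.001 + (0.1 - 0.001) * (1 + math.cos(math.pi * t / 10)) / 2
    print(t, sched.get_last_lr()[0], closed_form)
    opt.step()
    sched.step()
```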
2.4 Closed Issues
This section lists, groups, and then summarizes issues that were closed within the last week in the repository. This section also links the associated pull requests if applicable.
Issues Closed This Week: 65
Summarized Issues:
- Internal Compiler Errors and Segmentation Faults: This topic covers issues related to internal compiler errors and segmentation faults encountered during PyTorch compilation or execution. These errors often arise due to incompatible compiler versions or incorrect environment configurations, leading to unexpected crashes or runtime errors.
- Documentation and Usability Concerns: Several issues highlight the need for improved documentation and usability in PyTorch. These include clarifying function arguments, ensuring accurate descriptions of supported features, and addressing discrepancies that lead to user confusion or errors during implementation.
- Performance and Optimization Issues: Performance regressions and optimization challenges are common in PyTorch, affecting operations like Conv2D and tensor conversions. These issues often require detailed profiling and adjustments to achieve desired efficiency across different hardware platforms.
- Export and Compatibility Problems: Exporting models to ONNX or other formats can encounter compatibility issues, particularly with unsupported operators or dynamic dimensions. These problems necessitate updates to the export process or additional support for specific operations to ensure successful model deployment.
- Test Failures and Disabling: Certain tests in the PyTorch project are disabled due to failures on specific platforms, such as ROCm, indicating underlying compatibility or configuration issues. These failures require investigation and resolution to ensure robust testing across all supported environments.
- pytorch/pytorch/issues/151078, pytorch/pytorch/issues/151081, pytorch/pytorch/issues/151082, pytorch/pytorch/issues/151083, pytorch/pytorch/issues/151084, pytorch/pytorch/issues/151085, pytorch/pytorch/issues/151086, pytorch/pytorch/issues/151087, pytorch/pytorch/issues/151088, pytorch/pytorch/issues/151089, pytorch/pytorch/issues/151090
- Bugs in PyTorch Operations: Various bugs in PyTorch operations, such as incorrect results or runtime errors, are reported across different functions and backends. These issues often require fixes in the underlying implementation to ensure correct and reliable behavior.
- pytorch/pytorch/issues/150674, pytorch/pytorch/issues/150776, pytorch/pytorch/issues/150851, pytorch/pytorch/issues/150853, pytorch/pytorch/issues/151522, pytorch/pytorch/issues/151523, pytorch/pytorch/issues/151589, pytorch/pytorch/issues/151610, pytorch/pytorch/issues/151735, pytorch/pytorch/issues/152205
- Security and Vulnerability Concerns: Security vulnerabilities, such as potential remote code execution, are critical issues that require prompt attention and fixes. These vulnerabilities highlight the importance of secure coding practices and thorough testing to prevent exploitation.
- Dependency and Installation Issues: Conflicts and errors during installation, often due to dependency mismatches or incorrect configurations, can hinder the setup of PyTorch environments. Resolving these issues typically involves updating or aligning package versions to ensure compatibility.
- GPU and Hardware Compatibility: Compatibility issues with specific GPUs or hardware configurations can lead to runtime errors or performance bottlenecks. These issues often require updates to software or drivers to ensure proper support for the latest hardware capabilities.
2.5 Issue Discussion Insights
This section will analyze the tone and sentiment of discussions within this project's open and closed issues that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.
Based on our analysis, there are no instances of toxic discussions in the project's open or closed issues from the past week.
III. Pull Requests
3.1 Open Pull Requests
This section provides a summary of pull requests that were opened in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.
Pull Requests Opened This Week: 192
Key Open Pull Requests
1. Add scripts to check xrefs and urls: This pull request introduces scripts designed to traverse the documentation and code within the PyTorch project to identify and address any broken cross-references and URLs, as evidenced by multiple updates and renaming of scripts such as `check_xrefs.sh` to `lint_xrefs.sh` and `check_urls.sh` to `lint_urls.sh`, along with modifications to the `_docs.yml` file.
- URL: pull/151844
- Merged: No
- Associated Commits: 1d738, dcd6e, 72d59, 9605a, 2a07f, a37df, 150c5, 48c4e, ee06f, 83ed6, 8037b, e963c, 0a041, 90093, a0cbb, 8f72f, 8ad07, 55b02, 4edf5
2. [WIP] Deprecate AcceleratorHooksInterface isPinnedPtr, use at::getHostAllocator()->is_pinned instead: This pull request aims to deprecate the `AcceleratorHooksInterface`'s `isPinnedPtr` method in favor of using `at::getHostAllocator()->is_pinned` within the PyTorch project, as part of a series of changes tracked by the ghstack tool.
- URL: pull/151916
- Merged: No
- Associated Commits: 37774, 87fb0, 90a69, 91261, 3b679, 014ed, 1d73c, 0b19f, 51e74, b3260, 03ba4, 82484, b23d0, da02e, a7c30, da742, 0623b, 7217f, e8058
3. [Graph Partition] Pass all cudagraph tree tests: This pull request addresses the issue of passing all cudagraph tree tests in the PyTorch project by implementing various fixes and updates, such as correcting test input and output orders, enabling certain features by default, and supporting additional functionalities like the ForeachKernelSchedulerNode, as evidenced by multiple commits aimed at refining the graph partitioning process.
- URL: pull/152048
- Merged: No
- Associated Commits: 81a80, ed63b, 05880, f6758, 7193e, ff4ae, 0b311, 75b71, 69415, b93f3, 81a39, 35470, 74624
Other Open Pull Requests
- Metal Performance Shaders (MPS) Support in PyTorch: This pull request enhances the PyTorch library by adding support for Metal Performance Shaders (MPS) to the `at::getHostAllocator` API. It is designed to facilitate writing device-agnostic code, although certain functionalities like `record_event`, `get_stats`, `reset_accumulated_stats`, and `reset_peak_stats` are not yet supported for MPS.
- AOTAutogradCache Enhancements: This pull request addresses an issue with the AOTAutogradCache in PyTorch by saving the `bw_module` in the cache after removing unserializable metadata. It ensures that both the lowered backward and the `bw_module` are cached to support runs with and without compiled autograd, while also differentiating cached and non-cached versions to prevent crashes during AOT compilation with a restored `bw_module`.
- Visualization and Sharding in PyTorch: This pull request adds enhanced support for rich visualization to the `torch.distributed.tensor.debug.visualize_sharding` function in PyTorch. It addresses issue #151857 and includes updates to adapt the functionality for execution on systems with at least four GPUs; a usage sketch appears after this list.
- Export Functionality Enhancements: This pull request aims to enhance the export functionality by supporting the export of hops with function schema arguments. It makes the function schema proxyable to trace auto-functionalized hops and simplifies the implementation using `pytree.register_constant`, with plans to add support for serialization and deserialization (serde) in future updates.
- Windows Arm64 Runners Integration: This pull request aims to integrate new Windows Arm64 runners into the PyTorch project by utilizing pre-installed Visual Studio. It removes unnecessary installations, enables long paths, and updates the action configuration to accommodate these changes.
- ROCm Backend Updates: This pull request aims to reland changes from a previous pull request by removing all "MasqueradingAsCUDA" files and classes from the hipify process in the ROCm backend of the PyTorch project. It also updates the hipify version to 2.0.0, reverts certain changes, and addresses deprecation warnings.
- Unimplemented Function Replacement: This pull request involves replacing the `unimplemented` function with `unimplemented_v2` in the `torch/_dynamo/variables/nn_module.py` file as part of a larger task (#147913). It includes multiple commits with contributions from William Wen, addressing various updates and improvements to the code.
- Group GEMM Template and Epilogue Fusion: This pull request addresses enhancements in the PyTorch project by enabling Group GEMM Template and Epilogue Fusion. It removes redundant buffers post-fusion and supports Linear Silu Mul fusion when concatenated linear operations are enabled, as well as enabling horizontal transverse operations.
- Gather Operation Data Type Support: This pull request aims to enhance the PyTorch library by supporting additional data types for both input and indices in the gather operation. It is part of a series of changes tracked through the ghstack tool and involves multiple contributors and reviewers from the community.
- Unimplemented Function Replacement in Lists: This pull request involves replacing the `unimplemented` function with `unimplemented_v2` in the `torch/_dynamo/variables/lists.py` file as part of issue #147913. It includes multiple commits, some of which are co-authored by William Wen.
- Docstring Linter Exemption: This pull request aims to address issue #151692 by exempting overriding methods from the docstring linter in the PyTorch project. It is part of a series of changes managed through the ghstack tool.
- ConvTranspose*d Padding Option: This pull request introduces the `padding="same"` option to the `ConvTranspose*d` and `conv_transpose*d` functions in the PyTorch library. It ensures compatibility with existing convolution layers and includes updates such as feature implementation, testing, documentation, and handling of padding discrepancies; a hedged sketch of the proposed usage appears after this list.
- Dynamic Shape Error Readability: This pull request introduces the `is_exporting()` function and the `_is_dynamo_exporting()` function to aid in determining when to utilize stack traces for enhancing the readability of dynamic shape errors. It is part of upcoming work in the PyTorch project.
- Hiding Unused Scalar Integer Sizes: This pull request aims to hide unused scalar integer sizes from Dynamo in the PyTorch project. It is part of a stack of related changes and addresses issues #113129 and #146168 while involving multiple contributors.
- Dynamic Annotations on Tensors: This pull request aims to enhance the PyTorch project by adding support for dynamic annotations on tensors within ListVariables and TupleVariables. It addresses a specific issue and works in conjunction with another related pull request to resolve a previously identified problem.
- CUDA Script Unification: This pull request aims to unify the `install_cuda` and `install_cuda_aarch64` scripts by generalizing the `install_cuda` script to handle both standard and aarch64 architectures. It eliminates the need for a separate `install_cuda_aarch64` script and consolidates common code into `install_cuda` and `install_cudnn` functions.
- Dynamo Configuration in HOPify: This pull request aims to integrate a dynamo configuration into the HOPify context managers within the PyTorch project. It is part of a stack of changes and includes multiple updates with a focus on adding tests, as indicated by the detailed comments and the involvement of several contributors.
- Embedding Function Index Handling: This pull request addresses issue #151918 by modifying the `aten.embedding` function to prevent it from wrapping negative indices. It includes several commits that add and improve functionalities related to index handling and embedding operations in the PyTorch project.
- Global Tensor Shape Calculation: This pull request introduces a utility function named `compute_global_tensor_shape` for a 1D device mesh. It calculates the global tensor shape by gathering local tensor shapes from shards and constructing the global shape based on the placement type, currently supporting only "Shard" and "Replicate" placements.
- AC_TRACER Infrastructure Mode: This pull request introduces a new infrastructure mode called AC_TRACER in the PyTorch project. It is designed to trace and replay computational graphs during backward passes by prioritizing it above other infra modes, ensuring efficient recomputation without needing to reenable ambient modes or execute user logic in the same manner.
- TritonTemplate Function Refactor: This pull request refactors the `TritonTemplate.generate` function by moving the code generation logic to a new `generate_and_load` function. It is part of a split from a larger pull request and includes updates to typing.
- SymmMem Module Optimization: This pull request introduces a work-in-progress feature called "all_to_all_vdev" in the SymmMem module. It aims to optimize tensor operations by merging input and output splits into a single tensor, utilizing multi-block processing, and employing nvshmemx_collective_launch for efficient communication.
- Dim.AUTO Warning Mechanism: This pull request addresses issue #151582 by implementing a warning mechanism in the PyTorch library to alert users when dimensions specified with `Dim.AUTO` are likely to specialize to 0 or 1. Changes are made in the `dynamic_shapes.py` file.
- XPU Documentation Update: This pull request aims to update the "get start xpu" documentation by revising links, updating product names, and adding a print statement to display the result of `torch.xpu.is_available()` in the code snippet. It removes references to Windows 10 while ensuring compatibility with both Linux and Windows binaries.
- Graph Partitioning Optimization: This pull request introduces an optimal reordering strategy using a breadth-first search (BFS) approach to minimize the number of partitions in a graph. It schedules nodes based on their indegree and categorizes them into cudagraphable and non-cudagraphable queues, considering peak memory usage for efficient execution.
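For the `visualize_sharding` pull request above, a minimal usage sketch of the existing API, assuming a `torchrun` launch with four GPUs (the tensor and header are illustrative):

```python
import torch
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor import distribute_tensor, Shard
from torch.distributed.tensor.debug import visualize_sharding

# Assumes torchrun --nproc-per-node=4; visualize_sharding prints a layout of
# which rank owns which slice of the DTensor (the PR adds rich output for it).
mesh = init_device_mesh("cuda", (4,))
dt = distribute_tensor(torch.randn(8, 8), mesh, placements=[Shard(0)])
visualize_sharding(dt, header="row-sharded 8x8 DTensor")
```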
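For the `ConvTranspose*d` padding item, note that `padding="same"` is only proposed in that pull request and not part of released PyTorch; a sketch of a stride-1 equivalent that works today:

```python
import torch
import torch.nn as nn

# Proposed in the PR (not yet available):
#   nn.ConvTranspose2d(8, 8, kernel_size=3, padding="same")
# For stride=1 and odd kernels, the same output size is obtained today with
# padding = (kernel_size - 1) // 2:
x = torch.randn(1, 8, 32, 32)
layer = nn.ConvTranspose2d(8, 8, kernel_size=3, padding=1)
assert layer(x).shape == x.shape  # output spatial size matches the input
```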
3.2 Closed Pull Requests
This section provides a summary of pull requests that were closed in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.
Pull Requests Closed This Week: 235
Key Closed Pull Requests
1. Generate test reports for pytest when option is given: This pull request introduces a feature to generate test reports for pytest when a specific option is provided, by appending an argument to enable test report generation, and suggests checking the `TEST_SAVE_XML` environment variable instead of `IS_CI` to conditionally enable test reports, aligning with practices in other parts of the codebase.
- URL: pull/152167
- Merged: No
- Associated Commits: f20a2, 1b267, 7ffa9, 88b05, b0f26, a6e46, 6e7b6, b74be, 02dd0, 56d31, bd77c, 97d97, fc7d4, 704a5, 313ce, cac8d, e4818, 359e1, adf5f, cfc4d, 843e4, 6261d, 414ce, 2673e, f6c1c, 92d0c, 8e5fe, 92bae, 1e1d0, 68f74, 483e6, ed511, 9b74e, c4482, 48761, a40e8, 6b45b, c3a72, 47013, fc2dd, 9c2ac, 8eb21, f7ddc, 2a9af, bf28d, 2eacd, 515a0, 33808, 93740, cea43, 28799, 67c28, 0f861, e2b1c, f37e1, fd04c, d7914, 96800, b7c70, 2fb13, 1a6ef, 191b0, 02cec, 1f0d7, 35201, 25a11, c312d, cd131, 6ea2e, 79a94, efdcc, 01f1c, 4d78e, 99aee, a35e7, b3b16, 80a38, a02ea, 40cf4, a4fda, 14e3f, b7a77, edba2, 529f6, 29811, 6f327, 95abc, 0ff30, e76c0, 4a643, 28388, a09a3, dfdf7, 73d95, ccd00, 3aeeb, 159e2, d778c, c729f, ed0d2, f072b, 45049, 3804a, 5d316, 5fc1e, 06a3c, 264e8, 2c275, 834a0, 4bf09, fa0f1, 98206, fbd29, 337ca, 69ee6, 6cd17, 8ca79, d0d4e, 0bb9b, 7e4b8, 59629, a48cc, 3380a, 2f74c, aaf71, 459c6, bc6c0, 83541, 017a6, e05ac, b8f4d, 3aecf, 2f851, c0b70, 43de9, 6a1b8, a7ccd, f4ac9, c9834, 4f8ad, cd576, aa617, 334aa, 72f71, 49b7f, 015b5, 68a75, 74074, 13339, cd021, 25305, 78bbb, f9bdf, cc793, b37fa, 54f73, b247e, 097fa, 7c977, ee81f, 62b56, 5b9df, 6d28d, b32b0, 21b0e, 73100, dcc32, e31e2, 05114, a5602, 9422e, 2ab75, 34827, 9344d, 5f637, aa285, 3c1a1, 5acc3, 69e41, 99ae7, c1f51, 98c53, 56232, dccb7, bd191, 47ad3, fd3d3, 4d2d8, 81723, f2cfe, 2455d, 4e1d4, f39a1, c91ac, 4ac2e, d703f, 2ee8d, e2cf6, 43f1b, 5de92, fabbc, 05597, 89a85, b2372, 68454, 76cc3, 2a58d, 2102b, a3898, 5e9bd, 2ea86, 78953, 5b368, 5e320, 3278d, 41285, 1d73b, d743a, 3a170, 56e67, 0eb55, 9c1bc, 402d1, 03970, 81c43, ff075, b11c9, b1d05, 6efc5, 24bda, 92f12, bd09d, dccc4, 8a9c6, d78d2, 6ced5, 04133, fc6e3, 2089b, d7049, 75c71, 8313b, 7f28c, 1a6d5, e2c7a, dda0c, a936d, 6120c
2. [rocm6.4_internal_testing] Dockerfile swap: This pull request involves swapping the contents of the CentOS Stream Dockerfile into the main Dockerfile for the ROCm 6.4 internal testing environment, as part of a series of updates and optimizations to support various builds and tests, including CentOS Stream 9 and Ubuntu 24.04, while addressing specific issues and enhancing compatibility with PyTorch and ROCm features.
- URL: pull/151927
- Merged: No
- Associated Commits: 51ce1, b966e, 0e96f, ec70f, e85cf, a4b50, 4ce57, d4fbf, 090d9, 2b0f3, 7d339, d6879, 6d5c3, a6d96, 4a42d, 8c393, d7265, 6f76e, 046a0, fc66f, c8ae4, dfcad, 17c65, e3810, 956c1, 7a198, 74e1e, 0ce9f, 99b07, e24ef, 3d0ad, 59e14, d3c94, 52172, d4d0b, 15a21, 432b2, f7ad5, 46868, 2e486, 8b86e, 057e9, 80b4c, 26585, d5d9d, e9712, 398bd, 896c7, 14c14, 6e971, 0b21f, 22512, fb4d1, b273a, 0e216, a7d21, 9f390, cfb67, e8629, d9668, f24c4, 93895, fddb7, c873a, 9c50b, 79c54, 5f50c, bc969, 8190c, fc899, 402de, ef297, a8d55, e03df
3. [WIP][CUDA][cuBLAS][cuBLASLt] Opt-in unified cuBLAS + cuBLASLt workspaces: This pull request introduces an opt-in feature for unified workspaces between cuBLAS and cuBLASLt in PyTorch, addressing a previously reported 70% forward issue by allowing users to enable the feature with the `TORCH_CUBLASLT_UNIFIED_WORKSPACE=1` environment variable, and includes multiple commits for updates and fixes in the CUDABlas.cpp file and other related components.
- URL: pull/151163
- Merged: No
- Associated Commits: ad7e3, 34e4c, 2afd7, 6af3d, cebef, 2a746, 296e6, e097f, 38f37, cf18f, bcce8, ed4cb, 1c107, 63eea, 400ea, 05ab7, 666bc
Other Closed Pull Requests
- Device Tests in FlexAttention Module: This pull request focuses on fixing the instantiation of device tests in the FlexAttention module of the PyTorch project. It includes a series of updates and commits aimed at addressing the issue effectively.
- Docker Image Testing Workflow: This pull request modifies the workflow for testing binary Docker images by fetching correct tags from AWS ECR and implementing reusable actions. It also addresses access issues for certain architectures and ensures Docker images are rebuilt when relevant scripts change.
- Torch.Event Function Signature and Documentation: Two pull requests address the `torch.Event` function signature and its documentation. One focuses on the discrepancy in the `enable_timing` parameter, while the other enhances documentation by adding detailed signatures and correcting display issues.
- Caching Mechanism for Fake Tensors: Multiple pull requests aim to enhance the PyTorch project by implementing caching mechanisms for fake tensors. These changes are part of a series tracked through the ghstack tool, focusing on scenarios with None, integer, and symbolic integers in the output.
- Dynamic Shapes and Built-in Operations: Two pull requests focus on enhancing dynamic shapes and handling built-in operations in PyTorch. One uses the `bound_sympy` utility for size-oblivious reasoning, while the other reestablishes infrastructure for operations like `min`, `max`, and `math.pow`.
- MPSInductor and GPU Test Enhancements: Two pull requests address issues in MPSInductor and GPU tests. One implements the `atomic_add` store mode, resolving several GPU test issues, while the other enables a specific test by modifying data types.
- Deprecation of Host Allocator Legacy API: This pull request aims to deprecate the host allocator legacy API in favor of a unified API, `getHostAllocator(device_type)`. It streamlines memory allocation processes and improves the user experience by providing a consistent interface.
- Enhancements in Guard Checking and Static Value Functions: Two pull requests focus on enhancing guard checking logic and introducing static value functions. One refactors guard checking logic, while the other introduces `has_static_value` to determine static boolean, float, or integer values.
- Enhancements in Dynamic Sources Allowlist and Testing Utilities: Two pull requests enhance the dynamic sources allowlist and relocate operation modifiers to testing utilities. These changes facilitate capturing allowlist changes over time and reusing operation modifiers in other tests.
- Token::text_view() Method Introduction: This pull request introduces a new method, `Token::text_view()`, which returns a `string_view` instead of a `string`. It aims to avoid potential lifetime issues in existing code and includes multiple updates and rebases.
3.3 Pull Request Discussion Insights
This section will analyze the tone and sentiment of discussions within this project's open and closed pull requests that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.
-
- Toxicity Score: 0.55 (Escalating frustration, defensive responses, unresolved tension.)
- This GitHub conversation involves username1 expressing dissatisfaction with the progress of a pull request, while username2 responds with a defensive tone. The conversation escalates as username1 continues to express frustration, leading to a tense exchange. Username3 attempts to mediate by suggesting a compromise, but the initial tension remains unresolved.
- Update description for `torch.random.fork_rng`
- Toxicity Score: 0.55 (Defensive responses, critical feedback, escalating tension.)
- This GitHub conversation involves username1 proposing an update, with username2 providing feedback that is initially neutral but becomes increasingly critical. Username1 responds defensively, leading to a tense exchange. The tone shifts from collaborative to confrontational, with both parties expressing frustration.
-
- Toxicity Score: 0.55 (Defensive responses, unresolved tension, mediation attempts.)
- This GitHub conversation involves username1 expressing dissatisfaction with username2's approach, which is perceived as ineffective. Username2 responds defensively, leading to a tense exchange. Username3 attempts to mediate, but the conversation remains strained, with underlying frustration evident.
IV. Contributors
4.1 Contributors
Active Contributors:
We consider an active contributor in this project to be any contributor who has made at least 1 commit, opened at least 1 issue, created at least 1 pull request, or made more than 2 comments in the last month.
If there are more than 10 active contributors, the list is truncated to the top 10 based on contribution metrics for better clarity.
| Contributor | Commits | Pull Requests | Issues | Comments |
| --- | --- | --- | --- | --- |
| malfet | 192 | 24 | 6 | 145 |
| FFFrog | 140 | 10 | 0 | 8 |
| anijain2305 | 122 | 17 | 1 | 9 |
| mlazos | 131 | 13 | 0 | 5 |
| pianpwk | 100 | 24 | 2 | 21 |
| swolchok | 108 | 17 | 0 | 16 |
| guangyey | 103 | 8 | 0 | 18 |
| justinchuby | 52 | 4 | 6 | 66 |
| laithsakka | 66 | 19 | 5 | 29 |
| guilhermeleobas | 100 | 13 | 1 | 1 |