Weekly Project News


Weekly GitHub Report for PyTorch: April 07, 2025 - April 14, 2025 (14:16:22)

Weekly GitHub Report for PyTorch

Thank you for subscribing to our weekly newsletter! Each week, we deliver a comprehensive summary of your GitHub project's latest activity right to your inbox, including an overview of your project's issues, pull requests, contributors, and commit activity.


Table of Contents

  • I. News
    • 1.1. Recent Version Releases
    • 1.2. Version Information
  • II. Issues
    • 2.1. Top 5 Active Issues
    • 2.2. Top 5 Stale Issues
    • 2.3. Open Issues
    • 2.4. Closed Issues
    • 2.5. Issue Discussion Insights
  • III. Pull Requests
    • 3.1. Open Pull Requests
    • 3.2. Closed Pull Requests
    • 3.3. Pull Request Discussion Insights
  • IV. Contributors
    • 4.1. Contributors

I. News

1.1 Recent Version Releases:

The current version of this repository is v2.6.0.

1.2 Version Information:

Released on January 29, 2025, PyTorch 2.6 introduces significant updates, including support for torch.compile with Python 3.13, the new performance-related API torch.compiler.set_stance, and FP16 support on x86 CPUs. Notably, the release marks a shift away from publishing on Conda in favor of Manylinux 2.28 Linux binaries, and introduces a backward-incompatible change by making weights_only=True the default for torch.load.
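
The release notes above mention two user-visible changes that are easy to illustrate. The following is a minimal sketch (not taken from the release notes) of how torch.compiler.set_stance and the new torch.load default typically appear in user code; the checkpoint path is a placeholder.

```python
import torch

# torch.compiler.set_stance switches compilation behavior at runtime, e.g.
# forcing eager execution without removing torch.compile from the code.
torch.compiler.set_stance("force_eager")

@torch.compile
def f(x):
    return x.sin() + x.cos()

f(torch.randn(8))                      # runs eagerly under the "force_eager" stance
torch.compiler.set_stance("default")   # restore normal compilation
f(torch.randn(8))                      # compiled as usual

# torch.load now defaults to weights_only=True; checkpoints containing arbitrary
# pickled objects require opting out explicitly (placeholder path below).
state = torch.load("checkpoint.pt")                      # weights_only=True by default
full = torch.load("checkpoint.pt", weights_only=False)   # pre-2.6 behavior, less safe
```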

II. Issues

2.1 Top 5 Active Issues:

We consider active issues to be issues that have been commented on most frequently within the last week. Bot comments are omitted.

  1. RMS norm causes NaNs when used with torch.compile + float8 with rowwise scales: This issue reports a bug where using RMS norm together with torch.compile and float8 training with rowwise scales produces NaNs during training. The regression appears to have been introduced between version 2.6.0 and the current development version, causing the training loss to stop decreasing and eventually become NaN after approximately 40 steps.

    • The comments discuss the identification and isolation of the issue, including attempts to reproduce it, differences in kernel operations, and potential causes related to precision casting and activation checkpointing. Various debugging steps are shared, such as using environment variables to mitigate the issue, and the discussion includes detailed analysis of the computational graphs and kernel differences. The conversation also explores potential solutions and further debugging strategies, with contributors collaborating to narrow down the root cause.
    • Number of comments this week: 26
  2. [ONNX] How to export Llama4: This issue involves a user attempting to export the Llama 4 Scout model to ONNX format, encountering a RuntimeError due to unsupported input types, specifically a DynamicCache, when using transformers versions higher than 4.44.2. The user provides a detailed traceback and code snippet to reproduce the error, seeking assistance in resolving the issue with the ONNX export process.

    • The comments discuss various troubleshooting steps, including using torch.onnx.export with dynamo=True and strict=False, and updating to the latest torch-nightly version (see the export sketch after this list). The user reports that the export process hangs and encounters data-dependent issues, particularly with mixed precision frameworks. Suggestions include updating onnxscript and using the latest torch-nightly, but the user still faces errors related to topological sorting and mixed precision, prompting further discussion on alternative approaches and the user's specific requirements for exporting to MLIR.
    • Number of comments this week: 15
  3. torch.compile cannot handle plain nn.Parameter subclasses: This issue highlights a problem with the torch.compile function in PyTorch, which fails to handle plain nn.Parameter subclasses, such as those used in vLLM, due to the absence of __torch_function__ or __torch_dispatch__ methods. The error encountered is an InternalTorchDynamoError when attempting to compile a model using these subclasses, indicating a need for a workaround or fix to support these parameter types.

    • The comments discuss a workaround involving adding a dummy __torch_function__ implementation (sketched after this list), note that the issue did not occur in PyTorch 2.6, and mention a related pull request that might have caused the regression. There is a discussion about the error persisting in the latest nightly build, with suggestions to use torch.compile(fullgraph=False) as a temporary solution. A user shares their experience with the issue, and a suggestion is made to modify the example code to avoid setting attributes on a type object, which allows compilation with fullgraph=True.
    • Number of comments this week: 11
  4. torch.export and torch.compile in torch 2.7 RC fails some cases that work with torch 2.6: This issue highlights a regression in PyTorch 2.7 RC where torch.export and torch.compile fail in certain cases that previously worked in version 2.6. The problem seems to be related to unexpected types in the code, and a temporary workaround involves using strict=False in torch.export.

    • The comments discuss the regression being related to torch.dynamo, with users experiencing similar issues in version 2.7 that did not occur in 2.6. A recommended workaround is to use strict=False in torch.export (see the sketch after this list), although this is not the default in 2.7, and there is a suggestion to include this information in the release notes. There is also a mention of the issue affecting torch.compile, and a contributor plans to investigate the Dynamo issue further.
    • Number of comments this week: 9
  5. NCCL init hits CUDA failure 'invalid argument' on 12.2 driver: This issue reports a bug encountered when using the PyTorch nightly build with a CUDA 12.2 driver, where the NCCL initialization fails due to a 'CUDA invalid argument' error. The problem is observed when running a distributed all-reduce operation on a machine with NVIDIA H100 GPUs, and it appears to be resolved when using a CUDA 12.4 driver or higher.

    • The comments discuss potential causes and solutions, including requests for detailed debug logs and suggestions to ensure proper device initialization order. It is noted that the issue occurs on different nodes with different driver versions, and no clear solution is identified from the logs provided.
    • Number of comments this week: 7
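
For the ONNX export issue (item 2), here is a minimal sketch of the dynamo-based export call discussed in the comments. The small Sequential model stands in for the actual Llama 4 Scout setup, model.onnx is a placeholder path, and the onnxscript package is assumed to be installed.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in model; the actual report exports Llama 4 Scout from
# transformers, whose DynamicCache inputs trigger the RuntimeError.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 8)).eval()
example_input = torch.randn(1, 16)

# The dynamo-based exporter discussed in the comments: dynamo=True routes the
# export through torch.export rather than the legacy TorchScript tracer.
torch.onnx.export(
    model,
    (example_input,),
    "model.onnx",   # placeholder output path
    dynamo=True,
)
```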
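
For the nn.Parameter subclass issue (item 3), here is a minimal sketch of the pass-through __torch_function__ workaround mentioned in the comments. The class and model names are hypothetical, and whether this fully avoids the reported InternalTorchDynamoError may depend on the PyTorch build.

```python
import torch
import torch.nn as nn

# Hypothetical Parameter subclass with the workaround applied: a pass-through
# __torch_function__ gives Dynamo a defined protocol to trace through.
class PatchedParameter(nn.Parameter):
    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        if kwargs is None:
            kwargs = {}
        return super().__torch_function__(func, types, args, kwargs)

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.w = PatchedParameter(torch.randn(4, 4))

    def forward(self, x):
        return x @ self.w

compiled = torch.compile(Model(), fullgraph=True)
print(compiled(torch.randn(2, 4)).shape)
```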
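
For the torch.export regression (item 4), here is a sketch of the strict=False workaround; the module below is a trivial placeholder rather than one of the actually failing cases.

```python
import torch

class M(torch.nn.Module):
    def forward(self, x):
        return torch.nn.functional.relu(x) * 2

example_args = (torch.randn(3, 4),)

# Workaround noted in the comments: retry the export in non-strict mode, which
# is not the default in 2.7 and skips some Dynamo-enforced checks.
exported = torch.export.export(M(), example_args, strict=False)
print(exported)
```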

2.2 Top 5 Stale Issues:

We consider stale issues to be issues that have had no activity within the last 30 days. The team should work together to get these issues resolved and closed as soon as possible.

  1. ImportError: cannot import name 'triton_key' from 'triton.compiler.compiler': This issue involves an ImportError encountered when attempting to import 'triton_key' from 'triton.compiler.compiler', which is causing a backend compiler failure in a PyTorch environment using the 'inductor' backend. The problem arises during the execution of a Python script that utilizes the OotdPipeline with specific configurations, and it has been open for over a year without resolution, indicating a potentially complex or overlooked problem in the integration of Triton with PyTorch.
  2. Alternate algorithm for computing MaxPool2D under specific condition.: This issue proposes an alternative algorithm for computing the MaxPool2D operation in PyTorch when the stride is equal to 1, suggesting that a kernel size of 5 can be represented by two MaxPool2D operations with a kernel size of 3, and similarly for larger kernel sizes (a verification sketch follows this list). The motivation behind this approach is to reduce computational costs on the CPU by modifying the MaxPool2D layer directly, as demonstrated by testing code that shows a significant speedup in execution time compared to the traditional method.
  3. cuda_utils.so: failed to map segment from shared object: This issue describes a bug encountered when running a PyTorch model within a Docker container, where the execution of a cached shared object file, cuda_utils.so, fails due to a missing execution permission despite being run as the root user. The problem arises specifically in a Docker environment with a tmpfs permission set to 1777, causing the shared object file in /tmp to lack the necessary execution bit, leading to an ImportError during the model's execution.
  4. Enable UFMT on all files in PyTorch: This issue addresses the need to apply uniform formatting (UFMT) to approximately 1,500 files in the PyTorch codebase that are currently exempt from this formatting standard. The process involves removing file names from the exclude_patterns in the UFMT section of the .lintrunner.toml file and running a specific command to ensure all files adhere to the desired formatting, with additional preparatory work required to resolve known issues in certain files before applying the UFMT changes.
  5. [JIT archive] Add a flag to not include debug files: This issue proposes the addition of a flag to the torch.jit.save() function in PyTorch to exclude .debug_pkl files, which are primarily used for debugging purposes and can significantly increase the file size of TorchScript models compared to ONNX models. The motivation behind this feature request is to reduce the file size of models, particularly for deployment on mobile devices, by eliminating unnecessary debug files, as demonstrated by the user's experience where removing these files manually resulted in a substantial reduction in file size without affecting model functionality.
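
A quick sketch verifying the equivalence claimed in the MaxPool2D proposal above (stale issue 2): with stride 1, two chained kernel-size-3 max pools compute the same result as a single kernel-size-5 pool.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)

# Single 5x5 max pool, stride 1.
direct = nn.MaxPool2d(kernel_size=5, stride=1)(x)

# Two chained 3x3 max pools, stride 1: the composed window is 5x5, and the
# max of overlapping 3x3 maxima equals the max over the full 5x5 window.
pool3 = nn.MaxPool2d(kernel_size=3, stride=1)
chained = pool3(pool3(x))

print(torch.equal(direct, chained))  # expected: True
```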

2.3 Open Issues

This section lists, groups, and then summarizes issues that were created within the last week in the repository.

Issues Opened This Week: 92

Summarized Issues:

  • PyTorch Export and ONNX Compatibility Issues: Several issues highlight challenges in exporting models to ONNX format using PyTorch, often due to missing operators or compatibility problems with specific PyTorch versions. These issues suggest a need for improved support and testing for ONNX export functionality in PyTorch, particularly when using advanced features like dynamo=True or dealing with dynamic shapes.
    • issues/150823, issues/150842, issues/150891, issues/150986, issues/151016, issues/151017
  • Performance and Regression Concerns in PyTorch: Multiple issues report performance regressions and inefficiencies in PyTorch, particularly after version upgrades or when using specific features like torch.compile. These issues highlight the need for thorough performance testing and optimization to ensure that new releases do not introduce significant slowdowns or regressions.
    • issues/150832, issues/150961, issues/151037, issues/151039, issues/151043
  • Bugs in PyTorch's Distributed and Sharding Features: Several issues describe bugs in PyTorch's distributed training and sharding features, such as errors during gradient computation or process group management. These issues suggest a need for improved error handling and robustness in distributed training scenarios to prevent crashes and ensure smooth operation.
    • issues/150799, issues/150928, issues/151030, issues/151159
  • Documentation and Usability Improvements in PyTorch: Various issues highlight discrepancies and areas for improvement in PyTorch's documentation, such as unclear parameter requirements or missing information. These issues emphasize the importance of accurate and comprehensive documentation to aid users in effectively utilizing PyTorch's features.
    • issues/150873, issues/150917, issues/151101, issues/151103, issues/151104, issues/151105
  • Bugs and Errors in PyTorch's Core Functions: Several issues report bugs in core PyTorch functions, such as incorrect error handling or unexpected behavior during tensor operations. These issues highlight the need for rigorous testing and validation of PyTorch's core functionalities to ensure reliability and correctness.
    • issues/150776, issues/150835, issues/150836, issues/150851, issues/150883, issues/151106

2.4 Closed Issues

This section lists, groups, and then summarizes issues that were closed within the last week in the repository. This section also links the associated pull requests if applicable.

Issues Closed This Week: 43

Summarized Issues:

  • Bugs in PyTorch Tensor Operations: This topic covers various bugs related to tensor operations in PyTorch. Issues include the as_subclass method failing under TorchDispatchMode, inconsistencies in inference results between eager execution and the inductor compiler, and a segmentation fault with torch.matmul() using float16 on CPU. These bugs highlight challenges in maintaining consistent behavior across different execution modes and data types.
    • issues/149290, issues/150165, issues/150637
  • PyTorch Documentation and API Enhancements: Several issues point to the need for improved documentation and API enhancements in PyTorch. These include the lack of documentation for torch.Tensor.fill_, missing illustrative plots for certain functions, and the need for a utility function to determine the best computing device. Addressing these issues would enhance user understanding and usability of the library.
    • issues/150009, issues/150170, issues/149719
  • Bugs in PyTorch's CUDA and GPU Operations: This topic includes bugs related to CUDA and GPU operations in PyTorch. Issues involve runtime errors with torch.empty_strided() on CUDA devices, CUDA profiling warnings with torch.utils.rename_privateuse1_backend, and a "CUDA error: too many resources requested for launch" during backward passes. These issues highlight the complexity of ensuring compatibility and performance across different hardware configurations.
    • issues/150179, issues/150281, issues/150266
  • Bugs in PyTorch's Distributed and Parallel Computing: Several issues address bugs in PyTorch's distributed and parallel computing features. These include memory leaks in all_gather_object, incorrect global rank assignment with torchrun, and the need for a torch.distributed.get_local_rank() function. These issues underscore the challenges in managing resources and ensuring correct behavior in distributed environments.
    • issues/150798, issues/150660, issues/151143
  • Bugs and Enhancements in PyTorch's Compilation and Optimization: This topic covers issues related to PyTorch's compilation and optimization processes. Problems include a "Fatal Python error" with torch.compile(), memory leaks in diffuser pipelines, and the need for optimization strategies for tensor operations. These issues highlight the ongoing efforts to improve performance and stability in compiled and optimized code paths.
    • issues/150757, issues/150708, issues/151139
  • Bugs in PyTorch's Data Type Handling: Several issues pertain to bugs in handling data types in PyTorch. These include incorrect results with torch.vdot() on complex data types, dtype incompatibility in torch.bartlett_window, and attribute loss with the .type() method. These issues emphasize the importance of robust data type management in scientific computing libraries.
    • issues/150366, issues/150616, issues/150618
  • Bugs in PyTorch's Build and Installation Processes: This topic includes issues related to building and installing PyTorch. Problems include build failures with GCC 12, hash mismatch errors during installation, and compatibility issues with the optree dependency. These issues highlight the challenges in maintaining a smooth build and installation experience across different environments and configurations.
    • issues/150846, issues/150945, issues/150889

2.5 Issue Discussion Insights

This section will analyze the tone and sentiment of discussions within this project's open and closed issues that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.

Based on our analysis, there are no instances of toxic discussions in the project's open or closed issues from the past week.


III. Pull Requests

3.1 Open Pull Requests

This section provides a summary of pull requests that were opened in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.

Pull Requests Opened This Week: 157

Key Open Pull Requests

1. [WIP][CUDA][cuBLAS][cuBLASLt] Opt-in unified cuBLAS + cuBLASLt workspaces: This pull request introduces an opt-in feature for unified cuBLAS and cuBLASLt workspaces in PyTorch, addressing a previous issue related to a 70% forward performance problem, and includes multiple commits such as checking in the unified workspace, disabling certain precision reductions, and updating various files like CUDABlas.cpp and common.py.

  • URL: pull/151163
  • Merged: No
  • Associated Commits: ad7e3, 34e4c, 2afd7, 6af3d, cebef, 2a746, 296e6, e097f, 38f37, cf18f, bcce8, ed4cb, 1c107, 63eea, 400ea, 05ab7, 666bc

2. Propagate callable parameter types using ParamSpec (#142306): This pull request aims to enhance the PyTorch codebase by propagating callable parameter types using ParamSpec, addressing partial issues from a previous task (#142306), and includes various commits that reorder function parameters, propagate arguments and keyword arguments for nested wrappers, adjust return types, import ParamSpec from typing_extensions for compatibility, and make several other modifications to satisfy type-checking and linting requirements.

  • URL: pull/151014
  • Merged: No
  • Associated Commits: b83b6, f7218, ce461, fccfb, 5af2d, 9bdb2, 7a6f6, f36a4, 0a002, ebef9, 42666, 27d84, 65c28, 81d59, ef2dc, 003b8

3. [ROCm][TunableOp] Support submatrices in offline tuning: This pull request introduces support for submatrices in offline tuning for General Matrix Multiply (GEMM) and ScaledGEMM operations within the ROCm TunableOp framework, including updates to unit tests, refactoring, and bug fixes to enhance functionality and consistency.

  • URL: pull/151138
  • Merged: No
  • Associated Commits: d88f2, 886e4, 51efa, 875bb, 1946a, ec028, be531, caa5e, dd9af, 3b1c6, c65db, c02e5, fba51

Other Open Pull Requests

  • EVT Implementation in Cutlass Project: This series of pull requests focuses on implementing EVT within the Cutlass project, involving example tensor creation, Python code generation for the epilogue visitor, and modifications to the GEMM template. These updates include multiple contributions and collaborations, indicating a comprehensive effort to enhance EVT functionalities.
    • pull/150904, pull/150905, pull/150907
  • CUDACPPScheduling Integration: The pull request aims to integrate EVT into CUDACPPScheduling by allowing epilogue nodes in CUDA combined scheduling. This is part of a series of updates in the PyTorch project to enhance scheduling capabilities.
    • pull/150906
  • Incorrect Typing in Inductor Module: These pull requests address incorrect typing issues in the cuda_kernel and cuda_template.py components of the Inductor module within the PyTorch project. The updates involve collaboration with several contributors to ensure accurate typing and functionality.
    • pull/150908, pull/150909
  • Experimental Changes in PyTorch: This pull request contains a series of non-mergeable, experimental changes labeled as "throwaway" for testing purposes. The repeated "[DO NOT MERGE]" tag indicates its temporary nature within the PyTorch project.
    • pull/150910
  • Epilogue Argument Emitter for Cutlass: This pull request involves implementing an Epilogue Argument emitter for the Cutlass component in the PyTorch project. It includes multiple updates and contributions from various collaborators as part of a series of related changes.
    • pull/150903
  • Graph Partitioning Optimization: This pull request optimizes graph partitioning in PyTorch by reordering nodes to reduce the number of partitions. It specifically moves nodes with simple dependencies to improve efficiency, as demonstrated in a use case involving a padded tensor subclass.
    • pull/150814
  • DispatchKey.Autograd Rework: The pull request reworks the dispatching mechanism of the DispatchKey.Autograd in PyTorch. It ensures the autograd key is only triggered when necessary, bypassing unnecessary operations for operands that do not require gradients.
    • pull/151107
  • CPython Exception Tests: This pull request introduces a series of tests for CPython exceptions to enhance exception handling robustness in PyTorch. It includes multiple test files as part of a stack of related changes.
    • pull/150789
  • CPython List and Tuple Tests: This pull request adds tests for CPython list and tuple functionalities, including several test files. It is part of a series of related changes tracked by ghstack and is currently not merged.
    • pull/150790
  • CPython Dictionary Tests: This pull request introduces new tests for CPython dictionaries, including various test files. It is part of a stack of related changes and is currently open and unmerged in the PyTorch GitHub repository.
    • pull/150791
  • CPython Set Tests: This pull request introduces new tests for the CPython set implementation in PyTorch. It includes multiple commits with updates marked as "[ghstack-poisoned]" to ensure robustness and correctness.
    • pull/150792
  • CPython String Tests: This pull request introduces new tests for CPython string functionalities by adding several test files. It is part of a series of related updates managed through the ghstack tool.
    • pull/150793
  • CPython Math and Cmath Tests: This pull request introduces new tests for the CPython math and cmath modules to ensure their functionality within PyTorch. It involves multiple commits and contributors as part of a stack of related changes.
    • pull/150794
  • CPython Integer and Float Tests: This pull request introduces new tests for CPython's integer and float functionalities, including several test files. It is part of a stack of related changes aimed at enhancing the PyTorch project.
    • pull/150795
  • CPython Generators and Contextlib Tests: This pull request introduces new tests for CPython generators and the contextlib module. It is part of a stack of related changes to ensure proper functionality and integration within PyTorch.
    • pull/150796
  • CPython Iterators and Sorting Tests: This pull request introduces new CPython tests specifically for iterators and sorting functionalities. It includes updates to test files to ensure these features are thoroughly evaluated.
    • pull/150797
  • Inductor NCU Enhancements: This pull request introduces kernel name filtering and custom metrics within the Inductor NCU, along with a new CSV output format. These updates enhance the functionality and usability of the PyTorch project.
    • pull/150872
  • CPython Tests under Dynamo: This pull request introduces infrastructure to enable running CPython tests under Dynamo. It involves multiple commits with updates marked as "[ghstack-poisoned]" as part of a series of related changes.
    • pull/150787
  • CPython Unittest Module Tests: This pull request aims to enhance the PyTorch project by adding CPython tests specifically for the unittest module. It includes several updates marked as "[ghstack-poisoned]" across multiple commits.
    • pull/150788
  • Aten GEMM Function Overload: This pull request introduces an overload for the Aten GEMM function in PyTorch, enabling FP32 output from FP16 or BF16 inputs using CUDA and cuBLAS. The series of commits linked to the pull request detail this enhancement.
    • pull/150812
  • Prologue Supported Inputs Optimization: This pull request optimizes the PyTorch codebase by moving prologue_supported_inputs computations to the def_kernel function. This change avoids replaying load_input on a cache hit, significantly improving performance benchmarks.
    • pull/150869
  • Inductor Support with While Loop: This pull request introduces inductor support by transforming the code to use a while_loop. It is part of a series of changes tracked by ghstack and involves multiple updates and contributions from various collaborators.
    • pull/150971

3.2 Closed Pull Requests

This section provides a summary of pull requests that were closed in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.

Pull Requests Closed This Week: 174

Key Closed Pull Requests

1. [Inductor] Refactor wrapper codegen to use Wrapper IR.: This pull request involves a preparatory refactor of the existing wrapper code generation in the PyTorch project to utilize a new Wrapper Intermediate Representation (IR), which aims to enhance modularity and encapsulation by structuring the wrapper code into WrapperLine subclasses, thereby allowing backend-specific code generation without altering core Inductor files, and facilitating the generation of FX IR from Wrapper IR.

  • URL: pull/150458
  • Merged: No
  • Associated Commits: 05cf7, aa7a2, 76b12, 2831c, 8678c, fc9ff, 2f17b, fb252, 9dc74, 57369, 63593, f621b, 51ea0, 0cd44

2. [DTensor] StridedShard support uneven sharding: This pull request introduces support for uneven sharding in the DTensor library by enabling the use of Fully Sharded Data Parallel (FSDP) and Tensor Parallel (TP) on parameters with dimensions that are not evenly divisible by the Data Parallel (DP) or TP mesh sizes, and includes several fixes for DTensor behavior related to uneven strided sharding, such as correcting the creation and reconstruction of strided tensors, improving distributed checkpointing, and adding a utility for converting strided sharding placements to regular sharding placements.

  • URL: pull/150490
  • Merged: No
  • Associated Commits: 25ea2, 114a7, e834c, 7ed1b, 0bb28, eab3d, 49588, 26b19, 0bb1a, e82db, e815f, 76f18, afae7, c2764

3. Test Github Runner behaviors: This pull request, titled "Test Github Runner behaviors," was created to experiment with and test the behaviors of GitHub Runners by making various changes such as combining YAML template files for Windows x64 and ARM64, adjusting build scripts, adding labels, and modifying the order and syntax of steps, but it was ultimately not merged.

  • URL: pull/150014
  • Merged: No
  • Associated Commits: 1bb52, a1fef, 8c1da, 8031e, b97a4, 3542d, 812b5, a5b7c, 2e294, 69c31, 1d301, c82bd

Other Closed Pull Requests

  • Utility Functions for Schema Generation: This topic involves the introduction of utility functions to generate schemas for hops using example inputs. The pull requests focus on creating argument and output schemas that include mutation information and assembling these into a torch._C.FunctionSchema, while addressing limitations such as the inability to handle mutations within tuple inputs.
    • pull/149688
  • DTensor and Sharding Enhancements: These pull requests focus on improving the DTensor module by converting traditional DTensor format placements into a more explicitly ordered format and cleaning up the _local_shard_size_and_offset function. They aim to fix local shape computation for strided sharding in uneven shape cases and simplify other parts of DTensor logic.
    • pull/150493, pull/150650
  • Compilation and Compatibility Fixes: This topic addresses issues related to compilation errors and compatibility in the PyTorch project. The pull requests involve specifying int64_t in templates to ensure compatibility with mixed integer types and fixing unit tests for the XPU that were broken by the community.
    • pull/150894, pull/150830
  • Documentation and Warning Enhancements: These pull requests aim to enhance the documentation for object collectives and clarify the behavior of the tensor.to() function. They add prominent warning labels to highlight potential issues and address specific documentation issues.
    • pull/150815, pull/150913
  • Testing and Runner Image Updates: This topic involves testing new functionalities and updating runner images. The pull requests include testing a new Windows Arm64 runner image and introducing new tests for autograd's producer-consumer stream synchronization.
    • pull/150925, pull/150952
  • Performance and Profiling Improvements: These pull requests focus on enhancing performance and profiling capabilities in the PyTorch project. They involve enabling the fusion of the qconv1d-relu pattern and adding a RECORD_FUNCTION for aoti_xxx to improve profiling visibility.
    • pull/150751, pull/149308
  • Bug Fixes and Workarounds: This topic addresses various bug fixes and workarounds in the PyTorch project. The pull requests include handling cases where the scatter dimension is 0 for 2D output tensors and updating conditions for the NVIDIA CUPTI library to prevent crashes.
    • pull/150935, pull/150957
  • Feature Enhancements and New Methods: These pull requests aim to introduce new features and methods to enhance the PyTorch project. They include adding a new flag to torch.library.register_fake and introducing a new method to the VerificationInfo class for converting objects into dictionaries.
    • pull/150806, pull/151024
  • Unmerged and Closed Pull Requests: This topic includes pull requests that were ultimately not merged or closed without merging. They cover various enhancements and modifications, such as modifying the proxy mode and enhancing the openreg module.
    • pull/150962, pull/151034, pull/151000, pull/150885
  • Miscellaneous Enhancements: These pull requests cover a range of miscellaneous enhancements in the PyTorch project. They include supporting the tuning of the _scaled_grouped_mm function and adapting torch.accelerator.device_count for multi-process usage.
    • pull/150421, pull/149924
  • Cache and Schema Management: This topic addresses issues related to cache and schema management in the PyTorch project. The pull requests involve clearing fx's schema cache upon library deletion and fixing the can_inplace function to handle multiple uses within a fused node.
    • pull/150495, pull/150845

3.3 Pull Request Discussion Insights

This section will analyze the tone and sentiment of discussions within this project's open and closed pull requests that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.

  1. [OpenReg][PrivateUse1] add device context for OpenReg Module
    • Toxicity Score: 0.55 (defensive responses, frustration, tense interactions)
    • This GitHub conversation involves several users discussing the addition of device context support for the OpenReg Module. User1 initially raises a concern about the implementation, which is met with a defensive response from User2, who is the author of the pull request. User3 attempts to mediate by suggesting a compromise, but User1 remains unsatisfied, expressing frustration over the lack of progress. The tone of the conversation is tense, with moments of defensiveness and frustration, particularly between User1 and User2.

IV. Contributors

4.1 Contributors

Active Contributors:

We consider an active contributor in this project to be any contributor who has made at least 1 commit, opened at least 1 issue, created at least 1 pull request, or made more than 2 comments in the last month.

If there are more than 10 active contributors, the list is truncated to the top 10 based on contribution metrics for better clarity.

Contributor     | Commits | Pull Requests | Issues | Comments
malfet          | 181     | 20            | 11     | 118
guilhermeleobas | 173     | 16            | 1      | 2
justinchuby     | 101     | 6             | 5      | 73
laithsakka      | 107     | 19            | 7      | 31
anijain2305     | 87      | 12            | 18     | 19
jamesjwu        | 111     | 9             | 11     | 2
mlazos          | 111     | 13            | 1      | 0
angelayi        | 80      | 13            | 8      | 21
pianpwk         | 76      | 15            | 1      | 27
StrongerXi      | 78      | 3             | 14     | 16
