Weekly Project News


Weekly GitHub Report for PyTorch - 2024-11-11 12:00:06

Weekly GitHub Report for PyTorch

Thank you for subscribing to our weekly newsletter! Each week, we deliver a comprehensive summary of your GitHub project's latest activity right to your inbox, including an overview of your project's issues, pull requests, contributors, and commit activity.


I. Issues

1.1 Open Issues

Open Issues This Week: 92

Summarized Issues:

  • Runtime Errors in PyTorch: Multiple issues describe runtime errors encountered in PyTorch, including internal assertion failures, import errors, and CUDA-related problems. These errors often surface during specific operations or configurations, such as using sparse tensors, compiling models, or initializing GPUs, and their causes range from undefined symbols to invalid device functions.
    • github.com/pytorch/pytorch/issues/131318
    • github.com/pytorch/pytorch/issues/131319
    • github.com/pytorch/pytorch/issues/131324
    • github.com/pytorch/pytorch/issues/131328
    • github.com/pytorch/pytorch/issues/131333
    • github.com/pytorch/pytorch/issues/131334
    • github.com/pytorch/pytorch/issues/131335
    • github.com/pytorch/pytorch/issues/131336
    • github.com/pytorch/pytorch/issues/131337
    • github.com/pytorch/pytorch/issues/131338
    • github.com/pytorch/pytorch/issues/131339
    • github.com/pytorch/pytorch/issues/131398
    • github.com/pytorch/pytorch/issues/131411
    • github.com/pytorch/pytorch/issues/131425
    • github.com/pytorch/pytorch/issues/131439
    • github.com/pytorch/pytorch/issues/131449
    • github.com/pytorch/pytorch/issues/131463
    • github.com/pytorch/pytorch/issues/131470
    • github.com/pytorch/pytorch/issues/131491
    • github.com/pytorch/pytorch/issues/131515
    • github.com/pytorch/pytorch/issues/131561
    • github.com/pytorch/pytorch/issues/131562
    • github.com/pytorch/pytorch/issues/131631
    • github.com/pytorch/pytorch/issues/131635
    • github.com/pytorch/pytorch/issues/131650
    • github.com/pytorch/pytorch/issues/131653
    • github.com/pytorch/pytorch/issues/131654
    • github.com/pytorch/pytorch/issues/131655
    • github.com/pytorch/pytorch/issues/131656
    • github.com/pytorch/pytorch/issues/131657
    • github.com/pytorch/pytorch/issues/131662
    • github.com/pytorch/pytorch/issues/131664
    • github.com/pytorch/pytorch/issues/131667
    • github.com/pytorch/pytorch/issues/131668
    • github.com/pytorch/pytorch/issues/131679
    • github.com/pytorch/pytorch/issues/131688
    • github.com/pytorch/pytorch/issues/131691
    • github.com/pytorch/pytorch/issues/131693
    • github.com/pytorch/pytorch/issues/131695
    • github.com/pytorch/pytorch/issues/131696
    • github.com/pytorch/pytorch/issues/131701
    • github.com/pytorch/pytorch/issues/131734
    • github.com/pytorch/pytorch/issues/131736
    • github.com/pytorch/pytorch/issues/131737
    • github.com/pytorch/pytorch/issues/131739
    • github.com/pytorch/pytorch/issues/131740
    • github.com/pytorch/pytorch/issues/131746
    • github.com/pytorch/pytorch/issues/131750
    • github.com/pytorch/pytorch/issues/131753
    • github.com/pytorch/pytorch/issues/131754
    • github.com/pytorch/pytorch/issues/131755
    • github.com/pytorch/pytorch/issues/131765
    • github.com/pytorch/pytorch/issues/131769
    • github.com/pytorch/pytorch/issues/131770
    • github.com/pytorch/pytorch/issues/131772
    • github.com/pytorch/pytorch/issues/131774
    • github.com/pytorch/pytorch/issues/131778
    • github.com/pytorch/pytorch/issues/131781
    • github.com/pytorch/pytorch/issues/131788
    • github.com/pytorch/pytorch/issues/131793
    • github.com/pytorch/pytorch/issues/131794
    • github.com/pytorch/pytorch/issues/131799
    • github.com/pytorch/pytorch/issues/131802
    • github.com/pytorch/pytorch/issues/131805
    • github.com/pytorch/pytorch/issues/131815
    • github.com/pytorch/pytorch/issues/131823
    • github.com/pytorch/pytorch/issues/131829
    • github.com/pytorch/pytorch/issues/131840
    • github.com/pytorch/pytorch/issues/131859
    • github.com/pytorch/pytorch/issues/131864
    • github.com/pytorch/pytorch/issues/131865
    • github.com/pytorch/pytorch/issues/131881
    • github.com/pytorch/pytorch/issues/131883
    • github.com/pytorch/pytorch/issues/131889
    • github.com/pytorch/pytorch/issues/131891
    • github.com/pytorch/pytorch/issues/131893
    • github.com/pytorch/pytorch/issues/131897
    • github.com/pytorch/pytorch/issues/131901
  • Compilation and Build Issues: Several issues report problems related to compiling and building PyTorch, including missing dependencies, C++ compile errors, and out-of-memory errors during the build process. These issues often require specific configurations or troubleshooting steps to resolve.
    • github.com/pytorch/pytorch/issues/131333
    • github.com/pytorch/pytorch/issues/131339
    • github.com/pytorch/pytorch/issues/131562
    • github.com/pytorch/pytorch/issues/131793
    • github.com/pytorch/pytorch/issues/131864
  • Bugs in PyTorch Functions and Operations: Multiple issues highlight bugs in specific PyTorch functions and operations, such as torch.mean, torch.sum, torch.layer_norm, and torch.arange. These bugs often result in incorrect outputs, runtime errors, or unexpected behavior.
    • github.com/pytorch/pytorch/issues/131457
    • github.com/pytorch/pytorch/issues/131750
    • github.com/pytorch/pytorch/issues/131770
    • github.com/pytorch/pytorch/issues/131774
    • github.com/pytorch/pytorch/issues/131805
    • github.com/pytorch/pytorch/issues/131889
    • github.com/pytorch/pytorch/issues/131891
    • github.com/pytorch/pytorch/issues/131893
  • Performance Regressions: Several issues report performance regressions in PyTorch, where certain operations or models experience significant slowdowns compared to previous versions. These regressions can affect both training and inference times.
    • github.com/pytorch/pytorch/issues/131664
    • github.com/pytorch/pytorch/issues/131693
    • github.com/pytorch/pytorch/issues/131734
  • ONNX Export Issues: Multiple issues describe problems encountered when exporting models to ONNX format using PyTorch, including unsupported operators, invalid operations, and runtime errors. These issues often require updates to the ONNX exporter or workarounds to resolve.
    • github.com/pytorch/pytorch/issues/131349
    • github.com/pytorch/pytorch/issues/131635
    • github.com/pytorch/pytorch/issues/131679
    • github.com/pytorch/pytorch/issues/131829
  • TorchDynamo and TorchInductor Issues: Several issues highlight problems with TorchDynamo and TorchInductor, including runtime errors, incorrect outputs, and performance regressions. These issues often require updates to the underlying libraries or specific configurations to resolve.
    • github.com/pytorch/pytorch/issues/131439
    • github.com/pytorch/pytorch/issues/131450
    • github.com/pytorch/pytorch/issues/131457
    • github.com/pytorch/pytorch/issues/131734
    • github.com/pytorch/pytorch/issues/131736
    • github.com/pytorch/pytorch/issues/131746
    • github.com/pytorch/pytorch/issues/131750
    • github.com/pytorch/pytorch/issues/131753
    • github.com/pytorch/pytorch/issues/131754
    • github.com/pytorch/pytorch/issues/131755
    • github.com/pytorch/pytorch/issues/131765
    • github.com/pytorch/pytorch/issues/131769
    • github.com/pytorch/pytorch/issues/131770
    • github.com/pytorch/pytorch/issues/131772
    • github.com/pytorch/pytorch/issues/131774
    • github.com/pytorch/pytorch/issues/131778
    • github.com/pytorch/pytorch/issues/131781
    • github.com/pytorch/pytorch/issues/131788
    • github.com/pytorch/pytorch/issues/131793
    • github.com/pytorch/pytorch/issues/131794
    • github.com/pytorch/pytorch/issues/131799
    • github.com/pytorch/pytorch/issues/131802
    • github.com/pytorch/pytorch/issues/131805
    • github.com/pytorch/pytorch/issues/131815
    • github.com/pytorch/pytorch/issues/131823
    • github.com/pytorch/pytorch/issues/131829
    • github.com/pytorch/pytorch/issues/131840
    • github.com/pytorch/pytorch/issues/131859
    • github.com/pytorch/pytorch/issues/131864
    • github.com/pytorch/pytorch/issues/131865
    • github.com/pytorch/pytorch/issues/131881
    • github.com/pytorch/pytorch/issues/131883
    • github.com/pytorch/pytorch/issues/131889
    • github.com/pytorch/pytorch/issues/131891
    • github.com/pytorch/pytorch/issues/131893
    • github.com/pytorch/pytorch/issues/131897
    • github.com/pytorch/pytorch/issues/131901

1.2 Top 5 Active Issues:

We consider active issues to be those that have generated the most discussion in their comments.

  1. torch._dynamo.exc.Unsupported: call_function args: UserDefinedObjectVariable(EasyDict): This issue involves a user attempting to run inference on a model using executorch on an Android device, but encountering an error related to torch._dynamo.exc.Unsupported: call_function args: UserDefinedObjectVariable(EasyDict). The user has provided detailed traceback logs and has tried various solutions, including using different dictionary implementations and setting strict=False, but the issue persists.

    • The comments discuss the root cause being that EasyDict is not traceable by Dynamo, and various suggestions are made, including using AttrDict and custom dictionary classes. The user is advised to register EasyDict as a pytree node, but this leads to further issues. The conversation includes attempts to resolve the problem by modifying the code and using different versions of executorch and PyTorch, but the issue remains unresolved. The user eventually sets up a public repository to help reproduce the issue, and the discussion continues with attempts to debug and find a solution. A minimal pytree-registration sketch appears after this list.
    • Number of comments: 72
  2. [RFC] Per-Parameter-Sharding FSDP: This issue proposes a new design for Fully Sharded Data Parallel (FSDP) in PyTorch, called Per-Parameter-Sharding FSDP, which aims to address limitations in the existing FSDP by sharding each parameter on dimension 0. The new design promises benefits such as flexible mixed precision, efficient handling of frozen parameters, communication-free sharded state dicts, and potential future communication optimizations.

    • The comments discuss various aspects of the proposed design, including clarifications on mixed precision support, feedback on the API, and suggestions for improvements. There are also discussions on the challenges of integrating with other parallelism strategies, initialization flows, and handling non-persistent buffers. Some users report issues with high loss and freezing during training, and there are ongoing efforts to address these problems and improve the overall implementation.
    • Number of comments: 54
  3. ROCm & Windows Support: This issue requests the addition of PyTorch support for AMD GPUs on Windows, following AMD's release of ROCm Windows support. The user emphasizes the need for this feature to enhance the PyTorch ecosystem on Windows platforms.

    • The comments discuss the current lack of PyTorch support on Windows for AMD GPUs, with users expressing frustration and sharing their experiences with ROCm on Linux. Some users mention trying various workarounds and alternative setups, while others express hope for future updates and improvements from AMD and the PyTorch team.
    • Number of comments: 52
  4. Custom attention recompilations: This issue is about a compiler cache exhaustion problem encountered when using the torch.compile decorator in a PyTorch model's forward function, which leads to repeated recompilations and performance degradation. The error logs indicate that the recompilations are triggered by changes in certain objects and tensor sizes, and the user is seeking solutions to mitigate these recompilations.

    • The comments discuss the use of a custom operation, recompilation triggers, and potential solutions such as increasing the cache size limit and using specific logging configurations. The conversation also includes attempts to reproduce the issue, suggestions for modifying the code to avoid recompilations, and discussions about caching and conditional cases in the model. A sketch of these mitigations appears after this list.
    • Number of comments: 51
  5. CUDA nightly docker actually includes CPU build of torch: This issue reports that the CUDA nightly Docker image for PyTorch is incorrectly including the CPU build of Torch instead of the GPU build, causing an error related to the CUDA_HOME environment variable not being set. The problem persists across multiple versions and has been identified as a recurring issue due to build sequencing and testing inadequacies.

    • The comments discuss the recurring nature of the issue, attempts to diagnose and fix it, and the need for better testing and sequencing in the build process. Various users confirm the problem, suggest temporary workarounds, and propose improvements to the CI pipeline to prevent future occurrences. The issue appears to be mitigated with recent changes, but further testing is needed to confirm stability. A quick snippet for checking which build an image contains appears after this list.
    • Number of comments: 41
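
For the first active issue above (EasyDict not being traceable by Dynamo), the comments converge on registering the dictionary subclass as a pytree node so that export and Dynamo know how to flatten and rebuild it. Below is a minimal sketch of that registration; the EasyConfig class and its fields are hypothetical stand-ins for EasyDict, and torch.utils._pytree.register_pytree_node is the public helper in recent PyTorch releases (older releases expose a private _register_pytree_node instead).

    import torch.utils._pytree as pytree

    class EasyConfig(dict):
        # Hypothetical stand-in for EasyDict: values are reachable as attributes.
        def __getattr__(self, name):
            try:
                return self[name]
            except KeyError:
                raise AttributeError(name)

    def _flatten(cfg):
        # Children first, then the context needed to rebuild the container.
        return list(cfg.values()), list(cfg.keys())

    def _unflatten(values, context):
        return EasyConfig(zip(context, values))

    # Once registered, Dynamo/export can trace through EasyConfig instances
    # instead of treating them as opaque user-defined objects.
    pytree.register_pytree_node(EasyConfig, _flatten, _unflatten)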
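
For the custom attention recompilation issue (issue 4), the mitigations discussed in the comments amount to three knobs: log what triggers each recompile, raise the per-frame cache limit when the recompiles are legitimate, and mark genuinely varying tensor dimensions as dynamic. A rough sketch, using a simple stand-in model:

    import torch

    # Surface the guard failures behind each recompile
    # (equivalent to running with TORCH_LOGS="recompiles").
    torch._logging.set_logs(recompiles=True)

    # If many distinct shapes are expected, enlarge the per-frame cache
    # instead of falling back to eager once it is exhausted.
    torch._dynamo.config.cache_size_limit = 64

    model = torch.nn.Linear(128, 32)   # stand-in for the real model
    compiled = torch.compile(model)

    x = torch.randn(8, 128)
    # Mark the batch dimension as dynamic so new batch sizes do not
    # force a fresh specialization.
    torch._dynamo.mark_dynamic(x, 0)
    out = compiled(x)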
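
For the CUDA nightly Docker issue (issue 5), one quick way to confirm which flavor of torch an image actually ships is to inspect the build metadata from inside the container; torch.version.cuda is None on CPU-only builds:

    import torch

    print(torch.__version__)          # nightly wheels carry a +cpu / +cuXYZ suffix
    print(torch.version.cuda)         # None on a CPU-only build
    print(torch.cuda.is_available())  # also False if no GPU is visible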

1.3 Top 5 Quiet Issues:

We consider quiet issues to be those that have been open in this project for the longest time. The team should work together to get these issues resolved and closed as soon as possible.

  1. distributed.batch_isend_irecv() crash when send/recv refers to itself: This issue describes a crash occurring in a PyTorch program when using the batch_isend_irecv function for the Gloo backend, specifically when the send and receive operations are set to the current rank. The problem arises because the traceback does not reference any Python code, making it difficult to diagnose and resolve the crash. A minimal sketch of the reported pattern appears after this list.

    • Open for 366 days, 19 hours, 17 minutes
  2. torch.ops.aten.split.Tensor._schema return alias annotations are wrong: This issue highlights a discrepancy in the alias annotations for the torch.ops.aten.split.Tensor._schema function, where the output TensorList should have the same alias annotation as the input but currently does not. The problem is demonstrated through code snippets showing that the expected alias information is not being captured correctly in the schema's return annotations.

    • Open for 366 days, 17 hours, 38 minutes
  3. Build failure due to C++ version mismatch: This issue describes a build failure in the PyTorch project due to a mismatch in the C++ standard versions used by different components, specifically involving GCC 12.2, protobuf, and abseil. The problem arises because PyTorch does not consistently use the CMake target protobuf::libprotobuf, leading to inconsistencies in the C++ standard requirements and resulting in build errors.

    • Open for 366 days, 1 hour, 24 minutes
  4. [dynamo.export] Assertion Error: Mutating module attribute during export.: This issue addresses an assertion error encountered during the export process in the dynamo.export module, specifically related to the mutation of module attributes. The discussion includes potential solutions such as backing up and reverting the original attribute values, and ensuring the soundness of the exported model by either emitting a variable or lifting the attribute as an additional graph input and output.

    • Open for 365 days, 21 hours, 31 minutes
  5. torch compile does not work with torch.nn.functional.softmax ?: This issue describes a bug encountered when attempting to compile a model using the torch.compile stack, which appears to be incompatible with the torch.nn.functional.softmax function. The error message indicates a runtime error related to reducing over a zero-size dimension during a reduction operation, which prevents the successful execution of the model compilation.

    • Open for 365 days, 19 hours, 56 minutes
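
The first quiet issue concerns batch_isend_irecv crashing when the send and receive peers are the current rank under the Gloo backend. Below is a minimal sketch of the reported pattern (a single-process group where rank 0 sends to and receives from itself); the address and port are arbitrary, and on affected builds this is the shape of program that crashes rather than completing.

    import torch
    import torch.distributed as dist

    def main():
        dist.init_process_group(
            backend="gloo",
            init_method="tcp://127.0.0.1:29500",
            rank=0,
            world_size=1,
        )
        send_buf = torch.arange(4, dtype=torch.float32)
        recv_buf = torch.empty(4)

        ops = [
            dist.P2POp(dist.isend, send_buf, 0),  # peer == current rank
            dist.P2POp(dist.irecv, recv_buf, 0),
        ]
        for req in dist.batch_isend_irecv(ops):
            req.wait()

        dist.destroy_process_group()

    if __name__ == "__main__":
        main()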

1.4 Closed Issues

Closed Issues This Week: 90

Average Issue Close Time (This Week): 69.90 days

Summarized Issues:

1.5 Issue Discussion Insights

This section analyzes the tone and sentiment of discussions in this project's open issues over the past week to identify potentially heated exchanges and to help maintain a constructive project environment.

Based on our analysis, there are no instances of toxic discussions in the project's open issues from the past week.


II. Pull Requests

2.1 Open Pull Requests

Open Pull Requests This Week: 220

Pull Requests:

2.2 Closed Pull Requests

Closed Pull Requests This Week: 389

Summarized Pull Requests:

2.3 Pull Request Discussion Insights

This section analyzes the tone and sentiment of discussions in this project's open pull requests over the past week to identify potentially heated exchanges and to help maintain a constructive project environment.

  1. [test only] TORCH_LOGS_RANKS
    • Toxicity Score: 0.55 (Frustration expressed, defensive responses, underlying tension)
    • This GitHub conversation involves multiple users discussing a proposed solution. The conversation starts with a neutral tone as users share their initial thoughts. However, tension arises when username1 expresses frustration that username2's solution did not work as expected. Username2 responds defensively, which further escalates the tension. Other users attempt to mediate and bring the conversation back to a constructive tone, but the underlying frustration remains evident.

III. Commits

3.1 Commits

Commits This Week: 330

Summarized Commits:

  • FlopCounterMode Fixes: The register_flop_formula function has been corrected to handle custom operations properly, avoiding decomposition issues and adding new tests to verify the fix.
  • Backend Support Enhancements: Support for the HPU backend has been added to _in_graph_classes() in TorchDynamo, addressing issues caused by hardcoded CUDA stream flows.
  • Data Structures: An OrderedSet implementation has been introduced, extending collections.abc.MutableSet and using a dictionary to maintain order, with edge cases addressed and tests reused from Python's standard library. A minimal sketch of this shape appears after this list.
  • Public API Changes: Changes to make nn.Module's state_dict load and post hooks public have been relanded, ensuring proper behavior when hooks are registered via the public API.
  • Process Group Fixes: The split_group function has been fixed for single rank process groups, addressing a request from the xlformer team.
  • Iterator and Bytecode Enhancements: The IteratorVariable has been implemented with polyfill fallbacks for enumerate, and bytecode reconstruction for itertools' repeat and count functions has been introduced.
  • AOT Autograd Improvements: The donated buffer feature has been implemented in the AOT Autograd system, with corresponding unit tests ensuring proper detection and storage of donated buffers.
  • Test Tolerance Adjustments: Absolute tolerance for specific tests has been relaxed from 0.01 to 0.02, and torch.testing.assert_close is now used to address assertion errors caused by a Triton update.
  • Build and Compilation Fixes: Vulkan build issues related to missing override errors have been fixed, and _mm_plus_mm code generation issues have been resolved.
  • Graph Pass Reordering: The replace_set_grad_with_hop_pass has been reordered with the lift_constant_tensor pass to ensure necessary metadata for constant attributes.
  • AutoHeuristic Kernel Selection: Support for kernel choice selection in AutoHeuristic has been introduced, allowing registration of functions and utilization of autotuning results.
  • Configuration and Deprecation: The force_parameter_static_shapes configuration has been suggested in recompile logs, and deprecated fields have been removed from the ExportedProgram class constructor.
  • Clang-Tidy Warnings: Clang-tidy warnings in the JIT component have been addressed, and _get_operation_overload has been modified to prevent exceptions when an overload does not exist.
  • Batch Normalization Node Handling: The BN node is now manually erased after being folded into a convolution operation, addressing issues with Dead Code Elimination (DCE).
  • Performance Data Collection: A new workflow for collecting performance data on A10g GPUs has been introduced, potentially replacing costly A100 instances for performance comparisons.
  • ROCm Compatibility: A constructor for c10::BFloat16 from __hip_bfloat16 has been implemented to accommodate changes in the ROCm 6.2.0 API.
  • Inference Setting Fixes: The version counter (VC) of mutated graph inputs is now properly incremented in inference settings.
  • Scalar Tensor Handling: The aten.conj function has been updated to handle scalar tensors directly within C++ to avoid errors during tracing.
  • Autograd and Tensor Mutations: Handling of .data in Dynamo has been updated to ensure mutations are invisible to autograd, with a flag set on the returned TensorVariable.
  • Function Support: Support for the zip_longest function has been introduced in the Dynamo module, addressing a specific issue.
  • Accuracy Tests: An inductor CPU accuracy test designed to run on AVX2 runners has been introduced as part of the CI process.
  • Docker Build Fixes: Architecture issues related to the arm64 Docker build have been fixed, and the range tree code generation has been updated to prevent double invocation.
  • ABI-Compatible CPU Fixes: The ConstantHandle::get function has been added to resolve a compilation error, and the linker path has been fixed to include the libtorch path.
  • Graph Signature Accessors: Immutable accessors in the graph signature have been introduced, validated by existing tests.
  • Meta Error Fixes: A meta error in the _convert_weight_to_int4pack function has been resolved, and data-dependent errors in non-strict export have been addressed.
  • Type Hints and Annotations: Type hints have been added to various functions and modules, enhancing code clarity and maintainability.
  • Non-Persistent Buffers: A utility to accurately track and set the persistence status of buffers has been introduced, ensuring non-persistent buffers are correctly handled.
  • Forward Hooks: The guard on keys for _forward_hooks and _forward_pre_hooks has been removed, addressing a specific issue.
  • Debugging and Logging: The .users attribute in debug.py has been updated, and TCPStore wait timeout logging has been enhanced.
  • Memory Planning and Compilation: A Python binding for _get_current_graph_task_keep_graph has been introduced, and compiler warnings Wunused-function and Wunused-result have been globally enabled.
  • Exception Handling: The handling of the StopIteration exception has been updated, and a bug fix in the Dynamo component has been introduced.
  • Triton Kernel Support: Support for Triton kernels with symbolic function string arguments has been added, and the project has been updated to version 2.5.8.
  • Static Methods in Graphs: Static methods are now permitted in the graph, and the _parent_mesh attribute has been removed from DeviceMesh.
  • Decorator Typing: Typing for decorators has been introduced in various modules, enhancing code quality and maintainability.
  • GPUDirect Storage APIs: Wrappers for synchronous GPUDirect Storage APIs have been introduced, and inline built-in neural network modules have been enabled.
  • MTIA Memory Stats: The MTIA equivalent of the torch.cuda.memory_stats function has been introduced, and corrupt remote address logs in TCPStore have been fixed.
  • AotCodeCompiler Updates: The AotCodeCompiler has been switched to a new cpp_builder, and new utilities for 'and_masks' and 'or_masks' have been introduced.
  • Kernel Operations: Source and destination ranks for point-to-point (p2p) kernel operations have been populated, and certain operations have been relocated to a private scope.
  • Commit Synchronization: The sync_distributed_folder function has been updated to use non-reverse order for commit synchronization.
  • Cudagraph Refactoring: The cudagraph post-compile process has been refactored, and backward mutations in the backward graph have been addressed.
  • Example Fixes: The example for the convert_conv3d_weight_memory_format function has been corrected, and frame summary functionality has been updated.
  • Unit Tests: Unit tests for cudagraph expandable segments have been enabled, and a device argument has been added to the large_grid unit test.
  • Performance Improvements: The performance of torch.masked.mean and torch.masked._std_var has been improved, and metadata preservation in the output node has been addressed.
  • Training Flag Mirroring: The training flag in the OptimizedModule has been mirrored, and test files for various parallel modules have been reorganized.
  • Export and Export for Training: The implementation of export and export_for_training has been consolidated, and the is_training flag has been renamed to dispatch_tracing_mode.
  • Constant Input Handling: Constant inputs passed to AOTI are now ignored during compilation, and a lazy variable tracker for FORMAT_VALUE in f-strings has been introduced.
  • Prim Operations: Support for prim::max and prim::if operations with multiple outputs has been added, and clang-tidy warnings in the JIT component have been resolved.
  • Optimizer Documentation: Missing documentation hooks have been added to the Optimizer base class, and a new workflow for collecting performance metrics for aarch64 architecture has been introduced.
  • Log Message Fixes: An issue with the log message in torchrun has been fixed, and Docker conda builds have been migrated to the pytorch/pytorch repository.
  • Type Annotations: Type annotations for decorators in various modules have been introduced, and build errors related to missing overrides in Vulkan have been fixed.
  • Annotation Relocation: User-defined annotations have been relocated to the Native Caching Allocator, and logging mechanisms have been enhanced.
  • GEMM Template Updates: Initial support for k-slicing in the CPP GEMM template has been introduced, and a basic implementation of the Adafactor optimizer has been added.
  • HPU-Specific Fixes: CPU scalars can now be moved to specific devices, fixing an HPU-specific error, and Python code generation issues have been addressed.
  • Multi-Kernel and Split-Scan: Issues in the interaction between multi-kernel and split-scan have been fixed, and the use of UFMT for the torch/ao/ directory has been enabled.
  • Nightly Checkout Tool: The nightly checkout tool has been refactored, and new arguments for unittest and pytest in the CI environment have been introduced.
  • Noise Level Control: A new noise level for controlling the verbosity of dTensor operation logs has been introduced, and flaky tests in various files have been addressed.
  • Code Annotations: Type annotations have been added to codecache.py and config.py, and potential segmentation fault issues have been addressed.
  • Test Stability: Flaky tests in test_pad_mm.py and test_benchmark_fusion.py have been resolved, and a new --cuda option for the tools/nightly.py script has been introduced.
  • Memory Planning Fixes: Issues in CUDACachingAllocator.cpp have been fixed, and the subgraph lowering process has been annotated.
  • MTIA API Support: Support for the module.mtia() API has been introduced, and the use of UFMT for the torch/ao/quantization/ directory has been enabled.
  • Tensor Attributes: Tensor attributes are now explicitly registered as buffers in the static input test, and clang-tidy warnings in the torch/csrc/distributed/c10d/control_plane directory have been resolved.
  • SDPA Implementation: The SDPA implementation has been introduced to the MPS backend, and unused variables have been removed from the codebase.
  • Warnings and Parameters: Warnings related to unused parameters have been resolved, and the use of UFMT for the torch/ao/pruning/ and torch/ao/nn/ directories has been enabled.
  • Inline Neural Network Modules: Tests for inline inbuilt neural network modules have been updated, and the mobilenet_v2 test for the CPU inductor has been skipped.
  • Clang-Tidy Warnings: Clang-tidy warnings in the aten/src/ATen/native directory have been resolved, and the conversion process for aten.tensor has been updated.
  • Process Group Status: The PG status has been included in the flight recorder, and the tunable ops validator has been updated to fetch the hipblaslt version from the runtime.
  • CUDA Compile Options: Private compile options for CUDA code have been fixed, and instances of InstructionTranslator have been annotated.
  • Grad Hook Fixes: The set_grad hook has been fixed for empty outputs, and a graph_break log registration error has been addressed.
  • Destructor Override Warnings: Missing destructor override warnings have been eliminated, and the InstructionTranslator has been further annotated.
  • File Removals: The _export/exported_program.py file has been removed, and error messages for internal changes have been enhanced.
  • Sparse Block Support: A sparse block has been introduced to the flex_decoding kernel, and initial support for custom operations has been re-landed.
  • Decorator Typing: Typing for decorators in the jit/_decompositions module has been introduced, and flex decoding unit tests for ROCm have been enabled.
  • Test Reorganization: Tests for various parallel modules have been reorganized, and the torch.onnx.export API has been updated.
  • Benchmarking Configuration: The dump_exported_program parameter has been set to True in the ONNX benchmarking configuration, and the rerun_disabled_tests option for the Inductor workflow has been added.
  • Build Customization: The PYTHON_LIB_REL_PATH environment variable can now be set to customize the installation location of Caffe2's Python modules.
  • File Structure Reorganization: The file structure for various parallel modules has been reorganized, and the torch/utils/_config_module.py file has been fully typed.
  • Experimental Job Migration: Experimental jobs have been migrated to the Amazon2023 AMI, and ROCm jobs have been moved to a periodic frequency.
  • CI Dashboard Updates: The 'cpu-x86' label has been renamed to 'cpu_x86' in the CI dashboard, and AOTI will now use a proxy executor for aten operations.
  • Lint Job Migration: Self-hosted lint.yml jobs have been migrated to the new Amazon 2023 AMI, and the WaitCounter has been relocated to the c10/util directory.
  • Multiline Traces: Support for multiline traces in version 3.13 has been introduced, and issues in the test/dynamo/test_bytecode_utils.py file have been fixed.
  • Constant Folding Annotations: The constant_folding.py file has been annotated, and support for ZB1P and ZB2P algorithms has been introduced.
  • ROCm Workflow Sharding: An extra shard for distributed periodic jobs has been added to address timeouts in ROCm workflows.
  • Compile Limits: The number of compiles per frame has been limited, and type annotations for decorators in the onnx/symbolic_helper module have been introduced.
  • Build Error Fixes: A build error related to a shadowed handle has been fixed, and new runner labels have been added to actionlint.
  • Dynamo for Windows: Changes have been made to enable the use of Dynamo for Windows, and the _dispatch_sqrt function has been removed.
  • Triton Operator Fixes: Issues with autotuning in the Triton operator have been addressed, and a new API for custom gradient divide factors has been introduced.
  • Constructor Updates: Missing constructors or assignment operators have been introduced, and a new configuration option for compiler collectives has been added.
  • Pattern Matcher Refactoring: The all-gather and reduce-scatter pattern matchers have been refactored, and the mark_node_as_mutating function has been removed.
  • Multiple Outputs Support: Support for multiple outputs in flex-attention has been introduced, and the UserDefinedTritonKernel has been updated.
  • Ruff Skips Removal: All ruff skips in the torch/onnx module have been removed, and global suppression of inconsistent missing overrides has been addressed.
  • Executorch Flag Removal: The temporary _is_executorch flag has been removed, and a new workflow for collecting CPU performance data nightly has been introduced.
  • Function Simplification: The THPEvent_get_device function has been simplified, and example NestedTensor objects have been added to the test suite.
  • Triton Autotune Bypass: A flag to bypass unsupported @triton.autotune arguments has been introduced, and instances of the InstructionTranslator have been annotated.
  • Scatter Reduce Fixes: Issues with negative indexing in scatter reduce operations have been addressed, and cudagraph fallback tests have been updated.
  • Autograd Metadata: The tensor dictionary is now populated with compiled autograd metadata, and mypy typing has been introduced to the pattern_matcher module.
  • Header Inclusions: The gen.py script has been updated to include _native.h headers, and the testing script for different model families has been updated.
  • Decorator Annotations: Unnecessary mypy allow-untyped-decorators annotations have been removed, and noop implementations for set_rng_state and get_rng_state APIs have been introduced.
  • NCCL Exception Logging: The logging level for NCCL exceptions has been updated, and Python code generation issues have been addressed.
  • OS Version Checks: The code has been updated to use isOperatingSystemAtLeastVersion: for OS version checks, and thread blocking heuristics in GEMM have been improved.
  • Blacklist Removal: The _BLACK_LISTED_OPS has been removed, and issues in the CacheBase.get_system function for AMD devices have been addressed.
  • Optree Import: torch.utils._pytree now imports optree only when used, and an older specialization for the StopIteration exception has been removed.
  • Aliasing Updates: The TENSOR_ALIASING has been renamed to OBJECT_ALIASING, and activation checkpointing differentiation in CommDebugMode has been implemented.
  • Activation Checkpointing Fixes: A bug with activation checkpointing has been addressed, and a low contention intra-node all-gather and reduce-scatter implementation has been introduced.
  • Fake Tensor SymInt Caching: Issues with the caching mechanism for fake tensor SymInt have been addressed, and a fake process group for unit tests has been introduced.
  • Test Failures: Failures in the test_padding.py and do_bench tests have been addressed, and an error to disallow untyped decorators has been enabled.
  • Mypy Type-Checking: Mypy type-checking errors have been resolved, and the pinned commit for the executorch submodule has been updated.
  • Destructor Override Warnings: Missing destructor override warnings have been eliminated, and the operator_benchmark caffe2 build has been removed.
  • README Updates: Outdated information from the .ci README file has been removed, and static methods in the graph have been allowed.
  • NestedTensor Mean Operator: The mean operator has been integrated into PyTorch's NestedTensor, and illegal memory access issues in the Flash-Attention splitkv kernel have been fixed.
  • Type Annotations: Type annotations have been added to the torch/_dynamo/utils.py file, and configuration name matching criteria for CPU inductor tests have been relaxed.
  • Fakification Process: The post-trace fakification process in strict mode has been refactored, and the GraphModuleOpUpgrader has been removed.
  • Sparse Status Handling: Handling of SPARSE_STATUS_INVALID_VALUE has been updated, and error filtering and logging have been enhanced.
  • Aten Operation Parameters: Support for layout, device, and dtype parameters in aten operations has been introduced, and the torch.library.register_vmap function has been added.
  • ComboKernel Implementation: A ComboKernel that consolidates independent Inductor Triton kernels has been introduced, and the DtypeView has been added to address a memory leak.
  • Torch Version Typing: Mypy typing has been introduced to the torch version module.
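
The Data Structures entry above describes an OrderedSet built on collections.abc.MutableSet with a dictionary providing the ordering. A minimal, self-contained sketch of that shape (not the in-tree implementation, which also reuses tests from Python's standard library) looks like this:

    from collections.abc import MutableSet
    from typing import Iterable, Iterator, TypeVar

    T = TypeVar("T")

    class OrderedSet(MutableSet):
        """Set with deterministic iteration order, backed by a dict
        (dicts preserve insertion order since Python 3.7)."""

        def __init__(self, items: Iterable[T] = ()) -> None:
            self._data: dict = dict.fromkeys(items)

        def __contains__(self, item: object) -> bool:
            return item in self._data

        def __iter__(self) -> Iterator[T]:
            return iter(self._data)

        def __len__(self) -> int:
            return len(self._data)

        def add(self, item: T) -> None:
            self._data[item] = None

        def discard(self, item: T) -> None:
            self._data.pop(item, None)

    s = OrderedSet("cba")
    s.add("d")
    print(list(s))   # ['c', 'b', 'a', 'd'] -- insertion order preserved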

IV. Contributors

4.1 Contributors

Active Contributors:

We consider an active contributor in this project to be any contributor who has made at least 1 commit, opened at least 1 issue, or created at least 1 pull request in the past month.

Contributor Commits Pull Requests Issues
PyTorch MergeBot 1094 0 0
anijain2305 0 56 12
cyyever 0 67 0
XuehaiPan 0 64 0
hyperkai 0 0 56
aorenste 0 47 1
ezyang 0 33 10
zou3519 0 23 19
Chillee 0 26 10
AlnisM 0 30 0
xuhancn 0 29 0
malfet 0 25 4
williamwen42 0 22 4
mlazos 0 23 1
clee2000 0 18 5
yf225 0 22 1
yushangdi 0 20 3
drisspg 0 22 1
desertfire 0 15 8
bdhirsh 0 11 9
oulgen 0 18 2
wz337 0 18 1
eellison 0 17 2
eqy 0 18 0
yanboliang 0 18 0
peterbell10 0 17 1
yifuwang 0 16 0
shunting314 0 11 5
wconstab 0 14 1
ZainRizvi 0 14 1
masnesral 0 15 0
atalman 0 9 6
rec 0 12 2
jbschlosser 0 12 1
etaf 0 6 6
mori360 0 12 0
pianpwk 0 12 0
zhxchen17 0 11 0
isuruf 0 10 1
ydwu4 0 11 0
wanchaol 0 10 1
albanD 0 8 3
jiashenC 0 9 1
guangyey 0 10 0
huydhn 0 8 2
d4l3k 0 10 0
sinhaanshul 0 10 0
aakhundov 0 10 0
qqaatw 0 8 1
angelayi 0 7 2
joydddd 0 9 0
soulitzer 0 7 2
justinchuby 0 6 3
leslie-fang-intel 0 8 0
zxd1997066 0 1 7
awgu 0 6 2
Skylion007 0 7 0
jgong5 0 7 0
shuqiangzhang 0 6 1
mikaylagawarecki 0 7 0
jananisriram 0 7 0
PaliC 0 6 0
FindHao 0 6 0
ColinPeppler 0 6 0
ZhiweiYan-96 0 6 0
aaronenyeshi 0 6 0
jataylo 0 6 0
Aidyn-A 0 4 2
fduwjj 0 6 0
jeffdaily 0 4 2
Danielmic 0 2 4
xuzhao9 0 5 0
nmacchioni 0 5 0
yanbing-j 0 4 1
peaceorwell 0 4 1
jansel 0 4 1
sijiac 0 3 2
XilunWu 0 5 0
fegin 0 5 0
guilhermeleobas 0 2 3
H-Huang 0 3 2
BoyuanFeng 0 5 0
IvanKobzarev 0 4 1
zxiiro 0 5 0
zdevito 0 5 0
YangQun1 0 3 2
ringohoffman 0 4 1
avikchaudhuri 0 5 0
CaoE 0 5 0
dvrogozh 0 0 5
jerryzh168 0 3 1
awayzjj 0 4 0
janeyx99 0 4 0
aartbik 0 4 0
nautsimon 0 4 0
tianyeeT 0 4 0
jamesjwu 0 4 0
syed-ahmed 0 3 1
awaelchli 0 2 2
davidberard98 0 2 2
xw285cornell 0 3 1
jiayisunx 0 4 0
fwenguang 0 3 1
henrylhtsang 0 4 0
tianyu-l 0 4 0
jeanschmidt 0 1 3
zhuhaozhe 0 4 0
jianc99 0 0 4
clessig 0 0 4
vmoens 0 0 4
xmfan 0 2 1
jovianjaison 0 3 0
furtnerthomas 0 3 0
khushi-411 0 3 0
majing921201 0 2 1
sraikund16 0 3 0
sdingcn 0 3 0
Valentine233 0 3 0
jhavukainen 0 3 0
xingyunjohn1 0 3 0
helloguo 0 3 0
oraluben 0 2 1
yangsiyu007 0 3 0
mayank31398 0 1 2
andriigrynenko 0 3 0
atuljangra 0 3 0
ppwwyyxx 0 2 1
chuanqi129 0 2 1
redwrasse 0 2 1
DiweiSun 0 3 0
pragupta 0 3 0
nicholasw-gc 0 3 0
nikonikolov 0 1 2
chunyuan-w 0 3 0
ashwani-rathee 0 3 0
kit1980 0 3 0
andrewor14 0 2 1
lw 0 2 1
rootjalex 0 0 3
albertz 0 0 3
tsengalb99 0 0 3
KnightGOKU 0 0 3
OrenLeung 0 0 3
jainapurva 0 2 0
nvcastet 0 2 0
mengph 0 1 1
AlexDenisov 0 2 0
tenpercent 0 2 0
bigfootjon 0 2 0
fenypatel99 0 2 0
titaiwangms 0 2 0
yan-yhy 0 2 0
jerrychenhf 0 1 1
sradc 0 1 1
haocizhang 0 2 0
chenyang78 0 2 0
EikanWang 0 2 0
connernilsen 0 2 0
r-barnes 0 2 0
izaitsevfb 0 2 0
datagero 0 2 0
tursom 0 1 1
fengyuan14 0 2 0
cccclai 0 2 0
michaeleisel 0 1 1
cdzhan 0 1 1
jamesperng 0 2 0
MaggieMoss 0 2 0
q10 0 2 0
dshi7 0 2 0
c-p-i-o 0 2 0
zhangfeiv0 0 2 0
WeiChunyu-star 0 2 0
YuqingJ 0 2 0
robert-hardwick 0 2 0
jianyuh 0 2 0
ankurneog 0 2 0
maxyanghu 0 1 1
Microve 0 1 1
koparasy 0 1 1
egienvalue 0 2 0
JackCaoG 0 2 0
brim1754 0 1 1
hongxiayang 0 2 0
krzysztofjordan 0 1 1
sanchitintel 0 2 0
YUNQIUGUO 0 2 0
SherlockNoMad 0 2 0
dilililiwhy 0 1 1
staugust 0 1 1
laithsakka 0 2 0
ani300 0 1 1
oniononion36 0 2 0
jithunnair-amd 0 1 1
blaine-rister 0 1 1
randolf-scholz 0 1 1
redradist 0 2 0
asiab4 0 2 0
wizzniu 0 2 0
rybakov 0 0 2
youkaichao 0 0 2
wbigat 0 0 2
tingyangk 0 0 2
stswidwinski 0 0 2
jakelevi1996 0 0 2
rohitdwivedula 0 0 2
WeizhuoZhang-intel 0 0 2
gau-nernst 0 0 2
njzjz 0 0 2
xinyu-intel 0 0 2
gilfree 0 0 2
benbellick 0 0 2
MoFHeka 0 0 2
GitHub 1 0 0
JonathanWenger 0 1 0
frost-intel 0 1 0
WenleiHe 0 1 0
hippocookie 0 1 0
RabbitWhite1 0 1 0
shengfukevin 0 1 0
hydeparksnow 0 1 0
richwomanbtc 0 1 0
Gunale0926 0 1 0
qingyunqu 0 1 0
ENUMERA8OR 0 1 0
zitongzhan 0 1 0
AlekseiNikiforovIBM 0 1 0
842974287 0 1 0
Fuzzkatt 0 1 0
bt2513 0 1 0
Xia-Weiwen 0 1 0
ZhaoqiongZ 0 1 0
VRSinghHabana 0 1 0
tmct 0 1 0
daulet-askarov 0 1 0
houqi 0 1 0
naromero77amd 0 1 0
dsjohns2 0 1 0
TsukiSky 0 1 0
AngryLoki 0 1 0
BeeGass 0 1 0
Mustafa-Hassan2001 0 1 0
aim-nara 0 1 0
Shan19900305 0 1 0
skotapati 0 1 0
valentinandrei 0 1 0
Stonepia 0 1 0
TiRune 0 1 0
dnikolaev-amd 0 1 0
mwlon 0 1 0
dulinriley 0 1 0
bertmaher 0 1 0
MatzeB 0 1 0
galv 0 1 0
yaochengji 0 1 0
xu-song 0 1 0
Alston-Tang 0 1 0
haampie 0 1 0
harshabhvr248 0 1 0
swolchok 0 1 0
alugorey 0 1 0
rlanday 0 1 0
inkcherry 0 1 0
sidt-meta 0 1 0
wlei-llvm 0 1 0
gag1jain 0 1 0
DenisVieriu97 0 1 0
alexcdennis 0 1 0
drewfustin 0 1 0
tchaikov 0 1 0
sanketpurandare 0 1 0
zejun-chen 0 1 0
shengbao-zheng 0 1 0
ahmadsarvmeily 0 1 0
dan-jacobson 0 1 0
soumith 0 1 0
cchan 0 1 0
DellCurry 0 1 0
Ryo-not-rio 0 1 0
frostedoyster 0 1 0
charlie-wt 0 1 0
lessw2020 0 1 0
adriaorenstein 0 1 0
jerrymannil 0 1 0
zixi-qi 0 1 0
crcrpar 0 1 0
Theo-Cheynel 0 1 0
zertosh 0 1 0
m1guelperez 0 1 0
adhithadias 0 1 0
chuanhaozhuge 0 1 0
kirtiteja 0 1 0
uniartisan 0 1 0
trixirt 0 1 0
CuiYifeng 0 1 0
arui-meta 0 1 0
Luthaf 0 1 0
danzimm 0 1 0
manuelcandales 0 1 0
Gasoonjia 0 1 0
davrot 0 1 0
erpang007chenfs 0 1 0
shink 0 1 0
retonym 0 1 0
akote123 0 1 0
jcaip 0 1 0
qchip 0 1 0
siju-samuel 0 1 0
adamjstewart 0 1 0
xuanzhang816 0 1 0
mengluy0125 0 1 0
erh94 0 1 0
frank-wei 0 1 0
judicaelclair 0 0 1
johnc-keen 0 0 1
Abhishekghosh1998 0 0 1
Laurick1 0 0 1
Gwihwan-Go 0 0 1
LOOKCC 0 0 1
HaoyuLiu12 0 0 1
NeoLegends 0 0 1
jokercw147 0 0 1
xfchangwei 0 0 1
linzs148 0 0 1
amitchawla1 0 0 1
wbigat2 0 0 1
Leonardo-Russo 0 0 1
dbl001 0 0 1
lezcano 0 0 1
ZenithGenius 0 0 1
y-sq 0 0 1
rogaits 0 0 1
hanwen-sun 0 0 1
s1030512149 0 0 1
sealoongleft 0 0 1
shyakocat 0 0 1
blackyang 0 0 1
v4if 0 0 1
Hjp-momojiji 0 0 1
lflis 0 0 1
Antonio-Moura-Coutinho 0 0 1
wht0948 0 0 1
vadimkantorov 0 0 1
xle97 0 0 1
pietrolesci 0 0 1
yiliu30 0 0 1
alexdremov 0 0 1
bryankaplan 0 0 1
younghuvee 0 0 1
FabianSchuetze 0 0 1
dannikay 0 0 1
BioGeek 0 0 1
quinnwillett 0 0 1
GdoongMathew 0 0 1
enrico-stauss 0 0 1
david-sitsky 0 0 1
battaglia01 0 0 1
MaltoseFlower 0 0 1
NicolasHug 0 0 1
rgommers 0 0 1
Vremold 0 0 1
lucasjinreal 0 0 1
valosekj 0 0 1
curtisvwalker 0 0 1
tinglvv 0 0 1
tylerjereddy 0 0 1
Qinlong275 0 0 1
RobuRishabh 0 0 1
wangjiangben-hw 0 0 1
AdrienCourtois 0 0 1
szmigacz 0 0 1
joacorapela 0 0 1
yaxan 0 0 1
guberti 0 0 1
sidijju 0 0 1
matthost 0 0 1
deo-abhijit 0 0 1
Coderx7 0 0 1
thanga-v2 0 0 1
rteehas 0 0 1
abcamiletto 0 0 1
akihironitta 0 0 1
muellerzr 0 0 1
Zzv213 0 0 1
optstat 0 0 1
UnbearableFate 0 0 1
cora-codes 0 0 1
airsplay 0 0 1
xTayEx 0 0 1
SperenzaNarra 0 0 1
KukumavMozolo 0 0 1
PeterSH6 0 0 1
kabyanil 0 0 1
aabtop 0 0 1
rvijayc 0 0 1
PierrunoYT 0 0 1
zhaohm14 0 0 1
accelerate321 0 0 1
SalmanMohammadi 0 0 1
Ignasijus 0 0 1
jamied157 0 0 1
yezhengmao1 0 0 1
fxmarty 0 0 1
ausstein 0 0 1
rohan-tan-bhowmik 0 0 1
BalancedTernary 0 0 1
xwang233 0 0 1
asglover 0 0 1
chadeos 0 0 1
AlexanderDokuchaev 0 0 1
leitian 0 0 1
kangchengX 0 0 1
Tim-Salzmann 0 0 1
ivodopyanov 0 0 1
Hongjie1Chu 0 0 1
platers 0 0 1
embg 0 0 1
coogle 0 0 1
carmocca 0 0 1
northfun 0 0 1
nbqu 0 0 1
wangkl2 0 0 1
sujuyu 0 0 1
ben-da6 0 0 1
biuq 0 0 1
ojh31 0 0 1
bigmover 0 0 1
ConnollyLeon 0 0 1
mattiadg 0 0 1
Giodiro 0 0 1
david-stojanovski 0 0 1
psandovalsegura 0 0 1
JamesMBartlett 0 0 1
EGanji 0 0 1
aws-caijune 0 0 1
Cztery 0 0 1
unsatisfying 0 0 1
fjneumann 0 0 1
fffelix-huang 0 0 1
Ly-Lynn 0 0 1
moghadas76 0 0 1
emosy 0 0 1
Quoding 0 0 1
ajindal1 0 0 1
urstrulyvishtan 0 0 1
Gamer-Guy12 0 0 1
Picaloer 0 0 1
SanityRemnants 0 0 1
emmaking-smith 0 0 1
maruel 0 0 1
svekars 0 0 1
Badr-MOUFAD 0 0 1
kentanabe 0 0 1
martintmv-git 0 0 1
seungjun-green 0 0 1
dcaustin33 0 0 1
mfbalin 0 0 1
Amir9663 0 0 1
ItamarKanter 0 0 1
songh11 0 0 1
wenbindu 0 0 1
rlrs 0 0 1
kshitij12345 0 0 1
johnmarktaylor91 0 0 1
kaiyuyue 0 0 1
samskalicky 0 0 1
tehbone 0 0 1
zezhang 0 0 1
Ma-Jian1 0 0 1
vshekhawat-hlab 0 0 1
Fannjh 0 0 1
xnming 0 0 1
Klomi 0 0 1
JoongunPark 0 0 1
LankyPoet 0 0 1
ebeyabraham 0 0 1
ezhang887 0 0 1
lanluo-nvidia 0 0 1
YousefMohamed101 0 0 1
mahao18cm 0 0 1
Viktor-Paul 0 0 1
DKchemistry 0 0 1
xiuguangLi 0 0 1
sujoysaraswati 0 0 1
KemalAltwlkany 0 0 1
Noor-Nizar 0 0 1
ptrblck 0 0 1
mr-raccoon-97 0 0 1
pritamdamania87 0 0 1
felipeliliti 0 0 1
cpprhtn 0 0 1
jiqing-feng 0 0 1
JohnGoldenGardiner 0 0 1
XinweiHe 0 0 1
zeobec 0 0 1
henrysky 0 0 1
StepinSilence 0 0 1
kcser-C 0 0 1
workingloong 0 0 1
scott-huberty 0 0 1
grimulkan 0 0 1
josephcappadona 0 0 1
rmdodhia 0 0 1
garrettbyrd 0 0 1
rscohn2 0 0 1
bohnstingl 0 0 1
Ogjerry 0 0 1
BeomseoChoi 0 0 1
yurivict 0 0 1
yliapis 0 0 1
nadav7679 0 0 1
cqy930325 0 0 1
michael080808 0 0 1
HolyWu 0 0 1
