Weekly Project News


Weekly GitHub Report for Pytorch: October 20, 2025 - October 27, 2025 (12:02:40)

Weekly GitHub Report for Pytorch

Thank you for subscribing to our weekly newsletter! Each week, we deliver a comprehensive summary of your GitHub project's latest activity right to your inbox, including an overview of your project's issues, pull requests, contributors, and commit activity.


Table of Contents

  • I. News
    • 1.1. Recent Version Releases
    • 1.2. Version Information
  • II. Issues
    • 2.1. Top 5 Active Issues
    • 2.2. Top 5 Stale Issues
    • 2.3. Open Issues
    • 2.4. Closed Issues
    • 2.5. Issue Discussion Insights
  • III. Pull Requests
    • 3.1. Open Pull Requests
    • 3.2. Closed Pull Requests
    • 3.3. Pull Request Discussion Insights
  • IV. Contributors
    • 4.1. Contributors

I. News

1.1 Recent Version Releases:

The current version of this repository is v2.6.0.

1.2 Version Information:

Released on January 29, 2025, PyTorch 2.6 introduces significant enhancements including torch.compile support for Python 3.13, a new dynamic compilation control API torch.compiler.set_stance, and improved AOTInductor packaging and ABI compatibility. Notable highlights also include FP16 support on x86 CPUs, expanded Intel GPU support with simplified installation, and a backward-incompatible security improvement flipping the default weights_only parameter in torch.load, alongside numerous performance optimizations, bug fixes, and deprecations such as the discontinuation of official Anaconda channel packages.
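
As a quick illustration of the two user-facing changes above, here is a minimal sketch; the checkpoint path is a placeholder, not a file from this report.

```python
import torch

# PyTorch 2.6 flips torch.load's default to weights_only=True, so only plain
# tensors and state_dicts are unpickled unless you opt out for trusted files.
state = torch.load("checkpoint.pt")                      # new, safer default
full = torch.load("checkpoint.pt", weights_only=False)   # old behavior; trusted files only

# torch.compiler.set_stance dynamically controls how torch.compile behaves.
torch.compiler.set_stance("force_eager")   # run compiled functions eagerly (e.g., while debugging)
torch.compiler.set_stance("default")       # restore normal compilation
```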

II. Issues

2.1 Top 5 Active Issues:

We consider active issues to be issues that have been commented on most frequently within the last week. Bot comments are omitted.

  1. 4x performance regression for 3D convs with AMP on torch 2.9.0: This issue reports a significant performance regression—about 4 times slower—in 3D convolution operations when using Automatic Mixed Precision (AMP) with PyTorch version 2.9.0 compared to 2.8.0. The regression is confirmed on an RTX 4090 GPU using both a standalone benchmark and the nnU-Net project, and it appears related to the disabling of cuDNN for 3D convolutions due to correctness issues in a recent cuDNN release.

    • Commenters confirmed the regression occurs in nightly builds and provided a minimal reproducible example showing excessive kernel calls causing slowdowns. The root cause is linked to disabling cuDNN for 3D convs, and the issue is being treated as high priority with efforts to re-enable cuDNN safely. Additional benchmarks and ops coverage for Conv3d are being considered, and users requested attention to related ops like ConvTranspose3d. A timing sketch in this spirit appears after this list.
    • Number of comments this week: 8
  2. Add helper functions to massage common 3D+ params into 2D for Muon: This issue proposes adding helper functions to standardize the conversion of common 3D or higher-dimensional parameters into 2D formats specifically for the Muon module, aiming to simplify and automate parameter compression based on recognized patterns. The motivation is to explicitly accept only matrices for Muon and to research existing methods for parameter "smooshing" to create reusable utilities and documentation.

    • The discussion began with a volunteer expressing interest in contributing, followed by guidance emphasizing the need to research existing compression techniques. The contributor planned to start with documentation and was provided with a relevant resource link to aid their work.
    • Number of comments this week: 4
  3. randn_like should take a generator: This issue requests that the function randn_like in PyTorch be updated to accept a generator parameter, similar to the existing randn function, to improve convenience and consistency. The user highlights that while randn allows specifying a generator, randn_like currently does not, which limits its usability.

    • The commenters express support for the feature and encourage the original poster to submit a pull request. One contributor offers to implement the change if no progress is made, while another invites the user to proceed with the PR, indicating no prior investigation into the implementation.
    • Number of comments this week: 3
  4. LELU Activation Function: Proposal for PyTorch: This issue proposes adding a new activation function called LELU (Logistic Error Linear Unit) to PyTorch as a computationally efficient and analytically consistent alternative to GELU, leveraging the logistic sigmoid function scaled by a factor derived from the logistic distribution variance. The discussion includes detailed implementations, benchmarking results comparing LELU to GELU and SiLU on different hardware, and optimized Triton kernel versions demonstrating competitive performance and correctness, along with training experiments showing LELU’s functional equivalence to GELU in a regression task.

    • The comments provide multiple LELU implementations including a PyTorch module using scaled SiLU, a benchmarking script comparing runtime on GPU and CPU, and a Triton-accelerated kernel with autograd support; results show LELU matches GELU in accuracy and training loss while sometimes being faster depending on hardware, and visualizations confirm similar activation shapes and distributions, supporting the proposal to integrate LELU into PyTorch.
    • Number of comments this week: 3
  5. aoti cross compile for windows failed with undefined reference to WinMain: This issue describes a failure when cross-compiling the aoti example for Windows on a Linux system, resulting in a C++ compile error related to an undefined reference to WinMain. The user provides detailed steps of their environment setup, including mingw installation, copying Windows CUDA libraries, and running a test script, but encounters a linker error during compilation.

    • The comments include a shared test script and a suggestion to try disabling precompiled headers by setting "aot_inductor.precompile_headers": False to potentially resolve the compilation issue.
    • Number of comments this week: 3
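
As referenced in item 1, the following is a minimal timing sketch in the spirit of the reported benchmark; the tensor shapes and iteration counts are illustrative, not the issue's exact setup, and a CUDA device is assumed.

```python
import time
import torch

device = "cuda"
x = torch.randn(2, 32, 64, 64, 64, device=device)
conv = torch.nn.Conv3d(32, 32, kernel_size=3, padding=1).to(device)

with torch.autocast(device_type="cuda", dtype=torch.float16):
    for _ in range(3):                       # warmup
        conv(x)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(20):
        conv(x)
    torch.cuda.synchronize()
    print(f"avg 3D conv forward: {(time.perf_counter() - start) / 20 * 1e3:.2f} ms")
```

Running the same script against 2.8.0 and 2.9.0 installs and comparing the printed time is enough to surface a regression of this kind.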

2.2 Top 5 Stale Issues:

We consider stale issues to be issues that have had no activity within the last 30 days. The team should work together to get these issues resolved and closed as soon as possible.

  1. ImportError: cannot import name 'triton_key' from 'triton.compiler.compiler': This issue reports an ImportError encountered when attempting to import the name 'triton_key' from the 'triton.compiler.compiler' module, which causes a backend compiler failure in PyTorch's inductor backend during model compilation. The user provides detailed environment information, including PyTorch version 2.4.0 development build, CUDA 12.1, and Ubuntu 22.04, and demonstrates the error occurring while compiling specific pipeline components with torch.compile in a custom pipeline setup.
  2. Alternate algorithm for computing MaxPool2D under specific condition: This issue proposes an alternate algorithm for computing MaxPool2D when the stride is equal to 1, by representing a larger kernel size (e.g., 5 or 7) as multiple smaller MaxPool2D operations with kernel size 3. This method aims to reduce computational cost on the CPU by decreasing the number of operations per cell and suggests modifying the MaxPool2D layer directly to avoid additional overhead during backpropagation, with demonstrated speedup in testing. A sketch of this equivalence appears after this list.
  3. cuda_utils.so: failed to map segment from shared object: This issue describes a problem encountered when running a PyTorch model inside a Docker container with a tmpfs mounted at /tmp having permissions set to 1777. Although the model compiles successfully, execution fails with an error indicating that the shared object cuda_utils.so cannot be mapped due to missing execute permissions on the file, despite the script running as root and the directories having appropriate permissions.
  4. Enable UFMT on all files in PyTorch: This issue addresses the task of enabling uniform formatting (UFMT) across all files in the PyTorch codebase, specifically targeting approximately 1,500 files that are currently excluded from UFMT. It outlines the process for removing files from the exclusion list, running the formatter, handling known formatting-related problems, and organizing the work by directory to facilitate incremental and reviewable changes.
  5. [JIT archive] Add a flag to not include debug files: This issue proposes adding a flag to the torch.jit.save() function that allows users to exclude debug files, specifically .debug_pkl files, from the JIT archive to reduce the overall file size. The motivation stems from observations that these debug files, which are only used for debugging purposes, can significantly increase the archive size without affecting model correctness, making the feature particularly beneficial for deploying smaller models on mobile devices.
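
As noted in item 2, the stride-1 decomposition can be checked directly. The sketch below illustrates the equivalence the issue relies on; it uses the fact that MaxPool2d applies implicit negative-infinity padding, so the padded forms match as well.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)

pool5 = nn.MaxPool2d(kernel_size=5, stride=1, padding=2)
pool3 = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)

# Two chained 3x3 stride-1 pools cover the same 5x5 window as one 5x5 pool.
print(torch.equal(pool5(x), pool3(pool3(x))))  # expected: True
```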

2.3 Open Issues

This section lists, groups, and then summarizes issues that were created within the last week in the repository.

Issues Opened This Week: 78

Summarized Issues:

  • Functionality and API Enhancements: Several issues request improvements or additions to PyTorch functions and APIs to enhance usability and compatibility. These include adding generator support to randn_like for convenience, supporting local sentinel values in Dynamo tracing to avoid graph breaks, adding a mask parameter to Conv2d for efficient masked convolutions, and enabling extras packages installation to simplify dependency management. These enhancements aim to make PyTorch more flexible and user-friendly in various scenarios. A workaround for the randn_like gap is sketched after this list.
  • [issues/165865, issues/165901, issues/166080, issues/166167]
  • Compilation and Runtime Errors in Torch Compile and Inductor: Multiple issues report errors and unexpected behaviors during compilation or runtime with torch.compile and the Inductor backend. Problems include assertion errors with torch.bmm and Triton backend, compilation errors with side-effect operations inside torch.cond, dead code elimination failures due to aliasing and mutations, and assertion failures during embedding lowering caused by unexpected dtype. These bugs hinder reliable compilation and execution of PyTorch models using these newer compilation features.
  • [issues/165892, issues/165981, issues/166009, issues/166042]
  • Distributed and Parallel Computing Issues: Several issues highlight problems related to distributed training and parallelism. These include silent shape mismatch errors when loading 1D tensors into scalar parameters, lack of vmap support with checkpointing causing runtime errors, unexpected conversion of distributed nn.Parameter to tensor on to_local(), and requests for no_shard strategy support in fully_shard API to control sharding levels. These issues affect correctness and flexibility in distributed model training.
  • [issues/165873, issues/165880, issues/166153, issues/166156, issues/165933]
  • Backend and Hardware Compatibility Problems: There are reports of compatibility and performance issues across different hardware and backends. Examples include incorrect results using torch.bmm on large CPU tensors with specific matmul precision, CUDA backend size limits causing failures in torch.linalg.eigh, ROCm test instability and failures, and CUDA illegal memory access errors on certain GPUs with compiled kernels. These problems impact reliability and performance on various platforms.
  • [issues/165906, issues/166004, issues/165966, issues/166070, issues/166108]
  • Profiling and Debugging Limitations: Issues point out gaps and failures in PyTorch's profiling and debugging tools. Requests include enhanced documentation for profiler key_averages, flaky test failures in profiler-related tests, and memory event recording failures when profiling all threads. These limitations reduce the effectiveness of performance analysis and debugging workflows.
  • [issues/165907, issues/165949, issues/166121]
  • ONNX Export and Model Conversion Failures: Several issues report failures when exporting models or components to ONNX format, including errors with batch normalization layers in torchvision CNNs and conversion errors triggered by specific functions like math.trunc. These failures limit interoperability with other frameworks and deployment tools.
  • [issues/166110, issues/166163]
  • Memory and Resource Management Bugs: Reports include memory overlap issues in in-place triangular matrix operations causing silent errors, semaphore resource leakage warnings in multiprocessing with torch.compile, and internal assertion failures in CUDA caching allocator during model decoding. These bugs can cause crashes, leaks, or silent data corruption.
  • [issues/165987, issues/166061, issues/166234]
  • Documentation and Usability Gaps: Some issues highlight missing or unclear documentation, such as the behavior of the dim argument in torch.unique, and user requests like adding a Chinese README version. Improving documentation clarity and localization can enhance user experience and reduce confusion.
  • [issues/165985, issues/166143]
  • Performance Regressions and Optimization Opportunities: There are reports of significant performance regressions, such as a 4x slowdown in 3D convolutions with AMP in PyTorch 2.9.0, inefficiencies in MultiheadAttention fast_path with attention masks, and proposals to improve associative_scan() performance using NVIDIA CCCL. Addressing these can restore or improve PyTorch's computational efficiency.
  • [issues/166122, issues/166166, issues/165999]
  • Build and Environment Issues: Several issues describe build failures or environment-related problems, including linker errors when cross-compiling for Windows, glibc version mismatches causing build confusion, CUDA architecture recognition errors, and concerns about Conda licensing in Dockerfiles. These affect developers' ability to build and deploy PyTorch reliably.
  • [issues/166093, issues/166101, issues/166120, issues/166233]
  • Alias, Mutation, and Autograd Bugs: Some issues report subtle bugs related to aliasing and mutation handling in compilation and autograd. Examples include incorrect Dead Code Elimination with fallback operators producing aliases, and custom autograd functions returning views with incorrect requires_grad attributes. These bugs can cause incorrect gradients or compilation failures.
  • [issues/166009, issues/166131]
  • Dynamo and Bytecode Transformation Errors: Multiple issues report internal errors and key errors during Dynamo compilation and bytecode transformation, often triggered by graph breaks, conditional attributes, or unsupported function calls like collections.defaultdict. These errors disrupt the tracing and compilation process in Dynamo.
  • [issues/166033, issues/166176, issues/166238]
  • Feature Requests for Backend and Accelerator Support: Requests include adding DispatchKey.AutocastXPU support in Triton backend, enabling graph capture and profiling for custom accelerator backends, and evaluating XLA as a compile-time backend. These aim to broaden hardware and backend support in PyTorch.
  • [issues/166054, issues/166106, issues/166205]
  • Miscellaneous Proposals and Fixes: Other issues cover a variety of topics such as adding a new fixed-scaling sigmoid activation (LELU), improving optimizer management with OptimizerDict, standardizing parameter compression for Muon, and adding the six library as a submodule to avoid repeated downloads. These contribute to PyTorch's feature set and build efficiency.
  • [issues/165982, issues/166208, issues/166209, issues/166064]
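
As mentioned in the first bullet above, randn_like currently has no generator argument, so seeded sampling with matching shape, dtype, and device is typically written out by hand. A small sketch follows; the desired call is hypothetical and shown only as a comment.

```python
import torch

g = torch.Generator().manual_seed(0)
t = torch.empty(4, 5, dtype=torch.float32)

# Desired (not available today): torch.randn_like(t, generator=g)
# Current workaround:
sample = torch.randn(t.shape, generator=g, dtype=t.dtype, device=t.device)
```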

2.4 Closed Issues

This section lists, groups, and then summarizes issues that were closed within the last week in the repository. This section also links the associated pull requests if applicable.

Issues Closed This Week: 18

Summarized Issues:

  • MPS Backend Numerical and Functional Issues: Several problems affect the MPS backend, including incorrect and unstable results from torch.linalg.inv during batched matrix inversions, inconsistent error handling in torch.linalg.lu_factor on singular matrices, missing support for torch.linalg.householder_product, and non-deterministic outputs from index_copy likely caused by repeated indices. These issues highlight discrepancies between MPS and CPU implementations that can lead to incorrect computations or unsupported operations on Apple Silicon devices.
  • issues/165850, issues/165870, issues/166089, issues/166237
  • CUDA and Hardware Compatibility Problems: Users face CUDA errors and compatibility issues including a CUDA out-of-memory error on NVIDIA RTX 5090 due to an outdated CUDA runtime, lack of support for the RTX 5090 GPU's compute capability sm_120, and a CUDA kernel launch failure in reflect padding mode when batch sizes exceed 65536. These problems cause runtime failures and warnings, requiring driver updates or additional hardware support in PyTorch.
  • issues/165861, issues/165964, issues/166060
  • ROCm Platform Test Failures and Disabling: Multiple tests such as test_allocator_backend in TestCudaMallocAsync and test_blockwise_nvfp4_with_global_scale_512_128_256_cuda have been disabled on ROCm due to failures on the main branch, with ongoing efforts to implement fixes rather than revert recent changes. Temporarily disabling these tests prevents them from obscuring continuous integration results while the team works on forward fixes.
  • issues/165872, issues/166027
  • PyTorch Dynamo Tracing and Export Failures: The _dynamo_graph_capture_for_export function exhibits multiple issues including incorrect user code stack traces causing confusing AttributeErrors, failure to trace modules with unsupported keyword argument types like BlockMask, and errors when passing keyword arguments to aot_export_joint_with_descriptors. These problems complicate debugging and prevent successful model export with conditional computations or certain argument types.
  • issues/165911, issues/165948, issues/165951
  • Documentation and Usage Clarifications: The torch.mean documentation lacks clarity regarding unsupported integer input types such as torch.long, which cause runtime errors. Explicitly stating that only floating-point inputs are valid aims to reduce user confusion and prevent misuse of the function. A short example of the behavior appears after this list.
  • issues/166020
  • Graph Partitioning and Forward Method Argument Errors: Partitioning a PyTorch FX graph using example code results in runtime failures because the generated graph module's forward method lacks required input arguments. This issue prevents correct execution of partitioned graphs and requires fixes to the partitioning logic.
  • issues/166034
  • Compiler Internal Errors on RISC-V with RVV: Compiling DepthwiseConvKernel.cpp with GCC 14.2 for RISC-V with RVV enabled triggers an internal compiler error due to read-modify-write operations on the same memory reference. The issue is a GCC compiler bug that can be avoided by refactoring code to use temporary vectors before writing back.
  • issues/166057
  • Feature Request for New Convolution Layer: A proposal has been made to add torch.nn.DiagonalConv2d, a convolution layer performing operations along input tensor diagonals to enable diagonal feature extraction with a different output shape than standard 2D convolutions. This feature aims to expand PyTorch's convolution capabilities.
  • issues/166069
  • Default Device Behavior Clarification: The torch.normal function generates CPU tensors even when the default device is set to CUDA via torch.set_default_device('cuda'), which is expected behavior because factory functions do not respect the default device unless explicitly specified. This clarification helps users understand device assignment behavior in PyTorch.
  • issues/166104
  • Infrastructure Outage Impact: A major AWS outage caused the PyTorch project's GitHub Actions infrastructure to go down, with ongoing recovery and mitigation efforts described. This incident affected continuous integration and development workflows temporarily.
  • issues/165909
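
As an example for the torch.mean documentation item above, integer inputs are rejected and need an explicit cast; a small sketch:

```python
import torch

t = torch.arange(5)              # dtype torch.int64
try:
    t.mean()
except RuntimeError as err:
    print("mean() on an integer dtype raises:", err)

print(t.float().mean())          # cast first (or pass dtype=torch.float32 to mean)
```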

2.5 Issue Discussion Insights

This section analyzes the tone and sentiment of discussions within this project's open and closed issues that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.

Based on our analysis, there are no instances of toxic discussions in the project's open or closed issues from the past week.


III. Pull Requests

3.1 Open Pull Requests

This section provides a summary of pull requests that were opened in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.

Pull Requests Opened This Week: 155

Key Open Pull Requests

1. CustomOp Inline Fusion: This pull request extends the custom operation autotuning framework by adding inline fusion support, allowing the best decomposition of a custom op to be inlined directly into the computation graph to enable better fusion with surrounding operations, thereby improving performance and memory efficiency.

  • URL: pull/165952
  • Merged: No
  • Associated Commits: 2bd0a, 3c282, 23cba, 63483, 2c1ec, 68826, b2b93, a9111, 124bc, c9816, 9a27f, 901dc, 6623f

2. [XPU] [1/2] add fp8 scaled_mm implementation for XPU: This pull request implements the scaled_mm operation for XPU, supporting TensorWise and RowWise scaling with fp8 data types (fp8_e4m3 and fp8_e5m2), while deferring BlockWise scaling and operation registration to subsequent pull requests to reduce review complexity.

  • URL: pull/165978
  • Merged: No
  • Associated Commits: cd27f, 9faa0, 0257a, 02a71, 9b96f, 969e6, 3a5c5, 5814f, bec7d, aef0f, 46730, b81f5, e1f2a

3. [Inductor UT] Enable more UTs for Intel GPU.: This pull request enables additional Inductor unit tests for Intel GPU and increases the number of test runners from 8 to 12 to accommodate the expanded test suite and prevent continuous integration timeouts.

  • URL: pull/166047
  • Merged: No
  • Associated Commits: e9aff, 003e5, 6a6d4, 1db0f, ecea2, fd73d, 4e7ae, 109ae, 08888, 0d3d3, 1a674

Other Open Pull Requests

  • AOT Export and Inductor Enhancements: Multiple pull requests improve AOT export and PyTorch Inductor functionality by updating export methods to handle symint creation correctly and adding full graph autotuning for kernel generation. These changes ensure better tracing, metadata tracking, and kernel output consistency in compiled models.
  • [pull/165931, pull/166053, pull/165969, pull/165967]
  • DTensor Local Tensor Mode Expansion: Several pull requests enable and expand local tensor mode support for DTensor tests, including redistribute tests with uneven sharding and a broad set of operations such as optimizers, matrix ops, and convolutions. These updates add missing functional collectives and improve compatibility with local tensor machinery.
  • [pull/166081, pull/166105]
  • Associative Scan Lowering and Masking: Two pull requests focus on lowering the reverse flag for the associative_scan operation to the Triton level and masking computations for zero-loaded inputs. This work prepares the groundwork for more efficient and correct associative_scan execution in Dynamo.
  • [pull/166100, pull/166099]
  • Dynamo Component Improvements: Pull requests add missing XOR binary operation support, replace FUNCTION_MATCH with CLASS_MATCH guards, and improve debugging hooks in __torch_dispatch__. These changes enhance Dynamo's functionality, readability, and observability. A small example exercising the XOR operator appears after this list.
  • [pull/166065, pull/166217, pull/166142]
  • CI Pipeline and Testing Enhancements: Multiple pull requests improve continuous integration by integrating Attention operation tests, adding a cuDNN version smoke test, modifying ROCm CI workflow for stability, and porting tests to Intel GPU support. These efforts increase test coverage and reliability across platforms.
  • [pull/165915, pull/165891, pull/165997, pull/165886]
  • Code Quality and Safety Improvements: Pull requests replace C-style casts with C++-style casts and switch the type checker from MyPy to Pyrefly to reduce lint noise and improve code quality management. These changes contribute to safer and cleaner codebase maintenance.
  • [pull/165891, pull/166197]
  • Tensor Operation Updates in torchfuzz: One pull request adds and updates multiple tensor operations such as split, chunk, stack, cat, expand, gather, cumsum, clamp, and index_select within the torchfuzz testing framework to enhance fuzz testing coverage.
  • [pull/166221]
  • Graph and Module Compilation Improvements: Pull requests introduce GraphModule.recompile_submodules to ensure submodules are recompiled and implement cudagraph partitioning as an FX pass to decouple graph partitioning from cudagraph wrappers. These changes improve modularity and compilation correctness.
  • [pull/166002, pull/165945, pull/165922]
  • MIOpen and Precision Support under ROCm: One pull request adds mxfp8 precision support, revamps MIOpen integration following best practices, and adds GitHub workflows for IFU automation while maintaining backward compatibility.
  • [pull/166184]
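
As a small example for the Dynamo XOR item above, this is the shape of code that exercises the ^ operator under torch.compile; whether a given PyTorch build previously graph-broke here, and exactly which cases the pull request covers, is not verified from this summary.

```python
import torch

@torch.compile
def xor_op(a, b):
    return a ^ b                 # bitwise XOR, the operation the pull request adds support for

a = torch.randint(0, 16, (8,))
b = torch.randint(0, 16, (8,))
print(xor_op(a, b))
```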

3.2 Closed Pull Requests

This section provides a summary of pull requests that were closed in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.

Pull Requests Closed This Week: 119

Key Closed Pull Requests

1. updated supported/prefer hipblaslt architectures : This pull request updates the supported and preferred architectures for hipblaslt in the PyTorch project to ensure compatibility and optimization for targeted GPU architectures.

  • URL: pull/166133
  • Merged: No
  • Associated Commits: 421d4, 8de5c, 03eb1, 3d53a, 998ff, 4f079, ec45b, 7d5af, 9f3fd, 3e6f0, 8e00a, 2b0f8, 3db12, d91af, 0cb3b, b1040, 8b879, 7228e, 9fd94, 34725, 30a9a, f6166, 90d16, a83d7, 65508, b0543, d1d97, f885f, d2444, 0afa9, 058b5, fd227, 6e080, 17578, 983b9, bd17a, f0101, ff217, b5353, 2140d, e52ee, 779e6, 40e74, 8dd85, f2b69, 9bd20, 390ea, b30e6, ad337, 06152, 1d51e, f41c6, 5d526, 620eb, a6c04, d19e0, 55346, 9167a, 20a0e, c76b2, a1cb3, 71a30, 8fe04, 18a50, 19367, 4b463, 1befb, fad6b, 2f824, 1963d, 30252, 3d102, cb987, 85ac5, 62c67, 86e58, 2074e, 2b25d, ca125, 96009, d568c, b26dd, 53829, 7b590, 61c07, 730c7, fb814, eb343, 9f118, ecc20, 1b442, cdfe1, 2d72f, a0ffd, 22d02, ed0d0, d010d, 9c429, ccdb1, ad6b8, 77a67, ade02, e96dc, 2975e, b4af4, 1d7b9, 2067a, eb471, d2d97, c3d28, 4febb, 419fb, ab27a, 0def0, 75c80, c03be, 64359, b2fb6, 8d179, fd4b1, c1404, 1a9ca, b2d45, 7b2a4, 6aaab, 0e570, 9a46f, 9596b, 9ea02, 675f8, db3ba, aeb64, a20c7, 66514, 0b82d, bd740, dfd38, 245bf, 2cd73, cbd27, fe1f5, b2b16, 336f2, 7a520, 0a0b4

2. [dynamo][remaining] Replace UserFunctionVariable with VariableTracker build: This pull request proposes replacing UserFunctionVariable with VariableTracker build in the Dynamo component to prevent future issues related to functools.partial or callable objects.

  • URL: pull/165896
  • Merged: No
  • Associated Commits: 60880, b9628, 456a3, 03d0c, 7ee52, 1476d, b75a2, c2582, b051d

3. Refactor api and configs of overlapping: This pull request aims to refactor the API and configuration management of the overlapping module by migrating important configuration values into a dedicated class, passing them directly into the class, and adding an optional configuration to enable inside inductor functionality.

  • URL: pull/166130
  • Merged: No
  • Associated Commits: 07d13, ed4b2, f4278, e7bb4, 6d6fc, 00427, 27652

Other Closed Pull Requests

  • API Hiding and Namespace Encapsulation: Several pull requests focus on hiding APIs and symbols within specific namespaces in PyTorch to prevent symbol conflicts and improve encapsulation. These include introducing macros for hidden namespaces, hiding APIs in the torch::stable and torch::headeronly namespaces, and using alternative methods to hide stable Library structs, aiming to reduce exposed interfaces and avoid unintended cross-extension usage.
    • pull/166076, pull/166077, pull/166078, pull/166079
  • Runtime Assertions and Compiler Robustness: One pull request addresses dropped runtime assertions in conditional higher order operations by ensuring the runtime assertion FX graph pass runs on subgraphs and resetting the fake mode unbacked memo across speculate subgraph invocations. This improves the correctness and robustness of runtime asserts across various compiler phases such as eager, aot_eager, and inductor.
    • pull/165893
  • Optimization and Bug Fixes in FX and Foreach Functors: Pull requests include an optimization attempt for the torch.fx.Node.replace_all_uses_with method and a bug fix ensuring consistent definition of the chunk_size variable as int64_t for Foreach functors. These changes aim to improve performance and correctness in their respective components.
    • pull/165889, pull/165971
  • Python Bytecode and Opcode Compatibility: A pull request fixes the creation of the BINARY_SUBSCR opcode to maintain compatibility with Python 3.14 and later, where BINARY_SUBSCR was replaced by BINARY_OP(opcode=BN_SUBSCR). This ensures PyTorch's bytecode handling remains up to date with Python changes.
    • pull/165864
  • Documentation and Typographical Corrections: One pull request corrects typographical errors in the MTIA backend documentation, fixing grammatical mistakes and misspelled parameter names to improve clarity and accuracy.
    • pull/165898
  • Kernel Configuration Flexibility: A pull request proposes enabling the BlockPtrOptions and TensorDescriptorOptions classes within TritonKernel to be overridden, allowing subclasses to implement custom behavior and increasing kernel configuration flexibility.
    • pull/165899
  • Type Suppressions and Linter Fixes in Inductor Runtime: One pull request attempts to reintroduce type suppressions to the _inductor/runtime module after a previous revert, including running a linter and removing changes to third-party code to maintain code quality and consistency.
    • pull/165918
  • Backend-Specific Enhancements and Fixes: Multiple pull requests target backend improvements, including adding an XPU component for the persons_of_interest module, deserializing loads in the planar sum portion of the reduce() and stats() functions for the ROCm backend, and moving the hypot function implementation to the MPS backend to prevent crashes with integer tensors.
    • pull/165920, pull/165927, pull/166021, pull/166216
  • Autotune and Inductor Stability Improvements: A pull request aims to gracefully restart the autotune subprocess in PyTorch Inductor after CUDA kernel launch failures during benchmarking, preventing unrecoverable states and allowing autotuning to continue under specific settings.
    • pull/166073
  • Trunk Tagging Workflow Enhancement: One pull request enhances the trunk tagging workflow by enabling the creation of tags for multiple commits within a single push event, addressing limitations with ghstack pushes that include multiple commits.
    • pull/165937
  • Deterministic Mode in Inductor: A pull request proposes enabling Inductor's deterministic mode by integrating it with torch.use_deterministic_algorithms to ensure reproducible behavior in PyTorch computations. An example of that API appears after this list.
    • pull/165950
  • Labeling Automation Updates: One pull request proposes minor updates to label-to-label automation, making the "vllm-compile" label imply "module: vllm" and "oncall: pt2," while disabling automatic labeling of Flex issues as HigherOrderOperators to reduce noise and allow manual application of such labels.
    • pull/166172
  • Unmerged Feature and Fix Proposals: Several pull requests propose new features or fixes that were not merged, including adding documentation for Symmetric Memory, adding a generator argument to rand*_like APIs, fixing the broken vllm test build, relanding Node class method moves from Python to C++, and adding an XPU component for persons_of_interest.
    • pull/166148, pull/166159, pull/166146, pull/165882, pull/165920
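
As an example for the deterministic-mode item above, the sketch below shows the existing API the pull request integrates with; the Inductor-specific behavior it enables is described in the pull request itself and is not reproduced here.

```python
import torch

torch.use_deterministic_algorithms(True)   # nondeterministic kernels now raise instead of silently varying

@torch.compile
def project(inp, weight):
    return inp @ weight

out = project(torch.randn(32, 64), torch.randn(64, 16))
```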

3.3 Pull Request Discussion Insights

This section analyzes the tone and sentiment of discussions within this project's open and closed pull requests that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.

Based on our analysis, there are no instances of toxic discussions in the project's open or closed pull requests from the past week.


IV. Contributors

4.1 Contributors

Active Contributors:

We consider an active contributor in this project to be any contributor who has made at least 1 commit, opened at least 1 issue, created at least 1 pull request, or made more than 2 comments in the last month.

If there are more than 10 active contributors, the list is truncated to the top 10 based on contribution metrics for better clarity.

Contributor   Commits   Pull Requests   Issues   Comments
bobrenjc93        201              38       28         12
cyyever           164              57        0         25
anijain2305       179              19        4          7
malfet             82              13       10         70
pianpwk           113              29        1          3
Skylion007         15               8        1        112
eellison           79               9        2         41
laithsakka         93              10        3         23
guangyey           96              12        0         21
ezyang             39              12        8         67
