Weekly Project News

December 8, 2025

Weekly GitHub Report for PyTorch: December 01, 2025 - December 08, 2025 (12:02:17)

Weekly GitHub Report for PyTorch

Thank you for subscribing to our weekly newsletter! Each week, we deliver a comprehensive summary of your GitHub project's latest activity right to your inbox, including an overview of your project's issues, pull requests, contributors, and commit activity.


Table of Contents

  • I. News
    • 1.1. Recent Version Releases
    • 1.2. Version Information
  • II. Issues
    • 2.1. Top 5 Active Issues
    • 2.2. Top 5 Stale Issues
    • 2.3. Open Issues
    • 2.4. Closed Issues
    • 2.5. Issue Discussion Insights
  • III. Pull Requests
    • 3.1. Open Pull Requests
    • 3.2. Closed Pull Requests
    • 3.3. Pull Request Discussion Insights
  • IV. Contributors
    • 4.1. Contributors

I. News

1.1 Recent Version Releases:

The current version of this repository is v2.6.0

1.2 Version Information:

Released on January 29, 2025, PyTorch 2.6 introduces significant enhancements including torch.compile support for Python 3.13, a new dynamic compilation control API torch.compiler.set_stance, and improved AOTInductor packaging and ABI compatibility. Notable highlights also include FP16 support on X86 CPUs, expanded Intel GPU support, FlexAttention for X86 CPUs targeting LLMs, and a backward-incompatible security improvement flipping the default of torch.load to weights_only=True, alongside the deprecation of official Conda package publishing.
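For readers who want to try the two user-facing changes called out above, here is a brief sketch; the file name checkpoint.pt is a placeholder, and the snippet assumes PyTorch 2.6 or newer.

    import torch

    # torch.compiler.set_stance switches compilation behavior globally
    # without editing torch.compile call sites.
    torch.compiler.set_stance("force_eager")  # run compiled functions eagerly for now
    torch.compiler.set_stance("default")      # restore normal compilation

    # torch.load now defaults to weights_only=True, restricting unpickling to
    # tensors and other allowlisted types; checkpoints that contain arbitrary
    # Python objects need an explicit opt-out.
    state = torch.load("checkpoint.pt")                       # safe default in 2.6
    legacy = torch.load("checkpoint.pt", weights_only=False)  # pre-2.6 behavior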

II. Issues

2.1 Top 5 Active Issues:

We consider active issues to be issues that have been commented on most frequently within the last week. Bot comments are omitted.

  1. Triton Update 3.6 broke - trunk / linux-jammy-cuda12.8-py3.10-gcc11 / test: This issue reports that the update to Triton version 3.6 caused certain Linux CUDA tests on the PyTorch trunk branch to hang or time out, specifically due to a suspected deadlock involving the x_grid_barrier synchronization primitive in Triton-generated kernels. The investigation includes reproducing the problem, analyzing generated Triton code, comparing LLVM IR and PTX outputs between versions 3.5 and 3.6, bisecting commits to identify the regression, and planning a revert and further fixes to address layout and synchronization mismatches introduced by the update.

    • The comments detail multiple test timeouts and hangs traced to a cooperative reduction kernel using x_grid_barrier, with no failures when tests run individually, suggesting inter-test interference. The investigation involved deep dives into Triton-generated code, LLVM and PTX differences, and bisecting nightly builds to isolate the problematic commit, revealing a reverted upstream change causing the issue; a revert on the release branch was verified to unblock CI, with plans to understand and fix the root cause in the kernel or grid generation logic.
    • Number of comments this week: 21
  2. torch.compile bug: class-level list mutations are dropped: This issue reports a bug in PyTorch's torch.compile where mutations to class-level lists (specifically using list.append) inside a model's forward method are dropped, resulting in unexpected behavior where the list remains empty after compilation. The discussion identifies that the problem arises because the tracing mechanism does not properly handle mutations on global lists due to missing source tracking and incorrect mutation type handling, and contributors explore potential fixes including adjusting mutation checks and improving how variable sources are managed during tracing. A minimal sketch of the reported pattern appears after this list.

    • The comments analyze the root cause of the graph break triggered by list.append during compilation, confirming that the tracing system does not support this mutation method properly. They discuss the risks of broadly enabling mutation tracking, clarify the role of source tracking in variable creation, and propose code refactors to separate mutable and immutable variable handling. Contributors share debugging logs, smaller repro scripts, and audit related code files to ensure consistent source passing, culminating in a proposed function to create mutable list variables with proper source tracking, which resolves the reproduction case.
    • Number of comments this week: 10
  3. [DeviceMesh / Torch>=2.9.1] Feature regression on DeviceMesh._flatten() + DeviceMesh.getitem() after moving to CuTe backend for mesh algebra.: This issue reports a feature regression in the DeviceMesh API where flattening and indexing operations that worked in Torch 2.9.0 fail in versions 2.9.1 and later due to changes in the CuTe backend's coalesce method, specifically when flattening dimensions that do not include the last dimension. The user requests clarification on whether this behavior is intended and seeks guidance on future-supported APIs for sub-mesh retrieval, as the current _flatten method may be deprecated and impacts their usage in distributed sharding strategies.

    • The discussion confirms the regression is a corner case related to the CuTe backend's handling of strides and that the problematic behavior is slated for deprecation rather than immediate removal. Contributors suggest using the complementary _unflatten API as a more explicit and future-proof approach, emphasizing a design philosophy favoring explicit mesh transformations over implicit ones. The issue was fixed in a recent pull request, and ongoing efforts aim to improve DeviceMesh usability and stability, with users encouraged to migrate to recommended patterns and provide feedback on related bugs.
    • Number of comments this week: 9
  4. [DTensor] A New Placement -- RaggedShard: This issue proposes adding a new DTensor placement called RaggedShard to PyTorch DTensor, which supports uneven sharding across devices with configurable sharding granularity, addressing limitations of the current even-only Shard placement. The feature aims to enable advanced use cases such as block-wise quantization, structure-aware optimizers, and improved training throughput by eliminating unnecessary data copies, while also requesting a simpler interface for adding custom placements without intrusive core changes.

    • The comments discuss the technical challenges of integrating RaggedShard, especially regarding operator sharding rules and multi-dimension sharding complexity. Contributors express interest in making DTensor more extensible for custom sharding types, highlight the importance of contiguous memory for performance, and explore potential approaches like symbolic integers and redesigning sharding propagation rules to better support RaggedShard.
    • Number of comments this week: 7
  5. [Py 3.14] Cannot swap t1 because it has weakref associated with it: This issue reports test failures encountered during the Python 3.14 migration of PyTorch, specifically caused by a runtime error when attempting to swap tensors that have associated weak references. The problem appears linked to changes in garbage collection behavior affecting FakeTensor.fake_mode and its weakref caches, leading to difficulties in managing tensor parametrizations and requiring investigation into potential fixes involving reference handling and garbage collection cleanup.

    • The discussion centers on identifying the root cause related to garbage collection changes in Python 3.14 affecting weak references in fake tensors, with attempts to fix it by adjusting cache cleanup and reference types; multiple approaches were tested but caused other failures, leading to ongoing efforts to manually clean reference cycles or add explicit garbage collection calls to resolve the issue more cleanly.
    • Number of comments this week: 7
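For item 2 above, a minimal sketch of the reported pattern (not the exact reproduction from the issue; the class attribute name log is hypothetical) looks like this:

    import torch

    class Model(torch.nn.Module):
        log = []  # class-level list shared by all instances (hypothetical name)

        def forward(self, x):
            Model.log.append("forward called")  # mutation reportedly dropped under torch.compile
            return x + 1

    compiled = torch.compile(Model())
    compiled(torch.randn(4))
    # Eager execution leaves ['forward called'] in Model.log; the issue reports
    # that the list stays empty after compilation.
    print(Model.log)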

2.2 Top 5 Stale Issues:

We consider stale issues to be issues that have had no activity within the last 30 days. The team should work together to get these issues resolved and closed as soon as possible.

  1. ImportError: cannot import name 'triton_key' from 'triton.compiler.compiler': This issue reports an ImportError encountered when attempting to import the name 'triton_key' from the 'triton.compiler.compiler' module, which causes a backend compiler failure in PyTorch's inductor backend during model compilation. The user provides detailed environment information, including PyTorch version 2.4.0.dev, CUDA 12.1, and Ubuntu 22.04, and demonstrates the error occurring in code that compiles parts of a pipeline with torch.compile using the "reduce-overhead" mode.
  2. Alternate algorithm for computing MaxPool2D under specific condition.: This issue proposes an alternate algorithm for computing MaxPool2D when the stride is equal to 1, by representing a larger kernel size (e.g., 5 or 7) as multiple smaller MaxPool2D operations with kernel size 3. This method aims to reduce computational cost on the CPU by decreasing the number of operations per cell and suggests modifying the MaxPool2D layer directly to avoid additional overhead during backpropagation, with demonstrated speedup in testing. A short check of the underlying equivalence appears after this list.
  3. cuda_utils.so: failed to map segment from shared object: This issue describes a problem encountered when running a PyTorch model inside a Docker container with a tmpfs mounted at /tmp having permissions set to 1777. Although the model compiles successfully, execution fails with an error indicating that the shared object cuda_utils.so cannot be mapped due to missing execute permissions on the file, despite the script running as root and directory permissions being correct.
  4. Enable UFMT on all files in PyTorch: This issue addresses the task of enabling uniform formatting (UFMT) across all files in the PyTorch codebase, specifically targeting around 1,500 files that are currently excluded from UFMT enforcement. It outlines the process for removing files from the exclusion list, running the formatter, and managing preparatory fixes for known formatting-related problems, while also providing a detailed worklist organized by directory to coordinate contributions and reviews.
  5. [JIT archive] Add a flag to not include debug files: This issue proposes adding a flag to the torch.jit.save() function that allows users to exclude debug files, such as .debug_pkl, from the JIT archive to reduce file size. The motivation stems from observations that these debug files significantly increase the archive size without affecting model correctness, which is particularly important for deploying smaller models on mobile devices.
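The MaxPool2D proposal in item 2 rests on a simple identity: with stride 1, a 5x5 max pool covers the same window as two chained 3x3 max pools. A quick sanity check of that identity (an illustration only, not the proposed implementation) could look like this:

    import torch
    import torch.nn.functional as F

    x = torch.randn(1, 3, 32, 32)

    # One 5x5 max pool with stride 1 ...
    big = F.max_pool2d(x, kernel_size=5, stride=1)

    # ... matches two chained 3x3 max pools with stride 1.
    small = F.max_pool2d(F.max_pool2d(x, kernel_size=3, stride=1), kernel_size=3, stride=1)

    print(torch.equal(big, small))  # True: the decomposition is exact for stride 1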

2.3 Open Issues

This section lists, groups, and then summarizes issues that were created within the last week in the repository.

Issues Opened This Week: 73

Summarized Issues:

  • Build and Compilation Failures: Several issues report build or compilation failures due to missing headers, invalid code generation, or backend incompatibilities. These include missing thrust/pair.h causing build failures, MPS Inductor backend generating invalid Metal shader code, and persistent compilation errors in Inductor benchmarks due to undeclared variables.
    • issues/169266, issues/169290, issues/169664
  • Distributed Training and Communication Issues: Problems with distributed training include Gloo backend failing to connect ranks causing timeouts, and delayed backward allreduce communication when using composable.replicate with torch.compile, preventing overlap of computation and communication.
    • issues/169275, issues/169461
  • Torch.fx and Code Generation Bugs: Using torch.fx.symbolic_trace on Sequential modules produces invalid Python syntax due to attribute naming, requiring changes to naming conventions or setattr usage to avoid syntax errors.
    • issues/169276
  • Device and Hardware Support Limitations: Issues highlight hardware-specific limitations such as LayerNorm and BatchNorm failing to compile on Apple M4 GPU with MPS backend, Conv2d being slower on macOS CPUs due to missing MKLDNN backend, and FP8 lowering tests failing on certain NVIDIA devices due to hardware constraints.
    • issues/169290, issues/169294, issues/169316
  • Feature Proposals and API Enhancements: Several proposals aim to improve PyTorch functionality, including adding a debug mode context manager, a new DTensor placement RaggedShard for uneven sharding, and modifying torch.cuda.get_arch_list() to work without a GPU present.
    • issues/169318, issues/169320, issues/169414
  • Installation and Runtime Failures: Installation problems include missing CUDA runtime libraries on Amazon Linux 2023 causing import errors, and Windows imports failing starting from version 2.9.0 due to DLL initialization errors related to Visual C++ Redistributable dependencies.
    • issues/169330, issues/169429
  • Memory and Resource Leaks: A memory leak occurs during the backward pass in the reentrant gradient checkpointing implementation of FSDP2, while other versions do not exhibit this problem.
    • issues/169349
  • Profiler and Debugging Issues: The PyTorch C++ Profiler fails to collect information for the "privateuseone" backend due to incorrect flag handling, and Inductor pattern matcher improvements are proposed to add debug output and fix error message formatting.
    • issues/169357, issues/169440
  • DTensor and Distributed Tensor Bugs: DTensor has bugs such as torch.equal() failing when comparing scalar to sharded tensors due to incorrect redistribution logic, and a violation in redistribute_cost function where direct redistribution cost is unexpectedly higher than via intermediate states.
    • issues/169361, issues/169439
  • Platform and API Inconsistencies: The C++ Generator API requires platform-specific compilation unlike tensor creation, and importing .pt2 models shows inconsistent input validation between Windows and Linux, causing platform-dependent quantization failures.
    • issues/169371, issues/169659
  • Test Failures and CI Issues: Multiple tests fail or are disabled due to regressions, flaky behavior, or environment changes, including parametrization tests failing after the Python 3.14 migration, disabled HFPretrained tests on Linux, and CI failures after the Python 3.14.1 release.
    • issues/169388, issues/169481, issues/169586
  • CUDA and GPU Backend Problems: Issues include SIGIOT stack smashing errors in CUDA CI tests indicating memory corruption, hipblaslt MI300 TF32 accuracy problems, and regression in training with Inductor backend causing illegal memory access during backward pass.
    • issues/169390, issues/169392, issues/169671
  • Export and Serialization Bugs: Exporting models with sparse tensor inputs fails due to lack of dynamic shape support, and sequential torch.load() calls on the same stream fail with the new zipfile serialization format, causing errors on the second read.
    • issues/169488, issues/169763
  • Torch.compile and Dynamic Behavior Issues: Using torch.compile with certain features causes errors such as failures with in-place int64 addmm_ operations due to dimension expansion errors, and mixing dynamic Python scalars with 0-d tensors causing excessive recompilations.
    • issues/169538, issues/169634
  • Performance and Optimization Requests: Proposals include adding gating fusion to flex attention epilogue for improved performance, and logging side effects in Dynamo to better track user code behavior.
    • issues/169435, issues/169722
  • Backend Compiler and Runtime Errors: Inductor backend fails compiling models with while loops and scatter ops due to IndexError, and torch._higher_order_ops.while_loop causes CPU-GPU synchronization issues leading to performance degradation.
    • issues/169615, issues/169672
  • Documentation and Development Tooling: Suggestions include adding new Spin CLI commands to simplify development workflows and standardizing OpenReg testing with examples and documentation for new device backends.
    • issues/169479, issues/169597
  • Thread Safety and Concurrency Issues: The non-thread-safe ncclGetAsyncError function requires added thread safety to prevent race conditions and hangs caused by concurrent calls from multiple threads.
    • issues/169484
  • Model Saving and Frozen Parameter Handling: Saving models with FSDP2 hangs during state dict iteration unless a flag to ignore frozen parameters is passed, indicating a bug in frozen parameter handling during save.
    • issues/169448
  • Package and Index Accessibility Problems: All xformers package wheels under the PyTorch CPU-only index return 403 errors due to obsolete index references, causing CI failures and requiring removal or fixes.
    • issues/169458
  • Dynamic Shape and Compilation Crashes: Compiling models with dynamic shapes involving transpose and floating-point scale interpolation crashes due to OverflowError from float infinity to int conversion.
    • issues/169757
  • ONNX Export Failures: Exporting models to ONNX fails with assertion errors when using cdist with dynamic inputs in PyTorch 2.8.0, whereas it worked in earlier versions.
    • issues/169758
  • Padding and Autotune Errors: Padding a dimension marked as dynamic during compilation produces incorrect code and autotune errors despite successful compilation.
    • issues/169760
  • Documentation Build and Disk Space Issues: The CI runner runs out of disk space during Sphinx site builds due to unnecessary rebuilding of already built documentation, suggesting skipping the build to prevent failures.
    • issues/169728
  • Numerical Accuracy and Backend Discrepancies: FFT functions produce incorrect energy normalization on Intel CPUs for specific input sizes, and adaptive max pooling on Apple MPS devices with torch.compile yields significantly incorrect results compared to eager mode.
    • issues/169670, issues/169738
  • Segmentation Faults and Integer Overflow: ReplicationPad2d causes segmentation faults with extremely large padding values near INT64_MAX due to integer overflow instead of raising proper exceptions.
    • issues/169741
  • Test Coverage and Execution Gaps: Distributed elastic rendezvous test files lack the standard entry point, suggesting these tests may never have been executed.
    • issues/169752
  • Build and Installation Process Issues: Setting TORCH_TARGET_VERSION fails to hide headers properly in editable installs due to recent build_ext changes, requiring investigation and fixes.
    • issues/169675
  • Nightly Build and Release Blockers: ROCm nightly builds have been broken since December 1, 2025, due to binary test failures, needing resolution before the Release 2.10 branch cut date.
    • issues/169684

2.4 Closed Issues

This section lists, groups, and then summarizes issues that were closed within the last week in the repository. This section also links the associated pull requests if applicable.

Issues Closed This Week: 20

Summarized Issues:

  • Tensor slicing behavior: When slicing a tensor beyond its boundaries in PyTorch, the operation returns a tensor with a zero dimension instead of raising an exception. This behavior is consistent with Python and NumPy indexing, clarifying that it is expected rather than a bug. A short example appears after this list.
    • issues/169268
  • Import and module errors: Several issues involve import errors and attribute conflicts, such as a missing function get_num_sms in torch._inductor.utils and an AttributeError caused by a local directory shadowing the dill package. These errors prevent proper module loading and require fixes or workarounds to resolve.
    • issues/169269, issues/169311
  • Compilation and build failures: Multiple build failures occur due to missing headers, CUDA incompatibilities, or Metal compiler errors, including a missing hipblas-common.h header in Triton 3.6, CUDA 13.1 build failures related to half-precision operators, and Metal shading language syntax errors in Inductor backend on MPS. These issues block successful compilation until addressed.
    • issues/169313, issues/169643, issues/169756
  • Runtime errors with dynamic shapes and data types: Runtime errors such as a TypeError from applying the unary ~ operator to a 'SymBool' type and a NotImplementedError for the "index_cpu" operation on Float8_e4m3fn tensors highlight challenges with dynamic shapes and new data types in PyTorch. These issues affect model execution and require fixes in later versions or backend support.
    • issues/169289, issues/169656
  • Test instability and failures: Several tests are marked unstable or disabled due to regressions, flaky behavior, or platform-specific issues, including vllm-test instability, inductor-pallas unit test failures, and test failures on Linux Jammy and XPU platforms caused by oneDNN upgrades or pinned updates. These affect CI reliability and require ongoing maintenance.
    • issues/169298, issues/169451, issues/169480, issues/169556, issues/169587
  • Performance degradation in distributed training: Using Fully Sharded Data Parallel version 2 (FSDP2) with many distributed tensors causes gradient clipping operations to exhibit quadratic instead of linear time complexity. This inefficiency severely degrades performance for large models and was traced to DTensor sharding propagation logic, which was subsequently fixed.
    • issues/169445
  • CI infrastructure and maintenance issues: Maintenance activities such as manual takedown and re-provisioning of dgx.b200 runners temporarily affected VLLM nightly benchmarks, and macOS CI jobs were disabled or marked unstable due to security vulnerabilities and runner availability problems. These actions impact CI throughput and require coordination with Meta and other teams.
    • issues/169386, issues/169680, issues/169681
  • Library version and dependency issues: Linux x86 CPU wheels started failing due to missing GLIBCXX_3.4.32 in the standard C++ library, causing import errors during PyTorch smoke tests. This regression appeared suddenly and was not present in previous successful runs, indicating dependency version mismatches.
    • issues/169331
  • Graph and compiler correctness bugs: A bug in the PyTorch compiler causes incorrect ordering of symbolic integer node inputs in the backward graph compared to the forward graph, potentially leading to assertion failures in regional inductor. This discrepancy affects graph correctness despite some cases running without error.
    • issues/169712
  • Side effect detection improvements: The current method for detecting side effects when using fullgraph=True in vLLM is unreliable, prompting a request for a better approach. Improving this detection is necessary to ensure correct graph compilation and execution.
    • issues/169598
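As a concrete illustration of the slicing behavior clarified in the first item above, an out-of-range slice returns an empty tensor rather than raising, mirroring Python and NumPy semantics:

    import torch

    t = torch.arange(5)
    print(t[10:20])               # tensor([], dtype=torch.int64) -- zero-length, no error
    print(t[10:20].shape)         # torch.Size([0])
    print(list(range(5))[10:20])  # [] -- the same semantics as plain Python slicing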

2.5 Issue Discussion Insights

This section will analyze the tone and sentiment of discussions within this project's open and closed issues that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.

Based on our analysis, there are no instances of toxic discussions in the project's open or closed issues from the past week.


III. Pull Requests

3.1 Open Pull Requests

This section provides a summary of pull requests that were opened in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.

Pull Requests Opened This Week: 253

Key Open Pull Requests

1. Tianren/dynamic range input fix: This pull request addresses fixes and improvements related to dynamic range input handling and conditional dispatch logic in the PyTorch project, including code cleanup, lint fixes, and adding tests to ensure proper functionality across multiple conditional cases.

  • URL: pull/169750
  • Merged: No
  • Associated Commits: 795f4, f5150, b655c, 75178, 34607, 1ea31, 54d56, 6f7e1, 5de53, bf488, 042c2, 961c8, ed295, 8cff4, 1c083, e5975, e1a04, 599cd, 769dc, ba893

2. [DO NOT MERGE] Testing cuDNN 9.15 with CUDA 13: This pull request is a speculative and experimental effort to test the integration of cuDNN 9.15 with CUDA 13.0 in the PyTorch project, including adding and fixing tests, adjusting build configurations, and managing test skips and failures related to this new CUDA version.

  • URL: pull/169718
  • Merged: No
  • Associated Commits: 72a0a, ce239, 726a2, da4c0, ce210, 82fb2, e953e, 2453e, bfe3f, 3520c, 86790, 5b685, 3ab6d, 0215c, 2dbad, adacc, 92b0c, ee16c, aa337

3. [dynamo] Add LazyConstantVariable: This pull request proposes adding a new feature called LazyConstantVariable to the dynamo component of the PyTorch project, as part of a stack of related changes tracked via ghstack.

  • URL: pull/169282
  • Merged: No
  • Associated Commits: b26bf, 71896, 88a05, 1428c, 0e4d7, 13696, 63ad2, 9faa9, 2fac5, 8b312, ac88f, bc2d3, 3c2bd

Other Open Pull Requests

  • XPU Memory Tracking Enhancements: Multiple pull requests introduce and enhance memory tracking features for the XPU backend in PyTorch, including the _record_memory_history feature in both frontend and C++ parts, and stack context tracking in the XPU caching allocator using record_trace. These changes aim to improve memory debugging and analysis capabilities while explicitly avoiding legacy APIs without identified use cases.
    • pull/169442, pull/169559, pull/169280, pull/169404
  • TorchDynamo Autograd Tracing Improvements: Several pull requests focus on improving autograd operation tracing in TorchDynamo by enabling tracing through tensor.backward via rewriting calls, automatically wrapping autograd.grad with allow_in_graph, and introducing a configuration option to control tracing of autograd ops. These updates enhance graph tracing accuracy and address specific issues related to gradient function handling.
    • pull/169415, pull/169389, pull/169493
  • Benchmark and Kernel Optimization Work-in-Progress: Work-in-progress pull requests aim to enable the benchmark_combo_kernel feature by default and filter out reduction nodes in combo kernels to optimize kernel operations. These iterative updates indicate ongoing efforts to improve performance and kernel efficiency in PyTorch.
    • pull/169326, pull/169332
  • Bug Fixes and Issue Debugging: Multiple pull requests address specific bugs and issues, including fixing the PythonMod bound_sympy problem, debugging environment and dependency issues related to compiler and libraries, and resolving dlpack test failures by modifying test comparisons. These fixes improve stability and compatibility across the codebase.
    • pull/169612, pull/169691, pull/169657, pull/169327
  • DTensor and Collective Operation Enhancements: Pull requests introduce a custom handler for DTensor min/max dimension reduction to ensure correct nonlinear and linear local reductions, and add the capability to capture the async flag of NCCL collective operations in execution traces. These improvements support correct distributed computation behavior and better debugging of collective operations.
    • pull/169691, pull/169416
  • Code Refactoring and Type Check Optimization: A pull request proposes replacing isinstance checks for TensorVariable with the method call x.is_tensor() to improve code clarity and maintainability. This refactoring simplifies type checking in the codebase.
    • pull/169441
  • Python Version Compatibility and Documentation Updates: Pull requests add warnings about torch.compile compatibility limitations with Python 3.14 and explicitly set multiprocessing start methods to maintain backward compatibility on Unix platforms. These changes help users avoid issues related to Python version upgrades.
    • pull/169262, pull/169401
  • Linear Cross Entropy Function Enhancements: Separate pull requests add a fused linear cross entropy implementation optimized for CPU and introduce CUDA support for the same function, enabling GPU acceleration for this loss computation. These additions improve performance and hardware support for linear cross entropy.
    • pull/169459, pull/169460

3.2 Closed Pull Requests

This section provides a summary of pull requests that were closed in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.

Pull Requests Closed This Week: 154

Key Closed Pull Requests

1. Hanuman: This pull request is about adding and refining continuous integration workflows, including SonarQube and PMD scan configurations, integrating vulnerability scanning with Grype, and making various updates and tests to the test.md file and workflow scripts, although it was not merged.

  • URL: pull/169297
  • Merged: No
  • Associated Commits: d9db7, e75be, ee6c5, 1103e, a3360, 1ca3f, df1d0, 0fe6f, fa056, b6ea5, 860ea, 4429a, 3fa45, ba9aa, c9932, af185, ec301, 3ad8d, 970af, cf97c, cad2f, b4e16, a6575, 8aecf, 52a58, 687f0, 870a4, b41bc, 4ca8f, e9f56, 926af, fa7e4, e69b3, b7856, 48c6a, 9e0f2, 6f5c8, 7a0af, 92f2c, d75f8, 45c00, 410bd, 7ae1a, 80463, 4216e, 0da94, 73b5a, 6d63f, 5e65c, 2dd3b, 22c3d, 07354, 87a8d, e8ca4, 1e7ce, eb5d6, b07ad, b96bc, 6e723, ccfcf, ba841, 57e60, 28bfd, b44c9, 9235c, daf0b, 4c608, c65ab, b9255, 6294e, 7cba9, 25f42, cbeda, cd2e2, f6315, 12bc0, eb0de, 7f31e, 5a83d, 0b385, 20ea0, 2966b, 4ba5f, c833c, e20b7, da481, 5f6fe, 6b1ee, 507ea, 6cbe4, 8ea84, 4e057, f4810, bb1d7, dfdd6, 532bd, 64d02

2. Add STD_CUDA_{KERNEL_LAUNCH_}CHECK: This pull request introduces a shim function torch_c10_cuda_check_msg that invokes c10::cuda::c10_cuda_check_implementation to retrieve formatted CUDA error messages and implements STD_CUDA_CHECK, which uses this shim to throw a std::runtime_error with the error message, providing an alternative to C10_CUDA_CHECK that throws c10::AcceleratorError, thereby aligning CUDA error handling with the STD_TORCH_CHECK approach but without propagating the error_code attribute to Python.

  • URL: pull/169385
  • Merged: No
  • Associated Commits: b77b6, f185f, 9a504, 7264a, ca764, 06c00, 9ad45, 1c398, e66db, c3dc5, f9770

3. [DebugMode] annotate tags: This pull request proposes adding a DebugMode.annotate(tag=...) API to allow users to attach arbitrary contextual tags to logs for better differentiation, implements this feature as a custom operation compatible with eager and aot_eager modes while registering it as a no-op for inductor, and lays groundwork for future integration with compiled graph metadata to enhance debugging capabilities.

  • URL: pull/169506
  • Merged: No
  • Associated Commits: d9d94, 3c616, 2fb2e, 0575b, d9e8a, d5730, c2237, f27ef, 8e647, 63de1

Other Closed Pull Requests

  • Pallas Backend Improvements: Multiple pull requests focus on enhancing the Pallas backend by adding expected failure marks to CPU tests, fixing reduction and squeeze operations, and adding argmax/argmin support. There was also an attempt to add special function support, although it was not merged.
    • pull/169655, pull/169654, pull/169422
  • Python Version and Typing Updates: Several pull requests address Python version compatibility and typing improvements, including pinning CPython to 3.14.0 to avoid regressions, updating files to Python 3.10 typing syntax, fixing tuple type annotations, and updating the typing-extensions package for Python 3.14 compatibility. Some of these changes were not merged or partially reverted.
    • pull/169592, pull/169355, pull/169463, pull/169352
  • Torch Compile and Testing Enhancements: There are pull requests that re-enable torch.compile tests for Python 3.12 and Windows, add GPU_FLAGS inside containers during testing, and propose running inductor unit tests automatically when workflow files change. The automatic test run PR was not merged.
    • pull/169387, pull/169397, pull/169398
  • Boxing and Argument Handling: One pull request improves the TORCH_BOX functionality by modifying unbox_type_t to remove const and reference qualifiers before applying the type mapper, ensuring proper argument ownership and conversion.
    • pull/169563
  • CUDA Graphs and Storage Allocation Fix: A pull request fixes the allocation size for edgeData in CUDA Graphs by using the count of dependencies, preventing potential storage issues.
    • pull/169576
  • Code Cleanup and Refactoring: Some pull requests propose code simplifications such as removing the msg argument from raise_observed_exception and removing the LocalGeneratorObjectVariable::_get_inline_tracer() method, although the latter was not merged.
    • pull/169343, pull/169306
  • Inductor Tolerance Adjustments: Two pull requests propose increasing tolerance levels for specific inductor tests to address internal tasks, but neither was merged.
    • pull/169310, pull/169309
  • AOTI Optimization Bug Fix: A pull request fixes a bug in AOTI optimization related to mutated small named buffers by recording mutations during input graph unlifting and passing this info to Inductor graph lowering.
    • pull/169347
  • MPS Backend Enhancements: Pull requests add support for index select on MPS sparse tensors and add missing shape validation for nn.MaxUnpool{1,2,3}d operations on the MPS backend to ensure correctness.
    • pull/169368, pull/169261
  • DTensor Module Improvements: Pull requests add a warning mode to the DTensor ExplicitRedistributionContext and add missing support for the foreach_max operation, fixing issues with operation composition.
    • pull/169452, pull/169667
  • NVSHMEM Heap Size Reduction: A pull request reduces the NVSHMEM heap size in the CI docker container to prevent multicast allocation failures on H100 GPU systems with limited memory, maintaining compatibility with distributed tests.
    • pull/169543
  • Backward Compatibility Reproduction: One pull request attempts to reproduce a stability issue caused by a backward compatibility break in the PyTorch codebase.
    • pull/169303
  • Return Type Annotation Addition: A pull request adds a return type annotation to the _unpickle_sdp_backend function to improve code clarity and type safety.
    • pull/169345

3.3 Pull Request Discussion Insights

This section will analyze the tone and sentiment of discussions within this project's open and closed pull requests that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.

Based on our analysis, there are no instances of toxic discussions in the project's open or closed pull requests from the past week.


IV. Contributors

4.1 Contributors

Active Contributors:

We consider an active contributor in this project to be any contributor who has made at least 1 commit, opened at least 1 issue, created at least 1 pull request, or made more than 2 comments in the last month.

If there are more than 10 active contributors, the list is truncated to the top 10 based on contribution metrics for better clarity.

Contributor        Commits   Pull Requests   Issues   Comments
jeffdaily               19               1      493          5
malfet                  94              18        7         84
cyyever                 94              37        0         32
guangyey               106              15        1         22
anijain2305             92              12        8         26
tugsbayasgalan          87              21        1         28
Skylion007               4               2        3        121
ezyang                  29               7       25         69
karthickai              90              18        7         13
mikaylagawarecki        87              20        1         15
