Weekly Project News


Weekly GitHub Report for PyTorch: August 11, 2025 - August 18, 2025 (12:05:34)

Weekly GitHub Report for PyTorch

Thank you for subscribing to our weekly newsletter! Each week, we deliver a comprehensive summary of your GitHub project's latest activity right to your inbox, including an overview of your project's issues, pull requests, contributors, and commit activity.


Table of Contents

  • I. News
    • 1.1. Recent Version Releases
    • 1.2. Version Information
  • II. Issues
    • 2.1. Top 5 Active Issues
    • 2.2. Top 5 Stale Issues
    • 2.3. Open Issues
    • 2.4. Closed Issues
    • 2.5. Issue Discussion Insights
  • III. Pull Requests
    • 3.1. Open Pull Requests
    • 3.2. Closed Pull Requests
    • 3.3. Pull Request Discussion Insights
  • IV. Contributors
    • 4.1. Contributors

I. News

1.1 Recent Version Releases:

The current version of this repository is v2.6.0.

1.2 Version Information:

Released on January 29, 2025, PyTorch 2.6 introduces significant enhancements including torch.compile support for Python 3.13, a new dynamic compilation control API torch.compiler.set_stance, and improved AOTInductor packaging and ABI compatibility. Notable highlights also include beta-level FP16 support on x86 CPUs, expanded Intel GPU support with simplified installation, and a backward-incompatible security improvement flipping the default of torch.load to weights_only=True, alongside numerous performance optimizations, bug fixes, and deprecations such as the discontinuation of official Anaconda channel packages.
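
To make two of these user-facing changes concrete, here is a minimal sketch assuming PyTorch 2.6 or later; the checkpoint path and the toy compiled function are illustrative placeholders, not code from the release notes.

```python
import torch

# 1) torch.load now defaults to weights_only=True: only tensor data and a
#    small allowlist of plain types are deserialized unless the caller opts out.
torch.save({"w": torch.randn(4)}, "checkpoint.pt")
state = torch.load("checkpoint.pt")                      # weights_only=True by default
full = torch.load("checkpoint.pt", weights_only=False)   # pre-2.6 behavior; trusted files only

# 2) torch.compiler.set_stance adjusts torch.compile behavior at runtime,
#    for example forcing eager execution inside a scoped block.
@torch.compile
def f(x):
    return torch.sin(x) + x

with torch.compiler.set_stance("force_eager"):
    f(torch.randn(8))   # runs eagerly inside this block
f(torch.randn(8))       # compiled path resumes outside the block
```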

II. Issues

2.1 Top 5 Active Issues:

We consider active issues to be issues that have been commented on most frequently within the last week. Bot comments are omitted.

  1. [RFC] Simplify bookkeeping of DeviceMesh slicing, (un)flattening, ...: This issue proposes a comprehensive redesign of the internal bookkeeping system of DeviceMesh to unify slicing, flattening, and splitting operations under a single coherent model, improving both user reasoning and developer implementation without changing user-facing APIs. The core idea is to introduce a DeviceMeshStorage class that caches ProcessGroups keyed by sets of ranks, allowing DeviceMesh objects to act as lightweight views referencing this shared storage, thereby enabling efficient and scalable manipulation of device meshes using CuTe layouts.

    • The comment discussion extensively explores the functional versus in-place operation design, with consensus favoring functional operations that return new DeviceMesh instances rather than mutating existing ones. Participants debate the necessity and handling of root versus sub-mesh hierarchies, the concept of global singleton versus separate universes for DeviceMeshStorage, and the implications for composability with parallelism strategies like FSDP, TP, and PP. Several clarifications and implementation details are shared, including a linked prototype PR, and the conversation converges on preferring explicit, functional designs with separate universes while maintaining a shared cache for ProcessGroups internally.
    • Number of comments this week: 16
  2. torch.Tensor methods return type annotation too broad for torch.Tensor subclasses: This issue addresses the problem that torch.Tensor methods currently have return type annotations that are too broad, always specifying torch.Tensor as the return type even when the actual returned object is an instance of a subclass. This overly broad annotation leads to false positives in IDEs and type checkers when working with tensor subclasses, and the proposed solution is to update the type annotations to use Self to better reflect the dynamic dispatch behavior that preserves subclass types in most operations.

    • The discussion explains PyTorch’s dynamic dispatch mechanism that preserves tensor subclasses in operations, highlighting the challenge of expressing this precisely in Python’s type system. Contributors debated the trade-offs between broad and more specific return type annotations, noting that while Self improves IDE support for common cases, the dynamic nature of PyTorch (including Modes that can override return types arbitrarily) means no static typing solution is perfect; the consensus leans toward adopting Self as a practical improvement while acknowledging edge cases and the need for subclass authors to document deviations. A minimal typing sketch illustrating the gap appears after this list.
    • Number of comments this week: 8
  3. [Documentation Clarity] torch.min/torch.max gradient behavior: This issue addresses the lack of clear documentation regarding the gradient behavior of torch.min and torch.max functions in PyTorch, highlighting that their current descriptions are either missing or partially incorrect compared to torch.amin and torch.amax. The user provides examples demonstrating that the gradient propagation differs depending on whether the reduction is over all dimensions or a specific dimension, and requests that the documentation explicitly clarify these distinct behaviors.

    • The comments confirm that the gradient behavior described is intentional rather than a bug, acknowledge the need to provide detailed gradient information for torch.min and torch.max in the documentation, and express agreement to reopen the issue to address and improve the documentation clarity. A short script comparing the two reductions appears after this list.
    • Number of comments this week: 5
  4. Wrong-size gradients in Expert Parallel MoE: This issue reports that when using Expert Parallel (EP) sharding with a Mixture of Experts (MoE) model, the computed gradients appear to be exactly twice as large as expected, despite the loss values remaining consistent. The user provides a detailed reproduction script and suspects the problem could lie either in PyTorch’s distributed implementation, the torchtitan library, or their own code replication, but the root cause remains unclear.

    • The comments confirm the gradient doubling behavior is reproducible both in the user’s repro and in torchtitan itself, with instructions provided to replicate the issue in torchtitan’s training setup. Discussion participants debate whether the bug originates in PyTorch or torchtitan, and it is ultimately identified as a known torchtitan issue with a fix planned imminently.
    • Number of comments this week: 5
  5. Abort was called at 435 line in file ./shared/source/os_interface/linux/drm_neo.cpp: This issue reports a crash occurring at line 435 in the file drm_neo.cpp when using the latest PyTorch 2.8.0+xpu version with an Intel B60 GPU for inference, which does not happen in previous versions. The user experiences an abort signal (SIGABRT) triggered just after execution finishes, suspecting a driver-related problem despite following recommended driver upgrade steps and attempting reinstallation.

    • The discussion centers on the likelihood of a driver issue causing the abort, with suggestions to gather detailed error logs using dmesg and SYCL debugging tools. The user confirms the problem persists after reinstalling drivers, and it is noted that the error occurs only on exit without blocking inference, possibly linked to the oneAPI 2025.1 runtime.
    • Number of comments this week: 5
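
To ground the annotation question in issue 2, here is a minimal sketch; MyTensor is an invented subclass used only for illustration, and the Self-based fix described in the comments is the proposed direction rather than PyTorch's current stubs.

```python
import torch

class MyTensor(torch.Tensor):
    """Invented subclass used only to illustrate the annotation gap."""
    pass

t = torch.randn(3).as_subclass(MyTensor)
u = t.abs()      # dynamic dispatch preserves the subclass...
print(type(u))   # <class '__main__.MyTensor'>

# ...but the current stubs annotate Tensor.abs() -> Tensor, so static type
# checkers treat `u` as a plain torch.Tensor and reject subclass-specific
# attributes. Annotating such methods with typing.Self, as the issue
# proposes, would let checkers infer MyTensor here instead.
```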
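
For issue 3, the two behaviors can be compared directly; in this short script the tensor values are arbitrary, chosen only to contain tied maxima, and the exact gradients printed depend on the PyTorch version in use.

```python
import torch

# Full reduction over all elements.
x1 = torch.tensor([[1., 3., 3.], [2., 3., 0.]], requires_grad=True)
torch.max(x1).backward()
print("torch.max(x) grad:\n", x1.grad)

# Reduction over a single dimension (returns values and indices).
x2 = torch.tensor([[1., 3., 3.], [2., 3., 0.]], requires_grad=True)
values, _ = torch.max(x2, dim=1)
values.sum().backward()
print("torch.max(x, dim=1) grad:\n", x2.grad)

# The two printed gradients route the incoming gradient to the tied maxima
# differently, which is the distinction the issue asks the torch.min/torch.max
# documentation to state explicitly.
```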

2.2 Top 5 Stale Issues:

We consider stale issues to be issues that have had no activity within the last 30 days. The team should work together to get these issues resolved and closed as soon as possible.

  1. ImportError: cannot import name 'triton_key' from 'triton.compiler.compiler': This issue reports an ImportError encountered when attempting to import the name 'triton_key' from the module 'triton.compiler.compiler' during the use of PyTorch's torch.compile functionality with the 'inductor' backend. The user provides detailed environment information and code snippets showing that the error arises in a setup involving PyTorch 2.4.0 development version, CUDA 12.1, and Triton 2.2.0, indicating a possible incompatibility or missing symbol in the Triton compiler package.
  2. Alternate algorithm for computing MaxPool2D under specific condition.: This issue proposes an alternate algorithm for computing MaxPool2D when the stride is equal to 1, by representing a larger kernel size (e.g., 5 or 7) as multiple smaller MaxPool2D operations with kernel size 3, which reduces the computational cost per cell. The suggested modification targets the MaxPool2D layer directly to avoid additional overhead during backpropagation and is expected to yield performance improvements specifically on CPU, as demonstrated by testing that showed a speedup of approximately 1.29 times. A small numerical check of this equivalence appears after this list.
  3. cuda_utils.so: failed to map segment from shared object: This issue describes a problem encountered when running a PyTorch model inside a Docker container with a tmpfs-mounted /tmp directory set to permission mode 1777. Although the model compiles successfully, execution fails with an error indicating that the shared object cuda_utils.so cannot be mapped due to missing execute permissions on the file, despite the script running as root and the directories having appropriate permissions.
  4. Enable UFMT on all files in PyTorch: This issue addresses the task of enabling uniform formatting (UFMT) across all files in the PyTorch codebase, specifically targeting approximately 1,500 files that are currently excluded from UFMT enforcement. It outlines the process for removing files from the exclusion list, running the formatter, and managing known formatting-related problems, while also providing a detailed worklist organized by directory to coordinate and track progress on this large-scale formatting effort.
  5. [JIT archive] Add a flag to not include debug files: This issue proposes adding a flag to the torch.jit.save() function that allows users to exclude debug files, specifically .debug_pkl files, from the JIT archive to reduce the overall file size. The motivation stems from observations that these debug files, which are primarily used for debugging purposes, can significantly increase the archive size without affecting model correctness, making the feature particularly beneficial for deploying smaller models on resource-constrained devices like mobile platforms.
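
The stride-1 decomposition proposed in stale issue 2 can be sanity-checked numerically; the sketch below compares one 5x5 max pool against two chained 3x3 max pools, with paddings chosen so the receptive fields match and relying on max_pool2d's implicit negative-infinity padding.

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 16, 16)

# Single 5x5 max pool, stride 1.
ref = F.max_pool2d(x, kernel_size=5, stride=1, padding=2)

# Two chained 3x3 max pools, stride 1: because max is associative, the
# composed receptive field is again 5x5, so the results should agree.
alt = F.max_pool2d(F.max_pool2d(x, 3, stride=1, padding=1), 3, stride=1, padding=1)

print(torch.equal(ref, alt))  # expected: True
```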

2.3 Open Issues

This section lists, groups, and then summarizes issues that were created within the last week in the repository.

Issues Opened This Week: 93

Summarized Issues:

  • Distributed Backend and Parallelism Issues: Several issues report problems with PyTorch's distributed backends and parallelism features, including memory backend errors causing crashes or OOMs, gradient size discrepancies in Expert Parallel MoE, and unexpected gradient sharding behavior in FSDP with world size 2. These problems highlight instability and unclear behavior in distributed training setups and backend selections.
  • issues/160285, issues/160289, issues/160320
  • CUDAGraphs and Memory Management Bugs: There are multiple reports of runtime errors and inefficiencies related to CUDA memory management, including tensor output overwriting in CUDAGraphs, excessive reserved GPU memory due to non-shared segments per stream, and synchronization bugs causing NaN gradients in FSDP2. These issues indicate challenges in managing CUDA memory and synchronization correctly across different PyTorch components.
  • issues/160281, issues/160291, issues/160308
  • PyTorch Compiler and Inductor Backend Failures: Several issues describe crashes, performance regressions, and incorrect behavior in PyTorch's compiler and inductor backend, including segmentation faults on repeated compilation, slower performance in autotune modes, and runtime errors with complex tensor operations. These problems affect model compilation stability and runtime efficiency.
  • issues/160296, issues/160303, issues/160305, issues/160388, issues/160391, issues/160399, issues/160495
  • MPS Backend Limitations and Bugs: Numerous issues report incomplete or incorrect support in the Apple Metal Performance Shaders (MPS) backend, including failures in clamp broadcasting, index_select with scalar indices, variance on zero-dimensional tensors, unsupported ConvTranspose3D with BF16/FP16, and incorrect outputs in pooling and copy operations. These highlight significant gaps in MPS backend functionality compared to CPU.
  • issues/160734, issues/160737, issues/160738, issues/160739, issues/160740, issues/160743, issues/160744
  • Tensor Subclass and Dispatch System Conflicts: Issues reveal problems with tensor subclass compatibility, including overly broad return type annotations and conflicts with torch.library.triton_op decompositions that interfere with FunctionalTensorMode dispatch, causing runtime errors and poor static analysis support. These indicate the need for improved type annotations and dispatch integration.
  • issues/160322, issues/160333
  • Sparse Tensor and Quantization Support Gaps: Several issues report missing implementations and errors related to sparse tensor operations, such as unsupported quantize_per_tensor on sparse COO tensors, failure to optimize sparse parameters with LBFGS, and sparse COO tensor creation errors. These gaps limit sparse tensor usability and optimization in PyTorch.
  • issues/160622, issues/160631, issues/160637
  • Torch.compile and FX Graph Compilation Errors: Multiple issues describe failures and unexpected behavior when using torch.compile and FX graph compilation, including errors with nested_compile_region, unsupported scan operator, graph breaks with custom autograd.Functions, and failures with dynamic shapes or symbolic sizes. These problems affect the reliability of PyTorch's compilation pipeline.
  • issues/160525, issues/160544, issues/160756, issues/160757
  • XPU and ROCm Platform Stability Issues: Reports include NaN gradients on Intel XPU GPUs during training, process hangs and deadlocks on ROCm with Windows, and test flakiness on ROCm platforms causing disabled tests. These issues point to platform-specific stability and compatibility challenges.
  • issues/160787, issues/160759, issues/160784, issues/160785, issues/160786
  • Memory Leaks and Autograd Function Bugs: There are reports of memory leaks caused by custom autograd.Functions performing in-place operations marked dirty, and issues with mutations inside AOT dispatch causing assertion failures, indicating problems in memory management and functionalization during autograd.
  • issues/160317, issues/160664
  • Build and Packaging Issues: Some issues highlight build failures and packaging inefficiencies, such as redundant dependency installations in build instructions, large CUDA SBSA wheel sizes, and MSVC compilation errors due to Python 3.14 internal header usage. These affect developer experience and distribution size.
  • issues/160302, issues/160673, issues/160647
  • Model Export and Serialization Failures: Problems are reported with exporting models containing NamedTuple inputs or complex tensors, causing failures in ONNX export and torch.export.save serialization, limiting interoperability and model saving capabilities.
  • issues/160547, issues/160749, issues/160761
  • Performance and Threading Control Issues: Issues include ignoring user thread limits during compiled CPU matrix multiplication and redundant memory copies in inductor cpp_wrapper, leading to inefficient resource usage and performance regressions.
  • issues/160520, issues/160812
  • Type Promotion and Dtype Mismatch Bugs: Bugs are reported where forward-mode AD promotes tangent dtypes unexpectedly, and dequantize_per_channel fails with Float8 due to unsupported type promotion, causing runtime errors and incorrect computations.
  • issues/160513, issues/160651
  • Distributed Process Group and Backend Support Limitations: The GLOO backend does not support MPS devices and fails to initialize on CPU devices properly, causing errors and limiting distributed training on these platforms.
  • issues/160731, issues/160732
  • Test Suite Flakiness and CI Timeouts: MacOS tests in CI are timing out due to unskipped tests and poor test distribution, and several ROCm tests are disabled due to flakiness and SIGIOT errors, impacting test reliability and coverage.
  • issues/160498, issues/160784, issues/160785, issues/160786

2.4 Closed Issues

This section lists, groups, and then summarizes issues that were closed within the last week in the repository. This section also links the associated pull requests if applicable.

Issues Closed This Week: 33

Summarized Issues:

  • Performance Improvements in CUDA and Inductor Backends: Several issues focus on enhancing performance by replacing inefficient fallback operations and fixing compilation errors in the Inductor backend. These include implementing a Triton-based CUDA kernel for weight-only quantized linear operations and resolving symbolic expression type errors during AOT compilation, which together aim to improve inference speed and compilation reliability.
  • [issues/158849, issues/160535, issues/160646]
  • Precision and Autocast Issues on MPS and Float16: Problems with automatic mixed precision on Apple’s MPS device and silent computation errors with float16 inputs during compiled execution have been reported. These issues cause runtime errors or significant numerical discrepancies, highlighting the need for better precision handling and fallback to full precision where necessary.
  • [issues/160332, issues/160730, issues/160746]
  • Build and Compatibility Failures: Multiple build failures and compatibility issues have been reported, including protobuf version conflicts, clang compiler incompatibility, missing CUDA libraries, and missing header files for XPU builds. These problems prevent successful compilation or runtime loading, affecting various platforms and configurations.
  • [issues/160512, issues/160521, issues/160661, issues/160762]
  • Documentation and Usability Enhancements: There are requests to improve documentation accuracy and completeness, such as correcting file extension discrepancies, clarifying floating-point epsilon definitions, adding full import statements in examples, and providing comprehensive API documentation. These efforts aim to improve developer clarity and reduce confusion.
  • [issues/160395, issues/160397, issues/160612, issues/160774]
  • Error Handling and Test Coverage Improvements: Issues highlight the need for better error handling through try-catch blocks and improved error messages, as well as increasing test coverage with unit and integration tests. These improvements are intended to enhance robustness, debugging ease, and code reliability.
  • [issues/160775, issues/160818, issues/160819]
  • Quantization and Backend Support Limitations: Problems with quantized tensor operations such as the lack of torch.zeros_like() implementation for QuantizedCPU tensors and missing support for RISCV scalar backend have been identified. These limit functionality and hardware support in PyTorch.
  • [issues/160171, issues/160630]
  • Code Quality and Naming Consistency: Minor code quality issues like typographical errors in variable names and non-idempotent build scripts have been reported, which affect maintainability and build reproducibility.
  • [issues/160579, issues/160633]
  • User Experience and Beginner Accessibility: Proposals to create a simplified PyTorch abstraction layer and add a GUI for beginners aim to lower the barrier to entry by hiding complexity and providing interactive learning tools.
  • [issues/160640, issues/160642]
  • Runtime and Metadata Bugs in TorchDynamo and Inductor: Bugs involving incorrect metadata stripping in TorchDynamo and runtime errors viewing ComplexFloat tensors in Inductor have been reported, causing failures in debugging and tensor operations.
  • [issues/160471, issues/160663]
  • CI and Test Flakiness Issues: Random test failures due to timing issues and missing dependencies in nightly tests have been observed, leading to intermittent CI instability.
  • [issues/160511, issues/160689]
  • Hardware Detection and Support Gaps: PyTorch compiled with CUDA 12.1 fails to detect the NVIDIA GeForce RTX 5090 GPU due to lack of architecture support, requiring newer CUDA versions for compatibility.
  • [issues/160804]
  • Type Checking and Static Analysis Failures: Mypy type checking fails due to inheritance from a final class in type stubs, causing validation errors in development and CI environments.
  • [issues/160650]

2.5 Issue Discussion Insights

This section will analyze the tone and sentiment of discussions within this project's open and closed issues that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.

Based on our analysis, there are no instances of toxic discussions in the project's open or closed issues from the past week.


III. Pull Requests

3.1 Open Pull Requests

This section provides a summary of pull requests that were opened in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.

Pull Requests Opened This Week: 230

Key Open Pull Requests

1. [Debug1]New(rebased) modifiy setupvllm: This pull request proposes rebased modifications to the setup configuration of the vLLM module in the PyTorch project, including multiple commits focused on setup improvements, linter fixes, and added tests, but it has not yet been merged.

  • URL: pull/160627
  • Merged: No
  • Associated Commits: 9be74, 82fe9, 3cb29, 45665, 76db4, 934b4, 2f5e6, 2308c, 82e3d, 38cf2, 08be7, ebb81, 6fcac, ac1c9, 63b9d, 6fb76, 79e68, 9a0a5, 155b6, 50085, 93a0d, e83f0, ca21d, 9afcd, fcce9, d3ac9, 7b886, 150b4, 6231c, f1e5e, ade55, a27d7, be353, 14650, 44eca, 124f3, 7406d, ac5bd, 3cf15, 2c0de, 9fb92, 3bafa, 587ba, f9ae0, 62f78, 68abb, b31d5, 20022, 0cd59, ddd46, 8f19c, 9e5d8, 39966, a42cc, edf38, cab35, c845f, 76a8d

2. [VLLM]setup test cli logics: This pull request sets up the test CLI logic for VLLM by installing wheels from a previous build stage, dynamically generating and installing a VLLM test package list based on the Torch wheels present, and running tests according to a temporary predefined test plan for basic VLLM testing.

  • URL: pull/160361
  • Merged: No
  • Associated Commits: 80a66, b027b, 6b02c, 81d34, 537a7, 347f9, 5afb3, e80a5, 52f36, 0586b, 74033, 3c35f, c3176, 5d1c0, 45f23, 2f90d, 8a3f2, 67aad, effdb, eb4a2, 255b8, ba244, 3f5be, 20017, ec416, e227f, b56f5, 497de, 02618, 65e29, 44fea, 19c50, 43947, d9cba, bae14, 6d713, a704f, cd390, 84b71, b2366, fc107, ea0f0, d1f60, 73dae

3. [VLLM TEST]setup test workflow: This pull request proposes setting up a test workflow for the VLLM project within the PyTorch repository, as indicated by its title "[VLLM TEST]setup test workflow" and the series of update commits.

  • URL: pull/160583
  • Merged: No
  • Associated Commits: 08e63, ef164, 3d062, 6853d, e99b1, 036ce, 48d25, 42145, 35a97, 5b6fc, 1954e, 4dc99, 5d0e2, cae8a, 5a920, cd2ee, 9e0e5, ff64e, 3aa14, 23e89, 7f219, 2f3bb, 5bf16, 4cdb8, 63544, cf619, f1057, 115ea, ad3c6, 154cc, 71fdb, cd86d, 4b8e3, 99ab0

Other Open Pull Requests

  • Setup process enhancements for vLLM: Multiple pull requests focus on improving the setup process of vLLM by adding support for sm89 and ensuring stability through build workflows and continuous integration tests. These changes include adding test commits and modifying setup-related configurations to support new architectures.
    • pull/160625, pull/160619
  • Testing infrastructure and PyTest migration: A comprehensive migration to PyTest is introduced, including new GitHub Actions workflows for CUDA and non-CUDA tests, tests without NumPy, and reorganized test directories. Updates also cover dependency management, workflow clarity, and artifact handling to improve test execution and maintainability.
    • pull/160827
  • Full graph behavior modification in tests: Several pull requests wrap class definitions in set_fullgraph(False) within different test cases such as test_list, test_tuple, and test_iter to control or modify full graph behavior during testing. This approach aims to adjust test execution contexts for better test isolation or behavior control.
    • pull/160277, pull/160278
  • Distributed computing utilities: Pull requests introduce utilities to debug RNG desynchronization across distributed ranks and to succinctly summarize ranges of ranks for improved readability and debugging in distributed contexts. These tools provide detailed logs and compact representations to aid developers working with distributed systems.
    • pull/160283, pull/160284
  • CUDA graph and kernel refactoring: Multiple pull requests propose enhancements and refactors related to CUDA graph launching and kernel components, including dynamic pointer replacement for CUDA graph launches and renaming/refactoring CUDA kernel classes to CUTLASS equivalents. These changes aim to improve performance, scheduling infrastructure, and code clarity in CUDA-related components.
    • pull/160351, pull/160679, pull/160686, pull/160687
  • New features and API additions: Pull requests add new features such as a Python API for F.linear_cross_entropy, support for an optional dim=None argument in torch.logsumexp, and symbolic support for the C++ channels_last_contiguous memory format. These enhancements expand functionality and improve usability of PyTorch APIs.
    • pull/160319, pull/160585, pull/160402
  • Performance and vectorization improvements: A pull request adds a Vectorized<double> template specialization for the SVE128 architecture in Caffe2 to leverage architecture-specific vectorization capabilities and enhance performance.
    • pull/160329
  • Logging and visualization improvements: Structured logging for execution order in the inductor backend is introduced via a new debug method emitting trace artifacts, alongside an option to add spacing in the _PipelineScheduleExecution visualizer to clarify dependency visualization. These changes improve debugging and visualization clarity.
    • pull/160448, pull/160474
  • Bug fixes and test case corrections: Several pull requests address bugs and fix broken test cases, including input buffer accumulation issues related to undefined gradients, fixing XPU CI and Inductor unit tests, and correcting synchronization streams in FSDP with CPU offload. These fixes ensure stability and correctness in various components.
    • pull/160371, pull/160403, pull/160481

3.2 Closed Pull Requests

This section provides a summary of pull requests that were closed in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.

Pull Requests Closed This Week: 180

Key Closed Pull Requests

1. [Rocm7.1_internal_testing] Remove triton git commit from triton repo : This pull request proposes removing the Triton git commit reference from the Triton repository in the ROCm 7.1 internal testing branch to clean up or update the submodule linkage, but it was not merged.

  • URL: pull/160524
  • Merged: No
  • Associated Commits: 2e0b1, 1f8eb, 8a7fd, 97f3d, 550bc, e7cb7, f61af, 0fd19, 167b4, 06da6, 0412e, 123a1, 2ee3a, a95ad, f070d, bef73, 95105, 0036d, 6c0d1, 0dd76, eb265, c20a8, 6894b, baf34, 51916, 3d6ba, 1a5a7, c113e, 78867, be308, ab8a9, cc13b, 63cbb, 5286c, 9d8f0, 79fa0, 1dea6, dec5b, 81e75, a771d, 2fbd2, 15f91, f7b26, 73cf3, 222ae, ec0c5, 45e1d, bb655, e4c1c, 45985, d37c4, 3a570, 46344, be4f8, befce, 1aa5d, aef0f, 5b344, dc726, b345d, bbae9, fa9fa, 9e184, 08da4, 0b79e, f1ad4, cf324, 13a86, 3057d, a0a9d, 8ffba, 80cca, 347ef, 944be, cc2a6, a7d3b, e9093, 5dd3d, afe8b, c3f75, 24dfd, c7f61, d9e68, 7435c, 2d567, a8d8a, fc804, ef94e, 0d083, a97f4, 89423, 4eaa5, 07077, 84fdb, 622c1, 345b9, 3d404, 5a29b, 54625, 1ce55, fc180, d36b5, 8e494, 377ae, d5c98, 97f8a, 1897a, 36e36, c375f, c1ee5, 23c08, ab538

2. New setupvllm: This pull request proposes a new setup for vllm in the PyTorch project, including multiple commits focused on setup tasks, but it was not merged.

  • URL: pull/160628
  • Merged: No
  • Associated Commits: 23a6a, 1fe4d, 62e99, 44a31, 7f438, 97210, 1f018, c70a7, f87b7, edc0d, 7565e, 1ee62, 5ed01, c164b, 1dccb, 539f7, 8d274, 1fc87, 75861, 5d660, 506e2, 0d9fd, 4a936, e38a9, 93f98, 15163, 84236, fa5a4, 73c86, 83799, 8f78e, 743ea, 4c753, e65a2, c919b, ffa4a, d71ee, a6819, d273b, bac46, 2f2f2, a47f0, a291d, 5bd2a, 987ba, 1df81, c4359, ebc12, d9f14, 53d66, 7f891, cbddc, a99b1, d621b, 9a093, 4925d

3. Fix get_free_symbol_uses for several nodes: This pull request aims to fix the get_free_symbol_uses function for several node types, including NopKernel, ConcatKernel, InputsKernel, and external kernels, by correctly detecting unbacked symbols, especially in cases involving ComputedBuffer with NonOwningLayout, to prevent incorrect node elimination and topological sorting errors.

  • URL: pull/160314
  • Merged: No
  • Associated Commits: f5143, fd442, f8134, 7ca97, 91dae, fd5f1, d38eb, 0a586, 9bfd9, 1286f, 1f62d, 78238, 6e19a

Other Closed Pull Requests

  • Python and Platform Support Enhancements: Several pull requests propose adding or improving support for specific platforms and Python versions in PyTorch. These include adding Python 3.14 support on macOS ARM64, enabling XPU support in autograd tests, and registering conv_transpose3d for MPS autocast to enhance mixed precision on Apple Silicon.
    [pull/160593, pull/160309, pull/160349]
  • Dynamo Compile Logging Improvements: Multiple pull requests focus on enhancing the dynamo compile process by adding logging capabilities. These include logging dynamo graph node shapes and stack traces, along with updates to test utilities to ensure proper verification and debugging.
    [pull/160556, pull/160348]
  • Concurrency and Thread Safety Fixes: A pull request addresses a concurrency issue in TorchScript's ErrorReport::CallStack by making the callstack vector thread-safe to prevent segfaults caused by cross-thread destructor calls, including a test to verify the fix.
    [pull/160386]
  • IValue and Data Type Support: One pull request aims to add support for unsigned integers to the IValue type by refactoring the saving logic for int64 and uint64 values and includes a regression test to ensure correct dispatch of uint64 values.
    [pull/160102]
  • Test Infrastructure and Framework Updates: Several pull requests propose improvements to testing infrastructure, including migrating a shell script test runner to a Python-based framework with error injection support, wrapping class definitions in set_fullgraph(False) context in multiple test modules, and moving some Windows unit tests to the slow test category to address CI timeouts.
    [pull/160267, pull/160330, pull/160331, pull/160635]
  • Dependency and Packaging Updates: A pull request updates the nvshmem dependency to version 3.3.20 for manylinux2_28 compatibility, packages the libnvshmem_host.so.3 library into a combined aarch64+CUDA wheel, and addresses related archive and URL pattern changes to fix a PyTorch issue.
    [pull/160458]
  • Code Quality and Documentation Improvements: Pull requests aim to tidy and improve code quality in the torch/csrc/jit/passes/onnx directory using clang-tidy fixes and formatting, and enhance the README.md by fixing formatting, documentation errors, and ensuring consistent code block indentation for better readability.
    [pull/160262, pull/160307]
  • Error Handling and Serialization Fixes: One pull request fixes an issue with pickling and unpickling the CppCompileError exception by adding a __reduce__ method to properly save and restore its attributes across subprocess boundaries. A minimal sketch of the pattern appears after this list.
    [pull/160294]
  • Backend and API Additions: Pull requests propose adding an API to query GPU core count on Apple devices via IOKit and adding a function to retrieve the current device index in the torch::stable::accelerator module, although the latter was not merged.
    [pull/160414, pull/160453]
  • Autoload and Integration Enhancements: A pull request proposes implementing an autoload mechanism for the OpenReg device backend to enable automatic discovery and initialization at runtime, simplifying integration and improving user experience.
    [pull/160623]
  • Test Scheduling and Scaling Improvements: One pull request updates schedule tests to support a larger world_size of 4 by reorganizing tests and helper methods and adjusting initialization to address issues with very small gradients.
    [pull/160559]
  • Type Annotation and Typing Updates: Pull requests propose adding or updating type annotations in registry.py and typing updates related to the torchxla backend, though these were not merged.
    [pull/160367, pull/160368]
  • Miscellaneous Bug Fixes: A pull request addresses an issue by preventing the function _call_iter_tuple_list from being called with an argument of type UserDefinedObjectVariable.
    [pull/159402]
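
The CppCompileError fix noted above applies a general Python technique; the sketch below uses an invented CompileError class (not PyTorch's actual exception) to show why __reduce__ is needed when an error carrying extra attributes must cross a pickle boundary, such as between subprocesses.

```python
import pickle

class CompileError(Exception):
    """Toy exception carrying extra state, standing in for a richer error type."""
    def __init__(self, cmd: str, output: str) -> None:
        super().__init__(f"compile failed: {cmd}")
        self.cmd = cmd
        self.output = output

    def __reduce__(self):
        # Tell pickle how to rebuild the object: (callable, constructor args).
        # Without this, unpickling would call CompileError(*self.args) with only
        # the formatted message and fail to restore cmd/output.
        return (CompileError, (self.cmd, self.output))

err = CompileError("gcc -c foo.cpp", "foo.cpp:1:1: error: ...")
restored = pickle.loads(pickle.dumps(err))
print(restored.cmd, "|", restored.output)  # attributes survive the round trip
```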

3.3 Pull Request Discussion Insights

This section will analyze the tone and sentiment of discussions within this project's open and closed pull requests that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.

Based on our analysis, there are no instances of toxic discussions in the project's open or closed pull requests from the past week.


IV. Contributors

4.1 Contributors

Active Contributors:

We consider an active contributor in this project to be any contributor who has made at least 1 commit, opened at least 1 issue, created at least 1 pull request, or made more than 2 comments in the last month.

If there are more than 10 active contributors, the list is truncated to the top 10 based on contribution metrics for better clarity.

Contributor     Commits  Pull Requests  Issues  Comments
yangw-dev           618             25       5        31
malfet              135             18      10       136
ezyang               91             20       2        53
anijain2305         128              8       0        13
guilhermeleobas      98             23       0         5
xuhancn             110             12       0         1
janeyx99             60              6       3        41
wconstab             38              8       2        58
ydwu4                72             20       1         8
guangyey             30              8       0        60
