Weekly GitHub Report for PyTorch: February 01, 2026 - February 08, 2026 (15:57:33)
Weekly GitHub Report for PyTorch
Thank you for subscribing to our weekly newsletter! Each week, we deliver a comprehensive summary of your GitHub project's latest activity right to your inbox, including an overview of your project's issues, pull requests, contributors, and commit activity.
Table of Contents
I. News
1.1 Recent Version Releases:
The current version of this repository is v2.6.0
1.2 Version Information:
Released on January 29, 2025, PyTorch 2.6 introduces significant enhancements including torch.compile support for Python 3.13, a new dynamic compilation control API torch.compiler.set_stance, and improved AOTInductor packaging and ABI compatibility. Notable highlights also include beta-level FP16 support on x86 CPUs, expanded Intel GPU support with simplified installation, and a backward-incompatible security improvement flipping the default of torch.load to weights_only=True, alongside numerous performance optimizations, bug fixes, and deprecations such as the discontinuation of official Conda packages.
II. Issues
2.1 Top 5 Active Issues:
We consider active issues to be issues that have been commented on most frequently within the last week. Bot comments are omitted.
- [ONCALL: DISTRIBUTED] [ONCALL: PT2] [BOT-TRIAGED] [BOT-MISLABELED] torch.compile doesn't trace dist.all_reduce output correctly: This issue reports a bug where torch.compile incorrectly traces the output of dist.all_reduce when called with async_op=False, causing it to return a torch.Tensor instead of None as expected, which leads to an AttributeError when attempting to call .wait() on the result. The problem occurs because the compiled version treats the output as an asynchronous handle, conflicting with the eager behavior, and the discussion reveals that asynchronous operations currently cause graph breaks and are not fully supported in compiled mode, with suggestions to branch logic based on compilation state or use the functional collective API consistently. (A hedged repro sketch appears after this list.)
  - The comments clarify that asynchronous all_reduce is not supported with torch.compile due to graph breaks, explain how Dynamo rewrites the collective calls to return an AsyncCollectiveTensor, and discuss the handling of wait_tensor calls in different compilation backends, noting that some passes move wait_tensor closer to its consumer to optimize execution.
  - Number of comments this week: 7
- [TRIAGE REVIEW] [MODULE: MEMORY FORMAT] [MODULE: CORRECTNESS (SILENT)] [MODULE: POOLING] [MODULE: NORMS AND NORMALIZATION] [MODULE: MPS] [BOT-TRIAGED] [MPS] BatchNorm2d/avg_pool2d produce wrong results for channels_last tensors with storage_offset > 0: This issue reports a bug where BatchNorm2d and avg_pool2d produce incorrect results on the MPS backend when tensors use the channels_last memory format and have a storage_offset greater than zero, indicating a problem with how the offset is handled in these operations. The problem does not occur when the tensor is cloned (which resets the offset), but persists when making the tensor contiguous in channels_last format, suggesting that the bug is related to kernel address calculations or graph caching that ignores the storage_offset. (A setup sketch appears after this list.)
  - The comments discuss the likely cause being either incorrect handling of storage_offset in kernel address math or reuse of cached kernels without accounting for offset, with tests confirming that cloning fixes the issue while making the tensor contiguous does not; the conversation also briefly diverges into a lighthearted exchange about the nature of the diagnostic comments.
  - Number of comments this week: 6
- [TRIAGE REVIEW] [MODULE: NN] [MODULE: CORRECTNESS (SILENT)] [MODULE: MPS] [BOT-TRIAGED] [MPS] Incorrect grid_sample outputs: NHWC kernel correctness + missing memory format in kernel caching keys: This issue reports correctness problems with the torch.nn.functional.grid_sample function on the MPS backend, identifying two separate bugs: one related to the NHWC kernel path and another involving missing memory format keys in kernel caching. The reporter provides a minimal reproducible example demonstrating these bugs and discusses ongoing efforts to isolate additional MPS kernel issues affecting other operations like BatchNorm2d.
  - The comments focus on acknowledging the complexity of the problem, sharing a related pull request that fixes the primary grid_sample bug, and discussing plans to test and address further MPS kernel issues in more complex workflows, highlighting the iterative nature of debugging and fixing these backend problems.
  - Number of comments this week: 5
- [ONCALL: DISTRIBUTED] [MODULE: SYMM_MEM] [BOT-TRIAGED] "CUDA driver error: invalid device ordinal" when calling symm_mem.rendezvous: This issue reports a runtime error "CUDA driver error: invalid device ordinal" encountered when calling symm_mem.rendezvous in a multi-GPU setup using PyTorch's symmetric memory API, despite the user verifying correct GPU ordinals and successful NCCL initialization. The error appears related to hardware or driver limitations, possibly involving the lack of NVLink connectivity between GPUs, which may cause CUDA to reject memory access permissions during the rendezvous operation.
  - The comments include a reproduction attempt on a different environment that did not encounter the issue, a suggestion that the error might be due to RTX 2080 Ti hardware limitations, a user confirming symm_mem works on RTX 2080 Ti with NVLink, and a discussion pointing to the CUDA driver call failing due to missing hardware access paths, with a request to improve the error message for clarity.
  - Number of comments this week: 4
- [HIGH PRIORITY] [TRIAGE REVIEW] [ONCALL: PT2] [MODULE: INDUCTOR] [inductor] A regression bug: argmax outputs wrong when working on transposed and mutated matrix: This issue reports a regression bug in PyTorch where the argmax function produces incorrect results when applied to a transposed and mutated matrix, specifically in the context of the inductor backend. The problem reoccurs in version 2.10.0 and a recent development build, with the original fix only addressing the CUDA backend, leaving the CPU backend still affected.
  - The comments discuss confirming the original fix's limitation to the CUDA backend, provide a test case for CUDA, and highlight the need to extend the fix to the CPU backend, with contributors agreeing on the current status and next steps.
  - Number of comments this week: 4
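As a rough illustration of the first issue above, the following minimal sketch shows the pattern being discussed; it assumes an already-initialized process group and a CUDA device, and sync_reduce is a hypothetical helper rather than the reporter's exact reproducer.

```python
# Hypothetical sketch: with async_op=False, eager dist.all_reduce returns None,
# so the wait() branch is never taken; the issue reports that under torch.compile
# the traced call instead returns a tensor-like handle, and .wait() then fails.
import torch
import torch.distributed as dist

def sync_reduce(t):
    work = dist.all_reduce(t, async_op=False)
    if work is not None:   # eager: work is None; compiled (per the report): it is not
        work.wait()
    return t

# compiled = torch.compile(sync_reduce)
# compiled(torch.ones(4, device="cuda"))
```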
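For the BatchNorm2d/avg_pool2d issue above, a hedged sketch of the reported setup looks roughly like this (it assumes an MPS-capable machine; the tensor sizes are illustrative and not taken from the issue).

```python
# Sketch: slicing a channels_last tensor yields a view with storage_offset() > 0,
# while clone() produces an offset-zero copy that the issue says behaves correctly.
import torch

base = torch.randn(2, 4, 8, 8, device="mps").to(memory_format=torch.channels_last)
view = base[1:]        # storage_offset() > 0, still channels_last
ref = view.clone()     # clone resets the storage offset

bn = torch.nn.BatchNorm2d(4, device="mps").eval()
# The issue reports these disagree on MPS even though they should match.
print(torch.allclose(bn(view).cpu(), bn(ref).cpu()))
```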
2.2 Top 5 Stale Issues:
We consider stale issues to be issues that have had no activity within the last 30 days. The team should work together to get these issues resolved and closed as soon as possible.
As of our latest update, there are no stale issues for the project this week.
2.3 Open Issues
This section lists, groups, and then summarizes issues that were created within the last week in the repository.
Issues Opened This Week: 80
Summarized Issues:
- Precision and Mixed Precision Issues: Several issues report numerical inaccuracies and dtype mismatches related to precision handling in PyTorch. These include a significant numerical divergence in GPT2 Word Position Embedding between float32 and float16 on CUDA, Flex Attention failing with mixed precision due to float and BFloat16 dtype mismatch, and mixed precision training overflow causing NaNs in validation loss with pre-trained models like ResNet18.
- TorchInductor Backend Incorrectness and Crashes: Multiple issues describe incorrect results, assertion failures, or crashes when using the TorchInductor backend with torch.compile. Problems include incorrect argmax indices for boolean tensors, advanced indexing with duplicate indices producing wrong results, dropout on transposed tensors causing mismatches, copysign and remainder operations yielding inconsistent outputs, and crashes during lowering or functionalization fallback.
- Regression and Inconsistent Behavior in Compilation and Dynamo: Several issues report regressions or inconsistent behavior in torch.compile or Dynamo, including failures to handle overridden Python magic methods, incorrect tracing of dist.all_reduce outputs, and errors treating user-defined objects as constants during JIT compilation.
- Memory Leaks and Allocation Issues: There are reports of memory leaks and allocation problems, such as continuous memory growth when using torch.compile, GPU memory not being deallocated during profiling, and the NVIDIA DGX platform hanging on large unified memory allocations instead of raising OOM errors.
- Backend and Hardware Compatibility Problems: Issues include CUDA driver errors with invalid device ordinals in multi-GPU setups, lack of support for NVIDIA RTX 5050 GPUs causing kernel image errors, and memory access violations on AMD Instinct GPUs during FP16 triangular operations.
- Test Failures and Disabled Tests: Multiple tests are failing or disabled across various platforms and modules, including ROCm test failures for cross entropy loss, disabled sparse multiplication tests, and Triton integration test failures after updates.
- Autograd and Gradient Computation Bugs: Issues include custom autograd Functions incorrectly computing gradients during partial backward passes, backward pass errors in LayerNorm with dynamic shapes, and requests for fallback support for undefined higher-order backward passes in composite custom operations.
- Model Export and ONNX Issues: Problems arise when exporting models, such as Alpamayo with Qwen3-VL backbone failing due to unregistered active modes, and BiLSTM ONNX export failing with dynamic time dimension conflicts.
- Operator Fusion and Performance Debugging: There is a request for guidance on preventing operator fusion in TorchInductor to debug slow fused operators, indicating challenges in controlling fusion behavior for performance analysis.
- Numerical and Kernel Implementation Bugs on MPS Backend: Several issues report incorrect outputs or crashes on the MPS backend due to kernel bugs, including grid_sample producing wrong results, BatchNorm2d and avg_pool2d failing with channels_last tensors with storage offsets, and torch.abs overflowing or underflowing for complex inputs.
- Compilation and Runtime Errors with Python Constructs: Bugs include torch.compile crashing on Python try/except blocks due to unhandled AttributeErrors and failures when cloning tensors and applying as_strided with in-place additions causing assertion errors.
- Test Infrastructure and CI Issues: Problems include CI failures due to missing shared libraries, confusion caused by CI outage reports, and proposals to improve test reuse and coverage across hardware backends.
- Error Message Regression Testing Proposals: Multiple issues propose adding module_error_inputs_func for various modules like Linear, Embedding, CrossEntropyLoss, Conv2d, and MaxPool2d to enable regression testing of error messages for edge cases and ensure consistency.
- DTensor and Distributed Tensor Bugs: Issues include DTensor's squeeze_ operation updating metadata but not local tensor shape, and a proposed enhancement to layer_norm strategy to reduce communication overhead by decomposing computations for sharded tensors.
- Hash Map and Data Structure Bugs: A bug in flat_hash_map's find() and emplace() functions causes iterator overruns and crashes due to an off-by-one error, requiring a patch to terminate iteration properly.
- Sorting and Output Order Inconsistencies: The aten.sort function with stable=None produces inconsistent output orders between eager and inductor backends on CUDA, breaking execution consistency.
- Profiling and Memory Tracking Enhancements: Proposals include adding manual synchronization APIs to CUDACachingAllocator to track physical memory usage accurately and improving torch.foreach_copy performance on CUDA by using cudaMemcpyBatchedAsync for mixed data types.
- Compilation Graph and Export Behavior Changes: Changes in torch._dynamo.export behavior from 2.9.1 to 2.10 affect modular FX graph generation and raise questions about stable APIs for preserving call_module calls and handling multiple FX graphs.
- Distributed and NCCL Backend Hangs: A hang occurs in the NCCL backend when using batched asynchronous send/receive with large tensors and many operations per batch in distributed setups.
- Custom Operation Dispatch and FakeTensorMode Bugs: FakeTensorMode fails to dispatch custom operations when an nn.Parameter subclass is used, causing NotImplemented errors despite valid registrations.
- LayerNorm Backward Pass Failures with Dynamic Shapes: Runtime errors occur in LayerNorm backward when using torch.compile with inductor on tensors with symbolic dynamic shapes, due to incorrect workspace slice indices.
- Boolean Operation Bugs in Inductor: Boolean operations involving .data assignments produce incorrect results in the inductor backend compared to eager execution, causing assertion failures.
- Error Message and Exception Handling in Dynamo Tracing: Dynamo tracing incorrectly reports tracing failures for the builtin repr operator instead of raising the expected ValueError, causing confusion between eager and compiled modes.
- Requests for API and Feature Enhancements: Requests include support for runtime dictionary lookups in ConstDictVariable to reduce recompilation, increasing NUM_THREADS for ARM OpenBLAS compilation, and discussions on Stable ABI support for certain C++ APIs.
- DataLoader Pin-Memory Deprecation Warnings: The DataLoader's pin-memory helper passes a deprecated device argument to Tensor.pin_memory(), causing warnings due to internal API changes where the device is now inferred implicitly (a small sketch of the deprecated call appears after this list).
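For the pin-memory item above, the deprecation can be seen by passing a device to Tensor.pin_memory() directly; this small sketch assumes a CUDA-enabled build and is not the DataLoader's internal code path.

```python
# Sketch: passing a device to Tensor.pin_memory() triggers the deprecation warning
# described above; calling it with no argument is the current, non-deprecated form.
import warnings
import torch

t = torch.randn(8)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    t.pin_memory("cuda")                      # deprecated: the device is now inferred
    print([str(w.message) for w in caught])

pinned = t.pin_memory()                       # preferred form
print(pinned.is_pinned())
```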
2.4 Closed Issues
This section lists, groups, and then summarizes issues that were closed within the last week in the repository. This section also links the associated pull requests if applicable.
Issues Closed This Week: 58
Summarized Issues:
- Segmentation Faults in CUDA Operators and AOT Loading: Multiple issues report segmentation faults occurring in CUDA-related operations, including torch._export.aot_load() crashing when loading compiled AOT artifacts with constants, torch.ops.aten.gru crashing with packed sequences, and torch.ops.aten.lstm.data crashing when the batch_sizes tensor is not on CPU. These faults cause abrupt Python process crashes without clear diagnostic messages, indicating critical stability problems in CUDA operator handling. - issues/172739, issues/173623, issues/173944
- Test Disabling Due to Failures on XPU and ROCm Platforms: Several tests have been disabled due to consistent failures on the main branch affecting XPU and ROCm platforms, including test_skip_non_tf32 in SDPAPatternRewriterGpuTests and DynamicTests, test_triton_autotuning_cuda and test_triton_mutated_autotuning_cuda in AOTInductorTestABICompatibleGpu, and multiple SDPA and compile_preserves_metadata_cache tests on ROCm. These disables are temporary measures to maintain CI stability while fixes or skips are prepared. - issues/173336, issues/173352, issues/173619, issues/173620, issues/173712, issues/173713, issues/173714, issues/173715, issues/173717
- Compilation and Runtime Errors in Triton and Inductor Backends: Issues include a Triton kernel test failing with an IndexError due to argument index out of range, assertion errors in Triton-based flex decoding tests caused by LLVM verification failures, and Inductor backend errors such as a TypeError from incorrect CSE class instantiation and incorrect zero gradients in the cumprod backward pass on CUDA. These problems affect kernel generation, autotuning, and model compilation stability. - issues/173795, issues/174306, issues/174311, issues/174016, issues/174094
- Wheel and Packaging Metadata Bugs: There are bugs related to wheel file metadata, including an invalid macOS platform tag cp313-cp313-macosx_110_0_arm64 in version 2.10.0 and a similar issue with the macOS arm64 wheel using macosx_110_0 instead of macosx_11_0. These cause compatibility tools to misidentify the platform and may lead to repeated reinstallations or broken installs. - issues/173462, issues/174265
- Runtime and Memory Errors in Distributed and CUDA Memory APIs: Issues include a RuntimeError triggered by setting TORCH_DISTRIBUTED_DEBUG=DETAIL when accessing backend.mem_allocator with NCCL, a RuntimeError "invalid device ordinal" when allocating symmetric memory across multiple GPUs, and a RuntimeError from torch.cuda.memory_snapshot() due to an invalid mempool_id argument. These errors indicate problems in memory management and distributed backend debugging. - issues/173538, issues/174029, issues/174044
- Regression and Performance Issues in Model Training and Execution: A regression causes illegal memory accesses during Flash Attention backward pass after upgrading to 2.8.3, and upgrading from 2.8.x to 2.9.x causes a slowdown in the Qwen3-VL model's vision-conditioned forward pass due to Conv3d fallback. Additionally, a nightly build regression causes out-of-memory errors on 4x A100 GPUs for previously fitting training runs. These regressions impact model accuracy, speed, and memory usage.
- issues/173953, issues/174051, issues/174244
- Documentation and Link Issues: Broken or incorrect documentation links have been reported, including a broken link to nested tensors documentation in PyTorch tutorials and a link in the PhotoTour dataset documentation redirecting to a gambling advertisement instead of the correct dataset URL. These issues affect user access to accurate resources.
- issues/174380, issues/174542
- Build and Configuration Failures: Problems include no GPU targets being selected when building AOTriton due to incorrect environment variable formatting, and CUDA 12.6 binary builds failing due to unresolved symbols and linker errors related to
linalg_eig_cusolver_xgeev. These issues prevent successful builds and deployments on certain platforms. - issues/174068, issues/174281
- Bugs in PyTorch Core Functions and APIs: Several bugs affect core PyTorch functions, such as a NameError in _wrap_values() due to undefined named_children, a bug in torch.tensordot documentation indexing, a bug in nn.LayerNorm producing NaNs on CPU with extreme float32 inputs, and a bug in PyTorch Dynamo's ONNX export causing bias name conflicts. These bugs impact correctness and usability of core APIs. - issues/173879, issues/173924, issues/174011, issues/174042, issues/174133
- CUDA and GPU Architecture Compatibility Issues: Issues include CUDA errors running Stable Diffusion on new NVIDIA GPUs due to missing kernel image support, AOTInductor generating PTX code targeting sm_120a but nvcc expecting sm_120 causing compilation failure, and ARMv8.1 LSE atomic instructions causing illegal instruction crashes on ARMv8.0 processors. These compatibility problems hinder usage on newer or specific hardware.
- issues/173991, issues/174161, issues/174344
- PyTorch Compile and Autograd Integration Bugs: Bugs include torch.compile wrapping autograd.Function subclasses with an incompatible ApplyTemplate lacking setup_context, causing RuntimeError with torch.func transforms, and AOTAutograd warm caching failing due to unpickleable local functions causing backend compilation errors. These issues affect model compilation and autograd correctness. - issues/174067, issues/174299
- Test Failures and CI Infrastructure Issues: Failures include flaky CUDA memory pool tests due to improper reset of indicator variables, disabling of specific CUDA tests on ROCm after hipify v2 integration, and GitHub runner incidents causing delays and cancellations in PR merges and CI jobs. These issues impact test reliability and development workflow.
- issues/174392, issues/174404, issues/174119
- Numerical Stability and Precision Discrepancies: A significant numerical deviation is reported in nn.Conv2d outputs comparing CUDA FP32 and CPU FP16, especially for 7x7 kernels, where CPU FP16 lacks stability, causing error amplification beyond acceptable thresholds. This discrepancy may degrade model accuracy when using mixed precision across devices (a comparison sketch appears after this list). - issues/174089
- Miscellaneous Bugs and Questions: Other issues include a race condition in the orgqr function for Metal Performance Shaders due to device-side barrier limitations, a crash in the index.Tensor operation with empty indices caused by a failed assertion, and a question about the rationale behind accuracy threshold values in an fp8 ROCm CI test. These highlight diverse minor bugs and community inquiries. - issues/173972, issues/173995, issues/174493
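The Conv2d precision item above compares the same convolution weights across device and dtype. A rough comparison sketch follows (it assumes CUDA is available; the shapes and the lack of a tolerance check are illustrative, not taken from issues/174089).

```python
# Sketch: run identical Conv2d weights in CUDA FP32 and CPU FP16 and report
# the largest absolute difference between the two outputs.
import torch

conv = torch.nn.Conv2d(3, 8, kernel_size=7, bias=False)
x = torch.randn(1, 3, 64, 64)

with torch.no_grad():
    ref = conv.to("cuda", torch.float32)(x.to("cuda"))         # CUDA FP32 reference
    test = conv.to("cpu", torch.float16)(x.to(torch.float16))  # CPU FP16 under test

print((ref.cpu().float() - test.float()).abs().max())
```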
2.5 Issue Discussion Insights
This section will analyze the tone and sentiment of discussions within this project's open and closed issues that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.
Based on our analysis, there are no instances of toxic discussions in the project's open or closed issues from the past week.
III. Pull Requests
3.1 Open Pull Requests
This section provides a summary of pull requests that were opened in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.
Pull Requests Opened This Week: 243
Key Open Pull Requests
1. Fix InputObserver.infer_arguments with empty caches: This pull request refactors the InputObserver class to improve argument inference when caches are empty, significantly expands test coverage to handle optional and mixed arguments as well as dynamic input scenarios, and integrates pandas for enhanced discrepancy analysis between model outputs and ONNX exports.
- URL: pull/174205
- Associated Commits: 24b29, 2a250, 46eca, 2117e, abc2f, 7e2af, fc47e, ecff1, 31004, fea38, 32763, 80655, 20581, 78d6b, 3746c, 68385, 28643, 88d98, e4b5e, eef93, 7d94e, d971f
2. [FSDP2] enable more tests on CPU: This pull request enables additional unit tests to run on CPU for the FSDP2 feature in PyTorch, building on prior support introduced in earlier pull requests.
- URL: pull/174048
- Associated Commits: 2dc0a, 08218, 3fb46, 9444f, 2ab7f, cb274, 1d0dc, 987ab, ec941, a70f2, 17014, c0a64, 5633b, 329e8
3. DStorage for DTensor/DParam: This pull request introduces DStorage functionality for DTensor and DParam in PyTorch, enabling model parameters to be viewed and managed as a unified byte storage representation.
- URL: pull/174267
- Associated Commits: e8826, 52b7d, cd520, 484cb, 101eb, 55905, 18d1c, 04bc9, 08689, 76be5, 5a492, f30b6, e4fa6, c23af
Other Open Pull Requests
- Checkpoint module refactoring and enhancements: Multiple pull requests improve the torch.utils.checkpoint module by refactoring internal implementations and adding new parameters. These changes simplify checkpointing by using SavedTensor objects directly and introduce an explicit device_type parameter to optimize device handling during checkpointing.
- [pull/174327, pull/174328, pull/174333]
- Gradient computation fixes in autograd: A pull request fixes the behavior of ctx.needs_input_grad in custom autograd functions to dynamically update gradient requirements during partial backward passes. This enables more efficient gradient computation, particularly benefiting zero-bubble pipeline parallelism scenarios (see the ctx.needs_input_grad sketch after this list). - [pull/174079]
- Serialization improvements: One pull request changes the serialization approach in serialize.py by replacing comma-based serialization with a dictionary-based method. This update saves serialized results as strings and deserializes them back to appropriate data types without altering function inputs or outputs.
- [pull/174170]
- CI and build infrastructure updates: Several pull requests enhance continuous integration and build workflows by adding Pallas TPU CI testing with private repository access, switching ROCm nightly builds to more reliable gfx942 runners, and adding a dedicated CI job for CPython tests on Python 3.13 to improve test stability.
- [pull/174201, pull/174290, pull/174414]
- Backend and device support migrations: Multiple pull requests migrate functionality and tests to support new hardware backends, including moving grid_sampler_2d to Metal, adapting unit tests for Intel GPU support, and adding frontend Python APIs for XPUGraph to improve capture and replay on XPU devices.
- [pull/174343, pull/174370, pull/174046]
- Documentation restructuring: One pull request reorganizes the C++ documentation by modularizing API files and removing exhale in favor of Doxygen and breathe. This restructuring drastically reduces build time from 5.5 hours to about one minute and eliminates thousands of nearly empty pages.
- [pull/174096]
- DTensor and JIT kernel enhancements: Pull requests add an OpInfo test suite for fullgraph compilation of DTensor operations and extend JIT-compiled CUDA kernels to support uint16, uint32, and uint64 scalar types. These changes address compilation verification and fix crashes in torch.special functions using unsigned integers on CUDA.
- [pull/174142, pull/174303]
- Torch function mode dispatch control: A pull request introduces a mechanism to skip torch function mode dispatch for a single call while keeping the mode active for subsequent operations. This is implemented via a skip_one_hop TLS flag and a new context manager, enabling functions like backward() to bypass mode dispatch internally but maintain mode activity overall.
- [pull/174098]
- Test assertion removals: A series of stacked pull requests focus on removing assert statements from various test files across the PyTorch codebase. These changes aim to improve test code quality and maintainability by eliminating redundant or outdated assertions in top-level, fx, and distributed test directories.
- [pull/174255, pull/174256, pull/174257, pull/174258, pull/174259, pull/174260, pull/174261, pull/174262, pull/174263]
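For context on the ctx.needs_input_grad pull request above, the following is a generic illustration of how a custom autograd Function consults that flag to skip unneeded gradient work; it shows the standard pattern, not the PR's change itself.

```python
# Generic ctx.needs_input_grad pattern: compute a gradient only when the
# corresponding input actually requires one.
import torch

class Scale(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, w):
        ctx.save_for_backward(x, w)
        return x * w

    @staticmethod
    def backward(ctx, grad_out):
        x, w = ctx.saved_tensors
        grad_x = grad_out * w if ctx.needs_input_grad[0] else None
        grad_w = grad_out * x if ctx.needs_input_grad[1] else None
        return grad_x, grad_w

x = torch.randn(3, requires_grad=True)
w = torch.randn(3)                       # requires_grad=False, so grad_w is skipped
Scale.apply(x, w).sum().backward()
print(x.grad)
```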
3.2 Closed Pull Requests
This section provides a summary of pull requests that were closed in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.
Pull Requests Closed This Week: 284
Key Closed Pull Requests
1. [release/2.7] Enable ROCm for linalg ops - cholesky, lstsq and gels: This pull request is a cherry-pick for release/2.7 that enables the use of the hipSolver backend instead of the previously default magma backend for ROCm in linear algebra operations such as cholesky, lstsq, and gels, allowing users to select the cusolver backend for potentially improved performance.
- URL: pull/174129
- Associated Commits: 79126, 5416d, 65695, c2cca, 8b6bc, 3b61d, 06c6a, 28ca4, 1cc51, a6321, 35f1e, 3f236, ef2b1, 89490, c7ff7, 0c236, 07391, 13417, e294d, 04c7c, 414fc, ff69f, c0fde, 41d64, f31bd, ae842, 4e346, ef226, 40f0d, a80d3, 06077, cced6, de84f, 54d00, 39a79, 8d7ae, 7010d, ad7a2, d458c, 6fd40, 030d6, 7fe67, 62f12, 943cc, 39c25, 55ef4, 47c27, 5df0d, ee96e, cd603, 2dc4b, 24b0c, 99847, cd885, 20d62, dab81, 27e9c, 823b1, 1f8a9, 3f73e, f717b, 1a316, f7721, dec2e, 90091, fa982, 3bfe0, c1423, 800aa, 378a5, 3cddd, 9ebc6, 5beaf, 8af99, f001a, 6e62a, 1cb81, b8d92, 0d98f, ab54c, 70518, 9030e, 2c220, 03c7d, 1fee1, 1d1c7, 92d32, 0073e, 6f2f4, a1599, bdec1, e8f8a, ff4dd, 4c731, 02cee, 4a815, 92d6d, 1ae99, 306ba, 769d5, 94173, e0afc, 83133, 7a876, bbd01, 38f2b, 62ea9, a9d0d, 790cc, 12141, 77a7b, e2d14, f0c1c, 4c858, 5ebff, 17364, 2337d, bf007, ba48d, d17e2, 189aa, e867a, 5631e, 83049, d62a3, 68990, 8a12d, 3fc00, c7ce5, 197c9, b5d59, 9412d, c17ce, 34f3b, 13520, 06c10, 49675, d598f, 8e450, 575e2, 7edf5, c7a1e, 66726, 2a215, 0bd40, df38c, 7a768, 509a6, a4d60, 6fba5, dce73, 4d586, e725e, 7f01c, 4c00e, 866cc, 9434e, b6228, f86d1, 3ea89, b2571, fc756, 22c98, cd0f7, fe3d3, 30508, f07b7, 6b52d, 35dae, d5542, 60111, a929f, be95f, 1cd45, f0534, f0aeb, 6c845, 44c0e, faae1, 6fd45, 5e2f3, a46fe, 699f4, 19431, 5cd45, 39916, 1dfb2, f3ff1, b6098, 359ee, 56383, 55f04, aab10, 2975a, e1c87, 69f40, d9382, 9db1b, f6616, b0c5b, 698b5, 07f41, 56b79, 8d1a0, 1781e, 1f24f, c00d4, 10cbf, 26531, cbf75, 59925, eb99f, 85f25, e3cca, 9e206, b3f84, c02c4, 8d426, 99ccf, 167f7, dcd8e, a033d, 9015d, e8c4b, c2114, 018e5, 975f6, b8b81, 6110a, de2e5, fcbe2, 652c9, 40012, 4b0d5, 63052, a7b6c, 130d9, e3112, 9dc91, 175d5, 7de12, 94afe, 88375, 65632, 5925e, 132ce, bd94a, 262e5, 953e8, 1dc03
2. [reland][ROCm] remove caffe2 from hipify: This pull request relands a previous attempt to remove caffe2 from the hipify tool in the ROCm project by eliminating all "MasqueradingAsCUDA" files and classes and avoiding renaming "CUDA" classes to "HIP," addressing infrastructure issues and incorporating multiple fixes, updates, and mapping improvements to ensure compatibility and build stability.
- URL: pull/172796
- Associated Commits: 4cb19, 5d694, c5a07, 7d7e3, c6f2c, e538b, f0fca, c55ca, 52f55, 7c7be, 6c7cb, 0f0a5, 43be9, fced1, 1bd02, 9cd91, b25bf, 84388, d21a7, fa4ea, bfc83, 3f702, dd3ca, 9b608, 03a40, be812, e7838, 9cf12, 3c4c1, b09bb, c3f73, ee1f7, 71e55, dca58, 5a319, b8641, 6b41f, e23ec, 1970c, 1bfb1, 11e1c, 64210, 7dcbb, 7d2af, e3652, a7c55, 3d3e4, 82a60, 6b226, 47962, f48f0, e259e, 7b449, a15de, 5a66c, 76c18, 64325, 6e55c, 8e093, 7a60c, d34f4, 68086, 61db1
3. Handle List/Dict Comprehension Graph Breaks for Python3.12+: This pull request addresses the changes in Python 3.12+ where list and dict comprehensions are inlined into their surrounding functions by enhancing Dynamo's tracing mechanism to handle graph breaks within comprehensions more precisely, implementing bytecode analysis and checkpointing to skip only the comprehension-related code rather than the entire function, and covering numerous edge cases including nested comprehensions, side effects, and variable scope mutations, with new tests added to ensure correctness.
- URL: pull/173558
- Associated Commits: 56724, c55d7, 5757c, a0eb3, 084e5, 13b84, 9a755, 7ac14, 48d2c, 98374, 9a9db, b31da, dd72d, b4f59, 74fe6, 6c567, 63fd0, a4d89, 8547f, 2129b, c4b09, 4ce41, 3a27d, 8895e, ba25a, d3811, 6cb43, 00a98, 2ee74, 960e9, 1342d, ee671, cf5bd, 411d5, 89182, 3909a, 2b02f, d5fdd, d3015, cb56d, a4524, 8cc33, edc3f, 29785
Other Closed Pull Requests
- Dynamo VariableTracker Construction Consolidation: Multiple pull requests focus on consolidating the construction of VariableTracker instances in various PyTorch Dynamo modules by routing direct variable creation through centralized builders like SourcelessBuilder.create() or VariableBuilder. These changes address import circularity issues and implement the first step of a related issue to improve code structure and maintainability.
- Dynamo Profiler Enhancements: Pull requests introduce a Dynamo-native profiler operating at the tracing layer to measure time spent tracing Python functions, improving user-level visibility beyond cProfile. Additional improvements include recording generator frames during profiling to enhance accuracy and completeness of performance data.
- ROCm Backend Fixes and Improvements: Several pull requests address ROCm-specific issues including fixing unit tests by extending skips and correcting grid value expectations, removing caffe2 dependency from hipify, and attempting fixes for ROCm forward issues with related bug cleanups. These changes improve ROCm compatibility and infrastructure stability.
- Dynamic Shape Support in Linear Algebra Operations: A pull request fixes dimension-dependent errors in 18 linear algebra operations by replacing direct dimension comparisons with runtime checks and handling unbacked symbolic dimensions properly. This enables these operations to support dynamic shapes effectively.
- DTensor Single-Dimension Pointwise Operation Rule: One pull request completes the implementation of the single-dimension pointwise operation rule within the DTensor component, advancing DTensor functionality.
- ONNX Export Support for Higher Order Operators: A pull request implements ONNX export support for
torch.ops.higher_order.invoke_subgraph, preserving functions created by nested compilation as separate entities in the ONNX graph. It notes that further updates to the onnxscript optimizer and version converter are needed to fully prevent inlining.
- XPU Build Lazy Dependency on Intel Level Zero: A pull request implements a lazy dependency on the Intel Level Zero library for the XPU build by linking against a stub that defers loading until runtime. This prevents failures on CPU-only machines lacking libze_loader.so and enforces API calls through an indirection layer.
- Dynamo Cache and Performance Optimizations: Pull requests propose alternative implementations to cache attribute source construction and simplify the variable tracker cache by caching only on the Source object. These changes reduce redundant calls and improve compile time performance in Dynamo.
- pull/174020, [pull/174242](https://github.com/pytorch/pytorch/pull/174242)
- Pallas TPU CI Security Enhancement: A follow-up pull request adds checksum verification for the Bazelisk download in the Pallas TPU continuous integration setup to enhance security and integrity.
- Torch Stable ABI Enhancements: A pull request adds deleter support to torch::stable::from_blob by introducing a new function and necessary scaffolding to facilitate a clean port of TorchCodec to the stable ABI.
- Automated PR Review Skill Implementation: One pull request proposes the initial implementation of a "Claude review skill" to improve automated pull request reviews with a balance of specific examples and general guidance, outlining future enhancements for compatibility and validation.
- Inductor and Triton BlockPatternMatch Improvements: A pull request improves BlockPatternMatch by preventing premature expansion of expressions, removing precomputed sizes for dynamic shapes, defining non-negativity for FloorDiv, and fixing a long-running test related to low memory max pooling.
- Function Saved Tensors Clearing Option: A pull request introduces an option to clear saved_tensors in a Function upon access, addressing a specific issue and including related commits for documentation and test skipping.
- Unit Test Fix for XPU Inductor: A pull request fixes the unit test SDPAPatternRewriterGpuDynamicTests.test_skip_non_tf32 within the [xpu][fix][inductor] scope.
3.3 Pull Request Discussion Insights
This section will analyze the tone and sentiment of discussions within this project's open and closed pull requests that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.
Based on our analysis, there are no instances of toxic discussions in the project's open or closed pull requests from the past week.
IV. Contributors
4.1 Contributors
Active Contributors:
We consider an active contributor in this project to be any contributor who has made at least 1 commit, opened at least 1 issue, created at least 1 pull request, or made more than 2 comments in the last month.
If there are more than 10 active contributors, the list is truncated to the top 10 based on contribution metrics for better clarity.
| Contributor | Commits | Pull Requests | Issues | Comments |
|---|---|---|---|---|
| albanD | 155 | 27 | 1 | 8 |
| malfet | 113 | 15 | 1 | 48 |
| wconstab | 156 | 13 | 0 | 8 |
| pianpwk | 161 | 11 | 0 | 2 |
| ydwu4 | 156 | 15 | 0 | 1 |
| laithsakka | 145 | 18 | 0 | 0 |
| NikhilAPatel | 136 | 0 | 0 | 0 |
| anijain2305 | 107 | 13 | 0 | 13 |
| BenjaminDEMAILLE | 128 | 0 | 0 | 0 |
| kurtamohler | 117 | 2 | 2 | 6 |
Access Last Week's Newsletter: