Weekly Project News

Weekly GitHub Report for PyTorch: April 27, 2026 - May 04, 2026 (14:32:45)

Thank you for subscribing to our weekly newsletter! Each week, we deliver a comprehensive summary of your GitHub project's latest activity right to your inbox, including an overview of your project's issues, pull requests, contributors, and commit activity.


Table of Contents

  • I. News
    • 1.1. Recent Version Releases
    • 1.2. Version Information
  • II. Issues
    • 2.1. Top 5 Active Issues
    • 2.2. Top 5 Stale Issues
    • 2.3. Open Issues
    • 2.4. Closed Issues
    • 2.5. Issue Discussion Insights
  • III. Pull Requests
    • 3.1. Open Pull Requests
    • 3.2. Closed Pull Requests
    • 3.3. Pull Request Discussion Insights
  • IV. Contributors
    • 4.1. Contributors

I. News

1.1 Recent Version Releases:

The current version of this repository is v2.6.0

1.2 Version Information:

Released on January 29, 2025, PyTorch 2.6 introduces significant enhancements including torch.compile support for Python 3.13, a new dynamic compilation control API torch.compiler.set_stance, and improved AOTInductor packaging and ABI compatibility. Notable highlights also include FP16 support on x86 CPUs, expanded Intel GPU support, FlexAttention for x86 CPUs targeting LLMs, and a backward-incompatible security improvement flipping the default of torch.load to weights_only=True, alongside the deprecation of official Conda package publishing.
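The weights_only=True default narrows what torch.load is willing to unpickle. The general idea behind restricted unpickling can be sketched with the standard library alone. This is an illustration of the concept under stated assumptions, not PyTorch's implementation: the names RestrictedUnpickler, ALLOWED_GLOBALS, and restricted_loads are hypothetical, and the allow-list is chosen only for the example.

```python
import io
import pickle

# Hypothetical allow-list for illustration only; not PyTorch's actual list.
ALLOWED_GLOBALS = {("collections", "OrderedDict")}


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that refuses to resolve any global not on the allow-list."""

    def find_class(self, module, name):
        # pickle calls find_class whenever a payload references a class or function.
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is not allowed")


def restricted_loads(data: bytes):
    """Deserialize bytes, rejecting payloads that reference disallowed globals."""
    return RestrictedUnpickler(io.BytesIO(data)).load()
```

Plain containers and allow-listed types load normally, while a payload that references any other global (for example a pickled built-in function) is rejected with pickle.UnpicklingError instead of being resolved.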

II. Issues

2.1 Top 5 Active Issues:

We consider active issues to be issues that have been commented on most frequently within the last week. Bot comments are omitted.

As of our latest update, there are no active issues with ongoing comments this week.

2.2 Top 5 Stale Issues:

We consider stale issues to be issues that have had no activity within the last 30 days. The team should work together to get these issues resolved and closed as soon as possible.

As of our latest update, there are no stale issues for the project this week.

2.3 Open Issues

This section lists, groups, and then summarizes issues that were created within the last week in the repository.

Issues Opened This Week: 0

Summarized Issues:

As of our latest update, no issues were opened in the project this week.

2.4 Closed Issues

This section lists, groups, and then summarizes issues that were closed within the last week in the repository. This section also links the associated pull requests if applicable.

Issues Closed This Week: 2

Summarized Issues:

  • ROCm CI Test Failures and Instability: The ROCm trunk distributed tests are timing out because of problems with the rocshmem tests, which have been temporarily disabled while a fix is developed. In addition, ROCm trunk CI jobs have become unstable: a Kineto submodule update initially masked test failures, and subsequent updates re-exposed them, so the ROCm jobs are marked as unstable while a fix is pursued.
    • [issues/178884, issues/179911]

2.5 Issue Discussion Insights

This section analyzes the tone and sentiment of discussions within this project's open and closed issues that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.

Based on our analysis, there are no instances of toxic discussions in the project's open or closed issues from the past week.


III. Pull Requests

3.1 Open Pull Requests

This section provides a summary of pull requests that were opened in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.

Pull Requests Opened This Week: 0

As of our latest update, no pull requests were opened in the project this week.

3.2 Closed Pull Requests

This section provides a summary of pull requests that were closed in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.

Pull Requests Closed This Week: 16

Key Closed Pull Requests

1. [overlap] pre-bucketing of fsdp collectives: This pull request introduces a pre-bucketing strategy for Fully Sharded Data Parallel (FSDP) collectives in the overlap scheduling algorithm. It calibrates bucket sizes based on bandwidth and latency to improve bucketing efficiency, and it reliably detects FSDP collectives even when autoparallel shardings produce irregular patterns. The change includes configurable parameters and thorough testing to ensure correctness.

  • URL: pull/179935
  • Associated Commits: da2ee, 5d768, a8c13, 0a3a8, 4e8f7, a873b, 96a54

2. [ROCm] - Reduce generated CK kernel files and build by default: This pull request updates the ROCm build configuration to enable the CK kernel build by default while implementing various filters to reduce the number of generated CK kernel files, optimizing the build process.

  • URL: pull/178310
  • Associated Commits: 4c736, 98f13, b25f3, c0d33

3. torch.backends.fp32_precision setter propagate to cudnn.conv/rnn: This pull request addresses the issue where the torch.backends.fp32_precision setter did not propagate to cudnn.conv and cudnn.rnn modules by implementing a default handling mechanism, adding try-except blocks to prevent runtime errors, and providing a workaround suggestion.

  • URL: pull/179750
  • Associated Commits: afae0, 51237, 2b2aa, 82f17

Other Closed Pull Requests

  • Cache isolation in PyTorch Dynamo: This pull request introduces a region_id flag to cache entries in PyTorch's Dynamo compiler, enabling multiple torch.compile() calls targeting the same function to maintain separate, region-specific caches. This prevents cross-region interference in cache lookups, recompile limits, and execution strategies while still allowing shared profile-guided optimizations across regions.
    • pull/178351
  • Iterator protocol implementation: A generic_iternext function is added to object_protocol.py implementing CPython's PyIter_Next semantics, including checks for the tp_iternext slot and dispatching to iternext_impl on VariableTracker subclasses. This provides a standardized iterator next method and raises a TypeError if the object is not an iterator.
    • pull/178561
  • Pipeline RECV deferral on AMD ROCm: This pull request adds a configurable flag to defer pipeline RECV operations on platforms like AMD ROCm, postponing RECVs until just before the compute operations that consume their data. This eliminates pipeline bubbles and avoids deadlocks by using a rank-parity peer-to-peer ordering strategy.
    • pull/178815
  • Distributed and inductor test configurations for ROCm CI: Distributed tests with 3 shards on 4-GPU runners and inductor tests with 2 shards on single-GPU runners are added to the rocm-nightly continuous integration workflow. This setup mirrors existing periodic workflows to improve test coverage on ROCm platforms.
    • pull/179628
  • Handling stale CUDA streams in autograd during CUDA graph capture: The pull request introduces a default behavior that raises a clear RuntimeError when autograd nodes hold stale references to the default CUDA stream during graph capture. It also adds an opt-in override flag that redirects stale non-capturing streams to the producer’s capturing stream to prevent crashes and enable successful graph capture.
    • pull/180090
  • Source file mirroring moved from setup.py to CMake: Source file mirroring is moved from the setup.py script to CMake by copying source files into specific directories included in the wheel. This removes the previous mirroring logic from setup.py and handles file mirroring through cmake/FileMirroring.cmake for both scikit-build-core and setuptools builds.
    • pull/177642
  • Package data installation support for scikit-build-core migration: A new cmake/PackageData.cmake file is added to support installing non-Python package data into the wheel, replacing setup.py's package_data configuration. It also introduces a fallback for the unused SKBUILD_PLATLIB_DIR to ensure the module remains self-contained.
    • pull/177643
  • Fixes for PyTorch Inductor compiler CUDA issues: Two bugs are fixed: a device string comparison error causing the e8m0_rceil_log2 pattern to fail on CUDA devices, and incorrect uint8 outputs from ceil(log2(...)) pipelines on pre-SM100 GPUs. The faulty PTX instruction is replaced with an exact IEEE 754 bit-manipulation method to ensure correct behavior across all CUDA hardware.
    • pull/178698
  • Pluggable MaterializeFn hook in StorageImpl: A pluggable MaterializeFn function pointer hook is introduced to replace the hard-coded copy-on-write materialization logic in StorageImpl. This enables any backend to intercept write-path data pointer access in a general and extensible way while preserving existing copy-on-write semantics and improving modularity.
    • pull/179063
  • Enhanced PyObject_GetItem dispatch in vt_getitem: The second branch of CPython's PyObject_GetItem dispatch is implemented by adding a sq_item branch and replacing the Python-level hasattr(key_type, "__index__") check with a more efficient C-level slot detection using has_slot(NB_INDEX). This enables support for types that implement sq_item without mp_subscript, such as collections.deque.
    • pull/179251
  • AOTInductor fallback ops support for grid sampler operations: C-shim support is added for aten.grid_sampler_3d, aten.grid_sampler_3d_backward, aten.cudnn_grid_sampler, and aten.cudnn_grid_sampler_backward operations to the AOTInductor fallback ops. This prevents fallback to the proxy executor and enables correct execution through torch.compile(backend='inductor') on CPU.
    • pull/179440
  • Fix CUDA IPC deserialization mismatch: The pull request ensures that the handle type used in ExpandableSegment::share() is properly communicated and recognized during deserialization in ExpandableSegment::fromShared(). This prevents errors caused by uninitialized handle type states in newly spawned consumer processes.
    • pull/179618
  • Pure-Python polyfill for unittest.TestCase.assertRaisesRegex in TorchDynamo: A pure-Python polyfill is added to bypass untraceable C-extension calls during regex assertions in TorchDynamo. This enables successful tracing of standard CPython 3.13 tests like test_keyword_args by redirecting assertRaisesRegex to the traceable assertRaises while preserving the core test intent.
    • pull/179928
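The Inductor fix listed above (pull/178698) replaces a PTX instruction with an exact IEEE 754 bit-manipulation method for ceil(log2(...)). As a rough illustration of that idea, and not the code from the pull request, the sketch below computes ceil(log2(x)) for a positive float32 directly from its exponent and fraction bits. The function names are hypothetical, and the E8M0 encoding used in e8m0_rceil (bias 127, with 0xFF reserved so the result is clamped to 0..254) is an assumption stated here for the example.

```python
import struct


def ceil_log2_f32(x: float) -> int:
    """Return ceil(log2(x)) for a positive, finite float32 value using only its bits."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    if bits >> 31 or bits == 0:
        raise ValueError("x must be positive")
    exponent = (bits >> 23) & 0xFF  # biased 8-bit exponent field
    mantissa = bits & 0x7FFFFF      # 23-bit fraction field
    if exponent == 0xFF:
        raise ValueError("x must be finite")
    if exponent == 0:
        # Subnormal: x = mantissa * 2**-149, and ceil(log2(m)) == (m - 1).bit_length().
        return (mantissa - 1).bit_length() - 149
    # Normal: x = 2**(exponent - 127) * (1 + fraction); round up if the fraction is nonzero.
    return exponent - 127 + (1 if mantissa else 0)


def e8m0_rceil(x: float) -> int:
    """Biased 8-bit exponent for x, rounded up (assumed bias 127, clamped to 0..254)."""
    return min(max(ceil_log2_f32(x) + 127, 0), 254)
```

Because the result depends only on integer bit fields, no floating-point logarithm is evaluated, so there is no rounding error near exact powers of two. For instance, ceil_log2_f32(3.0) evaluates to 2 and ceil_log2_f32(0.5) evaluates to -1.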

3.3 Pull Request Discussion Insights

This section analyzes the tone and sentiment of discussions within this project's open and closed pull requests that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.

Based on our analysis, there are no instances of toxic discussions in the project's open or closed pull requests from the past week.


IV. Contributors

4.1 Contributors

Active Contributors:

We consider an active contributor in this project to be any contributor who has made at least 1 commit, opened at least 1 issue, created at least 1 pull request, or made more than 2 comments in the last month.

If there are more than 10 active contributors, the list is truncated to the top 10 based on contribution metrics for better clarity.

Contributor  Commits  Pull Requests  Issues  Comments
bobrenjc93       201              0       0         0
anijain2305      163              0       0         0
huydhn            65              0       0         0
malfet            53              0       0         0
yushangdi         50              0       0         0
aorenste          46              0       0         0
daisyden          43              0       0         0
weifengpy         33              1       0         0
colesbury         31              1       0         0
fxdawnn           27              1       0         0
