Weekly Project News

Weekly GitHub Report for PyTorch: April 25, 2026 - May 02, 2026 (19:18:20)

Thank you for subscribing to our weekly newsletter! Each week, we deliver a comprehensive summary of your GitHub project's latest activity right to your inbox, including an overview of your project's issues, pull requests, contributors, and commit activity.


Table of Contents

  • I. News
    • 1.1. Recent Version Releases
    • 1.2. Version Information
  • II. Issues
    • 2.1. Top 5 Active Issues
    • 2.2. Top 5 Stale Issues
    • 2.3. Open Issues
    • 2.4. Closed Issues
    • 2.5. Issue Discussion Insights
  • III. Pull Requests
    • 3.1. Open Pull Requests
    • 3.2. Closed Pull Requests
    • 3.3. Pull Request Discussion Insights
  • IV. Contributors
    • 4.1. Contributors

I. News

1.1 Recent Version Releases:

The current version of this repository is v2.6.0.

1.2 Version Information:

Released on January 29, 2025, PyTorch 2.6 introduces several significant enhancements: torch.compile support for Python 3.13, a new API for dynamically controlling compilation behavior (torch.compiler.set_stance), and improved AOTInductor packaging and ABI compatibility. Other highlights include beta-level FP16 support on x86 CPUs, expanded Intel GPU support with simplified installation and Windows binaries, and a backward-incompatible security change that flips the default of the weights_only parameter in torch.load to True. In addition, PyTorch has deprecated its official Anaconda channel, and its Linux binaries are now built with Manylinux 2.28 and CXX11_ABI=1.

II. Issues

2.1 Top 5 Active Issues:

We consider active issues to be issues that have been commented on most frequently within the last week. Bot comments are omitted.

As of our latest update, there are no active issues with ongoing comments this week.

2.2 Top 5 Stale Issues:

We consider stale issues to be issues that have had no activity within the last 30 days. The team should work together to get these issues resolved and closed as soon as possible.

As of our latest update, there are no stale issues for the project this week.

2.3 Open Issues

This section lists, groups, and then summarizes issues that were created within the last week in the repository.

Issues Opened This Week: 0

Summarized Issues:

As of our latest update, no issues were opened in the project this week.

2.4 Closed Issues

This section lists, groups, and then summarizes issues that were closed within the last week in the repository. This section also links the associated pull requests if applicable.

Issues Closed This Week: 2

Summarized Issues:

  • ROCm CI Test Failures and Instability: ROCm trunk distributed tests were timing out because of problems in the rocshmem tests, so the rocshmem tests have been temporarily disabled while a fix is developed. Separately, ROCm trunk CI jobs became unstable because of test failures that a Kineto submodule update had initially hidden; the failures reappeared once the workaround was removed, leaving the ROCm jobs unstable.
  • issues/178884, issues/179911

2.5 Issue Discussion Insights

This section will analyze the tone and sentiment of discussions within this project's open and closed issues that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.

Based on our analysis, there are no instances of toxic discussions in the project's open or closed issues from the past week.


III. Pull Requests

3.1 Open Pull Requests

This section provides a summary of pull requests that were opened in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.

Pull Requests Opened This Week: 0

As of our latest update, no pull requests were opened in the project this week.

3.2 Closed Pull Requests

This section provides a summary of pull requests that were closed in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.

Pull Requests Closed This Week: 16

Key Closed Pull Requests

1. [overlap] pre-bucketing of fsdp collectives: This pull request introduces a pre-bucketing strategy for Fully Sharded Data Parallel (FSDP) collectives in the overlap scheduling algorithm. It improves bucketing efficiency by calibrating bucket sizes to the process group's bandwidth and latency, reliably detects FSDP collectives even when Autoparallel shardings produce irregular patterns, and adds tests to check correctness.

  • URL: pull/179935
  • Associated Commits: da2ee, 5d768, a8c13, 0a3a8, 4e8f7, a873b, 96a54

2. [ROCm] - Reduce generated CK kernel files and build by default: This pull request updates the ROCm build configuration to enable the CK kernel build by default while implementing various filters to reduce the number of generated CK kernel files, optimizing the build process.

  • URL: pull/178310
  • Associated Commits: 4c736, 98f13, b25f3, c0d33

3. torch.backends.fp32_precision setter propagate to cudnn.conv/rnn: This pull request fixes the torch.backends.fp32_precision setter not propagating to the cudnn.conv and cudnn.rnn settings. It implements a default-handling mechanism, adds try-except blocks to prevent runtime errors, and offers a workaround suggestion to a collaborator.

  • URL: pull/179750
  • Associated Commits: afae0, 51237, 2b2aa, 82f17
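
The first key pull request calibrates bucket sizes from process-group bandwidth and latency. One common way to reason about that trade-off is the alpha-beta cost model, in which a collective moving n bytes costs roughly alpha + n / beta (alpha is the fixed latency, beta the bandwidth). The sketch below is a hypothetical illustration under that model, not the pull request's implementation: the function names, the greedy grouping policy, the default ratio, and the example numbers are all assumptions.

```python
from typing import List


def calibrated_bucket_bytes(latency_us: int, bandwidth_bytes_per_us: int,
                            transfer_to_latency_ratio: int = 9) -> int:
    """Smallest bucket size n (bytes) whose transfer time n / beta is at least
    `transfer_to_latency_ratio` times the fixed latency alpha.

    Under the alpha-beta model, cost(n) = alpha + n / beta, so a ratio of 9
    keeps latency at or below 10% of the total cost of each bucket.
    """
    return transfer_to_latency_ratio * latency_us * bandwidth_bytes_per_us


def greedy_buckets(sizes: List[int], bucket_bytes: int) -> List[List[int]]:
    """Group collectives, in issue order, into buckets of at most bucket_bytes.

    A collective larger than bucket_bytes ends up alone in its own bucket.
    """
    buckets: List[List[int]] = []
    current: List[int] = []
    current_bytes = 0
    for size in sizes:
        if current and current_bytes + size > bucket_bytes:
            buckets.append(current)
            current, current_bytes = [], 0
        current.append(size)
        current_bytes += size
    if current:
        buckets.append(current)
    return buckets


# Hypothetical numbers: 10 us latency and 100 GB/s (100,000 bytes per us).
target = calibrated_bucket_bytes(latency_us=10, bandwidth_bytes_per_us=100_000)
print(target)  # 9000000 bytes, i.e. about 9 MB per bucket
print(greedy_buckets([4, 3, 5, 10, 1], bucket_bytes=8))  # [[4, 3], [5], [10], [1]]
```

Larger buckets typically amortize latency better but delay the point at which communication can start overlapping with compute, which is why deriving the threshold from measured bandwidth and latency can be preferable to a single fixed size.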

Other Closed Pull Requests

  • Cache isolation in PyTorch Dynamo: This pull request introduces an isolated cache mechanism by adding a region_id flag to cache entries, allowing multiple torch.compile() calls on the same function to maintain separate caches. This prevents interference in cache lookups, compilation limits, and execution strategies while still sharing profile-guided optimizations across regions.
    • pull/178351
  • Iterator protocol enhancements: This pull request adds a generic_iternext function implementing CPython's PyIter_Next semantics and introduces iternext_impl as the override point for the tp_iternext slot on VariableTracker subclasses. It also maintains next_variable as a public wrapper delegating to iternext_impl.
    • pull/178561
  • Pipeline RECV deferral on AMD ROCm: This pull request adds a configurable flag to defer pipeline RECV operations on platforms like AMD ROCm, postponing RECVs until just before compute operations consume their data. This eliminates pipeline bubbles and avoids deadlocks using a rank-parity peer-to-peer ordering strategy.
    • pull/178815
  • ROCm CI workflow improvements: This pull request adds distributed and inductor test configurations to the rocm-nightly CI workflow, enabling distributed tests with 3 shards on 4-GPU runners and inductor tests with 2 shards on single-GPU runners. These additions mirror existing periodic workflows for ROCm.
    • pull/179628
  • CI jobs upgrade to CUDA 13.0: This pull request migrates all continuous integration jobs from CUDA version 12.8 to 13.0, reflecting the update to 13.0 as the stable release for the PyTorch 2.11 branch.
    • pull/180052
  • CUDA graph capture stale stream detection: This pull request introduces detection logic for autograd nodes holding stale references to non-capturing CUDA streams during CUDA graph capture, raising a clear RuntimeError for default-stream stale references. It also adds an opt-in override flag that redirects stale non-capturing streams to the capturing stream, preventing opaque CUDA errors and enabling correct gradient computation.
    • pull/180090
  • Build process migration to CMake: These pull requests move pre-build steps such as git submodule initialization and NCCL checkout, as well as source file mirroring, from setup.py to CMake. This ensures safe state checking and unconditional inclusion of these steps, improving build reliability and maintainability.
    • pull/177641, pull/177642
  • PyTorch Inductor CUDA device fixes: This pull request fixes bugs in the Inductor compiler related to the e8m0_rceil_log2 pattern failing on CUDA devices due to device string mismatches and incorrect uint8 outputs from ceil(log2(...)) pipelines. It implements hardware-specific fixes using PTX instructions on SM100+ GPUs and IEEE 754 bit-manipulation on pre-SM100 hardware for accurate computation.
    • pull/178698
  • Extensible StorageImpl materialization: This pull request introduces a pluggable MaterializeFn hook to StorageImpl that replaces hard-coded copy-on-write materialization logic. This allows any backend to intercept write-path data pointer access for materialization while preserving existing semantics and avoiding overhead on hot paths.
    • pull/179063
  • Enhanced PyObject_GetItem dispatch: This pull request implements the second branch of CPython's PyObject_GetItem dispatch by adding an sq_item branch, and it replaces a Python-level hasattr check with more efficient C-level slot detection. This enables support for types that implement only sq_item without mp_subscript, such as collections.deque.
    • pull/179251
  • AOTInductor fallback ops support for grid sampler: This pull request adds c-shim support for aten.grid_sampler_3d, its backward variants, and cudnn grid sampler ops to the AOTInductor fallback operations. This enables these functions to run correctly through torch.compile(backend='inductor') without falling back to the proxy executor.
    • pull/179440
  • CUDA IPC deserialization fix: This pull request fixes a deserialization mismatch in CUDA IPC by explicitly communicating the handle type used in ExpandableSegment::share() to the consumer process. This prevents incorrect deserialization caused by uninitialized handle type state.
    • pull/179618
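
The iterator-protocol pull request separates an override point (iternext_impl) from a PyIter_Next-style driver (generic_iternext). A minimal sketch of that split in plain Python follows. The classes here are hypothetical stand-ins, not Dynamo's VariableTracker hierarchy, and the (found, value) return convention is an assumption chosen to mimic how PyIter_Next signals exhaustion without raising, while letting other errors propagate.

```python
from typing import Any, List, Optional, Tuple


class Tracker:
    """Hypothetical stand-in for a VariableTracker-like base class."""

    def iternext_impl(self) -> Any:
        # Override point, analogous to filling the tp_iternext slot.
        raise TypeError(f"'{type(self).__name__}' object is not an iterator")

    def next_variable(self) -> Any:
        # Public wrapper kept for existing callers; delegates to the override point.
        return self.iternext_impl()


class ListIteratorTracker(Tracker):
    def __init__(self, items: List[Any]) -> None:
        self.items = list(items)
        self.index = 0

    def iternext_impl(self) -> Any:
        if self.index >= len(self.items):
            raise StopIteration
        item = self.items[self.index]
        self.index += 1
        return item


def generic_iternext(tracker: Tracker) -> Tuple[bool, Optional[Any]]:
    """PyIter_Next-style result: (True, item) for a value, (False, None) when the
    iterator is exhausted; any other exception propagates to the caller."""
    try:
        return True, tracker.iternext_impl()
    except StopIteration:
        return False, None


it = ListIteratorTracker(["a", "b"])
print([generic_iternext(it) for _ in range(3)])
# [(True, 'a'), (True, 'b'), (False, None)]
```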
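
The RECV-deferral pull request avoids deadlocks with a rank-parity peer-to-peer ordering. The general idea can be shown with a small simulation, assuming blocking, rendezvous-style send/recv on a ring of ranks, where a send completes only when its peer posts the matching recv. This is a generic illustration: the ring topology, the helper names, and the simulator are hypothetical and do not reproduce the pipeline schedule in the pull request.

```python
from typing import Dict, List, Tuple

Op = Tuple[str, int]  # ("send" or "recv", peer rank)


def ring_ops(rank: int, world: int, parity: bool) -> List[Op]:
    """Each rank sends to its successor and receives from its predecessor."""
    send: Op = ("send", (rank + 1) % world)
    recv: Op = ("recv", (rank - 1) % world)
    if parity and rank % 2 == 1:
        return [recv, send]  # odd ranks receive first
    return [send, recv]      # even ranks (or every rank, without parity) send first


def completes(world: int, parity: bool) -> bool:
    """Simulate blocking rendezvous send/recv; False means a deadlock was reached."""
    programs: Dict[int, List[Op]] = {r: ring_ops(r, world, parity) for r in range(world)}
    pc = {r: 0 for r in range(world)}  # index of each rank's next operation
    while any(pc[r] < len(programs[r]) for r in programs):
        progressed = False
        for r in range(world):
            if pc[r] >= len(programs[r]):
                continue
            kind, peer = programs[r][pc[r]]
            if kind != "send" or pc[peer] >= len(programs[peer]):
                continue
            # The send and the peer's matching recv complete together.
            if programs[peer][pc[peer]] == ("recv", r):
                pc[r] += 1
                pc[peer] += 1
                progressed = True
        if not progressed:
            return False  # every remaining rank is blocked waiting on another
    return True


print(completes(4, parity=False))  # False: every rank blocks in its send
print(completes(4, parity=True))   # True
```

Without parity, every rank waits in a send whose peer is also waiting in a send, which is a circular wait; alternating the order by rank parity guarantees that some neighboring pair always has a matching send and recv posted.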
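
The pre-SM100 path of the Inductor fix relies on IEEE 754 bit manipulation. A minimal CPU-side sketch of that idea in plain Python follows, assuming the target is the round-up (ceil) E8M0 encoding of a positive, normal float32 value: ceil(log2(x)) equals the unbiased exponent when the mantissa bits are zero and the exponent plus one otherwise, so no floating-point log2 is needed. The function names are hypothetical; this is not the kernel code from the pull request, and zero, subnormals, negatives, infinities, and NaN are rejected rather than handled.

```python
import struct


def float32_bits(x: float) -> int:
    """Raw IEEE 754 binary32 bit pattern of x (after rounding x to float32)."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def ceil_log2_f32(x: float) -> int:
    """ceil(log2(x)) for a positive, normal float32 value, computed from its bits."""
    bits = float32_bits(x)
    sign = bits >> 31
    biased_exp = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    if sign or biased_exp in (0, 0xFF):
        raise ValueError("sketch only handles positive, normal float32 inputs")
    exponent = biased_exp - 127
    # x = 1.m * 2**exponent: an exact power of two has m == 0, otherwise round up.
    return exponent if mantissa == 0 else exponent + 1


def e8m0_rceil(x: float) -> int:
    """Round-up E8M0 scale as a uint8: ceil(log2(x)) + 127, clamped to 0..254
    because 0xFF encodes NaN in the OCP MX E8M0 format."""
    return max(0, min(254, ceil_log2_f32(x) + 127))


for value in (1.0, 0.75, 3.0, 1024.0):
    print(value, ceil_log2_f32(value), e8m0_rceil(value))
# 1.0 0 127
# 0.75 0 127
# 3.0 2 129
# 1024.0 10 137
```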
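
The PyObject_GetItem change follows CPython's lookup order: try the mapping slot mp_subscript first, then fall back to the sequence slot sq_item, converting the key with the index protocol and adjusting a negative index once by the sequence length. The sketch below models that order in pure Python with hypothetical slot tables (Python code cannot read real C slots). It illustrates the dispatch only, is not Dynamo's implementation, and omits details such as __class_getitem__ on types.

```python
import operator
from collections import deque
from typing import Any, Callable, Optional


class SlotTable:
    """Hypothetical stand-in for the lookup slots on a C type object."""

    def __init__(self,
                 mp_subscript: Optional[Callable[[Any, Any], Any]] = None,
                 sq_item: Optional[Callable[[Any, int], Any]] = None,
                 sq_length: Optional[Callable[[Any], int]] = None) -> None:
        self.mp_subscript = mp_subscript
        self.sq_item = sq_item
        self.sq_length = sq_length


def get_item(slots: SlotTable, obj: Any, key: Any) -> Any:
    # Branch 1: the mapping protocol wins whenever mp_subscript is present.
    if slots.mp_subscript is not None:
        return slots.mp_subscript(obj, key)
    # Branch 2: sequence protocol; the key must be usable as an index.
    if slots.sq_item is not None:
        try:
            index = operator.index(key)
        except TypeError:
            raise TypeError(
                f"sequence index must be integer, not '{type(key).__name__}'") from None
        if index < 0 and slots.sq_length is not None:
            index += slots.sq_length(obj)  # wrap a negative index once
        return slots.sq_item(obj, index)
    raise TypeError(f"'{type(obj).__name__}' object is not subscriptable")


def deque_item(d: deque, i: int) -> Any:
    # Mirrors a C-level sq_item: bounds-checked, no negative wrap-around.
    if i < 0 or i >= len(d):
        raise IndexError("deque index out of range")
    return d[i]


DEQUE_SLOTS = SlotTable(sq_item=deque_item, sq_length=len)  # sq_item only
DICT_SLOTS = SlotTable(mp_subscript=lambda m, k: m[k])

print(get_item(DEQUE_SLOTS, deque([10, 20, 30]), -1))  # 30
print(get_item(DICT_SLOTS, {"a": 1}, "a"))             # 1
```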

3.3 Pull Request Discussion Insights

This section will analyze the tone and sentiment of discussions within this project's open and closed pull requests that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.

Based on our analysis, there are no instances of toxic discussions in the project's open or closed pull requests from the past week.


IV. Contributors

4.1 Contributors

Active Contributors:

We consider an active contributor in this project to be any contributor who has made at least 1 commit, opened at least 1 issue, created at least 1 pull request, or made more than 2 comments in the last month.

If there are more than 10 active contributors, the list is truncated to the top 10 based on contribution metrics for better clarity.

Contributor    Commits  Pull Requests  Issues  Comments
bobrenjc93         319              0       0         0
anijain2305        179              0       0         0
huydhn              74              0       0         0
malfet              67              0       0         0
weifengpy           63              1       0         0
yushangdi           51              0       0         0
aorenste            46              0       0         0
daisyden            43              0       0         0
colesbury           36              1       0         0
fxdawnn             34              2       0         0
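
The activity rule and the top-10 cutoff described above can be expressed as a small filter. This is a sketch under assumptions: the report does not say which contribution metric orders the list, so ranking by the total of the four counts is hypothetical (it does reproduce the order of the table above), and the Activity type and helper names are illustrative only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Activity:
    name: str
    commits: int = 0
    pull_requests: int = 0
    issues: int = 0
    comments: int = 0


def is_active(a: Activity) -> bool:
    # Any commit, issue, or pull request counts; comments need more than 2.
    return a.commits >= 1 or a.issues >= 1 or a.pull_requests >= 1 or a.comments > 2


def top_active(people: List[Activity], limit: int = 10) -> List[Activity]:
    active = [a for a in people if is_active(a)]
    # Assumed ranking metric: total contributions, highest first (ties keep input order).
    active.sort(key=lambda a: a.commits + a.pull_requests + a.issues + a.comments,
                reverse=True)
    return active[:limit]


print(is_active(Activity("a", comments=2)), is_active(Activity("b", comments=3)))
# False True
```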
