Weekly GitHub Report for PyTorch: April 25, 2026 - May 02, 2026 (19:18:20)
Thank you for subscribing to our weekly newsletter! Each week, we deliver a comprehensive summary of your GitHub project's latest activity right to your inbox, including an overview of your project's issues, pull requests, contributors, and commit activity.
I. News
1.1 Recent Version Releases:
The current version of this repository is v2.6.0
1.2 Version Information:
Released on January 29, 2025, PyTorch 2.6 introduces significant enhancements, including torch.compile support for Python 3.13, a new dynamic compilation control API (torch.compiler.set_stance), and improved AOTInductor packaging and ABI compatibility. Other highlights include beta-level FP16 support on x86 CPUs, expanded Intel GPU support with simplified installation and Windows binaries, and a backward-incompatible security improvement that flips the default of the weights_only parameter in torch.load. Additionally, PyTorch has deprecated its official Anaconda channel and updated Linux binaries to use Manylinux 2.28 with CXX11_ABI=1.
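The weights_only change matters because Python's pickle format can resolve and call arbitrary globals while loading. The sketch below uses only the standard library to illustrate the general idea of allowlisted unpickling. It is not PyTorch's implementation, and the names SAFE_GLOBALS, RestrictedUnpickler, and restricted_loads are hypothetical.

```python
import io
import pickle

# Hypothetical allowlist of (module, name) pairs that may be resolved during load.
SAFE_GLOBALS = {
    ("builtins", "list"),
    ("builtins", "dict"),
    ("collections", "OrderedDict"),
}


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that refuses to resolve any global outside SAFE_GLOBALS."""

    def find_class(self, module, name):
        if (module, name) in SAFE_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is not allowed")


def restricted_loads(data: bytes):
    """Load pickled bytes, rejecting payloads that reference disallowed globals."""
    return RestrictedUnpickler(io.BytesIO(data)).load()
```

Plain containers and numbers load as usual, while a payload that references a callable such as the built-in print raises pickle.UnpicklingError instead of being resolved.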
II. Issues
2.1 Top 5 Active Issues:
We consider active issues to be issues that have been commented on most frequently within the last week. Bot comments are omitted.
As of our latest update, there are no active issues with ongoing comments this week.
2.2 Top 5 Stale Issues:
We consider stale issues to be issues that have had no activity within the last 30 days. The team should work together to get these issues resolved and closed as soon as possible.
As of our latest update, there are no stale issues for the project this week.
2.3 Open Issues
This section lists, groups, and then summarizes issues that were created within the last week in the repository.
Issues Opened This Week: 0
Summarized Issues:
As of our latest update, no issues were opened in the project this week.
2.4 Closed Issues
This section lists, groups, and then summarizes issues that were closed within the last week in the repository. This section also links the associated pull requests if applicable.
Issues Closed This Week: 2
Summarized Issues:
- ROCm CI Test Failures and Instability: The ROCm trunk distributed tests are timing out because of problems with the rocshmem tests, so those tests have been temporarily disabled while a fix is developed. Additionally, ROCm trunk CI jobs have become unstable: test failures that were initially hidden by a Kineto submodule update reappeared after the workaround was removed.
- issues/178884, issues/179911
2.5 Issue Discussion Insights
This section will analyze the tone and sentiment of discussions within this project's open and closed issues that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.
Based on our analysis, there are no instances of toxic discussions in the project's open or closed issues from the past week.
III. Pull Requests
3.1 Open Pull Requests
This section provides a summary of pull requests that were opened in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.
Pull Requests Opened This Week: 0
As of our latest update, no pull requests were opened in the project this week.
3.2 Closed Pull Requests
This section provides a summary of pull requests that were closed in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.
Pull Requests Closed This Week: 16
Key Closed Pull Requests
1. [overlap] pre-bucketing of fsdp collectives: This pull request introduces a pre-bucketing strategy for Fully Sharded Data Parallel (FSDP) collectives in the overlap scheduling algorithm to improve bucketing efficiency by calibrating bucket sizes based on process group bandwidth and latency, enabling reliable detection of FSDP collectives even with irregular patterns from Autoparallel shardings, and includes testing to ensure correctness.
- URL: pull/179935
2. [ROCm] - Reduce generated CK kernel files and build by default: This pull request updates the ROCm build configuration to enable the CK kernel build by default while implementing various filters to reduce the number of generated CK kernel files, optimizing the build process.
- URL: pull/178310
3. torch.backends.fp32_precision setter propagate to cudnn.conv/rnn: This pull request addresses the issue where the torch.backends.fp32_precision setter did not propagate to cudnn.conv and cudnn.rnn modules by implementing a default handling mechanism, adding try-except blocks to prevent runtime errors, and providing a workaround suggestion to a collaborator.
- URL: pull/179750
Other Closed Pull Requests
- Cache isolation in PyTorch Dynamo: This pull request introduces an isolated cache mechanism by adding a region_id flag to cache entries, allowing multiple torch.compile() calls on the same function to maintain separate caches. This prevents interference in cache lookups, compilation limits, and execution strategies while still sharing profile-guided optimizations across regions.
- Iterator protocol enhancements: This pull request adds a generic_iternext function implementing CPython's PyIter_Next semantics and introduces iternext_impl as the override point for the tp_iternext slot on VariableTracker subclasses. It also maintains next_variable as a public wrapper delegating to iternext_impl.
- Pipeline RECV deferral on AMD ROCm: This pull request adds a configurable flag to defer pipeline RECV operations on platforms like AMD ROCm, postponing RECVs until just before compute operations consume their data. This eliminates pipeline bubbles and avoids deadlocks using a rank-parity peer-to-peer ordering strategy.
- ROCm CI workflow improvements: This pull request adds distributed and inductor test configurations to the rocm-nightly CI workflow, enabling distributed tests with 3 shards on 4-GPU runners and inductor tests with 2 shards on single-GPU runners. These additions mirror existing periodic workflows for ROCm.
- CI jobs upgrade to CUDA 13.0: This pull request migrates all continuous integration jobs from CUDA version 12.8 to 13.0, reflecting the update to 13.0 as the stable release for the PyTorch 2.11 branch.
- CUDA graph capture stale stream detection: This pull request introduces detection logic for autograd nodes holding stale references to non-capturing CUDA streams during CUDA graph capture, raising a clear RuntimeError for default-stream stale references. It also adds an opt-in override flag that redirects stale non-capturing streams to the capturing stream, preventing opaque CUDA errors and enabling correct gradient computation.
- Build process migration to CMake: These pull requests move pre-build steps such as git submodule initialization and NCCL checkout, as well as source file mirroring, from setup.py to CMake. This ensures safe state checking and unconditional inclusion of these steps, improving build reliability and maintainability.
- PyTorch Inductor CUDA device fixes: This pull request fixes bugs in the Inductor compiler related to the e8m0_rceil_log2 pattern failing on CUDA devices due to device string mismatches and incorrect uint8 outputs from ceil(log2(...)) pipelines. It implements hardware-specific fixes using PTX instructions on SM100+ GPUs and IEEE 754 bit-manipulation on pre-SM100 hardware for accurate computation.
- Extensible StorageImpl materialization: This pull request introduces a pluggable MaterializeFn hook to StorageImpl that replaces hard-coded copy-on-write materialization logic. This allows any backend to intercept write-path data pointer access for materialization while preserving existing semantics and avoiding overhead on hot paths.
- Enhanced PyObject_GetItem dispatch: This pull request implements the second branch of CPython's PyObject_GetItem dispatch by adding an sq_item branch and replaces a Python-level hasattr check with a more efficient C-level slot detection. This enables support for types implementing only sq_item without mp_subscript, such as collections.deque.
- AOTInductor fallback ops support for grid sampler: This pull request adds c-shim support for aten.grid_sampler_3d, its backward variants, and cudnn grid sampler ops to the AOTInductor fallback operations. This enables these functions to run correctly through torch.compile(backend='inductor') without falling back to the proxy executor.
- CUDA IPC deserialization fix: This pull request fixes a deserialization mismatch in CUDA IPC by explicitly communicating the handle type used in ExpandableSegment::share() to the consumer process. This prevents incorrect deserialization caused by uninitialized handle type state.
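The PyObject_GetItem item above describes a two-branch dispatch: try the mapping slot first, then fall back to the sequence slot with an integer index. Pure Python cannot tell the two C slots apart, so the sketch below models them as explicitly named methods on toy classes. The names get_item, RingBuffer, and Table, and the use of mp_subscript and sq_item as Python attributes, are illustrative assumptions and not Dynamo's implementation.

```python
import operator


def get_item(obj, key):
    """Dispatch obj[key] in the order: mapping slot first, then sequence slot."""
    tp = type(obj)
    mp_subscript = getattr(tp, "mp_subscript", None)
    if mp_subscript is not None:
        return mp_subscript(obj, key)
    sq_item = getattr(tp, "sq_item", None)
    if sq_item is not None:
        try:
            index = operator.index(key)  # the sequence branch accepts only integer-like keys
        except TypeError:
            raise TypeError(
                f"sequence index must be integer, not '{type(key).__name__}'"
            ) from None
        if index < 0:
            index += len(obj)  # negative indices are offset by the length
        return sq_item(obj, index)
    raise TypeError(f"'{tp.__name__}' object is not subscriptable")


class RingBuffer:
    """Toy type exposing only the sequence-style slot."""

    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)

    def sq_item(self, index):
        if not 0 <= index < len(self._items):
            raise IndexError("index out of range")
        return self._items[index]


class Table:
    """Toy type exposing only the mapping-style slot."""

    def __init__(self, data):
        self._data = dict(data)

    def mp_subscript(self, key):
        return self._data[key]
```

With these definitions, get_item(RingBuffer("abc"), -1) takes the sequence branch and get_item(Table({"k": 1}), "k") takes the mapping branch.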
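The Inductor item above mentions computing ceil(log2(...)) through IEEE 754 bit-manipulation. As a rough illustration of that idea only, the sketch below reads the exponent and mantissa fields of a float32 bit pattern on the CPU. It is not the pull request's code; the function names are hypothetical, and the bias of 127 and the clamp to 0..254 for the E8M0 encoding are assumptions.

```python
import struct

E8M0_BIAS = 127  # assumption: exponent bias of the E8M0 scale format


def ceil_log2_from_bits(x: float) -> int:
    """Compute ceil(log2(x)) for a positive, normal float32 from its bit pattern.

    The input is rounded to float32 first, so the result describes that value.
    """
    if not x > 0:
        raise ValueError("x must be positive")
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    biased_exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    if biased_exponent in (0, 0xFF):
        raise ValueError("subnormal, infinite, and NaN inputs are not handled in this sketch")
    # An exact power of two has a zero mantissa; anything else rounds up by one.
    return (biased_exponent - 127) + (1 if mantissa else 0)


def to_e8m0_rceil(x: float) -> int:
    """Encode the rounded-up power-of-two exponent as a biased 8-bit value."""
    return max(0, min(254, ceil_log2_from_bits(x) + E8M0_BIAS))
```

Reading the exponent field directly avoids a floating-point log2 followed by a ceil, where rounding near exact powers of two could otherwise shift the result by one.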
3.3 Pull Request Discussion Insights
This section will analyze the tone and sentiment of discussions within this project's open and closed pull requests that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.
Based on our analysis, there are no instances of toxic discussions in the project's open or closed pull requests from the past week.
IV. Contributors
4.1 Contributors
Active Contributors:
We consider an active contributor in this project to be any contributor who has made at least 1 commit, opened at least 1 issue, created at least 1 pull request, or made more than 2 comments in the last month.
If there are more than 10 active contributors, the list is truncated to the top 10 based on contribution metrics for better clarity.
| Contributor | Commits | Pull Requests | Issues | Comments |
|---|---|---|---|---|
| bobrenjc93 | 319 | 0 | 0 | 0 |
| anijain2305 | 179 | 0 | 0 | 0 |
| huydhn | 74 | 0 | 0 | 0 |
| malfet | 67 | 0 | 0 | 0 |
| weifengpy | 63 | 1 | 0 | 0 |
| yushangdi | 51 | 0 | 0 | 0 |
| aorenste | 46 | 0 | 0 | 0 |
| daisyden | 43 | 0 | 0 | 0 |
| colesbury | 36 | 1 | 0 | 0 |
| fxdawnn | 34 | 2 | 0 | 0 |