Weekly GitHub Report for PyTorch: November 03, 2025 - November 10, 2025 (12:07:05)
Weekly GitHub Report for PyTorch
Thank you for subscribing to our weekly newsletter! Each week, we deliver a comprehensive summary of your GitHub project's latest activity right to your inbox, including an overview of your project's issues, pull requests, contributors, and commit activity.
Table of Contents
I. News
1.1 Recent Version Releases:
The current version of this repository is v2.6.0
1.2 Version Information:
Released on January 29, 2025, PyTorch 2.6 introduces significant enhancements including torch.compile support for Python 3.13, a new performance tuning API torch.compiler.set_stance, and improved AOTInductor packaging and ABI compatibility. Notable highlights also include FP16 support on X86 CPUs, expanded Intel GPU support, a backward-incompatible security improvement changing the default of torch.load to weights_only=True, and the deprecation of official Conda package publishing, reflecting a trend toward improved performance, security, and streamlined deployment.
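Of these, the weights_only change is the one most likely to affect existing loading code. Below is a minimal sketch of the new default and the explicit opt-out; the file name and model are illustrative, not taken from the release notes:

```python
import torch

model = torch.nn.Linear(4, 2)
torch.save(model.state_dict(), "checkpoint.pt")

# Since 2.6, torch.load defaults to weights_only=True, restricting unpickling
# to tensors and other allow-listed types.
state = torch.load("checkpoint.pt")  # same as weights_only=True

# Checkpoints that pickle arbitrary Python objects now need an explicit opt-out,
# which should only be used with checkpoints from trusted sources.
state = torch.load("checkpoint.pt", weights_only=False)

model.load_state_dict(state)
```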
II. Issues
2.1 Top 5 Active Issues:
We consider active issues to be issues that have been commented on most frequently within the last week. Bot comments are omitted.
-
max pooling on CUDA: RuntimeError: integer out of range: This issue reports a RuntimeError caused by integer overflow when performing max pooling on a large CUDA tensor with the NCHW memory format in PyTorch. The problem arises because the current CUDA kernel uses 32-bit integers for indexing, which cannot address tensors whose element count exceeds 2^31, leading to a crash during max pooling operations on very large inputs.
- The discussion centers on a temporary workaround of converting tensors to the channels-last memory format, which uses 64-bit indexing and avoids the overflow, though this incurs extra overhead (see the sketch after this list). Contributors debate the feasibility of a full refactor to support 64-bit indexing in the NCHW kernel, noting it would be a large and complex change due to widespread use of 32-bit integers in kernel code, with consensus that the current fix is a practical short-term solution while a more comprehensive fix could be pursued later.
- Number of comments this week: 9
-
torch.onnx.export with dynamo breaks FakeQuantize exporting: This issue reports that exporting the official FakeQuantize layer to ONNX using torch.onnx.export works with the legacy TorchScript-based exporter (dynamo=False) but fails with the new Dynamo-based exporter (dynamo=True), because FakeQuantize cannot be converted to an ExportedProgram. The failure is caused by boolean conditionals on buffer tensors like observer_enabled and fake_quant_enabled inside FakeQuantize, and attempts to replace it with a simplified version still fail due to missing ONNX export implementations for certain aten operators related to fake quantization.
- The discussion in the comments centers on the missing ONNX function implementations for specific aten operators required by the new exporter, with suggestions to implement and register these in the onnxscript repository. Contributors also note that some export logic currently lives in different parts of the PyTorch and onnxscript repos, causing confusion, and that the recommended quantization API in PyTorch 2 is torch.ao, which may have better support. The issue remains open with no immediate fix, and there is a call for improved documentation and clearer guidance on where to contribute export logic.
- Number of comments this week: 6
-
DTensor operations incur unnecessary gathering and memory consumption resulting therefrom: This issue reports that DTensor operations in PyTorch incur unnecessary tensor gathering and increased memory consumption during both forward and backward passes, particularly in batch matrix multiplication scenarios. The user provides minimal reproducible examples demonstrating that DTensor replicates tensors or performs implicit all-gathers when such communication should be avoidable, highlighting limitations in current sharding representations that only support outermost dimension sharding.
- The comments discuss reproducing the issue on CPU, revealing runtime errors related to sharding propagation and redistribution restrictions. They link this problem to existing issues about DTensor’s inability to represent strided sharding and note that current DTensor implementations only shard outermost dimensions, causing unnecessary communication. A proposal to extend sharding specifications to support ordering is mentioned as a potential solution, with requests for further investigation.
- Number of comments this week: 5
-
[BUG] Optimizer should validate betas parameter type to prevent unexpected serialization behavior with OmegaConf objects: This issue addresses a bug in PyTorch optimizers where the betas parameter accepts any sequence-like object without type validation, causing unexpected serialization behavior when an OmegaConf ListConfig is passed. This leads to the entire configuration tree being serialized into the optimizer's state dict, resulting in oversized checkpoint files and loading failures in PyTorch 2.6+ when using weights_only=True.
- The discussion confirms that converting betas to a plain tuple resolves the issue (see the sketch after this list), and a pull request was made to add input type validation; after this change, the problem no longer occurs, so further error checking was deemed unnecessary to avoid breaking existing use cases.
- Number of comments this week: 5
-
[DTensor] DTensor randn inconsistent with single device behavior: This issue reports that the DTensor implementation of the randn function does not produce random values consistent with single-device behavior, despite a proposed consistency guarantee. The discrepancy is demonstrated with a minimal reproducible example showing that the local shard on rank 1 differs from the corresponding shard in the single-device full tensor, indicating a problem with how random number generation offsets are handled in distributed settings.
- The comments discuss a related solution from the veScale paper that ensures consistent global random matrices across sharded and non-sharded cases by subclassing the RNG state. Investigation traces the issue to the OffsetBasedRNGTracker and the CUDA kernel’s RNG state initialization, highlighting that the current grid-stride kernel does not maintain linear alignment between RNG offsets and output elements, and suggesting the need for a new kernel API and a tensor-shape aware RNG tracker similar to veScale’s approach.
- Number of comments this week: 4
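For the max-pooling overflow above, a minimal sketch of the channels-last workaround discussed in the comments; the shape is illustrative (triggering the overflow requires N*C*H*W to exceed 2^31 elements, i.e. a very large GPU tensor) and a CUDA device is assumed:

```python
import torch
import torch.nn.functional as F

# Illustrative shape; the reported overflow needs N*C*H*W > 2**31 elements.
x = torch.randn(1, 32, 1024, 1024, device="cuda")

# Workaround from the discussion: channels-last max pooling uses 64-bit indexing.
x_cl = x.to(memory_format=torch.channels_last)
y = F.max_pool2d(x_cl, kernel_size=2)
y = y.contiguous()  # back to the default NCHW-contiguous layout if downstream code expects it
```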
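For the optimizer betas issue above, a minimal sketch of the tuple conversion that resolves it, assuming OmegaConf is installed; the config layout and file name are illustrative:

```python
import torch
from omegaconf import OmegaConf

cfg = OmegaConf.create({"optim": {"lr": 1e-3, "betas": [0.9, 0.95]}})
params = [torch.nn.Parameter(torch.randn(4, 4))]

# Passing cfg.optim.betas (a ListConfig) directly is what drags the whole config
# tree into the optimizer state dict; converting to a plain tuple avoids it.
opt = torch.optim.Adam(params, lr=cfg.optim.lr, betas=tuple(cfg.optim.betas))

torch.save({"optimizer": opt.state_dict()}, "opt.pt")
torch.load("opt.pt", weights_only=True)  # loads cleanly once betas is a plain tuple
```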
2.2 Top 5 Stale Issues:
We consider stale issues to be issues that have had no activity within the last 30 days. The team should work together to get these issues resolved and closed as soon as possible.
- ImportError: cannot import name 'triton_key' from 'triton.compiler.compiler': This issue reports an ImportError encountered when attempting to import the name 'triton_key' from the 'triton.compiler.compiler' module, which causes a backend compiler failure in PyTorch's inductor backend during model compilation. The user provides detailed environment information, including PyTorch version 2.4.0 development build, CUDA 12.1, and Ubuntu 22.04, and demonstrates the error occurring while compiling specific pipeline components with torch.compile using the "reduce-overhead" mode.
- Alternate algorithm for computing MaxPool2D under specific condition.: This issue proposes an alternate algorithm for computing MaxPool2D when the stride is equal to 1, by representing a larger kernel size (e.g., 5 or 7) as multiple smaller MaxPool2D operations with kernel size 3, which reduces the computational cost per cell (see the sketch after this list). The suggested modification targets the MaxPool2D layer directly to avoid additional overhead during backpropagation and is expected to yield performance improvements specifically on CPU.
- cuda_utils.so: failed to map segment from shared object: This issue describes a problem encountered when running a PyTorch model inside a Docker container with a tmpfs mounted at /tmp having permissions set to 1777. Although the model compiles successfully, execution fails with an error indicating that the shared object cuda_utils.so cannot be mapped due to missing execute permissions on the file, despite the script running as root and directory permissions being correct.
- Enable UFMT on all files in PyTorch: This issue addresses the task of enabling uniform formatting (UFMT) across all files in the PyTorch codebase, specifically targeting approximately 1,500 files that are currently excluded from UFMT enforcement. It outlines the process for removing files from the exclusion list, running the formatter, handling known formatting-related problems, and organizing the work by directory to facilitate incremental and reviewable commits.
- [JIT archive] Add a flag to not include debug files: This issue proposes adding a flag to the torch.jit.save() function that allows users to exclude debug files, specifically .debug_pkl files, from the saved JIT archive to reduce file size. The motivation stems from observations that these debug files, which are only used for debugging purposes, can significantly increase the archive size without affecting model correctness, especially impacting deployment on mobile devices where storage is limited.
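The stride-1 MaxPool2D proposal above rests on a simple equivalence: two stacked kernel-3, stride-1 max pools have the same 5x5 receptive field as a single kernel-5 pool. A small sketch checking that equivalence (shapes and padding chosen for illustration, not taken from the issue):

```python
import torch
import torch.nn as nn

x = torch.randn(2, 3, 32, 32)

pool5 = nn.MaxPool2d(kernel_size=5, stride=1, padding=2)
pool3 = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)

# max over a 5x5 window == max over 3x3 windows of 3x3 maxima
assert torch.equal(pool5(x), pool3(pool3(x)))
```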
2.3 Open Issues
This section lists, groups, and then summarizes issues that were created within the last week in the repository.
Issues Opened This Week: 97
Summarized Issues:
- Compilation and Tracing Issues with torch.compile and Dynamo: Multiple issues report failures or unexpected behavior when using torch.compile or TorchDynamo, including errors tracing Python built-ins like bool() and print(), segmentation faults with torch.cond in distributed programs, and bugs with dynamic shape handling and nested graph breaks. These problems highlight limitations in tracing, compilation, and backend support that cause crashes, incorrect outputs, or unsupported operator errors during model compilation and execution (see the sketch after this list).
- DTensor Feature Limitations and Bugs: Several issues describe missing operator support, assertion errors, and performance inefficiencies in the DTensor implementation, including lack of support for argmax/argmin, errors with conv2d when bias is None, and unnecessary tensor gathering causing memory overhead. These problems indicate incomplete sharding strategies and incorrect assumptions in DTensor's handling of distributed tensor operations.
- ROCm Platform Test Failures and Build Issues: Multiple test cases and builds on ROCm platforms are failing or disabled due to compatibility problems, including failures of test_3layer_split_reduction and test_fuzzer_issue_163674, nightly build failures for ROCm 7.1, and version mismatches causing runtime errors. These issues reflect ongoing instability and maintenance challenges in ROCm support.
- Memory and Performance Overheads in Compilation and Runtime: Several issues report excessive memory consumption and performance bottlenecks, such as increased memory use with torch.compile on CPU, O(n²) complexity in shard validation causing long delays, and repeated memory allocations in AOTInductor leading to kernel launch interruptions. These inefficiencies degrade runtime performance and resource utilization.
- Exporting and ONNX Compatibility Problems: Issues highlight failures and regressions in model export workflows, including loss of operator argument names during export, failure to export models with tensor subclasses or hooks, and ONNX export errors with FakeQuantize layers and Hugging Face models. These problems hinder model interoperability and deployment.
- Triton Kernel Compilation and Feature Limitations: Several issues report Triton kernel compilation failures due to size limits, unsupported data types like float8, and missing debugging features such as input tensor dumping. These limitations affect the Inductor backend's ability to compile and debug GPU kernels efficiently.
- Bugs and Crashes in Specific PyTorch Functions and Modules: Various bugs cause crashes or incorrect behavior in PyTorch functions, including a buffer overflow in pad_packed_sequence, a bug in aoti_compile_and_package causing CUDA guard errors, and a crash with full activation checkpointing combined with Dynamo's LRU cache. These critical bugs impact stability and correctness.
- issues/166841, issues/166881, issues/166926
- Type Checking and Annotation Issues: Some issues address missing or problematic type annotations causing type checking failures, such as the lack of a return type annotation in named_modules() and problems with string type annotations breaking serialization of GraphModule. These hinder static analysis and tooling.
- issues/166905, issues/167117, issues/167119
- Distributed and Parallelism-Related Issues: Problems include FSDP2 omitting linear layers during compilation, automatic movement of inputs to devices causing inefficiencies, and RPC failures on Jetson Orin due to GPU UUID format. These issues affect distributed training workflows and device management.
- issues/167009, issues/167036, issues/167304, issues/167305
- CUDA and GPU Driver Related Failures: Issues include CUDA kernel indexing overflows causing incorrect results, missing OpenMP headers causing compile errors on Windows, and FFT hangs on RTX 5000 Ada GPUs with specific driver versions. These problems affect GPU computation reliability and compatibility.
- issues/167086, issues/167062, issues/167409, issues/167253
- Requests for Feature Enhancements and API Improvements: Proposals include adding a unified memory pool for accelerators, fine control over FP32 precision in autograd backward, support for CUDA event input/output in streams, and improved support for torch.cuda.use_mem_pool in compiled code. These aim to enhance flexibility and performance.
- Documentation and Usability Concerns: Issues include requests to fix tab order on documentation pages for better accessibility, removal of a large "Ask AI" button due to poor UI, and questioning the accuracy of a figure on the GLU documentation page. These affect user experience and clarity.
- Miscellaneous Bugs and Build Issues: Other problems include build failures due to incorrect package folder structure, optimizer parameter type validation missing causing large checkpoint files, and issues with PyTorch import on Jetson due to missing attributes. These affect development and deployment workflows.
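Several of the tracing reports above involve Python built-ins inside compiled regions. The following is a hypothetical minimal repro in the spirit of those reports, not the actual reproducer from any issue; with fullgraph=True, constructs Dynamo cannot trace surface as exceptions rather than silent graph breaks:

```python
import torch

@torch.compile(fullgraph=True)  # fullgraph=True turns graph breaks into errors
def f(x):
    print("intermediate sum:", x.sum())  # built-ins like print() are reported to break tracing
    return x * 2

try:
    f(torch.randn(4))
except Exception as e:
    print("tracing failed:", type(e).__name__)
```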
2.4 Closed Issues
This section lists, groups, and then summarizes issues that were closed within the last week in the repository. This section also links the associated pull requests if applicable.
Issues Closed This Week: 22
Summarized Issues:
- Build and Compilation Failures: Multiple issues report build or compilation failures caused by missing includes, syntax errors, or broken scripts. These include a syntax error in cuda/Blas.cpp causing build failure, a missing #include <algorithm> leading to a compilation error, and a deleted Android build script causing confusion due to outdated documentation.
- [issues/166810, issues/167315, issues/167186]
- ROCm Workflow Instability and Network Issues: Several issues describe instability and long queue times in ROCm-related workflows, as well as network outages on the MI250 Cirrascale cluster causing job failures. Mitigations include reducing workflow frequency and rerouting workloads, with plans to improve network monitoring.
- [issues/166866, issues/166874, issues/166875]
- Inductor and Compiler Bugs: There are multiple bugs in PyTorch's Inductor backend and Dynamo compiler, including a NameError in Triton kernel generation when using .item(), a KeyError during guard generation for temporary variables, and incorrect parameter handling in the addmm to add(mm) transformation. These bugs cause compilation failures and incorrect computation results.
- [issues/166888, issues/166900, issues/167313]
- Memory and Resource Management Issues: Reports include a potential memory leak in the pybind11 type_caster for at::Tensor due to improper reference counting, and redundant compiled code in the context parallel module that may unintentionally affect distribution operations. These issues could impact performance and resource usage.
- [issues/167124, issues/167064]
- Tensor and Operation Bugs: Several issues describe incorrect behavior in tensor operations, such as incorrect stride calculation in einsum gradients, a regression in MPS backend buffer allocation for non-contiguous tensors, and unexpected concretization of symbolic dimensions in FakeTensorProp with aten.polar. These bugs lead to runtime errors or incorrect results.
- [issues/167263, issues/167154, issues/167278]
- ONNX Export and Data Type Support Issues: One issue reports a failure when exporting models containing float8_e4m3fn tensors due to an invalid JitScalarType during ONNX optimization. Another issue requests CUDA 13 support on Windows, as current PyTorch versions only support CUDA 12, limiting GPU compatibility.
- [issues/166933, issues/167042]
- Tracing and Debugging Tool Limitations: The fx_traceback.annotate context manager fails to add annotations to assert nodes in traced models, resulting in missing stacktrace information. This limits debugging capabilities for exported program graphs.
- [issues/166906]
- CUDA Compatibility and GPU Support: The official PyTorch wheels lack CUDA kernel support for the sm_120 compute capability required by NVIDIA RTX 5090 GPUs, causing runtime errors and preventing GPU acceleration until compatible wheels are released.
- [issues/167244]
- Local Build Process Breakage: A recent pull request breaks the local source build process by causing a runtime error in mirror_inductor_external_kernels(), which is resolved by reverting the PR. This affects developers building PyTorch from source.
- [issues/167321]
- Runtime Errors in PyTorch 2.9.0: An UnboundLocalError occurs due to accessing a local variable before assignment in version 2.9.0, with a request for a fix in version 2.9.1. This runtime error affects stability in the specified version.
- [issues/167344]
- GitHub Copilot Generated File Management: There is a discussion on whether the .github/copilot-instructions.md file automatically generated by GitHub Copilot should be added to .gitignore to avoid tracking it in git repositories.
- [issues/166850]
2.5 Issue Discussion Insights
This section will analyze the tone and sentiment of discussions within this project's open and closed issues that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.
Based on our analysis, there are no instances of toxic discussions in the project's open or closed issues from the past week.
III. Pull Requests
3.1 Open Pull Requests
This section provides a summary of pull requests that were opened in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.
Pull Requests Opened This Week: 227
Key Open Pull Requests
1. Add new CI jobs to run dynamo tests on all python versions supported: This pull request adds four new continuous integration jobs that run dynamo tests across all supported Python versions using the linux.2xlarge instance type.
- URL: pull/166978
- Merged: No
- Associated Commits: 77306, ded92, bcc44, ce743, e8b9b, fe69b, 3e6aa, ac12f, 1b579, 9c7ec, 16954, 8d60c, 7e474, 90238, 3cca4, c7829, a26ac, 404b1, bde50, ade38, a274e, 983bb, 9eeae, c95ea, 4cb9c, 78d9b, 370d6, 65519, 8a8b9, ad2b3, cae25
2. [ROCm][CI] Add job to cache ROCm docker images: This pull request adds a continuous integration job to cache ROCm Docker images in the PyTorch project, improving build efficiency by reusing these images, and includes various fixes and adjustments to the job configuration and dependencies.
- URL: pull/167379
- Merged: No
- Associated Commits: 74f0e, 7c386, 84aa7, f2975, 1f4e2, 066bd, 7a6ce, 27456, 95174, 05c37, dbefd, b1427, 57068, 56fac, cfe32, 97b97, e3c7a, 08f04, 933a0, c3a9e
3. [WIP][FlexAttention] Enable tensor descriptor for FlexAttention backward: This pull request aims to enable and integrate tensor descriptors for the FlexAttention backward pass in PyTorch, improving the implementation and testing of the FlexAttention module's backward computations.
- URL: pull/166927
- Merged: No
- Associated Commits: 2624f, 3ce05, c19ab, dfe0e, 912ec, d24c3, 6c9b3, af55e, a10d2, 4736a, 4d270, c866b, 92a65, e3788, 213be, 46176, b6ec0, 69d92, fcc19
Other Open Pull Requests
- Inductor Debugging and Modes: Multiple pull requests enhance PyTorch's inductor by adding runtime recording of Triton kernel executions for debugging and introducing an "inductor lite mode" that allows selective optimizations with numeric correctness guarantees. These changes improve debugging capabilities and give users more control over compilation behavior.
- Refactoring and Header-Only Improvements: Refactoring efforts include making TensorAccessor classes header-only with template parameters and decoupling graph outputs from output VariableTrackers in Dynamo to better support higher-order operators. These changes improve modularity, compatibility, and simplify output handling in the graph system.
- Segmentation Fault Reporting and JSON Uploads: A pull request adds manual JSON report generation for segmentation faults, which lack XML output, and uploads these reports to S3 for later ingestion, improving visibility in reporting systems like ClickHouse. This also suggests modifying test report code to utilize the new JSON data.
- Python Wrappers and Integration Tests: The mtia_graph Python wrapper module is implemented to provide a Python interface to underlying C++ logic, accompanied by Python-level integration tests to verify functionality.
- DTensor Dispatch Optimization: A C++ fast path is introduced for the DTensor.__torch_dispatch__ mechanism, enabling efficient detection and delegation to C++ code, replacing previous dispatch key approaches and aiming for full native integration.
- Python 3.14 Support and CI Enhancements: Efforts to support Python 3.14 include enabling dynamo tests on this version and adding continuous integration support to build Docker images specifically for Python 3.14t.
- CDATA Handling Fixes: An unreviewed preliminary update addresses CDATA handling for the "pp" project, consisting of multiple automated commits and not yet merged.
- Functionalization of Print Operation: Functionalization is added to the print operation to ensure correct ordering and side effects are properly handled during execution.
- CUDA Backend Precision Configuration: A new API, torch.backends.cuda.math_sdp.fp32_precision, is introduced to configure float32 precision in the SDPBackend.MATH backend, improving numerical accuracy by replacing TF32 with IEEE FP32 precision. This includes a decorator for tests and a sanity check for precision correctness.
- Scaled Dot Product Attention FA4 Backend Support: Support for the FA4 backend is added to the scaled dot product attention implementation, including installation, benchmarking, and example usage on CUDA devices to demonstrate integration and performance.
- Linters for Stable Shim and Versioning: Linters are introduced to ensure that new function declarations in torch/csrc/stable/c/shim.h and calls to stable shim APIs are properly enclosed within the correct TORCH_FEATURE_VERSION macros, enforcing consistent versioning and preventing unversioned usage.
- Native Ops Schema and Adapter Checks: A linter parses functions called via torch_call_dispatcher to populate native_ops.txt and checks for schema changes, warning users to update ops.h and enforce registration of schema adapters when default argument values change.
- CUDA 13.0 Eager Tests and Build Updates: Eager tests for CUDA 13.0 are added and configured, including updates to build settings, CUDA architecture specifications, and related flags to ensure compatibility and proper testing on the latest CUDA version.
- XPU Support Upgrades and CI Maintenance: The XPU support package is upgraded from version 2025.2 to 2025.3 with added build and test support in CI on Linux and Windows, while maintaining the older version until PyTorch 2.10 release to avoid breakage. Related fixes include Ubuntu upgrades and GCC 13+ warning resolutions.
- XNNPack Submodule Update for GCC14: The XNNPack submodule is updated to a version compatible with GCC14, addressing build issues with the newer compiler.
- Intel GPU Inductor XPU Tests: Missing Intel GPU inductor unit tests for the XPU backend are enabled and fixed, improving test coverage and stability.
- AArch64 Floating-Point Test Fixes: A fix is reinstated for the test_matmul_mv_cpu_float32 test to address floating-point associativity errors on AArch64, proposing adjusted test tolerances to reduce flakiness and enabling the test_linalg suite for this architecture.
- cuDNN Support Restriction for CUDA Graphs: cuDNN support is disabled for cases where the query sequence length modulo 128 is not zero to prevent issues with CUDA graph captures.
- Autograd Retracing Optimization: Unnecessary retracing in PyTorch when a subclass has requires_grad set to True is eliminated, improving efficiency in the autograd system and addressing issue #132651.
3.2 Closed Pull Requests
This section provides a summary of pull requests that were closed in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.
Pull Requests Closed This Week: 228
Key Closed Pull Requests
1. [xpu][test]Enable more test cases for inductor.: This pull request aims to enable more test cases for the Inductor component on the XPU backend, improving test coverage and validation for this part of the PyTorch project.
- URL: pull/166834
- Merged: No
- Associated Commits: 4a0a8, 69789, 07fc7, 24261, e8c82, 83433, 0ef95, acb91, 5db29, 8a919, d3514, 7da3a, b7e32, 347c8, c05a6, a3a0f, 61be2, df30c, 86b6b, 32bb8, a5a4c, 776c4, bfe37, 5d904, 2fe17, 53d66, 86d60, 67dc0, 137ca, e4795, d9ccb, 17f44, df968, 55af9, bce27, 73c61, ac565, 770e3, 5a757, 9f7ca, 218d6, 57e9b, d04bd, 394e4, 0d600, 7121a, 4a164, c7f03, fea88, 25061, e3d9c, 071de, f7dba, 24c41, ea84b, 9f32d, b231f, 9a72c, 57cdc, 6d820, a88c8, 797f8, 55780, d9513, 314d1, 9f42c, 9d7a8, a0997, 32e28, 1339b, 3a98b, c8526, 3a76d, aa87f, 30d0e, d4529, 840bd, dbb21, 62f5b, 182fa, 0bed0, 46455, ea92e, e05da, cf195, 3ad63, fd9d1, c9415, 98bab, fa701, 06025, d0f83, 36c34, 6369d, 50769, d5ea4, e92f4, d8c92, aebc2, 8e69c, 9676d, f5ecd, f7776, 3bdef, 3d9ac, 9b38d, 069fb, fc77a, 01c36, 5d573, ac065, 1cef1, 13a8a, 36bf2, e436d, 72fc0, c427b, eb540, 92f94, cdac4, 93f25, eaaed, f23df, 81c17, 7e654, 441a2, bd434, 46419, d7f02, ede4f, 8cecc, 25c85, ebb63, 75e9f, bb9c2, b6f73, d74ab, e7f8b, 36580, 8bcc5, dbcfc, 856c4, 6e8e9, 2bd28, 36473, b451b, 29be6, 2ddd4, 25e49, 2be44, 335a5, 0ee16, fb411, f0020, dd516, a07e1, 7ff42, 30fc5, ad1da, e86ca, d2956, 4e6f6, 88d1b, 6ca38, d93e4, c6a71, c5077, d32a2, ddd72, fe215, 270f5, df10b, 65eea, 1721d, fc43d, 1ef86, 6fa82, ff7c1, ba939, 924eb, 51356, d1d10, 20ea9, 27031, ace87, e84a0, 1f5da, f7493, d9b2b, 98a41, 6583c, 6bebd, 46fcf, 563d9, 29ea0, b8b20, 41e9f, ce2f1, d9c65, 4f067, 0f8f1, b8eab, 9d824, b8025, a1002, 5d59c, b2e33, fcb46, dff1e, 0d747, a8b86, ce91b, 93cad, c00f3, 87f57, da8e3, 832ea, ed0ad, 05ea0, 39b24, 3bf2d, 3e5d3, 5935f, f4c93, f8c5e, b3169, ce3d9, febb9, d5488, 9876a, 4cd77, 0d4f4, 2520a, f8f4b, e345d, 18b22, 81a77, 43d83, 00083, cc689, f9ab0, eeab7, c91ac, 7b6ef, 8d4ee, 53bb5, 0857d, 6a563, 59af9, 0cb87
2. [HOP][print]Add make_fx for the proxy with graph module print: This pull request proposes adding a make_fx test for the proxy with graph module print functionality in the PyTorch project, aiming to enhance testing coverage for this feature, although it was not merged.
- URL: pull/166920
- Merged: No
- Associated Commits: 782ae, e72a7, 72a1e, 27a74, 92fc0, 95d91, 4d003, 1d2e4, d18cc, 81ebc, ae8d8, d2e4c, b2b26, 22d5b, 1bdee, a150e, 04c18, 5e08f, 568a5, 4880b
3. compile time comm benchmarking: This pull request adds an option for compile-time collective communication benchmarking to evaluate comms and compute overlap scheduling by gathering median results across ranks and logging these alongside inductor analytic and NCCL estimator results to tlparse, with plans for further enhancements to improve deterministic and latency estimates.
- URL: pull/167100
- Merged: No
- Associated Commits: 67143, c6e0c, 4570e, 3a72c, 0eea9, e68d1, ba551, d2f7a, 18eca, c9870, 894ec, d15ea, f98a3, 8a6dc, b59bf, dec2a, 9603c
Other Closed Pull Requests
3.3 Pull Request Discussion Insights
This section will analyze the tone and sentiment of discussions within this project's open and closed pull requests that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.
Based on our analysis, there are no instances of toxic discussions in the project's open or closed pull requests from the past week.
IV. Contributors
4.1 Contributors
Active Contributors:
We consider an active contributor in this project to be any contributor who has made at least 1 commit, opened at least 1 issue, created at least 1 pull request, or made more than 2 comments in the last month.
If there are more than 10 active contributors, the list is truncated to the top 10 based on contribution metrics for better clarity.
| Contributor | Commits | Pull Requests | Issues | Comments |
|---|---|---|---|---|
| cyyever | 190 | 55 | 0 | 33 |
| guangyey | 187 | 18 | 1 | 52 |
| malfet | 118 | 21 | 4 | 75 |
| williamwen42 | 120 | 32 | 9 | 31 |
| anijain2305 | 151 | 20 | 0 | 15 |
| Skylion007 | 11 | 7 | 1 | 161 |
| pianpwk | 119 | 22 | 1 | 12 |
| Lucaskabela | 64 | 15 | 0 | 63 |
| ezyang | 62 | 19 | 8 | 36 |
| eellison | 85 | 4 | 2 | 32 |