Weekly Project News


Weekly GitHub Report for PyTorch: November 03, 2025 - November 10, 2025 (12:06:47)

Weekly GitHub Report for PyTorch

Thank you for subscribing to our weekly newsletter! Each week, we deliver a comprehensive summary of your GitHub project's latest activity right to your inbox, including an overview of your project's issues, pull requests, contributors, and commit activity.


Table of Contents

  • I. News
    • 1.1. Recent Version Releases
    • 1.2. Version Information
  • II. Issues
    • 2.1. Top 5 Active Issues
    • 2.2. Top 5 Stale Issues
    • 2.3. Open Issues
    • 2.4. Closed Issues
    • 2.5. Issue Discussion Insights
  • III. Pull Requests
    • 3.1. Open Pull Requests
    • 3.2. Closed Pull Requests
    • 3.3. Pull Request Discussion Insights
  • IV. Contributors
    • 4.1. Contributors

I. News

1.1 Recent Version Releases:

The current version of this repository is v2.6.0

1.2 Version Information:

Released on January 29, 2025, PyTorch 2.6 introduces significant enhancements including torch.compile support for Python 3.13, a new performance tuning API torch.compiler.set_stance, and improved AOTInductor packaging and ABI compatibility. Notable highlights also include FP16 support on X86 CPUs, expanded Intel GPU support, a backward-incompatible security improvement changing the default of torch.load to weights_only=True, and the deprecation of official Conda package publishing, reflecting a trend toward improved performance, security, and streamlined deployment.
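
As a quick, hedged illustration of two of these changes (the checkpoint file below is created just for the example): torch.load now defaults to weights_only=True, and torch.compiler.set_stance adjusts how compiled functions behave at runtime.

```python
import torch

# Make the example self-contained with a tiny checkpoint.
torch.save({"w": torch.randn(2)}, "checkpoint.pt")

# PyTorch 2.6: torch.load defaults to weights_only=True, which restricts
# unpickling to tensors and simple containers; loading files that contain
# arbitrary Python objects now requires opting out explicitly.
state = torch.load("checkpoint.pt")                      # weights_only=True by default
state = torch.load("checkpoint.pt", weights_only=False)  # only for trusted files

# PyTorch 2.6: torch.compiler.set_stance tunes torch.compile behavior at
# runtime, e.g. temporarily forcing compiled functions to run eagerly.
fn = torch.compile(lambda x: x * 2)
torch.compiler.set_stance("force_eager")
fn(torch.randn(4))                    # skips compilation under this stance
torch.compiler.set_stance("default")
```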

II. Issues

2.1 Top 5 Active Issues:

We consider active issues to be issues that have been commented on most frequently within the last week. Bot comments are omitted.

  1. max pooling on CUDA: RuntimeError: integer out of range: This issue reports a RuntimeError caused by integer overflow when performing max pooling on a large CUDA tensor with the NCHW memory format in PyTorch. The problem arises because the current CUDA kernel uses 32-bit integers for indexing, which cannot address tensors with more than 2^31 elements, leading to a crash during max pooling on very large inputs.

    • The discussion centers on a temporary workaround of converting tensors to the channels-last memory format, which uses 64-bit indexing and avoids the overflow, though this incurs extra overhead (a minimal sketch of this workaround appears after this list). Contributors debate the feasibility of a full refactor to support 64-bit indexing in the NCHW kernel, noting it would be a large and complex change due to the widespread use of 32-bit integers in kernel code, with consensus that the current workaround is a practical short-term solution while a more comprehensive fix could be pursued later.
    • Number of comments this week: 9
  2. torch.onnx.export with dynamo breaks FakeQuantize exporting: This issue reports that exporting the official FakeQuantize layer to ONNX using torch.onnx.export works with the legacy TorchScript-based exporter (dynamo=False), but fails when using the new Dynamo-based exporter (dynamo=True) because FakeQuantize cannot be converted to an ExportedProgram. The failure is caused by the use of boolean conditionals on buffer tensors like observer_enabled and fake_quant_enabled inside FakeQuantize, and attempts to replace it with a simplified version still fail due to missing ONNX export implementations for certain aten operators related to fake quantization.

    • The discussion in the comments centers around the missing ONNX function implementations for specific aten operators required by the new exporter, with suggestions to implement and register these in the onnxscript repository. Contributors also note that some export logic currently lives in different parts of the PyTorch and onnxscript repos, causing confusion, and that the recommended quantization API in PyTorch 2 is torch.ao, which may have better support. The issue remains open with no immediate fix, and there is a call for improved documentation and clearer guidance on where to contribute export logic.
    • Number of comments this week: 6
  3. DTensor operations incur unnecessary gathering and memory consumption resulting therefrom: This issue reports that DTensor operations in PyTorch incur unnecessary tensor gathering and increased memory consumption during both forward and backward passes, particularly in batch matrix multiplication scenarios. The user provides minimal reproducible examples demonstrating that DTensor replicates tensors or performs implicit all-gathers when such communication should be avoidable, highlighting limitations in current sharding representations that only support outermost dimension sharding.

    • The comments discuss reproducing the issue on CPU, revealing runtime errors related to sharding propagation and redistribution restrictions. They link this problem to existing issues about DTensor’s inability to represent strided sharding and note that current DTensor implementations only shard outermost dimensions, causing unnecessary communication. A proposal to extend sharding specifications to support ordering is mentioned as a potential solution, with requests for further investigation.
    • Number of comments this week: 5
  4. [BUG] Optimizer should validate betas parameter type to prevent unexpected serialization behavior with OmegaConf objects: This issue addresses a bug in PyTorch optimizers where the betas parameter accepts any sequence-like object without type validation, causing unexpected serialization behavior when an OmegaConf ListConfig is passed. This leads to the entire configuration tree being serialized into the optimizer’s state dict, resulting in oversized checkpoint files and loading failures in PyTorch 2.6+ when using weights_only=True.

    • The discussion confirms that converting betas to a tuple resolves the issue (a short sketch of this fix appears after this list), and a pull request was made to add input type validation; after this change, the problem no longer occurs, so further error checking was deemed unnecessary to avoid breaking existing use cases.
    • Number of comments this week: 5
  5. [DTensor] DTensor randn inconsistent with single device behavior: This issue reports that the DTensor implementation of the randn function does not produce random values consistent with single-device behavior, despite a proposed consistency guarantee. The discrepancy is demonstrated with a minimal reproducible example showing that the local shard on rank 1 differs from the corresponding shard in the single-device full tensor, indicating a problem with how random number generation offsets are handled in distributed settings.

    • The comments discuss a related solution from the veScale paper that ensures consistent global random matrices across sharded and non-sharded cases by subclassing the RNG state. Investigation traces the issue to the OffsetBasedRNGTracker and the CUDA kernel’s RNG state initialization, highlighting that the current grid-stride kernel does not maintain linear alignment between RNG offsets and output elements, and suggesting the need for a new kernel API and a tensor-shape aware RNG tracker similar to veScale’s approach.
    • Number of comments this week: 4
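
For the max pooling overflow in item 1, here is a minimal sketch of the channels-last workaround discussed in that thread, assuming an illustrative tensor shape rather than the reporter's exact sizes:

```python
import torch
import torch.nn.functional as F

# Illustrative shape only; the actual report involves a tensor with more than
# 2^31 elements. Per the discussion, converting to channels_last routes max
# pooling through a kernel that uses 64-bit indexing, at the cost of an extra
# layout conversion.
x = torch.randn(1, 64, 512, 512, device="cuda")              # NCHW layout
x_cl = x.to(memory_format=torch.channels_last)
out = F.max_pool2d(x_cl, kernel_size=2)
out = out.contiguous(memory_format=torch.contiguous_format)  # back to NCHW if needed
```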
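
For the optimizer betas issue in item 4, the fix the thread converged on is to pass a plain tuple rather than a config-library sequence. A short sketch, with a hypothetical checkpoint path:

```python
import torch

params = [torch.nn.Parameter(torch.randn(3))]

# If betas comes from a config object (e.g. an OmegaConf ListConfig), convert it
# to a plain tuple before handing it to the optimizer; otherwise the whole config
# tree can end up pickled into the optimizer state dict.
betas_from_config = [0.9, 0.999]  # stand-in for something like cfg.optimizer.betas
opt = torch.optim.Adam(params, lr=1e-3, betas=tuple(betas_from_config))

torch.save(opt.state_dict(), "optimizer.pt")  # hypothetical path; state dict stays small
```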

2.2 Top 5 Stale Issues:

We consider stale issues to be issues that have had no activity within the last 30 days. The team should work together to get these issues resolved and closed as soon as possible.

  1. ImportError: cannot import name 'triton_key' from 'triton.compiler.compiler': This issue reports an ImportError encountered when attempting to import the name 'triton_key' from the 'triton.compiler.compiler' module, which causes a backend compiler failure in PyTorch's inductor backend during model compilation. The user provides detailed environment information, including PyTorch version 2.4.0 development build, CUDA 12.1, and Ubuntu 22.04, and demonstrates the error occurring while compiling specific pipeline components with torch.compile using the "reduce-overhead" mode.
  2. Alternate algorithm for computing MaxPool2D under specific condition: This issue proposes an alternate algorithm for computing MaxPool2D when the stride is equal to 1, by representing a larger kernel size (e.g., 5 or 7) as multiple smaller MaxPool2D operations with kernel size 3, which reduces the computational cost per cell (a worked example appears after this list). The suggested modification targets the MaxPool2D layer directly to avoid additional overhead during backpropagation and is expected to yield performance improvements specifically on CPU.
  3. cuda_utils.so: failed to map segment from shared object: This issue describes a problem encountered when running a PyTorch model inside a Docker container with a tmpfs mounted at /tmp having permissions set to 1777. Although the model compiles successfully, execution fails with an error indicating that the shared object cuda_utils.so cannot be mapped due to missing execute permissions on the file, despite the script running as root and directory permissions being correct.
  4. Enable UFMT on all files in PyTorch: This issue addresses the task of enabling uniform formatting (UFMT) across all files in the PyTorch codebase, specifically targeting approximately 1,500 files that are currently excluded from UFMT enforcement. It outlines the process for removing files from the exclusion list, running the formatter, handling known formatting-related problems, and organizing the work by directory to facilitate incremental and reviewable commits.
  5. [JIT archive] Add a flag to not include debug files: This issue proposes adding a flag to the torch.jit.save() function that allows users to exclude debug files, specifically .debug_pkl files, from the saved JIT archive to reduce file size. The motivation stems from observations that these debug files, which are only used for debugging purposes, can significantly increase the archive size without affecting model correctness, especially impacting deployment on mobile devices where storage is limited.
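
As a worked illustration of the stride-1 decomposition proposed in item 2 above: for stride 1, two stacked 3x3 max pools cover the same 5x5 window as a single larger pool (the shapes below are arbitrary):

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 32, 32)

# One 5x5 max pool with stride 1 ...
single = F.max_pool2d(x, kernel_size=5, stride=1, padding=2)

# ... equals two stacked 3x3 max pools with stride 1, since max pooling uses
# implicit -inf padding and max() composes over nested windows.
stacked = F.max_pool2d(
    F.max_pool2d(x, kernel_size=3, stride=1, padding=1),
    kernel_size=3, stride=1, padding=1,
)

print(torch.equal(single, stacked))  # True
```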

2.3 Open Issues

This section lists, groups, and then summarizes issues that were created within the last week in the repository.

Issues Opened This Week: 97

Summarized Issues:

  • Compilation and Tracing Issues with torch.compile and Dynamo: Multiple issues report failures or unexpected behavior when using torch.compile or TorchDynamo, including errors tracing Python built-ins like bool() and print(), segmentation faults with torch.cond in distributed programs, and bugs with dynamic shape handling and nested graph breaks. These problems highlight limitations in tracing, compilation, and backend support that cause crashes, incorrect outputs, or unsupported operator errors during model compilation and execution (a rough sketch of the graph-break pattern follows this list).
    • issues/166918, issues/167266, issues/167269, issues/167275, issues/167276, issues/167393, issues/167400, issues/167116, issues/167191
  • DTensor Feature Limitations and Bugs: Several issues describe missing operator support, assertion errors, and performance inefficiencies in the DTensor implementation, including lack of support for argmax/argmin, errors with conv2d when bias is None, and unnecessary tensor gathering causing memory overhead. These problems indicate incomplete sharding strategies and incorrect assumptions in DTensor's handling of distributed tensor operations.
    • issues/166930, issues/167072, issues/167090, issues/167091, issues/167106, issues/167219, issues/167349
  • ROCm Platform Test Failures and Build Issues: Multiple test cases and builds on ROCm platforms are failing or disabled due to compatibility problems, including failures of test_3layer_split_reduction and test_fuzzer_issue_163674, nightly build failures for ROCm 7.1, and version mismatches causing runtime errors. These issues reflect ongoing instability and maintenance challenges in ROCm support.
    • issues/166836, issues/167215, issues/166992, issues/167324, issues/167411
  • Memory and Performance Overheads in Compilation and Runtime: Several issues report excessive memory consumption and performance bottlenecks, such as increased memory use with torch.compile on CPU, O(n²) complexity in shard validation causing long delays, and repeated memory allocations in AOTInductor leading to kernel launch interruptions. These inefficiencies degrade runtime performance and resource utilization.
    • issues/166939, issues/166941, issues/167199
  • Exporting and ONNX Compatibility Problems: Issues highlight failures and regressions in model export workflows, including loss of operator argument names during export, failure to export models with tensor subclasses or hooks, and ONNX export errors with FakeQuantize layers and Hugging Face models. These problems hinder model interoperability and deployment.
    • issues/166851, issues/167007, issues/167329, issues/167063, issues/167068, issues/167284
  • Triton Kernel Compilation and Feature Limitations: Several issues report Triton kernel compilation failures due to size limits, unsupported data types like float8, and missing debugging features such as input tensor dumping. These limitations affect the Inductor backend's ability to compile and debug GPU kernels efficiently.
    • issues/166949, issues/167050, issues/167082, issues/167098
  • Bugs and Crashes in Specific PyTorch Functions and Modules: Various bugs cause crashes or incorrect behavior in PyTorch functions, including a buffer overflow in pad_packed_sequence, a bug in aoti_compile_and_package causing CUDA guard errors, and a crash with full activation checkpointing combined with Dynamo's LRU cache. These critical bugs impact stability and correctness.
    • issues/166841, issues/166881, issues/166926
  • Type Checking and Annotation Issues: Some issues address missing or problematic type annotations causing type checking failures, such as the lack of return type annotation in named_modules() and problems with string type annotations breaking serialization of GraphModule. These hinder static analysis and tooling.
    • issues/166905, issues/167117, issues/167119
  • Distributed and Parallelism-Related Issues: Problems include FSDP2 omitting linear layers during compilation, automatic movement of inputs to devices causing inefficiencies, and RPC failures on Jetson Orin due to GPU UUID format. These issues affect distributed training workflows and device management.
    • issues/167009, issues/167036, issues/167304, issues/167305
  • CUDA and GPU Driver Related Failures: Issues include CUDA kernel indexing overflows causing incorrect results, missing OpenMP headers causing compile errors on Windows, and FFT hangs on RTX 5000 Ada GPUs with specific driver versions. These problems affect GPU computation reliability and compatibility.
    • issues/167086, issues/167062, issues/167409, issues/167253
  • Requests for Feature Enhancements and API Improvements: Proposals include adding a unified memory pool for accelerators, fine control over FP32 precision in autograd backward, support for CUDA event input/output in streams, and improved support for torch.cuda.use_mem_pool in compiled code. These aim to enhance flexibility and performance.
    • issues/167210, issues/167254, issues/167257, issues/167026
  • Documentation and Usability Concerns: Issues include requests to fix tab order on documentation pages for better accessibility, removal of a large "Ask AI" button due to poor UI, and questioning the accuracy of a figure on the GLU documentation page. These affect user experience and clarity.
    • issues/166823, issues/167229, issues/167240
  • Miscellaneous Bugs and Build Issues: Other problems include build failures due to incorrect package folder structure, optimizer parameter type validation missing causing large checkpoint files, and issues with PyTorch import on Jetson due to missing attributes. These affect development and deployment workflows.
    • issues/167365, issues/167319, issues/167309
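
As a rough, hedged sketch of the pattern behind the first group above (not a reproduction of any specific linked issue): built-in calls such as print() inside a compiled region have long been a common source of graph breaks, and fullgraph=True turns such breaks into errors.

```python
import torch

@torch.compile  # default mode: Dynamo typically inserts a graph break around print()
def tolerant(x):
    print("inside compiled region")
    return x * 2

@torch.compile(fullgraph=True)  # graph breaks become hard errors in this mode
def strict(x):
    print("inside compiled region")
    return x * 2

x = torch.randn(4)
tolerant(x)  # runs, but the function is split into multiple graphs
try:
    strict(x)
except Exception as err:  # typically a torch._dynamo "Unsupported" error
    print("fullgraph=True raised:", type(err).__name__)
```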

2.4 Closed Issues

This section lists, groups, and then summarizes issues that were closed within the last week in the repository. This section also links the associated pull requests if applicable.

Issues Closed This Week: 22

Summarized Issues:

  • Build and Compilation Failures: Multiple issues report build or compilation failures caused by missing includes, syntax errors, or broken scripts. These include a syntax error in cuda/Blas.cpp causing build failure, a missing #include <algorithm> leading to a compilation error, and a deleted Android build script causing confusion due to outdated documentation.
    • issues/166810, issues/167315, issues/167186
  • ROCm Workflow Instability and Network Issues: Several issues describe instability and long queue times in ROCm-related workflows, as well as network outages on the MI250 Cirrascale cluster causing job failures. Mitigations include reducing workflow frequency and rerouting workloads, with plans to improve network monitoring.
    • issues/166866, issues/166874, issues/166875
  • Inductor and Compiler Bugs: There are multiple bugs in PyTorch's Inductor backend and Dynamo compiler, including a NameError in Triton kernel generation when using .item(), a KeyError during guard generation for temporary variables, and incorrect parameter handling in the addmm to add(mm) transformation. These bugs cause compilation failures and incorrect computation results.
    • issues/166888, issues/166900, issues/167313
  • Memory and Resource Management Issues: Reports include a potential memory leak in the pybind11 type_caster for at::Tensor due to improper reference counting, and redundant compiled code in the context parallel module that may unintentionally affect distribution operations. These issues could impact performance and resource usage.
    • issues/167124, issues/167064
  • Tensor and Operation Bugs: Several issues describe incorrect behavior in tensor operations, such as incorrect stride calculation in einsum gradients, a regression in MPS backend buffer allocation for non-contiguous tensors, and unexpected concretization of symbolic dimensions in FakeTensorProp with aten.polar. These bugs lead to runtime errors or incorrect results.
    • issues/167263, issues/167154, issues/167278
  • ONNX Export and Data Type Support Issues: One issue reports a failure when exporting models containing float8_e4m3fn tensors due to an invalid JitScalarType during ONNX optimization. Another issue requests CUDA 13 support on Windows, as current PyTorch versions only support CUDA 12, limiting GPU compatibility.
    • issues/166933, issues/167042
  • Tracing and Debugging Tool Limitations: The fx_traceback.annotate context manager fails to add annotations to assert nodes in traced models, resulting in missing stacktrace information. This limits debugging capabilities for exported program graphs.
    • issues/166906
  • CUDA Compatibility and GPU Support: The official PyTorch wheels lack CUDA kernel support for the sm_120 compute capability required by NVIDIA RTX 5090 GPUs, causing runtime errors and preventing GPU acceleration until compatible wheels are released.
    • issues/167244
  • Local Build Process Breakage: A recent pull request breaks the local source build process by causing a runtime error in mirror_inductor_external_kernels(), which is resolved by reverting the PR. This affects developers building PyTorch from source.
    • issues/167321
  • Runtime Errors in PyTorch 2.9.0: An UnboundLocalError occurs due to accessing a local variable before assignment in version 2.9.0, with a request for a fix in version 2.9.1. This runtime error affects stability in the specified version.
    • issues/167344
  • GitHub Copilot Generated File Management: There is a discussion on whether the automatically generated .github/copilot-instructions.md file by GitHub Copilot should be added to .gitignore to avoid tracking it in git repositories.
    • issues/166850

2.5 Issue Discussion Insights

This section analyzes the tone and sentiment of discussions within this project's open and closed issues that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.

Based on our analysis, there are no instances of toxic discussions in the project's open or closed issues from the past week.


III. Pull Requests

3.1 Open Pull Requests

This section provides a summary of pull requests that were opened in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.

Pull Requests Opened This Week: 227

Key Open Pull Requests

1. Add new CI jobs to run dynamo tests on all python versions supported: This pull request adds four new continuous integration jobs that run dynamo tests across all supported Python versions using the linux.2xlarge instance type.

  • URL: pull/166978
  • Merged: No
  • Associated Commits: 77306, ded92, bcc44, ce743, e8b9b, fe69b, 3e6aa, ac12f, 1b579, 9c7ec, 16954, 8d60c, 7e474, 90238, 3cca4, c7829, a26ac, 404b1, bde50, ade38, a274e, 983bb, 9eeae, c95ea, 4cb9c, 78d9b, 370d6, 65519, 8a8b9, ad2b3, cae25

2. [ROCm][CI] Add job to cache ROCm docker images: This pull request adds a continuous integration job to cache ROCm Docker images in the PyTorch project, improving build efficiency by reusing these images, and includes various fixes and adjustments to the job configuration and dependencies.

  • URL: pull/167379
  • Merged: No
  • Associated Commits: 74f0e, 7c386, 84aa7, f2975, 1f4e2, 066bd, 7a6ce, 27456, 95174, 05c37, dbefd, b1427, 57068, 56fac, cfe32, 97b97, e3c7a, 08f04, 933a0, c3a9e

3. [WIP][FlexAttention] Enable tensor descriptor for FlexAttention backward: This pull request aims to enable and integrate tensor descriptors for the FlexAttention backward pass in PyTorch, improving the implementation and testing of the FlexAttention module's backward computations.

  • URL: pull/166927
  • Merged: No
  • Associated Commits: 2624f, 3ce05, c19ab, dfe0e, 912ec, d24c3, 6c9b3, af55e, a10d2, 4736a, 4d270, c866b, 92a65, e3788, 213be, 46176, b6ec0, 69d92, fcc19

Other Open Pull Requests

  • Inductor Debugging and Modes: Multiple pull requests enhance PyTorch's inductor by adding runtime recording of Triton kernel executions for debugging and introducing an "inductor lite mode" that allows selective optimizations with numeric correctness guarantees. These changes improve debugging capabilities and give users more control over compilation behavior.
    • pull/167028, pull/167115
  • Refactoring and Header-Only Improvements: Refactoring efforts include making TensorAccessor classes header-only with template parameters and decoupling graph outputs from output VariableTrackers in Dynamo to better support higher-order operators. These changes improve modularity, compatibility, and simplify output handling in the graph system.
    • pull/166855, pull/167377
  • Segmentation Fault Reporting and JSON Uploads: A pull request adds manual JSON report generation for segmentation faults, which lack XML output, and uploads these reports to S3 for later ingestion, improving visibility in reporting systems like ClickHouse. This also suggests modifying test report code to utilize the new JSON data.
    • pull/167250
  • Python Wrappers and Integration Tests: The mtia_graph Python wrapper module is implemented to provide a Python interface to underlying C++ logic, accompanied by Python-level integration tests to verify functionality.
    • pull/166964
  • DTensor Dispatch Optimization: A C++ fast path is introduced for the DTensor.__torch_dispatch__ mechanism, enabling efficient detection and delegation to C++ code, replacing previous dispatch key approaches and aiming for full native integration.
    • pull/167051
  • Python 3.14 Support and CI Enhancements: Efforts to support Python 3.14 include enabling dynamo tests on this version and adding continuous integration support to build Docker images specifically for Python 3.14t.
    • pull/167246, pull/167376
  • CDATA Handling Fixes: An unreviewed preliminary update addresses CDATA handling for the "pp" project, consisting of multiple automated commits and not yet merged.
    • pull/167006
  • Functionalization of Print Operation: Functionalization is added to the print operation to ensure correct ordering and side effects are properly handled during execution.
    • pull/167016
  • CUDA Backend Precision Configuration: A new API torch.backends.cuda.math_sdp.fp32_precision is introduced to configure float32 precision in the SDPBackend.MATH, improving numerical accuracy by replacing TF32 with IEEE FP32 precision. This includes a decorator for tests and a sanity check for precision correctness.
    • pull/167157
  • Scaled Dot Product Attention FA4 Backend Support: Support for the FA4 backend is added to the scaled dot product attention implementation, including installation, benchmarking, and example usage on CUDA devices to demonstrate integration and performance.
    • pull/167348
  • Linters for Stable Shim and Versioning: Linters are introduced to ensure that new function declarations in torch/csrc/stable/c/shim.h and calls to stable shim APIs are properly enclosed within correct TORCH_FEATURE_VERSION macros, enforcing consistent versioning and preventing unversioned usage.
    • pull/166995, pull/166996
  • Native Ops Schema and Adapter Checks: A linter parses functions called via torch_call_dispatcher to populate native_ops.txt and checks for schema changes, warning users to update ops.h and enforce registration of schema adapters when default argument values change.
    • pull/166997
  • CUDA 13.0 Eager Tests and Build Updates: Eager tests for CUDA 13.0 are added and configured, including updates to build settings, CUDA architecture specifications, and related flags to ensure compatibility and proper testing on the latest CUDA version.
    • pull/167207
  • XPU Support Upgrades and CI Maintenance: The XPU support package is upgraded from version 2025.2 to 2025.3 with added build and test support in CI on Linux and Windows, while maintaining the older version until PyTorch 2.10 release to avoid breakage. Related fixes include Ubuntu upgrades and GCC 13+ warning resolutions.
    • pull/166829
  • XNNPack Submodule Update for GCC14: The XNNPack submodule is updated to a version compatible with GCC14, addressing build issues with the newer compiler.
    • pull/166873
  • Intel GPU Inductor XPU Tests: Missing Intel GPU inductor unit tests for the XPU backend are enabled and fixed, improving test coverage and stability.
    • pull/167047
  • AArch64 Floating-Point Test Fixes: A fix is reinstated for the test_matmul_mv_cpu_float32 to address floating-point associativity errors on AArch64, proposing adjusted test tolerances to reduce flakiness and enabling the test_linalg suite for this architecture.
    • pull/167069
  • cuDNN Support Restriction for CUDA Graphs: cuDNN support is disabled for cases where the query sequence length modulo 128 is not zero to prevent issues with CUDA graph captures.
    • pull/167101
  • Autograd Retracing Optimization: Unnecessary retracing in PyTorch when a subclass has requires_grad set to True is eliminated, improving efficiency in the autograd system and addressing issue #132651.
    • pull/167135

3.2 Closed Pull Requests

This section provides a summary of pull requests that were closed in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.

Pull Requests Closed This Week: 228

Key Closed Pull Requests

1. [xpu][test]Enable more test cases for inductor.: This pull request aims to enable more test cases for the Inductor component on the XPU backend, improving test coverage and validation for this part of the PyTorch project.

  • URL: pull/166834
  • Merged: No
  • Associated Commits: 4a0a8, 69789, 07fc7, 24261, e8c82, 83433, 0ef95, acb91, 5db29, 8a919, d3514, 7da3a, b7e32, 347c8, c05a6, a3a0f, 61be2, df30c, 86b6b, 32bb8, a5a4c, 776c4, bfe37, 5d904, 2fe17, 53d66, 86d60, 67dc0, 137ca, e4795, d9ccb, 17f44, df968, 55af9, bce27, 73c61, ac565, 770e3, 5a757, 9f7ca, 218d6, 57e9b, d04bd, 394e4, 0d600, 7121a, 4a164, c7f03, fea88, 25061, e3d9c, 071de, f7dba, 24c41, ea84b, 9f32d, b231f, 9a72c, 57cdc, 6d820, a88c8, 797f8, 55780, d9513, 314d1, 9f42c, 9d7a8, a0997, 32e28, 1339b, 3a98b, c8526, 3a76d, aa87f, 30d0e, d4529, 840bd, dbb21, 62f5b, 182fa, 0bed0, 46455, ea92e, e05da, cf195, 3ad63, fd9d1, c9415, 98bab, fa701, 06025, d0f83, 36c34, 6369d, 50769, d5ea4, e92f4, d8c92, aebc2, 8e69c, 9676d, f5ecd, f7776, 3bdef, 3d9ac, 9b38d, 069fb, fc77a, 01c36, 5d573, ac065, 1cef1, 13a8a, 36bf2, e436d, 72fc0, c427b, eb540, 92f94, cdac4, 93f25, eaaed, f23df, 81c17, 7e654, 441a2, bd434, 46419, d7f02, ede4f, 8cecc, 25c85, ebb63, 75e9f, bb9c2, b6f73, d74ab, e7f8b, 36580, 8bcc5, dbcfc, 856c4, 6e8e9, 2bd28, 36473, b451b, 29be6, 2ddd4, 25e49, 2be44, 335a5, 0ee16, fb411, f0020, dd516, a07e1, 7ff42, 30fc5, ad1da, e86ca, d2956, 4e6f6, 88d1b, 6ca38, d93e4, c6a71, c5077, d32a2, ddd72, fe215, 270f5, df10b, 65eea, 1721d, fc43d, 1ef86, 6fa82, ff7c1, ba939, 924eb, 51356, d1d10, 20ea9, 27031, ace87, e84a0, 1f5da, f7493, d9b2b, 98a41, 6583c, 6bebd, 46fcf, 563d9, 29ea0, b8b20, 41e9f, ce2f1, d9c65, 4f067, 0f8f1, b8eab, 9d824, b8025, a1002, 5d59c, b2e33, fcb46, dff1e, 0d747, a8b86, ce91b, 93cad, c00f3, 87f57, da8e3, 832ea, ed0ad, 05ea0, 39b24, 3bf2d, 3e5d3, 5935f, f4c93, f8c5e, b3169, ce3d9, febb9, d5488, 9876a, 4cd77, 0d4f4, 2520a, f8f4b, e345d, 18b22, 81a77, 43d83, 00083, cc689, f9ab0, eeab7, c91ac, 7b6ef, 8d4ee, 53bb5, 0857d, 6a563, 59af9, 0cb87

2. [HOP][print]Add make_fx for the proxy with graph module print: This pull request proposes adding a make_fx test for the proxy with graph module print functionality in the PyTorch project, aiming to enhance testing coverage for this feature, although it was not merged.

  • URL: pull/166920
  • Merged: No
  • Associated Commits: 782ae, e72a7, 72a1e, 27a74, 92fc0, 95d91, 4d003, 1d2e4, d18cc, 81ebc, ae8d8, d2e4c, b2b26, 22d5b, 1bdee, a150e, 04c18, 5e08f, 568a5, 4880b

3. compile time comm benchmarking: This pull request adds an option for compile-time collective communication benchmarking to evaluate comms and compute overlap scheduling by gathering median results across ranks and logging these alongside inductor analytic and NCCL estimator results to tlparse, with plans for further enhancements to improve deterministic and latency estimates.

  • URL: pull/167100
  • Merged: No
  • Associated Commits: 67143, c6e0c, 4570e, 3a72c, 0eea9, e68d1, ba551, d2f7a, 18eca, c9870, 894ec, d15ea, f98a3, 8a6dc, b59bf, dec2a, 9603c

Other Closed Pull Requests

3.3 Pull Request Discussion Insights

This section analyzes the tone and sentiment of discussions within this project's open and closed pull requests that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.

Based on our analysis, there are no instances of toxic discussions in the project's open or closed pull requests from the past week.


IV. Contributors

4.1 Contributors

Active Contributors:

We consider an active contributor in this project to be any contributor who has made at least 1 commit, opened at least 1 issue, created at least 1 pull request, or made more than 2 comments in the last month.

If there are more than 10 active contributors, the list is truncated to the top 10 based on contribution metrics for better clarity.

Contributor     Commits   Pull Requests   Issues   Comments
cyyever         190       55              0        33
guangyey        187       18              1        52
malfet          118       21              4        75
williamwen42    120       32              9        31
anijain2305     151       20              0        15
Skylion007      11        7               1        161
pianpwk         119       22              1        12
Lucaskabela     64        15              0        63
ezyang          62        19              8        36
eellison        85        4               2        32
