Weekly GitHub Report for PyTorch: April 21, 2025 - April 28, 2025 (12:01:41)
Thank you for subscribing to our weekly newsletter! Each week, we deliver a comprehensive summary of your GitHub project's latest activity right to your inbox, including an overview of your project's issues, pull requests, contributors, and commit activity.
Table of Contents
I. News
1.1 Recent Version Releases:
The current version of this repository is v2.6.0
1.2 Version Information:
Released on January 29, 2025, PyTorch 2.6 introduces significant updates, including support for `torch.compile` with Python 3.13, a new performance-related feature `torch.compiler.set_stance`, and FP16 support on X86 CPUs. Notably, the release marks a shift away from publishing on Conda, with a focus on official wheel packages, and introduces a backward-incompatible change by setting `weights_only=True` as the default for `torch.load`, enhancing security.
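As a quick illustration of the `weights_only` change, a minimal sketch (the file path and model here are placeholders):

```python
import torch

# Sketch of the PyTorch 2.6 default change; "weights.pt" is a placeholder path.
model = torch.nn.Linear(4, 2)
torch.save(model.state_dict(), "weights.pt")

# weights_only=True is now the default: only tensors and allowlisted types
# are unpickled, reducing the risk of arbitrary code execution.
state = torch.load("weights.pt")

# Loading arbitrary pickled objects requires an explicit opt-out, which
# should only be used with trusted checkpoints.
state = torch.load("weights.pt", weights_only=False)
model.load_state_dict(state)
```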
II. Issues
2.1 Top 5 Active Issues:
We consider active issues to be issues that have been commented on most frequently within the last week. Bot comments are omitted.
- constructing DTensor on a 2D device mesh SIGTERMs: This issue involves a bug encountered while constructing a DTensor on a 2D device mesh using PyTorch, which results in SIGTERM errors during execution. The problem arises specifically when running a script on a multi-GPU setup, where the last example in a tutorial fails due to segmentation faults, potentially linked to the interaction between `init_process_group` and `init_device_mesh`.
- The comments discuss attempts to reproduce the error, with some users unable to replicate it locally. Suggestions include checking CUDA device settings and using `torch.cuda.set_device`. The issue seems related to device initialization, with a proposed solution to avoid using both `init_process_group` and `init_device_mesh` simultaneously; a sketch of this pattern appears after this list. Debugging efforts include using gdb to capture stack traces, and there is a consensus that the problem might be due to a conflict in device handling between the two initialization methods.
- Number of comments this week: 19
- RFC: The State of Custom CUDA extensions in PyTorch: This issue discusses the current state and challenges of implementing custom CUDA extensions in PyTorch, highlighting the trade-offs between different methods such as `torch.utils.cpp_extension.load_inline()` and the proposed `torch.cuda._compile_kernel()`, which aims to significantly reduce compilation times by leveraging `nvrtc`. The document also explores potential improvements and integrations with other PyTorch components, such as AOTInductor, and considers the future direction of CUDA kernel development in the context of emerging Pythonic DSLs for GPU programming.
- The comments discuss the need for higher-level APIs to improve usability, the potential integration of `torch.cuda._compile_kernel` with `torch.compile` and AOTInductor, and the importance of supporting complex data structures and headers for performance. There is also a focus on leveraging `cuda-python` for better access to device libraries and improving tensor marshaling, with examples provided to illustrate potential implementations; a minimal `load_inline` sketch follows this list.
- Number of comments this week: 9
- Inconsistent sum/dot/norm behavior: This issue highlights the inconsistent behavior of the `sum`, `dot`, and `norm` functions in PyTorch when dealing with large `float32` arrays, where `torch.sum` is noted for its precision, while `torch.linalg.norm` is slower and less accurate, especially when using multiple CPU threads. The user is seeking clarification on how to achieve consistent results across different PyTorch versions and whether there is a way to normalize these discrepancies.
- The comments discuss the expected numerical inaccuracies due to the precision limits of `float32`, the impact of operations like square root and squaring on precision, and the potential reasons for the observed discrepancies, such as rounding errors and the use of different algorithms like Kahan summation. There is also a mention of the lack of documentation on these behaviors and a suggestion to generalize existing tooling to improve consistency, with an invitation for contributions to the codebase; a small comparison sketch appears after this list.
- Number of comments this week: 8
- [rfc][c10d] RDMA APIs (read/write, rkey): This issue discusses the need for RDMA-like APIs in PyTorch that allow communication without requiring both sides to initiate the process, which is beneficial for distributed inference, checkpointing, and other advanced use cases. The proposal includes two design options: using the ProcessGroup API for manual tensor operations and creating a "Ghost" Tensor subclass for more abstracted tensor registration and exchange.
- The comments explore the implications of the proposed designs, debating serialization and implicit transfers, and suggest alternatives like a collective handle creation. They discuss the value of using side channels for handle transmission, compare RMA and PGAS models, and address technical details like file descriptor handling in symmetric memory.
- Number of comments this week: 7
- Expanding subset of tensor reads wrong memory: This issue describes a bug in PyTorch where expanding a subset of a tensor on a CUDA device results in incorrect memory reads, producing unexpected output when the function is compiled with `torch.compile()`. The problem does not occur when using a CPU, not compiling the function, or when cloning the tensor elements, and it is influenced by the use of different data types and the `dynamic` parameter.
- The comments discuss attempts to reproduce the issue with different configurations, noting that using `dynamic=True` affects the output, and highlight the importance of using different values for `n` to trigger the bug. There is also a discussion on how the output varies with different data types, and a suggestion to clone tensor elements to achieve the expected results; an illustrative sketch of the pattern appears after this list.
- Number of comments this week: 6
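For the DTensor issue above, a minimal sketch of the initialization pattern suggested in the comments, assuming a `torchrun` launch that sets `LOCAL_RANK` and a 4-GPU setup; this is illustrative, not the issue's exact reproducer:

```python
import os
import torch
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor import distribute_tensor, Shard

# Pin each rank to its GPU first, as suggested in the comments; assumes a
# torchrun launch (e.g. torchrun --nproc-per-node=4) that sets LOCAL_RANK.
torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

# Let init_device_mesh create the default process group itself rather than
# also calling init_process_group, avoiding the conflict discussed above.
mesh = init_device_mesh("cuda", (2, 2), mesh_dim_names=("dp", "tp"))

# Construct a DTensor sharded across both mesh dimensions.
dt = distribute_tensor(torch.randn(8, 8), mesh, placements=[Shard(0), Shard(1)])
```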
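For the custom CUDA extensions RFC, a minimal sketch of one of the existing approaches it weighs, `torch.utils.cpp_extension.load_inline` (the module name and trivial operator are illustrative; a CUDA toolchain is required):

```python
import torch
from torch.utils.cpp_extension import load_inline

# Declaration goes in cpp_sources; the definition is compiled as CUDA source.
cpp_src = "torch::Tensor add_one(torch::Tensor x);"
cuda_src = """
torch::Tensor add_one(torch::Tensor x) {
    return x + 1;  // a real extension would launch a hand-written kernel here
}
"""

# load_inline compiles and binds the listed functions at runtime; the RFC's
# proposed torch.cuda._compile_kernel() aims to make this much faster via nvrtc.
ext = load_inline(name="demo_ext", cpp_sources=cpp_src,
                  cuda_sources=cuda_src, functions=["add_one"])
print(ext.add_one(torch.ones(3, device="cuda")))
```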
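For the `sum`/`dot`/`norm` issue, a small sketch of how such `float32` discrepancies can be observed against a `float64` reference (the array size is arbitrary):

```python
import torch

# Compare float32 reductions against a float64 reference; differences of
# this kind reflect float32 rounding and reduction order, not a bug.
x = torch.rand(10_000_000)  # float32 by default
ref = x.double()

print(abs(x.sum().item() - ref.sum().item()))
print(abs(torch.dot(x, x).item() - torch.dot(ref, ref).item()))
print(abs(torch.linalg.norm(x).item() - torch.linalg.norm(ref).item()))
```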
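For the expand-subset issue, a hypothetical sketch of the reported pattern; the function, shapes, and `dynamic=True` flag here are illustrative stand-ins, not the exact reproducer from the issue:

```python
import torch

def f(t, n):
    row = t[0]                # a view into t, not a copy
    return row.expand(n, -1)  # broadcast the view to n rows

t = torch.arange(12.0, device="cuda").reshape(3, 4)
eager = f(t, 5)
compiled = torch.compile(f, dynamic=True)(t, 5)
print(torch.equal(eager, compiled))  # reported False in affected configurations

def f_workaround(t, n):
    return t[0].clone().expand(n, -1)  # cloning first gives expected results
```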
2.2 Top 5 Stale Issues:
We consider stale issues to be issues that have had no activity within the last 30 days. The team should work together to get these issues resolved and closed as soon as possible.
- ImportError: cannot import name 'triton_key' from 'triton.compiler.compiler': This issue involves an ImportError encountered when attempting to import 'triton_key' from 'triton.compiler.compiler', which is causing a backend compiler failure in a PyTorch environment using the 'inductor' backend. The problem arises during the execution of a Python script that utilizes the OotdPipeline and attempts to compile certain components with Torch's compile function, specifically affecting users working with PyTorch version 2.4.0.dev20240330+cu121 on an Ubuntu 22.04.3 LTS system with CUDA 12.1.
- Alternate algorithm for computing MaxPool2D under specific condition: This issue proposes an alternative algorithm for computing MaxPool2D in PyTorch when the stride is equal to 1, suggesting that a kernel size of 5 can be represented by two MaxPool2D operations with a kernel size of 3, and similarly for other sizes, to reduce computational cost on the CPU. The approach aims to optimize performance by decreasing the computation for each cell, and testing has shown a speedup of approximately 1.293 times compared to the traditional method; a quick numerical check of the equivalence appears after this list.
- cuda_utils.so: failed to map segment from shared object: This issue involves a bug encountered when running a PyTorch model within a Docker container, where the execution of a cached `cuda_utils.so` file in the `/tmp` directory fails due to a missing execution permission, despite the directory having the correct permissions. The error occurs specifically when using a tmpfs with a permission setting of `1777`, and the problem persists even when the script is executed with root privileges, which should theoretically allow full execution rights.
- Enable UFMT on all files in PyTorch: This issue addresses the need to apply uniform formatting (UFMT) to approximately 1,500 files in the PyTorch codebase that are currently exempt from this formatting standard. The process involves removing file names from the `exclude_patterns` in the `UFMT` section of the `.lintrunner.toml` file and running a specific command to ensure all files adhere to the desired formatting, with additional preparatory work required to resolve known issues in certain files before applying the UFMT changes.
- [JIT archive] Add a flag to not include debug files: This issue proposes the addition of a flag to the `torch.jit.save()` function in PyTorch to exclude `.debug_pkl` files, which are primarily used for debugging purposes and can significantly increase the file size of TorchScript models compared to ONNX models. The motivation behind this feature request is to reduce the size of JIT archives, particularly for small models with quantization, to facilitate more efficient deployment on mobile devices by eliminating unnecessary debug files that do not affect the model's functionality.
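For the MaxPool2D item above, a quick numerical check of the proposed decomposition (input shape chosen arbitrarily):

```python
import torch
import torch.nn.functional as F

# With stride 1, the max over a 5-wide window equals the max of overlapping
# 3-wide window maxima, so one 5x5 pool matches two chained 3x3 pools.
x = torch.randn(1, 1, 16, 16)
direct = F.max_pool2d(x, kernel_size=5, stride=1)
chained = F.max_pool2d(F.max_pool2d(x, kernel_size=3, stride=1),
                       kernel_size=3, stride=1)
assert torch.equal(direct, chained)  # identical values and output shape
```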
2.3 Open Issues
This section lists, groups, and then summarizes issues that were created within the last week in the repository.
Issues Opened This Week: 110
Summarized Issues:
- Inconsistent behavior in PyTorch functions: This issue highlights the inconsistent behavior and precision discrepancies in PyTorch's `sum`, `dot`, and `norm` functions when handling large `float32` arrays. It notes that `torch.sum` is the most precise, while `torch.linalg.norm` is slower and less accurate, and seeks guidance on achieving consistent results across different versions and hardware configurations.
- ONNX export failures and feature requests: Several issues discuss failures and feature requests related to ONNX export in PyTorch. These include the failure of ONNX export with `enabled_gqa` and `scaled_dot_product_attention`, the need for a decomposition for the `searchsorted` function, and the unsupported operator 'aten::lift_fresh' in opset version 17.
- Bugs in PyTorch's tensor operations and compilation: Various issues report bugs in PyTorch's tensor operations and compilation processes. These include a bug with the `.t()` method on a tensor subclass, inefficiencies in `__torch_function__` handling, and incorrect memory reads on CUDA devices.
- Feature requests for PyTorch functions: There are requests for new features in PyTorch functions, such as an option to disable gradient caching in `torch.func.jvp` and enhancements to the `torch.distributed.tensor.debug.visualize_sharding` function.
- Bugs and inefficiencies in PyTorch's export and deserialization: Issues highlight bugs in PyTorch's export and deserialization processes, such as incorrect clamping of ShapeEnv range information and the lack of support for certain operations on the MPS backend.
- Performance and profiling issues in PyTorch: Several issues report performance and profiling problems in PyTorch, including inefficiencies in printing SymPy expressions, unexpected profiling results, and slow `torch.bmm` operations with BF16 tensors.
- Bugs in PyTorch's distributed and parallel processing: Issues describe bugs in PyTorch's distributed and parallel processing, such as errors with DTensor on a 2D device mesh and unexpected profiling results with `torch.add`.
- Documentation and implementation discrepancies in PyTorch: Several issues highlight discrepancies between PyTorch's documentation and implementation, such as the `CosineAnnealingLR` scheduler formula and the `torch.bernoulli()` function signature; a short sketch of the documented schedule follows this list.
- Bugs in PyTorch's autograd and gradient computation: Issues report bugs in PyTorch's autograd and gradient computation, such as memory leaks during backward passes and incorrect gradient results with `torch.log1p`.
- Bugs in PyTorch's compilation and execution: Various issues describe bugs in PyTorch's compilation and execution, such as segmentation faults with `torch.fliplr` and incorrect iterator behavior in `torch.compile`.
- Bugs in PyTorch's memory management and usage: Issues highlight problems with PyTorch's memory management, such as unexpected peak memory usage with FSDP and out-of-memory errors during training.
- Bugs in PyTorch's sharding and distributed operations: Issues report bugs in PyTorch's sharding and distributed operations, such as the lack of a sharding strategy for `aten.masked_fill_.Scalar` and inefficiencies in NCCL backend operations.
- Bugs in PyTorch's function implementations: Various issues describe bugs in PyTorch's function implementations, such as incorrect behavior with `torch.flipud` and runtime errors with `torch.compile`.
- Bugs in PyTorch's testing and CI processes: Issues highlight bugs in PyTorch's testing and CI processes, such as segmentation faults during builds and disabled tests on ROCm platforms.
- Bugs in PyTorch's attention and sharding mechanisms: Issues report bugs in PyTorch's attention and sharding mechanisms, such as discrepancies in attention module outputs and the lack of support for scaled dot product attention.
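For the `CosineAnnealingLR` discrepancy noted above, a short sketch comparing the scheduler against the closed-form formula from its documentation (the hyperparameters are arbitrary):

```python
import math
import torch

# The documented closed form is
#   eta_t = eta_min + (eta_max - eta_min) * (1 + cos(pi * t / T_max)) / 2;
# the issue concerns how this relates to the recursive implementation.
param = torch.zeros(1, requires_grad=True)
opt = torch.optim.SGD([param], lr=0.1)
sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=10, eta_min=0.001)

for t in range(10):
    closed_form = 0.001 + (0.1 - 0.001) * (1 + math.cos(math.pi * t / 10)) / 2
    print(t, sched.get_last_lr()[0], closed_form)
    opt.step()
    sched.step()
```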
2.4 Closed Issues
This section lists, groups, and then summarizes issues that were closed within the last week in the repository. This section also links the associated pull requests if applicable.
Issues Closed This Week: 65
Summarized Issues:
- Internal Compiler Errors and Segmentation Faults: This topic covers issues related to internal compiler errors and segmentation faults encountered during PyTorch compilation or execution. These errors often arise due to incompatible compiler versions or incorrect environment configurations, leading to unexpected crashes or runtime errors.
- Documentation and Usability Concerns: Several issues highlight the need for improved documentation and usability in PyTorch. These include clarifying function arguments, ensuring accurate descriptions of supported features, and addressing discrepancies that lead to user confusion or errors during implementation.
- Performance and Optimization Issues: Performance regressions and optimization challenges are common in PyTorch, affecting operations like Conv2D and tensor conversions. These issues often require detailed profiling and adjustments to achieve desired efficiency across different hardware platforms.
- Export and Compatibility Problems: Exporting models to ONNX or other formats can encounter compatibility issues, particularly with unsupported operators or dynamic dimensions. These problems necessitate updates to the export process or additional support for specific operations to ensure successful model deployment.
- Test Failures and Disabling: Certain tests in the PyTorch project are disabled due to failures on specific platforms, such as ROCm, indicating underlying compatibility or configuration issues. These failures require investigation and resolution to ensure robust testing across all supported environments.
- pytorch/pytorch/issues/151078, pytorch/pytorch/issues/151081, pytorch/pytorch/issues/151082, pytorch/pytorch/issues/151083, pytorch/pytorch/issues/151084, pytorch/pytorch/issues/151085, pytorch/pytorch/issues/151086, pytorch/pytorch/issues/151087, pytorch/pytorch/issues/151088, pytorch/pytorch/issues/151089, pytorch/pytorch/issues/151090
- Bugs in PyTorch Operations: Various bugs in PyTorch operations, such as incorrect results or runtime errors, are reported across different functions and backends. These issues often require fixes in the underlying implementation to ensure correct and reliable behavior.
- pytorch/pytorch/issues/150674, pytorch/pytorch/issues/150776, pytorch/pytorch/issues/150851, pytorch/pytorch/issues/150853, pytorch/pytorch/issues/151522, pytorch/pytorch/issues/151523, pytorch/pytorch/issues/151589, pytorch/pytorch/issues/151610, pytorch/pytorch/issues/151735, pytorch/pytorch/issues/152205
- Security and Vulnerability Concerns: Security vulnerabilities, such as potential remote code execution, are critical issues that require prompt attention and fixes. These vulnerabilities highlight the importance of secure coding practices and thorough testing to prevent exploitation.
- Dependency and Installation Issues: Conflicts and errors during installation, often due to dependency mismatches or incorrect configurations, can hinder the setup of PyTorch environments. Resolving these issues typically involves updating or aligning package versions to ensure compatibility.
- GPU and Hardware Compatibility: Compatibility issues with specific GPUs or hardware configurations can lead to runtime errors or performance bottlenecks. These issues often require updates to software or drivers to ensure proper support for the latest hardware capabilities.
2.5 Issue Discussion Insights
This section will analyze the tone and sentiment of discussions within this project's open and closed issues that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.
Based on our analysis, there are no instances of toxic discussions in the project's open or closed issues from the past week.
III. Pull Requests
3.1 Open Pull Requests
This section provides a summary of pull requests that were opened in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.
Pull Requests Opened This Week: 192
Key Open Pull Requests
1. Add scripts to check xrefs and urls: This pull request introduces scripts designed to traverse the documentation and code within the PyTorch project to identify and address any broken cross-references and URLs, as evidenced by multiple updates and renaming of scripts such as `check_xrefs.sh` to `lint_xrefs.sh` and `check_urls.sh` to `lint_urls.sh`, along with modifications to the `_docs.yml` file.
- URL: pull/151844
- Merged: No
- Associated Commits: 1d738, dcd6e, 72d59, 9605a, 2a07f, a37df, 150c5, 48c4e, ee06f, 83ed6, 8037b, e963c, 0a041, 90093, a0cbb, 8f72f, 8ad07, 55b02, 4edf5
2. [WIP] Deprecate AcceleratorHooksInterface isPinnedPtr, use at::getHostAllocator()->is_pinned instead: This pull request aims to deprecate the `AcceleratorHooksInterface`'s `isPinnedPtr` method in favor of using `at::getHostAllocator()->is_pinned` within the PyTorch project, as part of a series of changes tracked by the ghstack tool.
- URL: pull/151916
- Merged: No
- Associated Commits: 37774, 87fb0, 90a69, 91261, 3b679, 014ed, 1d73c, 0b19f, 51e74, b3260, 03ba4, 82484, b23d0, da02e, a7c30, da742, 0623b, 7217f, e8058
3. [Graph Partition] Pass all cudagraph tree tests: This pull request addresses the issue of passing all cudagraph tree tests in the PyTorch project by implementing various fixes and updates, such as correcting test input and output orders, enabling certain features by default, and supporting additional functionalities like the ForeachKernelSchedulerNode, as evidenced by multiple commits aimed at refining the graph partitioning process.
- URL: pull/152048
- Merged: No
- Associated Commits: 81a80, ed63b, 05880, f6758, 7193e, ff4ae, 0b311, 75b71, 69415, b93f3, 81a39, 35470, 74624
Other Open Pull Requests
- Metal Performance Shaders (MPS) Support in PyTorch: This pull request enhances the PyTorch library by adding support for Metal Performance Shaders (MPS) to the `at::getHostAllocator` API. It is designed to facilitate writing device-agnostic code, although certain functionalities like `record_event`, `get_stats`, `reset_accumulated_stats`, and `reset_peak_stats` are not yet supported for MPS.
- AOTAutogradCache Enhancements: This pull request addresses an issue with the AOTAutogradCache in PyTorch by saving the `bw_module` in the cache after removing unserializable metadata. It ensures that both the lowered backward and the `bw_module` are cached to support runs with and without compiled autograd, while also differentiating cached and non-cached versions to prevent crashes during AOT compilation with a restored `bw_module`.
- Visualization and Sharding in PyTorch: This pull request adds enhanced support for rich visualization to the `torch.distributed.tensor.debug.visualize_sharding` function in PyTorch. It addresses issue #151857 and includes updates to adapt the functionality for execution on systems with at least four GPUs; a usage sketch appears after this list.
- Export Functionality Enhancements: This pull request aims to enhance the export functionality by supporting the export of hops with function schema arguments. It makes the function schema proxyable to trace auto-functionalized hops and simplifies the implementation using `pytree.register_constant`, with plans to add support for serialization and deserialization (serde) in future updates.
- Windows Arm64 Runners Integration: This pull request aims to integrate new Windows Arm64 runners into the PyTorch project by utilizing pre-installed Visual Studio. It removes unnecessary installations, enables long paths, and updates the action configuration to accommodate these changes.
- ROCm Backend Updates: This pull request aims to reland changes from a previous pull request by removing all "MasqueradingAsCUDA" files and classes from the hipify process in the ROCm backend of the PyTorch project. It also updates the hipify version to 2.0.0, reverts certain changes, and addresses deprecation warnings.
- Unimplemented Function Replacement: This pull request involves replacing the `unimplemented` function with `unimplemented_v2` in the `torch/_dynamo/variables/nn_module.py` file as part of a larger task (#147913). It includes multiple commits with contributions from William Wen, addressing various updates and improvements to the code.
- Group GEMM Template and Epilogue Fusion: This pull request addresses enhancements in the PyTorch project by enabling Group GEMM Template and Epilogue Fusion. It removes redundant buffers post-fusion and supports Linear Silu Mul fusion when concatenated linear operations are enabled, as well as enabling horizontal transverse operations.
- Gather Operation Data Type Support: This pull request aims to enhance the PyTorch library by supporting additional data types for both input and indices in the gather operation. It is part of a series of changes tracked through the ghstack tool and involves multiple contributors and reviewers from the community.
- Unimplemented Function Replacement in Lists: This pull request involves replacing the `unimplemented` function with `unimplemented_v2` in the `torch/_dynamo/variables/lists.py` file as part of issue #147913. It includes multiple commits, some of which are co-authored by William Wen.
- Docstring Linter Exemption: This pull request aims to address issue #151692 by exempting overriding methods from the docstring linter in the PyTorch project. It is part of a series of changes managed through the ghstack tool.
- ConvTranspose*d Padding Option: This pull request introduces the `padding="same"` option to the `ConvTranspose*d` and `conv_transpose*d` functions in the PyTorch library. It ensures compatibility with existing convolution layers and includes updates such as feature implementation, testing, documentation, and handling of padding discrepancies; a hedged sketch of the proposed usage appears after this list.
- Dynamic Shape Error Readability: This pull request introduces the `is_exporting()` function and the `_is_dynamo_exporting()` function to aid in determining when to utilize stack traces for enhancing the readability of dynamic shape errors. It is part of upcoming work in the PyTorch project.
- Hiding Unused Scalar Integer Sizes: This pull request aims to hide unused scalar integer sizes from Dynamo in the PyTorch project. It is part of a stack of related changes and addresses issues #113129 and #146168 while involving multiple contributors.
- Dynamic Annotations on Tensors: This pull request aims to enhance the PyTorch project by adding support for dynamic annotations on tensors within ListVariables and TupleVariables. It addresses a specific issue and works in conjunction with another related pull request to resolve a previously identified problem.
- CUDA Script Unification: This pull request aims to unify the `install_cuda` and `install_cuda_aarch64` scripts by generalizing the `install_cuda` script to handle both standard and aarch64 architectures. It eliminates the need for a separate `install_cuda_aarch64` script and consolidates common code into `install_cuda` and `install_cudnn` functions.
- Dynamo Configuration in HOPify: This pull request aims to integrate a dynamo configuration into the HOPify context managers within the PyTorch project. It is part of a stack of changes and includes multiple updates with a focus on adding tests, as indicated by the detailed comments and the involvement of several contributors.
- Embedding Function Index Handling: This pull request addresses issue #151918 by modifying the `aten.embedding` function to prevent it from wrapping negative indices. It includes several commits that add and improve functionalities related to index handling and embedding operations in the PyTorch project.
- Global Tensor Shape Calculation: This pull request introduces a utility function named `compute_global_tensor_shape` for a 1D device mesh. It calculates the global tensor shape by gathering local tensor shapes from shards and constructing the global shape based on the placement type, currently supporting only "Shard" and "Replicate" placements.
- AC_TRACER Infrastructure Mode: This pull request introduces a new infrastructure mode called AC_TRACER in the PyTorch project. It is designed to trace and replay computational graphs during backward passes by prioritizing it above other infra modes, ensuring efficient recomputation without needing to reenable ambient modes or execute user logic in the same manner.
- TritonTemplate Function Refactor: This pull request refactors the `TritonTemplate.generate` function by moving the code generation logic to a new `generate_and_load` function. It is part of a split from a larger pull request and includes updates to typing.
- SymmMem Module Optimization: This pull request introduces a work-in-progress feature called "all_to_all_vdev" in the SymmMem module. It aims to optimize tensor operations by merging input and output splits into a single tensor, utilizing multi-block processing, and employing nvshmemx_collective_launch for efficient communication.
- Dim.AUTO Warning Mechanism: This pull request addresses issue #151582 by implementing a warning mechanism in the PyTorch library to alert users when dimensions specified with `Dim.AUTO` are likely to specialize to 0 or 1. Changes are made in the `dynamic_shapes.py` file.
- XPU Documentation Update: This pull request aims to update the "get start xpu" documentation by revising links, updating product names, and adding a print statement to display the result of `torch.xpu.is_available()` in the code snippet. It removes references to Windows 10 while ensuring compatibility with both Linux and Windows binaries.
- Graph Partitioning Optimization: This pull request introduces an optimal reordering strategy using a breadth-first search (BFS) approach to minimize the number of partitions in a graph. It schedules nodes based on their indegree and categorizes them into cudagraphable and non-cudagraphable queues, considering peak memory usage for efficient execution.
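For the `visualize_sharding` pull request above, a minimal usage sketch of the existing API, assuming a `torchrun` launch with four GPUs (the tensor and header are illustrative):

```python
import torch
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor import distribute_tensor, Shard
from torch.distributed.tensor.debug import visualize_sharding

# Assumes torchrun --nproc-per-node=4; visualize_sharding prints a layout of
# which rank owns which slice of the DTensor (the PR adds rich output for it).
mesh = init_device_mesh("cuda", (4,))
dt = distribute_tensor(torch.randn(8, 8), mesh, placements=[Shard(0)])
visualize_sharding(dt, header="row-sharded 8x8 DTensor")
```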
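For the `ConvTranspose*d` padding item, note that `padding="same"` is only proposed in that pull request and not part of released PyTorch; a sketch of a stride-1 equivalent that works today:

```python
import torch
import torch.nn as nn

# Proposed in the PR (not yet available):
#   nn.ConvTranspose2d(8, 8, kernel_size=3, padding="same")
# For stride=1 and odd kernels, the same output size is obtained today with
# padding = (kernel_size - 1) // 2:
x = torch.randn(1, 8, 32, 32)
layer = nn.ConvTranspose2d(8, 8, kernel_size=3, padding=1)
assert layer(x).shape == x.shape  # output spatial size matches the input
```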
3.2 Closed Pull Requests
This section provides a summary of pull requests that were closed in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.
Pull Requests Closed This Week: 235
Key Closed Pull Requests
1. Generate test reports for pytest when option is given: This pull request introduces a feature to generate test reports for pytest when a specific option is provided, by appending an argument to enable test report generation, and suggests checking the `TEST_SAVE_XML` environment variable instead of `IS_CI` to conditionally enable test reports, aligning with practices in other parts of the codebase.
- URL: pull/152167
- Merged: No
- Associated Commits: f20a2, 1b267, 7ffa9, 88b05, b0f26, a6e46, 6e7b6, b74be, 02dd0, 56d31, bd77c, 97d97, fc7d4, 704a5, 313ce, cac8d, e4818, 359e1, adf5f, cfc4d, 843e4, 6261d, 414ce, 2673e, f6c1c, 92d0c, 8e5fe, 92bae, 1e1d0, 68f74, 483e6, ed511, 9b74e, c4482, 48761, a40e8, 6b45b, c3a72, 47013, fc2dd, 9c2ac, 8eb21, f7ddc, 2a9af, bf28d, 2eacd, 515a0, 33808, 93740, cea43, 28799, 67c28, 0f861, e2b1c, f37e1, fd04c, d7914, 96800, b7c70, 2fb13, 1a6ef, 191b0, 02cec, 1f0d7, 35201, 25a11, c312d, cd131, 6ea2e, 79a94, efdcc, 01f1c, 4d78e, 99aee, a35e7, b3b16, 80a38, a02ea, 40cf4, a4fda, 14e3f, b7a77, edba2, 529f6, 29811, 6f327, 95abc, 0ff30, e76c0, 4a643, 28388, a09a3, dfdf7, 73d95, ccd00, 3aeeb, 159e2, d778c, c729f, ed0d2, f072b, 45049, 3804a, 5d316, 5fc1e, 06a3c, 264e8, 2c275, 834a0, 4bf09, fa0f1, 98206, fbd29, 337ca, 69ee6, 6cd17, 8ca79, d0d4e, 0bb9b, 7e4b8, 59629, a48cc, 3380a, 2f74c, aaf71, 459c6, bc6c0, 83541, 017a6, e05ac, b8f4d, 3aecf, 2f851, c0b70, 43de9, 6a1b8, a7ccd, f4ac9, c9834, 4f8ad, cd576, aa617, 334aa, 72f71, 49b7f, 015b5, 68a75, 74074, 13339, cd021, 25305, 78bbb, f9bdf, cc793, b37fa, 54f73, b247e, 097fa, 7c977, ee81f, 62b56, 5b9df, 6d28d, b32b0, 21b0e, 73100, dcc32, e31e2, 05114, a5602, 9422e, 2ab75, 34827, 9344d, 5f637, aa285, 3c1a1, 5acc3, 69e41, 99ae7, c1f51, 98c53, 56232, dccb7, bd191, 47ad3, fd3d3, 4d2d8, 81723, f2cfe, 2455d, 4e1d4, f39a1, c91ac, 4ac2e, d703f, 2ee8d, e2cf6, 43f1b, 5de92, fabbc, 05597, 89a85, b2372, 68454, 76cc3, 2a58d, 2102b, a3898, 5e9bd, 2ea86, 78953, 5b368, 5e320, 3278d, 41285, 1d73b, d743a, 3a170, 56e67, 0eb55, 9c1bc, 402d1, 03970, 81c43, ff075, b11c9, b1d05, 6efc5, 24bda, 92f12, bd09d, dccc4, 8a9c6, d78d2, 6ced5, 04133, fc6e3, 2089b, d7049, 75c71, 8313b, 7f28c, 1a6d5, e2c7a, dda0c, a936d, 6120c
2. [rocm6.4_internal_testing] Dockerfile swap: This pull request involves swapping the contents of the CentOS Stream Dockerfile into the main Dockerfile for the ROCm 6.4 internal testing environment, as part of a series of updates and optimizations to support various builds and tests, including CentOS Stream 9 and Ubuntu 24.04, while addressing specific issues and enhancing compatibility with PyTorch and ROCm features.
- URL: pull/151927
- Merged: No
- Associated Commits: 51ce1, b966e, 0e96f, ec70f, e85cf, a4b50, 4ce57, d4fbf, 090d9, 2b0f3, 7d339, d6879, 6d5c3, a6d96, 4a42d, 8c393, d7265, 6f76e, 046a0, fc66f, c8ae4, dfcad, 17c65, e3810, 956c1, 7a198, 74e1e, 0ce9f, 99b07, e24ef, 3d0ad, 59e14, d3c94, 52172, d4d0b, 15a21, 432b2, f7ad5, 46868, 2e486, 8b86e, 057e9, 80b4c, 26585, d5d9d, e9712, 398bd, 896c7, 14c14, 6e971, 0b21f, 22512, fb4d1, b273a, 0e216, a7d21, 9f390, cfb67, e8629, d9668, f24c4, 93895, fddb7, c873a, 9c50b, 79c54, 5f50c, bc969, 8190c, fc899, 402de, ef297, a8d55, e03df
3. [WIP][CUDA][cuBLAS][cuBLASLt] Opt-in unified cuBLAS + cuBLASLt workspaces: This pull request introduces an opt-in feature for unified workspaces between cuBLAS and cuBLASLt in PyTorch, addressing a previously reported 70% forward issue by allowing users to enable the feature with the `TORCH_CUBLASLT_UNIFIED_WORKSPACE=1` environment variable, and includes multiple commits for updates and fixes in the CUDABlas.cpp file and other related components.
- URL: pull/151163
- Merged: No
- Associated Commits: ad7e3, 34e4c, 2afd7, 6af3d, cebef, 2a746, 296e6, e097f, 38f37, cf18f, bcce8, ed4cb, 1c107, 63eea, 400ea, 05ab7, 666bc
Other Closed Pull Requests
- Device Tests in FlexAttention Module: This pull request focuses on fixing the instantiation of device tests in the FlexAttention module of the PyTorch project. It includes a series of updates and commits aimed at addressing the issue effectively.
- Docker Image Testing Workflow: This pull request modifies the workflow for testing binary Docker images by fetching correct tags from AWS ECR and implementing reusable actions. It also addresses access issues for certain architectures and ensures Docker images are rebuilt when relevant scripts change.
- Torch.Event Function Signature and Documentation: Two pull requests address the `torch.Event` function signature and its documentation. One focuses on the discrepancy in the `enable_timing` parameter, while the other enhances documentation by adding detailed signatures and correcting display issues.
- Caching Mechanism for Fake Tensors: Multiple pull requests aim to enhance the PyTorch project by implementing caching mechanisms for fake tensors. These changes are part of a series tracked through the ghstack tool, focusing on scenarios with None, integer, and symbolic integers in the output.
- Dynamic Shapes and Built-in Operations: Two pull requests focus on enhancing dynamic shapes and handling built-in operations in PyTorch. One uses the `bound_sympy` utility for size-oblivious reasoning, while the other reestablishes infrastructure for operations like `min`, `max`, and `math.pow`.
- MPSInductor and GPU Test Enhancements: Two pull requests address issues in MPSInductor and GPU tests. One implements the `atomic_add` store mode, resolving several GPU test issues, while the other enables a specific test by modifying data types.
- Deprecation of Host Allocator Legacy API: This pull request aims to deprecate the host allocator legacy API in favor of a unified API, `getHostAllocator(device_type)`. It streamlines memory allocation processes and improves the user experience by providing a consistent interface.
- Enhancements in Guard Checking and Static Value Functions: Two pull requests focus on enhancing guard checking logic and introducing static value functions. One refactors guard checking logic, while the other introduces `has_static_value` to determine static boolean, float, or integer values.
- Enhancements in Dynamic Sources Allowlist and Testing Utilities: Two pull requests enhance the dynamic sources allowlist and relocate operation modifiers to testing utilities. These changes facilitate capturing allowlist changes over time and reusing operation modifiers in other tests.
- Token::text_view() Method Introduction: This pull request introduces a new method, `Token::text_view()`, which returns a `string_view` instead of a `string`. It aims to avoid potential lifetime issues in existing code and includes multiple updates and rebases.
3.3 Pull Request Discussion Insights
This section will analyze the tone and sentiment of discussions within this project's open and closed pull requests that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.
-
- Toxicity Score: 0.55 (Escalating frustration, defensive responses, unresolved tension.)
- This GitHub conversation involves username1 expressing dissatisfaction with the progress of a pull request, while username2 responds with a defensive tone. The conversation escalates as username1 continues to express frustration, leading to a tense exchange. Username3 attempts to mediate by suggesting a compromise, but the initial tension remains unresolved.
- Update description for `torch.random.fork_rng`
- Toxicity Score: 0.55 (Defensive responses, critical feedback, escalating tension.)
- This GitHub conversation involves username1 proposing an update, with username2 providing feedback that is initially neutral but becomes increasingly critical. Username1 responds defensively, leading to a tense exchange. The tone shifts from collaborative to confrontational, with both parties expressing frustration.
-
- Toxicity Score: 0.55 (Defensive responses, unresolved tension, mediation attempts.)
- This GitHub conversation involves username1 expressing dissatisfaction with username2's approach, which is perceived as ineffective. Username2 responds defensively, leading to a tense exchange. Username3 attempts to mediate, but the conversation remains strained, with underlying frustration evident.
IV. Contributors
4.1 Contributors
Active Contributors:
We consider an active contributor in this project to be any contributor who has made at least 1 commit, opened at least 1 issue, created at least 1 pull request, or made more than 2 comments in the last month.
If there are more than 10 active contributors, the list is truncated to the top 10 based on contribution metrics for better clarity.
| Contributor | Commits | Pull Requests | Issues | Comments |
| --- | --- | --- | --- | --- |
| malfet | 192 | 24 | 6 | 145 |
| FFFrog | 140 | 10 | 0 | 8 |
| anijain2305 | 122 | 17 | 1 | 9 |
| mlazos | 131 | 13 | 0 | 5 |
| pianpwk | 100 | 24 | 2 | 21 |
| swolchok | 108 | 17 | 0 | 16 |
| guangyey | 103 | 8 | 0 | 18 |
| justinchuby | 52 | 4 | 6 | 66 |
| laithsakka | 66 | 19 | 5 | 29 |
| guilhermeleobas | 100 | 13 | 1 | 1 |