Weekly GitHub Report for Tensorflow: March 31, 2025 - April 07, 2025 (12:10:19)
Thank you for subscribing to our weekly newsletter! Each week, we deliver a comprehensive summary of your GitHub project's latest activity right to your inbox, including an overview of your project's issues, pull requests, contributors, and commit activity.
I. News
1.1 Recent Version Releases:
The current version of this repository is v2.19.0
1.2 Version Information:
The TensorFlow 2.19.0 release, created on March 5, 2025, introduces breaking changes to the LiteRT C++ and Python APIs, including the transition of tf.lite.Interpreter to a new location with a deprecation warning. Other notable changes include support for bfloat16 in the tfl.Cast operation and the discontinuation of publishing libtensorflow packages, which can still be accessed via PyPI.
II. Issues
2.1 Top 5 Active Issues:
We consider active issues to be issues that have been commented on most frequently within the last week. Bot comments are omitted.
- TFLite Compilation help with External delegates: This issue involves a user seeking support for compiling TensorFlow Lite with external delegates on a Raspberry Pi 5 running Bookworm. The user encounters undefined reference errors when attempting to use specific functions related to loading external delegates and parsing settings, and they are requesting guidance on resolving these compilation issues.
  - The comments suggest that the linker errors are due to the external delegate symbols not being compiled, and the user is advised to enable them with a specific flag. The user responds that the flag is enabled by default in their setup, and another user is asked for further assistance.
  - Number of comments this week: 3
- TensorFlow with CUDA: RTX 5xxx series isn't supported (CUDA_ERROR_INVALID_HANDLE): This issue is about a user experiencing compatibility problems with TensorFlow when using a new NVIDIA RTX 5070 Ti GPU, resulting in a CUDA_ERROR_INVALID_HANDLE error. The user is seeking advice on when a fix might be available or if there are any workarounds to resolve the issue, as they are unable to run their code with the new GPU.
  - The user initially tried to build a custom TensorFlow version without success but found a partial solution through NVIDIA's documentation. They expressed frustration with the lack of updates and considered switching to PyTorch due to ongoing issues with running their code, especially with NVIDIA Docker, which complicates their workflow and prevents them from deserializing custom layers.
  - Number of comments this week: 2
- Build issues with local cuda installation: Target 'cuda_runtime' not declared in package 'cuda' defined in local_config_cuda/cuda/BUILD: This issue involves a build failure when compiling TensorFlow version 2.18 with a local CUDA installation on Ubuntu 22.04 using Bazel and Clang within an Nvidia Docker environment. The error indicates that the target 'cuda_runtime' is not declared in the package 'cuda', which is causing the build process to abort.
  - The comments discuss whether 'cudart' should be used instead of 'cuda_runtime' and suggest that the issue might be due to incompatible versions of the installed software. A recommendation is made to update the installations according to the official TensorFlow documentation to resolve the issue.
  - Number of comments this week: 2
- NaN loss on multi-GPU MirroredStrategy since tf 2.16: This issue describes a bug in TensorFlow version 2.16 where models that train successfully on a single A100 GPU encounter NaN loss when using multiple GPUs with the MirroredStrategy, a problem not present in version 2.15. The issue affects multiple users and machines, hindering the ability to scale models effectively, and persists despite attempts to mitigate it by clipping gradients, lowering the learning rate, and ensuring no empty batches are created.
  - The comments discuss potential causes such as changes in NCCL communication, mixed precision scaling, and weight synchronization in TensorFlow 2.16, and suggest workarounds like reverting to TensorFlow 2.15, using explicit FP32 mode, testing alternative reduction strategies, and enabling NCCL debugging.
  - Number of comments this week: 2
- TensorFlow issue with data generator used for training a Keras LSTM autoencoder: This issue involves a bug encountered while training a Keras LSTM autoencoder using TensorFlow 2.18.0 on a Linux Ubuntu 22.04.5 LTS platform. The user reports that the training process fails with an error message indicating "None values not supported," despite attempts to minimize batch size and data dimensions, and even when using random data generation to reproduce the error.
  - A commenter acknowledged the issue and replicated it using TensorFlow 2.19.0 and the nightly version, suggesting that the problem might be related to Keras and recommending that the issue be posted on the Keras repository for further assistance.
  - Number of comments this week: 1
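For the NaN-loss issue listed above, the workarounds suggested in the comments (explicit FP32 mode, an alternative reduction strategy, and NCCL debugging) could be tried roughly as follows. This is a minimal sketch under stated assumptions, not a confirmed fix: the helper name build_debug_strategy is hypothetical, and HierarchicalCopyAllReduce is an assumed choice of alternative reduction, since the thread summary does not name one.

```python
import os

# NCCL reads NCCL_DEBUG when it initializes, so set it before TensorFlow
# creates any collective ops.
os.environ["NCCL_DEBUG"] = "INFO"


def build_debug_strategy():
    """Hypothetical helper: build a MirroredStrategy set up for debugging.

    Returns None when TensorFlow is not installed, so the sketch stays importable.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None

    # Explicit FP32 mode: keep both compute and variables in float32.
    tf.keras.mixed_precision.set_global_policy("float32")

    # Assumption: swap the default all-reduce for a non-NCCL reduction to
    # check whether the NaNs follow the communication backend.
    return tf.distribute.MirroredStrategy(
        cross_device_ops=tf.distribute.HierarchicalCopyAllReduce()
    )
```

Calling build_debug_strategy() and building the model inside strategy.scope() would then exercise the same training code with the alternative settings; if the NaN loss disappears, that narrows the search to the reduction path or mixed precision.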
2.2 Top 5 Stale Issues:
We consider stale issues to be issues that have had no activity within the last 30 days. The team should work together to get these issues resolved and closed as soon as possible.
- TF-TRT Warning: Could not find TensorRT: This issue involves a user experiencing difficulties with TensorFlow on an Ubuntu 22.04 system, specifically encountering a warning that TensorRT could not be found despite multiple installation attempts. The user suspects the problem may be related to driver compatibility, as they are using an NVIDIA RTX 3050 TI GPU with a 535 driver instead of the automatically installed 550 driver, and they are seeking assistance to resolve this issue to focus on their machine learning coursework.
- SystemError in tf.ensure_shape and tf.compat.v1.ensure_shape when dtype of shape is tf.uint64 and its value is too large: This issue involves a bug in TensorFlow where using tf.ensure_shape or tf.compat.v1.ensure_shape with a shape of dtype tf.uint64 and a value close to 2^64 results in a SystemError and OverflowError. The problem is reproducible in TensorFlow version 2.15 on a Linux Ubuntu 20.04 system, and it occurs when the specified shape value is excessively large, as demonstrated by the example shape = tf.constant([18446743219011059112, 1], dtype=tf.uint64).
- [DOCS] Missing complex input for Round op: This issue pertains to a documentation bug in TensorFlow's Round operation, where the official documentation inaccurately suggests that a complex tensor can be directly used as input for the operation. The user reports that attempting to use a complex tensor results in an error, and they must separately apply the Round operation to the real and imaginary parts of the tensor to achieve the expected outcome.
- tf.raw_ops.Unbatch aborts with "Check failed: d < dims()": This issue reports a bug in TensorFlow version 2.17 where the tf.raw_ops.Unbatch operation aborts unexpectedly with an error message indicating a failed check on tensor dimensions. The problem occurs on a Linux Ubuntu 20.04.3 LTS system using Python 3.11.8, and the error can be reproduced with TensorFlow Nightly, suggesting a persistent issue in the codebase.
Since there were fewer than 5 open issues, all of the open issues have been listed above.
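The summary does not state the root cause of the ensure_shape failure, but the arithmetic behind "a value close to 2^64" can be checked in plain Python. The sketch below only shows that the example dimension fits in an unsigned 64-bit integer while exceeding the signed 64-bit maximum. That this signed limit is what triggers the OverflowError is an assumption, not something the issue summary confirms.

```python
# Example dimension taken from the shape tensor quoted in the issue.
dim = 18446743219011059112

UINT64_MAX = 2**64 - 1
INT64_MAX = 2**63 - 1

fits_uint64 = 0 <= dim <= UINT64_MAX  # representable as an unsigned 64-bit value
fits_int64 = dim <= INT64_MAX         # representable as a signed 64-bit value?

# Distance between the example value and 2^64.
gap_to_2_64 = 2**64 - dim

print(fits_uint64, fits_int64, gap_to_2_64)
```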
2.3 Open Issues
This section lists, groups, and then summarizes issues that were created within the last week in the repository.
Issues Opened This Week: 15
Summarized Issues:
- TensorFlow GPU Compatibility Issues: Users are experiencing problems with TensorFlow not supporting new NVIDIA RTX 5xxx series GPUs, leading to CUDA errors. This lack of support is causing frustration and consideration of alternative frameworks like PyTorch.
- TensorFlow Build and Compilation Errors: Multiple issues have been reported regarding build failures and compilation errors in TensorFlow across different environments and configurations. These include problems with local CUDA installations, Bazel build errors, and compatibility issues with specific compilers and operating systems.
- TensorFlow Model Training Bugs: There are several bugs affecting model training in TensorFlow, including issues with LSTM autoencoders and RNNs. These bugs result in errors such as "None values not supported" and "IndexError: tuple index out of range," impacting the training process.
- TensorFlow Environment and Integration Issues: Users are encountering problems when integrating TensorFlow with other tools and environments, such as Streamlit and VS Code. These issues include TypeErrors and import problems, which may be related to specific configurations or extensions.
- TensorFlow Lite Compilation and Integration Challenges: Compiling TensorFlow Lite on specific hardware, like Raspberry Pi, and integrating it into projects using CMake or WebAssembly presents challenges. Users face linker errors and seek guidance on building TFLite as a static library for specific use cases.
- TensorFlow Syntax and Documentation Errors: Syntax errors and misleading documentation in TensorFlow projects are causing build problems and user confusion. These issues highlight the need for accurate documentation and error-free code to prevent build failures and incorrect results.
- TensorFlow Multi-GPU Training Issues: A bug in TensorFlow version 2.16 affects models trained on multiple GPUs, leading to infinite and NaN loss. This issue is suspected to be related to changes in gradient aggregation and weight synchronization, which were not present in earlier versions.
2.4 Closed Issues
This section lists, groups, and then summarizes issues that were closed within the last week in the repository. This section also links the associated pull requests if applicable.
Issues Closed This Week: 6
Summarized Issues:
- Build Issues with TensorFlow on Various Platforms: Users have encountered multiple issues while building TensorFlow on different platforms. One issue involves a TypeError when building TensorFlow 2.19.0 from source on Ubuntu 20.04 due to a potential Clang version mismatch. Another issue involves build errors when attempting to compile TensorFlow Lite for WebAssembly using Emscripten and CMake, particularly with the XNNPACK library.
- Bugs in TensorFlow API and Runtime: Several bugs have been reported in TensorFlow's API and runtime. One bug in TensorFlow 2.18.0 involves the tf.keras.layers.InputLayer API not raising a ValueError as expected when given None as the input shape. Another issue is an ImportError on Windows 11 with TensorFlow 2.8 due to a DLL load failure, although TensorFlow 2.8 is no longer supported.
- Feature Request for TensorFlow Configuration: A feature request was made to enhance the TensorFlow configuration script to accept empty environment variables without user prompts. This change would facilitate automated script execution, and it was noted that a trivial code patch could implement this, although the issue was resolved by a specific commit.
- Unspecified Bug in TensorFlow on Windows 11: A bug was reported in a TensorFlow project on Windows 11 with TensorFlow version 2 and Python 3.13.2. However, the issue lacks detailed information or code to reproduce the problem, as noted in the comments.
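The configuration feature request above turns on the difference between an environment variable that is unset and one that is set to an empty string. A minimal sketch of that distinction follows. The names resolve_setting, fail_prompt, and EXAMPLE_TF_OPTION are hypothetical and are not taken from TensorFlow's configuration script.

```python
import os


def resolve_setting(name, prompt_fn, environ=os.environ):
    """Return the value for `name`, prompting only when it is truly unset.

    An empty string counts as a deliberate answer, so automated runs that
    export `NAME=` never block on a prompt.
    """
    value = environ.get(name)  # None means unset; "" means set but empty
    if value is None:
        return prompt_fn(f"Please specify {name}: ")
    return value


def fail_prompt(question):
    # Stand-in for an interactive prompt; raising makes accidental prompts visible.
    raise RuntimeError(f"unexpected prompt: {question}")


# The variable is set but empty, so no prompt is issued.
print(repr(resolve_setting("EXAMPLE_TF_OPTION", fail_prompt, {"EXAMPLE_TF_OPTION": ""})))
```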
2.5 Issue Discussion Insights
This section will analyze the tone and sentiment of discussions within this project's open and closed issues that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.
Based on our analysis, there are no instances of toxic discussions in the project's open or closed issues from the past week.
III. Pull Requests
3.1 Open Pull Requests
This section provides a summary of pull requests that were opened in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.
Pull Requests Opened This Week: 9
Key Open Pull Requests
1. Fix compilation error due to overloads of cub::ThreadLoadVolatilePointer: This pull request addresses a compilation error in the TensorFlow project by applying a fix to the gpu_prim.h file, specifically targeting the overloads of cub::ThreadLoadVolatilePointer, which previously caused failures when compiling sparse_grad_op_gpu.cu.cc with the clang compiler.
   - URL: pull/90494
   - Merged: No
2. Remove ambiguous inherited constructor in default_quant_params.cc: This pull request removes an ambiguous inherited constructor in the default_quant_params.cc file that was causing complaints from the GCC compiler. The fix is described as trivial and harmless, references a Stack Overflow discussion, and is linked to issue #84977.
   - URL: pull/90558
   - Merged: No
3. feat: add datatype support for add, ceil, mul, range, sign, sub: This pull request introduces support for additional data types, including bf16, f16, i8, i16, and i32, across various TensorFlow Lite operations such as add, ceil, mul, range, sign, and sub, along with corresponding unit tests and non-quantized int8 and int16 type support, while also addressing a CONV error with the EIGEN_TFLITE flag.
   - URL: pull/90351
   - Merged: No
   - Associated Commits: e0d36
Other Open Pull Requests
- oneDNN Library Upgrade: This topic covers the upgrade of the oneDNN library from version 3.5 to 3.7, skipping version 3.6.2 due to known issues. The upgrade addresses several bug fixes and ensures compatibility across various platforms such as cascade-lake, sapphire-rapids, and granite-rapids.
- MLIR TOSA Legalization and Test Fixes: This topic includes the legalization of the LOG operator for int8 and int16 data types from TensorFlow Lite to TOSA within the MLIR framework. It also addresses and resolves issues with failing lit tests in the MLIR TOSA component caused by recent updates to the TOSA LLVM.
- Build Failures and Compiler Issues: This topic addresses build failures in TensorFlow with newer CUDA versions and NVCC with Clang. The fixes involve adapting a solution from XLA and correcting the misidentification of the compiler in gpu_device_functions.h.
- Documentation Typographical Errors: This topic involves the correction of typographical errors in the documentation strings of the TensorFlow project. The corrections were identified and fixed by a contributor and are currently open for review and merging.
3.2 Closed Pull Requests
This section provides a summary of pull requests that were closed in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.
Pull Requests Closed This Week: 5
Key Closed Pull Requests
1. Fix 03 broken links in object_detection.md: This pull request addresses the issue of three broken documentation links in the file object_detection.md within the TensorFlow GitHub project by updating them to functional links, and it was successfully merged on April 3, 2025.
   - URL: pull/88803
   - Merged: 2025-04-03T02:50:34Z
   - Associated Commits: 91bf4
2. Update 07 broken links in text_classification.md: This pull request addresses the issue of seven broken documentation links in the file text_classification.md within the TensorFlow project by updating them to new functional LiteRT webpage links, and it was successfully merged on April 3, 2025.
   - URL: pull/89671
   - Merged: 2025-04-03T21:21:50Z
   - Associated Commits: 7dc87
3. [TOSA] Fix legalizing CONV bias: This pull request unifies and simplifies the logic for handling bias in various CONV operations within the TOSA framework by consolidating it into a single function. It leverages the TOSA 1.0 specification, which allows bias to be of shape [1], and updates the tests to reflect these changes.
   - URL: pull/90118
   - Merged: 2025-04-01T09:56:36Z
   - Associated Commits: 988a1
Other Closed Pull Requests
- Documentation Link Updates: This topic involves updating broken documentation links to ensure they point to the correct resources. The pull request specifically addresses four broken links in the gpu_native.md file, updating them to new LiteRT functional webpage links to maintain the accuracy and accessibility of the documentation.
- Python 3 Compatibility: This topic focuses on enhancing compatibility with Python 3 by modifying code that is outdated. The pull request replaces the raw_input function with input in the configure.py file, ensuring the codebase is up-to-date with Python 3 standards.
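In Python 3 the built-in raw_input no longer exists, and input returns the typed line as a string, which is why the replacement described above is needed. A minimal sketch of a Python 3 prompt helper follows. The function ask and its reader parameter are hypothetical and do not reproduce the code in configure.py.

```python
def ask(question, reader=input):
    """Prompt once and return the stripped answer, or "" at end of input."""
    try:
        return reader(question).strip()
    except EOFError:
        # Non-interactive runs (closed stdin) fall back to an empty answer.
        return ""


# Usage with a stand-in reader, so the example runs without a terminal.
print(ask("Enable feature? [y/N]: ", reader=lambda q: " y \n"))
```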
3.3 Pull Request Discussion Insights
This section will analyze the tone and sentiment of discussions within this project's open and closed pull requests that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.
Based on our analysis, there are no instances of toxic discussions in the project's open or closed pull requests from the past week.
IV. Contributors
4.1 Contributors
Active Contributors:
We consider an active contributor in this project to be any contributor who has made at least 1 commit, opened at least 1 issue, created at least 1 pull request, or made more than 2 comments in the last month.
If there are more than 10 active contributors, the list is truncated to the top 10 based on contribution metrics for better clarity.
Contributor | Commits | Pull Requests | Issues | Comments |
---|---|---|---|---|
Venkat6871 | 5 | 2 | 0 | 33 |
mihaimaruseac | 4 | 0 | 0 | 26 |
chunhsue | 13 | 2 | 0 | 4 |
maludwig | 2 | 0 | 1 | 16 |
weilhuan-quic | 4 | 0 | 0 | 14 |
sjh0849 | 0 | 0 | 9 | 8 |
jiunkaiy | 3 | 2 | 0 | 6 |
default1360 | 0 | 0 | 9 | 2 |
kossyrev-bg | 0 | 0 | 1 | 10 |
gaikwadrahul8 | 2 | 2 | 0 | 6 |
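
The activity rule stated above the table can be written as a small predicate. This is a sketch of the rule as worded in this report. The function name and the row layout are assumptions, and the ranking metric used to truncate the list to 10 entries is not specified, so it is not modeled.

```python
def is_active(commits, pull_requests, issues, comments):
    """At least 1 commit, issue, or pull request, or more than 2 comments."""
    return commits >= 1 or issues >= 1 or pull_requests >= 1 or comments > 2


# Rows copied from the table: (name, commits, pull requests, issues, comments).
rows = [
    ("Venkat6871", 5, 2, 0, 33),
    ("sjh0849", 0, 0, 9, 8),
    ("default1360", 0, 0, 9, 2),
]

active = [name for name, *counts in rows if is_active(*counts)]
print(active)
```

Note that default1360 qualifies through opened issues alone: 2 comments would not meet the "more than 2 comments" threshold on its own.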