Weekly Project News


Weekly GitHub Report for Kubernetes: July 28, 2025 - August 04, 2025 (12:04:04)


Thank you for subscribing to our weekly newsletter! Each week, we deliver a comprehensive summary of your GitHub project's latest activity right to your inbox, including an overview of your project's issues, pull requests, contributors, and commit activity.


Table of Contents

  • I. News
    • 1.1. Recent Version Releases
    • 1.2. Version Information
  • II. Issues
    • 2.1. Top 5 Active Issues
    • 2.2. Top 5 Stale Issues
    • 2.3. Open Issues
    • 2.4. Closed Issues
    • 2.5. Issue Discussion Insights
  • III. Pull Requests
    • 3.1. Open Pull Requests
    • 3.2. Closed Pull Requests
    • 3.3. Pull Request Discussion Insights
  • IV. Contributors
    • 4.1. Contributors

I. News

1.1 Recent Version Releases:

The current version of this repository is v1.32.3.

1.2 Version Information:

Kubernetes v1.32.3 was released on March 11, 2025. Its key updates are detailed in the official CHANGELOG, and additional binary downloads are available. For comprehensive information on new features and changes, refer to the Kubernetes announce forum and the linked CHANGELOG.

II. Issues

2.1 Top 5 Active Issues:

We consider active issues to be issues that have been commented on most frequently within the last week. Bot comments are omitted.

  1. [Flaking Test] UT k8s.io/apiserver/pkg/storage: cacher: This issue reports flaky behavior in the unit test TestWatchStreamSeparation within the k8s.io/apiserver/pkg/storage/cacher package, which has been intermittently failing since late July 2025. The failures involve unexpected etcd bookmark check results and errors related to the embedded etcd server shutting down, and attempts to reproduce the flake locally using stress testing have yielded inconsistent results, complicating efforts to diagnose and fix the problem.

    • The comment discussion includes initial triage and assignment, clarification that the issue is unrelated to other test moves, attempts to reproduce the flake locally with stress tests showing mixed success, notes about the feature being deprecated but not yet removed, and ongoing investigation without a reliable reproduction or fix identified so far.
    • Number of comments this week: 11
  2. Failure TestSchedulerScheduleOne: This issue reports a failing test case named TestSchedulerScheduleOne, specifically the subtest related to a prebind failure with a status code error, where the expected nominated node name "node1" is not set correctly. The failure appears consistently across multiple CI runs and platforms, indicating a persistent problem in the scheduler's handling of pod nomination during the PreBind phase.

    • The discussion includes triage labels being applied, reassignment of priority, and attempts to clarify the failure's cause, with multiple users confirming the test failure across different environments. Suggestions involve investigating delayed informer updates affecting pod nomination and testing a proposed patch to address the issue.
    • Number of comments this week: 9
  3. [Flaking Test] UT k8s.io/kubernetes/pkg/kubelet: podcertificate: This issue reports a flaky unit test, TestFullFlow in the podcertificate package. The test times out because the pod manager does not recognize a workload pod within the expected time, and the flake has been reproduced locally with a stress testing tool.

    • The comments show the issue being assigned and triaged promptly, with a team member taking responsibility and submitting a pull request. There is also a discussion about whether the fix should be included in the upcoming 1.34 milestone, and the testing SIG label was removed from the issue.
    • Number of comments this week: 8
  4. Failure cluster [f87f7691...] serial node jobs failing all over the place: This issue reports a widespread failure cluster affecting multiple serial node jobs in the Kubernetes project, primarily caused by the kubelet crashing due to conflicts in CPU manager checkpoint policies. The failures are linked to errors where the configured CPU management policy differs from the checkpoint state, requiring node draining and deletion of the CPU manager checkpoint file before kubelet restart, and these problems are causing cascading test failures across different test lanes.

    • The comments confirm that the issue is critical and release-blocking, with multiple test failures showing related kubelet configuration errors. Contributors discuss specific failing tests and lanes, note that the problem is consistent with discussions in recent SIG Node meetings, and mention ongoing efforts to reproduce and fix it, including a linked pull request that addresses the problem.
    • Number of comments this week: 8
  5. When there are two resource quotas under the same namespace, the used resource values of the two resource quotas are inconsistent: This issue describes a problem where two resource quotas under the same namespace have inconsistent used resource values, causing deployment creation failures because one quota reaches its limit prematurely while the other does not. The user suspects this inconsistency arises from update conflicts during quota status updates, leading to one quota being updated successfully while the other fails, ultimately blocking further deployments despite available capacity.

    • The comments include requests for additional information, such as the Kubernetes version and reproduction steps, which the reporter provided. The discussion clarifies that the issue may eventually be resolved by the resourcequota controller in the Kubernetes controller manager (KCM) through synchronization and recalculation, which raises the question of whether a temporary inconsistency that is eventually corrected should be classified as a bug. No immediate plans to fix the issue were expressed, and participants emphasized that bug fixes depend on volunteer contributions.
    • Number of comments this week: 6
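Issue 4 above turns on the kubelet refusing to start when the CPU manager policy in its configuration differs from the policy recorded in its checkpoint. The following is a minimal Go sketch of that kind of startup guard, not the kubelet's actual code. It assumes the checkpoint is JSON with a `policyName` field (modeled loosely on the kubelet's CPU manager state file), and `checkPolicy` is a hypothetical name.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// cpuCheckpoint models only the field this sketch needs. A real checkpoint
// carries more state (CPU sets, per-container entries, a checksum).
type cpuCheckpoint struct {
	PolicyName string `json:"policyName"`
}

// checkPolicy is a hypothetical startup guard: it refuses to proceed when the
// configured policy differs from the policy recorded in the checkpoint.
func checkPolicy(configured string, raw []byte) error {
	if len(raw) == 0 {
		return nil // no checkpoint yet: a fresh node can adopt any policy
	}
	var cp cpuCheckpoint
	if err := json.Unmarshal(raw, &cp); err != nil {
		return fmt.Errorf("corrupt checkpoint: %w", err)
	}
	if cp.PolicyName != configured {
		return fmt.Errorf("configured policy %q differs from checkpoint policy %q: drain the node and remove the checkpoint before restarting",
			configured, cp.PolicyName)
	}
	return nil
}

func main() {
	state := []byte(`{"policyName":"none"}`)
	fmt.Println(checkPolicy("none", state))   // <nil>
	fmt.Println(checkPolicy("static", state)) // mismatch error
}
```

Failing fast means the node never runs with two conflicting views of which CPUs are exclusively assigned; the cost, as the issue shows, is that a policy change needs manual cleanup.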

2.2 Top 5 Stale Issues:

We consider stale issues to be issues that have had no activity within the last 30 days. The team should work together to get these issues resolved and closed as soon as possible.

  1. apimachinery resource.Quantity primitive values should be public for recursive hashing: This issue addresses the problem that the primitive values within the API Machinery resource.Quantity struct are private, which prevents recursive hashing libraries from accurately detecting changes in these quantities when hashing custom resource definitions (CRDs). The requester highlights the need for these values to be made public or accessible through a public interface to enable more effective change detection and caching strategies, particularly in use cases like resource allocation and spec drift detection in Kubernetes controllers.
  2. APF borrowing by exempt does not match KEP: This issue highlights a discrepancy between the Kubernetes Enhancement Proposal (KEP) and the actual implementation regarding how the exempt priority level calculates its borrowing from other priority levels in the API Priority and Fairness (APF) system. Specifically, the implementation does not apply the special borrowing rules for exempt priorities described in the KEP, resulting in the exempt priority level having a minimum concurrency limit of zero, which leads to inconsistencies in resource allocation behavior.
  3. apimachinery's unstructured converter panics if the destination struct contains private fields: This issue describes a panic occurring in the apimachinery's DefaultUnstructuredConverter when it attempts to convert an unstructured object into a destination struct that contains private (non-exported) fields. The reporter expects the converter to safely ignore these private fields instead of panicking, as this problem arises particularly with protobuf-generated gRPC structs that include private fields for internal state.
  4. Jsonpath impl does not support left match regex: This issue requests support for the =~ operator in jsonpath filter expressions, enabling regex matching based on Go regular expressions within jsonpath queries. The feature would simplify locating specific resources among many by allowing case-insensitive and pattern-based filtering, and the reporter has offered to contribute an implementation.

Since there were fewer than 5 stale issues, all of the stale issues have been listed above.
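Stale issue 3 asks the converter to skip unexported fields rather than panic. In Go, `reflect` panics when code calls `Set` on a field that is not settable, and unexported fields reached through reflection never are. A minimal sketch of the requested behavior, using a hypothetical `fromMap` helper (not apimachinery's `DefaultUnstructuredConverter`), checks `reflect.StructField.IsExported` (Go 1.17+) before writing each field:

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// Message stands in for a protoc-generated struct: exported data fields
// plus unexported internal state (hypothetical field names).
type Message struct {
	state int    // unexported: must be skipped, never written
	Name  string `json:"name"`
	Count int64  `json:"count"`
}

// fromMap copies values from src into the exported fields of the struct
// that dst points to, matching keys by JSON tag (or field name). Unexported
// fields are skipped instead of causing a panic. To keep the sketch short,
// values must already be assignable to the field type (no coercion).
func fromMap(src map[string]interface{}, dst interface{}) error {
	v := reflect.ValueOf(dst)
	if v.Kind() != reflect.Ptr || v.Elem().Kind() != reflect.Struct {
		return fmt.Errorf("dst must be a pointer to a struct, got %T", dst)
	}
	v = v.Elem()
	t := v.Type()
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		if !f.IsExported() {
			continue // reflect cannot Set unexported fields, so ignore them
		}
		key := strings.Split(f.Tag.Get("json"), ",")[0]
		if key == "" {
			key = f.Name
		}
		raw, ok := src[key]
		if !ok || raw == nil {
			continue
		}
		rv := reflect.ValueOf(raw)
		if !rv.Type().AssignableTo(f.Type) {
			return fmt.Errorf("field %s: cannot assign %s to %s", f.Name, rv.Type(), f.Type)
		}
		v.Field(i).Set(rv)
	}
	return nil
}

func main() {
	var m Message
	err := fromMap(map[string]interface{}{"name": "demo", "count": int64(3), "state": 7}, &m)
	fmt.Println(m.Name, m.Count, m.state, err) // demo 3 0 <nil>
}
```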

2.3 Open Issues

This section lists, groups, and then summarizes issues that were created within the last week in the repository.

Issues Opened This Week: 29

Summarized Issues:

  • Flaky Unit Tests and Race Conditions: Several Kubernetes unit tests intermittently fail due to timing issues, race conditions, or unexpected errors, making them difficult to reproduce reliably even with stress testing. These flakes affect components like podcertificate, apiserver storage cacher, scheduler dynamicresources plugin, and Dynamic Resource Allocation node tests, impacting test stability and developer confidence.
  • issues/133247, issues/133273, issues/133302, issues/133304
  • Kubelet Event Reporting and Duplicate Events: The kubelet currently lacks comprehensive event reporting in some components like PLEG, and it also generates duplicate SystemOOM events due to the oom_watcher_linux reading kernel logs from the start after restarts. These issues cause incomplete or confusing event logs, complicating lifecycle analysis and cluster operator auditing.
  • issues/133258, issues/133260
  • Pod Lifecycle and Secret Sync Delays: After pod creation and node binding, the kubelet experiences authorization delays that prevent immediate syncing of its secret cache, causing temporary access denials and warning events. This results in a 1-2 second delay in pod creation, affecting pod startup latency.
  • issues/133269
  • Resource Quota Inconsistencies and Feature Requests: When multiple resource quotas with identical hard limits exist in the same namespace, concurrent modification errors cause inconsistent usage values and premature quota exhaustion. Additionally, there is a request to add quota-scope filtering by pod status to better account for resource usage in scenarios with many pending pods.
  • issues/133274, issues/133280
  • Scheduler Test Failures and Data Races: The Kubernetes scheduler faces test failures due to plugin errors, race conditions between informer and cache updates, and data races in integration tests involving concurrent memory access. These issues cause scheduling inconsistencies, flaky tests, and runtime panics, reducing scheduler reliability and test robustness.
  • issues/133297, issues/133305, issues/133346
  • Kubelet Crashes and CPU Manager Mismatches: Serial node jobs fail widely because the kubelet crashes when the configured CPU manager policy does not match the checkpointed state, requiring manual intervention like node draining and checkpoint file deletion before restart. This instability disrupts node operations and job execution.
  • issues/133314
  • Documentation and Code Cleanup Requests: There are requests to improve documentation for emulation version testing practices and to clean up kubelet code related to DRA Extended Resources, aiming to enhance code health and clarify testing guidelines without introducing functional changes.
  • issues/133289, issues/133295
  • Security and Resource Exhaustion Vulnerabilities: A resource exhaustion vulnerability exists in the Kubernetes REST client where an attacker with webhook deployment permissions can trigger a gzip bomb response, causing API Server memory exhaustion and repeated crashes, leading to denial-of-service conditions that affect cluster availability.
  • issues/133296
  • Build and CI Pipeline Flakes: The Kubernetes CI pipeline experiences flakes causing crashes during the go build process due to fatal errors related to broken CPU time accounting, impacting continuous integration stability.
  • issues/133284
  • Pod Termination and Lifecycle Hook Issues: Pods deleted forcibly with zero grace period do not immediately terminate containers due to the configured terminationGracePeriodSeconds not being overridden, and PodSecurity policy violations block container lifecycle hook tests in rootless environments. These issues cause unexpected pod lifecycle behavior and test failures.
  • issues/133332, issues/133330
  • Pod Name and Hostname Mismatches in Indexed Jobs: Jobs with .spec.completionMode set to Indexed generate pod names with random suffixes that do not match the pod's hostname, complicating log correlation since hostnames are used in log messages but do not directly map to pod names.
  • issues/133312
  • PersistentVolume Binding Latency: Creating large numbers of static PersistentVolumes and PersistentVolumeClaims in bursts results in high latency and slow binding performance, despite no external calls being needed. This is likely due to controller resync periods and single-worker concurrency limitations, affecting scalability.
  • issues/133352
  • Network Test Failures Due to Kernel Regression: Kubernetes network tests fail when handling large UDP requests over IPv6, likely caused by a kernel regression affecting UDPv6 segmentation and packet processing, leading to test failures in specific test grids and Kubernetes versions.
  • issues/133361
  • PodResources API Test Skips and Debugging Improvements: Multiple PodResources API end-to-end tests are skipped in kubelet serial test suites, prompting investigation and fixes. Additionally, improvements are requested for the end-to-end test debugging experience by updating documentation, removing GCE dependencies, and lowering barriers for local debugging.
  • issues/133326, issues/133328
  • Global Tooling and Dependency Management Updates: The project plans to replace the existing tools.go file with the Go 1.24 tool directive to better manage tool dependencies separately from the main module, ensuring no reversion to Go 1.23 and improving dependency handling.
  • issues/133316
  • Event Broadcaster and Controller Startup Issues: The default_servicecidr_controller logs misleading shutdown messages at startup due to deferred logging before goroutine spawning, and the event broadcaster is shut down prematurely, potentially preventing proper event broadcasting during controller operation.
  • issues/133306
  • Device Resource Allocator Enhancements: There is a request to add Shared Counter attributes in the Device Resource Allocator to differentiate devices and partitions more clearly, which is necessary for improving cluster autoscaler behavior and preventing over-scaling of dynamic resources.
  • issues/133362
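One common way a bug like the zero-grace-period one above can arise in Go (an illustration, not a diagnosis of the actual kubelet code) is treating `0` as the "unset" sentinel, so an explicit zero-second request silently falls back to terminationGracePeriodSeconds. A pointer keeps "unset" and "zero" distinct. Both functions are hypothetical, and the real kubelet applies additional rules that this sketch omits.

```go
package main

import "fmt"

// effectiveGracePeriodBuggy treats 0 as "not set", so an explicit
// zero-second request falls back to the pod spec value.
func effectiveGracePeriodBuggy(override, specSeconds int64) int64 {
	if override != 0 {
		return override
	}
	return specSeconds
}

// effectiveGracePeriod uses a pointer to tell "unset" (nil) apart from an
// explicit zero, so a zero-second request is honored.
func effectiveGracePeriod(override *int64, specSeconds int64) int64 {
	if override != nil {
		return *override
	}
	return specSeconds
}

func main() {
	zero := int64(0)
	fmt.Println(effectiveGracePeriodBuggy(0, 30)) // 30: the override is lost
	fmt.Println(effectiveGracePeriod(&zero, 30))  // 0
	fmt.Println(effectiveGracePeriod(nil, 30))    // 30
}
```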
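For the Indexed Job bullet, the Kubernetes Job documentation describes pod hostnames of the form `<job-name>-<index>`, while pod names carry an extra random suffix. Assuming the layout `<job-name>-<index>-<random-suffix>` and no truncation of long job names, a hypothetical helper can recover the hostname from a pod name for log correlation:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// hostnameFromPodName strips the random suffix from an Indexed Job pod name,
// assuming the "<job>-<index>-<suffix>" layout and no truncation of long job
// names. It returns false when the name does not fit that shape.
func hostnameFromPodName(podName string) (string, bool) {
	cut := strings.LastIndex(podName, "-")
	if cut <= 0 {
		return "", false
	}
	prefix := podName[:cut] // "<job>-<index>"
	idx := strings.LastIndex(prefix, "-")
	if idx <= 0 {
		return "", false
	}
	if _, err := strconv.Atoi(prefix[idx+1:]); err != nil {
		return "", false // the segment before the suffix must be the index
	}
	return prefix, true
}

func main() {
	h, ok := hostnameFromPodName("sample-job-3-x7k2p")
	fmt.Println(h, ok) // sample-job-3 true
}
```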

2.4 Closed Issues

This section lists, groups, and then summarizes issues that were closed within the last week in the repository. This section also links the associated pull requests if applicable.

Issues Closed This Week: 19

Summarized Issues:

  • Test Failures in End-to-End and Integration Suites: Multiple issues report failures and flakiness in Kubernetes end-to-end and integration tests, including problems with probing container timeout overrides, Dynamic Resource Allocation health status, Downward API resource defaults, flaky authentication config reloads, proxy service requests, and container runtime blackbox tests due to expired credentials. These failures cause unexpected container restarts, timeouts, readiness failures, and intermittent test errors, impacting test reliability and requiring configuration adjustments or credential updates.
  • issues/132974, issues/133216, issues/133219, issues/133232, issues/133249, issues/133250, issues/133257, issues/133261, issues/133288
  • Scheduler Integration Test Timeouts and Failures: Several scheduler-related integration tests have been failing or timing out, including preemption hook failures and performance suite timeouts related to Dynamic Resource Allocation and extended resources, causing master-blocking job failures. These issues have persisted since mid-2025, and some have been resolved by splitting tests or fixing the underlying problems.
  • issues/133291, issues/133292, issues/133329
  • Pod Lifecycle and Resource Management Issues: Problems with pod termination and resource management include pods stuck terminating due to missing network namespace paths after node reboot, Pod InPlace Resize feature failures when decreasing memory limits, and eviction test failures caused by unavailable pause container images. These issues lead to pods stuck in termination, non-viable resize operations, and image pull errors, affecting node stability and pod lifecycle management.
  • issues/133081, issues/133343, issues/133348
  • Go Compiler and Protobuf Related Failures: A recent Go language commit introduced a panic in Kubernetes unit tests due to an invalid gzip checksum in protobuf handling, linked to CPU architecture differences in CI environments. Additionally, a compilation failure in the admission limitranger package was caused by invalid assembly instructions from a Go compiler bug, which was later fixed upstream.
  • issues/133224, issues/133351
  • Resource Allocation Optimization: An issue proposes improving resource allocation management by adding a method to fetch allocated resource state of individual pods directly, optimizing retrieval without deep copying the entire pod info map. This aims to enhance efficiency in resource state queries.
  • issues/132975
  • Test Process and Placeholder Reports: A placeholder issue was created to verify the issue submission process, containing repetitive "Testing" text without describing any specific problem or feature request.
  • issues/133318
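The allocation-state proposal in issues/132975 contrasts copying every pod's entry with fetching a single pod directly. A simplified Go sketch of the two access paths, with hypothetical names (`allocationState`, `GetPodsState`, `GetPodState`) rather than the kubelet's real allocation manager API:

```go
package main

import (
	"fmt"
	"sync"
)

type resources struct {
	CPUMilli int64
	MemBytes int64
}

// allocationState is a hypothetical stand-in for a cache of per-pod
// allocated resources: pod UID -> container name -> resources.
type allocationState struct {
	mu   sync.RWMutex
	pods map[string]map[string]resources
}

// GetPodsState copies the whole map: O(all pods) work on every call.
func (s *allocationState) GetPodsState() map[string]map[string]resources {
	s.mu.RLock()
	defer s.mu.RUnlock()
	out := make(map[string]map[string]resources, len(s.pods))
	for uid, ctrs := range s.pods {
		out[uid] = copyContainers(ctrs)
	}
	return out
}

// GetPodState copies only the requested pod's entry.
func (s *allocationState) GetPodState(uid string) (map[string]resources, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	ctrs, ok := s.pods[uid]
	if !ok {
		return nil, false
	}
	return copyContainers(ctrs), true
}

func copyContainers(in map[string]resources) map[string]resources {
	out := make(map[string]resources, len(in))
	for name, r := range in {
		out[name] = r // resources is a plain value type, so this is a full copy
	}
	return out
}

func main() {
	s := &allocationState{pods: map[string]map[string]resources{
		"uid-1": {"app": {CPUMilli: 500, MemBytes: 256 << 20}},
		"uid-2": {"db": {CPUMilli: 1000, MemBytes: 1 << 30}},
	}}
	st, ok := s.GetPodState("uid-1")
	fmt.Println(st["app"].CPUMilli, ok) // 500 true
}
```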

2.5 Issue Discussion Insights

This section will analyze the tone and sentiment of discussions within this project's open and closed issues that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.

Based on our analysis, there are no instances of toxic discussions in the project's open or closed issues from the past week.


III. Pull Requests

3.1 Open Pull Requests

This section provides a summary of pull requests that were opened in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.

Pull Requests Opened This Week: 43

Key Open Pull Requests

1. kubelet: Don't ignore idsPerPod config: This pull request fixes a bug in the Kubernetes kubelet where the userNamespaces.idsPerPod configuration was previously ignored due to the user namespace manager being created before the kubelet configuration was set, by modifying the initialization to pass idsPerPod directly to the user namespace manager and adding related improvements and tests to ensure the configuration is properly honored.

  • URL: pull/133278
  • Merged: No
  • Associated Commits: aa0c2, 31a6f, d4ef3, 873ba

2. Fix conversion-gen handling of unexported fields and custom conversions of pointers: This pull request improves the conversion-gen tool by fixing its handling of unexported fields and enabling custom conversions for pointers, particularly to support protoc-generated types with unexported state fields, as a prerequisite for a related Kubernetes enhancement.

  • URL: pull/133325
  • Merged: No
  • Associated Commits: 808f8, f154d, a4763, 66f7e

3. Pluginmanager: unregister plugin on a service socket file removal: This pull request enhances the plugin manager by adding functionality to unregister plugins when their service socket files are removed, improving cleanup and reliability for setups like CSI and DRA that use separate registration and service sockets.

  • URL: pull/133308
  • Merged: No
  • Associated Commits: 55314, 3c185, ba3d5
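The idsPerPod fix in pull/133278 addresses an initialization-order problem: the user namespace manager was created before the kubelet configuration was set. This Go sketch shows why passing the resolved value into the constructor avoids that; the types are hypothetical, and the 65536 default is an assumption for the demo.

```go
package main

import "fmt"

// Hypothetical, simplified types; the kubelet's real configuration and user
// namespace manager look different.
type kubeletConfig struct {
	IDsPerPod int64
}

type userNSManager struct {
	idsPerPod int64
}

const defaultIDsPerPod = 65536 // assumed default for this sketch

// newManagerEarly reads the config at construction time. If it runs before
// the config is populated, it always falls back to the default.
func newManagerEarly(cfg *kubeletConfig) *userNSManager {
	ids := cfg.IDsPerPod
	if ids == 0 {
		ids = defaultIDsPerPod
	}
	return &userNSManager{idsPerPod: ids}
}

// newManager takes the already-resolved value explicitly, so construction
// order no longer matters.
func newManager(idsPerPod int64) *userNSManager {
	if idsPerPod == 0 {
		idsPerPod = defaultIDsPerPod
	}
	return &userNSManager{idsPerPod: idsPerPod}
}

func main() {
	cfg := &kubeletConfig{}
	early := newManagerEarly(cfg) // created before the config is loaded
	cfg.IDsPerPod = 131072        // config applied afterwards
	fixed := newManager(cfg.IDsPerPod)
	fmt.Println(early.idsPerPod, fixed.idsPerPod) // 65536 131072
}
```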

Other Open Pull Requests

  • Performance and correctness improvements in controller manager: This pull request fixes performance issues related to the growSlice function and corrects a bug in DeepEqual comparisons, improving efficiency and reliability when managing DaemonSets across large clusters.
    • pull/133317
  • Informer and store interface updates: This pull request updates the embedded interface in TransformingStore so that DeltaFIFO and RealFIFO are checked to implement it correctly, improving interface enforcement and the accuracy of tests related to informer memory utilization.
    • pull/133263
  • Protobuf migration and cleanup: This pull request removes all references to the deprecated gogo protobuf implementation from protobindings scripts, cleaning up unused code paths after migrating to native protobuf in Kubernetes. This helps maintain a cleaner and more modern codebase.
    • pull/133271
  • Concurrency and data race fixes in metrics: This pull request fixes a data race in the Histogram.WithContext function by storing context in a wrapper to prevent in-place modification, resolving concurrent access issues during x509 authentication in the metrics package. This improves thread safety in component-base metrics.
    • pull/133307
  • Dependency management improvements: This pull request replaces the use of the tools.go file with the Go tool directive to manage dependencies more cleanly and avoid sharing them across the project. This change modernizes dependency handling in the Kubernetes codebase.
    • pull/133315
  • Work queue retry limit and event handling fixes: These pull requests add a retry limit to the ResourceClaim controller's work queue to prevent infinite requeuing caused by malformed keys and fix event handling by processing events queued before initialization completes. These changes improve error handling and event processing reliability.
    • pull/133246, pull/133248
  • Test stability and debugging enhancements: These pull requests fix flaky test cases related to sidecar containers, enable local debugging of end-to-end tests using Delve without GCE, and address flakiness in Pod Certificates integration tests by increasing wait times. These improvements enhance test reliability and developer debugging experience.
    • pull/133251, pull/133253, pull/133268
  • New testing tool for image pulling: This pull request adds a command to the agnhost tool that starts an HTTP server implementing a limited OCI API subset to serve static images, enabling end-to-end testing of image pulling from registries. This facilitates testing of image retrieval workflows.
    • pull/133272
  • Admission plugin and resource allocation bug fixes: These pull requests fix bugs in the service account admission plugin by waiting for cache sync and remove problematic logic reading AllocatedResources from PodStatus during admission, relying instead on the allocation manager. These changes improve admission correctness and resource allocation handling.
    • pull/133277, pull/133281
  • Test case conflict resolution and linting configuration: These pull requests comment out a conflicting device plugin test case and disable a staticcheck warning related to style preferences, allowing for more flexible testing and linting without enforcing global style rules.
    • pull/133286, pull/133287
  • Documentation update: This pull request updates the template flag description by replacing an outdated and insecure URL with a current and secure one, improving documentation accuracy and security.
    • pull/133301
  • Race condition fix in authorizer response: This pull request delays the authorizer response after context cancellation to prevent race conditions with Go's net/http transport context cancellation handling, addressing a known issue.
    • pull/133310
  • Cronjob controller bug fix: This pull request ensures the cronjob controller correctly checks job owner references during reconciliation, fixing unexpected job event warnings.
    • pull/133313
  • Tracing tests migration to OpenTelemetry: This pull request migrates tracing tests to use the new stable OpenTelemetry HTTP semantic conventions, aligning with updated otelhttp specifications.
    • pull/133319
  • CSR update for FIPS compliance: This pull request updates the etcd CSR stub data from a 1024-bit RSA certificate to a 2048-bit CSR, replacing non-FIPS approved ciphers with FIPS-compliant ones to ensure compatibility.
    • pull/133320
  • Dynamic resources test flakiness fixes: This pull request adds methods to handle objects not in cache, tolerates differences between expected claims and API objects, and waits for informer synchronization to reduce test flakiness.
    • pull/133321
  • Protobuf debugging helper script: This pull request adds a helper script to view differences in protobuf-serialized Kubernetes API objects, aiding debugging and investigation of issues.
    • pull/133322
  • kubectl auth reconcile robustness: This pull request improves kubectl auth reconcile by enabling retries on conflict errors caused by concurrent API object modifications, enhancing command reliability.
    • pull/133323
  • API compatibility update: This pull request replaces deprecated calls to WaitForServiceEndpointsNum with WaitForEndpointCount to maintain compatibility with current API standards.
    • pull/133324
  • KYAML feature gate default enablement: This pull request changes the KYAML feature gate to be enabled by default in kubectl, allowing users to use kubectl get -o kyaml without extra configuration while providing an option to disable it.
    • pull/133327
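One of the ResourceClaim controller changes listed above (pull/133246, pull/133248) caps retries so that a malformed key cannot be requeued forever. The single-threaded Go sketch below shows the idea with a hypothetical `retryQueue`; real controllers would typically use client-go's rate-limited work queue, which exposes `NumRequeues`, `AddRateLimited`, and `Forget` for the same purpose.

```go
package main

import (
	"errors"
	"fmt"
)

// retryQueue is a hypothetical, single-threaded stand-in for a controller
// work queue that tracks how often each key has been requeued.
type retryQueue struct {
	items      []string
	requeues   map[string]int
	maxRetries int
	dropped    []string
}

func newRetryQueue(maxRetries int) *retryQueue {
	return &retryQueue{requeues: map[string]int{}, maxRetries: maxRetries}
}

func (q *retryQueue) add(key string) { q.items = append(q.items, key) }

// process drains the queue, requeueing failures until a key has been retried
// maxRetries times; after that the key is dropped instead of looping forever.
func (q *retryQueue) process(sync func(key string) error) {
	for len(q.items) > 0 {
		key := q.items[0]
		q.items = q.items[1:]
		if err := sync(key); err != nil {
			if q.requeues[key] < q.maxRetries {
				q.requeues[key]++
				q.add(key)
				continue
			}
			q.dropped = append(q.dropped, key)
		}
		delete(q.requeues, key) // forget the key after success or drop
	}
}

var errMalformed = errors.New("malformed key")

func main() {
	q := newRetryQueue(5)
	q.add("default/claim-a")
	q.add("ns/name/extra")
	attempts := map[string]int{}
	q.process(func(key string) error {
		attempts[key]++
		if key != "default/claim-a" {
			return errMalformed
		}
		return nil
	})
	fmt.Println(q.dropped, attempts["ns/name/extra"]) // [ns/name/extra] 6
}
```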

3.2 Closed Pull Requests

This section provides a summary of pull requests that were closed in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.

Pull Requests Closed This Week: 52

Key Closed Pull Requests

1. [PodLevelResources] handle pod-level resource manager alignment: This pull request implements a feature that disables CPU and memory manager alignment and hint generation for pods using pod-level resources by making these managers ignore such pods with a loud log, thereby preventing unsupported exclusive resource allocation and ensuring compatibility with pod-level resource specifications.

  • URL: pull/133279
  • Merged: 2025-07-30T00:28:33Z
  • Associated Commits: 7804b, 56727, 766d0, 15b1a, 4ca47, a3a76

2. POC: Add a thread safe store that requires no read locking: This pull request proposes adding a thread-safe store for the informer cache that eliminates the need for read locking, although it is not ready to be merged as it is currently used unconditionally.

  • URL: pull/133241
  • Merged: No
  • Associated Commits: f9681, 7ee14, 34b6f, 9db71

3. validation: Return an error if user namespaces are used with volumeDevices: This pull request introduces a validation that returns an error when user namespaces (hostUsers: false) are used in combination with volumeDevices in Kubernetes pods, preventing unsupported configurations as specified in KEP 127 for release 1.34, and also updates related test names and error message formatting for consistency.

  • URL: pull/132868
  • Merged: 2025-07-28T17:06:30Z
  • Associated Commits: 2fc22, 56d1c, 94e49
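The POC in pull/133241 proposed an informer store whose reads need no locking. The summary does not say how, so the following shows only one well-known technique for lock-free reads: copy-on-write snapshots published through `sync/atomic.Pointer` (Go 1.19+), using a hypothetical `cowStore` type.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// cowStore keeps an immutable snapshot behind an atomic pointer. Readers load
// the pointer without any lock; writers copy the map, modify the copy, and
// publish it atomically. Writers still serialize among themselves.
type cowStore struct {
	writeMu sync.Mutex
	snap    atomic.Pointer[map[string]string]
}

func newCOWStore() *cowStore {
	s := &cowStore{}
	empty := map[string]string{}
	s.snap.Store(&empty)
	return s
}

// Get takes no lock: a published map is never mutated afterwards.
func (s *cowStore) Get(key string) (string, bool) {
	m := *s.snap.Load()
	v, ok := m[key]
	return v, ok
}

// Set copies the current snapshot, applies the change, and publishes it.
func (s *cowStore) Set(key, value string) {
	s.writeMu.Lock()
	defer s.writeMu.Unlock()
	old := *s.snap.Load()
	next := make(map[string]string, len(old)+1)
	for k, v := range old {
		next[k] = v
	}
	next[key] = value
	s.snap.Store(&next)
}

func main() {
	s := newCOWStore()
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(2)
		go func(i int) { defer wg.Done(); s.Set(fmt.Sprintf("pod-%d", i), "Running") }(i)
		go func() { defer wg.Done(); s.Get("pod-0") }()
	}
	wg.Wait()
	fmt.Println(len(*s.snap.Load())) // 8
}
```

Every write copies the whole map, so this trades O(n) write cost for uncontended reads; whether that pays off for a large informer cache depends on its read/write ratio.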
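pull/132868 rejects pods that combine hostUsers: false with volumeDevices. A minimal Go sketch of such a check over hypothetical mirror types (the real ones live in `k8s.io/api/core/v1`); it inspects only regular containers, and the error wording is illustrative rather than the message Kubernetes emits.

```go
package main

import (
	"errors"
	"fmt"
)

// Minimal, hypothetical mirrors of the pod fields involved.
type volumeDevice struct{ Name, DevicePath string }

type container struct {
	Name          string
	VolumeDevices []volumeDevice
}

type podSpec struct {
	HostUsers  *bool // nil or true: host user namespace; false: user namespaces
	Containers []container
}

// validateUserNSVolumeDevices rejects pods that opt into user namespaces
// (hostUsers: false) while also requesting raw block devices.
func validateUserNSVolumeDevices(spec podSpec) []error {
	if spec.HostUsers == nil || *spec.HostUsers {
		return nil // host user namespace: nothing to check
	}
	var errs []error
	for i, c := range spec.Containers {
		if len(c.VolumeDevices) > 0 {
			errs = append(errs, fmt.Errorf("spec.containers[%d].volumeDevices: forbidden when spec.hostUsers is false", i))
		}
	}
	return errs
}

func main() {
	f := false
	spec := podSpec{HostUsers: &f, Containers: []container{
		{Name: "app", VolumeDevices: []volumeDevice{{Name: "data", DevicePath: "/dev/xvda"}}},
	}}
	fmt.Println(errors.Join(validateUserNSVolumeDevices(spec)...))
}
```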

Other Closed Pull Requests

3.3 Pull Request Discussion Insights

This section will analyze the tone and sentiment of discussions within this project's open and closed pull requests that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.

Based on our analysis, there are no instances of toxic discussions in the project's open or closed pull requests from the past week.


IV. Contributors

4.1 Contributors

Active Contributors:

We consider an active contributor in this project to be any contributor who has made at least 1 commit, opened at least 1 issue, created at least 1 pull request, or made more than 2 comments in the last month.

If there are more than 10 active contributors, the list is truncated to the top 10 based on contribution metrics for better clarity.

Contributor   Commits  Pull Requests  Issues  Comments
BenTheElder         6              5       3        88
ylink-lfs          43             14       5        37
pohly              31              4      13        24
macsko              9              8       1        52
dims                9              2       7        30
tallclair          16              3       4        22
PatrickLaabs       30              2       2        10
liggitt            16              5       0        21
stlaz               6              4       0        26
aojea               4              3       2        26
