Weekly Project News


Weekly GitHub Report for Kubernetes: September 08, 2025 - September 15, 2025

Weekly GitHub Report for Kubernetes

Thank you for subscribing to our weekly newsletter! Each week, we deliver a comprehensive summary of your GitHub project's latest activity right to your inbox, including an overview of your project's issues, pull requests, contributors, and commit activity.


Table of Contents

  • I. News
    • 1.1. Recent Version Releases
    • 1.2. Version Information
  • II. Issues
    • 2.1. Top 5 Active Issues
    • 2.2. Top 5 Stale Issues
    • 2.3. Open Issues
    • 2.4. Closed Issues
    • 2.5. Issue Discussion Insights
  • III. Pull Requests
    • 3.1. Open Pull Requests
    • 3.2. Closed Pull Requests
    • 3.3. Pull Request Discussion Insights
  • IV. Contributors
    • 4.1. Contributors

I. News

1.1 Recent Version Releases:

The current version of this repository is v1.32.3.

1.2 Version Information:

The v1.32.3 patch release, announced on March 11, 2025, delivers the updates and fixes detailed in the official CHANGELOG, with binary downloads available alongside it. The release continues the 1.32 line's ongoing improvements to Kubernetes functionality and stability.

II. Issues

2.1 Top 5 Active Issues:

We consider active issues to be those that have been commented on most frequently within the last week; bot comments are omitted.

  1. use AllocatorPool to reduce memory allocation in protobuf stream list: This issue proposes switching the default memory allocator used in the streaming Protobuf list encoder of kube-apiserver from SimpleAllocator to runtime.AllocatorPool, reducing memory allocation overhead and garbage-collection pressure under high-concurrency workloads. The motivation is to improve performance and prevent out-of-memory conditions by reusing memory buffers during Protobuf serialization, especially for large streaming list requests; a minimal sketch of the buffer-reuse pattern appears after this list.

    • The discussion includes requests for benchmark data to validate the proposed improvement, with the original poster sharing internal test results showing a significant memory reduction using AllocatorPool. Reviewers ask for reproducible tests and express concerns about the lack of observed improvements in some benchmark scenarios, suggesting further profiling and higher query rates to better evaluate the change’s impact before acceptance.
    • Number of comments this week: 10
  2. golangci-lint incorrectly configured deprecatedComment warning: This issue reports a warning from golangci-lint about the "deprecatedComment" check being redundantly enabled during a Kubernetes verification job, which caused confusion despite the job ultimately passing after a retest. The reporter expects the linting job to pass cleanly without such warnings and notes that running make lint locally can reproduce the behavior.

    • The comments clarify that the warning is harmless but should be fixed, with the job failure actually caused by an interrupted process likely due to node upgrades. The issue was triaged and accepted, and ongoing work is addressing related comment formatting problems that the linter does not fully detect.
    • Number of comments this week: 6
  3. Kubelet rejects pod with "NodeAffinity failed" due to stale informer data: This issue describes a problem where the kubelet rejects pods with a "NodeAffinity failed" error due to stale node label data in its informer cache, even though the scheduler has successfully scheduled the pod based on updated labels. The proposed short-term solution involves implementing a layered cache system that falls back to synchronous API server fetches when stale data causes affinity check failures, while long-term considerations include re-evaluating the kubelet’s role in enforcing node affinity and improving synchronization between scheduler and kubelet.

    • The comments discuss the feasibility of combining cache layers into a single cache object with logic to determine the freshest node data, debate the purpose and backward compatibility of the kubelet’s node affinity check, and acknowledge ongoing efforts to allow clients to compare resource versions safely; overall, the conversation accepts the issue as a bug and prioritizes it for long-term resolution.
    • Number of comments this week: 6
  4. Timeout-based termination of Pod "stuck" in terminating state: This issue requests the addition of a Kubernetes-native timeout mechanism to automatically handle Pods stuck in the terminating state, which currently requires manual intervention or external controllers to resolve. The proposal suggests implementing a timeout after which a Pod is either marked as Failed or an OutOfService taint is applied, regardless of the node state, to prevent workload blocking and ecosystem fragmentation caused by inconsistent solutions across projects.

    • The comments include calls for attention from relevant SIG and WG leads, express concerns about the risks of forcefully terminating Pods without ensuring container processes have stopped, and discuss the trade-offs between adding a Pod-level timeout versus handling the issue in higher-level controllers; overall, there is cautious interest in the proposal but also recognition of potential application-specific risks and operational complexities.
    • Number of comments this week: 6
  5. kubectl prints a warning about spec.SessionAffinity being ignored when creating a headless service: This issue reports that when creating a headless service using kubectl version v1.34.1, a warning about spec.SessionAffinity being ignored is always printed, even though session affinity defaults to None and is not explicitly set by the user. The reporter expects no warning to appear in this scenario, as the warning is misleading and was introduced by a recent code change.

    • The comments discuss whether the warning is unnecessary noise or a valid message, concluding it is incorrect because the warning triggers even when no session affinity is defined. A maintainer acknowledges this is a bug caused by the default value of the SessionAffinity field, and commits to fixing it so the warning only appears when SessionAffinity is explicitly set to ClientIP.
    • Number of comments this week: 6
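
The allocator switch described in the first issue follows a common Go pattern: reuse serialization buffers through a pool rather than allocating fresh memory per request. Below is a minimal, self-contained sketch of that pattern; the PooledAllocator type, buffer sizes, and method names are illustrative, not the kube-apiserver or apimachinery implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// PooledAllocator hands out reusable byte buffers. It mirrors the idea
// behind replacing a fresh-allocation strategy with a pooled one:
// buffers survive across serializations, cutting allocations and GC work.
type PooledAllocator struct {
	pool sync.Pool
}

func NewPooledAllocator() *PooledAllocator {
	return &PooledAllocator{pool: sync.Pool{
		New: func() any { b := make([]byte, 0, 4096); return &b },
	}}
}

// Allocate returns a buffer of exactly n bytes, reusing pooled memory
// when the cached capacity suffices.
func (a *PooledAllocator) Allocate(n int) []byte {
	bp := a.pool.Get().(*[]byte)
	if cap(*bp) < n {
		*bp = make([]byte, n)
	}
	return (*bp)[:n]
}

// Release returns a buffer to the pool for the next serialization.
func (a *PooledAllocator) Release(buf []byte) {
	buf = buf[:0]
	a.pool.Put(&buf)
}

func main() {
	a := NewPooledAllocator()
	buf := a.Allocate(1024) // e.g. encode one protobuf list item into buf
	fmt.Println(len(buf), cap(buf))
	a.Release(buf) // the backing array is reused by the next Allocate
}
```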

2.2 Top 5 Stale Issues:

We consider stale issues to be those that have had no activity within the last 30 days. The team should work together to resolve and close these issues as soon as possible.

  1. Zone-aware down scaling behavior: This issue describes a problem with the horizontal pod autoscaler's (HPA) scale-in behavior in a Kubernetes deployment that uses topology spread constraints to evenly distribute pods across multiple zones. Specifically, during scale-in events, the pods become unevenly distributed with one zone having significantly fewer pods than allowed by the maxSkew: 1 setting, causing high CPU usage on the lone pod in that zone and violating the expected balanced pod distribution.
  2. apimachinery's unstructured converter panics if the destination struct contains private fields: This issue describes a panic in apimachinery's DefaultUnstructuredConverter when it converts an unstructured object into a destination struct that contains private (non-exported) fields. The reporter expects the converter to safely ignore these fields instead of panicking, a problem that arises notably with protobuf-generated gRPC structs that keep private fields for internal state; a short reproduction sketch follows this list.
  3. Integration tests for kubelet image credential provider: This issue discusses the need to create integration tests specifically for the kubelet image credential provider, similar to the existing tests for client-go credential plugins. It suggests that since there are already integration tests for pod certificate functionality, adding tests for the kubelet credential plugins would be a logical and beneficial extension.
  4. conversion-gen generates code that leads to panics when fields are accessed after conversion: This issue describes a bug in the conversion-gen tool where it generates incorrect conversion code for structs that have changed field types between API versions, specifically causing unsafe pointer conversions instead of properly calling the conversion functions. As a result, accessing certain fields like ExclusiveMaximum after conversion leads to runtime panics, highlighting the need for conversion-gen to produce safe and correct code to prevent such errors.
  5. Failure cluster [ff7a6495...] TestProgressNotify fails when etcd in k/k upgraded to 3.6.2: This issue describes a failure in the TestProgressNotify test that occurs when the etcd component in the Kubernetes project is upgraded to version 3.6.2. The test times out after 30 seconds waiting on a result channel, with error logs indicating that the embedded etcd server fails to set up serving due to closed network connections and server shutdowns.
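
The converter panic in the second stale issue is straightforward to set up. A minimal sketch, assuming the reported behavior that unexported fields are not tolerated; the Widget type and its grpcState field are illustrative stand-ins for protobuf-generated structs.

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/runtime"
)

// Widget mimics a protobuf-generated struct: exported API fields plus
// an unexported field holding internal state.
type Widget struct {
	Name      string `json:"name"`
	grpcState int    // unexported; illustrative stand-in for generated internals
}

func main() {
	u := map[string]interface{}{"name": "demo"}
	var w Widget
	// Per the report, converting into a struct with unexported fields
	// may panic rather than returning an error or skipping the field.
	err := runtime.DefaultUnstructuredConverter.FromUnstructured(u, &w)
	fmt.Println(w, err)
}
```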

2.3 Open Issues

This section lists, groups, and then summarizes issues that were created within the last week in the repository.

Issues Opened This Week: 26

Summarized Issues:

  • Pod and Controller Update Issues: Several issues highlight problems with Kubernetes controllers and pod updates, including the DaemonSet controller failing to update pods with OnDelete strategy causing timeouts, and proposals to propagate non-controller ownerReferences to pods to improve ownership tracking while managing risks like garbage collection. These reflect challenges in ensuring pod lifecycle and ownership metadata are correctly handled to maintain cluster stability and observability.
  • [issues/133919, issues/133974]
  • Resource Metrics and Reporting Inconsistencies: There are multiple reports of inconsistencies and inaccuracies in resource metrics, such as Ceph RBD CSI volume stats missing after upgrade, stale PodMemory metrics lingering post-termination, and inconsistent ephemeral-storage units between node allocatable and capacity fields. These issues affect monitoring accuracy and can lead to misleading resource usage data in Kubernetes clusters.
  • [issues/133927, issues/133961, issues/134049]
  • Code Modernization and Cleanup: Some issues focus on codebase improvements, including removing the deprecated sets.String in favor of the generic sets.Set[T], cleaning up dead gRPC connection code in the DRA Health Status plugin, and replacing the SimpleAllocator with runtime.AllocatorPool to reduce memory overhead and improve performance. These efforts aim to modernize the code and optimize resource usage under load; a before/after sketch of the sets migration follows this list.
  • [issues/133935, issues/133943, issues/133956]
  • API Server and Client Behavior Problems: Problems with the Kubernetes API server and client interactions include failures verifying signed API server images due to missing signatures, intermittent "not found" errors when updating custom resource status subresources, and startup log flooding caused by incorrect alpha API version checks. These issues impact cluster security, API reliability, and log clarity.
  • [issues/133936, issues/134016, issues/134023]
  • Scheduler and Node Label Synchronization Issues: The kubelet rejecting pods due to stale node label caches causing NodeAffinity failures and scheduler test inconsistencies with feature gates illustrate synchronization and state management challenges between scheduler, kubelet, and feature flag states. These lead to pod admission errors and non-deterministic test outcomes.
  • [issues/133997, issues/134009]
  • Custom Resource and Validation Challenges: Issues with custom resources include incorrect type matching in apimachinery schemes based solely on Kind without Group, and CEL validation cost estimator errors when multiple maxLength constraints are combined, causing CRD rejections. These problems affect CRD correctness and validation reliability.
  • [issues/134001, issues/134029]
  • Kubelet and Node Stability Problems: The kubelet crashing on AWS EKS Windows nodes due to Go panic recovery failures with compiler optimizations, and requests to enhance kubelet sysctl configuration with pattern matching for interface-scoped sysctls, highlight stability and configurability concerns at the node agent level.
  • [issues/134003, issues/134005]
  • Performance Bottlenecks and Optimization Proposals: Lock contention in KMSv2 metric updates causing latency and proposals to move scheduler plugins to staging for easier reuse reflect ongoing efforts to identify and reduce performance bottlenecks and improve modularity in Kubernetes components.
  • [issues/134002, issues/133994]
  • Logging and Observability Enhancements: Proposals to add verbose logging for outbound HTTP requests and rename probe helper files to reduce log noise aim to improve observability and auditing capabilities in Kubernetes networking components.
  • [issues/134025]
  • Pod Termination and Resource Cleanup Improvements: There is a proposal to implement native timeout-based handling of Pods stuck in terminating state to mark them Failed or taint them OutOfService, addressing current manual intervention needs and workload blocking issues in tools like Kueue.
  • [issues/134038]
  • Service Creation Warning Bug: A bug causes misleading warnings about spec.SessionAffinity being ignored when creating headless services with kubectl, despite default values being correct, indicating a problem in warning logic introduced recently.
  • [issues/134040]
  • DRA Resource Management Enhancements: Requests for rollout, upgrade, and rollback planning for DRA extended resources as part of their Beta promotion indicate ongoing efforts to improve lifecycle management of these resources.
  • [issues/134048]
  • PersistentVolume Controller Efficiency: A proposal to improve the pv-controller by directly binding PVs to PVCs when claimRefs are present aims to reduce re-queuing and scheduling delays, enhancing batch scheduling efficiency.
  • [issues/134055]
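
The sets cleanup noted under Code Modernization and Cleanup is largely mechanical. A minimal before/after sketch using apimachinery's sets package:

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/util/sets"
)

func main() {
	// Deprecated style: the string-specific set type.
	old := sets.NewString("kube-apiserver", "kubelet")

	// Generic replacement: sets.Set[T] works for any comparable type.
	cur := sets.New[string]("kube-apiserver", "kubelet")
	cur.Insert("kube-proxy")

	fmt.Println(old.Has("kubelet"), cur.Has("kube-proxy"))
	fmt.Println(sets.List(cur)) // sorted slice of members
}
```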

2.4 Closed Issues

This section lists, groups, and then summarizes issues that were closed within the last week in the repository. This section also links the associated pull requests if applicable.

Issues Closed This Week: 10

Summarized Issues:

  • Kubelet and Pod Termination Issues: The kubelet experiences a deadlock in its gRPC connection to the NVIDIA DRA driver during pod termination, causing pods to hang indefinitely in the Terminating state until the kubelet is restarted. This regression in Kubernetes 1.34 is triggered by synchronous calls inside a gRPC connection stats handler when the connection is idle; a sketch of the non-blocking connection pattern used to avoid this class of deadlock follows this list.
  • issues/133920
  • Kube-Proxy and Network Service Failures: Windows KubeProxy intermittently deletes ClusterIP load balancers in the Host Network Service when internalTrafficPolicy is set to Local, causing unexpected service disruptions. Additionally, inconsistent kube-proxy test behaviors regarding skipping or failing when kube-proxy is absent have led to discussions on unifying test logic for better reliability.
  • issues/133928, issues/133950
  • Flaky and Failing Tests in Kubernetes: Multiple tests exhibit flakiness or high failure rates, including TestKMSv2ProviderKeyIDStaleness with timeouts and isolation issues, pod deletion tests failing due to race conditions and storage compaction, and the TestIsConnectionReset unit test intermittently not returning expected errors. These test failures affect reliability and require mitigation or investigation.
  • issues/133945, issues/133976, issues/133986
  • Resource Exhaustion and Pod Scheduling Failures: Downward API tests intermittently fail because pods with CPU limits close to node capacity are rejected due to resource exhaustion, resulting in "OutOfcpu" errors and preventing pod startup. This causes flakiness in sig-node tests related to pod-level resource management.
  • issues/134013
  • Logging and API Quota Issues: Excessive ERROR-level log noise occurs due to WebSocket streaming upgrade failures caused by a race condition during container lifecycle transitions, though functionality is unaffected due to client fallback. Separately, container image push jobs fail due to exceeding Google Cloud Build API quotas for log streaming, causing timeouts despite successful image builds.
  • issues/134000, issues/134008
  • Annotation Behavior Confusion in Kueue: The kueue.x-k8s.io/retriable-in-group annotation has inverted logic causing pods to be retriable by default instead of requiring explicit opt-in, which is confusing and non-idiomatic. A proposal suggests inverting this logic to align with typical Kubernetes conventions for clearer semantics.
  • issues/134051
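
The DRA deadlock in the first closed issue was reportedly resolved by moving the blocking connection attempt off the caller's path (see the related pull requests in section 3.2). A minimal sketch of that non-blocking pattern, assuming a plain gRPC client; the socket path, timeouts, and function names are illustrative, not the actual kubelet fix.

```go
package main

import (
	"context"
	"log"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

// connectAsync dials a DRA-style plugin socket on a separate goroutine so
// that callers (e.g. a pod-termination path) never block on connection
// setup. Illustrative sketch, not the kubelet implementation.
func connectAsync(target string) <-chan *grpc.ClientConn {
	ch := make(chan *grpc.ClientConn, 1)
	go func() {
		ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
		defer cancel()
		conn, err := grpc.DialContext(ctx, target,
			grpc.WithTransportCredentials(insecure.NewCredentials()),
			grpc.WithBlock())
		if err != nil {
			log.Printf("plugin dial failed: %v", err)
			close(ch)
			return
		}
		ch <- conn
	}()
	return ch
}

func main() {
	connCh := connectAsync("unix:///var/lib/kubelet/plugins/dra.sock")
	select {
	case conn, ok := <-connCh:
		if ok {
			defer conn.Close()
			log.Println("connected:", conn.Target())
		}
	case <-time.After(2 * time.Second):
		// The caller proceeds instead of deadlocking on an idle connection.
		log.Println("proceeding without plugin connection")
	}
}
```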

2.5 Issue Discussion Insights

This section analyzes the tone and sentiment of discussions in this project's open and closed issues from the past week, aiming to identify potentially heated exchanges and to help maintain a constructive project environment.

Based on our analysis, there are no instances of toxic discussions in the project's open or closed issues from the past week.


III. Pull Requests

3.1 Open Pull Requests

This section provides a summary of pull requests that were opened in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.

Pull Requests Opened This Week: 57

Key Open Pull Requests

1. kubelet: Refactor ComputePodQOS by extracting resource-collection logic (no behavior change): This pull request refactors the ComputePodQOS function by extracting the resource-collection logic into smaller, focused helper functions to improve code readability, maintainability, and unit testing without changing any existing behavior.

  • URL: pull/133932
  • Merged: No
  • Associated Commits: 2466b, 63e47, de40e, 1233c, 2dd15, 02c8d, 6d8ad, 56b31, cc04c, 00c26, cc9da, ca786, 9b533

2. scheduler/volumebinding: passive assume cache: This pull request introduces a simpler, passive AssumeCache implementation for the VolumeBinding scheduler plugin. By maintaining an up-to-date assumed state without duplicating informer data or dispatching events, it resolves race conditions and improves scheduling efficiency, deliberately diverging from the DynamicResource plugin's more complex AssumeCache usage.

  • URL: pull/133929
  • Merged: No
  • Associated Commits: dce23, eaf87, c385a, ed194, 5a708, 4b0ef, bbee7, e39ed

3. kubelet: refactor DRA plugin health client initialization: This pull request refactors the initialization of the DRA plugin health client in the kubelet to fix a bug where reusing an existing gRPC connection could leave the health client nil and cause panics. It ensures the health client is always created when a connection is available, renames the connection function, and removes unnecessary nil checks; a small sketch of the reuse hazard and its fix follows these key pull requests.

  • URL: pull/133964
  • Merged: No
  • Associated Commits: 42e91, 8f81e, 69190, 76d61, be263, 583ff, 522ad
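
The bug behind the third pull request is a classic reuse hazard: a cached connection skips the code path that constructs dependent clients. A minimal sketch of the fix's shape, with hypothetical types and names rather than the actual kubelet code:

```go
package main

import (
	"fmt"
	"sync"
)

type conn struct{ target string }
type healthClient struct{ c *conn }

// pluginManager caches one connection per plugin. The original bug:
// when a cached conn was reused, the health client was never created,
// leaving it nil and panicking later. The fix constructs the health
// client whenever a connection is available, cached or fresh.
type pluginManager struct {
	mu     sync.Mutex
	conns  map[string]*conn
	health map[string]*healthClient
}

func (m *pluginManager) getConn(target string) *conn {
	if c, ok := m.conns[target]; ok {
		return c // reused connection: must not skip client setup below
	}
	c := &conn{target: target}
	m.conns[target] = c
	return c
}

func (m *pluginManager) HealthClient(target string) *healthClient {
	m.mu.Lock()
	defer m.mu.Unlock()
	c := m.getConn(target)
	// Always create the client from whatever connection we have,
	// instead of only on the fresh-connection path.
	if m.health[target] == nil {
		m.health[target] = &healthClient{c: c}
	}
	return m.health[target]
}

func main() {
	m := &pluginManager{conns: map[string]*conn{}, health: map[string]*healthClient{}}
	fmt.Println(m.HealthClient("dra.sock") != nil) // non-nil on first call
	fmt.Println(m.HealthClient("dra.sock") != nil) // and on the reuse path
}
```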

Other Open Pull Requests

3.2 Closed Pull Requests

This section provides a summary of pull requests that were closed in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.

Pull Requests Closed This Week: 45

Key Closed Pull Requests

1. refactor(validation-gen): Refactor Union and Item validation: This pull request refactors validation generation for unions and items: it improves the UnionMembership API for better type safety, clarifies generated code comments, consistently uses JSON field names in validation paths, updates context propagation to simplify union validation logic, and performs general cleanup and renaming to improve the readability and maintainability of the validation framework.

  • URL: pull/133973
  • Merged: Yes
  • Associated Commits: fd3fc, 9e71a, f51d5, 348d5, 8130c, 3e2e2, 023c0, 1417e, 1f61a, 77c1a, 3f068, 26283, c047b, 8f679, 2d71a

2. feat(validation-gen): refactor ratcheting and add +k8s:unique: This pull request refactors the ratcheting emission logic in the validation generator to perform checks before calling type validation functions, and introduces the +k8s:unique tag to support uniqueness validation on atomic lists; a hedged sketch of where such a tag sits on an API type appears after these key pull requests.

  • URL: pull/133982
  • Merged: Yes
  • Associated Commits: b6fdb, 0d5e3, 3e15d, 6fb02, c4d8c, a5b29, 62662, bbdd2

3. update prometheus' client_golang and common packages: This pull request updates the Prometheus client_golang and common packages to specific versions to resolve compatibility breaks and remove unnecessary dependencies, coordinating with Prometheus maintainers to restore previous functionality and prevent users from pulling in problematic package versions.

  • URL: pull/133921
  • Merged: Yes
  • Associated Commits: bdfca, e2e7f
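
The +k8s:unique tag from the second pull request extends the declarative tags that validation-gen reads from Go doc comments. A hedged sketch of where such a tag might sit on an API type; the type, field, and companion tags here are illustrative, and the PR itself is the authority on the exact tag grammar.

```go
package v1

// WidgetSpec shows where a validation-gen tag lives: in the doc comment
// of the field it constrains. Illustrative type, not a real API.
type WidgetSpec struct {
	// Ports must not contain duplicate entries; the generated validation
	// enforces uniqueness on this atomic list instead of hand-written checks.
	// +k8s:unique
	// +listType=atomic
	Ports []int32 `json:"ports,omitempty"`
}
```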

Other Closed Pull Requests

  • Scheduler performance and test improvements: Multiple pull requests enhance scheduler performance testing and integration reliability by refining workload definitions, adding sanity checks, and improving benchmark metrics to focus on workload execution periods. These changes reduce noise in scheduling benchmarks and ensure pods are properly scheduled during tests, improving overall test accuracy and stability.
    [pull/133941, pull/133960, pull/133939, pull/133981, pull/133983]
  • Scheduler cache and API call refactoring: Updates to the scheduler include refactoring the AssumeCache to read from both local and informer caches to fix scheduling delays, and proposing the migration of asynchronous API call types to a staging repository to facilitate out-of-tree plugin development. These changes improve scheduler responsiveness and modularity without affecting user-facing features.
    [pull/133924, pull/133965]
  • Kubelet logging and stability fixes: Several pull requests migrate kubelet components to contextual logging and fix a deadlock in the gRPC connection to a DRA driver by moving connection attempts to a separate goroutine. These efforts improve logging clarity and prevent blocking in kubelet operations; a minimal contextual-logging sketch follows this list.
    [pull/133926, pull/133930, pull/133933, pull/133957]
  • Kube-proxy and network policy fixes: Fixes address kube-proxy issues where the ClusterIP load balancer would disappear unnecessarily under certain InternalTrafficPolicy settings by ensuring consistent endpoint hash checks. This restores expected network behavior and test reliability.
    [pull/133949, pull/133971]
  • Configuration and validation enhancements: Pull requests add validation for kubelet serving certificates and introduce UUID format validation tags for API fields, improving security and data integrity. Additionally, a special internal marker is introduced to handle certificate data clearing in kubeconfig overrides, maintaining backward compatibility.
    [pull/133918, pull/133947, pull/133948]
  • Test data and changelog maintenance: Updates include adding new API version test data while removing outdated versions, cleaning up duplicate changelog entries, and reverting flaky test retries to maintain test suite stability and accuracy.
    [pull/133966, pull/133958, pull/133979]
  • Miscellaneous fixes and cleanups: Other changes include restoring a kube-proxy test label for easier test skipping, merging feature gates and system-reserved items to fix failing tests, and cleaning up duplicate log entries to improve clarity. A proposed but unmerged configuration for a specific CI workflow was also noted.
    [pull/133955, pull/133971, pull/134046]
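
The contextual-logging migrations mentioned in the kubelet bullet above follow the pattern from the Kubernetes structured-logging effort: retrieve a logger from the context rather than calling global klog functions, so values attached by callers propagate automatically. A minimal sketch; the function and log message are illustrative.

```go
package main

import (
	"context"

	"k8s.io/klog/v2"
)

// syncPod illustrates the contextual logging style: the logger travels
// in ctx, so every callee inherits names/values attached by its caller.
func syncPod(ctx context.Context, podName string) {
	logger := klog.FromContext(ctx)
	logger.V(2).Info("Syncing pod", "pod", podName)
}

func main() {
	// Attach a value once; all downstream log lines carry it.
	logger := klog.Background().WithValues("component", "kubelet")
	ctx := klog.NewContext(context.Background(), logger)
	syncPod(ctx, "nginx-0")
	klog.Flush()
}
```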

3.3 Pull Request Discussion Insights

This section analyzes the tone and sentiment of discussions in this project's open and closed pull requests from the past week, aiming to identify potentially heated exchanges and to help maintain a constructive project environment.

Based on our analysis, there are no instances of toxic discussions in the project's open or closed pull requests from the past week.


IV. Contributors

4.1 Contributors

Active Contributors:

We consider an active contributor in this project to be any contributor who has made at least 1 commit, opened at least 1 issue, created at least 1 pull request, or made more than 2 comments in the last month.

If there are more than 10 active contributors, the list is truncated to the top 10 based on contribution metrics for better clarity.

Contributor | Commits | Pull Requests | Issues | Comments
pohly | 35 | 13 | 7 | 55
BenTheElder | 20 | 1 | 1 | 75
liggitt | 5 | 2 | 1 | 62
serathius | 14 | 7 | 1 | 36
dims | 9 | 2 | 7 | 35
huww98 | 18 | 5 | 0 | 20
pacoxu | 6 | 5 | 1 | 30
thockin | 22 | 0 | 0 | 13
jpbetz | 16 | 1 | 1 | 16
nikos445 | 14 | 1 | 1 | 15
