Weekly Project News


Weekly GitHub Report for Kubernetes: April 06, 2026 - April 13, 2026 (19:29:48)

Weekly GitHub Report for Kubernetes

Thank you for subscribing to our weekly newsletter! Each week, we deliver a comprehensive summary of your GitHub project's latest activity right to your inbox, including an overview of your project's issues, pull requests, contributors, and commit activity.


Table of Contents

  • I. News
    • 1.1. Recent Version Releases
    • 1.2. Other Noteworthy Updates
  • II. Issues
    • 2.1. Top 5 Active Issues
    • 2.2. Top 5 Stale Issues
    • 2.3. Open Issues
    • 2.4. Closed Issues
    • 2.5. Issue Discussion Insights
  • III. Pull Requests
    • 3.1. Open Pull Requests
    • 3.2. Closed Pull Requests
    • 3.3. Pull Request Discussion Insights
  • IV. Contributors
    • 4.1. Contributors

I. News

1.1 Recent Version Releases:

The current version of this repository is v1.32.3.

1.2 Other Noteworthy Updates:

The Kubernetes 1.32 release, announced on March 11, 2025, introduces key features and improvements detailed in the official CHANGELOG, with binaries available as additional downloads. For comprehensive information, users are encouraged to review the full changelog and related announcements.

II. Issues

2.1 Top 5 Active Issues:

We consider active issues to be those that have been commented on most frequently within the last week. Bot comments are omitted.

  1. [SIG/SCHEDULING] [KIND/FEATURE] [NEEDS-TRIAGE] Scale max pod backoff time with the number of unscheduled pods in the cluster: This issue proposes enhancing the kube-scheduler by dynamically scaling the maximum pod backoff time based on the number of unscheduled pods and scheduling throughput, rather than relying on a static configuration. The goal is to prevent starvation of lower priority pods in large clusters where the current fixed backoff time is insufficient, and to potentially incorporate pod scheduling complexity into the backoff calculation for more nuanced control.

    • The comments discuss various approaches including logarithmic scaling of backoff time, introducing complexity multipliers based on pod constraints, and the implications of changing default behaviors or adding new API fields. There is debate over whether backoff time should be dynamic or static, concerns about packed cluster scenarios, and suggestions for separate backoff handling for pod groups, with consensus leaning towards careful API design and acknowledging that some cluster scheduling challenges may require solutions beyond backoff adjustments.
    • Number of comments this week: 17
  2. [SIG/SCHEDULING] [KIND/FEATURE] [TRIAGE/ACCEPTED] WAS: PodGroupCycleState and PlacementCycleState: This issue proposes adding two new states, PodGroupCycleState and PlacementCycleState, to optimize performance by allowing plugins to store and retrieve data more efficiently during scheduling cycles, thereby reducing duplicated calculations. It also discusses adopting these states across plugins to facilitate data sharing, particularly using PlacementCycleState to pass computed data from GeneratePlacement to other extension points.

    • The comments indicate acceptance of the feature request and coordination among contributors, with an initial pull request for the plumbing completed and readiness to adopt the new states across plugins once the foundational work is approved, emphasizing collaboration to minimize duplicated calculations.
    • Number of comments this week: 9
  3. [KIND/BUG] [SIG/APPS] [NEEDS-TRIAGE] kube-controller-manager: the maximum backoff 1000s was triggered incorrectly during DaemonSet upgrade: This issue describes a problem in the kube-controller-manager where the maximum backoff time of 1000 seconds is triggered incorrectly during a DaemonSet upgrade when pod creation fails due to an unavailable webhook. The DaemonSet controller retries pod creation multiple times rapidly, causing an exponential backoff that delays recovery even after the webhook becomes available again.

    • The comments discuss the behavior of the delaying queue and ratelimiter, noting that rapid consecutive failures cause the backoff to jump to the maximum delay immediately; participants acknowledge this explanation and express intent to investigate and improve the handling of this backoff behavior.
    • Number of comments this week: 6
  4. [SIG/API-MACHINERY] [SIG/ARCHITECTURE] [AREA/CODE-ORGANIZATION] [NEEDS-TRIAGE] [AREA/CODE-ORGANIZATION/FUTURE-DEPENDENCIES] sig-arch-code-organization#unit-master-dependencies is failing!: This issue reports a failure in the sig-arch-code-organization#unit-master-dependencies CI job caused by recent dependency updates, specifically changes in the cel-go library that altered error messages related to context cancellation. The problem arises because the updated cel-go version now returns a different error string ("context canceled" instead of "operation interrupted"), causing existing tests that expect the old error message to fail.

    • The comments identify the root cause as an intentional change in the cel-go library’s ContextEval method to improve error reporting, discuss the implications for Kubernetes tests, and consider possible solutions including updating Kubernetes code to handle both error messages or patching cel-go for backward compatibility.
    • Number of comments this week: 6
  5. [SIG/SCHEDULING] [KIND/FEATURE] [TRIAGE/ACCEPTED] WAS: Retry binding API calls on pod group binding cycle: This issue proposes adding retry logic with backoff to the pod group binding API calls during the scheduling cycle to improve reliability, especially for large workloads where binding failures can cause expensive rescheduling and stuck jobs. It discusses different retry strategies and highlights that while retries won't completely solve the problem, they can reduce the likelihood of failures caused by transient network or API server issues, ultimately enhancing scheduler resilience.

    • The comments show agreement on the need for retries similar to those used for pod status patching, discuss the historical reasons for not retrying binding calls individually, and consider the increased importance of retries due to the higher cost of scheduling large pod groups; a newcomer also expressed interest in contributing to the issue.
    • Number of comments this week: 5
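The backoff mechanics behind items 1 and 3 can be sketched numerically. The sketch below uses Python purely for illustration (the real controller code is Go) and assumes client-go's default per-item rate limiter parameters of a 5 ms base delay and a 1000 s cap; the `scaled_max_backoff` helper is a hypothetical stand-in for the logarithmic scaling floated in item 1's discussion, not anything merged upstream.

```python
import math

BASE_DELAY = 0.005   # seconds; client-go ItemExponentialFailureRateLimiter default
MAX_DELAY = 1000.0   # seconds; default cap, the value hit in the DaemonSet report

def backoff_delay(failures: int) -> float:
    """Delay before the next retry after `failures` consecutive failures."""
    return min(BASE_DELAY * (2 ** failures), MAX_DELAY)

def failures_to_reach_cap() -> int:
    """How many rapid consecutive failures drive the delay to the cap."""
    n = 0
    while backoff_delay(n) < MAX_DELAY:
        n += 1
    return n

def scaled_max_backoff(unscheduled_pods: int, base: float = 10.0) -> float:
    """Hypothetical log-scaled max backoff, as discussed in item 1."""
    return base * math.log2(2 + unscheduled_pods)
```

With these defaults, eighteen rapid consecutive failures are enough to pin the per-item delay at the 1000 s cap, matching the behavior described in the DaemonSet report: recovery stays slow even after the webhook comes back.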

2.2 Top 5 Stale Issues:

We consider stale issues to be issues that have had no activity within the last 30 days. The team should work together to resolve and close these issues as soon as possible.

As of our latest update, there are no stale issues for the project this week.

2.3 Open Issues

This section lists, groups, and then summarizes issues that were created within the last week in the repository.

Issues Opened This Week: 24

Summarized Issues:

  • Scheduling Improvements and Optimizations: Several issues focus on enhancing the Kubernetes scheduler's efficiency and reliability. These include proposals for dynamic scaling of pod backoff times to prevent starvation, adding retry logic with backoff to pod binding API calls to reduce rescheduling, optimizing inter-pod affinity and anti-affinity to reduce CPU usage, and introducing new cycle states to reduce duplicated calculations and enable data sharing during scheduling cycles.
  • issues/138249, issues/138255, issues/138270, issues/138314
  • Pod Scheduling and Admission Failures: Multiple issues describe pod scheduling and admission problems causing pod failures or restarts. These include pods failing with UnexpectedAdmissionError after kubelet restarts when using device plugins, unschedulable pods being incorrectly added to node nominators causing inefficiencies, and a race condition in the scheduler causing infinite growth of inFlightEvents due to pod UID reuse.
  • issues/138251, issues/138267, issues/138316
  • Flaking and Failing Tests: There are reports of flaky or failing tests affecting Kubernetes stability and reliability. These include a flaking test in the pod autoscaler controller due to zero reconciliation duration, a failing topology manager test caused by CPU Manager policy mismatches, and a scheduler preemption test failure where lower priority pods fail to schedule as expected.
  • issues/138252, issues/138263, issues/138268
  • API Server and Metrics Enhancements: Issues address improvements to API server metrics and feature gate handling. Proposals include unifying apiserver storage metrics for etcd3 and watch cache with a common naming scheme, and deferring metric initialization until feature gates are applied to avoid silent failures and ordering issues.
  • issues/138264, issues/138320
  • Backoff and Retry Logic in Controllers: Problems with backoff and retry mechanisms in controllers are highlighted. One issue describes a DaemonSet upgrade triggering an excessive maximum backoff due to webhook unavailability, causing delayed retries and improper enqueue handling.
  • issues/138280
  • OpenAPI and Storage Version Compatibility: Issues discuss backward incompatibility and storage version management. One issue covers the removal of rest friendly name conversion for OpenAPI model names causing invalid schemas without a specific annotation, while others address removing storage version overrides and delays in promoting beta versions to storage versions.
  • issues/138247, issues/138283, issues/138292
  • Scheduler Identity and Scaling: A proposal introduces a kube-scheduler identity mechanism using Lease objects to enable active-active scheduler scaling, health monitoring, and advanced coordination by establishing unique scheduler instance identities within clusters.
  • issues/138310
  • Security Concerns with ClusterRole Permissions: A security issue is raised regarding the default edit ClusterRole allowing impersonation of namespace-scoped ServiceAccounts, which can lead to privilege escalation when privileged ServiceAccounts exist. The issue requests improved documentation, hardening guidance, and reconsideration of this impersonation capability.
  • issues/138315
  • Build and CI Failures: Several issues report build and continuous integration failures. These include build errors caused by an outdated git version in the containerized build environment, cluster bring-up script failures likely due to environment setup problems, and CI job failures caused by changes in the cel-go dependency altering error messages.
  • issues/138325, issues/138326, issues/138334
  • Enhancements to Client Caching and Pod Security: Proposals include adding a pluggable storage codec interface to client-go's ThreadSafeStore for memory-efficient caching, and adding the image volume type to the Restricted Pod Security Standard allowlist to enable OCI image volumes without relaxing security policies.
  • issues/138342, issues/138343
  • Admission Policy Enhancements: A request is made to add a high-precision, millisecond-level request.time or now variable in the CEL environment for Mutating Admission Policies to enable accurate performance tracking and autoscaling for latency-sensitive workloads.
  • issues/138276
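Several of the summaries above (the pod-group binding retries and scheduler resilience items) center on retrying transient API failures with backoff. The following is a minimal, hypothetical Python sketch of that pattern, not Kubernetes code; the function name, parameters, and the use of ConnectionError as the "transient" signal are all illustrative.

```python
import random
import time

def call_with_retries(call, max_attempts=4, base_delay=0.1, sleep=time.sleep):
    """Invoke `call`, retrying transient failures with jittered exponential backoff.

    As the issue discussion notes, retries only reduce the chance that a
    transient API-server blip fails the whole binding; they cannot mask a
    sustained outage.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the failure to the caller
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay))  # jitter spreads retries out
```

Injecting `sleep` keeps the schedule testable, and the jitter prevents many failed bindings from retrying against the API server in lockstep.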

2.4 Closed Issues

This section lists, groups, and then summarizes issues that were closed within the last week in the repository. This section also links the associated pull requests if applicable.

Issues Closed This Week: 7

Summarized Issues:

  • Feature Gate Naming Issues: The feature gate name EnableWorkloadWithJob is considered unhelpful because it includes the word "Enable," which is discouraged since feature gates are inherently enabled or disabled. There is a question about whether this naming can be changed before the v1.36 release.
  • issues/138204
  • Test Failures and Flakiness: Multiple test failures are reported, including flaky failures in the TestStorageVersionMigrationWithCRD due to context deadline exceeded errors, and verify job failures on the master branch specifically in the verify.openapi-spec test blocking a required pull request. These issues affect test reliability and release processes, with some resolved by subsequent pull requests.
  • issues/138218, issues/138282
  • Controller and Rollout Stability Problems: The Daemonset Controller can select unhealthy or stuck nodes during rollouts, causing the rollout process to become stuck and requiring manual intervention to adjust the maxUnavailable setting. This issue impacts the stability and automation of rollout procedures.
  • issues/138240
  • Image Replacement and Air-Gapped Cluster Failures: DRA end-to-end tests fail in air-gapped Kubernetes clusters when test images are mirrored to a local registry due to the image replacement process failing from missing image tags in the csi-driver-hostpath YAML manifest. This causes problems in testing and deployment in restricted network environments.
  • issues/138317
  • Security Vulnerabilities in CSI Drivers and SDKs: There are critical security vulnerabilities including a path traversal attack in the Kubernetes CSI Driver for SMB caused by insufficient validation of the subDir parameter, and a high-severity PATH hijacking risk in the OpenTelemetry SDK due to executing the BSD kenv command without an absolute path. Both require urgent mitigation to prevent potential exploitation.
  • issues/138319, issues/138329
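The SMB subDir vulnerability above is a classic path traversal. As a generic illustration only (not the driver's actual fix, and with made-up function and parameter names), this Python sketch shows the kind of normalization check that blocks it:

```python
import posixpath

def resolve_subdir(base: str, sub_dir: str) -> str:
    """Join a user-supplied sub_dir onto base, rejecting path traversal.

    Normalizes the joined path and requires the result to stay inside base,
    so inputs like '../other' or an absolute path are refused.
    """
    candidate = posixpath.normpath(posixpath.join(base, sub_dir))
    if candidate != base and not candidate.startswith(base.rstrip("/") + "/"):
        raise ValueError(f"subDir escapes base volume path: {sub_dir!r}")
    return candidate
```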

2.5 Issue Discussion Insights

This section analyzes the tone and sentiment of discussions within this project's open and closed issues from the past week, aiming to identify potentially heated exchanges and to help maintain a constructive project environment.

Based on our analysis, there are no instances of toxic discussions in the project's open or closed issues from the past week.


III. Pull Requests

3.1 Open Pull Requests

This section provides a summary of pull requests that were opened in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.

Pull Requests Opened This Week: 59

Key Open Pull Requests

1. POC - clean up spurious inline json tag marker: This pull request aims to clean up and fix the detection of spurious inline JSON tag markers across various components such as kube-openapi, gengo, structured-merge-diff, and apply-configuration, including updating dependencies and regenerating API documentation to ensure consistent and correct handling of JSON tags in the Kubernetes codebase.

  • URL: pull/138260
  • Associated Commits: ec903, d245f, 5201b, 2e5d6, 02d65, 82818, e9868, 7b85d, 38e63, 7f4da

2. RFC: ktesting dependencies: This pull request restructures the ktesting module to remove dependencies on other staging repositories like client-go and component-base/logs, enabling ktesting to be published as a standalone staging repo suitable for testing across Kubernetes projects by moving client-go code into a separate submodule, replacing testify with Gomega for assertions, handling logging initialization internally, and introducing a new API for passing clients via context values.

  • URL: pull/138258
  • Associated Commits: 53119, 2cabb, 3669d, 8a994, f9226, bb9a0, 897af, a6f60, 8a8b5

3. Add API definition tests for standard strategy behaviors: This pull request adds API definition tests to ensure that standard strategy behaviors in Kubernetes APIs follow best practices, highlights existing exceptions in API behavior for further review, and organizes these tests using a new helper to improve consistency and coverage.

  • URL: pull/138254
  • Associated Commits: 643ec, f8a4f, cdcbd, eead2, d11e0, a56f0, 566e5

Other Open Pull Requests

3.2 Closed Pull Requests

This section provides a summary of pull requests that were closed in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.

Pull Requests Closed This Week: 32

Key Closed Pull Requests

1. [release-1.33] Bump to go 1.25.8: This pull request updates the Kubernetes release-1.33 branch to use Go version 1.25.8, including necessary compatibility changes from a related upstream pull request, thereby ensuring that Kubernetes 1.33 is built with Go 1.25.

  • URL: pull/138151
  • Associated Commits: 8489a, a4cc2

2. Rename feature gate EnableWorkloadWithJob to WorkloadWithJob: This pull request renames the feature gate from EnableWorkloadWithJob to WorkloadWithJob to remove redundancy and improve clarity in naming conventions, as feature gates are inherently enable/disable switches.

  • URL: pull/138210
  • Associated Commits: b9b0f, bfe8f

3. Automated cherry pick of #133624: Fix flaking RunTestDelayedWatchDelivery: This pull request is an automated cherry pick of a previous fix addressing the flakiness of the RunTestDelayedWatchDelivery test, applied to the release-1.34 branch to improve test stability.

  • URL: pull/137997
  • Associated Commits: 0b87e

Other Closed Pull Requests

  • Test Flakiness Fixes: Multiple pull requests focus on improving test stability by deflaking various tests such as RunTestDelayedWatchDelivery, TestPodSubresourceAuth, CSI mock storage-capacity exhausted late-binding, and device-plugin-failures. These fixes include waiting for effective permissions, ensuring test node readiness, and addressing connection refused errors to enhance reliability.
    • pull/137998, pull/138137, pull/138243, pull/138288
  • Device Plugin and Kubelet Improvements: Several pull requests address device plugin and kubelet issues, including fixing admission failures after container restarts, updating CDI device specs for compatibility, and optimizing topology hint computation to reduce kubelet stalls. These changes improve device management and resource allocation efficiency.
    • pull/138027, pull/138136, pull/138244
  • Go Version Updates: Multiple pull requests update the Go language version used in the Kubernetes project, including bumps to 1.25.8, 1.25.9, 1.26.2, and related container image updates. These updates ensure the project benefits from the latest security patches and improvements from the Go team.
    • pull/138150, pull/138261, pull/138290, pull/138303
  • OpenAPI Specification and Storage Version Pinning: Several pull requests update the OpenAPI spec for version v1.36.0-rc.0 and pin the storage version of MutatingAdmissionPolicy and MutatingAdmissionPolicyBinding to v1beta1. These changes prevent premature defaulting to v1 storage versions and fix verification issues blocking release processes.
    • pull/138278, pull/138279, pull/138284, pull/138290
  • API Server Metrics and Logging Improvements: Pull requests introduce new apiserver_storage_get_* metrics to better track GET request costs and reduce metric pollution, and downgrade log levels for unserved GroupVersionResources to reduce log noise. These changes enhance observability and reduce unnecessary error logging.
    • pull/138197, pull/138300
  • Deployment and Rollout Bug Fixes: A pull request fixes a bug in the Recreate deployment strategy by modifying pod deletion handling to prevent rollout stalls caused by lingering terminated pods. This includes adding a shared helper function to identify terminal pods consistently.
    • pull/138231
  • Test Suite Organization: One pull request proposes creating a dedicated directory for Windows node-level end-to-end tests to isolate them from Linux tests, ensuring clean separation and platform-specific build gating without affecting existing Linux tests.
    • pull/138234
  • Release Process Automation: A pull request adds publishing bot rules for the release-1.36 staging repositories to automate and manage the release process more effectively.
    • pull/138277
  • Miscellaneous Updates and Experiments: Other pull requests include reducing chaos workers in SVM tests to prevent timeouts, enabling load balancing for the watch feature, removing unnecessary struct tags for consistency, and testing GPU-related tests using AWS Lambda functions.
    • pull/138281, pull/138203, pull/138257, pull/138299
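The Recreate-strategy fix mentioned above hinges on a shared notion of a "terminal" pod. This is a hypothetical Python rendering of such a helper (the real one is Go); field names follow the Kubernetes API, but the code is an illustrative sketch, not the merged implementation.

```python
def is_pod_terminal(pod: dict) -> bool:
    """True if the pod has permanently finished running.

    A pod is terminal when status.phase is Succeeded or Failed, so the
    deployment controller can ignore it when deciding whether all old
    pods are gone before creating new ones.
    """
    return pod.get("status", {}).get("phase") in ("Succeeded", "Failed")
```

Centralizing this check is the point of the fix: every caller classifies lingering terminated pods the same way, so a Failed pod can no longer stall a Recreate rollout.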

3.3 Pull Request Discussion Insights

This section analyzes the tone and sentiment of discussions within this project's open and closed pull requests from the past week, aiming to identify potentially heated exchanges and to help maintain a constructive project environment.

Based on our analysis, there are no instances of toxic discussions in the project's open or closed pull requests from the past week.


IV. Contributors

4.1 Contributors

Active Contributors:

We consider an active contributor in this project to be any contributor who has made at least 1 commit, opened at least 1 issue, created at least 1 pull request, or made more than 2 comments in the last month.

If there are more than 10 active contributors, the list is truncated to the top 10 based on contribution metrics for better clarity.

Contributor      Commits  Pull Requests  Issues  Comments
liggitt               16              4       0        31
Jefftree              26              4       2         3
jpbetz                18              5       0        12
dims                   9              4       1        20
pohly                 17              2       0        11
isumitsolanki          9              5       0         9
pacoxu                11              6       3         1
macsko                 1              0       1        19
fanzhangio             2              1       0        18
Lidang-Jiang          18              2       0         0

Access Last Week's Newsletter:

  • Link