Manually Deploying Kubernetes on Ubuntu 16.04 (1): Master and Node Deployment


Previously, in "Kubernetes First Experience" we took Kubernetes for a quick spin with Minikube, and in "A Brief Summary of Kubernetes Architecture and Resource Relationships" we introduced the Kubernetes framework along with some key terms and concepts, also called resources or objects. This article covers a more primitive way to deploy Kubernetes. Since its early days, deploying Kubernetes has become progressively simpler. Three approaches are common:

  • The simplest is Minikube. Download a single binary and you have a single-machine Kubernetes, on any platform.
  • Install from source. This also only requires some configuration, after which running kube-up.sh deploys a Kubernetes cluster. See the official document "Manually Deploying Kubernetes on Ubuntu Nodes". PS: at the moment this document has problems deploying Kubernetes 1.5.3; see #39224.
  • Deploy with kubeadm. See the official document "Installing Kubernetes on Linux with kubeadm".

Beyond these three approaches, some Linux distributions ship Kubernetes packages; on CentOS 7, for example, yum install -y etcd kubernetes installs Kubernetes, and a bit of configuration completes the deployment. I expect that a company like Google, with its pursuit of automation, will make Kubernetes deployment simpler still. But none of this is the point of this article. This article is about deploying Kubernetes like building blocks, one module at a time. Why do it this way?

To understand and learn Kubernetes better. We introduced the Kubernetes architecture earlier: it consists of a handful of modules that cooperate to form a cluster. The streamlined deployment methods hide so many details that we barely notice the individual modules, and it's easy to assume the internals of a Kubernetes deployment are terribly complicated. In fact they are not: a Kubernetes cluster is made up of just a few binaries, and deploying a basic cluster is no great feat. Because they are written in Go, these binaries have no dependencies; copy them from anywhere and they run. This article shows how to build a Kubernetes cluster from these binaries, to deepen our understanding of Kubernetes. Every other deployment method is really just a wrapper around this one.

Systemd is gradually replacing Upstart, and some deployment methods only support Systemd-based Linux distributions; Upstart systems need extra adaptation. What Systemd and Upstart are is not the topic of this article; I'll summarize that separately later. The distribution I use here is Ubuntu 16.04. Ubuntu 15.04 and later all use Systemd, so those should work too, and other Systemd-based systems should also be fine with minor changes. Also, as covered before, a Kubernetes cluster is split into Master and Node, so the deployment splits the same way: Master deployment and Node deployment.

My environment is two Ubuntu 16.04 VMs in VirtualBox, connected via NAT plus host-only networking: NAT for internet access, host-only for traffic between the two VMs, with IPs 192.168.56.101 and 192.168.56.102. The machine at 101 is both Master and Node; 102 is a Node. This article only sets up 101; 102 comes later, when testing networking and other features that need multiple machines. Since a Node registers itself with the Master (via the kubelet on the Node), adding more Nodes is easy.

Getting the Binaries

Which binaries do we need? Recall from "A Brief Summary of Kubernetes Architecture and Resource Relationships" that a Kubernetes cluster mainly comprises these modules. On the Master: APIServer, scheduler, controller manager, etcd. On the Node: kubelet, kube-proxy, and the runtime (Docker here).

Each module above is implemented as a single binary, so we need the binary for each of them. There are many ways to get them. The most direct is to download a release package from GitHub, which contains the binaries. But that package is over 1GB, which is especially painful for users in China; fortunately there are plenty of other ways.

Note: the source tree also contains many files with the same names as the binaries, but those are not binaries; they are shell scripts with the extension stripped, only KBs in size, whereas the real binaries are MBs. Don't mix them up.

I recommend downloading the kubernetes-server-linux-amd64.tar.gz package with the command below (set KUBE_VERSION to the release you want first, e.g. a 1.5-era version such as v1.5.1):

curl -L https://storage.googleapis.com/kubernetes-release/release/v${KUBE_VERSION}/kubernetes-server-linux-amd64.tar.gz -o kubernetes-server-linux-amd64.tar.gz

After unpacking, the kubernetes/server/bin directory contains the binaries we need (we only use 6 of them):

ubuntu➜  bin ll
total 1.3G
-rwxr-x--- 1 root root 145M Dec 14 09:06 hyperkube
-rwxr-x--- 1 root root 118M Dec 14 09:06 kube-apiserver
-rw-r----- 1 root root   33 Dec 14 09:06 kube-apiserver.docker_tag
-rw-r----- 1 root root 119M Dec 14 09:06 kube-apiserver.tar
-rwxr-x--- 1 root root  97M Dec 14 09:06 kube-controller-manager
-rw-r----- 1 root root   33 Dec 14 09:06 kube-controller-manager.docker_tag
-rw-r----- 1 root root  98M Dec 14 09:06 kube-controller-manager.tar
-rwxr-x--- 1 root root 6.6M Dec 14 09:06 kube-discovery
-rwxr-x--- 1 root root  44M Dec 14 09:05 kube-dns
-rwxr-x--- 1 root root  44M Dec 14 09:05 kube-proxy
-rw-r----- 1 root root   33 Dec 14 09:06 kube-proxy.docker_tag
-rw-r----- 1 root root 174M Dec 14 09:06 kube-proxy.tar
-rwxr-x--- 1 root root  51M Dec 14 09:06 kube-scheduler
-rw-r----- 1 root root   33 Dec 14 09:06 kube-scheduler.docker_tag
-rw-r----- 1 root root  52M Dec 14 09:06 kube-scheduler.tar
-rwxr-x--- 1 root root  91M Dec 14 09:06 kubeadm
-rwxr-x--- 1 root root  49M Dec 14 09:06 kubectl
-rwxr-x--- 1 root root  46M Dec 14 09:06 kubefed
-rwxr-x--- 1 root root 103M Dec 14 09:06 kubelet

I put these binaries in /opt/bin and added that directory to PATH. You could also drop them directly into a directory already on the system PATH, such as /usr/bin.
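A minimal sketch of the unpack-and-install step, assuming the tarball sits in the current directory and /opt/bin is the target (both are my choices from above, not requirements):

# Unpack the release and install only the six binaries this article uses
tar -xzf kubernetes-server-linux-amd64.tar.gz
mkdir -p /opt/bin
cp kubernetes/server/bin/{kube-apiserver,kube-controller-manager,kube-scheduler,kubelet,kube-proxy,kubectl} /opt/bin/
# Put /opt/bin on PATH for interactive shells (the systemd units below use absolute paths anyway)
echo 'export PATH=$PATH:/opt/bin' >> /etc/profile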

OK, with these binaries in hand, we can start deploying.

Deploying the Master

As introduced earlier, the Master runs four main modules: APIServer, scheduler, controller manager, and etcd. Let's deploy them one by one.

Deploying etcd

I suggest installing with apt install etcd, which also installs etcdctl. After installation, etcd stores its data in /var/lib/etcd/default by default, the default configuration file is /etc/default/etcd, and the service can be adjusted via /lib/systemd/system/etcd.service.

Update 2017-09-04

Newer Kubernetes versions (starting with 1.6, if I remember correctly) no longer support etcd 2.x, but on Ubuntu 16.04 apt install gives you version 2.2, so the apiserver cannot talk to etcd and various problems follow. I therefore recommend downloading the latest etcd 3.x from GitHub (https://github.com/coreos/etcd/releases) and installing it by hand (a download sketch follows the unit file below). Create the file /lib/systemd/system/etcd.service:

[Unit]
Description=Etcd Server
Documentation=https://github.com/coreos/etcd
After=network.target

[Service]
User=root
Type=simple
EnvironmentFile=-/etc/default/etcd
ExecStart=/opt/k8s/v1_6_9/etcd-v3.2.7-linux-amd64/etcd    # change to your own path
Restart=on-failure
RestartSec=10s
LimitNOFILE=40000

[Install]
WantedBy=multi-user.target
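A minimal sketch of fetching and unpacking an etcd 3.x release into the path assumed in the unit file above (the version and directory simply match my ExecStart line; adjust both):

# Download and unpack an etcd release; v3.2.7 matches the ExecStart path above
ETCD_VER=v3.2.7
curl -L https://github.com/coreos/etcd/releases/download/${ETCD_VER}/etcd-${ETCD_VER}-linux-amd64.tar.gz -o etcd-${ETCD_VER}-linux-amd64.tar.gz
mkdir -p /opt/k8s/v1_6_9
tar -xzf etcd-${ETCD_VER}-linux-amd64.tar.gz -C /opt/k8s/v1_6_9
# etcd and etcdctl now live in /opt/k8s/v1_6_9/etcd-v3.2.7-linux-amd64/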

Once it's installed, run the following commands:

# Reload the systemd configuration. Remember to run this first whenever you add a `*.service` file, or starting the service will fail
systemctl daemon-reload

systemctl enable etcd.service    # add etcd to the list of services started at boot
systemctl start etcd.service    # start etcd

Once installed, etcd listens on http://127.0.0.1:2379 by default for client connections. We can use etcdctl to check that etcd started correctly:

ubuntu➜  bin etcdctl cluster-health
member ce2a822cea30bfca is healthy: got healthy result from http://localhost:2379
cluster is healthy

It's running normally. For a multi-machine deployment, though, every Node needs to reach etcd, so etcd must listen on an IP the other Nodes can access. Add these two lines to /etc/default/etcd:

ETCD_LISTEN_CLIENT_URLS="http://0.0.0.0:2379"
ETCD_ADVERTISE_CLIENT_URLS="http://0.0.0.0:2379"

Restart etcd and it will listen on all IPs.
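As a quick sanity check (my addition, assuming the etcd 3.x etcdctl, whose v2 mode accepts --endpoints), you can verify reachability over the host-only IP from either machine:

etcdctl --endpoints=http://192.168.56.101:2379 cluster-health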

Deploying the APIServer

The APIServer's binary is kube-apiserver. First set up the systemd service file /lib/systemd/system/kube-apiserver.service:

[Unit]
Description=Kubernetes API Server
Documentation=https://github.com/kubernetes
After=etcd.service
Wants=etcd.service

[Service]
EnvironmentFile=/etc/kubernetes/apiserver
ExecStart=/opt/bin/kube-apiserver $KUBE_API_ARGS
Restart=on-failure
Type=notify
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target

A quick rundown of the key entries:

  • The kube-apiserver service depends on etcd, hence the After setting.
  • EnvironmentFile is the service's configuration file.
  • ExecStart says how to start the service.

As you can see, kube-apiserver is started with the arguments in $KUBE_API_ARGS, which we define in the configuration file /etc/kubernetes/apiserver:

KUBE_API_ARGS="--etcd_servers=http://127.0.0.1:2379 --insecure-bind-address=0.0.0.0 --insecure-port=8080 --service-cluster-ip-range=169.169.0.0/16 --service-node-port-range=1-65535 --admission_control=NamespaceLifecycle,LimitRanger,SecurityContextDeny,ResourceQuota --logtostderr=false --log-dir=/var/log/kubernetes --v=2"

Option descriptions:

  • --etcd_servers: the etcd address.
  • --insecure-bind-address: the insecure IP address the apiserver binds to; 0.0.0.0 means bind all addresses.
  • --insecure-port: the insecure port the apiserver binds to, 8080 by default.
  • --service-cluster-ip-range: the virtual IP range for Services in the cluster, in CIDR notation; it must not overlap any real IP range of the physical machines.
  • --service-node-port-range: the range of physical-machine ports that Services may map to, 30000-32767 by default.
  • --admission_control: the cluster's admission control setting; the listed control modules take effect in order as plugins.
  • --logtostderr: false means write logs to files rather than stderr.
  • --log-dir: the log directory.
  • --v: the log level.

OK, the APIServer is configured. There were really just two parts:

  • Create the systemd service file, so that systemd can manage the service: start/stop, start at boot, and so on. systemd commands and syntax will get their own article later.
  • Write the module's configuration file, which controls how the module starts and which features it enables.

The remaining modules are configured in much the same way.

Deploying the controller manager

The controller manager's binary is kube-controller-manager, and the service depends on kube-apiserver.

As before, first configure the systemd service file /lib/systemd/system/kube-controller-manager.service:

[Unit]
Description=Kubernetes Controller Manager
Documentation=https://github.com/kubernetes
After=kube-apiserver.service
Requires=kube-apiserver.service

[Service]
EnvironmentFile=/etc/kubernetes/controller-manager
ExecStart=/opt/bin/kube-controller-manager $KUBE_CONTROLLER_MANAGER_ARGS
Restart=on-failure
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target

Set $KUBE_CONTROLLER_MANAGER_ARGS in /etc/kubernetes/controller-manager:

KUBE_CONTROLLER_MANAGER_ARGS="--master=http://192.168.56.101:8080 --logtostderr=false --log-dir=/var/log/kubernetes --v=2"

--master is the address of the APIServer.

Deploying the scheduler

The scheduler's binary is kube-scheduler, and it depends on the APIServer.

Configure the systemd service file /lib/systemd/system/kube-scheduler.service:

[Unit]
Description=Kubernetes Scheduler
Documentation=https://github.com/kubernetes
After=kube-apiserver.service
Requires=kube-apiserver.service

[Service]
EnvironmentFile=/etc/kubernetes/scheduler
ExecStart=/opt/bin/kube-scheduler $KUBE_SCHEDULER_ARGS
Restart=on-failure
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target

Set $KUBE_SCHEDULER_ARGS in the configuration file /etc/kubernetes/scheduler:

KUBE_SCHEDULER_ARGS="--master=http://192.168.56.101:8080 --logtostderr=false --log-dir=/var/log/kubernetes --v=2"

With that, all four Master modules are deployed. Start them in order and enable them at boot:

# Reload the systemd configuration. Remember to run this first whenever you add a `*.service` file, or starting the service will fail
systemctl daemon-reload

# enable makes a service start at boot; start starts it now
systemctl enable kube-apiserver.service   
systemctl start kube-apiserver.service
systemctl enable kube-controller-manager.service
systemctl start kube-controller-manager.service
systemctl enable kube-scheduler.service
systemctl start kube-scheduler.service

Then run systemctl status <service_name> on each service to verify its state; "running" means it started successfully. If a service failed to start, the error log is shown there too.
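As an extra check beyond the original steps, kubectl can query the health of the Master components through the APIServer's insecure port (the -s flag points kubectl at a specific server):

# List the health of scheduler, controller-manager and etcd as seen by the APIServer
kubectl -s http://192.168.56.101:8080 get componentstatuses

All components should report Healthy.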

Deploying the Node

A Node runs three modules: kubelet, kube-proxy, and the runtime. The runtime currently means Docker or rkt; we use Docker here. I won't go over installing Docker; just install a recent version.
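If Docker isn't installed yet, one quick option on Ubuntu 16.04 is the distribution package (docker.io is Ubuntu's package name; Docker's own apt repository is a common alternative for newer versions):

apt install -y docker.io
systemctl enable docker.service
systemctl start docker.service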

Deploying kubelet

kubelet's binary is kubelet, and the service depends on Docker.

Configure the systemd service file /lib/systemd/system/kubelet.service:

[Unit]
Description=Kubernetes Kubelet Server
Documentation=https://github.com/kubernetes
After=docker.service
Requires=docker.service

[Service]
WorkingDirectory=/var/lib/kubelet
EnvironmentFile=/etc/kubernetes/kubelet
ExecStart=/opt/bin/kubelet $KUBELET_ARGS
Restart=on-failure

[Install]
WantedBy=multi-user.target

Set $KUBELET_ARGS in the configuration file /etc/kubernetes/kubelet:

KUBELET_ARGS="--api-servers=http://192.168.56.101:8080 --hostname-override=192.168.56.101 --logtostderr=false --log-dir=/var/log/kubernetes --v=2"

--hostname-override sets this Node's name.
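One pitfall: the unit file above sets WorkingDirectory=/var/lib/kubelet, and systemd will refuse to start the service (failing with status=200/CHDIR) if that directory is missing, so create it before starting kubelet:

mkdir -p /var/lib/kubelet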

Deploying kube-proxy

kube-proxy's binary is kube-proxy, and the service depends on the network service.

Configure the systemd service file /lib/systemd/system/kube-proxy.service:

[Unit]
Description=Kubernetes Kube-Proxy Server
Documentation=https://github.com/kubernetes
After=network.target
Requires=network.target

[Service]
EnvironmentFile=/etc/kubernetes/proxy
ExecStart=/opt/bin/kube-proxy $KUBE_PROXY_ARGS
Restart=on-failure
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target

Set $KUBE_PROXY_ARGS in the configuration file /etc/kubernetes/proxy:

KUBE_PROXY_ARGS="--master=http://192.168.56.101:8080 --logtostderr=false --log-dir=/var/log/kubernetes --v=2"

Then start the Node services in order (Docker is enabled and started automatically after installation, so it is skipped here):

systemctl daemon-reload

systemctl enable kubelet.service
systemctl start kubelet.service
systemctl enable kube-proxy.service
systemctl start kube-proxy.service

Once the services are up, kubelet registers its Node with the Master on its own. If everything started successfully, we can see the available Node:

ubuntu➜  system kubectl get node
NAME             STATUS    AGE
192.168.56.101   Ready     1h

Deploy the other Node the same way and you will see two nodes.
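To go one step further (my own smoke test, not part of the original walkthrough), you can start a test container and confirm it gets scheduled onto the Node; on the 1.5-era releases used here, kubectl run creates a Deployment:

# Launch an nginx Pod and check which Node it lands on
kubectl run nginx --image=nginx --port=80
kubectl get pods -o wide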

That wraps up this article. Before using this in production, though, there are still security and networking items to configure; those will come later.


This article was written by 时间轨迹. When reposting, please credit the source and include a link.