Dev
The Kubernetes project supports developers well: the GitHub repository has a very detailed contributor guide. Below is a brief record of the process.
Kubernetes is written in Go, and Go's package dependency management is still fairly messy. Go 1.5 introduced the vendor feature to address this, but it is still experimental and the Kubernetes project does not use it; instead it uses godep. This is worth paying attention to, so it is a good idea to get familiar with godep first.
A quick introduction to godep: it creates a Godeps directory under the kubernetes tree that records a copy of the project's dependencies, including version information. When you need to set up your own development environment, a single godep restore is enough; godep fetches the correct versions of the dependencies for you, giving everyone a consistent environment.
With godep in use, build commands must be prefixed with godep; for example, go build becomes godep go build. Pay special attention to this, otherwise you will hit errors like the following:
kubelet git:(master)$ go install
# k8s.io/kubernetes/pkg/kubelet/rkt
../../pkg/kubelet/rkt/rkt.go:168: cannot use apisvcConn (type *"google.golang.org/grpc".ClientConn) as type *"github.com/coreos/rkt/Godeps/_workspace/src/google.golang.org/grpc".ClientConn in argument to v1alpha.NewPublicAPIClient
There is another serious problem: if other projects in your GOPATH share a dependency with Kubernetes (say project A and Kubernetes both depend on package B) and project A updates B, then running godep save ./... in the Kubernetes tree will pull in the updated B as well.
The solution Kubernetes recommends is to keep Kubernetes in a dedicated GOPATH that contains no other projects.
That is an unsatisfying but necessary workaround. If you still run into problems, consider prepending the Godeps workspace to your GOPATH (GOPATH=`godep path`:$GOPATH) so that the packages under Godeps are picked up first.
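As a quick reference, a typical godep workflow on a dedicated GOPATH might look like the sketch below; it assumes Kubernetes is checked out under $GOPATH/src/k8s.io/kubernetes, so adjust the paths to your own layout:
export GOPATH=~/workspace/go/kubernetes      # dedicated GOPATH with nothing else in it
cd $GOPATH/src/k8s.io/kubernetes
godep restore -v                             # fetch dependencies at the versions pinned in Godeps/
godep go build ./...                         # always prefix go commands with godep
godep go test ./pkg/kubelet/...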
Check out
Kubernetes follows the standard GitHub workflow: fork the project first, then clone it locally:
- fork
- mkdir -p ~/workspace/go/kubernetes/k8s.io/kubernetes
- cd ~/workspace/go/kubernetes/k8s.io/kubernetes
- git clone git@github.com:tacy/kubernetes.git .
- git remote add upstream https://github.com/kubernetes/kubernetes.git
Keeping in sync
Keep your code in sync with the main repository:
git fetch upstream
git rebase upstream/master
git remote set-url --push upstream no_push #prevent push to upstream if you have write access to the main repository
Godep
The Kubernetes project manages its dependencies with godep, so you need to restore them with godep restore -v (you may need a proxy to reach some of the package hosts).
Create branch
- git checkout -b myfeature
Build
You can run make directly in the kubernetes directory to build the whole project, or build a specific package; see the Makefile for details.
For example, to build kubelet for Linux: KUBE_BUILD_PLATFORMS=linux/amd64 CGO_ENABLED=1 make WHAT=cmd/kubelet GOGCFLAGS='-N -l'
(GOGCFLAGS is passed through to go build -gcflags '-N -l', which disables optimizations and inlining so the binary is easier to debug.)
You can also cd into a specific package directory and run godep go install; this is sometimes needed (for example, gocode requires the compiled pkg files to provide code completion).
Committing changes to your fork
Before committing any changes, please link/copy these pre-commit hooks into your .git directory. This will keep you from accidentally committing non-gofmt’d go code.
cd kubernetes/.git/hooks/
ln -s ../../hooks/pre-commit .
Then you can commit your changes and push them to your fork:
git commit
git push -f origin myfeature
Creating a pull request
- Visit https://github.com/$YOUR_GITHUB_USERNAME/kubernetes
- Click the “Compare and pull request” button next to your “myfeature” branch.
- Check out the pull request process for more details
kubelet
main: k8s.io/kubernetes/cmd/kubelet.go
server.go -> RunKubelet
eventBroadcaster.StartLogging writes kubelet events to the log; eventBroadcaster.StartRecordingToSink sends kubelet events to the apiserver
makePodSourceConfig receives all pod update events, coming from file, URL, or the apiserver REST API
app/server.go -> CreateAndInitKubelet() -> makePodSourceConfig() -> kubeletBootstrap.BirthCry() -> kubeletBootstrap.StartGarbageCollection()
APIServer
Scheme
- convert: converters between API versions. Conversion funcs are keyed by the (in, out) type pair, so the correct convert func can be looked up for a given conversion, e.g. Convert_v1alpha1_Flunder_To_wardle_Flunder(in *Flunder, out *wardle.Flunder, s conversion.Scope).
- serializer (decode/encode): serialization and deserialization. A request in json/yaml is decoded to the internal object, converted to the versioned object, and then stored in etcd; the serializer is invoked in the handler.
Install
KVM env
Bridge
[tacy@tacyArch network]$ cat qemu.netdev
[NetDev]
Name=qemu0
Kind=bridge
[tacy@tacyArch network]$ cat qemu.network
[Match]
Name=qemu0
[Network]
Address=172.18.0.1/16
DNS=233.5.5.5
IPForward=yes
[tacy@tacyArch network]$ systemctl restart systemd-networkd
Iptables
[tacy@tacyArch ~]$ cat /etc/iptables/iptables.rules
# Generated by iptables-save v1.4.21 on Thu Dec 10 10:31:57 2015
*nat
:PREROUTING ACCEPT [20:3432]
:INPUT ACCEPT [18:2776]
:OUTPUT ACCEPT [886:56198]
:POSTROUTING ACCEPT [886:56198]
:DOCKER - [0:0]
-A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER
-A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER
-A POSTROUTING -s 172.18.0.0/16 ! -o docker0 -j MASQUERADE
COMMIT
# Completed on Thu Dec 10 10:31:57 2015
# Generated by iptables-save v1.4.21 on Thu Dec 10 10:31:57 2015
*filter
:INPUT ACCEPT [57801:42904359]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [52989:24492467]
:DOCKER - [0:0]
-A FORWARD -o qemu0 -j DOCKER
-A FORWARD -o qemu0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -i qemu0 ! -o qemu0 -j ACCEPT
-A FORWARD -i qemu0 -o qemu0 -j ACCEPT
COMMIT
# Completed on Thu Dec 10 10:31:57 2015
Dnsmasq
edit /etc/dnsmasq.conf:
interface=qemu0
listen-address=172.18.0.1,127.0.0.1
dhcp-range=172.18.100.1,172.18.100.254,255.255.0.0
Docker custom
Have docker and KVM share the same bridge, disable docker's own iptables and IP-forwarding handling, and route docker through a proxy:
[tacy@tacyArch ~]$ cat /etc/systemd/system/docker.service.d/custom.conf
[Service]
Environment="HTTP_PROXY=http://127.0.0.1:9001/" "HTTPS_PROXY=http://127.0.0.1:9001/"
ExecStart=
ExecStart=/usr/bin/docker daemon -H fd:// --exec-opt native.cgroupdriver=cgroupfs -b=qemu0 --fixed-cidr=172.18.1.1/24 --iptables=false --ip-forward=false
[root@localhost ~]# systemctl show --property=Environment docker
Environment=GOTRACEBACK=crash HTTP_PROXY=http://172.18.0.1:9001/ HTTPS_PROXY=http://172.18.0.1:9001/
Create VM
Create VM img [1]
Download the CentOS cloud image:
curl http://cloud.centos.org/centos/7/images/CentOS-7-x86_64-GenericCloud-1510.qcow2 -o CentOS-7-x86_64-GenericCloud.qcow2
qemu-img create -f qcow2 -o backing_file=CentOS-7-x86_64-GenericCloud.qcow2,backing_fmt=qcow2 master.qcow2
qemu-img create -f qcow2 -o backing_file=CentOS-7-x86_64-GenericCloud.qcow2,backing_fmt=qcow2 node-one.qcow2
qemu-img create -f qcow2 -o backing_file=CentOS-7-x86_64-GenericCloud.qcow2,backing_fmt=qcow2 node-two.qcow2
Cloud init
[tacy@tacyArch qemu]$ ls
cloud-init vm
[tacy@tacyArch qemu]$ cd cloud-init/
[tacy@tacyArch cloud-init]$ ls
meta-data user-data
[tacy@tacyArch cloud-init]$ cat meta-data
{}
[tacy@tacyArch cloud-init]$ cat user-data
#cloud-config
users:
  - name: root
    ssh-authorized-keys:
      - ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDJBe4qjPGBqnoE6Up6aB6jBOSBK1aqOjpX8fU8nvneKdzKmH0xTX5nRsfiZdTbJWX7CfjnrA0
[tacy@tacyArch qemu]$ genisoimage -output seed.iso -volid cidata -joliet -rock user-data meta-data
https://cloudinit.readthedocs.org/en/latest/topics/datasources.html#no-cloud
Shell
qemu-ifup.sh / qemu-ifdown.sh
##/etc/qemu-ifup.sh##
#!/bin/sh
echo "Executing /etc/qemu-ifup"
echo "Bringing up $1 for bridged mode..."
sudo /usr/bin/ip link set $1 up promisc on
echo "Adding $1 to qemu0..."
sudo /usr/bin/brctl addif qemu0 $1
sleep 2
##/etc/qemu-ifdown.sh##
#!/bin/sh
echo "Executing /etc/qemu-ifdown"
sudo /usr/bin/ip link set $1 down
sudo /usr/bin/brctl delif qemu0 $1
sudo /usr/bin/ip link delete dev $1
mac address
#!/usr/bin/env python
import sys
import zlib

if len(sys.argv) != 2:
    print("usage: %s <VM Name>" % sys.argv[0])
    sys.exit(1)

crc = zlib.crc32(sys.argv[1].encode("utf-8")) & 0xffffffff
crc = str(hex(crc))[2:]
print("52:54:%s%s:%s%s:%s%s:%s%s" % tuple(crc))
run-qemu.sh
#!/bin/bash
USERID=$(whoami)
print_usage() {
echo "Usage:
$0 {-n,-m,-i,-sp,-mp} ARG
Options:
--name ARG
-n ARG
--memory ARG
-m ARG
--image ARG
-i ARG
--serialport ARG
-sp ARG
--monitorport ARG
-mp ARG
" >&2
}
if [ $# -le 0 ]; then
print_usage
exit 1
fi
while [[ $# > 1 ]]
do
key="$1"
case $key in
-n|--name)
NAME="$2"
shift # past argument
;;
-m|--memory)
MEMORY="$2"
shift # past argument
;;
-i|--image)
IMAGE="$2"
shift # past argument
;;
-mp|--monitorport)
MP="$2"
shift # past argument
;;
-sp|--serialport)
SP="$2"
shift # past argument
;;
--default)
DEFAULT=YES
;;
*)
# unknown option
;;
esac
shift # past argument or value
done
# Get name of newly created TAP device; see https://bbs.archlinux.org/viewtopic.php?pid=1285079#p1285079
precreationg=$(/usr/bin/ip tuntap list | /usr/bin/cut -d: -f1 | /usr/bin/sort)
sudo /usr/bin/ip tuntap add user $USERID mode tap
postcreation=$(/usr/bin/ip tuntap list | /usr/bin/cut -d: -f1 | /usr/bin/sort)
IFACE=$(comm -13 <(echo "$precreationg") <(echo "$postcreation"))
MACADDR=`/home/tacy/workspace/qemu/vm/qemu-mac-hasher.py ${NAME}`
qemu-system-x86_64 -name ${NAME} -cpu host -m ${MEMORY} -smp cores=2,threads=1,sockets=1 -machine type=pc,accel=kvm -net nic,macaddr=${MACADDR},model=virtio -net tap,vhost=on,ifname="$IFACE" -serial telnet:localhost:${SP},server,nowait,nodelay -monitor tcp:127.0.0.1:${MP},server,nowait,nodelay -device virtio-scsi-pci,id=scsi -device scsi-hd,drive=hd -drive file=${IMAGE},format=qcow2,cache=writeback,discard=unmap,if=none,id=hd -drive file=/home/tacy/workspace/qemu/vm/seed.iso,if=virtio -nographic
# Note: consider cache=directsync plus io=native here; that should give the best disk performance.
sudo ip link set dev $IFACE down &> /dev/null
sudo ip tuntap del $IFACE mode tap &> /dev/null
Systemd
## ~/.config/systemd/user/qemu@.service
[Unit]
Description=QEMU virtual machine
[Service]
Environment="type=system-x86_64" "haltcmd=kill -INT $MAINPID"
EnvironmentFile=/home/tacy/workspace/qemu/vm/%i
ExecStart=/usr/bin/env /home/tacy/workspace/qemu/vm/run-qemu.sh -n %i -m $memory -i $image -sp $sp -mp $mp
ExecStop=/bin/sh -c ${haltcmd}
TimeoutStopSec=30
KillMode=none
[Install]
WantedBy=multi-user.target
systemd env file:
## ~/workspace/qemu/vm/master
memory=1024
image=/home/tacy/workspace/qemu/vm/master.qcow2
sp=7101
mp=7001
haltcmd="echo 'system_powerdown' | /usr/bin/nc localhost 7001"
## ~/workspace/qemu/vm/node-one
memory=1024
image=/home/tacy/workspace/qemu/vm/node-one.qcow2
sp=7102
mp=7002
haltcmd="echo 'system_powerdown' | /usr/bin/nc localhost 7002"
## ~/workspace/qemu/vm/node-two
memory=1024
image=/home/tacy/workspace/qemu/vm/node-two.qcow2
sp=7103
mp=7003
haltcmd="echo 'system_powerdown' | /usr/bin/nc localhost 7003"
Start VM & Stop VM
systemctl --user start qemu@master
systemctl --user start qemu@node-one
systemctl --user start qemu@node-two
systemctl --user stop qemu@master
systemctl --user stop qemu@node-one
systemctl --user stop qemu@node-two
VM Config
Docker
yum install docker
##/etc/sysconfig/docker
HTTP_PROXY='http://172.18.0.1:9001'
HTTPS_PROXY='http://172.18.0.1:9001'
SSD tuning [2][3][4]
discard
To enable discard on the disk, the device must be attached through virtio-scsi: -device virtio-scsi-pci,id=scsi -device scsi-hd,drive=hd -drive file=${IMAGE},format=qcow2,cache=writeback,discard=unmap,if=none,id=hd
Then edit fstab: UUID=ba1b9d4d-f899-4121-bc02-b385767de754 / xfs defaults,discard,nobarrier,noatime 0 0
Confirm that discard is enabled:
[root@localhost ~]# lsblk -o MOUNTPOINT,DISC-MAX,FSTYPE
MOUNTPOINT DISC-MAX FSTYPE
                 1G
/                1G xfs
                 0B
ioscheduler
[root@localhost ~]# cat /etc/tmpfiles.d/10_ioscheduler.conf
w /sys/block/sda/queue/scheduler - - - - noop
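After a reboot you can verify which scheduler is active (the bracketed entry is the one in use; the exact list depends on the kernel), for example:
[root@localhost ~]# cat /sys/block/sda/queue/scheduler
[noop] deadline cfq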
Kubernetes Cluster Setup
Prerequisites
Download the required software:
wget https://github.com/kubernetes/kubernetes/releases/download/v1.1.2/kubernetes.tar.gz
wget https://github.com/projectcalico/calico-kubernetes/archive/master.tar.gz
wget https://github.com/coreos/etcd/releases/download/v2.2.2/etcd-v2.2.2-linux-amd64.tar.gz
Also set a distinct hostname on every node (master, node-one, node-two); Calico uses the hostname as a key identifier.
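For example, on each VM (a small sketch; the CentOS cloud image ships hostnamectl):
hostnamectl set-hostname master      # likewise node-one / node-two on the other VMs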
Etcd & Calico
Kubernetes, Calico and SkyDNS all need Etcd; we configure a single Etcd instance shared by every component. Calico gives Kubernetes a BGP (routed) network rather than an overlay. Calico is simple to use on an L2 network; using it across L3 requires careful planning.
Master
scp etcd and calicoctl to master (172.18.100.187 is the master's IP)
- etcd
[root@master bin]# cat /etc/systemd/system/etcd.service
[Unit]
Description=Etcd service
Documentation=https://coreos.com/etcd/docs/latest/
Requires=docker.service
After=docker.service
[Service]
ExecStart=/usr/bin/etcd \
--data-dir=/var/lib/etcd \
--advertise-client-urls=http://172.18.100.187:6666 \
--listen-client-urls=http://0.0.0.0:6666 \
--listen-peer-urls=http://127.0.0.1:2380 \
--name=etcd
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
enable etcd & start etcd: systemctl start etcd && systemctl enable etcd
- calico
[root@master bin]# cat /etc/systemd/system/calico-node.service
[Unit]
Description=calicoctl node
After=docker.service
Requires=docker.service
[Service]
User=root
Environment="ETCD_AUTHORITY=127.0.0.1:6666"
PermissionsStartOnly=true
ExecStart=/usr/bin/calicoctl node --detach=false
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
start & enable calico: systemctl enable calico-node && systemctl start calico-node
Operate Etcd and Calico from the host:
etcdctl -C 172.18.100.187:6666 ls --recursive
ETCD_AUTHORITY=172.18.100.187:6666 calicoctl pool show
Calico's default pool is in the 192.168.x.x range, which collides easily with existing networks; you can define your own pool with calicoctl (I changed it to 172.19.0.0/16).
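Roughly like this (a sketch; the exact pool subcommands depend on the calicoctl version):
ETCD_AUTHORITY=172.18.100.187:6666 calicoctl pool add 172.19.0.0/16
ETCD_AUTHORITY=172.18.100.187:6666 calicoctl pool remove 192.168.0.0/16
ETCD_AUTHORITY=172.18.100.187:6666 calicoctl pool show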
Node
[root@node-one ~]# cat /etc/network-environment
#! /usr/bin/bash
# This node's IPv4 address
DEFAULT_IPV4=172.18.100.122
# The Kubernetes master IP
KUBERNETES_MASTER=172.18.100.187
# IP and port of etcd instance used by Calico
ETCD_AUTHORITY=172.18.100.187:6666
# URL to access the Kubernetes apiserver
KUBE_API_ROOT=http://172.18.100.187:8080/api/v1/
# Enable Calico IPAM
CALICO_IPAM=true
Note: on each node, change DEFAULT_IPV4 to that node's own IP address.
[root@node-one ~]# cat /etc/systemd/system/calico-node.service
[Unit]
Description=Calico per-node agent
Documentation=https://github.com/projectcalico/calico-docker
Requires=docker.service
After=docker.service
[Service]
EnvironmentFile=/etc/network-environment
User=root
PermissionsStartOnly=true
#ExecStart=/usr/bin/calicoctl node --ip=${DEFAULT_IPV4} --kubernetes --kube-plugin-version=v0.6.1 --detach=false
# use CNI: https://github.com/projectcalico/calico-cni
ExecStart=/usr/bin/calicoctl node --ip=${DEFAULT_IPV4} --detach=false
Restart=always
RestartSec=10
start & enable calico: systemctl enable calico-node && systemctl start calico-node
Kubernetes
Master
scp kube-apiserver kubectl kube-scheduler kube-controller-manager kubelet to master
[root@master bin]# cat /etc/systemd/system/kube-apiserver.service
[Unit]
Description=Kubernetes API Server
Documentation=https://github.com/kubernetes/kubernetes
Requires=etcd.service
After=etcd.service
[Service]
ExecStart=/usr/bin/kube-apiserver \
--allow-privileged=true \
--etcd-servers=http://127.0.0.1:6666 \
--insecure-bind-address=0.0.0.0 \
--service-cluster-ip-range=10.100.0.0/24 \
--logtostderr=true
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
[root@master bin]# cat /etc/systemd/system/kube-scheduler.service
[Unit]
Description=Kubernetes Scheduler
Documentation=https://github.com/kubernetes/kubernetes
Requires=kube-apiserver.service
After=kube-apiserver.service
[Service]
ExecStart=/usr/bin/kube-scheduler \
--master=127.0.0.1:8080 \
--logtostderr=true
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
[root@master bin]# cat /etc/systemd/system/kube-controller-manager.service
[Unit]
Description=Kubernetes Controller Manager
Documentation=https://github.com/kubernetes/kubernetes
Requires=kube-apiserver.service
After=kube-apiserver.service
[Service]
ExecStart=/usr/bin/kube-controller-manager \
--master=127.0.0.1:8080 \
--logtostderr=true
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
[root@master bin]# cat /etc/systemd/system/kubelet.service
[Unit]
Description=Kubernetes Kubelet
Documentation=https://github.com/kubernetes/kubernetes
Requires=docker.service
After=docker.service
[Service]
ExecStart=/usr/bin/kubelet \
--config=/etc/kubernetes/manifests \
--logtostderr=true
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
start & enable all kube services
Node
mkdir -p /opt/cni/bin
sudo wget -N -P /opt/cni/bin/ https://github.com/projectcalico/calico-cni/releases/download/v0.2.0/calico
sudo wget -N -P /opt/cni/bin/ https://github.com/projectcalico/calico-cni/releases/download/v0.2.0/calico-ipam
mkdir -p /etc/cni/net.d/
cat /etc/cni/net.d/10-calico.conf
{
  "name": "calico-k8s-network",
  "type": "calico",
  "etcd_authority": "172.18.100.187:6666",
  "log_level": "debug",
  "ipam": {
    "type": "calico-ipam"
  }
}
[root@node-one ~]# cat /etc/systemd/system/kubelet.service
[Unit]
Description=Kubernetes Kubelet
Documentation=https://github.com/kubernetes/kubernetes
After=calico-node.service
Requires=calico-node.service
[Service]
EnvironmentFile=/etc/network-environment
ExecStart=/usr/bin/kubelet \
--address=0.0.0.0 \
--port=10250 \
--hostname_override=${DEFAULT_IPV4} \
--cluster-dns=10.100.0.10 \
--cluster-domain=cluster.local \
--api_servers=${KUBERNETES_MASTER}:8080 \
--network-plugin=cni \
--network-plugin-dir=/etc/cni/net.d \
--logtostderr=true
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
[root@node-one ~]# cat /etc/systemd/system/kube-proxy.service
[Unit]
Description=Kubernetes Proxy
Documentation=https://github.com/kubernetes/kubernetes
After=calico-node.service
Requires=calico-node.service
[Service]
EnvironmentFile=/etc/network-environment
ExecStart=/usr/bin/kube-proxy --master=http://${KUBERNETES_MASTER}:8080 --logtostderr=true --proxy-mode=iptables
Restart=always
RestartSec=10
Kubeadm
docker images: https://quay.io/repository/mritd/kubernetes-dashboard-amd64?tab=tags
kubectl proxy --address 172.18.100.28 --accept-hosts='^*$'
Disable firewalld, disable selinux, and set the hostname.
kubelet flags: "--cgroup-driver=systemd --pod-infra-container-image=tacylee/pause-amd64:3.0"
export KUBE_REPO_PREFIX=tacylee
export KUBECONFIG=/etc/kubernetes/admin.conf
The RPMs are built from the kubernetes/release repository.
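With those prerequisites in place, bootstrapping follows the usual kubeadm flow; a rough sketch (flags and the join syntax vary with the kubeadm version):
kubeadm init --pod-network-cidr=172.19.0.0/16
# on each node, run the join command printed by kubeadm init, e.g.
kubeadm join --token <token> 172.18.100.28:6443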
Using Kubernetes
start & stop env
docker start {ceph_container_id}
systemctl --user start qemu@master
systemctl --user start qemu@node-one
systemctl --user start qemu@node-two
Some frequently used commands:
ETCD_AUTHORITY=172.18.100.187:6666 ./calicoctl endpoint show --detailed
etcdctl -C 172.18.100.187:6666 ls --recursive /calico
label nodes
kubectl -s 172.18.100.187:8080 label nodes '172.18.100.122' skydns=server
skydns
A service-discovery add-on implemented with DNS. Kubernetes' DNS integration is flexible:
- DNS entries can be created for Services.
A Service is only a virtual IP and does not serve traffic itself, so you do not need to worry about its health.
- Multiple DNS entries can also be created for Pods.
If you want to use your own service-discovery component, you can create a set of DNS entries for the Pods and implement the load balancing yourself.
To use SkyDNS, just set the cluster-dns flag when starting kubelet; containers will then use that IP as their DNS server.
Create SkyDNS on the node labeled skydns=server:
kubectl -s 172.18.100.187:8080 create -f skydns-rc.yaml
kubectl -s 172.18.100.187:8080 create -f skydns-svc.yaml
Check that SkyDNS is running:
kubectl -s 172.18.100.187:8080 get -o wide rc --all-namespaces
kubectl -s 172.18.100.187:8080 get -o wide pods --all-namespaces
kubectl -s 172.18.100.187:8080 exec busybox nslookup kubernetes.default
Server: 10.100.0.10
Address 1: 10.100.0.10
Name: kubernetes.default
Address 1: 10.100.0.1
kubectl -s 172.18.100.187:8080 exec busybox nslookup kube-ui.kube-system
Server: 10.100.0.10
Address 1: 10.100.0.10
Name: kube-ui.kube-system
Address 1: 10.100.0.243
# skydns-rc.yaml
apiVersion: v1
kind: ReplicationController
metadata:
  name: kube-dns-v8
  namespace: kube-system
  labels:
    k8s-app: kube-dns
    version: v8
    kubernetes.io/cluster-service: "true"
spec:
  replicas: 1
  selector:
    k8s-app: kube-dns
    version: v8
  template:
    metadata:
      labels:
        k8s-app: kube-dns
        version: v8
        kubernetes.io/cluster-service: "true"
    spec:
      containers:
      - name: kube2sky
        image: gcr.io/google_containers/kube2sky:1.11
        resources:
          limits:
            cpu: 100m
            memory: 50Mi
        args:
        # command = "/kube2sky"
        - -domain=cluster.local
        - -kube_master_url=http://172.18.100.187:8080
        - -etcd-server=http://172.18.100.187:6666
      - name: skydns
        image: gcr.io/google_containers/skydns:2015-03-11-001
        resources:
          limits:
            cpu: 100m
            memory: 50Mi
        args:
        # command = "/skydns"
        - -machines=http://172.18.100.187:6666
        - -addr=0.0.0.0:53
        - -domain=cluster.local.
        ports:
        - containerPort: 53
          name: dns
          protocol: UDP
        - containerPort: 53
          name: dns-tcp
          protocol: TCP
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 30
          timeoutSeconds: 5
      - name: healthz
        image: gcr.io/google_containers/exechealthz:1.0
        resources:
          limits:
            cpu: 10m
            memory: 20Mi
        args:
        - -cmd=nslookup kubernetes.default.svc.cluster.local localhost >/dev/null
        - -port=8080
        ports:
        - containerPort: 8080
          protocol: TCP
      dnsPolicy: Default  # Don't use cluster DNS.
      nodeSelector:
        skydns: server
# skydns-svc.yaml
apiVersion: v1
kind: Service
metadata:
  name: kube-dns
  namespace: kube-system
  labels:
    k8s-app: kube-dns
    kubernetes.io/cluster-service: "true"
    kubernetes.io/name: "KubeDNS"
spec:
  selector:
    k8s-app: kube-dns
  clusterIP: 10.100.0.10
  ports:
  - name: dns
    port: 53
    protocol: UDP
  - name: dns-tcp
    port: 53
    protocol: TCP
Debug
kubectl run -i --tty busybox --image=busybox -- sh
E2E Performance Test
There is an end-to-end test for collecting overall resource usage of node components:
kubelet_perf.go. To
run the test, simply make sure you have an e2e cluster running (go run hack/e2e.go -up
) and
set up correctly.
Run the test with go run hack/e2e.go -v -test
--test_args="--ginkgo.focus=resource\susage\stracking"
. You may also wish to customise the number of
pods or other parameters of the test (remember to rerun make WHAT=test/e2e/e2e.test
after you do).
Profiling
Kubelet installs the go pprof handlers, which can be queried for CPU profiles:
$ kubectl proxy &
Starting to serve on 127.0.0.1:8001
$ curl -G "http://localhost:8001/api/v1/proxy/nodes/${NODE}:10250/debug/pprof/profile?seconds=${DURATION_SECONDS}" > $OUTPUT
$ KUBELET_BIN=_output/dockerized/bin/linux/amd64/kubelet
$ go tool pprof -web $KUBELET_BIN $OUTPUT
pprof
can also provide heap usage, from the /debug/pprof/heap
endpoint
(e.g. http://localhost:8001/api/v1/proxy/nodes/${NODE}:10250/debug/pprof/heap
).
More information on go profiling can be found here.
Benchmarks
Before jumping through all the hoops to measure a live Kubernetes node in a real cluster, it is worth considering whether the data you need can be gathered through a Benchmark test. Go provides a really simple benchmarking mechanism, just add a unit test of the form:
// In foo_test.go
func BenchmarkFoo(b *testing.B) {
b.StopTimer()
setupFoo() // Perform any global setup
b.StartTimer()
for i := 0; i < b.N; i++ {
foo() // Functionality to measure
}
}
Then:
$ go test -bench=. -benchtime=${SECONDS}s foo_test.go
More details on benchmarking here.
DNS
kube-ui
The Kubernetes web portal; the successor project is Dashboard. Deploy kube-ui with the YAML files below. Note that the svc file sets type: NodePort and a nodePort, so you can reach kube-ui from outside the cluster via any node's IP.
# kube-ui-rc.yaml
apiVersion: v1
kind: ReplicationController
metadata:
  name: kube-ui-v4
  namespace: kube-system
  labels:
    k8s-app: kube-ui
    version: v4
    kubernetes.io/cluster-service: "true"
spec:
  replicas: 1
  selector:
    k8s-app: kube-ui
    version: v4
  template:
    metadata:
      labels:
        k8s-app: kube-ui
        version: v4
        kubernetes.io/cluster-service: "true"
    spec:
      containers:
      - name: kube-ui
        image: gcr.io/google_containers/kube-ui:v4
        resources:
          limits:
            cpu: 100m
            memory: 50Mi
        ports:
        - containerPort: 8080
        livenessProbe:
          httpGet:
            path: /
            port: 8080
          initialDelaySeconds: 30
          timeoutSeconds: 5
      nodeSelector:
        kube-ui: server
# kube-ui-svc.yaml
apiVersion: v1
kind: Service
metadata:
  name: kube-ui
  namespace: kube-system
  labels:
    k8s-app: kube-ui
    kubernetes.io/cluster-service: "true"
    kubernetes.io/name: "KubeUI"
spec:
  selector:
    k8s-app: kube-ui
  clusterIP: 10.100.0.243
  type: NodePort
  ports:
  - port: 80
    targetPort: 8080
    nodePort: 30061
Volumes
Kubernetes supports volumes; a container can be bound to a volume, and different volume types have different characteristics. For example, an rbd volume is not destroyed when the container is destroyed.
Ceph in Docker
If you do not have a Ceph environment, you can create a container-based one with the command below [5]. Note: on my Arch Linux host the container must be privileged, otherwise rbd map fails with a read-only file system error.
#!/bin/bash
sudo docker run -d --net=host --privileged=true --name ceph-cluster \
-v /etc/ceph:/etc/ceph \
-v /var/lib/ceph/:/var/lib/ceph/ \
-e MON_IP=172.18.0.1 \
-e CEPH_NETWORK=172.18.0.0/16 \
ceph/demo
Verify the Ceph status:
[tacy@tacyArch ~]# docker exec {CONTAINER_ID} ceph status
cluster 9f487f28-d328-4f1a-bdb2-737006c8d7e9
health HEALTH_OK
monmap e1: 1 mons at {tacyArch=172.18.0.1:6789/0}
election epoch 1, quorum 0 tacyArch
mdsmap e10: 1/1/1 up {0=0=up:active}
osdmap e22: 1 osds: 1 up, 1 in
flags sortbitwise
pgmap v28: 128 pgs, 9 pools, 2808 bytes data, 190 objects
28159 MB used, 170 GB / 208 GB avail
120 active+clean
8 active+clean+replay
client io 81925 B/s rd, 0 B/s wr, 133 op/s
Use rbd volume
Conf node
yum install ceph-common
Create ceph secret
[tacy@tacyArch ~]# docker exec {CEPH_CONTAINER_ID} ceph auth get-key client.admin
AQC8dINWKUyBNxAARQN5Fz0xNmltCSyRz0924A==
[tacy@tacyArch ~]# echo 'AQC8dINWKUyBNxAARQN5Fz0xNmltCSyRz0924A==' |base64
QVFDOGRJTldLVXlCTnhBQVJRTjVGejB4Tm1sdENTeVJ6MDkyNEE9PQo=
Edit your ceph-secret.yml with the base64 key:
apiVersion: v1
kind: Secret
metadata:
  name: ceph-secret
data:
  key: QVFCQU1vMVZxRTFPTWhBQVZwRVJQY3lRVTVwelU2SU9KMjJ4MXc9PQo=
Add your secret to Kubernetes:
$ kubectl create -f ceph-secret.yaml
$ kubectl get secret
NAME TYPE DATA
ceph-secret Opaque 1
Create ceph image
$ docker exec {CEPH_CONTAINER_ID} rbd create foo -s 100
$ docker exec {CEPH_CONTAINER_ID} rbd map foo
$ docker exec {CEPH_CONTAINER_ID} mkfs.ext4 /dev/rbd0
$ docker exec {CEPH_CONTAINER_ID} rbd unmap /dev/rbd0
Create pod with foo
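A minimal pod spec that mounts the rbd image foo could look like the sketch below; the monitor address matches the ceph/demo container above, while the pool, fsType and file name are assumptions to adjust for your setup.
# rbd-pod.yaml (sketch)
apiVersion: v1
kind: Pod
metadata:
  name: rbd-test
spec:
  containers:
  - name: rbd-rw
    image: busybox
    command: ["sleep", "3600"]
    volumeMounts:
    - name: rbdpd
      mountPath: /mnt/rbd
  volumes:
  - name: rbdpd
    rbd:
      monitors:
      - 172.18.0.1:6789        # MON_IP of the ceph/demo container above
      pool: rbd
      image: foo
      user: admin
      secretRef:
        name: ceph-secret
      fsType: ext4
Create it with kubectl create -f rbd-pod.yaml and check that /mnt/rbd is mounted inside the container.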
tips
API
fieldSelector [6][7]
With kubectl -v=8 you can see every API call that kubectl makes.
kubectl proxy
curl http://127.0.0.1:8001/api/v1/pods?fieldSelector=spec.nodeName=primeton-tacy-k8s-node2,metadata.namespace=default
Code
kube-proxy
Two responsibilities: first, watch endpoint changes and program the iptables rules; second, act as the userspace proxy when the userspace proxy mode is selected.
kubelet
cmd/kubelet/server.go:RunKubelet -> CreateAndInitKubelet -> pkg/kubelet.go: NewMainKubelet -> makePodSourceConfig -> pkg/kubelet/config/apiserver.go:NewSourceApiserver. This watches the apiserver and receives all pod update events.
The main goroutine then scans the pod-update channel and handles the events.
Initializing the kubelet configuration:
kubelet.go: s := options.NewKubeletServer()
options.go: api.Scheme.Convert(&v1alpha1.KubeletConfiguration{}, &config, nil)
This pulls in the v1alpha1 package; the following code in v1alpha1's register.go injects the default configuration:
var (
    SchemeBuilder = runtime.NewSchemeBuilder(addKnownTypes, addDefaultingFuncs)
    AddToScheme   = SchemeBuilder.AddToScheme
)
where addDefaultingFuncs is:
func addDefaultingFuncs(scheme *kruntime.Scheme) error {
    return scheme.AddDefaultingFuncs(
        SetDefaults_KubeProxyConfiguration,
        SetDefaults_KubeSchedulerConfiguration,
        SetDefaults_LeaderElectionConfiguration,
        SetDefaults_KubeletConfiguration,
    )
}
So kubelet initializes its default configuration values through SetDefaults_KubeletConfiguration.
The concrete call chain:
return s.converter.Convert(in, out, flags, meta)
return c.doConversion(src, dest, flags, meta, c.convert)
return f(sv, dv, s)
Here f is the dynamically registered SetDefaults_KubeletConfiguration.
Example
ThirdPartyResource
- https://github.com/tiaanl/kube-tpr-demo
- https://github.com/wfarr/k8s-tpr-playground
- https://github.com/kubernetes/kubernetes/pull/43027
- https://github.com/kubernetes/client-go/issues/8
- https://groups.google.com/forum/#!topic/kubernetes-sig-network/igJrjG-v-Cs
Usage
rolling update
Rolling update: replace the old-version rc with a new-version rc, or update the image directly without writing a new rc file. Deployment supports the Recreate and RollingUpdate strategies, but there is currently no way to express something like a plain restart.
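For example (a sketch; my-rc, my-rc-v2.yaml and the image tag are placeholders):
# replace the old rc with a new rc definition (the new rc needs a different name and selector)
kubectl rolling-update my-rc -f my-rc-v2.yaml
# or just swap the image without writing a new rc file
kubectl rolling-update my-rc --image=myapp:v2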
Accessing the cluster
kubectl proxy --accept-hosts='^*' --address='0.0.0.0'
http://172.18.100.28:8001/api/v1/proxy/nodes/kube-node1.mytacy.com:10250/stats/summary
http://localhost:8080/api/v1/namespaces/kube-system/services/elasticsearch-logging/proxy/
Issue
Arch
Volume
- Implement docker-style volume sharing between containers: Idea: New volume type: "container" #831
Schedule
security
- Async admission control: Extension of Admission Control via Initializers and External Admission Enforcement
API Machinery
sig
configmap
Use Cases
As a user, I want to be able to consume configuration data as environment variables. As a user, I want to be able to consume configuration data as files in a volume. As a user, I want my view of configuration data in files to be eventually consistent with changes to the data.
If consumed as a volume, changes are propagated into the Pod dynamically, so an application can watch the files to pick them up; if consumed as environment variables, changes are not propagated.
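For reference, a sketch showing both consumption styles in one pod (the names app-config, log.level and LOG_LEVEL are hypothetical):
apiVersion: v1
kind: Pod
metadata:
  name: cm-demo
spec:
  containers:
  - name: app
    image: busybox
    command: ["sh", "-c", "sleep 3600"]
    env:
    - name: LOG_LEVEL             # env var: fixed at container start, updates are not propagated
      valueFrom:
        configMapKeyRef:
          name: app-config
          key: log.level
    volumeMounts:
    - name: config                # volume: files under /etc/app are refreshed when the ConfigMap changes
      mountPath: /etc/app
  volumes:
  - name: config
    configMap:
      name: app-config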
Secret volume should refresh when secrets are updated #18372 has been fixed, i.e. secret refresh is now supported (the Pod sees the update; previously you had to delete the secret and create a new one). The issue also mentions:
Two usage patterns for ConfigMap:
I think there are 2 patterns people will use with ConfigMap:
Update the ConfigMap and expect it to immediately propagate to all instances. They may watch their configurations using using inotify, or expect a HUP, or just restart. This is useful for applications where all replicas need consistent configuration. I'd guess etcd and Zookeeper are in this category.
Create a new ConfigMap, update the pod template to reference it, and roll it out via rolling update. This is useful when replicas don't need identical configuration and one is worried about pushing bad configs, which is a common source of failure.
Updating ConfigMap and not propagating the change is just confusing. Expecting users to kill pods in order to implicitly pick up changes lacks transparency and predictability, which is why we moved away from that approach in rolling update.
Other than "it was simpler to implement", what's the rationale for not propagating Secret updates?
The following discusses the different cases of a ConfigMap update:
Re: picking up new config:
The user-centric question for configmap updates (and secrets now that we are going to allow updates) is I updated my configMap/Secret... Now when should I expect to see my app use the new files?
If we think the system should be able to answer this for the user, then it would need to know the semantics of the application in the container. We thought there were 3 styles of applications:
apps that need to restart to read config.
apps that reread config after a signal (HUP being the classic example)
apps that poll or inotify to detect file changes and reload
Users of secrets and configmap need to be aware which type of app they have, so they can know what the steps are to complete a config update / secret rotation for their app.
I wonder if we should define something like:
type ConfigPushAction string

const (
    // Restart means that the app requires restart to pick up changes to a configuration file.
    ConfigPushActionRestart ConfigPushAction = "Restart"
    // Hup means that the app requires a sighup to pickup changes to a configuration file.
    ConfigPushActionHup ConfigPushAction = "Hup"
    // None means that the app can detect new config files and so requires no action after config is pushed.
    ConfigPushActionNone ConfigPushAction = "None"
)

...

type Container struct {
    ...
    ConfigPushAction ConfigPushAction `omitempty, name:configPushAction`
    ...
}
It would get tricky though when you start to get into which pid to signal (maybe better to use the ExecProbe model instead, or for special cases). It also gets tricky if different files have different semantics.
But, if you get it right, then you can automate more of the update process.
No dynamic configuration rollout
ConfigMaps currently cannot be rolled out: if you update a ConfigMap's content, the rc or dc does nothing and no rolling update happens. The current practice is to create a new ConfigMap, modify the dc or rc to reference it, trigger a dc rolling update, and delete the old ConfigMap once that finishes; see Facilitate ConfigMap rollouts / management #22368
Secrets have the same problem.
Related OpenShift ConfigMap issue: Provide an option to redeploy deployment when config map used changes #9146
A similar issue about secret updates: Trigger to redeploy containers when a secret changes #7019
In-place rolling updates #9043 is not supported yet either, i.e. a rolling update can only schedule new pods, which does not work for stateful pods.
Canarying mechanism for ConfigMap #20200 also asks for in-place updates.
The currently recommended approach upstream:
I agree with @thockin's last proposal. The right thing to do here is create a new ConfigMap and do a rolling update to switch to it, using the new Deployment API.
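In practice that workflow looks something like this (a sketch; my-config-v1/v2, app.properties and my-app are placeholders):
# create a new ConfigMap next to the old one
kubectl create configmap my-config-v2 --from-file=app.properties
# point the deployment at my-config-v2 (edit the volume/env reference), which triggers a rolling update
kubectl edit deployment/my-app
# once the rollout is done, remove the old ConfigMap
kubectl delete configmap my-config-v1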
An issue from thockin about supporting dynamic reconfiguration: Feature request: A way to signal pods #24957
log
Discussion about logging: Kubernetes logging, journalD, fluentD, and Splunk, oh my! #24677; related doc: What I would like from Kubernetes Logging Volumes.
[WIP] kubelet-cri: create a new logging mechanism #33111
CRI: Add kuberuntime container logs #35348
service
DNS injection of external services
Support for injecting an external service such as RDS: for example, you have an RDS instance on AWS that applications inside the Kubernetes cluster need to reference. See this proposal: Add proposal for service externalName #29073. Usage looks like this:
apiVersion: v1
kind: Service
metadata:
  name: my-rds
spec:
  ports:
  - port: 12345
  type: ExternalName
  externalName: myapp.rds.whatever.aws.says
Node-local services
Some daemonset pods are exposed to in-cluster applications through a Service, but a Service picks an arbitrary pod to handle each request. For something like fluentd this is a problem: it creates cross-node traffic, because pods on a node should only send their logs to the local fluentd pod, yet going through the Service the logs may land on another node. That is clearly unreasonable, especially when the logs need node-specific information.
To solve this, see this proposal: Initial proposal for node-local services #28637
LB
There is a lot worth learning in use iptables for proxying instead of userspace #3760
It addresses the external-LB double-hop and SNAT problems.
monitor
cAdvisor supports application metrics, implemented via labels and a container volume: Introduce direct API for application metrics #1016.
Embedding cAdvisor in kubelet makes kubelet expensive and causes many problems (Standalone cAdvisor for monitoring #18770), mainly around kubelet stability and performance (Provide an option to disable/mock cadvisor in kubelet #16296), and some users do not use cAdvisor at all. A redesign is needed to separate kubelet and cAdvisor; once separated, cAdvisor has no pod labels, so users who monitor through cAdvisor cannot aggregate by pod. The hope is that cAdvisor will export pod labels: cAdvisor should export pod labels for container metrics #32326.
The newly defined container runtime interface also defines container metrics; related issue: A better story of container metrics for runtime integration #27097.
New monitoring architecture design: Add monitoring architecture #34758.
Proposal: Introduce Custom Metrics API #34586
Footnote
- [1] http://events.linuxfoundation.org/sites/events/files/slides/p0.pp_.pdf
- [2] https://wiki.archlinux.org/index.php/Solid_State_Drives
- [3] https://chrisirwin.ca/posts/discard-with-kvm/
- [4] https://wiki.netbsd.org/tutorials/how_to_setup_virtio_scsi_with_qemu/
- [5] https://github.com/ceph/ceph-docker/tree/master/demo
- [6] Allow fieldSelectors to match arbitrary values
- [7] Generic field selectors #1362