Dev

The Kubernetes project supports developers very well; the GitHub repository has a very detailed contributor guide. Below is a brief record of the process.

Kubernetes is written in Go, and Go's package dependency management is still fairly messy. Go 1.5 introduced the vendor feature to address this, but it is still experimental and the Kubernetes project does not use it; instead it uses godep. This is something to be aware of, so it is worth getting familiar with godep first.

A quick introduction to godep: godep creates a Godeps directory under the kubernetes tree that records a copy of the project's dependencies, including their versions. When you need to set up your own development environment, a simple godep restore is enough; godep fetches the correct versions of the dependencies for you, so everyone gets a consistent environment.
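As a quick reference, the two godep commands you will use most (run from the kubernetes source directory):

# fetch the dependency versions pinned in Godeps/Godeps.json into your GOPATH
godep restore -v

# after adding or upgrading a dependency, re-record the pinned versions
godep save ./...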

With godep in use, your build commands must be prefixed with godep; for example, go build becomes godep go build. Pay special attention to this, otherwise you will run into errors like the following:

kubelet git:(master)$ go install
# k8s.io/kubernetes/pkg/kubelet/rkt
../../pkg/kubelet/rkt/rkt.go:168: cannot use apisvcConn (type *"google.golang.org/grpc".ClientConn) as type *"github.com/coreos/rkt/Godeps/_workspace/src/google.golang.org/grpc".ClientConn in argument to v1alpha.NewPublicAPIClient
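With the godep prefix the same command should go through, for example:

# use the pinned dependencies under Godeps/ instead of whatever is in GOPATH
kubelet git:(master)$ godep go install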

There is still a serious problem: if another project in your GOPATH depends on the same package as Kubernetes, say project A and Kubernetes both depend on package B, and project A upgrades B, then running godep save ./... in the Kubernetes project will pick up the upgraded B as well...

Kubernetes' recommended solution is to keep kubernetes in its own dedicated GOPATH with no other projects in it...

That is a workaround of last resort. If you still hit problems, consider prepending the Godeps workspace to your GOPATH (`godep path`:$GOPATH) so that the packages under Godeps are picked up first.
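A minimal sketch of that GOPATH tweak:

# prepend the Godeps workspace so its packages win over anything else in GOPATH
export GOPATH=$(godep path):$GOPATH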

Check out

Kubernetes follows the usual GitHub contribution workflow: fork the project first, then set it up locally:

  1. fork
  2. mkdir -p ~/workspace/go/kubernetes/k8s.io/kubernetes
  3. cd ~/workspace/go/kubernetes/k8s.io/kubernetes
  4. git clone git@github.com:tacy/kubernetes.git .
  5. git remote add upstream https://github.com/kubernetes/kubernetes.git

Keeping in sync

Keep your fork in sync with the upstream repository:

git fetch upstream
git rebase upstream/master
git remote set-url --push upstream no_push  #prevent push to upstream if you have write access to the main repository

Godep

The Kubernetes project uses godep to manage dependencies, so restore them with godep restore -v (a proxy may be needed depending on your network).

Create branch

  1. git checkout -b myfeature

Build

You can run make directly in the kubernetes directory to build the whole project, or build a specific package; see the Makefile for details.

For example, to build kubelet for Linux with debug symbols: KUBE_BUILD_PLATFORMS=linux/amd64 CGO_ENABLED=1 make WHAT=cmd/kubelet GOGCFLAGS='-N -l'

GOGCFLAGS='-N -l' corresponds to go build -gcflags '-N -l', i.e. optimizations and inlining disabled.

You can also cd into a specific package directory and run godep go install; sometimes you will need this (for example, gocode needs the compiled pkg to provide code completion).

Committing changes to your fork

Before committing any changes, please link/copy these pre-commit hooks into your .git directory. This will keep you from accidentally committing non-gofmt’d go code.

cd kubernetes/.git/hooks/
ln -s ../../hooks/pre-commit .

Then you can commit your changes and push them to your fork:

git commit
git push -f origin myfeature

Creating a pull request

  1. Visit https://github.com/$YOUR_GITHUB_USERNAME/kubernetes
  2. Click the “Compare and pull request” button next to your “myfeature” branch.
  3. Check out the pull request process for more details

kubelet

main: k8s.io/kubernetes/cmd/kubelet.go

server.go -> RunKubelet

eventBroadcaster.StartLogging writes kubelet events to the log; eventBroadcaster.StartRecordingToSink sends kubelet events to the apiserver.

makePodSourceConfig receives all pod update events, via file, URL, or the API server (REST).

app/server.go -> CreateAndInitKubelet() -> makePodSourceConfig() -> kubeletBootstrap.BirthCry() -> kubeletBootstrap.StartGarbageCollection()

APIServer

Scheme

  1. convert: converters between API versions. Conversion funcs are keyed by the (in, out) type pair so the right one can be looked up for a conversion, e.g. Convert_v1alpha1_Flunder_To_wardle_Flunder(in *Flunder, out *wardle.Flunder, s conversion.Scope).
  2. serializer (decode/encode): serialization and deserialization; a request's JSON/YAML is decoded to the internal object, converted to a versioned object, and stored in etcd. The serializer is invoked from the handlers.

Install

KVM env

Bridge

[tacy@tacyArch network]$ cat qemu.netdev
[NetDev]
Name=qemu0
Kind=bridge
[tacy@tacyArch network]$ cat qemu.network
[Match]
Name=qemu0

[Network]
Address=172.18.0.1/16
DNS=233.5.5.5
IPForward=yes

[tacy@tacyArch network]$ systemctl restart systemd-networkd

Iptables

[tacy@tacyArch ~]$ cat /etc/iptables/iptables.rules
# Generated by iptables-save v1.4.21 on Thu Dec 10 10:31:57 2015
*nat
:PREROUTING ACCEPT [20:3432]
:INPUT ACCEPT [18:2776]
:OUTPUT ACCEPT [886:56198]
:POSTROUTING ACCEPT [886:56198]
:DOCKER - [0:0]
-A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER
-A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER
-A POSTROUTING -s 172.18.0.0/16 ! -o docker0 -j MASQUERADE
COMMIT
# Completed on Thu Dec 10 10:31:57 2015
# Generated by iptables-save v1.4.21 on Thu Dec 10 10:31:57 2015
*filter
:INPUT ACCEPT [57801:42904359]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [52989:24492467]
:DOCKER - [0:0]
-A FORWARD -o qemu0 -j DOCKER
-A FORWARD -o qemu0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -i qemu0 ! -o qemu0 -j ACCEPT
-A FORWARD -i qemu0 -o qemu0 -j ACCEPT
COMMIT
# Completed on Thu Dec 10 10:31:57 2015

Dnsmasq

edit /etc/dnsmasq.conf:

interface=qemu0
listen-address=172.18.0.1,127.0.0.1
dhcp-range=172.18.100.1,172.18.100.254,255.255.0.0

Docker custom

Let Docker and KVM share the same bridge, disable Docker's own iptables and IP forwarding management, and make Docker go through a proxy:

[tacy@tacyArch ~]$ cat /etc/systemd/system/docker.service.d/custom.conf
[Service]
Environment="HTTP_PROXY=http://127.0.0.1:9001/" "HTTPS_PROXY=http://127.0.0.1:9001/"
ExecStart=
ExecStart=/usr/bin/docker daemon -H fd:// --exec-opt native.cgroupdriver=cgroupfs -b=qemu0 --fixed-cidr=172.18.1.1/24 --iptables=false --ip-forward=false
[root@localhost ~]# systemctl show --property=Environment docker
Environment=GOTRACEBACK=crash HTTP_PROXY=http://172.18.0.1:9001/ HTTPS_PROXY=http://172.18.0.1:9001/

Create VM

Create VM img

Download the CentOS cloud image: curl http://cloud.centos.org/centos/7/images/CentOS-7-x86_64-GenericCloud-1510.qcow2 -o CentOS-7-x86_64-GenericCloud.qcow2

qemu-img create -f qcow2 -o backing_file=CentOS-7-x86_64-GenericCloud.qcow2,backing_fmt=qcow2 master.qcow2
qemu-img create -f qcow2 -o backing_file=CentOS-7-x86_64-GenericCloud.qcow2,backing_fmt=qcow2 node-one.qcow2
qemu-img create -f qcow2 -o backing_file=CentOS-7-x86_64-GenericCloud.qcow2,backing_fmt=qcow2 node-two.qcow2

Cloud init

[tacy@tacyArch qemu]$ ls
cloud-init  vm
[tacy@tacyArch qemu]$ cd cloud-init/
[tacy@tacyArch cloud-init]$ ls
meta-data  user-data
[tacy@tacyArch cloud-init]$ cat meta-data
{}
[tacy@tacyArch cloud-init]$ cat user-data
#cloud-config
users:
  - name: root
    ssh-authorized-keys:
    - ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDJBe4qjPGBqnoE6Up6aB6jBOSBK1aqOjpX8fU8nvneKdzKmH0xTX5nRsfiZdTbJWX7CfjnrA0
[tacy@tacyArch qemu]$ genisoimage  -output seed.iso -volid cidata -joliet -rock user-data meta-data

https://cloudinit.readthedocs.org/en/latest/topics/datasources.html#no-cloud

Shell

qemu-ifup.sh / qemu-ifdown.sh
##/etc/qemu-ifup.sh##
#!/bin/sh

echo "Executing /etc/qemu-ifup"
echo "Bringing up $1 for bridged mode..."
sudo /usr/bin/ip link set $1 up promisc on
echo "Adding $1 to qemu0..."
sudo /usr/bin/brctl addif qemu0 $1
sleep 2

##/etc/qemu-ifdown.sh##
#!/bin/sh

echo "Executing /etc/qemu-ifdown"
sudo /usr/bin/ip link set $1 down
sudo /usr/bin/brctl delif qemu0 $1
sudo /usr/bin/ip link delete dev $1

##qemu-mac-hasher.py## (mac address)
#!/usr/bin/env python

import sys
import zlib

if len(sys.argv) != 2:
    print("usage: %s <VM Name>" % sys.argv[0])
    sys.exit(1)

crc = zlib.crc32(sys.argv[1].encode("utf-8")) & 0xffffffff
crc = str(hex(crc))[2:]
print("52:54:%s%s:%s%s:%s%s:%s%s" % tuple(crc))

##run-qemu.sh##
#!/bin/bash
USERID=$(whoami)

print_usage() {
  echo "Usage:

  $0 {-n,-m,-i,-sp,-mp} ARG

Options:

  --name ARG
  -n ARG

  --memory ARG
  -m ARG

  --image ARG
  -i ARG

  --serialport ARG
  -sp ARG

  --monitorport ARG
  -mp ARG

" >&2
}

if [ $# -le 0 ]; then
  print_usage
  exit 1
fi

while [[ $# > 1 ]]
do
key="$1"

case $key in
    -n|--name)
    NAME="$2"
    shift # past argument
    ;;
    -m|--memory)
    MEMORY="$2"
    shift # past argument
    ;;
    -i|--image)
    IMAGE="$2"
    shift # past argument
    ;;
    -mp|--monitorport)
    MP="$2"
    shift # past argument
    ;;
    -sp|--serialport)
    SP="$2"
    shift # past argument
    ;;
    --default)
    DEFAULT=YES
    ;;
    *)
            # unknown option
    ;;
esac
shift # past argument or value
done

# Get name of newly created TAP device; see https://bbs.archlinux.org/viewtopic.php?pid=1285079#p1285079
precreationg=$(/usr/bin/ip tuntap list | /usr/bin/cut -d: -f1 | /usr/bin/sort)
sudo /usr/bin/ip tuntap add user $USERID mode tap
postcreation=$(/usr/bin/ip tuntap list | /usr/bin/cut -d: -f1 | /usr/bin/sort)
IFACE=$(comm -13 <(echo "$precreationg") <(echo "$postcreation"))

MACADDR=`/home/tacy/workspace/qemu/vm/qemu-mac-hasher.py ${NAME}`

qemu-system-x86_64 -name ${NAME} -cpu host -m ${MEMORY} -smp cores=2,threads=1,sockets=1 -machine type=pc,accel=kvm -net nic,macaddr=${MACADDR},model=virtio -net tap,vhost=on,ifname="$IFACE" -serial telnet:localhost:${SP},server,nowait,nodelay -monitor tcp:127.0.0.1:${MP},server,nowait,nodelay -device virtio-scsi-pci,id=scsi -device scsi-hd,drive=hd -drive file=${IMAGE},format=qcow2,cache=writeback,discard=unmap,if=none,id=hd -drive file=/home/tacy/workspace/qemu/vm/seed.iso,if=virtio -nographic

# Note: consider cache=directsync together with io=native here; that should give the best disk performance.

sudo ip link set dev $IFACE down &> /dev/null
sudo ip tuntap del $IFACE mode tap &> /dev/null

Systemd

## ~/.config/systemd/user/qemu@.service
[Unit]
Description=QEMU virtual machine

[Service]
Environment="type=system-x86_64" "haltcmd=kill -INT $MAINPID"
EnvironmentFile=/home/tacy/workspace/qemu/vm/%i
ExecStart=/usr/bin/env /home/tacy/workspace/qemu/vm/run-qemu.sh -n %i  -m $memory -i $image -sp $sp -mp $mp
ExecStop=/bin/sh -c ${haltcmd}
TimeoutStopSec=30
KillMode=none

[Install]
WantedBy=multi-user.target

systemd env file:

## ~/workspace/qemu/vm/master
memory=1024
image=/home/tacy/workspace/qemu/vm/master.qcow2
sp=7101
mp=7001
haltcmd="echo 'system_powerdown' | /usr/bin/nc localhost 7001"

## ~/workspace/qemu/vm/node-one
memory=1024
image=/home/tacy/workspace/qemu/vm/node-one.qcow2
sp=7102
mp=7002
haltcmd="echo 'system_powerdown' | /usr/bin/nc localhost 7002"

## ~/workspace/qemu/vm/node-two
memory=1024
image=/home/tacy/workspace/qemu/vm/node-two.qcow2
sp=7103
mp=7003
haltcmd="echo 'system_powerdown' | /usr/bin/nc localhost 7003"

Start VM & Stop VM

systemctl --user start qemu@master
systemctl --user start qemu@node-one
systemctl --user start qemu@node-two

systemctl --user stop qemu@master
systemctl --user stop qemu@node-one
systemctl --user stop qemu@node-two

VM Config

Docker

yum install docker

##/etc/sysconfig/docker
HTTP_PROXY='http://172.18.0.1:9001'
HTTPS_PROXY='http://172.18.0.1:9001'

SSD tuning

discard

To enable discard for the disk, the drive must be attached through the SCSI driver: -device virtio-scsi-pci,id=scsi -device scsi-hd,drive=hd -drive file=${IMAGE},format=qcow2,cache=writeback,discard=unmap,if=none,id=hd

Then edit fstab: UUID=ba1b9d4d-f899-4121-bc02-b385767de754 / xfs defaults,discard,nobarrier,noatime 0 0

Verify that discard is enabled:

[root@localhost ~]# lsblk -o MOUNTPOINT,DISC-MAX,FSTYPE
MOUNTPOINT DISC-MAX FSTYPE
                 1G
/                1G xfs
                 0B
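With discard enabled you can also trim free space manually from inside the guest if you want to reclaim it immediately (optional):

# trim the root filesystem and report how much was discarded
fstrim -v /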

ioscheduler

[root@localhost ~]# cat /etc/tmpfiles.d/10_ioscheduler.conf
w /sys/block/sda/queue/scheduler - - - - noop

Kubernetes Cluster Setup

Prerequisites

Download the required software:

wget https://github.com/kubernetes/kubernetes/releases/download/v1.1.2/kubernetes.tar.gz
wget https://github.com/projectcalico/calico-kubernetes/archive/master.tar.gz
wget https://github.com/coreos/etcd/releases/download/v2.2.2/etcd-v2.1.3-linux-amd64.tar.gz

Also set a unique hostname on every node (master, node-one, node-two); Calico relies on the hostname as a key identifier.

Etcd & Calico

Kubernetes, Calico, and SkyDNS all need etcd; we configure a single etcd instance shared by all components. Calico gives Kubernetes a BGP (routed) network rather than an overlay; on a layer-2 network Calico is simple to use, while layer-3 deployments need careful planning.

Master

scp etcd calicoctl to master

  • etcd (172.18.100.187 below is the master IP)
[root@master bin]# cat /etc/systemd/system/etcd.service
[Unit]
Description=Etcd service
Documentation=https://coreos.com/etcd/docs/latest/
Requires=docker.service
After=docker.service

[Service]
ExecStart=/usr/bin/etcd \
  --data-dir=/var/lib/etcd \
  --advertise-client-urls=http://172.18.100.187:6666 \
  --listen-client-urls=http://0.0.0.0:6666 \
  --listen-peer-urls=http://127.0.0.1:2380 \
  --name=etcd
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target

start & enable etcd: systemctl start etcd && systemctl enable etcd

  • calico
[root@master bin]# cat /etc/systemd/system/calico-node.service
[Unit]
Description=calicoctl node
After=docker.service
Requires=docker.service

[Service]
User=root
Environment="ETCD_AUTHORITY=127.0.0.1:6666"
PermissionsStartOnly=true
ExecStart=/usr/bin/calicoctl node --detach=false
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target

enable & start calico: systemctl enable calico-node && systemctl start calico-node

Operate etcd and Calico from the host:

etcdctl -C 172.18.100.187:6666 ls --recursive

ETCD_AUTHORITY=172.18.100.187:6666  calicoctl pool show

Calico's default pool is in the 192.168.0.0/16 range, which is easy to collide with; you can define your own range through calicoctl (I changed it to 172.19.0.0/16).
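A sketch of swapping the pool, assuming the pool add/remove syntax of the calicoctl version used here:

# remove the default pool and add a custom one (run against the shared etcd)
ETCD_AUTHORITY=172.18.100.187:6666 calicoctl pool remove 192.168.0.0/16
ETCD_AUTHORITY=172.18.100.187:6666 calicoctl pool add 172.19.0.0/16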

Node

[root@node-one ~]# cat /etc/network-environment
#! /usr/bin/bash

# This node's IPv4 address
DEFAULT_IPV4=172.18.100.122

# The Kubernetes master IP
KUBERNETES_MASTER=172.18.100.187

# IP and port of etcd instance used by Calico
ETCD_AUTHORITY=172.18.100.187:6666

# URL to access the Kubernetes apiserver
KUBE_API_ROOT=http://172.18.100.187:8080/api/v1/

# Enable Calcio IPAM
CALICO_IPAM=true

Note: each node must set DEFAULT_IPV4 to its own IP address.

[root@node-one ~]# cat /etc/systemd/system/calico-node.service
[Unit]
Description=Calico per-node agent
Documentation=https://github.com/projectcalico/calico-docker
Requires=docker.service
After=docker.service

[Service]
EnvironmentFile=/etc/network-environment
User=root
PermissionsStartOnly=true
#ExecStart=/usr/bin/calicoctl node --ip=${DEFAULT_IPV4} --kubernetes --kube-plugin-version=v0.6.1 --detach=false
# use CNI: https://github.com/projectcalico/calico-cni
ExecStart=/usr/bin/calicoctl node --ip=${DEFAULT_IPV4} --detach=false
Restart=always
RestartSec=10

enable & start calico: systemctl enable calico-node && systemctl start calico-node

Kubernetes

Master

scp kube-apiserver kubectl kube-scheduler kube-controller-manager kubelet to master

[root@master bin]# cat /etc/systemd/system/kube-apiserver.service
[Unit]
Description=Kubernetes API Server
Documentation=https://github.com/kubernetes/kubernetes
Requires=etcd.service
After=etcd.service

[Service]
ExecStart=/usr/bin/kube-apiserver \
  --allow-privileged=true \
  --etcd-servers=http://127.0.0.1:6666 \
  --insecure-bind-address=0.0.0.0 \
  --service-cluster-ip-range=10.100.0.0/24 \
  --logtostderr=true
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target


[root@master bin]# cat /etc/systemd/system/kube-scheduler.service
[Unit]
Description=Kubernetes Scheduler
Documentation=https://github.com/kubernetes/kubernetes
Requires=kube-apiserver.service
After=kube-apiserver.service

[Service]
ExecStart=/usr/bin/kube-scheduler \
  --master=127.0.0.1:8080 \
  --logtostderr=true
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target


[root@master bin]# cat /etc/systemd/system/kube-controller-manager.service
[Unit]
Description=Kubernetes Controller Manager
Documentation=https://github.com/kubernetes/kubernetes
Requires=kube-apiserver.service
After=kube-apiserver.service

[Service]
ExecStart=/usr/bin/kube-controller-manager \
  --master=127.0.0.1:8080 \
  --logtostderr=true
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target


[root@master bin]# cat /etc/systemd/system/kubelet.service
[Unit]
Description=Kubernetes Kubelet
Documentation=https://github.com/kubernetes/kubernetes
Requires=docker.service
After=docker.service

[Service]
ExecStart=/usr/bin/kubelet \
--config=/etc/kubernetes/manifests \
--logtostderr=true
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target

start & enable all the kube services
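For example, for the units defined above:

# enable and start every master-side component in one go
for svc in kube-apiserver kube-scheduler kube-controller-manager kubelet; do
    systemctl enable $svc && systemctl start $svc
done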

Node

mkdir -p /opt/cni/bin
sudo wget -N -P /opt/cni/bin/ https://github.com/projectcalico/calico-cni/releases/download/v0.2.0/calico
sudo wget -N -P /opt/cni/bin/ https://github.com/projectcalico/calico-cni/releases/download/v0.2.0/calico-ipam

mkdir -p /etc/cni/net.d/
cat /etc/cni/net.d/10-calico.conf

{
    "name": "calico-k8s-network",
    "type": "calico",
    "etcd_authority": "172.18.100.187:6666",
    "log_level": "debug",
    "ipam": {
        "type": "calico-ipam"
    }
}
[root@node-one ~]# cat /etc/systemd/system/kubelet.service
[Unit]
Description=Kubernetes Kubelet
Documentation=https://github.com/kubernetes/kubernetes
After=calico-node.service
Requires=calico-node.service

[Service]
EnvironmentFile=/etc/network-environment
ExecStart=/usr/bin/kubelet \
--address=0.0.0.0 \
--port=10250 \
--hostname_override=${DEFAULT_IPV4} \
--cluster-dns=10.100.0.10 \
--cluster-domain=cluster.local \
--api_servers=${KUBERNETES_MASTER}:8080 \
--network-plugin=cni \
--network-plugin-dir=/etc/cni/net.d \
--logtostderr=true
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target


[root@node-one ~]# cat /etc/systemd/system/kube-proxy.service
[Unit]
Description=Kubernetes Proxy
Documentation=https://github.com/kubernetes/kubernetes
After=calico-node.service
Requires=calico-node.service

[Service]
EnvironmentFile=/etc/network-environment
ExecStart=/usr/bin/kube-proxy --master=http://${KUBERNETES_MASTER}:8080 --logtostderr=true --proxy-mode=iptables
Restart=always
RestartSec=10

Kubeadm

docker images: https://quay.io/repository/mritd/kubernetes-dashboard-amd64?tab=tags

kubectl proxy --address 172.18.100.28 --accept-hosts='^*$'

Preparation (a sketch of these steps follows below):

  • disable firewalld
  • disable selinux
  • set the hostname
  • kubelet flags: "--cgroup-driver=systemd --pod-infra-container-image=tacylee/pause-amd64:3.0"
  • export KUBE_REPO_PREFIX=tacylee
  • export KUBECONFIG=/etc/kubernetes/admin.conf
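One way to apply the prep steps above (the hostname here is only an example):

systemctl disable firewalld && systemctl stop firewalld
setenforce 0                          # also set SELINUX=disabled in /etc/selinux/config
hostnamectl set-hostname kube-master  # example hostname
export KUBE_REPO_PREFIX=tacylee
export KUBECONFIG=/etc/kubernetes/admin.conf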

The scripts for building the RPMs are in the kubernetes/release repository.

Using Kubernetes

start & stop env

docker start {ceph_container_id}
systemctl --user start qemu@master
systemctl --user start qemu@node-one
systemctl --user start qemu@node-two

Some frequently used commands:

ETCD_AUTHORITY=172.18.100.187:6666 ./calicoctl endpoint show --detailed

etcdctl -C 172.18.100.187:6666 ls --recursive /calico

label nodes

kubectl -s 172.18.100.187:8080 label nodes '172.18.100.122' skydns=server

skydns

A service-discovery add-on implemented with DNS. Kubernetes' DNS-based discovery is quite flexible:

  • DNS records can be created for Services

A Service is only a virtual IP and does not actually terminate traffic itself, so there is no health concern for the record.

  • Multiple DNS records can also be created for Pods

If you prefer your own service-discovery component, you can create a set of DNS records for the Pods and implement the load balancing yourself.

To use SkyDNS, just set the cluster-dns flag when starting kubelet; containers will then use that IP as their DNS server.

Create SkyDNS on the node labeled skydns=server:

kubectl -s 172.18.100.187:8080 create -f skydns-rc.yaml
kubectl -s 172.18.100.187:8080 create -f skydns-svc.yaml

查看skydns运行情况:

kubectl -s 172.18.100.187:8080 get -o wide rc --all-namespaces
kubectl -s 172.18.100.187:8080 get -o wide pods --all-namespaces
kubectl -s 172.18.100.187:8080 exec busybox nslookup kubernetes.default
Server:    10.100.0.10
Address 1: 10.100.0.10

Name:      kubernetes.default
Address 1: 10.100.0.1

kubectl -s 172.18.100.187:8080 exec busybox nslookup kube-ui.kube-system
Server:    10.100.0.10
Address 1: 10.100.0.10

Name:      kube-ui.kube-system
Address 1: 10.100.0.243
# skydns-rc.yaml
apiVersion: v1
kind: ReplicationController
metadata:
  name: kube-dns-v8
  namespace: kube-system
  labels:
    k8s-app: kube-dns
    version: v8
    kubernetes.io/cluster-service: "true"
spec:
  replicas: 1
  selector:
    k8s-app: kube-dns
    version: v8
  template:
    metadata:
      labels:
        k8s-app: kube-dns
        version: v8
        kubernetes.io/cluster-service: "true"
    spec:
      containers:
      - name: kube2sky
        image: gcr.io/google_containers/kube2sky:1.11
        resources:
          limits:
            cpu: 100m
            memory: 50Mi
        args:
        # command = "/kube2sky"
        - -domain=cluster.local
        - -kube_master_url=http://172.18.100.187:8080
        - -etcd-server=http://172.18.100.187:6666
      - name: skydns
        image: gcr.io/google_containers/skydns:2015-03-11-001
        resources:
          limits:
            cpu: 100m
            memory: 50Mi
        args:
        # command = "/skydns"
        - -machines=http://172.18.100.187:6666
        - -addr=0.0.0.0:53
        - -domain=cluster.local.
        ports:
        - containerPort: 53
          name: dns
          protocol: UDP
        - containerPort: 53
          name: dns-tcp
          protocol: TCP
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 30
          timeoutSeconds: 5
      - name: healthz
        image: gcr.io/google_containers/exechealthz:1.0
        resources:
          limits:
            cpu: 10m
            memory: 20Mi
        args:
        - -cmd=nslookup kubernetes.default.svc.cluster.local localhost >/dev/null
        - -port=8080
        ports:
        - containerPort: 8080
          protocol: TCP
      dnsPolicy: Default  # Don't use cluster DNS.
      nodeSelector:
        skydns: server


# skydns-svc.yaml
apiVersion: v1
kind: Service
metadata:
  name: kube-dns
  namespace: kube-system
  labels:
    k8s-app: kube-dns
    kubernetes.io/cluster-service: "true"
    kubernetes.io/name: "KubeDNS"
spec:
  selector:
    k8s-app: kube-dns
  clusterIP: 10.100.0.10
  ports:
  - name: dns
    port: 53
    protocol: UDP
  - name: dns-tcp
    port: 53
    protocol: TCP

Debug

kubectl run -i --tty busybox --image=busybox -- sh

E2E Performance Test

There is an end-to-end test for collecting overall resource usage of node components: kubelet_perf.go. To run the test, simply make sure you have an e2e cluster running (go run hack/e2e.go -up) and set up correctly.

Run the test with go run hack/e2e.go -v -test --test_args="--ginkgo.focus=resource\susage\stracking". You may also wish to customise the number of pods or other parameters of the test (remember to rerun make WHAT=test/e2e/e2e.test after you do).

Profiling

Kubelet installs the go pprof handlers, which can be queried for CPU profiles:

$ kubectl proxy &
Starting to serve on 127.0.0.1:8001
$ curl -G "http://localhost:8001/api/v1/proxy/nodes/${NODE}:10250/debug/pprof/profile?seconds=${DURATION_SECONDS}" > $OUTPUT
$ KUBELET_BIN=_output/dockerized/bin/linux/amd64/kubelet
$ go tool pprof -web $KUBELET_BIN $OUTPUT

pprof can also provide heap usage, from the /debug/pprof/heap endpoint (e.g. http://localhost:8001/api/v1/proxy/nodes/${NODE}:10250/debug/pprof/heap).

More information on go profiling can be found here.

Benchmarks

Before jumping through all the hoops to measure a live Kubernetes node in a real cluster, it is worth considering whether the data you need can be gathered through a Benchmark test. Go provides a really simple benchmarking mechanism, just add a unit test of the form:

// In foo_test.go
func BenchmarkFoo(b *testing.B) {
  b.StopTimer()
  setupFoo() // Perform any global setup
  b.StartTimer()
  for i := 0; i < b.N; i++ {
    foo() // Functionality to measure
  }
}

Then:

$ go test -bench=. -benchtime=${SECONDS}s foo_test.go

More details on benchmarking here.

DNS

kube-ui

The Kubernetes web portal (the newer project is Dashboard). Deploy kube-ui with the YAML files below. Note that the svc file sets type and nodePort, so kube-ui can be reached from outside the cluster through any node's IP.

# kube-ui-rc.yaml
apiVersion: v1
kind: ReplicationController
metadata:
  name: kube-ui-v4
  namespace: kube-system
  labels:
    k8s-app: kube-ui
    version: v4
    kubernetes.io/cluster-service: "true"
spec:
  replicas: 1
  selector:
    k8s-app: kube-ui
    version: v4
  template:
    metadata:
      labels:
        k8s-app: kube-ui
        version: v4
        kubernetes.io/cluster-service: "true"
    spec:
      containers:
      - name: kube-ui
        image: gcr.io/google_containers/kube-ui:v4
        resources:
          limits:
            cpu: 100m
            memory: 50Mi
        ports:
        - containerPort: 8080
        livenessProbe:
          httpGet:
            path: /
            port: 8080
          initialDelaySeconds: 30
          timeoutSeconds: 5
      nodeSelector:
        kube-ui: server


# kube-ui-svc.yaml
apiVersion: v1
kind: Service
metadata:
  name: kube-ui
  namespace: kube-system
  labels:
    k8s-app: kube-ui
    kubernetes.io/cluster-service: "true"
    kubernetes.io/name: "KubeUI"
spec:
  selector:
    k8s-app: kube-ui
  clusterIP: 10.100.0.243
  type: NodePort
  ports:
  - port: 80
    targetPort: 8080
    nodePort: 30061
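Once both objects are created, kube-ui should be reachable on port 30061 of any node, for example via node-one's IP from the setup above:

curl http://172.18.100.122:30061/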

Volumes

Kubernetes supports volumes; a container can be bound to a volume, and different volume types have different properties. For example, an rbd volume is not destroyed when the container is destroyed.

Ceph in Docker

If you don't have a Ceph environment, you can create a container-based one with the command below. Note: on my Arch Linux host it has to run as a privileged container, otherwise rbd map fails with a read-only file system error.

#!/bin/bash

sudo docker run -d --net=host --privileged=true --name ceph-cluster \
-v /etc/ceph:/etc/ceph \
-v /var/lib/ceph/:/var/lib/ceph/ \
-e MON_IP=172.18.0.1 \
-e CEPH_NETWORK=172.18.0.0/16 \
ceph/demo

Verify the Ceph status:

[tacy@tacyArch ~]# docker exec {CONTAINER_ID} ceph status
    cluster 9f487f28-d328-4f1a-bdb2-737006c8d7e9
     health HEALTH_OK
     monmap e1: 1 mons at {tacyArch=172.18.0.1:6789/0}
            election epoch 1, quorum 0 tacyArch
     mdsmap e10: 1/1/1 up {0=0=up:active}
     osdmap e22: 1 osds: 1 up, 1 in
            flags sortbitwise
      pgmap v28: 128 pgs, 9 pools, 2808 bytes data, 190 objects
            28159 MB used, 170 GB / 208 GB avail
                 120 active+clean
                   8 active+clean+replay
  client io 81925 B/s rd, 0 B/s wr, 133 op/s

Use rbd volume

Configure the node

yum install ceph-common

Create ceph secret

[tacy@tacyArch ~]# docker exec {CEPH_CONTAINER_ID} ceph auth get-key client.admin
AQC8dINWKUyBNxAARQN5Fz0xNmltCSyRz0924A==

[tacy@tacyArch ~]# echo 'AQC8dINWKUyBNxAARQN5Fz0xNmltCSyRz0924A==' |base64
QVFDOGRJTldLVXlCTnhBQVJRTjVGejB4Tm1sdENTeVJ6MDkyNEE9PQo=

Edit your ceph-secret.yml with the base64 key:

apiVersion: v1
kind: Secret
metadata:
  name: ceph-secret
data:
  key: QVFCQU1vMVZxRTFPTWhBQVZwRVJQY3lRVTVwelU2SU9KMjJ4MXc9PQo=

Add your secret to Kubernetes:

$ kubectl create -f ceph-secret.yaml
$ kubectl get secret
NAME                  TYPE                                  DATA
ceph-secret           Opaque                                1

Create ceph image

$ docker exec {CEPH_CONTAINER_ID} rbd create foo -s 100
$ docker exec {CEPH_CONTAINER_ID} rbd map foo
$ docker exec {CEPH_CONTAINER_ID} mkfs.ext4 /dev/rbd0
$ docker exec {CEPH_CONTAINER_ID} rbd unmap /dev/rbd0

Create pod with foo
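A minimal pod sketch that mounts the foo image, assuming the default rbd pool, the monitor address of the ceph/demo container above, and the ceph-secret created earlier (pod and volume names are illustrative):

kubectl -s 172.18.100.187:8080 create -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: rbd-test
spec:
  containers:
  - name: rbd-rw
    image: busybox
    command: ["sleep", "3600"]
    volumeMounts:
    - name: rbd-foo
      mountPath: /mnt/rbd
  volumes:
  - name: rbd-foo
    rbd:
      monitors:
      - 172.18.0.1:6789
      pool: rbd
      image: foo
      user: admin
      secretRef:
        name: ceph-secret
      fsType: ext4
      readOnly: false
EOF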

tips

kubedash cluster-insight

API

fieldSelector

With kubectl -v=8 you can see every API call kubectl makes.

kubectl proxy
curl 'http://127.0.0.1:8001/api/v1/pods?fieldSelector=spec.nodeName=primeton-tacy-k8s-node2,metadata.namespace=default'

Code

kube-proxy

Two responsibilities: first, watch for endpoint changes and program the iptables rules; second, act as the userspace proxy when the userspace proxy mode is selected.

kubelet

Entry point: cmd/kubelet/server.go:RunKubelet -> CreateAndInitKubelet -> pkg/kubelet.go:NewMainKubelet -> makePodSourceConfig -> pkg/kubelet/config/apiserver.go:NewSourceApiserver, which watches the apiserver and receives all pod update events.

The main goroutine then reads the pod update channel and handles the events.

Initializing the kubelet configuration:

kubelet.go: s := options.NewKubeletServer()
options.go: api.Scheme.Convert(&v1alpha1.KubeletConfiguration{}, &config, nil)

This pulls in the v1alpha1 package, whose register.go injects the default configuration through the following code:

var (
	SchemeBuilder = runtime.NewSchemeBuilder(addKnownTypes, addDefaultingFuncs)
	AddToScheme   = SchemeBuilder.AddToScheme
)

where addDefaultingFuncs is:

func addDefaultingFuncs(scheme *kruntime.Scheme) error {
	return scheme.AddDefaultingFuncs(
		SetDefaults_KubeProxyConfiguration,
		SetDefaults_KubeSchedulerConfiguration,
		SetDefaults_LeaderElectionConfiguration,
		SetDefaults_KubeletConfiguration,
	)
}

The kubelet initializes its configuration defaults through SetDefaults_KubeletConfiguration.

The actual call chain:

return s.converter.Convert(in, out, flags, meta)
return c.doConversion(src, dest, flags, meta, c.convert)
return f(sv, dv, s)

Here f is the dynamically registered SetDefaults_KubeletConfiguration.

Example

ThirdPartyResource

https://github.com/tiaanl/kube-tpr-demo
https://github.com/wfarr/k8s-tpr-playground
https://github.com/kubernetes/kubernetes/pull/43027
https://github.com/kubernetes/client-go/issues/8
https://groups.google.com/forum/#!topic/kubernetes-sig-network/igJrjG-v-Cs

Usage

rolling update

Rolling update: replace the old-version RC with a new-version RC, or update the image directly without writing a new RC file. Deployment supports the Recreate and RollingUpdate strategies, but currently cannot express needs like a plain restart.
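For example, with kubectl rolling-update (names here are hypothetical):

# replace an old RC with a new RC defined in a file
kubectl rolling-update my-app-v1 -f my-app-v2.yaml

# or just bump the image of a single-container RC without writing a new file
kubectl rolling-update my-app-v1 --image=myrepo/my-app:v2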

Accessing the cluster

kubectl proxy --accept-hosts='^*$' --address='0.0.0.0'

http://172.18.100.28:8001/api/v1/proxy/nodes/kube-node1.mytacy.com:10250/stats/summary
http://localhost:8080/api/v1/namespaces/kube-system/services/elasticsearch-logging/proxy/

Issue

Arch

  1. arch roadmap: Kubernetes Architectural Roadmap (was Core/Layers Working Doc)

Volume

  1. Docker-style volume sharing: Idea: New volume type: “container” #831

Schedule

  1. Rescheduling support: Controlled Rescheduling in Kubernetes

security

  1. Asynchronous admission control: Extension of Admission Control via Initializers and External Admission Enforcement

API Machine

  1. API Extensions position statement
  2. Two Ways to Extend the K8s API

sig

configmap

Use Cases

As a user, I want to be able to consume configuration data as environment variables. As a user, I want to be able to consume configuration data as files in a volume. As a user, I want my view of configuration data in files to be eventually consistent with changes to the data.

When consumed as a volume, updates are propagated into the Pod dynamically and the application can watch the files to pick them up; when consumed as environment variables, updates are not propagated.

Secret volume should refresh when secrets are updated #18372: this issue has been fixed, i.e. secret refresh is now supported (visible inside the Pod; previously you had to delete the secret and create a new one). It includes the following discussion:

Two usage patterns for ConfigMap

I think there are 2 patterns people will use with ConfigMap:

Update the ConfigMap and expect it to immediately propagate to all instances. They may watch their configurations using using inotify, or expect a HUP, or just restart. This is useful for applications where all replicas need consistent configuration. I'd guess etcd and Zookeeper are in this category.

Create a new ConfigMap, update the pod template to reference it, and roll it out via rolling update. This is useful when replicas don't need identical configuration and one is worried about pushing bad configs, which is a common source of failure.

Updating ConfigMap and not propagating the change is just confusing. Expecting users to kill pods in order to implicitly pick up changes lacks transparency and predictability, which is why we moved away from that approach in rolling update.

Other than "it was simpler to implement", what's the rationale for not propagating Secret updates?

The following covers the different scenarios for ConfigMap updates:

Re: picking up new config:

The user-centric question for configmap updates (and secrets now that we are going to allow updates) is I updated my configMap/Secret... Now when should I expect to see my app use the new files?

If we think the system should be able to answer this for the user, then it would need to know the semantics of the application in the container. We thought there were 3 styles of applications:

apps that need to restart to read config.
apps that reread config after a signal (HUP being the classic example)
apps that poll or inotify to detect file changes and reload
Users of secrets and configmap need to be aware which type of app they have, so they can know what the steps are to complete a config update / secret rotation for their app.

I wonder if we should define something like:

type ConfigPushAction string

const (
        // Restart means that the app requires restart to pick up changes to a configuration file.
        ConfigPushActionRestart ConfigPushAction = "Restart"
        // Hup means that the app requires a sighup to pickup changes to a configuration file.
        ConfigPushActionHup ConfigPushAction = "Hup"
        // None means that the app can detect new config files and so requires no action after config is pushed.
        ConfigPushActionNone ConfigPushAction = "None"
)
...
type Container struct {
...
   ConfigPushAction ConfigPushAction `omitempty, name:configPushAction`
...
}
It would get tricky though when you start to get into which pid to signal (maybe better to use the ExecProbe model instead, or for special cases). It also gets tricky if different files have different semantics.

But, if you get it right, then you can automate more of the update process.

No dynamic configuration updates

ConfigMaps currently cannot be rolled out: if you update a ConfigMap's content, the RC or DC takes no action, so there is no rolling update. The current practice is to create a new ConfigMap, modify the DC or RC to reference it, trigger a DC rolling update, and delete the old ConfigMap once done. See: Facilitate ConfigMap rollouts / management #22368
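A sketch of that workaround (all names hypothetical):

# create the new ConfigMap alongside the old one
kubectl create configmap my-config-v2 --from-file=./conf

# point the Deployment/RC at my-config-v2 (edit the volume/env reference),
# which triggers a rolling update
kubectl edit deployment my-app

# once the rollout is done, remove the old ConfigMap
kubectl delete configmap my-config-v1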

Secrets have the same problem…

A related OpenShift ConfigMap issue: Provide an option to redeploy deployment when config map used changes #9146

A similar issue about secret updates: Trigger to redeploy containers when a secret changes #7019

In-place rolling updates #9043 is not supported yet, i.e. a rolling update can only schedule new pods, which does not work for stateful pods.

Canarying mechanism for ConfigMap #20200 also asks for in-place updates.

The currently recommended solution upstream:

I agree with @thockin's last proposal. The right thing to do here is create a new ConfigMap and do a rolling update to switch to it, using the new Deployment API.

An issue from thockin proposing support for dynamic configuration: Feature request: A way to signal pods #24957

log

Discussion about logging: Kubernetes logging, journalD, fluentD, and Splunk, oh my! #24677; a related doc: What I would like from Kubernetes Logging Volumes.

[WIP] kubelet-cri: create a new logging mechanism #33111

CRI: Add kuberuntime container logs #35348

Handle logging in CRI #30709

service

Injecting external services via DNS

Supports injecting external services such as RDS: for example, if you have an RDS instance on AWS that applications inside the Kubernetes cluster need to reference, see this proposal: Add proposal for service externalName #29073. Usage looks like this:

apiVersion: v1
kind: Service
metadata:
  name: my-rds
spec:
  ports:
  - port: 12345
  type: ExternalName
  externalName: myapp.rds.whatever.aws.says

Node-local services

Some DaemonSet pods are exposed to in-cluster applications through a Service, but a Service picks an arbitrary pod to serve each request. That is a problem for something like fluentd: it creates cross-node traffic, because pods on a node only need to send their logs to the local fluentd pod, yet through a Service the logs may end up on another node, which is clearly unreasonable, especially when the logs need node-specific information.

To solve this, see this proposal: Initial proposal for node-local services #28637

LB

There is a lot worth studying in: use iptables for proxying instead of userspace #3760

It addresses the external LB double-hop and SNAT problems.

monitor

cAdvisor supports application metrics, implemented through labels and a container volume: Introduce direct API for application metrics #1016.

After cAdvisor was integrated into the kubelet, the kubelet's resource consumption grew significantly and caused many problems: Standalone cAdvisor for monitoring #18770, mainly kubelet stability and performance issues, Provide an option to disable/mock cadvisor in kubelet #16296. Moreover, some users don't use cAdvisor at all, so a redesign is needed to separate the kubelet and cAdvisor. Once separated, cAdvisor has no pod labels, so people who monitor with cAdvisor cannot aggregate by pod; the hope is that cAdvisor will export pod labels: cAdvisor should export pod labels for container metrics #32326.

The newly defined container runtime interface defines container metrics; related issue: A better story of container metrics for runtime integration #27097.

The new monitoring architecture design: Add monitoring architecture #34758.

Proposal: Introduce Custom Metrics API #34586
