Ganeti

A cluster virtualization manager.

Guido Trotter <ultrotter@google.com>

  • Google, Ganeti, Debian
© 2010-2013 Google
Use under GPLv2+ or CC-by-SA
Some images borrowed/modified from Lance Albertson and Iustin Pop

Outline

What can it do?

ganeti-cluster.png

Ideas

Terminology

terminology.png

Technologies

tech.png

Node roles (management level)

Node roles (instance hosting level)

Newer features

New features in 2.6

The very stable version (since Jul 2012):

New features in 2.7

Release candidate:

New features in 2.8

At beta stage:

What to expect

Just ideas, not promises:

Initializing your cluster

The node needs to be set up following our installation guide.

gnt-cluster init [-s ip] ... \
  --enabled-hypervisors=kvm cluster
cluster0.png

gnt-cluster

Cluster-wide operations:

gnt-cluster info
gnt-cluster modify [-B/H/N ...]
gnt-cluster verify
gnt-cluster master-failover
gnt-cluster command/copyfile ...
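Commands like gnt-cluster verify are easy to wire into monitoring. A minimal Python sketch (the wrapper and its defaults are illustrative, not part of Ganeti):

```python
import subprocess

def healthy(cmd=("gnt-cluster", "verify")):
    """Return True when the check command exits 0.

    The default command assumes Ganeti is installed on this host;
    swap in any health-check command line you like.
    """
    try:
        return subprocess.run(list(cmd), capture_output=True).returncode == 0
    except FileNotFoundError:
        # Ganeti tools not installed on this machine
        return False
```

Run it from cron and alert when it returns False.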

Adding nodes

gnt-node add [-s ip] node2
gnt-node add [-s ip] node3
nodes.png

Adding instances

# install instance-{debootstrap, image}
gnt-os list
gnt-instance add -t drbd \
  { -n node3:node2 | -I hail } \
  -o debootstrap+default web
ping web
ssh web # easy with OS hooks
newinstance.png
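Scripting many such additions is mostly string assembly. A sketch that builds the command line above (the helper and its defaults are illustrative; only the flags shown on the slide are used):

```python
def instance_add_cmd(name, os_variant="debootstrap+default",
                     template="drbd", nodes=None, allocator="hail"):
    """Build a gnt-instance add command line.

    Either pin the placement with nodes=(primary, secondary),
    or leave nodes unset to let an allocator (e.g. hail) decide.
    """
    cmd = ["gnt-instance", "add", "-t", template]
    if nodes:
        cmd += ["-n", ":".join(nodes)]
    else:
        cmd += ["-I", allocator]
    cmd += ["-o", os_variant, name]
    return cmd
```

Feed the result to subprocess.run() on a node with Ganeti installed.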

gnt-node

Per-node operations:

gnt-node remove node4
gnt-node modify \
  [ --master-candidate yes|no ] \
  [ --drained yes|no ] \
  [ --offline yes|no ] node2
gnt-node evacuate/failover/migrate
gnt-node powercycle

gnt-instance

Instance operations:

gnt-instance start/stop i0
gnt-instance modify ... i0
gnt-instance info i0
gnt-instance migrate i0
gnt-instance console i0

-t drbd

DRBD provides redundancy for instance data and makes live migration possible without shared storage between the nodes.

drbd.png
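On a node you can inspect the replication state in /proc/drbd. A small parsing sketch, assuming the standard per-minor line format (" 0: cs:Connected ro:Primary/Secondary ..."):

```python
import re

def drbd_connection_states(proc_drbd_text):
    """Map each DRBD minor number to its connection state (cs: field).

    Expects text in the /proc/drbd format, e.g.:
      0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----
    """
    states = {}
    for match in re.finditer(r"^\s*(\d+):\s+cs:(\S+)", proc_drbd_text, re.M):
        states[int(match.group(1))] = match.group(2)
    return states
```

Anything other than "Connected" (e.g. "WFConnection") means a device is not fully replicating.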

Recovering from failure

# set the node offline
gnt-node modify -O yes node3
failure0.png

Recovering from failure

# failover instances to their secondaries
gnt-node failover --ignore-consistency node3
# or, for each instance:
gnt-instance failover \
  --ignore-consistency web
failure1.png

Recovering from failure

# restore redundancy
gnt-node evacuate -I hail node3
# or, for each instance:
gnt-instance replace-disks \
  { -n node1 | -I hail } web
failure2.png

gnt-backup

Manage instance exports/backups:

gnt-backup export -n node1 web
gnt-backup import -t plain \
  { -n node3 | -I hail } --src-node node1 \
  --src-dir /tmp/myexport web
gnt-backup list
gnt-backup remove

htools: cluster resource management

Written in Haskell.

Controlling Ganeti

(*) Programmable interfaces

Job Queue

gnt-job list
gnt-job info
gnt-job watch
gnt-job cancel
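Jobs can also be watched remotely: the RAPI serves JSON over HTTPS (port 5080 by default), with jobs under /2/jobs/<id>. The helpers below are an illustrative sketch, not a Ganeti client library:

```python
import json

RAPI_PORT = 5080  # Ganeti RAPI default port

def job_url(master, job_id, port=RAPI_PORT):
    """URL of a queued job under the remote API (version 2)."""
    return "https://%s:%d/2/jobs/%d" % (master, port, job_id)

def job_finished(reply_body):
    """True once a job reply (JSON) reports a terminal status."""
    return json.loads(reply_body).get("status") in ("success", "error",
                                                    "canceled")
```

Poll the URL until job_finished() returns True, then inspect the result.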

gnt-group

Managing node groups:

gnt-group add
gnt-group assign-nodes
gnt-group evacuate
gnt-group list
gnt-group modify
gnt-group remove
gnt-group rename
gnt-instance change-group

Managing instance networks

gnt-network was added in 2.7:

gnt-network add
gnt-network connect # to a nodegroup
gnt-network info
gnt-network list
gnt-network modify
...

Running Ganeti in production

What should you add?

Production cluster

As we use it in a Google datacentre:

cluster.png

Fleet at Google

fleet.png

Instance provisioning at Google

virgil.png

Auto node repair at Google

hw-fail.png

Auto node readd at Google

node-readd.png

People running Ganeti

Conclusion

Questions? Feedback? Ideas? Flames?

cc-by-sa.png