OpenShift-on-OpenStack - Creating a cluster with openshift-ansible
09 Nov 2016

We’re going to try to use openshift-ansible to spin up an OpenShift cluster. In the “easy mode” article about OpenShift, we created just a single node, and also a whopping single point of failure. So let’s build on that. We’ll take this to the point where we can see the pods that are running and see where the dashboard is, and in a further article we’ll explore how to set up the networking and at least make it a little more real-world usable.
Before you begin, I recommend you check out the process to build a custom CentOS cloud image as it’s going to be required here.
OpenStack setup
Using the “easy mode” article as a starting point, get yourself an overcloud up with oooq, but with a slight twist: that setup just doesn’t have enough horsepower, errr, RAM power mostly. So I’m gonna deploy more compute nodes, using a custom general_config file at ./tripleo-quickstart/config/general_config/doug.yml. I went ahead and copied minimal.yml and added the parameters I need. While I’m still learning about oooq, I do believe these are overrides for variables that are used throughout the Ansible plays.
Here are the variables that I set differently (and here’s my entire doug.yml file if you please):
# Top-level vars, right up near the top.
control_memory: 8192
compute_memory: 12288
undercloud_memory: 16384
compute_vcpu: 4
compute_disk: 100

# Added to "overcloud_nodes"
- name: compute_1
  flavor: compute
- name: compute_2
  flavor: compute

# I just added the compute-scale here
extra_args: >-
  --compute-scale 3
  --neutron-network-type vxlan
  --neutron-tunnel-types vxlan
  --ntp-server pool.ntp.org
I ran into a number of issues from the undercloud (especially) being light on RAM. I overprovisioned beyond what my test system actually has, and so far my experience has been that it’s better to overprovision in the small environment I’m using now (a single host with 8 cores and 32 GB of RAM).
When you’ve got that new YAML file all set up, go ahead and run oooq as such:
./quickstart.sh --config ./config/general_config/doug.yml eendracht
(And sometimes I add a --extra-vars "force_cached_images=true" to skip re-downloading.)
Don’t forget your overcloud images & introspection.
ssh -F /home/doug/.quickstart/ssh.config.ansible undercloud
source stackrc
openstack overcloud image upload
openstack baremetal import instackenv.json
openstack baremetal configure boot
openstack baremetal introspection bulk start
And also chuck in the subnet DNS.
subnet_id=$(neutron subnet-list | grep -i "start" | awk '{print $2}')
neutron subnet-update $subnet_id --dns-nameserver 8.8.8.8
When we get to the deploy, we’re going to do it in multiple parts since we’re working with a bigger cluster: first with the compute scale set to 1. Side note: I actually tend to do this in a screen session now, and also tee the output to a log. That way if I lose my network connection the deploy keeps running, and I’m not left wondering, “where’d that go, and did it finish?”
openstack overcloud deploy --templates --compute-scale 1 --control-scale 1
And then increase the scale.
openstack overcloud deploy --templates --compute-scale 3 --control-scale 1
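As a concrete sketch of that screen-plus-tee habit (the session and log names here are just my convention, nothing oooq or tripleo expects):

```shell
# Run the deploy in a detached screen session, teeing everything to a
# timestamped log so a dropped SSH connection doesn't lose the deploy.
LOGFILE="deploy-$(date +%Y%m%d-%H%M%S).log"
screen -dmS ocdeploy bash -c \
  "openstack overcloud deploy --templates --compute-scale 3 --control-scale 1 2>&1 | tee $LOGFILE"

# Re-attach to watch it live, or just tail the log:
#   screen -r ocdeploy
#   tail -f "$LOGFILE"
```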
Ok, we’ve got an overcloud! Mine resulted in this URI.
Overcloud Endpoint: http://192.168.24.6:5000/v2.0
So you can get to the OpenStack web GUI with tunneling a la…
ssh -F ~/.quickstart/ssh.config.ansible stack@undercloud -L 8080:192.168.24.6:80
And we’ll create our network again according to the “easy mode” article. (I noticed in #oooq that they mentioned the default netmask changed, so I’ve put my updated steps here:)
neutron net-create ext-net --router:external --provider:physical_network datacentre --provider:network_type flat
neutron subnet-create ext-net --name ext-subnet --allocation-pool start=192.168.24.100,end=192.168.24.120 --disable-dhcp --gateway 192.168.24.1 192.168.24.0/24
neutron router-create router1
neutron router-gateway-set router1 ext-net
neutron net-create int
neutron subnet-create int 30.0.0.0/24 --dns_nameservers list=true 8.8.8.8
neutron router-interface-add router1 $ABOVE_ID
Or, even better, I made a gist of a bash script I use to create my network, for now. Make sure to change the constants at the top of the script to match what you prefer.
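For reference, here’s a minimal sketch of what that kind of script looks like: constants up top, then the same neutron commands as above, with the subnet ID captured so you don’t have to paste it in by hand. The names and CIDRs are just examples matching this article.

```shell
#!/bin/bash
set -e

# Constants -- edit these to match your environment.
EXT_NET=ext-net
EXT_CIDR=192.168.24.0/24
POOL_START=192.168.24.100
POOL_END=192.168.24.120
GATEWAY=192.168.24.1
INT_NET=int
INT_CIDR=30.0.0.0/24
DNS=8.8.8.8

neutron net-create $EXT_NET --router:external \
  --provider:physical_network datacentre --provider:network_type flat
neutron subnet-create $EXT_NET --name ${EXT_NET}-subnet \
  --allocation-pool start=$POOL_START,end=$POOL_END \
  --disable-dhcp --gateway $GATEWAY $EXT_CIDR
neutron router-create router1
neutron router-gateway-set router1 $EXT_NET
neutron net-create $INT_NET

# Capture the new subnet's ID so we can attach it to the router.
subnet_id=$(neutron subnet-create $INT_NET $INT_CIDR \
  --dns_nameservers list=true $DNS | awk '/ id /{print $4}')
neutron router-interface-add router1 $subnet_id
```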
If you’d like, spin up an instance and give it a test.
How about a quick checklist of what your ooo deployment looks like before we move on?
- oooq undercloud
- overcloud image upload, introspection
- oooq overcloud deploy
- setup the networking as above
- add some default security group rules
- upload the custom image to glance
- make sure you have a nova keypair made.
- spin up an instance as a test.
All of these steps can be found between here and the “easy mode” article.
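As a reminder of what the security-group and keypair items on that checklist look like, here’s a sketch with wide-open rules that are fine for a lab (not production); the keypair name is arbitrary.

```shell
# Allow ping and SSH into instances via the default security group.
neutron security-group-rule-create --protocol icmp --direction ingress default
neutron security-group-rule-create --protocol tcp --direction ingress \
  --port-range-min 22 --port-range-max 22 default

# Register the stack user's existing public key as a nova keypair.
nova keypair-add --pub-key ~/.ssh/id_rsa.pub default
```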
Prep to run openshift-ansible (from the undercloud)
Let’s install the rest of what we need. (Note: the python-* clients were already installed.)
ssh -F /home/doug/.quickstart/ssh.config.ansible stack@undercloud
. ~/overcloudrc
# Here's our package deps (need Ansible >= 2.1.0, 2.2 preferred, and Jinja >= 2.7)
sudo yum install -y ansible pyOpenSSL python-cryptography python-novaclient python-neutronclient python-heatclient python-pip
# Apparently we needed this to run the cluster creator (wasn't in the docs, but is now after a PR)
sudo pip install shade
# And get the openshift-ansible repo.
git clone https://github.com/openshift/openshift-ansible.git
cd openshift-ansible/
And let’s also modify the play that creates the heat stack. The timeout is too low.
sed -i 's/timeout 3/timeout 10/' playbooks/openstack/openshift-cluster/launch.yml
OpenShift cluster creator method and pitfalls.
tl;dr – If you’re not concerned with the gotchas, just move on to the next section for what to do next. But come back here if you get stuck somewhere.
First off – we’re going to be performing these actions from our undercloud machine, as we’ve done everything else in this article so far. It has the basics that we need, and it can easily access the overcloud. The gist is that we’re going to SSH to that machine, set up the requirements, then source ~/overcloudrc and kick off the playbooks.
The method we’re using here is to use the openstack features of the ./bin/cluster
application in openshift-ansible – here’s the official README_openstack.md.
All of these Markdown files give a standard “This feature is community supported and has not been tested by Red Hat” warning, because they’d prefer that you use the “out-of-the-box” methods of installing OpenShift. But we’d like to use OpenShift Origin (the upstream project), so we’re going a little out of the box here. The “officially supported” (if you can call it that) method described there for Origin is to use the oc (OpenShift Command) application, which spins up an “all-in-one” instance of OpenShift Origin, and we want a cluster!
If you’re looking for another reference on how to bring up a cluster with OpenShift Origin, may I suggest you check out the cluster instructions from the advanced installation docs (which… actually use Ansible themselves). Alternatively, you can also “bring your own host deployment” according to the openshift-ansible docs.
Currently I’ve run into a few problems, which I’ve opened issues for (err, PRs for most), and which I have somewhat fixed herein. At some point I may circle around and try to improve the playbooks and submit a PR so that this happens more smoothly. However, after working through most of the kinks (as described here), I feel that’s less necessary.
Above all, make sure the time is right on all your undercloud/overcloud machines
In the process of installing OpenShift, a CA is created, and it’s quite time-sensitive, especially the start dates on the certificates created for etcd (the discovery service), which needs HTTPS. A bunch of SSL certificates get created, and if the time is wrong it can cause some major failures. I struggled with this quite a bit, looking in all the wrong places. If you see the origin-master service having trouble starting during the install, this could likely be the cause. You can read my struggles in the issue I opened (where I later realized it was really the clocks on the underlying machines).
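A quick way to catch this before it bites you is to compare clocks across the nodes. A hedged sketch, using the floating IPs from my run (substitute your own):

```shell
# Print UTC time locally and from each cluster node; more than a couple of
# seconds of skew is worth fixing (ntpd/chrony) before installing.
echo "local:          $(date -u)"
for ip in 192.168.24.103 192.168.24.104 192.168.24.105 192.168.24.106; do
  echo "$ip: $(ssh -o StrictHostKeyChecking=no openshift@$ip date -u)"
done
```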
One: if the heat stack creation succeeds but further playbook errors occur, rebuilding the heat stack is nearly impossible with the playbook. So I’ve gone ahead and created a script that helps you manually hoe out the networking created by openshift-ansible. It just forcefully tears it out.
The teardown script is available in this gist. Remember to change the variables up at the top, the most important being the cluster name you use in the following commands to spin up openshift-ansible with the cluster creator.
Remember to follow up with a heat stack-list and then a heat stack-delete $your_cluster when you’re done, as the script doesn’t do that step.
Two: the timeout can be too low for the heat stack creation. I’ve included a sed command herein to fix it for you, but I’ve also opened a PR on GitHub to add a --timeout option.
Last and (sorta) least: we’re missing a Python module, shade. The docs for openshift-ansible with OpenStack didn’t originally tell you that you’re going to need to install it, so I included that in the issues here and in my instructions, but they have since merged my documentation fix in this PR.
Run the cluster creator
Let’s move on to brewing up our install command. You’ll note here that we’re specifying the centos-custom image, which we created in this article.
bin/cluster create \
  -o image_name=centos-custom \
  -o external_net=ext-net \
  -o floating_ip_pool=ext-net \
  -o net_cidr=40.0.0.0/24 \
  openstack test_cluster
If you’d like more nodes, add -n $number_of_nodes to your cluster create command.
This results in a list of openshift nodes – here’s my resulting list… (I reformatted mine, fwiw.)
role - private IP - public IP
master - 192.168.137.4 - 192.168.24.104
compute 0 - 192.168.137.6 - 192.168.24.106
compute 1 - 192.168.137.5 - 192.168.24.105
infra - 192.168.137.3 - 192.168.24.103
Accessing OpenShift
Ok, let’s SSH into the master. Since we didn’t specify any specific SSH keys above, if we’re using the stack user on our undercloud it’s going to have the right key. Note that we SSH as the openshift user.
[stack@undercloud ~]$ ssh openshift@192.168.24.104
Let’s take a look at the health of the cluster, using the oc OpenShift command…
oc get nodes
If you’re unlucky like me, your nodes are going to say “NotReady”, like this:
[openshift@test-cluster-master-0 ~]$ oc get nodes
NAME STATUS AGE
192.168.137.3 NotReady 1h
192.168.137.4 NotReady,SchedulingDisabled 1h
192.168.137.5 NotReady 1h
192.168.137.6 NotReady 1h
So I went to each node and restarted the origin-node service.
[openshift@test-cluster-node-compute-1 ~]$ sudo systemctl restart origin-node
Or, do them all in one sweep…
for i in $(seq 103 106); do ssh openshift@192.168.24.$i "sudo systemctl restart origin-node"; done
After doing that for each of those nodes they look like:
[openshift@test-cluster-master-0 ~]$ oc get nodes
NAME STATUS AGE
192.168.137.3 Ready 1h
192.168.137.4 Ready,SchedulingDisabled 1h
192.168.137.5 Ready 1h
192.168.137.6 Ready 1h
Since 192.168.137.4 is my master, I think I’m OK with “scheduling disabled” for now; I’m guessing I don’t want to schedule containers there for the moment.
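If you later decide you do want regular pods on the master, my understanding is that Origin’s oadm tool can flip that flag from the master itself (verify against your version with oadm manage-node --help):

```shell
# Mark the master schedulable again (run on the master).
oadm manage-node 192.168.137.4 --schedulable=true
```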
OpenShift Projects
Ok, still on the master, let’s look at the projects available. You can do this with the oc projects command:
[openshift@test-cluster-master-0 ~]$ oc projects
You have access to the following projects and can switch between them with 'oc project <projectname>':
* default
kube-system
logging
management-infra
openshift
openshift-infra
Using project "default" on server "https://192.168.92.6:8443".
Let’s go ahead and ensure we’re on the “default” project by issuing a command to change the active project (you may already be on it, by, uh, default)
[openshift@test-cluster-master-0 ~]$ oc project default
Now using project "default" on server "https://192.168.92.6:8443".
Now let’s look at the pods that are there.
[openshift@test-cluster-master-0 ~]$ oc get pods
NAME READY STATUS RESTARTS AGE
docker-registry-5-95oai 1/1 Running 0 3m
registry-console-1-22xf5 1/1 Running 0 3m
router-1-p7do8 1/1 Running 0 3m
More detail on pods another time. For now, let’s inspect one.
If you’re zipping through this document and having good luck, you might see the READY column read 0/1, which means the pod isn’t ready yet, and the status may not be “Running” either. This is OK too; it takes some time (especially while the Docker images download). Still, go ahead and do an oc describe pod $name and check it out; it’s also interesting.
Let’s look at that… “registry” one:
[openshift@test-cluster-master-0 ~]$ oc describe pod registry-console-1-22xf5 | grep -iP "^IP|^\s+Port"
IP: 10.129.0.2
Port: 9090/TCP
Oh, interesting… it’s TCP running on port 9090 at 10.129.0.2. Let’s curl that…
curl -k https://10.129.0.2:9090
That, my friend, is Cockpit – your OpenShift dashboard. Kubernetes is Greek for pilot (or helmsman) – so, how fitting: the cockpit.
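The pod IP only routes from the cluster nodes, but for a quick peek from your workstation you can chain two SSH tunnels (addresses are from my run; substitute your own):

```shell
# Hop 1, from your workstation: forward local 9090 to the undercloud's localhost.
ssh -F ~/.quickstart/ssh.config.ansible stack@undercloud -L 9090:127.0.0.1:9090

# Hop 2, from the undercloud: forward its 9090 through the master to the pod.
ssh openshift@192.168.24.104 -L 9090:10.129.0.2:9090

# Then browse to https://localhost:9090 on your workstation (self-signed cert).
```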
Wouldn’t it be great if you could easily hook up your browser to that? It sure would. In the next article we’ll focus on how to make this networking setup more usable.