At Container Solutions we constantly push the boundaries of the tools we work with. While exploring programmable infrastructure we combine the available tools in new ways. Sometimes it works and sometimes it doesn't, but nevertheless we like to contribute things we learned back to the community.
One example of our efforts concerns Terraform. We have done extensive research into Terraform and have been using it with different platforms, some more exotic than others. When we applied Terraform to provisioning bare-metal servers, we found that the environment we were provisioning into lacked a few networking services. When a machine starts up in an unconfigured state, it will try to boot using BOOTP, so you need services such as DHCP, a PXE server and possibly a DNS server. Cobbler is a system that bundles these services, so we decided to use that. We quickly found that the proper way for Terraform to manage systems in Cobbler would be through a Provider. No such provider existed yet, so we decided to write our own.
Structure of a Provider
My mate Carlos already started out on a series detailing how to write a provider, so I won't repeat that here. Instead I'll explain a bit about the specifics of dealing with Cobbler. From Carlos' post it's clear that we need to define a Provider with three items: a Schema, which is a collection of parameters for configuring the Provider; a ResourcesMap, which lists the resources that will be configurable using this Provider; and a ConfigureFunc, which details how to set up the connection. For each of the resources we will need a Schema as well, plus methods describing how to Create, Read, Update and Delete these resources. I could paste some snippets of the code here but you might as well head over to GitHub to check out the full source code.
The provider is fairly straightforward. Its Schema contains three strings: a url, a username and a password. For now we have three resources defined in the ResourcesMap: a system, a kickstart file and a snippet. The ConfigureFunc for the provider returns an HTTP client that talks to the Cobbler server. We abstracted the code that actually interfaces with Cobbler into a separate client library, to make the Provider code cleaner. We just take the values for the url, username and password from the .tf configuration file provided by the user and pass them as arguments when creating a new client.
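To make the shape of the provider concrete, here is a minimal sketch in Go. It does not use the real Terraform packages: the Schema, Resource and Provider types below are simplified stand-ins that only mirror the structure described above, and the resource names, the example URL and the CobblerClient type are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Simplified stand-ins for the Terraform provider types.
// They mirror the structure only; they are NOT the real API.
type Schema struct {
	Required    bool
	Description string
}

type Resource struct {
	Schema map[string]*Schema
}

type Provider struct {
	Schema        map[string]*Schema
	ResourcesMap  map[string]*Resource
	ConfigureFunc func(cfg map[string]string) (interface{}, error)
}

// CobblerClient is a hypothetical placeholder for the separate client library.
type CobblerClient struct {
	URL, Username, Password string
}

// NewProvider wires together the three items: Schema, ResourcesMap, ConfigureFunc.
func NewProvider() *Provider {
	return &Provider{
		Schema: map[string]*Schema{
			"url":      {Required: true, Description: "Cobbler API endpoint"},
			"username": {Required: true, Description: "Cobbler user"},
			"password": {Required: true, Description: "Cobbler password"},
		},
		// Resource names are assumed; each resource would carry its own Schema.
		ResourcesMap: map[string]*Resource{
			"cobbler_system":    {Schema: map[string]*Schema{}},
			"cobbler_kickstart": {Schema: map[string]*Schema{}},
			"cobbler_snippet":   {Schema: map[string]*Schema{}},
		},
		ConfigureFunc: configure,
	}
}

// configure checks the three settings and returns the client that
// every resource method later receives.
func configure(cfg map[string]string) (interface{}, error) {
	for _, key := range []string{"url", "username", "password"} {
		if cfg[key] == "" {
			return nil, errors.New("missing provider setting: " + key)
		}
	}
	return &CobblerClient{
		URL:      cfg["url"],
		Username: cfg["username"],
		Password: cfg["password"],
	}, nil
}

func main() {
	p := NewProvider()
	meta, err := p.ConfigureFunc(map[string]string{
		"url":      "http://cobbler.example/cobbler_api",
		"username": "admin",
		"password": "secret",
	})
	if err != nil {
		fmt.Println("configure failed:", err)
		return
	}
	fmt.Println(len(p.ResourcesMap), "resources, client for", meta.(*CobblerClient).URL)
}
```

In the real provider the values would come from the user's .tf file rather than from a plain map, and the returned client would wrap the HTTP connection to Cobbler.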
Next we define the System Resource. This has a slightly more elaborate Schema, which contains a map of network interfaces. However, most fields in the Schema are just strings, so it's not really special either. The interesting stuff here is in the Create method. Cobbler expects you to perform a series of requests to create a resource (a System in this case) and modify its properties, and then, once you're done, send a sync request that commits your changes. This involves updating the configuration files and restarting the various services under Cobbler's control, e.g. the DHCP server. The good thing is that this encourages you to set up a bunch of changes and commit them all at once. In fact, if you call sync after each change, Cobbler will try to restart the services repeatedly and it breaks very quickly.
The unfortunate thing is that Terraform doesn't accommodate this behaviour: there is no way to run a post-execution hook or anything similar. So we decided to call sync in a goroutine. The first Create thread will start the goroutine and wait for a signal from it over a channel. We set a timeout when creating the resource, and perform the sync only after that timeout has passed. Each call to the Create method resets the timeout to one second in the future, so the timeout passes one second after the last Create has been executed. The channel makes sure the calling thread waits for the goroutine; otherwise the Terraform run would end before the sync had actually been called. The timeout of one second is a bit arbitrary, but it seems to do the job.
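The delayed sync can be sketched roughly as follows. This is not the provider's actual code: the Debouncer type, its method names and the runBatch helper are made up for illustration, and in this variant every caller (not only the first) waits on the channel until the sync has run.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// Debouncer runs syncFn once, after no Touch has arrived for `delay`.
type Debouncer struct {
	mu       sync.Mutex
	delay    time.Duration
	deadline time.Time
	done     chan struct{} // non-nil while a sync is pending
	syncFn   func()
}

func NewDebouncer(delay time.Duration, syncFn func()) *Debouncer {
	return &Debouncer{delay: delay, syncFn: syncFn}
}

// Touch pushes the deadline into the future and returns a channel
// that is closed once the pending sync has completed.
func (d *Debouncer) Touch() <-chan struct{} {
	d.mu.Lock()
	defer d.mu.Unlock()
	d.deadline = time.Now().Add(d.delay)
	if d.done == nil {
		// First caller of this batch starts the waiting goroutine.
		d.done = make(chan struct{})
		go d.wait(d.done)
	}
	return d.done
}

// wait sleeps until the deadline stops moving, then syncs and signals.
func (d *Debouncer) wait(done chan struct{}) {
	for {
		d.mu.Lock()
		remaining := time.Until(d.deadline)
		if remaining <= 0 {
			d.done = nil // a later Touch starts a new batch
			d.mu.Unlock()
			break
		}
		d.mu.Unlock()
		time.Sleep(remaining)
	}
	d.syncFn()
	close(done)
}

// runBatch simulates n concurrent Create calls and reports how many
// times the (fake) sync ran. delayMs is the quiet period in milliseconds.
func runBatch(n, delayMs int) int {
	var syncs int32
	d := NewDebouncer(time.Duration(delayMs)*time.Millisecond, func() {
		atomic.AddInt32(&syncs, 1) // stands in for the Cobbler sync request
	})
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// ... the resource itself would be created here ...
			<-d.Touch() // block until the shared sync has run
		}()
	}
	wg.Wait()
	return int(atomic.LoadInt32(&syncs))
}

func main() {
	fmt.Println("syncs:", runBatch(5, 100))
}
```

As long as all the simulated Create calls arrive within the quiet period, they share a single sync. A call that arrives after the deadline has passed simply starts a new batch with its own sync.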
We haven't implemented the Read and Update methods yet; they just return nil for now because we thought they were not important for our immediate use case. While diving further into Terraform provider internals we came to understand the need for a Read method, which is to sync the current state of the infrastructure (Cobbler) with Terraform's internal state as reflected in the state file. So this is definitely on our list of things to implement.
Another thing we need to change is the way we handle logging in to Cobbler. We currently log in at the beginning of the Create and Delete methods, which causes a new login for every resource created or deleted. We should move the login to the ProviderFunc, so the login happens only once and the token that is returned is reused throughout the Terraform run.
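A rough sketch of that planned change, under stated assumptions: the login function, the Client type and the helper names below are hypothetical, and the real login would be a request to the Cobbler server. The point is only that the token is obtained once at configuration time and carried on the client that each resource method receives.

```go
package main

import "fmt"

// Client carries the session token obtained at configuration time.
type Client struct {
	Token string
}

// configure logs in exactly once and stores the token on the client.
// The login argument stands in for the real Cobbler login call.
func configure(login func() (string, error)) (*Client, error) {
	token, err := login()
	if err != nil {
		return nil, fmt.Errorf("cobbler login failed: %w", err)
	}
	return &Client{Token: token}, nil
}

// createSystem shows a resource method reusing the shared token
// instead of logging in again.
func createSystem(meta interface{}, name string) string {
	c := meta.(*Client)
	return fmt.Sprintf("create %s using token %s", name, c.Token)
}

// loginsFor creates the named systems and reports how many logins happened.
func loginsFor(names []string) (int, error) {
	logins := 0
	login := func() (string, error) {
		logins++
		return "token-1", nil // fake token for the sketch
	}
	c, err := configure(login)
	if err != nil {
		return 0, err
	}
	for _, n := range names {
		_ = createSystem(c, n)
	}
	return logins, nil
}

func main() {
	n, err := loginsFor([]string{"web01", "web02", "db01"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("logins:", n)
}
```

However many resources are created, the login function runs a single time, which is the behaviour we are after.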
Also, we currently don't call the sync method after deleting resources, which we really should because the DHCP server also needs to be notified of resources that have been removed.
As you can see there's still a lot missing, but basically it works. We can create a system in Cobbler using Terraform and have our bare-metal server boot over the network, with its boot image served by Cobbler. The exciting thing is that our efforts did not go unnoticed by the HashiCorp folks, who kindly proposed to merge the code into the Terraform tree. We're currently in the process of getting our PR approved. As you can probably guess, we're very pleased to get a chance to give something back to the community, which after all is one of Arnold Schwarzenegger's 6 rules of success.