In the beginning there was G4L (Ghost for Linux). Lacking any sort of PXE system, we shipped a CD-ROM in each server. Reimaging a server meant rebooting it from the CD drive and grabbing the image via FTP. This was actually an effective hack, and appears to have worked quite well. Next, we went to a basic pxelinux setup (still using G4L). While better in some regards (no shipping CDs in every drive), it was not quite optimal. G4L doesn’t understand NTFS, so you end up with a 1:1 copy of every sector of the hard disk. This means restores take forever, and the bigger your disks, the longer they take.
After a while, we switched to Clonezilla. This was a significant improvement, as it uses Partimage to create the images. Partimage can tell the difference between unused and in-use blocks, so only the blocks that hold data get copied. This significantly decreased the amount of time needed to image a server (we were now able to image multiple batches of servers in a day, limited only by the amount of power available). Clonezilla was nice, as it offered an all-in-one setup and didn’t require that you understand how everything was working. This is where I got hooked on PXE. At this point, we still had minimal automation for the setup process. We had a simple batch script that would prompt you for the IP/hostname of a machine and do the necessary configs. This script was on a USB drive that moved from machine to machine.
Our initial deployment of Clonezilla used it as part of DRBL. DRBL has some fancy scripts that guide you through the entire setup process and abstract away a lot of what’s going on. DRBL lasted a while, as it took me some time to understand all the different moving parts that go along with network booting. The main reason we dropped DRBL was that it would blow away the entire pxelinux.cfg menu every time you made a change.
Up next was your basic PXELinux stack. We had a DHCP server, TFTP, and NFS (for images). This was what we deployed worldwide, and is still in use in some of our locations. We still use Clonezilla for Windows images, but we boot the PXE version, and no longer rely on DRBL to set things up for us. We also started to use public IPs for the setup process. This gave us a significant advantage during mass deployments, as it means you can directly connect to the servers from a remote location to finish the setup process.
I developed some automation here, but it was fairly fragile. A PHP script was launched on the first login to the machine (and the machine was configured to log in automatically on the first boot). This would attempt to pull down network configurations from a central machine, configure the network, and then wait for someone to acknowledge the configuration before restarting. The acknowledgement was important, as the machine needed to be properly labeled and sent to the correct location. We’d often be configuring machines for multiple locations at a time, so labeling mattered.
Somewhat later, I developed a batch script that did all the necessary software configuration with minimal human interaction. This was significant because, before that, we would manually follow a checklist when setting up machines. That was moderately effective, but you would always end up with machines where some critical step had been missed. We ended up with various scripts that checked for the same pieces of software over and over again (as you couldn’t rely on any software being present).
Next up comes iPXE. This replaces PXELinux, and gives us a couple of advantages. One big one is that we can now generate configuration files on the fly. This lets us keep track of which machines have already been imaged and default to booting from the local drive instead of overwriting it. It also lets us improve our Linux install process (we pull down an IP via DHCP, and statically configure that on the machine). We’re still relying on Clonezilla to restore the images, but we’re pushing down IP configurations to make it more automated (and reliable).
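To make the "generate configuration files on the fly" idea concrete, here is a minimal sketch in Python of what such a generator could look like. It is not our actual code: the function names, the `deploy.example` URL, the file names, and the in-memory set of imaged MAC addresses are all made up for illustration, and a real setup would sit behind a web server and keep its state in a database. The iPXE commands it emits (`sanboot`, `kernel`, `initrd`, `boot`) are standard ones, but the kernel arguments Clonezilla needs are left out.

```python
# Hypothetical record of machines that have already been imaged.
# In practice this would live in a database, not in the script.
IMAGED_MACS = {"52:54:00:12:34:56"}


def normalize_mac(mac: str) -> str:
    """Lowercase a MAC address and use ':' separators so lookups are consistent."""
    return mac.strip().lower().replace("-", ":")


def render_ipxe_script(mac: str, imaged_macs: set,
                       base_url: str = "http://deploy.example/clonezilla") -> str:
    """Return the iPXE script to serve to the machine with this MAC address.

    Already-imaged machines are sent to their local disk; everything else
    gets the imaging environment. base_url and the file names below are
    assumptions for the sake of the example.
    """
    lines = ["#!ipxe"]
    if normalize_mac(mac) in imaged_macs:
        # Safe default: never overwrite a machine that is already set up.
        lines.append("echo Already imaged, booting from local disk")
        lines.append("sanboot --no-describe --drive 0x80")
    else:
        lines.append("echo Not imaged yet, loading the imaging environment")
        lines.append(f"kernel {base_url}/vmlinuz")
        lines.append(f"initrd {base_url}/initrd.img")
        lines.append("boot")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_ipxe_script("52-54-00-12-34-56", IMAGED_MACS))
    print(render_ipxe_script("52:54:00:AA:BB:CC", IMAGED_MACS))
```

The important design choice is the default: a machine that is already known boots from its own disk, so an accidental network boot can’t wipe it.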
That’s where we are today. My current plan is to get rid of all the images, and switch to doing Windows installs via the network. iPXE has enough documentation on this to get you to booting the Windows installer successfully. From there, unattend.xml will get you through the rest of the Windows installer. The big advantage with this is that hardware configuration no longer matters. There’s no need to maintain a ton of different images for each different hardware configuration. There’s no keeping track of which image works where. It’s just start the installer and go. I’m also developing a Python app to grab relevant software configurations. Python works a bit better in this case, as I can compile it to a single exe (yay PyInstaller) which makes it easier to copy over and launch from unattend.xml.
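For the curious, the script that gets you into the Windows installer is short. The sketch below generates it from Python, following the layout the iPXE wimboot documentation describes (a `wimboot` kernel plus the `BCD`, `boot.sdi`, and `boot.wim` files from the install media). The function name and the `deploy.example` URL are placeholders, and this only covers getting the installer booted, not unattend.xml or anything after it.

```python
def render_wimboot_script(base_url: str = "http://deploy.example/win") -> str:
    """Return an iPXE script that boots the Windows installer via wimboot.

    base_url is a hypothetical HTTP location holding wimboot and the
    boot/ and sources/ directories copied from the Windows install media.
    """
    lines = [
        "#!ipxe",
        f"kernel {base_url}/wimboot",
        # Each initrd line is "<where to fetch it> <name wimboot sees>".
        f"initrd {base_url}/boot/bcd BCD",
        f"initrd {base_url}/boot/boot.sdi boot.sdi",
        f"initrd {base_url}/sources/boot.wim boot.wim",
        "boot",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_wimboot_script())
```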
I’ve looked into various Windows-specific solutions (AIK, RDS, etc.), but they all seem to want too much control over the process, and I wouldn’t have nearly as much say over the installation if I switched. Somewhat more importantly, I can deploy our entire iPXE system automatically with Puppet. I can’t do that with the Windows-specific systems. (Well, SCCM might be able to, but that’s an extra license fee and a significant amount of work. I have heard that SCCM is only worth it if you have people to devote to it.)