In my last post I already praised Packer. But there is one thing I really do not like about it: the main configuration file is JSON. Now JSON is a great format for many purposes, but I like to be able to simply comment stuff out while testing and I like to add real comments to such configuration files, whether they have to be maintained by someone else or by myself. And finally Packer templates contain a lot of duplication if you use multiple “builders” (for VMWare, VirtualBox etc.): configuration items like the iso file and the checksum are the same whether you build for VMWare or VirtualBox.
You might even have quite some duplication across multiple Packer templates. But Mitchell Hashimoto (the (main) author of Packer) has a point when he says:
“This is one of the primary reasons we chose JSON as the configuration format: it is highly convenient to write a script to generate the configuration.” (See the comment in this issue)
Now the original author of that issue wanted an ‘include’ mechanism and pasted some Python code to achieve that. I additionally wanted to avoid duplication within a single file and came up with a script that allows me to use YAML “merge keys”:
Everything in &basic_builder gets copied wherever a <<: *basic_builder occurs. You can add and even override items. This is plain YAML. The only problem: the original entry &basic_builder would end up in the result document. Therefore the script removes any top-level (!) _macros entry.
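A self-contained sketch of the idea (the keys and file names are illustrative, not the exact Packer fields; PyYAML resolves the merge keys while loading):

```python
import yaml  # PyYAML, not in the stdlib

TEMPLATE = """
_macros:
  basic_builder: &basic_builder
    iso_url: ./centos-minimal.iso
    iso_checksum_type: sha256
builders:
  - <<: *basic_builder
    type: vmware-iso
  - <<: *basic_builder
    type: virtualbox-iso
    iso_url: ./other.iso        # overriding an inherited item
"""

data = yaml.safe_load(TEMPLATE)   # merge keys are resolved here
data.pop("_macros", None)         # drop the top-level _macros entry
```

After loading, both builders carry the shared items from &basic_builder, the second one with its overridden iso_url, and _macros is gone from the result.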
To get started, I converted the existing JSON to YAML with the same script. Depending on the extension of the input file, either JSON or YAML is created:
usage: yaml-to-json.py [-h] [--force] SOURCEFILE
Convert YAML to JSON (and vice versa)
SOURCEFILE  the file name to be converted. If it ends with '.json' a YAML
            file will be created. All other file extensions are assumed to
            be YAML and a JSON file will be created
-h, --help show this help message and exit
--force overwrite target file, even if it is newer than the source file
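The core of such a converter might look like this (a reconstruction, not the original script; it assumes PyYAML is installed and leaves out the argparse wiring behind the help text above):

```python
import json
import os
import sys

import yaml  # PyYAML, not in the stdlib


def convert(source, force=False):
    """Convert SOURCEFILE between JSON and YAML.

    A '.json' source produces a YAML file; any other extension is
    assumed to be YAML and produces a JSON file.
    """
    base, ext = os.path.splitext(source)
    to_yaml = ext == ".json"
    target = base + (".yaml" if to_yaml else ".json")
    # without force, refuse to overwrite a target newer than the source
    if (not force and os.path.exists(target)
            and os.path.getmtime(target) > os.path.getmtime(source)):
        sys.exit("%s is newer than %s (use --force)" % (target, source))
    with open(source) as f:
        data = json.load(f) if to_yaml else yaml.safe_load(f)
    with open(target, "w") as f:
        if to_yaml:
            yaml.safe_dump(data, f, default_flow_style=False)
        else:
            json.dump(data, f, indent=2)
    return target
```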
For the last few weeks I have been building VMs for VMWare Fusion with Packer, which really works well (Packer, that is; not so much VMWare).
This is just in preparation for playing with Docker on CentOS, so I needed to install a newer kernel (3.14, to be exact). The biggest hurdle was installing the VMWare tools for the newer kernel: the “thinprint” service never started, and neither did the hgfs file system, so Vagrant could not share a folder with the VM (I solved that by using nfs).
Trying different ways to get everything working, I built one VM after another. Sitting in cafés or in my hotel room with abysmal internet access made that so much slower. Downloading all the extra RPMs, the newer kernel etc. challenged my patience.
rsync the repo?
So I came up with the idea to have all the RPMs locally. But mirroring a complete repository takes some 30 GB (and many hours to rsync). Tried that. Cancelled.
Next I wanted to use a proxy. I found some examples on how to configure squid, but they used it as a reverse proxy. And even after taking some time to read up on it, I couldn’t understand the configuration. Why a reverse proxy?
Normally a reverse proxy handles connections from the ‘outside’ to protect a slow backend. My server should still be able to access the internet in general, so this didn’t sound right (please use the comments to increase my knowledge).
So I stepped back to using squid as a forward proxy. I found this description really straightforward; it helped me set up the yum configuration inside the VM, resulting in the following script:
sed -i -e "s/^mirrorlist/#mirrorlist/" \
    -e "s%#baseurl=http://mirror.centos.org/centos%baseurl=http://centos.mirror-server.de%" \
    /etc/yum.repos.d/CentOS-Base.repo
This comments out the mirrorlist (s/^mirrorlist/#mirrorlist/) and uncomments the baseurl to use (I chose http://centos.mirror-server.de because it is near me), resulting in:
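The relevant section of the repo file then looks roughly like this (a reconstructed excerpt; the exact lines vary with the CentOS release):

```
[base]
name=CentOS-$releasever - Base
#mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=os
baseurl=http://centos.mirror-server.de/$releasever/os/$basearch/
gpgcheck=1
```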
The next problem was to determine the IP address of my laptop from within the VM. I found no officially documented/supported mechanism. VMWare creates an interface called vmnet8 on my computer (which is a virtual network interface used for ‘shared’ networking by VMWare); ifconfig shows:
And we are looking for the IP address 192.168.117.1. I could have set that value as an environment variable in Packer, but I chose to determine it from inside the VM instead, because I need that information later, when the VM is running, as well. Luckily the IP address of my laptop as seen from within the VM always seems to be XXX.XXX.XXX.1. So I just need the IP address of the VM (the guest) and replace the last number: 192.168.117.132 => 192.168.117.1
MY_IP=`ifconfig $INTERFACE | grep "inet " | sed "s/.*inet addr:\([0-9.]*\).*/\1/"`
HOST_IP=`echo $MY_IP | sed "s/\.[0-9]*$/.1/"`
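The same last-octet replacement, sketched in Python (host_ip is a hypothetical helper name; it assumes the guest address has already been determined):

```python
import re


def host_ip(guest_ip):
    """Derive the host's address on VMWare's shared (NAT) network:
    on vmnet8 the host conventionally gets .1 on that subnet."""
    return re.sub(r"[0-9]+$", "1", guest_ip)


print(host_ip("192.168.117.132"))  # 192.168.117.1
```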
My squid proxy will be running on port 3128, so I just have to check whether the proxy really is available:
curl --connect-timeout 1 http://www.google.com > /dev/null 2>&1
if [[ $? != 0 ]]; then
To speed things up, I first used SquidMan. Everything looked dandy, but when the VM tried to yum update and started to download the repository metadata (a 4 MB sqlite db), it started okay and got really slow. I mean really slow: instead of 20 seconds it estimated 2 hours. Some googling seemed to say that Squid waits for all the data to arrive and then hands it over to the client. So I waited. And waited. Nothing. Tried to download from within the VM bypassing the proxy: 20 seconds again. I googled, and googled some more until I was ready to give up. What I know about networking “fits on a stamp”, but I started a tcpdump nonetheless. And there was something curious: when I bypassed the proxy, everything looked normal (to me). But when I used the proxy, I saw many entries with IPV6 addresses instead of IPV4. So I tried to determine how to tell Squid not to use IPV6 or at least to prefer IPV4:
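The directive I would use for this is squid's dns_v4_first (available since Squid 3.1); the original snippet is not preserved here, so take this as a reconstruction:

```
# prefer IPV4: resolve and connect via IPv4 before trying IPv6
dns_v4_first on
```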
Now the speed was as expected. I still do not understand whether the problem lies with the chosen mirror (centos.mirror-server.de) or my ISP. I do not care.
In the meantime (I thought SquidMan might be the culprit) I switched over to a ‘normal’ squid, learned some more, and ended up with the following script to configure my squid (installed via homebrew on OSX, so YMMV):
# we need the directory where the squid configuration file can be found
SQUID_DIR=`brew info squid | grep Cellar | sed "s/^\([^ ]*\).*/\1/"`
# insert some additional refresh patterns (before the other refresh patterns)
sed -i .org '/refresh_pattern .ftp/i \
refresh_pattern -i .rpm$ 129600 100% 129600 \
refresh_pattern -i .bz2$ 129600 100% 129600 \
' $SQUID_DIR/etc/squid.conf
# append some lines to the configuration
cat <<EOF >> $SQUID_DIR/etc/squid.conf
# log file locations:
# store objects up to:
maximum_object_size 16 MB
# I needed that at my home to avoid slow ipv6:
dns_v4_first on
# and we want the cache to survive a restart:
cache_dir ufs /usr/local/var/cache/squid 10000 16 256
EOF
The refresh_pattern entries are used to store the RPMs and keep them, even if no cache headers are set. They have to come before the existing refresh_patterns, so I insert them before the first pattern (/refresh_pattern .ftp/i). Then I set the log directories, the maximum object size (16 MB), the IPV4 preference (DNS via IPV4) and finally a cache directory so that the cached data survives a restart of the proxy. Done.