For the last few weeks I have been building VMs for VMware Fusion with Packer, which works really well (Packer, that is; VMware not so much).
This is just in preparation for playing with Docker on CentOS, so I needed to install a newer kernel (3.14 to be exact). The biggest hurdle was installing the VMware tools for the newer kernel. The “thinprint” service never started and the hgfs file system never came up either, so Vagrant could not share a folder with the VM (I solved that by using NFS).
Trying different ways to get everything working, I built one VM after another. Sitting in cafés or in my hotel room with abysmal internet access made that so much slower. Downloading all the extra RPMs, the newer kernel etc. challenged my patience.
rsync the repo?
So I came up with the idea to have all the RPMs locally. But mirroring a complete repository takes some 30 GB (and many hours to rsync). Tried that. Cancelled.
Next I wanted to use a proxy. I found some examples of how to configure squid, but they used it as a reverse proxy. Even after taking some time to read up on it, I couldn’t understand the configuration. Why a reverse proxy?
Normally a reverse proxy is for connections from the ‘outside’ to protect a slow backend. My server still should be able to access the internet in general, so this didn’t sound right (please use the comments to increase my knowledge).
So I stepped back to using squid as a forward proxy. I found this description really straightforward; it helped me set up the yum configuration inside the VM, resulting in the following script:
sed -i -e "s/^mirrorlist/#mirrorlist/" \
  -e "s%#baseurl=http://mirror.centos.org/centos%baseurl=http://centos.mirror-server.de%" \
  /etc/yum.repos.d/CentOS-Base.repo
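This comments out the mirrorlist lines (s/^mirrorlist/#mirrorlist/) and uncomments the baseurl to use, pointing it at a fixed mirror (I chose http://centos.mirror-server.de because it is near to me). A fixed base URL matters for caching: with a mirror list, every build could fetch the same RPM from a different server and the proxy would never get a cache hit.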
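To see what the two substitutions do without touching a real system, here is a minimal sketch that runs them against a made-up, abridged stanza. The sample file content is my assumption of what a stock CentOS-Base.repo stanza roughly looks like; only the two sed expressions come from the script above.

```shell
# Hypothetical, abridged sample of one stanza from CentOS-Base.repo.
repo_file=$(mktemp)
cat > "$repo_file" <<'EOF'
[base]
name=CentOS-$releasever - Base
mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=os
#baseurl=http://mirror.centos.org/centos/$releasever/os/$basearch/
gpgcheck=1
EOF

# Same two substitutions as in the script, but written to stdout
# instead of editing the file in place.
rewritten=$(sed -e "s/^mirrorlist/#mirrorlist/" \
    -e "s%#baseurl=http://mirror.centos.org/centos%baseurl=http://centos.mirror-server.de%" \
    "$repo_file")
printf '%s\n' "$rewritten"
rm -f "$repo_file"
```

The mirrorlist line ends up commented out and the baseurl line becomes active, pointing at the chosen mirror.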
Then I add the proxy configuration to /etc/yum.conf:
echo "proxy=$http_proxy" >> /etc/yum.conf
and remove the fastestmirror plugin:
mv -f /etc/yum/pluginconf.d/fastestmirror.conf $HOME/fastestmirror.conf
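One thing to watch with the append: if the provisioning script runs more than once, /etc/yum.conf collects several proxy= lines. A minimal sketch of a repeatable variant follows; set_yum_proxy is a hypothetical helper of mine, and the demo runs against a temporary file rather than the real configuration.

```shell
# set_yum_proxy FILE URL: make sure FILE ends up with exactly one proxy= line.
set_yum_proxy() {
    conf=$1
    proxy_url=$2
    tmp=$(mktemp)
    # Drop any existing proxy= line; grep exits non-zero if nothing is left.
    grep -v '^proxy=' "$conf" > "$tmp" || true
    echo "proxy=$proxy_url" >> "$tmp"
    # Overwrite the content only, so owner and permissions of FILE stay as they are.
    cat "$tmp" > "$conf"
    rm -f "$tmp"
}

# Demo on a throwaway file (the address is the example host IP from this post).
demo_conf=$(mktemp)
printf '[main]\ncachedir=/var/cache/yum\n' > "$demo_conf"
set_yum_proxy "$demo_conf" "http://192.168.117.1:3128"
set_yum_proxy "$demo_conf" "http://192.168.117.1:3128"
proxy_lines=$(grep -c '^proxy=' "$demo_conf")
last_line=$(tail -n 1 "$demo_conf")
echo "proxy lines: $proxy_lines"
rm -f "$demo_conf"
```

In the VM this would be called as set_yum_proxy /etc/yum.conf "$http_proxy".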
IP address of the host
The next problem was to determine the IP address of my laptop from within the VM. I found no officially documented/supported mechanism. VMware creates an interface called vmnet8 on my computer (a virtual network interface used for ‘shared’ networking by VMware):
vmnet8: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
	ether 00:50:56:c0:00:08
	inet 192.168.117.1 netmask 0xffffff00 broadcast 192.168.117.255
And we are looking for the IP address 192.168.117.1. I could set that value as an environment variable in Packer, but I chose to determine it from inside the VM instead, because I need that information later when the VM is running as well. Luckily the IP address of my laptop, as seen from within the VM, always seems to end in .1. I just need the IP address of the VM (the guest) and replace the last number: 192.168.117.132 => 192.168.117.1
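# INTERFACE has to hold the name of the guest's network interface
MY_IP=$(ifconfig "$INTERFACE" | grep "inet " | sed "s/.*inet addr:\([0-9.]*\).*/\1/")
# replace the last octet with 1 (quoted, so the shell leaves the backslash alone)
HOST_IP=$(echo "$MY_IP" | sed 's/\.[0-9]*$/.1/')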
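The last-octet replacement can also be done without sed, using plain shell parameter expansion. A minimal sketch; host_ip_for is a hypothetical helper name, and the whole thing rests on the same observation as above, namely that the host shows up at .1 on the shared network.

```shell
# host_ip_for GUEST_IP: replace the last octet of an IPv4 address with 1.
host_ip_for() {
    # ${1%.*} strips the shortest trailing ".something", i.e. the last octet.
    echo "${1%.*}.1"
}

guest_ip=192.168.117.132
host_ip=$(host_ip_for "$guest_ip")
echo "$host_ip"   # 192.168.117.1
```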
My squid proxy will be running on port 3128, so I just have to check whether the proxy really is available:
curl --connect-timeout 1 http://www.google.com > /dev/null 2>&1
if [[ $? != 0 ]]; then
...
fi
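The check above only exercises the proxy if curl picks it up from the environment (for example via http_proxy). A minimal sketch that names the proxy explicitly follows; proxy_available is a hypothetical helper, the fallback address is the example host IP from this post, and what to do in each branch is left as simple messages.

```shell
# proxy_available HOST PORT: succeed only if a request through the proxy works.
proxy_available() {
    curl --silent --output /dev/null \
        --connect-timeout 1 --max-time 5 \
        --proxy "http://$1:$2" http://www.google.com
}

# HOST_IP is assumed to be set as described above; fall back to the example value.
if proxy_available "${HOST_IP:-192.168.117.1}" 3128; then
    echo "proxy is reachable"
else
    echo "no proxy reachable, going direct" >&2
fi
```

Passing --proxy makes the test independent of whatever proxy variables happen to be set in the shell that runs it.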
To speed things up, I first used SquidMan. Everything looked dandy, but when the VM ran yum update and started to download the repository metadata (a 4 MB sqlite db), it started okay and then got slow. I mean really slow: instead of 20 seconds it estimated 2 hours. Some googling suggested that Squid waits for all the data to arrive and then hands it over to the client. So I waited. And waited. Nothing. I tried the download from within the VM, bypassing the proxy: 20 seconds again. I googled, and googled some more, until I was ready to give up.
What I know about networking “fits on a stamp”, but I started a tcpdump nonetheless. And there was something curious: when I bypassed the proxy, everything looked normal (to me). But when I used the proxy, I saw many entries with IPv6 addresses instead of IPv4. So I looked for a way to tell Squid not to use IPv6, or at least to prefer IPv4:
dns_v4_first on
Now the speed was as expected. I still do not understand whether the problem is with the chosen mirror (centos.mirror-server.de) or my ISP. I do not care.
In the meantime (I thought SquidMan might be the culprit) I switched over to a ‘normal’ squid, learned some more stuff, and ended up with the following script to configure my squid (installed via Homebrew on OS X, so YMMV):
# we need the directory, where the squid configuration file can be found
SQUID_DIR=`brew info squid | grep Cellar | sed "s/^\([^ ]*\).*/\1/"`
# insert some additional refresh patterns (before the other refresh patterns)
sed -i .org '/refresh_pattern .ftp/i \
refresh_pattern -i .rpm$ 129600 100% 129600 \
refresh_pattern -i .bz2$ 129600 100% 129600 \
' $SQUID_DIR/etc/squid.conf
# append some lines to the configuration
cat <<EOF >> $SQUID_DIR/etc/squid.conf
# log file locations:
cache_access_log stdio:/usr/local/var/logs/squid/squid-access.log
cache_store_log stdio:/usr/local/var/logs/squid/squid-store.log
cache_log /usr/local/var/logs/squid/squid-cache.log
# store objects up to:
maximum_object_size 16 MB
# I needed that at my home to avoid slow ipv6
dns_v4_first on
# and we want the cache to survive a restart:
cache_dir ufs /usr/local/var/cache/squid 10000 16 256
EOF
The refresh_pattern lines make squid store the RPMs and keep them, even if no cache headers are set. They have to come before the existing refresh_pattern lines, so I insert them before the first pattern (/refresh_pattern .ftp/i). Then I set the log locations, the maximum object size (16 MB), the preference for IPv4 (DNS via IPv4), and finally a cache directory so that the cached data survives a restart of the proxy. Done.
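The sed call with -i .org and the multi-line i \ form is the BSD flavour that ships with OS X; GNU sed wants it spelled differently. As a sketch of a more portable way to do the same "insert before the first match", here is an awk version. insert_before_first is a hypothetical helper, and the sample lines are only my assumption of what the stock squid.conf contains. For reference, the min and max fields of refresh_pattern are in minutes, so 129600 is 90 days.

```shell
# insert_before_first FILE REGEX TEXT: print FILE with TEXT inserted once,
# directly before the first line that matches REGEX.
insert_before_first() {
    # Pass values through the environment so awk does not process backslashes.
    PAT="$2" TEXT="$3" awk '
        !seen && $0 ~ ENVIRON["PAT"] { print ENVIRON["TEXT"]; seen = 1 }
        { print }
    ' "$1"
}

# Hypothetical excerpt of a stock squid.conf.
sample=$(mktemp)
cat > "$sample" <<'EOF'
# refresh patterns
refresh_pattern ^ftp:           1440    20%     10080
refresh_pattern .               0       20%     4320
EOF

new_rules='refresh_pattern -i .rpm$ 129600 100% 129600
refresh_pattern -i .bz2$ 129600 100% 129600'

patched=$(insert_before_first "$sample" 'refresh_pattern .ftp' "$new_rules")
printf '%s\n' "$patched"
rm -f "$sample"
```

The function writes to stdout, so the result can be inspected first and then redirected to a new file that replaces squid.conf.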