to take into account, which I explicitly point out under “ich suche” (“what I am looking for”) in my profile (!). Since you obviously did not take this into account, I would like to politely ask not to receive any further messages from you.
In my last post I already praised Packer. But there is one thing I really do not like about it: the main configuration file is JSON. Now JSON is a great format for many purposes, but I like to be able to simply comment stuff out while testing and I like to add real comments to such configuration files, whether they have to be maintained by someone else or by myself. And finally Packer templates contain a lot of duplication if you use multiple “builders” (for VMWare, VirtualBox etc.): configuration items like the iso file and the checksum are the same whether you build for VMWare or VirtualBox.
You might even have quite some duplication across multiple Packer templates. But Mitchell Hashimoto (the (main) author of Packer) has a point when he says:
“This is one of the primary reasons we chose JSON as the configuration format: it is highly convenient to write a script to generate the configuration.” (See the comment in this issue)
Now the original author of that issue wanted an ‘include’ mechanism and pasted some Python code to achieve that. I additionally wanted to avoid duplication in a single file and came up with a script that allows me to use YAML “merge keys”:
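A minimal sketch of what such a template can look like (the keys and values are made-up placeholders, not a complete Packer template):

_macros:
  basic_builder: &basic_builder
    iso_url: "http://example.com/centos-6.5.iso"
    iso_checksum_type: "md5"
    iso_checksum: "0123456789abcdef0123456789abcdef"
    ssh_username: "vagrant"

builders:
  - <<: *basic_builder
    type: "vmware-iso"
  - <<: *basic_builder
    type: "virtualbox-iso"
    ssh_username: "root"    # items can be added or overridden per builder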
Everything in &basic_builder gets copied wherever a <<: *basic_builder occurs. You can add and even override items. This is plain YAML. The only problem: the original entry &basic_builder will end up in the result document. Therefore the script removes any top-level (!) _macros entry.
To get me started I converted the existing JSON to YAML with the same script. Depending on the extension of the input file either JSON or YAML is created:
usage: yaml-to-json.py [-h] [--force] SOURCEFILE
Convert YAML to JSON (and vice versa)
SOURCEFILE the file name to be converted. If it ends with '.json' a YAML
file will be created. All other file extensions are assumed to
be YAML and a JSON file will be created
-h, --help show this help message and exit
--force overwrite target file, even if it is newer than the source file
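For example (assuming an existing Packer template named centos.json; the file name is made up):

./yaml-to-json.py centos.json     # creates centos.yaml, ready for comments and merge keys
./yaml-to-json.py centos.yaml     # creates centos.json again for Packer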
Over the last few weeks I have been building VMs for VMWare Fusion with Packer, which really works well (considering Packer, not so much VMWare).
This is just in preparation for playing with Docker on CentOS, so I needed to install a newer kernel (3.14 to be exact). The biggest hurdle was to install the VMWare tools for the newer kernel. The “thinprint” service never started and the hgfs file system never started either, so Vagrant could not share a folder with the VM (I solved that by using NFS).
Trying different ways to get everything working, I built one VM after another. Sitting in cafés or in my hotel room with abysmal internet access made that so much slower. Downloading all the extra RPMs, the newer kernel etc. challenged my patience.
rsync the repo?
So I came up with the idea to have all the RPMs locally. But mirroring a complete repository takes some 30 GB (and many hours to rsync). Tried that. Cancelled.
Next I wanted to use a proxy. I found some examples on how to configure squid, but they used it as a reverse proxy. And even after taking some time reading up on it, I couldn’t understand the configuration. Why a reverse proxy?
Normally a reverse proxy is for connections from the ‘outside’ to protect a slow backend. My server still should be able to access the internet in general, so this didn’t sound right (please use the comments to increase my knowledge).
So I stepped back to using squid as a forward proxy. I found this description really helpful for setting up the yum configuration inside the VM, resulting in the following script:
sed -i -e "s/^mirrorlist/#mirrorlist/" \
-e "s%#baseurl=http://mirror.centos.org/centos%baseurl=http://centos.mirror-server.de%" \
This comments out the mirrorlist (s/^mirrorlist/#mirrorlist/) and uncomments the baseurl to use (I chose http://centos.mirror-server.de because it is close to me), resulting in:
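Roughly, the relevant entry of the repo file then looks like this (reconstructed from the sed expressions above; the stock CentOS file may differ in details):

[base]
name=CentOS-$releasever - Base
#mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=os
baseurl=http://centos.mirror-server.de/$releasever/os/$basearch/
gpgcheck=1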
The next problem was to determine the IP address of my laptop from within the VM. I found no officially documented/supported mechanism. VMWare creates an interface called vmnet8 on my computer (which is a virtual network interface used for ‘shared’ networking by VMWare); ifconfig shows:
And we are looking for the IP address 192.168.117.1. I could have set that value as an environment variable in Packer, but I chose to determine the value from inside the VM instead, because I need that information later, when the VM is running, as well. Luckily the IP address of my laptop as seen from within the VM always seems to be XXX.XXX.XXX.1, so I just need the IP address of the VM (the guest) and replace the last number: 192.168.117.132 => 192.168.117.1
# IP of the VM (the guest) on the given interface; CentOS 6 ifconfig prints "inet addr:192.168.117.132 ..."
MY_IP=`ifconfig $INTERFACE | grep "inet " | sed "s/.*inet addr:\([0-9.]*\).*/\1/"`
# the host (my laptop) is reachable as x.x.x.1 on the VMWare NAT network
HOST_IP=`echo $MY_IP | sed "s/\.[0-9]*$/.1/"`
My squid proxy will be running on port 3128, so I just have to check whether the proxy really is available:
curl --connect-timeout 1 http://www.google.com > /dev/null 2>&1
if [[ $? != 0 ]]; then
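As a whole, such a check can look like this (a sketch; the proxy address and the fallback branch are just one way to handle an unreachable proxy):

# use the proxy only if it actually answers, otherwise talk to the internet directly
export http_proxy="http://$HOST_IP:3128"
curl --connect-timeout 1 http://www.google.com > /dev/null 2>&1
if [[ $? != 0 ]]; then
    # proxy not reachable: fall back to a direct connection
    unset http_proxy
fi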
To speed things up, I first used SquidMan. Everything looked dandy, but when the VM tried to yum update and started to download the repository metadata (a 4 MB sqlite db), it started okay and got really slow. I mean really slow: instead of 20 seconds it estimated 2 hours. Some googling seemed to say that Squid waits for all the data to arrive and then hands it over to the client. So I waited. And waited. Nothing. Tried to download from within the VM bypassing the proxy: 20 seconds again. I googled, and googled some more until I was ready to give up. What I know about networking “fits on a stamp”, but I started a tcpdump nonetheless. And there was something curious: when I bypassed the proxy, everything looked normal (to me). But when I used the proxy, I saw many entries with IPV6 addresses instead of IPV4. So I tried to determine how to tell Squid not to use IPV6 or at least to prefer IPV4:
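For a stock Squid 3.x this is the dns_v4_first directive (the “DNS via IPV4” comment in the configuration further down refers to the same setting):

# resolve host names via IPv4 first, so squid fetches from IPv4 addresses
dns_v4_first on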
Now the speed was as expected. I still do not understand whether the problem was with the chosen mirror (centos.mirror-server.de) or my ISP. I do not care.
In the meantime (I thought SquidMan might be the culprit) I switched over to a ‘normal’ squid, learned some more stuff, and ended up with the following script to configure it (squid installed via Homebrew on OS X, so YMMV):
# we need the directory, where the squid configuration file can be found
SQUID_DIR=`brew info squid | grep Cellar | sed "s/^\([^ ]*\).*/\1/"`
# insert some additional refresh patterns (before the other refresh patterns)
sed -i .org '/refresh_pattern .ftp/i \
refresh_pattern -i .rpm$ 129600 100% 129600 \
refresh_pattern -i .bz2$ 129600 100% 129600 \
# append some lines to the configuration
cat <<EOF >> $SQUID_DIR/etc/squid.conf
# log file locations:
# store objects up to:
maximum_object_size 16 MB
# I needed that at my home to avoid slow ipv6
# and we want the cache to survive a restart:
cache_dir ufs /usr/local/var/cache/squid 10000 16 256
The refresh_pattern entries are used to store the RPMs and keep them, even if no cache headers are set. They have to come before the existing refresh_patterns, so I insert them before the first pattern (/refresh_pattern .ftp/i). Then I set the log directories, the maximum object size (16 MB), IPV4 (DNS via IPV4) and finally a cache directory, so that the cached data survives a restart of the proxy. Done.
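To make squid pick up the configuration (a sketch; with Homebrew, starting it via launchd is another option):

# start squid with the patched configuration ...
squid -f "$SQUID_DIR/etc/squid.conf"
# ... or tell an already running instance to re-read it:
# squid -f "$SQUID_DIR/etc/squid.conf" -k reconfigure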
I am currently reading Laurent Bossavit’s fascinating book “The Leprechauns of Software Engineering”. He dissects one software engineering ‘myth’ after another: “Cone of Uncertainty”, “10x developer productivity” etc. What he is after is proof. Or at least some valid empirical data. And mostly he finds: not much at all. Sources cited often do not support the claim; there is no data given, or even better: an impressive graph (that surely must be based on empirical findings) turns out to be based on subjective experience.
I like his determination: he thoroughly tries to track down the original behind a lineage of sometimes inaccurate citations, even getting behind distortions of the original article.
Especially with the “cone of uncertainty” he does a convincing job: have you ever experienced cost underruns? The graph surely looks impressive, but it does not say much more than the famous quip: “It’s difficult to make predictions, especially about the future” (ascribed to Mark Twain, but that is another story).
I still believe that in a non-agile software process, defects that have been introduced early are more costly to fix than defects introduced later. But there seems to be no empirical data. So if there is no data backing up this claim, is there at least a valid qualitative argument (for non-agile projects at least)?
I assume that the requirements document is simply shorter than the resulting software code. Now compare a wrong sentence (a defect) in a requirements document and a defect in a line of code. If the defect is found in testing or later, there is a high chance that the defect in the requirements document has influenced a large portion of the code; therefore I would conclude (qualitative reasoning!) that it is more costly to fix than the defect in a line of code. But comparing lines is probably problematic, to say the least.
Let’s try another line of reasoning, assuming again a software process based on the waterfall model: a defect found in testing that has been introduced in the requirements phase results in (1) a change in the requirements document, (2) probably a change in the design and (3) surely a change in the code. Depending on the rigidity of the process there might be some “quality gates”, “reviews” etc. to pass. Compare that to a defect introduced in coding: only the code has to be changed. The code also has to be deployed and tested, but this holds true for both kinds of defects.
I still think Laurent Bossavit has a very valid point: a paper that claims to be based on empirical data has to back up that claim by showing us the data, so that we can check the validity of the claim. So read the book, even if you disagree. It’s fun and I think it will help you detect myths yourself.
I am currently working for two customers at three different locations (and at home). And the proxy settings differ at every location! Sometimes I have to switch from the internal network (with proxy) to my iPhone HotSpot and then back again.
Each time I have to change my proxy settings for Maven, git, the http_proxy environment variable and so on.
I tried to use a local proxy, but had some problems with Outlook (for OS X).
So I ended up creating a small shell script that checks the proxy settings in OS X and patches Maven’s settings.xml and the git configuration (as well as setting the environment variable http_proxy). It has served me well over the last weeks, so I wanted to share it.
Determining the current default network
You can have multiple active network interfaces (e.g. “Wi-Fi” and your iPhone), but one is the ‘default’ interface, which you can determine with:
route -n get default
This gives you the network interface, e.g. ‘en0’.
But for the following steps I need the network service name (e.g. “Wi-Fi”) that belongs to that device.
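A sketch of one way to get from the device name to the service name and its proxy settings (the awk filtering is just one option; “proxy.example.com” below is a placeholder):

# interface of the default route, e.g. "en0"
INTERFACE=$(route -n get default | awk '/interface:/ {print $2}')

# map the device (en0) to its network service name (e.g. "Wi-Fi");
# networksetup lists "Hardware Port: Wi-Fi" directly above "Device: en0"
SERVICE=$(networksetup -listallhardwareports | grep -B 1 "Device: $INTERFACE" | awk -F": " '/Hardware Port/ {print $2}')

# the proxy configured for that service (prints Enabled/Server/Port lines)
networksetup -getwebproxy "$SERVICE"

# what the script then patches, for example:
# git config --global http.proxy "http://proxy.example.com:8080"
# export http_proxy="http://proxy.example.com:8080"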
I am currently writing most of my stuff in MultiMarkDown. And since I prefer images over text, I have to integrate a whole bunch of images in my MultiMarkDown texts. And that is just cumbersome. The minimal image link is
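(the file name is made up for illustration):

![](images/overview.png)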
But if you want to be able to reference the images (and add some HTML alt text), it has to become
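(label, alt text and title are again made up):

![Overview of the build pipeline][overview]

[overview]: images/overview.png "Overview of the build pipeline"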
There was the usual (in this case rather unsuccessful) attempt to explain Monads – at least the presenter did not help me understand them (I now think Monads were only invented so that bloggers and conference speakers have a topic; there are probably more attempts to explain monads than there are programming languages. Or perhaps monads were invented in the same vein as C++).
I must admit that there were quite a few bad presentations, but strangely the mood and atmosphere were so positive and there were so many good sessions that I simply sat there and tried to get a grip on the stuff of the previous session (or simply went to get the next espresso).
If I had to choose the one theme that was present in many talks, it would be:
Multithreading: Don’t do it!
From the Disruptor to Node.js and Michael Stonebraker’s session on VoltDB (see below).
Multithreading is hard. You might even get it right for now. But when the system grows and other programmers need to change parts of your code, things get ugly. And they get ugly fast. You might introduce a subtle race condition so that customers could see the data of other customers (guilty as charged). I have yet to create a deadlock in production code, but a colleague of mine had to track down a deadlock in my code, which was caused by someone else using my code in a totally unexpected way (local calls instead of remote calls were the main culprit). Call me a bad programmer, but I know of many more examples (I deliberately only chose my own blunders).
Michael Stonebraker is still creating databases. He was (one of) the driving forces behind Ingres and Postgres. Now he is working on an in-memory database: VoltDB, which sounds quite impressive. And I liked his opinionated take on the NoSQL trend. I do not agree with him, but I like a well-argued new perspective. He has worked for such a long time in the world of relational databases that he might not see the full picture. I do not think that these new ‘kids’ will replace relational (SQL, ACID, etc.) databases, but I think NoSQL fills an important niche.
If you could not attend: all sessions have been recorded and many of them should show up on InfoQ over the next months.
My personal favorite was a presentation by Chris Granger of Light Table fame. Light Table is a new kind of programmer’s editor that tries to keep everything “at hand”. He took his inspiration from an intensive user study he did while at Microsoft (working as program manager for Visual Studio) and from Bret Victor’s seminal talk Inventing on principle (if you haven’t seen it: you really should). Bret did a variation of his talk at Strange Loop as well, but he seemed not as focused as in the presentation I linked to.
The funniest presentation was again about a new editor. Kind of. Gary Bernhardt showed a new editor that worked in the terminal (like Vim and Emacs), but did quite astonishing stuff, even some UML graphs created on the fly with GraphViz. But to be able to display images in the terminal he even had to create a new terminal. Which is about as scary as it gets, since terminals are very low-level stuff (kernel integration and all). So it was kind of a letdown (but of the funny kind) when he admitted that everything was a lie. A lie to demonstrate how much we are kept locked up in some notion from the 70ies of what a terminal is (and should be).