
Template for replying to recruiters on Xing

Dear Mr. XXX,

I do not accept contact requests from people I do not know.
Sometimes I make an exception when someone gives a reason for getting in touch that interests me.

Since I see that you are a recruiter, I ask as a matter of principle that you take

http://fabian-beiner.de/de/artikel/von-blinden-headhuntern-und-blauaugigen-personalern/

into account, which I point to under “ich suche” (“I am looking for”) in my profile (!). Since you have evidently not done so, I would politely ask not to receive any further messages from you.

Kind regards
Victor Volle


Packer templates in YAML

In my last post I already praised Packer. But there is one thing I really do not like about it: the main configuration file is JSON. JSON is a great format for many purposes, but I like to be able to simply comment stuff out while testing, and I like to add real comments to such configuration files, whether they have to be maintained by someone else or by myself. Finally, Packer templates contain a lot of duplication if you use multiple “builders” (for VMware, VirtualBox etc.): configuration items like the ISO file and the checksum are the same whether you build for VMware or VirtualBox.

You might even have quite some duplication across multiple Packer templates. But Mitchell Hashimoto (the (main) author of Packer) has a point when he says:

“This is one of the primary reasons we chose JSON as the configuration format: it is highly convenient to write a script to generate the configuration.” (See the comment in this issue)

Now the original author of that issue wanted an ‘include’ mechanism and pasted some Python code to achieve that. I additionally wanted to avoid duplication within a single file and came up with a script that allows me to use YAML “merge keys”:

---
_macros:
- &basic_builder
  iso_url: 'CentOS-6.5-x86_64-bin-DVD1.iso'
  iso_checksum: '0d9dc37b5dd4befa1c440d2174e88a87'
  iso_checksum_type: md5
  ...
builders:
- <<: *basic_builder
  type: virtualbox-iso
  ...
- <<: *basic_builder
  type: vmware-iso
  ...

Everything in &basic_builder gets copied wherever a <<: *basic_builder occurs. You can add items and even override them. This is plain YAML. The only problem: the original entry &basic_builder would also end up in the resulting document. Therefore the script removes any top-level (!) _macros entry.
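
The clean-up step can be sketched in a few lines of Python. This is not the script itself, just an illustration under assumptions: the template has already been loaded by a YAML parser that resolves merge keys, and the dictionary below is a hypothetical, shortened stand-in for that parsed result. The function name strip_macros is made up for this example.

```python
import json


def strip_macros(document: dict) -> dict:
    """Return a copy of the parsed template without the top-level '_macros' key."""
    return {key: value for key, value in document.items() if key != "_macros"}


# Hypothetical stand-in for what a YAML parser hands back once the
# merge keys (<<: *basic_builder) have been resolved.
basic_builder = {
    "iso_url": "CentOS-6.5-x86_64-bin-DVD1.iso",
    "iso_checksum_type": "md5",
}
parsed = {
    "_macros": [basic_builder],
    "builders": [
        {**basic_builder, "type": "virtualbox-iso"},
        {**basic_builder, "type": "vmware-iso"},
    ],
}

template = strip_macros(parsed)
print(json.dumps(template, indent=2))
```

Only the top-level key is dropped; a nested entry that happens to be called _macros would be left alone, which matches the "(!)" above.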

To get me started I converted the existing JSON to YAML with the same script. Depending on the extension of the input file either JSON or YAML is created:

usage: yaml-to-json.py [-h] [--force] SOURCEFILE

Convert YAML to JSON (and vice versa)

positional arguments:
  SOURCEFILE  the file name to be converted. If it ends with '.json' a YAML
              file will be created. All other file extensions are assumed to
              be YAML and a JSON file will be created

optional arguments:
  -h, --help  show this help message and exit
  --force     overwrite target file, even if it is newer than the source file
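
The two decisions described in the help text (which format to write, and whether an existing target may be overwritten) could look roughly like this. It is a sketch, not the script's actual code: the function names are invented, and the '.yaml' extension for the generated YAML file is an assumption.

```python
import os


def target_path(source: str) -> str:
    """'.json' sources become YAML; every other extension is treated as YAML and becomes JSON."""
    base, ext = os.path.splitext(source)
    # '.yaml' as output extension is an assumption made for this sketch
    return base + (".yaml" if ext == ".json" else ".json")


def may_write(source: str, target: str, force: bool = False) -> bool:
    """Allow writing unless the target exists and is newer than the source (and force is off)."""
    if force or not os.path.exists(target):
        return True
    return os.path.getmtime(target) <= os.path.getmtime(source)


print(target_path("centos.yaml"))  # centos.json
print(target_path("centos.json"))  # centos.yaml
```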

Yak shaving to speed up creating VMs


tl;dr

Over the last few weeks I have been building VMs for VMware Fusion with Packer, which works really well (Packer, that is; VMware not so much).

This is just in preparation for playing with Docker on CentOS, so I needed to install a newer kernel (3.14 to be exact). The biggest hurdle was installing the VMware tools for the newer kernel. The “thinprint” service never started and the hgfs file system never came up either, so Vagrant could not share a folder with the VM (I solved that by using NFS).

Trying different ways to get everything working I built one VM after another. Sitting in Cafés or in my hotel room with abysmal internet access made that so much slower. Downloading all the extra RPMs, the newer kernel etc. challenged my patience.

rsync the repo?

So I came up with the idea to have all the RPMs locally. But mirroring a complete repository takes some 30 GB (and many hours to rsync). Tried that. Cancelled.

reverse proxy?

Next I wanted to use a proxy. I found some examples of how to configure squid, but they used it as a reverse proxy. And even after taking some time to read up on it, I couldn’t understand the configuration. Why a reverse proxy?
Normally a reverse proxy handles connections from the ‘outside’ to protect a slow backend. My server should still be able to access the internet in general, so this didn’t sound right (please use the comments to increase my knowledge).

configure yum

So I stepped back to using squid as a forward proxy. I found this description really straightforward to help me set up the yum configuration inside the VM, resulting in the following script:

  sed -i -e "s/^mirrorlist/#mirrorlist/" \
         -e "s%#baseurl=http://mirror.centos.org/centos%baseurl=http://centos.mirror-server.de%" \
         /etc/yum.repos.d/CentOS-Base.repo  

This comments out the mirrorlist (s/^mirrorlist/#mirrorlist/) and uncomments the baseurl, pointing it at the mirror to use (I chose http://centos.mirror-server.de because it is close to me), resulting in:

#mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=os
baseurl=http://centos.mirror-server.de/$releasever/os/$basearch/  
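
To try the substitution without touching a real repo file, the same two replacements can be applied to a string. This is only a sketch in Python: the function name patch_repo is made up, and the sample input is reconstructed by reversing the sed expressions on the result shown above.

```python
import re


def patch_repo(text: str, mirror: str = "http://centos.mirror-server.de") -> str:
    """Comment out mirrorlist lines and activate baseurl lines pointing at the given mirror."""
    text = re.sub(r"^mirrorlist", "#mirrorlist", text, flags=re.MULTILINE)
    return text.replace("#baseurl=http://mirror.centos.org/centos", "baseurl=" + mirror)


# Sample input, derived from the result above by undoing the two substitutions
sample = (
    "mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=os\n"
    "#baseurl=http://mirror.centos.org/centos/$releasever/os/$basearch/\n"
)
print(patch_repo(sample))
```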

Then I add the proxy configuration to yum.conf:

echo "proxy=$http_proxy" >> /etc/yum.conf  

and remove the fastestmirror plugin:

  mv -f /etc/yum/pluginconf.d/fastestmirror.conf $HOME/fastestmirror.conf  

IP address of the host

The next problem was to determine the IP address of my laptop from within the VM. I found no officially documented/supported mechanism. VMWare creates an interface called vmnet8 on my computer (which is a virtual network interface used for ‘shared’ networking by VMWare); ifconfig shows:

vmnet8: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
    ether 00:50:56:c0:00:08
    inet 192.168.117.1 netmask 0xffffff00 broadcast 192.168.117.255

And we are looking for the IP address 192.168.117.1. I could set that value as an environment variable in Packer, but I chose to determine it from inside the VM instead, because I need that information later as well, when the VM is running. Luckily the IP address of my laptop as seen from within the VM always seems to be XXX.XXX.XXX.1. I just need the IP address of the VM (the guest) and replace the last number: 192.168.117.132 => 192.168.117.1

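# IP address of the guest on $INTERFACE (ifconfig on CentOS 6 prints "inet addr:<ip>")
MY_IP=$(ifconfig "$INTERFACE" | grep "inet " | sed 's/.*inet addr:\([0-9.]*\).*/\1/')
# the host is assumed to sit at .1 in the same subnet
HOST_IP=$(echo "$MY_IP" | sed 's/\.[0-9]*$/.1/')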
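
The last-octet replacement is easy to check in isolation. A tiny Python sketch of the same idea follows; the function name is invented, and it relies on the observation above that the host always ends in .1, which is not a documented guarantee.

```python
def host_ip_from_guest(guest_ip: str) -> str:
    """Replace the last octet of the guest's IPv4 address with 1."""
    prefix, _, _last_octet = guest_ip.rpartition(".")
    return prefix + ".1"


print(host_ip_from_guest("192.168.117.132"))  # 192.168.117.1
```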

My squid proxy will be running on port 3128, so I just have to check whether the proxy really is available:

curl --connect-timeout 1 http://www.google.com > /dev/null 2>&1
if [[ $? != 0 ]]; then
   ...
fi  
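
A cheaper check, which is a different technique from the curl call above, is to test whether anything accepts TCP connections on the proxy port at all. The Python sketch below does only that; it does not prove that the listener is a working squid. The function name is made up for illustration.

```python
import socket


def proxy_reachable(host: str, port: int = 3128, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers refused connections, timeouts and unreachable networks
        return False


# Usage (not executed here): proxy_reachable("192.168.117.1")
```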

configuring squid

To speed things up, I first used SquidMan. Everything looked dandy, but when the VM ran yum update and started to download the repository metadata (a 4 MB sqlite db), it started okay and then got really slow. I mean really slow: instead of 20 seconds it estimated 2 hours. Some googling suggested that Squid waits for all the data to arrive and only then hands it over to the client. So I waited. And waited. Nothing. I tried the download from within the VM, bypassing the proxy: 20 seconds again. I googled, and googled some more, until I was ready to give up. What I know about networking “fits on a stamp”, but I started a tcpdump nonetheless. And there was something curious: when I bypassed the proxy, everything looked normal (to me). But when I used the proxy, I saw many entries with IPv6 addresses instead of IPv4. So I looked for a way to tell Squid not to use IPv6, or at least to prefer IPv4:

dns_v4_first on  

Now the speed was as expected. I still do not understand if the problem is with the chosen mirror (centos.mirror-server.de) or my ISP. I do not care.

In the meantime (I thought SquidMan might be the culprit) I switched over to a ‘normal’ squid, learned some more stuff and ended up with the following script to configure my squid (installed via homebrew on OS X, so YMMV):

# we need the directory, where the squid configuration file can be found  
SQUID_DIR=`brew info squid | grep Cellar | sed "s/^\([^ ]*\).*/\1/"`  

# insert some additional refresh patterns (before the other refresh patterns)  
sed -i .org '/refresh_pattern .ftp/i \
refresh_pattern -i .rpm$ 129600 100% 129600 \
refresh_pattern -i .bz2$ 129600 100% 129600 \
' $SQUID_DIR/etc/squid.conf

# append some lines to the configuration  
cat <<EOF >> $SQUID_DIR/etc/squid.conf  

# log file locations:  
cache_access_log stdio:/usr/local/var/logs/squid/squid-access.log  
cache_store_log stdio:/usr/local/var/logs/squid/squid-store.log  
cache_log /usr/local/var/logs/squid/squid-cache.log  

# store objects up to:  
maximum_object_size 16 MB  

# I needed that at my home to avoid slow ipv6  
dns_v4_first on  

# and we want the cache to survive a restart:  
cache_dir ufs /usr/local/var/cache/squid 10000 16 256  
EOF  

The refresh_pattern lines make squid store the RPMs and keep them, even if no cache headers are set. They have to come before the existing refresh_pattern lines, so I insert them before the first pattern (/refresh_pattern .ftp/i). Then I set the log file locations, the maximum object size (16 MB), the preference for IPv4 in DNS lookups and finally a cache directory so that the cached data survives a restart of the proxy. Done.


The cost of defects …

I am currently reading Laurent Bossavit’s fascinating book “The Leprechauns of Software Engineering”. He dissects one software engineering ‘myth’ after another: “Cone of Uncertainty”, “10x developer productivity” etc. What he is after is proof. Or at least some valid empirical data. And mostly he finds: not much at all. Sources cited often do not support the claim; there is no data given, or even better: an impressive graph (that surely must be based on empirical findings) turns out to be based on subjective experience.

I like his determination: he thoroughly tries to track down the original behind a lineage of sometimes inaccurate citations, even getting behind distortions of the original article.

Especially with the “cone of uncertainty” he does a convincing job: Have you ever experienced cost underruns? The graph surely looks impressive, but it does not say much more than the famous quip: “It’s difficult to make predictions, especially about the future” (ascribed to Mark Twain, but that is another story).

relative costs to fix defects (from: http://www.infoq.com/resource/articles/resilient-security-architecture/en/resources/figure1.jpg)

I still believe that in a non-agile software process, defects that have been introduced early are more costly to fix than defects introduced later. But there seems to be no empirical data. So if there is no data backing up this claim, is there at least a valid qualitative argument (for non-agile projects at least)?

I assume that the requirements document is simply shorter than the resulting software code. Now compare a wrong sentence (a defect) in a requirements document and a defect in a line of code. If the defect is found in testing or later, there is a high chance that the defect in the requirements document has influenced a large portion of the code. Therefore I would conclude (qualitative reasoning!) that it is more costly to fix than the defect in a line of code. But comparing lines is probably problematic, to say the least.

Let’s try another line of reasoning. Assuming again a software process based on the waterfall model: a defect found in testing that has been introduced in the requirements phase results in (1) a change in the requirements document, (2) probably a change in the design and (3) surely a change in the code. Depending on the rigidness of the process there might be some “quality gates”, “reviews” etc. to pass. Compare that to a defect introduced in coding: Only the code has to be changed. The code also has to be deployed and tested, but this holds true for both kinds of defects.

I still think Laurent Bossavit has a very valid point: a paper that claims to be based on empirical data has to back up that claim by showing us the data, so that we can check the validity of the claim. So read the book, even if you disagree. It’s fun and I think it will help you detect myths yourself.


Switching proxy settings for Maven, git, etc. (on OS X)

proxies

I am currently working for two customers at three different locations (and at home). And the proxy settings differ at every location! Sometimes I have to switch from the internal network (with proxy) to my iPhone HotSpot and then back again.

Each time I have to change my proxy settings for

  • git
  • Maven
  • homebrew/curl/…

I tried to use a local proxy, but had some problems with Outlook (for OS X).

So I ended up creating a small shell script that checks the proxy settings in OS X and patches Maven’s settings.xml and the git configuration (as well as setting the environment variable http_proxy). It has served me well over the last few weeks, so I wanted to share it.

Determining the current default network

You can have multiple active network interfaces (e.g. “Wi-Fi” and your iPhone), but one is the ‘default’ interface, which you can determine with:

   route -n get default

This gives you the network interface, e.g. ‘en0’.
But for the following steps I need the network service name, e.g. “Wi-Fi”, so we need to call

   networksetup -listnetworkserviceorder

which produces something like:

   (1) Wi-Fi
   (Hardware Port: Wi-Fi, Device: en0)

   (2) iPhone
   (Hardware Port: iPhone USB, Device: en5)

I have to grep for the “Device” (and then get the previous line, etc.).

After all this I have the default network service name.
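
The "grep for the Device, then take the previous line" step can be sketched in Python against the sample output above. This is an illustration, not the code of the shell script; the function name is invented and the parsing assumes the two-line layout shown.

```python
import re
from typing import Optional

SAMPLE = """\
(1) Wi-Fi
(Hardware Port: Wi-Fi, Device: en0)

(2) iPhone
(Hardware Port: iPhone USB, Device: en5)
"""


def service_for_device(listing: str, device: str) -> Optional[str]:
    """Return the network service name listed directly above the matching 'Device:' line."""
    previous = ""
    for line in listing.splitlines():
        line = line.strip()
        if re.search(r"Device: " + re.escape(device) + r"\)$", line):
            # the previous line looks like "(1) Wi-Fi"; drop the "(n) " prefix
            return re.sub(r"^\(\S+\)\s*", "", previous)
        previous = line
    return None


print(service_for_device(SAMPLE, "en0"))  # Wi-Fi
```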

Determining the proxy

explicit proxy settings

If the host and port of the proxy are set explicitly, I can retrieve the settings with

   networksetup -getwebproxy "<network service name>"

The above statement prints something like:

   Enabled: Yes
   Server: 10.2.58.17
   Port: 8080
   Authenticated Proxy Enabled: 0
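
Turning that output into a host and a port is a matter of splitting each line at the first colon. A Python sketch, assuming exactly the "Key: value" layout shown above (the function name is hypothetical):

```python
SAMPLE = """\
Enabled: Yes
Server: 10.2.58.17
Port: 8080
Authenticated Proxy Enabled: 0
"""


def parse_webproxy(output: str):
    """Return (host, port) if the proxy is enabled and a server is set, otherwise None."""
    fields = {}
    for line in output.splitlines():
        key, separator, value = line.partition(":")
        if separator:
            fields[key.strip()] = value.strip()
    if fields.get("Enabled") != "Yes" or not fields.get("Server"):
        return None
    return fields["Server"], int(fields["Port"])


print(parse_webproxy(SAMPLE))  # ('10.2.58.17', 8080)
```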

proxy.pac

But if I have a proxy.pac configured, I need to call

   networksetup -getautoproxyurl Wi-Fi

which returns the URL of the proxy.pac file. This file is a simple JavaScript file that can be evaluated with the help of pacparser.

Configuring Git, Maven, etc.

Git config

When I have the PROXY_HOST and the PROXY_PORT, I can set some environment variables and tell git to use the proxy:

   git config --global http.proxy "http://$PROXY_HOST:$PROXY_PORT"

Maven proxy settings

The proxy settings for Maven are located in $HOME/.m2/settings.xml.
I have created a simple entry for the proxy; the id must be “env-proxy”:

   <proxies>
     <proxy>
       <id>env-proxy</id>
       <active>false</active>
       <protocol>http</protocol>
       <host>proxy</host>
       <port>8080</port>
     </proxy>
   </proxies>

This whole block is replaced by the correct settings with the active-flag switched to true.
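
For illustration, here is one way to do such a patch in Python with the standard library's XML parser. It is a different technique from replacing the text block and not what the shell script does. Assumptions: the settings file declares no XML namespace (real settings.xml files often do, in which case the tag names would need the namespace prefix), and losing XML comments on rewrite is acceptable. The function name and the surrounding settings element are made up for the example.

```python
import xml.etree.ElementTree as ET

SETTINGS = """\
<settings>
  <proxies>
    <proxy>
      <id>env-proxy</id>
      <active>false</active>
      <protocol>http</protocol>
      <host>proxy</host>
      <port>8080</port>
    </proxy>
  </proxies>
</settings>
"""


def set_env_proxy(settings_xml: str, host: str, port: int, active: bool = True) -> str:
    """Update host, port and active flag of the proxy entry with id 'env-proxy'."""
    root = ET.fromstring(settings_xml)
    for proxy in root.iter("proxy"):
        if proxy.findtext("id") == "env-proxy":
            proxy.find("active").text = "true" if active else "false"
            proxy.find("host").text = host
            proxy.find("port").text = str(port)
    return ET.tostring(root, encoding="unicode")


patched = set_env_proxy(SETTINGS, "10.2.58.17", 8080)
print(patched)
```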

Un-setting

Invoking the script when no proxy can be determined removes or deactivates the proxy settings (git, Maven’s settings.xml and the environment variables).

Install

  1. install pacparser
  2. clone https://github.com/vrvolle/proxy-settings-osx.git
  3. put the scripts in a directory in your PATH
  4. make them executable
  5. add a proxy entry “env-proxy” to your settings.xml

Invoke

To invoke the script, you must source it. Otherwise the environment variables will not be set:

   source <path-to-script>/set-http-proxy

Automator Workflow for MultiMarkDown image links

I am currently writing most of my stuff in MultiMarkDown. And since I prefer images over text, I have to integrate a whole bunch of images in my MultiMarkDown texts. And that is just cumbersome. The minimal image link is

   ![caption](img/neo4j-cypher-graph-naive.png)

but if you want to be able to reference the images (and add some HTML alt-text) it has to become

   ...
   ![caption][neo4j-cypher-graph-naive]
   ...

   [neo4j-cypher-graph-naive]: img/neo4j-cypher-graph-naive.png "caption"

and with a link to that image that works well enough in HTML and LaTeX/PDF you have to add:

   (s. [caption](#neo4j-cypher-graph-naive))

That’s quite a lot of typing. So in the spirit of Brett Terpstra’s Markdown Service Tools, I created my own service (my very first) that allows me to

  1. select a file in the (Path) Finder and select the service
  2. enter the caption
  3. and have everything put into the Clipboard, so that
  4. I can paste it into my editor of choice
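
The text that ends up on the clipboard can be derived from just the file path and the caption. The Python sketch below shows the idea; it is not the content of the Automator workflow, and the function name as well as the choice of the file name (without extension) as the reference label are assumptions taken from the example above.

```python
import os


def mmd_image_snippets(path: str, caption: str) -> dict:
    """Build the three MultiMarkDown fragments for one image."""
    # label: file name without directory and extension (assumption, as in the example)
    label = os.path.splitext(os.path.basename(path))[0]
    return {
        "image": f"![{caption}][{label}]",
        "definition": f'[{label}]: {path} "{caption}"',
        "reference": f"(s. [{caption}](#{label}))",
    }


for fragment in mmd_image_snippets("img/neo4j-cypher-graph-naive.png", "caption").values():
    print(fragment)
```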

I have uploaded the Automator Workflow at

https://dl.dropboxusercontent.com/u/2969865/md-image-link.workflow.zip

I will ask Brett to include it in his collection.


Multithreading: Don’t do it!

Strange Loop 2012

It seems that I mostly get to write a blog entry when I have been to a conference. This year I went to Strange Loop. Impressive. It is organized by Alex Miller (you might know his hilarious Maven Adoption Curve), mostly in his spare time! It still feels like a small conference, even though there were more than 900 participants. It is a mixture of straight stuff, like Brendan Eich’s tour de force through all the new features and changes in the next JavaScript (ECMAScript) version (ES6), and stranger stuff, like Daniel Spiewak’s “Deconstructing P vs NP (or why I hate Sudoku)”.

There was the usual (in this case rather unsuccessful) attempt to explain monads; at least the presenter did not help me understand them (I now think monads were only invented so that bloggers and conference speakers have a topic; there are probably more attempts to explain monads than there are programming languages. Or perhaps monads were invented in the same vein as C++).

I must admit that there were quite a few bad presentations, but strangely the mood and atmosphere were so positive and there were so many good sessions that I simply sat there and tried to get a grip on the stuff of the previous session (or simply went to get the next espresso).

If I had to choose one thread that ran through many talks, it would be:

Multithreading: Don’t do it!

From the Disruptor to Node.js and Michael Stonebraker’s session on VoltDB (see below).

Multithreading is hard. You might even get it right for now. But when the system grows and other programmers need to change parts of your code, things get ugly. And they get ugly fast. You might introduce a subtle race condition so that customers can see the data of other customers (guilty as charged). I have yet to create a deadlock in production code, but a colleague of mine had to track down a deadlock in my code, which was caused by someone else using my code in a totally unexpected way (local calls instead of remote calls were the main culprit). Call me a bad programmer, but I know of many more examples (I deliberately chose only my own blunders).

Michael Stonebraker is still creating databases. He was one of the driving forces behind Ingres and Postgres. Now he is working on an in-memory database: VoltDB, which sounds quite impressive. And I liked his opinionated take on the NoSQL trend. I do not agree with him, but I like a well-argued new perspective. He has worked in the world of relational databases for such a long time that he might not see the full picture. I do not think that these new ‘kids’ will replace relational (SQL, ACID, etc.) databases, but I think NoSQL fills an important niche.

If you could not attend: all sessions have been recorded and many of them should show up on InfoQ over the next months.

My personal favorite was a presentation by Chris Granger of Light Table fame. Light Table is a new kind of programmer’s editor that tries to keep everything “at hand”. He took his inspiration from an intensive user study he did while at Microsoft (working as program manager for Visual Studio) and from Bret Victor’s seminal talk Inventing on Principle (if you haven’t seen it, you really should). Bret did a variation of his talk at Strange Loop as well, but he seemed not as focused as in the presentation I linked to.

The funniest presentation was again about a new editor. Kind of. Gary Bernhardt showed a new editor that worked in the terminal (like Vim and Emacs), but did quite astonishing stuff. Even some UML graphs created on the fly with GraphViz. But to be able to display images in the terminal he even had to create a new terminal. Which is about as scary as it gets, since terminals are very low-level stuff (kernel integration and all). So it was kind of a letdown (but of the funny kind) when he admitted that everything was a lie. A lie to demonstrate how much we are kept locked up in some notion from the 70s about what a terminal is (and should be).

If you do not know Gary Bernhardt, please take a look at Wat. Brendan Eich (see above) even used the Wat guffaws to demonstrate some of the new features of the next JavaScript standard.