I did have to learn some Prolog when I was studying CS, and back then it was one of those “why do we have to learn this when everybody is programming in C or Turbo Pascal?” moments (yes, I’m old). For some strange reason things clicked for me quicker with Prolog than with Lisp, which I now find quite ironic given that I’ve been using Emacs since the early 1990s.
Quite a while ago, I answered a question about the basic deadlock scenario on Stack Overflow. More recently, I got an interesting comment on it. The poster asked if it was possible to get a deadlock with a single lock and an I/O operation. My first gut reaction was “no, not really”, but it got me thinking. So let’s try to unroll the scenario and see if we can at least reason about my gut feeling.
Update: The title of this post isn’t quite correct – using the homebrew cask mentioned in this blog post will install the current major version of the Oracle JDK. If you want to install a specific major version of the JDK (6 or 8 at the time of writing), I describe how to do that in this new blog post.
I’ve had a ‘manual’ install of JDK 8 on my Mac for quite a while, mainly to run Clojure. It was the typical “download from the Oracle website, then manually run the installer” deployment. As I move more of my development tools from manual management over to homebrew, I decided to use homebrew to manage my Java installation as well. It’s just so much easier to get updates and update information all in one place. Oh, and it installs the same JDK anyway, just without all the additional pointy-clicky work.
Removing the existing installation
Fortunately Oracle documents the uninstall procedure on their website. It’s a rather manual approach, but at least it is documented, and the whole procedure consists of three commands. Unfortunately, in my case this didn’t end up uninstalling an older version of the JDK. For some reason I had ended up with both 1.8.0_60 and 1.8.0_131 installed on my machine, and Oracle’s uninstall instructions didn’t touch the 1.8.0_60 install in /System/Library/Frameworks/JavaVM.framework. I suspect this is an older JDK brought over from the Yosemite install, and the consensus I could find on the Internet suggests leaving it alone, as the system needs it.
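For reference, the documented procedure amounts to deleting the JDK directory plus the browser plugin and the preference pane. The paths below follow Oracle’s instructions, but the version number is only an example – check what is actually installed under /Library/Java/JavaVirtualMachines first:

```shell
# Remove the JDK itself (the version number here is an example --
# adjust it to match your install), the browser plugin, and the
# Java preference pane.
sudo rm -rf /Library/Java/JavaVirtualMachines/jdk1.8.0_131.jdk
sudo rm -rf "/Library/Internet Plug-Ins/JavaAppletPlugin.plugin"
sudo rm -rf /Library/PreferencePanes/JavaControlPanel.prefPane
```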
Apparently in older versions of OS X it was possible to run /usr/libexec/java_home -uninstall to get rid of a Java install, but that option does not appear to work anymore in macOS Sierra.
Installing Java using Homebrew
The installation via homebrew is about as simple as expected. I have cask installed already, so for me it’s a simple matter of running
brew cask install java
and it will install the latest Oracle JDK. You can use
brew cask info java
to verify which version it will install.
If you haven’t got homebrew installed, follow the installation instructions on docs.brew.sh and also make sure that you install cask:
brew tap caskroom/cask
After re-installing the JDK using homebrew, java_home also finally reports the correct version:
odie-pro:~ timo$ /usr/libexec/java_home
/Library/Java/JavaVirtualMachines/jdk1.8.0_131.jdk/Contents/Home
I’ve mentioned before that I prefer Mercurial to Git, at least for my own work. That said, Git has a nice feature that allows you to cherry-pick revisions to merge between branches. That’s extremely useful if you want to move a single change between branches without doing a full branch merge. Turns out Mercurial has that ability, too, but it goes by a slightly different name.
There are actually two options in Mercurial – the older transplant extension and, from Mercurial 2.0 onwards, the built-in graft command. I prefer the graft command, mainly because it is part of base Mercurial and thus available everywhere, as long as one is running Mercurial 2.0 or later. Given that the current release is 4.0.1 as of the time of writing, you should really be running something newer than 2.0 anyway. Also, because graft uses Mercurial’s merge machinery to cherry-pick the change, you have a somewhat better chance of the change applying cleanly. Transplant uses the patch mechanism, which works reasonably well, but in my opinion the merge system works better, especially if you’re dealing with something that turns into a three-way merge.
Usage is pretty simple – switch to the branch you want to graft a change onto (the destination branch) and then graft away, using the revision number of the change you want. Say you want to graft change 9534 onto the release_356 branch:
hg co release_356
hg graft 9534
Note that hg graft does the merge and commit of the specified revision in one step, as long as you don’t encounter a merge conflict. If you do encounter one, you’ll have to resolve it like you would any other merge conflict, then resume with hg graft --continue, which takes care of the commit.
hg graft has additional functionality over and above simple cherry-picking of a single revision. For example, you can graft a range of revisions onto another branch. Have a look at the documentation for hg graft for more information.
It might sound paradoxical, but in general, writing more code is easier than writing less code that accomplishes the same goals. Even if your code starts out clean, compact and beautiful, the code that gets added later to cover the corner cases nobody thought of usually puts an end to it being well designed, elegant and beautiful. Agile programming offers a solution, namely constant refactoring, but who has time for that? That’s why I occasionally give myself the 10% code reduction challenge, and I encourage you to do the same.
I’ve blogged about improving the performance of Git on Windows in the past and rightly labelled the suggested solution a bad hack, because it requires you to manually replace binaries that are part of the installation. For people who use DVCSs from the command line, manually replacing binaries is unlikely to be a big deal, but it’s clunky and should really have been a wake-up call to ship a newer base system.
By now there is a much easier way to get the same performance improvement, and that is to use Git for Windows instead of the default Windows git client from git-scm.com. Not only does the Git for Windows installer include the newer openssl and openssh binaries that I suggested dropping into the git installation directory in my original post, it also ships a much newer version of git.
For me, installing the Git for Windows client kills a couple of birds with one stone.
First, it addresses a large part of my complaint that Windows is a second class citizen to the Git developers. Using git on Windows is still a tad clunkier than using it in its native environment (ie, the Unix world) but a dedicated project to improve the official command line client goes a long way to address this issue. Plus, the client is much more up to date compared to the official client from git-scm.com.
Second, addressing the performance issues that the official client has is a big deal, at least to those of us who need to work with git repositories in the multi-gigabyte size class. With repositories of that size, it does make a difference if your clone performance suddenly is an order of magnitude faster. In my case it also finally allows me to use these large git repositories with Mercurial’s hg-git plugin, which simply was not possible before.
I’ve not tried to verify whether the newer openssh and openssl binaries address the issue I described in Making git work better on Windows. My assumption is that they don’t, as I saw the same behaviour with the manually updated binaries. For use with a CI system like Jenkins, I still recommend using http access to the repository.
Over on bitbashing.io, Matt Kline has an interesting blog post on how Shipping Culture is hurting us as an industry. Hop over there and read it now, because he addresses another case of the pendulum having swung too far. Your developers take a long time to get a new version out? I know, let’s make them ship something half baked. Quality is overrated anyway. Especially when you don’t have a reputation to lose as a maker of quality software.
If you only read one section of the post, read the part about Anti-Intellectual bandwagons. It summarizes one of my big annoyances with this industry, namely that we seem to dabble in collective amnesia combined with a side helping of AD… oh look, shiny!
That said, we are good at reinventing the octagonal wheel with a slight wobble around the axle.
I encounter this on a fairly regular basis – a project uses a third-party library and there is either a bug in the library that we can’t seem to avoid hitting, or there’s a feature missing or not 100% ideal for our use case.
Especially when dealing with an open source library, at this point someone will inevitably suggest that we have the source, so we should just fix/hack/modify the library and get on with our lives. I’m massively opposed to that approach, with essentially one exception I’ll mention towards the end.
So, why am I so opposed to changing a third-party library, even if I have the code and the rights to do so?
It’s very simple – it adds to the maintenance headache. If I suddenly find myself having to change outside code, I have to be able to maintain it going forward. That means I:
- Suddenly need a place to keep it in source control so I can maintain my patches going forward
- Have to treat every library update as an exercise in merging my local changes with the upstream changes. Should the author of the library make substantial structural changes, it’s not even clear whether the new version can easily be integrated into the patched one. You might well lock yourself out of future updates entirely, because the effort of porting your changes forward becomes bigger than you can reasonably accommodate. You end up owning a piece of code that, by rights, you shouldn’t, and your maintenance effort increases unnecessarily: all of a sudden you’re responsible not only for bugs in your own code, but also for bugs in library code that isn’t so third-party anymore.
There is one exception I’d make to the rule of not touching third-party code and that is, if you talked to the maintainer/owner of the library, agreed that what you’re trying to do is actually beneficial to everybody involved and most importantly, that the owner of the library is willing to take your change and integrate it into their next release. Most open source projects will be more than happy to accept contributions this way as long as they fit in with their vision of how the code should work. Same goes for commercial libraries, too. Just talk to them beforehand to make sure it’s not a misunderstanding of the code at your end.
Under those particular circumstances, yes, I’d accept that it’s OK to change third-party code. Any other reason, don’t do it.
Butbutbutbut, what if I absolutely have to change an important third-party library because its API doesn’t fit into our model anymore?
No you don’t, sorry. Either replace the library or write a shim that takes care of your needs. I don’t want to have a discussion with you in a few years’ time about why we can’t upgrade a library we decided to hack, leaving us tied to an old code base that’s potentially buggy and full of security holes.
I grew up as a software developer on a steady diet of Dr Dobb’s magazines. I was hooked the first time I came across an issue of the magazine as a student in the university library and for most of my career I have been a subscriber to it, until the print magazine was cancelled. I was sad to read this morning that after 38 years of publication, first in print and then on the web, the online edition has now met the same fate.
I probably learned more about real-world software development from reading articles in the magazine than from the courses I took at university. Heck, even when the articles initially went completely over my head, I learned from them. I remember reading the series of articles about 386BSD and being strangely fascinated by them, even though at the time I had trouble understanding parts of them. In a strange “what goes around comes around” fashion, I’ve been using FreeBSD – which is a direct descendant of 386BSD – for almost 20 years now.
If Dr Dobb’s hadn’t opened my eyes to the strange and wonderful world of software development with all its facets, I doubt I’d be where I am today, still fascinated by computers and programming, and occasionally still infuriated by both.
RIP, Dr Dobb’s. Thanks for the ride.
Image of the title page of the first issue courtesy of Wikipedia.
In a previous blog post I explained how you can substantially improve the performance of git on Windows by updating the underlying SSH implementation. This performance improvement is very worthwhile in a standard Unix-style git setup, where access to the git repository uses ssh as the transport layer. For a regular development workstation this update works fine, as long as you remember to check and possibly update the ssh binaries after every git update.
I’ve since run into a couple of other issues that are connected to using OpenSSH on Windows, especially in the context of a Jenkins CI system.
Accessing multiple git repositories via OpenSSH can cause problems on Windows
I’ve seen this a lot on a Jenkins system I administer.
When Jenkins is executing a longer-running git operation like a clone or a large update, it can also check for updates on another project. During the check, you’ll suddenly see an “unrecognised host” message pop up on the console you’re running Jenkins from, asking you to confirm the host fingerprint/key for the git server it uses all the time. What’s happening behind the scenes is that the first ssh process is locking .ssh/known_hosts, and the second ssh process can’t check the host key due to the lock.
This problem occurs if you’re using OpenSSH on Windows to access your git server. PuTTY/Pageant is the recommended setup, but I personally prefer using OpenSSH because once it is working, it’s seamless in the same way it is on a Unix machine. OK, the real reason is that I tend to forget to start pageant and load its keys, but we don’t need to talk about that here.
One workaround that gets suggested for this issue is to turn off the host key check and use /dev/null as “storage” for known_hosts. I don’t like that approach much as it feels wrong to me – why add security by insisting on ssh as a transport and then turn said security off, leaving you with a somewhat performance-challenged git on Windows and not much in the way of security?
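For completeness, that workaround typically looks something like this in your ssh config (the host name is a placeholder); I’m showing it so you know what to avoid, not as a recommendation:

```
# ~/.ssh/config -- git.example.com is an example host name
Host git.example.com
    # Skip host key verification entirely and write recorded
    # keys to /dev/null, i.e. throw them away.
    StrictHostKeyChecking no
    UserKnownHostsFile /dev/null
```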
Another workaround improves performance, gets rid of the parallel access issue and isn’t much less safe.
Use http/https transport for git on Windows
Yes, I know that git is “supposed” to use ssh, but using http/https access on Windows just works better. I’m using the two interchangeably even though my general preference would be to just use https. If you have to access the server over the public Internet and it contains confidential information, I’d probably still use ssh, but I’d also question why you’re not accessing it over a VPN tunnel. But I digress.
The big advantage of using http for git on Windows is that it works better than ssh simply by virtue of not being a “foreign object” in the world of Windows. There is also the bonus that clones and large updates tend to be faster, even compared to a git installation with updated OpenSSH binaries. As an aside, when I tested the OpenSSH version shipped with Git for Windows against PuTTY/Pageant, the speeds were roughly the same, so you’ll see the performance improvements no matter which ssh transport you use.
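Switching an existing clone over is a one-liner. In the sketch below the server name and repository path are placeholders, and I’m creating a throwaway repository just to have something to run the command against:

```shell
# Repoint an existing clone's origin from ssh to https.
# git.example.com/project.git is a placeholder URL.
cd "$(mktemp -d)"
git init -q demo && cd demo
git remote add origin git@git.example.com:project.git
git remote set-url origin https://git.example.com/project.git
git remote -v   # origin now lists the https URL for fetch and push
```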
As a bonus, it also gets rid of the problematic race condition that is triggered by the locking of known_hosts.
It’s not all roses, though, as it requires some additional setup on the part of your git admin. Especially if you use a tool like gitolite for access control, the fact that you end up with two paths into your repository (ssh and http) means you essentially have to manage two types of access control, as the http transport needs its own. Even with the additional setup cost, in my experience offering both access methods is worth it if you’re dealing with repositories that are a few hundred megabytes or even gigabytes in size. It still takes a fair amount of time to shovel a large unbundled git repo across the wire this way, but you’ll be drinking less coffee while waiting for it to finish.