by Chas Emerick
In the process of building a new web-based document-centric service, it became clear that I needed a good in-browser visual diff tool. I've become friends with a number of desktop "thick client" diff tools over the years, but the interface to this new service is 100% through the browser, and all those old friends aren't amenable to diffing web-based resources.
Some web searches didn't turn up anything particularly promising. I was looking for an in-browser diff tool, preferably in Javascript (but I suppose Flash would have done the trick, too). I found a few not-so-great Java applets that would do the bare minimum, but nothing ideal. There were a few Javascript diff algorithm implementations (like this), but nothing that could be considered a complete solution.
So, I built jsdifflib over a weekend in February of 2007.
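jsdifflib is a Javascript library that provides a reimplementation of Python's difflib.SequenceMatcher class and a visual diff view generator that renders its output.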
Yes, I ripped off the formatting of the diff view from the Trac project. It's a near-ideal presentation of diff data as far as I'm concerned. If you don't agree, you can hack the CSS to your heart's content.
The main reason I reimplemented Python's difflib module in Javascript as the algorithmic basis for jsdifflib was that I didn't want to mess with the actual diff algorithm -- I wanted to concentrate on getting the in-browser view right. However, because jsdifflib's API matches that of Python's difflib.SequenceMatcher class in its entirety, it's trivial to do the actual diffing on the server side, using Python, and pipe the results of that diff calculation to your in-browser diff view. So, you have the choice of doing everything in Javascript in the browser, or falling back to server-side diff processing if you are diffing really large files.
Most of the time, we do the latter. jsdifflib is pretty fast all by itself and is totally usable for diffing "normal" files (i.e. fewer than 5,000-10,000 lines), but we regularly need to diff files that are one or two orders of magnitude larger than that. For those, server-side diffing is a necessity.
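To make the opcode hand-off a little more concrete, here is a minimal Python sketch of how a list of SequenceMatcher opcodes, together with the two lists of lines, is enough to lay out a side-by-side view. This is not jsdifflib's rendering code; the function name `side_by_side_rows` and the row format are assumptions made purely for illustration.

```python
import difflib


def side_by_side_rows(base_lines, new_lines, opcodes):
    """Expand opcodes into (tag, base_line, new_line) rows.

    Hypothetical helper: None marks a gap on one side of the view.
    """
    rows = []
    for tag, i1, i2, j1, j2 in opcodes:
        left = base_lines[i1:i2]    # lines taken from the base text
        right = new_lines[j1:j2]    # lines taken from the new text
        for k in range(max(len(left), len(right))):
            rows.append((
                tag,
                left[k] if k < len(left) else None,
                right[k] if k < len(right) else None,
            ))
    return rows


if __name__ == "__main__":
    base = ["alpha", "beta", "gamma", "delta"]
    new = ["alpha", "BETA", "gamma", "epsilon", "delta"]
    # isjunk=None: no line is treated as junk
    opcodes = difflib.SequenceMatcher(None, base, new).get_opcodes()
    for row in side_by_side_rows(base, new, opcodes):
        print(row)
```

Each opcode is a `(tag, i1, i2, j1, j2)` tuple, where `tag` is one of `equal`, `replace`, `insert` or `delete`, so whichever side computes the opcodes, the view only needs them and the two line lists.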
You can give jsdifflib a try without downloading anything. Just click the link below, put some content to be diffed in the two textboxes, and diff away.
That page also contains all of the examples you'll need to use jsdifflib yourself, but let's look at them here, anyway.
Here's the function from the demo HTML file linked to above that diffs the two pieces of text entered into the textboxes on the page:
There's not a whole lot to say about this function. The most notable aspect of it is that the diffview.buildView() function takes an object/map with specific attributes, rather than a list of arguments. Those attributes are mostly self-explanatory, but are nonetheless described in detail in the code documentation in diffview.js.
This isn't enabled in the demo link above, but I've included it to exemplify how one might use the opcode output from a web-based Python backend to drive jsdifflib's diff view.
As you can see, I'm partial to using Dojo for ajaxy stuff. All that is happening here is that the base and new text are being POSTed to a Python server-side process (we like Pylons, but you could just as easily use a simple Python script as a CGI). That process then needs to diff the provided text using an instance of Python's difflib.SequenceMatcher class, and return the opcodes from that SequenceMatcher instance to the browser (in this case, using JSON serialization). In the interest of completeness, here's the controller action from our Pylons application that does this (don't try to match up the parameters shown below with the POST parameters shown in the Javascript function above; the latter is only here as an example):
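As a rough illustration of that server-side step (a standalone sketch, not the controller action referred to above), the following function diffs two texts line by line and returns the opcodes as a JSON array. The function name `diff_opcodes_json` and the choice to return a bare array with no wrapping object are assumptions made for illustration; how the request parameters reach the function depends on your web framework.

```python
import difflib
import json


def diff_opcodes_json(base_text, new_text):
    """Diff two texts by line and serialize the opcodes as JSON.

    Hypothetical helper, not part of jsdifflib or Pylons.
    """
    base_lines = base_text.splitlines()
    new_lines = new_text.splitlines()
    # isjunk=None: no line is treated as junk
    matcher = difflib.SequenceMatcher(None, base_lines, new_lines)
    # Each opcode is (tag, i1, i2, j1, j2); JSON encodes it as an array.
    return json.dumps([list(op) for op in matcher.get_opcodes()])


if __name__ == "__main__":
    print(diff_opcodes_json("a\nb\nc", "a\nc"))
```

The browser would then parse the response and hand the resulting array to the diff view in place of opcodes computed locally.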
The top priorities are support for ignoring empty lines, and character-level diff indication with sub-highlighting (similar to what Trac's diff view does).
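One possible approach to the character-level part, sketched here in Python under the assumption that a second SequenceMatcher pass is run over each pair of replaced lines, is to collect the character ranges that differ and highlight only those. Nothing like this exists in jsdifflib yet; the function name `char_spans` and its return format are hypothetical.

```python
import difflib


def char_spans(old_line, new_line):
    """Return (old_spans, new_spans): (start, end) ranges that changed.

    Hypothetical helper: ranges are half-open character offsets.
    """
    # Strings are sequences of characters, so SequenceMatcher can
    # compare them directly; isjunk=None treats no character as junk.
    matcher = difflib.SequenceMatcher(None, old_line, new_line)
    old_spans, new_spans = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        if i2 > i1:  # characters removed or replaced in the old line
            old_spans.append((i1, i2))
        if j2 > j1:  # characters added or replaced in the new line
            new_spans.append((j1, j2))
    return old_spans, new_spans


if __name__ == "__main__":
    print(char_spans("abc", "abXc"))
```

A view could wrap each returned range in its own styled element so that only the changed characters inside a replaced line stand out.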
I'd also like to see the difflib.SequenceMatcher reimplementation gain some more speed -- it's virtually a line-by-line translation from the Python implementation, so there's plenty that could be done to make it more performant in Javascript. However, that would mean making the reimplementation diverge even more from the "reference" Python implementation. Given that I don't really want to worry about the algorithm, that's not appealing. I'd much rather use a server-side process when the in-browser diffing is a little too pokey.
Other than that, I'm open to suggestions.
jsdifflib carries a BSD license. As such, it may be used in other products or services with appropriate attribution (including commercial offerings). The license is prepended to each of jsdifflib's files.
jsdifflib consists of three files -- two Javascript files, and one CSS file. Why two Javascript files? Because I wanted to keep the reimplementation of Python's difflib.SequenceMatcher class separate from the actual visual diff view generator. Feel free to combine and/or optimize them in your deployment environment.