To clarify, I want to be able to see the side-by-side diffs as rendered output. So if I delete a paragraph, the side-by-side view would know to space things correctly.
@Josh exactly. Though maybe it would show the deleted text in red or something. The idea is that if I use a WYSIWYG editor for my HTML content, I don't want to have to switch to HTML to do diffs. I'd like to do it with two WYSIWYG editors side by side, or at least display the diffs side by side in an end-user-friendly manner.
UPDATE: This library has been moved to GitHub.
To see the diff, however, you will more than likely want to use someone else's library. I used DaisyDiff, a Java library, for a similar project where my client was happy with seeing a single HTML rendering of the content with MS Word "track changes"-like markup.
So, you expect
<font face="Arial">Hi Mom</font>
<span style="font-family:Arial;">Hi Mom</span>
to be considered the same?
The output depends very much on the User Agent. Like Ionut Anghelcovici suggests, make an image. Do one for every browser you care about.
If it is XHTML (which assumes a lot on my part), would the Xml Diff Patch Toolkit help? http://msdn.microsoft.com/en-us/library/aa302294.aspx
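To give a feel for what a structural (XML-level) comparison reports, here is a minimal sketch in Python using only the standard library. It is not the Microsoft toolkit and does not imitate its API; `diff_trees` and `diff_xhtml` are hypothetical names, and it assumes both inputs are well-formed XHTML fragments with a single root element. Children are paired purely by position, which is much cruder than what a real XML differ does.

```python
import xml.etree.ElementTree as ET


def diff_trees(a, b, path=""):
    """Return human-readable differences between two elements (naive sketch)."""
    if a.tag != b.tag:
        # Different element types: report it and do not descend further.
        return [f"{path}/{a.tag}: tag changed to {b.tag}"]
    here = f"{path}/{a.tag}"
    found = []
    if a.attrib != b.attrib:
        found.append(f"{here}: attributes {a.attrib} -> {b.attrib}")
    if (a.text or "").strip() != (b.text or "").strip():
        found.append(f"{here}: text {a.text!r} -> {b.text!r}")
    if (a.tail or "").strip() != (b.tail or "").strip():
        found.append(f"{here}: trailing text {a.tail!r} -> {b.tail!r}")
    kids_a, kids_b = list(a), list(b)
    # Assumption: children are matched by position only, with no realignment.
    for child_a, child_b in zip(kids_a, kids_b):
        found.extend(diff_trees(child_a, child_b, here))
    for extra in kids_a[len(kids_b):]:
        found.append(f"{here}: removed <{extra.tag}>")
    for extra in kids_b[len(kids_a):]:
        found.append(f"{here}: added <{extra.tag}>")
    return found


def diff_xhtml(old, new):
    """Parse two well-formed XHTML fragments and compare their trees."""
    return diff_trees(ET.fromstring(old), ET.fromstring(new))
```

Note that a comparison like this treats `<font face="Arial">Hi Mom</font>` and `<span style="font-family:Arial;">Hi Mom</span>` as different, even though they may render the same.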
For smaller differences you might be able to do a normal text-diff, and then analyse the missing or inserted pieces to see how to resolve it, but for any larger differences you're going to have a very tough time doing this.
For instance, how would you detect, and show, that a left-aligned image (floating left of a paragraph of text) has suddenly become right-aligned?
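For the "smaller differences" case, a plain text diff can be sketched roughly like this with Python's `difflib`. The names `tokenize` and `inline_diff` are made up for illustration, and the approach assumes the markup is simple: it splits the source into whole tags, words and whitespace, diffs the token lists, and wraps changes in `<del>`/`<ins>`. It knows nothing about rendering, so a changed attribute (such as an image's alignment) just shows up as a replaced tag, which can produce invalid or confusing output.

```python
import difflib
import re


def tokenize(html):
    # Whole tags, runs of non-space text, and whitespace runs become tokens,
    # so a tag is never cut in half by the diff. A stray "<" with no closing
    # ">" is dropped; this is a sketch, not a parser.
    return re.findall(r"<[^>]+>|[^<\s]+|\s+", html)


def inline_diff(old, new):
    """Return `new` merged with `old`, marking changes with <del>/<ins>."""
    a, b = tokenize(old), tokenize(new)
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    out = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            out.append("".join(a[i1:i2]))
        if op in ("delete", "replace"):
            out.append("<del>" + "".join(a[i1:i2]) + "</del>")
        if op in ("insert", "replace"):
            out.append("<ins>" + "".join(b[j1:j2]) + "</ins>")
    return "".join(out)


print(inline_diff("<p>Hi Mom</p>", "<p>Hi Dad</p>"))
# prints: <p>Hi <del>Mom</del><ins>Dad</ins></p>
```

Styling `del` red and `ins` green in CSS gives a single "track changes"-style view, but only as long as the edits stay inside text nodes.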
Using a text differ will break on non-trivial documents. Depending on what you think is intuitive, XML differs will probably generate diffs that aren't very good for text with markup. AFAIK, DaisyDiff is the only library specialized in HTML. It works great for a subset of HTML.
Compares and describes all the differences between two XML documents. The document comparison does not stop once the first unrecoverable difference is found, unlike the Diff class.
There's another nice trick you can use to improve the look of a rendered HTML diff. It doesn't fully solve the initial problem, but it makes a significant difference in how the rendered diff looks.
Explained in a little more detail:
If you want to use this technique, run your diff algorithm and insert a bunch of
<span>s or tiny
Consider using the text-mode browsers links or lynx to render a text-only version of the HTML, and then diff that.
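If shelling out to lynx is not an option, the same idea can be approximated in Python. The sketch below swaps in the standard library's `html.parser` as a rough stand-in for the text-only rendering, so it is not what lynx produces: `TextExtractor`, `html_to_text` and `text_diff` are hypothetical names, the set of block-level tags is an assumption, and script/style contents are not filtered out.

```python
import difflib
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    # Assumption: these tags start a new line in the text-only rendering.
    BLOCK_TAGS = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html):
    """Return the visible text as a list of non-empty, whitespace-normalized lines."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return [line for line in lines if line]


def text_diff(old_html, new_html):
    """Unified diff of the text-only versions of two HTML fragments."""
    return list(difflib.unified_diff(
        html_to_text(old_html), html_to_text(new_html), "old", "new", lineterm=""))
```

Because all markup is discarded, `<font face="Arial">Hi Mom</font>` and `<span style="font-family:Arial;">Hi Mom</span>` produce an empty diff here. That is handy if only the wording matters, and useless if formatting changes need to be shown.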
The following features are really nice: