Simple HTML Pretty Print

http://jsfiddle.net/JamesKyle/L4b8b/

This may be a futile effort, but I personally think its possible.

I'm not the best at Javascript or jQuery, however I think I have found a simple way of making a simple prettyprint for html.

There are four types of code in this prettyprint:

  1. Plain Text
  2. Elements
  3. Attributes
  4. Values

In order to stylize this I want to wrap elements, attibutes and values with spans with their own classes.


The first way I have of doing this is to store every single kind of element and attribute (shown below) and then wrapping them with the corresponding spans

$(document).ready(function() {

    $('pre.prettyprint.html').each(function() {

        $(this).css('white-space','pre-line');

        var code = $(this).html();

        var html-element = $(code).find('a, abbr, acronym, address, area, article, aside, audio, b, base, bdo, bdi, big, blockquote, body, br, button, canvas, caption, cite, code, col, colgroup, command, datalist, dd, del, details, dfn, div, dl, dt, em, embed, fieldset, figcaption, figure, footer, form, h1, h2, h3, h4, h5, h6, head, header, hgroup, hr, html, i, img, input, ins, kbd, keygen, label, legend, li, link, map, mark, meta, meter, nav, noscript, object, ol, optgroup, option, output, p, param, pre, progress, q, rp, rt, ruby, samp, script, section, select, small, source, span, strong, summary, style, sub, sup, table, tbody, td, textarea, tfoot, th, thead, title, time, tr, track, tt, ul, var, video, wbr');

        var html-attribute = $(code).find('abbr, accept-charset, accept, accesskey, actionm, align, alink, alt, archive, axis, background, bgcolor, border, cellpadding, cellspacing, char, charoff, charset, checked, cite, class, classid, clear, code, codebase, codetype, color, cols, colspan, compact, content, coords, data, datetime, declare, defer, dir, disabled, enctype, face, for, frame, frameborder, headers, height, href, hreflang, hspace, http-equiv, id, ismap, label, lang, language, link, longdesc, marginheight, marginwidth, maxlength, media, method, multiple, name, nohref, noresize, noshade, nowrap, object, onblur, onchange,onclick ondblclick onfocus onkeydown, onkeypress, onkeyup, onload, onmousedown, onmousemove, onmouseout, onmouseover, onmouseup, onreset, onselect, onsubmit, onunload, profile, prompt, readonly, rel, rev, rows, rowspan, rules, scheme, scope, scrolling, selected, shape, size, span, src, standby, start, style, summary, tabindex, target, text, title, type, usemap, valign, value, valuetype, version, vlink, vspace, width');

        var html-value = $(code).find(/* Any instance of text inbetween two parenthesis */);

        $(element).wrap('<span class="element" />');
        $(attribute).wrap('<span class="attribute" />');
        $(value).wrap('<span class="value" />');

        $(code).find('<').replaceWith('&lt');
        $(code).find('>').replaceWith('&gt');
    });
});

The second way I thought of was to detect elements as any amount of text surrounded by two < >'s, then detect attributes as text inside of an element that is either surrounded by two spaces or has an = immediately after it.

$(document).ready(function() {

    $('pre.prettyprint.html').each(function() {

        $(this).css('white-space','pre-line');

        var code = $(this).html();

        var html-element = $(code).find(/* Any instance of text inbeween two < > */);

        var html-attribute = $(code).find(/* Any instance of text inside an element that has a = immeadiatly afterwards or has spaces on either side */);

        var html-value = $(code).find(/* Any instance of text inbetween two parenthesis */);

        $(element).wrap('<span class="element" />');
        $(attribute).wrap('<span class="attribute" />');
        $(value).wrap('<span class="value" />');

        $(code).find('<').replaceWith('&lt');
        $(code).find('>').replaceWith('&gt');
    });
});

How would either of these be coded, if at all possible

Again you can see this as a jsfiddle here: http://jsfiddle.net/JamesKyle/L4b8b/

Answers:

Answer

Don't be so sure you have gotten all there is to pretty-printing HTML in so few lines. It took me a little more than a year and 2000 lines to really nail this topic. You can just use my code directly or refactor it to fit your needs:

https://github.com/prettydiff/prettydiff/blob/master/lib/markuppretty.js (and Github project)

You can demo it at http://prettydiff.com/?m=beautify&html

The reason why it takes so much code is that people really don't seem to understand or value the importance of text nodes. If you are adding new and empty text nodes during beautification then you are doing it wrong and are likely corrupting your content. Additionally, it is also really ease to screw it up the other way and remove white space from inside your content. You have to be careful about these or you will completely destroy the integrity of your document.

Also, what if your document contains CSS or JavaScript. Those should be pretty printed as well, but have very different requirements from HTML. Even HTML and XML have different requirements. Please take my word for it that this is not a simple thing to figure out. HTML Tidy has been at this for more than a decade and still screws up a lot of edge cases.

As far as I know my markup_beauty.js application is the most complete pretty-printer ever written for HTML/XML. I know that is a very bold statement, and perhaps arrogant, but so far its never been challenged. Look my code and if there is something you need that it is not doing please let me know and I will get around to adding it in.

Answer

Personally I would wrap HTML with pre and not try to do any pretty printing. There are TONS of libraries for doing code formatting just google pretty print. Just wrapping HTML with pre will automatically make it 'printed' code.

For JavaScript, you can use JSON.stringify to recreate the code by passing in a number of spaces for nested structures.

JSON.stringify({ name: 'value' }, null, 2); //Change to four, for four spaces
Answer

If you're doing this client-side, and you already have the DOM, then it would be more efficient to serialise it yourself inserting the appropriate tags as you go rather than serialising the whole subtree at once and then trying to reparse it.

Tags

Recent Questions

Top Questions

Home Tags Terms of Service Privacy Policy DMCA Contact Us

©2020 All rights reserved.