Use JavaScript to get raw HTML code

I need to get the actual HTML code of an element in a web page.

For example, if the actual HTML code inside the element is "How to&nbsp;fix"

Running this JavaScript, document.getElementById('myE').innerHTML, gives me "How to fix", which is the decoded form.

How can I get "How to&nbsp;fix" using JavaScript?

Answers:

Answer

What you have should work:

Element test:

<div id="myE">How to&nbsp;fix</div>

JavaScript test:

alert(document.getElementById("myE").innerHTML); //alerts "How to&nbsp;fix"

Make sure that wherever you're using the result isn't showing &nbsp; as a space, which is likely the case. If you want to show it somewhere that's designed for HTML, you'll need to escape it.
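As a rough illustration of the escaping mentioned above, here is a minimal sketch. The function name escapeHtml is hypothetical and not part of any browser API; it only replaces the four characters that matter in text and attribute values.

```javascript
// Hypothetical helper: escape a string so that an HTML context shows it
// literally instead of interpreting it as markup or entity references.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")   // must run first so later entities are not double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// In a browser this could be used as, for example:
//   target.innerHTML = escapeHtml(document.getElementById("myE").innerHTML);
```

Assigning the string to textContent instead of innerHTML avoids the need for manual escaping when the destination is just a text node.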

Answer

You cannot get the actual HTML source of part of your web page.

When you give a web browser an HTML page, it parses the HTML into some DOM nodes that are the definitive version of your document as far as the browser is concerned. The DOM keeps the significant information from the HTML (like that you used the Unicode character U+00A0 Non-Breaking Space before the word fix) but not the irrelevant information that you used it by means of an entity reference rather than just typing the raw character.

When you ask the browser for an element node's innerHTML, it doesn't give you the original HTML source that was parsed to produce that node, because it no longer has that information. Instead, it generates new HTML from the data stored in the DOM. The browser decides on how to format that HTML serialisation; different browsers produce different HTML, and chances are it won't be the same way you formatted it originally.

In particular,

  • element names may be upper- or lower-cased;

  • attributes may not be in the same order as you stated them in the HTML;

  • attribute quoting may not be the same as in your source. IE often generates unquoted attributes that aren't even valid HTML; all you can be sure of is that the innerHTML generated will be safe to use in the same browser by writing it to another element's innerHTML;

  • it may not use entity references for anything but characters that would otherwise be impossible to include directly in text content: ampersands, less-thans and attribute-value-quotes. Instead of returning &nbsp; it may simply give you the raw U+00A0 character.

You may not be able to see that that's a non-breaking space, but it still is one and if you insert that HTML into another element it will act as one. You shouldn't need to rely anywhere on a non-breaking space character being entity-escaped to &nbsp;... if you do, for some reason, you can get that by doing:

var x = el.innerHTML.replace(/\xA0/g, '&nbsp;');

but that's only escaping U+00A0 and not any of the other thousands of possible Unicode characters, so it's a bit questionable.
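If the goal is an ASCII-only serialisation rather than &nbsp; specifically, one possible approach is to turn every non-ASCII character into a numeric character reference. This is a sketch only: the name escapeNonAscii is hypothetical, it produces &#160; rather than the named &nbsp;, and it still does not recover what the original source used.

```javascript
// Hypothetical helper: replace each non-ASCII character with a decimal
// numeric character reference such as &#160; for U+00A0.
function escapeNonAscii(html) {
  // The "u" flag makes the regex match whole code points, so characters
  // outside the Basic Multilingual Plane are handled as one unit.
  return html.replace(/[^\x00-\x7F]/gu, function (ch) {
    return "&#" + ch.codePointAt(0) + ";";
  });
}

// In a browser this could be applied to an element's serialised HTML:
//   var x = escapeNonAscii(el.innerHTML);
```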

If you really really need to get your page's actual source HTML, you can make an XMLHttpRequest to your own URL (location.href) and get the full, unparsed HTML source in the responseText. There is almost never a good reason to do this.
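For completeness, the XMLHttpRequest approach described above might look roughly like the following. The function name fetchRawSource and its optional third parameter are assumptions made for this sketch; the parameter exists only so the code can be exercised with a stand-in constructor outside a browser.

```javascript
// Hypothetical helper: re-request a URL and hand the unparsed response text
// to a callback. Passes null if the request does not finish with status 200.
function fetchRawSource(url, onDone, XhrCtor) {
  // Assumption: fall back to the browser's XMLHttpRequest when no
  // constructor is supplied.
  var Ctor = XhrCtor || XMLHttpRequest;
  var req = new Ctor();
  req.open("GET", url, true);
  req.onreadystatechange = function () {
    if (req.readyState === 4) { // 4 means the request is done
      onDone(req.status === 200 ? req.responseText : null);
    }
  };
  req.send(null);
}

// In a browser:
//   fetchRawSource(location.href, function (source) {
//     if (source !== null) { /* source is the page's HTML as served */ }
//   });
```

Note that this returns the whole document as a string, so locating the markup for one element still means searching that text yourself.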
