Today, while skimming through my Hacker News feed, I came across an interesting post: “A file that’s both an acceptable HTML page and a JPEG image”. Apparently this URL (http://lcamtuf.coredump.cx/squirrel/) can be viewed as an HTML document when opened in a browser, and the same URL can be used as the source of an image element. Yes, it does work. As mentioned on that page, there is no server-side trick involved. After inspecting the page for some time and trying out different tools in my arsenal, I figured out that this is related to Postel’s law.
What is Postel’s law?
Postel’s law, or the robustness principle, is a general design guideline for software: "Be conservative in what you do, be liberal in what you accept from others" (often reworded as "Be conservative in what you send, liberal in what you accept"). (Definition sourced from Wikipedia.) Modern browsers follow this principle to a great extent. Let me explain with an example. Create an empty file and name it “something.html”. Add a single line of text without any HTML tags. Browsers will render the content even though there are no HTML tags. Now add html, head, and body tags and leave them unclosed. Browsers still render the page even though it is not well formed. In other words, browsers are liberal in the content they accept for rendering.
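Browsers follow the detailed parsing algorithm of the HTML standard, which I won't reproduce here. As a rough analogy only, Python's standard-library html.parser is similarly forgiving: the sketch below feeds it the unclosed test page from the experiment and collects the tags and text it sees, without raising any error. `TextCollector`, `parse`, and the sample text are hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser

# Test page from the experiment: html, head and body are opened but never closed.
MALFORMED_PAGE = "<html><head><body>Hello, liberal parser!"


class TextCollector(HTMLParser):
    """Records the start tags and text nodes reported by Python's lenient parser."""

    def __init__(self):
        super().__init__()
        self.start_tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)

    def handle_data(self, data):
        self.text.append(data)


def parse(markup):
    """Parse markup and return (start tags seen, concatenated text)."""
    parser = TextCollector()
    parser.feed(markup)
    parser.close()  # no exception, even though no tag was ever closed
    return parser.start_tags, "".join(parser.text)


if __name__ == "__main__":
    tags, text = parse(MALFORMED_PAGE)
    print(tags)  # ['html', 'head', 'body']
    print(text)  # Hello, liberal parser!
    # A plain line of text with no tags at all is accepted just as happily.
    print(parse("Just a line of text"))
```

Unlike a browser, html.parser does not build a document tree or auto-close anything; it simply reports what it sees, which is enough to show that missing end tags are not treated as errors.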
The Squirrel page
Now, let’s use Postel’s law to understand how the Squirrel page works. Whether we request the URL as an HTML document or as an image, the server delivers the same content to the browser (check the screenshot of the Chrome Network panel). The content sent from the server is actually an image with HTML embedded in it. The browser interprets the content depending on the context in which it is used.
- Open the page in a browser to view the HTML contents. If you view the page source, you will see junk characters in two places: a run at the beginning of the file, and more at the end of the file inside an HTML comment.
- The initial set of bytes sits outside any HTML tag. These bytes are the image file’s header, which has to appear at the very start of the file, outside any HTML tag, for image decoders to recognize the data as an image. However, when the URL is rendered as an HTML document, the browser tries to display those junk characters as text, even though they are outside any HTML tag. Hence, the page explicitly hides them with CSS: body { visibility: hidden; }.
- The actual image data is placed inside an HTML comment block. Browsers ignore comments when rendering the page, whereas when the URL is used as an image, the image decoder understands the contents and parses the image data.
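To make this layout concrete, here is a minimal Python sketch of how such a file could be assembled. It is my own illustration under stated assumptions, not lcamtuf's actual method: it assumes the HTML is stored in a JPEG comment (COM, marker FF FE) segment placed right after the JFIF header, that the HTML ends by opening an HTML comment (<!--) so the browser skips the binary image data that follows, and that a closing --> is appended after the end-of-image marker, where image decoders typically stop reading. `make_polyglot` is a hypothetical name.

```python
import struct

SOI = b"\xff\xd8"   # JPEG start-of-image marker
APP0 = b"\xff\xe0"  # JFIF application segment marker
COM = b"\xff\xfe"   # JPEG comment segment marker


def make_polyglot(jpeg: bytes, html: str) -> bytes:
    """Embed `html` in a JPEG comment segment and wrap the image data in <!-- -->.

    Sketch only: assumes the image data never contains the byte sequence
    '-->', which would close the HTML comment early.
    """
    if not jpeg.startswith(SOI):
        raise ValueError("not a JPEG: missing SOI marker")
    # Keep SOI (and the APP0/JFIF segment, if present) at the start of the file,
    # because image decoders identify the format from these leading bytes.
    pos = 2
    if jpeg[pos:pos + 2] == APP0:
        (app0_len,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        pos += 2 + app0_len
    # The comment payload ends by opening an HTML comment, so the browser
    # ignores the binary image data that follows it.
    payload = (html + "<!--").encode("ascii")
    if len(payload) + 2 > 0xFFFF:
        raise ValueError("HTML too long for a single COM segment")
    # The segment length field counts itself (2 bytes) plus the payload.
    com_segment = COM + struct.pack(">H", len(payload) + 2) + payload
    # Decoders typically stop at the end-of-image marker, so the trailing '-->'
    # is ignored by the image parser but closes the HTML comment for the browser.
    return jpeg[:pos] + com_segment + jpeg[pos:] + b"-->"
```

The bytes before the HTML (SOI, the JFIF header, and the COM marker and length) are exactly the kind of junk a browser would print as text, which is why the page needs to hide them. With a real image, you would also have to check the compressed data for an accidental --> sequence.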
Since browsers are liberal enough to interpret the content based on the context, the same URL can be used both as an HTML document and as an image. Try saving the source and opening it in your favorite text editor: you will see the junk characters. Rename the saved file with a .jpeg extension and open it in a browser; the contents will be rendered as an image, not as HTML.
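Whatever the saved file is named, what makes it a JPEG is its leading bytes. As a quick check, the sketch below compares the first bytes of a file against a few well-known image signatures. It is a simplified, hypothetical helper (`sniff_image_type` is my own name, not a browser API), loosely in the spirit of the image-type patterns in the WHATWG MIME Sniffing standard; real browsers apply more rules than this.

```python
import sys

# Leading-byte signatures for a few common image formats.
IMAGE_SIGNATURES = [
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
]


def sniff_image_type(data: bytes):
    """Return an image MIME type based on the first bytes, or None if unknown."""
    for signature, mime_type in IMAGE_SIGNATURES:
        if data.startswith(signature):
            return mime_type
    return None


if __name__ == "__main__":
    # Usage: python sniff.py path/to/saved-file
    if len(sys.argv) > 1:
        with open(sys.argv[1], "rb") as f:
            print(sniff_image_type(f.read(16)))
```

A plain HTML file starting with <html> returns None, while the saved Squirrel page should report image/jpeg if it begins with the JPEG start-of-image bytes.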
Update: I just read through the comments on the Hacker News page. It looks like the same technique is used to compress JavaScript by packing it into PNG files for the JS1K contest.
-- Varun