A few days ago, I came across an interesting post about Speaker.js on Hacker News. Speaker.js is a client-side library that enables text-to-speech using only JavaScript and HTML; it makes no server-side calls to do the conversion. This got me thinking about the techniques used in JavaScript to process large amounts of binary data. Old-school JavaScript provides no support for storing binary data. Traditionally, normal arrays were used to simulate binary arrays by storing a number in the range 0 to 255 in each element. Obviously, this technique is not suitable for applications that need to process large amounts of data. Then, with the introduction of the HTML canvas element, developers started using the canvas ‘ImageData’ object to hold whatever binary data their applications needed, and canvas ImageData is still one of the most widely used techniques for dealing with binary data. If you have been following HTML5 standardization and web browser development, you must be aware of “Typed Arrays”, which provide much more efficient access to raw binary data. The rest of this post focuses on “Typed Arrays” and the performance of all three techniques.
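To make the old-school approach concrete, here is a minimal sketch of simulating a byte buffer with a plain array (the variable names are illustrative, not from any library):

```javascript
// Simulate a 16-byte buffer with a plain array: each element holds 0-255.
var byteBuffer = new Array(16);
for (var i = 0; i < byteBuffer.length; i++) {
  byteBuffer[i] = 0; // pre-fill, since plain arrays start out with holes
}

// Writing a byte means clamping the value into the 0-255 range by hand.
byteBuffer[0] = 300 & 0xFF; // stores 44, as a real byte array would

console.log(byteBuffer[0]); // 44
```

Every element here is a full JavaScript number, not a byte, which is exactly why this approach wastes memory and speed on large data sets.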
Typed Arrays
JavaScript “Typed Arrays” provide a mechanism for accessing raw binary data much more efficiently. The specification defines two kinds of objects: the buffer, a generic fixed-length container for binary data, and the view, an accessor type that allows access to the data stored within a buffer.
Buffer (implemented by ArrayBuffer). An ArrayBuffer represents a generic, fixed-length binary data buffer. You can't directly manipulate the contents of an ArrayBuffer; instead, you create an ArrayBufferView object that represents the buffer in a specific format, and use that to read and write the buffer's contents. The following line of code creates a chunk of memory of 16 bytes, pre-initialized to 0. Note: you will not be able to access the data through the variable buffer itself.
var buffer = new ArrayBuffer(16);
View (implemented by ArrayBufferView and its subclasses). A view provides a context — a data type, starting offset, and number of elements — that turns the data into an actual typed array. Float32Array, Float64Array, Int8Array, Int16Array, Int32Array, Uint8Array, Uint16Array, and Uint32Array are some of the available view classes. There is also a generic view, DataView, for reading and writing data in an ArrayBuffer. In the following lines of code, we create a view that treats the data in the buffer as an array of 32-bit signed integers; we can then access the data in the buffer just like a normal array. It is possible to create multiple views on the same buffer. By combining a single buffer with multiple views of different types, starting at different offsets into the buffer, we can interact with complex data structures (such as data read from a structured file, or WebGL).
var int32View = new Int32Array(buffer);
for (var i = 0; i < int32View.length; i++) {
  int32View[i] = i * 2;
}

// A 16-bit signed integer view on the same buffer. This is allowed.
var int16View = new Int16Array(buffer);
for (var i = 0; i < int16View.length; i++) {
  console.log("Entry " + i + ": " + int16View[i]);
}
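The generic DataView mentioned above can read and write values of different types at arbitrary byte offsets within the same buffer. A rough sketch (the offsets and values here are just for illustration):

```javascript
var buf = new ArrayBuffer(8);
var view = new DataView(buf);

// Write a 32-bit signed integer at byte offset 0
// and a 16-bit unsigned integer at byte offset 4.
view.setInt32(0, 123456);
view.setUint16(4, 500);

console.log(view.getInt32(0));  // 123456
console.log(view.getUint16(4)); // 500
```

Unlike the fixed-type views, DataView lets you control the byte offset (and endianness) of each individual read and write, which is handy when parsing structured binary file formats.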
Browser Support
Performance Tests
Kanaka has written some test cases to compare the performance of these three techniques: normal arrays, ImageData, and Typed Arrays. The test cases are hosted as part of his noVNC project on GitHub. I ran the same test cases on my MacBook Pro (2 GHz quad-core Intel Core i7, 4 GB 1333 MHz DDR3), and it turns out that Chrome performs much better than the other browsers. In Chrome, “Typed Arrays” proved to be the most efficient technique for manipulating binary data. These metrics have changed drastically compared to the tests Kanaka ran in April 2011. Test results, averaged over 50 test iterations, can be found here.
The Four Tests:
Create - For each test iteration, an array is created and initialized to zero; this is repeated 2000 times.
Random read - For each test iteration, 5 million reads are issued to pseudo-random locations in an array.
Sequential read - For each test iteration, 5 million reads are issued sequentially to an array. The reads loop around to the beginning of the array when they reach the end of the array.
Sequential write - For each test iteration, 5 million updates are made sequentially to an array. The writes loop around to the beginning of the array when they reach the end of the array.
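A simplified version of the sequential-read test might look like the following. This is only an illustrative sketch (with a shortened timing harness), not Kanaka's actual benchmark code:

```javascript
// Issue `reads` sequential reads against `arr`, wrapping around to
// the beginning of the array when the end is reached.
function sequentialRead(arr, reads) {
  var sum = 0, len = arr.length, idx = 0;
  for (var i = 0; i < reads; i++) {
    sum += arr[idx];
    idx++;
    if (idx >= len) idx = 0; // loop around to the beginning
  }
  return sum; // accumulate so the loop can't be optimized away
}

// Fill a Typed Array with known byte values.
var typed = new Uint8Array(1024);
for (var i = 0; i < typed.length; i++) typed[i] = i & 0xFF;

var start = Date.now();
sequentialRead(typed, 5000000); // 5 million reads, as in the tests
console.log("Elapsed: " + (Date.now() - start) + " ms");
```

The same loop can be pointed at a plain Array or an ImageData's data property to compare the three techniques; the random-read and sequential-write tests differ only in the index pattern and in writing instead of reading.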
Chrome is the only browser where “Typed Arrays” seem to perform well. If you want to use a standards-based technique, you can go with “Typed Arrays”; otherwise, you will have to wait for other browser vendors to improve their “Typed Arrays” performance. More information about “Typed Arrays” can be found here.
Update: I have re-run the test cases after making the changes pointed out by mraleph (here):
- Explicitly specifying the array size while creating normal arrays
- Instead of creating a single test_something function, creating separate functions for each array type
It turns out that the performance of the other browsers improved significantly after this change. An interesting inference from this result is that the JS engines in Firefox, Safari and Opera do not seem to handle polymorphism well.
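The monomorphism change can be sketched like this — instead of one benchmark function that sees several array types, each array type gets its own copy of the function (the function names here are illustrative, not mraleph's or Kanaka's):

```javascript
// Polymorphic: one function body sees every array type, so the
// engine's inline caches for arr[i] keep switching between shapes.
function testAny(arr) {
  var sum = 0;
  for (var i = 0; i < arr.length; i++) sum += arr[i];
  return sum;
}
testAny([1, 2, 3]);                 // plain Array
testAny(new Uint8Array([1, 2]));    // Typed Array -- same call site

// Monomorphic: a dedicated copy per array type, so each compiled
// function only ever observes a single kind of element access.
function testNormal(arr) {
  var sum = 0;
  for (var i = 0; i < arr.length; i++) sum += arr[i];
  return sum;
}
function testTyped(arr) {
  var sum = 0;
  for (var i = 0; i < arr.length; i++) sum += arr[i];
  return sum;
}
testNormal([1, 2, 3]);              // only ever called with Array
testTyped(new Uint8Array([1, 2]));  // only ever called with Uint8Array
```

Both versions compute the same result; the difference is purely in how much type feedback each call site accumulates, which is what the engines appear to be sensitive to.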
-- Varun