Why is Blob of Array smaller than Blob of Uint8Array?

I read a file using FileReader.readAsArrayBuffer and then do something like this:

  var compressedData = pako.gzip(new Uint8Array(this.result));
  var blob1 = new Blob([compressedData]); // size = 1455338 bytes
  var blob2 = new Blob(compressedData);   // size = 3761329 bytes

As an example: if result has 4194304 bytes, after compression it will be size 1455338 bytes. But for some reason the Uint8Array needs to be wrapped in an Array. Why is this?

Answers:

Answer

Cf. the documentation for the Blob constructor:

https://developer.mozilla.org/en-US/docs/Web/API/Blob/Blob

[the first argument] is an Array of ArrayBuffer, ArrayBufferView, Blob, DOMString objects, or a mix of any of such objects, that will be put inside the Blob. DOMStrings are encoded as UTF-8.

I'm not sure how it works under the hood, but basically the constructor expects an array of things it will pack into the Blob. So, in the first case, you're constructing a Blob from a single part (i.e. your Uint8Array), whereas in the second you're constructing it from 1455338 parts (i.e. each byte separately).

Since the documentation says the Blob parts can only be buffers, Blobs, or strings, it probably ends up converting each byte value in your Uint8Array into a UTF-8 string, which means that instead of 1 byte per number it uses 1 byte per decimal digit. The ratio of the two result sizes seems to support this: single-byte values are 1 to 3 digits long, and the larger Blob is about 2.6 times the size of the smaller (3761329 / 1455338 ≈ 2.58). Not only is that wasteful, I'm pretty sure it also renders your gzip data unusable.
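To check the "one byte per decimal digit" reasoning, here is a rough sketch. The sample data and the decimalDigitBytes helper are made up for illustration (they are not from the question), and it assumes an environment with a global Blob, such as a current browser or a recent Node.js:

```javascript
// Count the bytes needed if every byte value were written out as a decimal
// string ("0" .. "255"). Digits are ASCII, so each one is 1 byte in UTF-8.
function decimalDigitBytes(bytes) {
  let total = 0;
  for (const b of bytes) {
    total += String(b).length;
  }
  return total;
}

// Hypothetical sample data: every byte value 0..255 exactly once.
const sample = Uint8Array.from({ length: 256 }, (_, i) => i);

const wrapped = new Blob([sample]); // one binary part
const unwrapped = new Blob(sample); // 256 parts, each one stringified

console.log(wrapped.size);                               // 256
console.log(unwrapped.size);                             // 658 = 10*1 + 90*2 + 156*3
console.log(decimalDigitBytes(sample));                  // 658
console.log((unwrapped.size / wrapped.size).toFixed(2)); // "2.57"
```

For evenly distributed byte values the expected ratio is therefore about 2.57, which is close to the ratio of roughly 2.58 between the two sizes reported in the question.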

So, bottom line is, the first version is the correct way to go.

Answer

Unfortunately, the MDN article is misleading here, if not outright wrong.

From the specs:

The Blob() constructor can be invoked with the parameters below:

  • A blobParts sequence which takes any number of the following types of elements, and in any order:

    • BufferSource elements.

    • Blob elements.

    • USVString elements.

  • ... [BlobPropertyBag, none of our business here]

So a sequence here can be a lot of things, from an Array to a Set, or even a multi-dimensional Array.
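As a small illustration of how loosely "sequence" is defined, the sketch below passes several different iterables to the constructor. The part values and the parts generator are arbitrary examples of mine, and it assumes a global Blob (current browsers or a recent Node.js):

```javascript
// Three different iterables that yield the same two string parts.
const fromArray = new Blob(["ab", "cd"]);
const fromSet = new Blob(new Set(["ab", "cd"]));

function* parts() {
  yield "ab";
  yield "cd";
}
const fromGenerator = new Blob(parts());

console.log(fromArray.size, fromSet.size, fromGenerator.size); // 4 4 4

// A typed array is iterable too, so it is also accepted as the sequence
// itself. Its elements are numbers, which end up as the strings "7" and "42".
const fromTypedArray = new Blob(new Uint8Array([7, 42]));
console.log(fromTypedArray.size); // 3
```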

The algorithm then iterates over this sequence and treats each element as one of the three types above; anything that is not a BufferSource or a Blob is converted to a string.

What happens in your case is that a TypedArray is itself iterable, so it can be converted to a sequence. When you pass it directly as the parameter, the constructor never sees its ArrayBuffer; it iterates over the contents and picks up the individual values (here 8-bit numbers, converted to strings), which is probably not what you expected.

On the other hand, when you wrap your Uint8Array in an Array, the algorithm is able to find the BufferSource your Uint8Array points to. So it will use that instead (binary data, which is probably what you want).
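One related detail, sketched below with made-up data: a wrapped view contributes only the bytes it covers, not the whole underlying ArrayBuffer. The names backing and view are mine, and this again assumes a global Blob:

```javascript
// A view over part of a larger buffer (hypothetical data).
const backing = new Uint8Array([1, 2, 3, 4, 5, 6, 7, 8]);
const view = backing.subarray(2, 5); // bytes 3, 4, 5, sharing backing's ArrayBuffer

const fromView = new Blob([view]);          // only the bytes the view covers
const fromBuffer = new Blob([view.buffer]); // the whole underlying ArrayBuffer

console.log(fromView.size);   // 3
console.log(fromBuffer.size); // 8
```

So wrapping the Uint8Array itself, as in the question's first version, is usually safer than reaching for its .buffer.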

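// 25 bytes, all set to 0xFF
var arr = new Uint8Array(25);
arr.fill(255);

// Typed array passed directly: each element becomes the string "255"
var nowrap = new Blob(arr);
// Typed array wrapped in an Array: the 25 raw bytes are stored as-is
var wrapped = new Blob([arr]);

test(nowrap, 'no wrap');  // logs "255" repeated 25 times
test(wrapped, 'wrapped'); // logs 25 replacement characters (0xFF is not valid UTF-8)

// Read the Blob back as text and log it with a label
function test(blob, msg) {
  var reader = new FileReader();
  reader.onload = e => console.log(msg, reader.result);
  reader.readAsText(blob);
}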
