String compression in JavaScript

I'm looking for a JavaScript function that given a string returns a compressed (shorter) string.

I'm developing a Chrome web application that saves long strings (HTML) to a local database. For testing purposes I tried to zip the file storing the database, and it shrank by a factor of five, so I figured it would help keep the database smaller if I compressed the things it stores.

I've found an implementation of LZSS in JavaScript here: http://code.google.com/p/u-lzss/ ("U-LZSS").

It seemed to work when I tested it "by hand" with short example strings (decode === encode), and it's reasonably fast too, in Chrome. But when given big strings (100 ko) it seems to garble/mix up the last half of the string.

Is it possible that U-LZSS expects short strings and can't deal with larger strings? And would it be possible to adjust some parameters in order to move that upper limit?

Answers:

Answer

I just released a small LZW implementation especially tailored for this very purpose as none of the existing implementations did meet my needs.

That's what I'm using going forward, and I will probably try to improve the library at some point.

Answer

Here are encode (276 bytes, function en) and decode (191 bytes, function de) functions I modded from LZW in a fully working demo. There is no smaller or faster routine available on the internet than what I am giving you here.

function en(c){var x='charCodeAt',b,e={},f=c.split(""),d=[],a=f[0],g=256;for(b=1;b<f.length;b++)c=f[b],null!=e[a+c]?a+=c:(d.push(1<a.length?e[a]:a[x](0)),e[a+c]=g,g++,a=c);d.push(1<a.length?e[a]:a[x](0));for(b=0;b<d.length;b++)d[b]=String.fromCharCode(d[b]);return d.join("")}

function de(b){var a,e={},d=b.split(""),c=f=d[0],g=[c],h=o=256;for(b=1;b<d.length;b++)a=d[b].charCodeAt(0),a=h>a?d[b]:e[a]?e[a]:f+c,g.push(a),c=a.charAt(0),e[o]=f+c,o++,f=a;return g.join("")}

var compressed=en("http://www.ScriptCompress.com - Simple Packer/Minify/Compress JavaScript Minify, Fixify & Prettify 75 JS Obfuscators In 1 App 25 JS Compressors (Gzip, Bzip, LZMA, etc) PHP, HTML & JS Packers In 1 App PHP Source Code Packers Text Packer HTML Packer or v2 or v3 or LZW Twitter Compress or More Words DNA & Base64 Packer (freq tool) or v2 JS JavaScript Code Golfer Encode Between Quotes Decode Almost Anything Password Protect Scripts HTML Minifier v2 or Encoder or Escaper CSS Minifier or Compressor v2 SVG Image Shrinker HTML To: SVG or SVGZ (Gzipped) HTML To: PNG or v2 2015 JS Packer v2 v3 Embedded File Generator Extreme Packer or version 2 Our Blog DemoScene JS Packer Basic JS Packer or New Version Asciify JavaScript Escape JavaScript Characters UnPacker Packed JS JavaScript Minify/Uglify Text Splitter/Chunker Twitter, Use More Characters Base64 Drag 'n Drop Redirect URL DataURI Get Words Repeated LZMA Archiver ZIP Read/Extract/Make BEAUTIFIER & CODE FIXER WHAK-A-SCRIPT JAVASCRIPT MANGLER 30 STRING ENCODERS CONVERTERS, ENCRYPTION & ENCODERS 43 Byte 1px GIF Generator Steganography PNG Generator WEB APPS VIA DATAURL OLD VERSION OF WHAK PAKr Fun Text Encrypt Our Google");
var decompressed=de(compressed);

document.writeln('<hr>'+compressed+'<hr><h1>'+compressed.length+' characters versus original '+decompressed.length+' characters.</h1><hr>'+decompressed+'<hr>');

Answer

At Piskvor's suggestion, I tested the code found in an answer to this question: JavaScript implementation of Gzip (top-voted answer: LZW implementation) and found that:

  1. it works
  2. it reduces the size of the database by a factor of two

... which is less than 5 but better than nothing! So I used that.

(I wish I could have accepted an answer by Piskvor but it was only a comment).

Answer

To me it doesn't seem reasonable to compress a string using UTF-8 as the destination... It looks like just looking for trouble. I think it would be better to lose some compression and using plain 7-bit ASCII as the destination.

In a toy 4 KB JavaScript demo I wrote for fun I used an encoding for the result of compression that stores four binary bytes into five chars chosen from a subset of ASCII of 85 chars that is clean for embedding in a JavaScript string (85^5 is slightly more than 8^4, but still fits in the precision of JavaScript integers). This makes compressed data safe for example for JSON without need of any escaping.

Answer

Try experimenting with textfiles before implementing anything because I think that the following does not necessarily hold:

so I figured it would help keep the database smaller if I compressed the things it stores.

That's because lossless compression algorithms are pretty good with repeating patterns (e.g whitespace).

Answer

I think you should also look into lz-string it's fast a compresses quite well and has some advantages they list on their page:

What about other libraries?

  • some LZW implementations which gives you back arrays of numbers (terribly inefficient to store as tokens take 64bits) and don't support any character above 255.
  • some other LZW implementations which gives you back a string (less terribly inefficient to store but still, all tokens take 16 bits) and don't support any character above 255.
  • an LZMA implementation that is asynchronous and very slow - but hey, it's LZMA, not the implementation that is slow.
  • a GZip implementation not really meant for browsers but meant for node.js, which weighted 70kb (with deflate.js and crc32.js on which it depends).

The reasons why the author created lz-string:

  • Working on mobile I needed something fast.
  • Working with Strings gathered from outside my website, I needed something that can take any kind of string as an input, including any UTF characters above 255.
  • The library not taking 70kb was a definitive plus. Something that produces strings as compact as possible to store in localStorage. So none of the libraries I could find online worked well for my needs.

There are implementations of this lib in other languages, I am currently looking into the python implementation, but the decompression seems to have issues at the moment, but if you stick to JS only it looks really good to me.

Answer

It seems, there is a proposal of compression/decompression API: https://github.com/wicg/compression/blob/master/explainer.md .

And it is implemented in Chrome 80 (right now in Beta) according to a blog post at https://blog.chromium.org/2019/12/chrome-80-content-indexing-es-modules.html .

I am not sure I am doing a good conversion between streams and strings, but here is my try to use the new API:

    var encoding = 'deflate'; // or 'gzip'

    function compress(text) {
      var byteArray = new TextEncoder().encode(text);
      var cs = new CompressionStream(encoding);
      var writer = cs.writable.getWriter();
      writer.write(byteArray);
      writer.close();
      return new Response(cs.readable).arrayBuffer();
    }

    function decompress(byteArray) {
      var cs = new DecompressionStream(encoding);
      var writer = cs.writable.getWriter();
      writer.write(byteArray);
      writer.close();
      return new Response(cs.readable).arrayBuffer().then(function (arrayBuffer) {
        return new TextDecoder().decode(arrayBuffer);
      });
    }
    
    var test = "http://www.ScriptCompress.com - Simple Packer/Minify/Compress JavaScript Minify, Fixify & Prettify 75 JS Obfuscators In 1 App 25 JS Compressors (Gzip, Bzip, LZMA, etc) PHP, HTML & JS Packers In 1 App PHP Source Code Packers Text Packer HTML Packer or v2 or v3 or LZW Twitter Compress or More Words DNA & Base64 Packer (freq tool) or v2 JS JavaScript Code Golfer Encode Between Quotes Decode Almost Anything Password Protect Scripts HTML Minifier v2 or Encoder or Escaper CSS Minifier or Compressor v2 SVG Image Shrinker HTML To: SVG or SVGZ (Gzipped) HTML To: PNG or v2 2015 JS Packer v2 v3 Embedded File Generator Extreme Packer or version 2 Our Blog DemoScene JS Packer Basic JS Packer or New Version Asciify JavaScript Escape JavaScript Characters UnPacker Packed JS JavaScript Minify/Uglify Text Splitter/Chunker Twitter, Use More Characters Base64 Drag 'n Drop Redirect URL DataURI Get Words Repeated LZMA Archiver ZIP Read/Extract/Make BEAUTIFIER & CODE FIXER WHAK-A-SCRIPT JAVASCRIPT MANGLER 30 STRING ENCODERS CONVERTERS, ENCRYPTION & ENCODERS 43 Byte 1px GIF Generator Steganography PNG Generator WEB APPS VIA DATAURL OLD VERSION OF WHAK PAKr Fun Text Encrypt Our Google";

    console.time('compress');
    compress(test).then(function (x) {
      console.timeEnd('compress');
      console.log('compressed length', x.byteLength);
      console.time('decompress');
      decompress(x).then(function (y) {
        console.timeEnd('decompress');
        console.log('decompressed length', y.length);
        console.assert(test === y);
      });
    });

Tags

Recent Questions

Top Questions

Home Tags Terms of Service Privacy Policy DMCA Contact Us

©2020 All rights reserved.