Hashing Files with WASM
Waiting 20 minutes to be told of a file conflict? Nah
Recently a friend asked me for help with a work problem, and it turned out to be an interesting and unusual one.
Their company has internal users who need to upload large files, often 2–5 GB, to each instance of a system.
Because they do this so often, the people uploading may accidentally upload a duplicate.
Now for the problem:
Currently, they have to wait until the file finishes uploading before the backend can tell them it’s a duplicate. They may have wasted 10–30 minutes just to be told the file already exists.
WASM? Why?
Sure, there are many ways to prevent this issue, but what I suggested is something I’d never had to do before: use WebAssembly to compute a file hash client side. Once you have the hash, you can ask your backend whether a file with the same hash already exists.
This should take a couple of seconds client side, maybe 20+ on a very old HDD. Much better than waiting out the whole upload on a slow connection.
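The comparison step itself is simple once you have the hex digest. A minimal sketch of the check, where `knownHashes` and `isDuplicate` are hypothetical stand-ins for whatever lookup the backend exposes (a real app would likely hit an endpoint with fetch instead):

```javascript
// Stand-in for the backend's lookup; in practice this would be a
// fetch() to an endpoint that checks the hash against stored files.
const knownHashes = new Set([
  '2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824', // sha256("hello")
]);

function isDuplicate(hexDigest) {
  return knownHashes.has(hexDigest);
}
```

The point is that only a short hex string crosses the wire before the upload starts, not the file itself.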
Another thing to consider is that you cannot load the entire file into memory. The machine may not have 5 GB of RAM free, and 32-bit WebAssembly caps its linear memory at 4 GB anyway.
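To put numbers on that: with 64 MB slices (an illustrative size, matching the chunk size in the code later), a 5 GB file is processed as 80 slices, and only one slice needs to live in memory at a time:

```javascript
// Chunked-hashing memory math (sizes are illustrative)
const FILE_SIZE = 5 * 1024 ** 3;   // 5 GB file
const CHUNK_SIZE = 64 * 1024 ** 2; // 64 MB slice

const totalChunks = Math.ceil(FILE_SIZE / CHUNK_SIZE);
console.log(totalChunks); // 80 slices, but peak extra memory stays around 64 MB
```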
When I first suggested this, I knew it was possible with the Web Crypto API, since it has built-in hash function support, but I wasn’t sure whether streaming a file through it was also supported.
It isn’t natively, but thankfully there is an open-source package, ‘hash-wasm’, that does EXACTLY this: it supports hashing a file while streaming it.
Streaming file contents to WASM
Here’s all the code that should be needed to accomplish this:
```html
<script type="module">
  import { createSHA256 } from 'https://cdn.jsdelivr.net/npm/hash-wasm/dist/hash-wasm.esm.js';

  const CHUNK_SIZE = 64 * 1024 * 1024; // 64 MB chunks

  document.getElementById('filePicker').addEventListener('change', async (e) => {
    const file = e.target.files[0];
    if (!file) return;

    const hasher = await createSHA256();
    hasher.init();

    let offset = 0;
    while (offset < file.size) {
      // slice() is lazy, so only this 64 MB chunk is read into memory
      const chunk = file.slice(offset, offset + CHUNK_SIZE);
      const buffer = await chunk.arrayBuffer();
      hasher.update(new Uint8Array(buffer));
      offset += CHUNK_SIZE;
    }

    document.getElementById('result').textContent = hasher.digest('hex');
  });
</script>
```

When you get a file from the file picker, loop over its size and, for each slice, call hasher.update from the hash-wasm library. This correctly computes the SHA-256 hash of the file without ever holding more than 64 MB of extra RAM!
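The correctness of this trick rests on the hash being incremental: feeding the bytes in slices yields the same digest as feeding them all at once. You don’t need hash-wasm to see that; Node’s built-in crypto.createHash follows the same update/digest pattern, so here is a small demonstration of the equivalence:

```javascript
import { createHash } from 'node:crypto';

const data = Buffer.from('a'.repeat(1000)); // pretend this is a large file

// Hash everything at once
const whole = createHash('sha256').update(data).digest('hex');

// Hash in 64-byte slices, mirroring the Blob loop in the browser code
const hasher = createHash('sha256');
for (let offset = 0; offset < data.length; offset += 64) {
  hasher.update(data.subarray(offset, offset + 64));
}
const chunked = hasher.digest('hex');

console.log(whole === chunked); // true
```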
Pretty neat, and I am surprised this isn’t built into the platform. The streaming piece had to be done with WASM by this third-party package, because the native API doesn’t let you feed the hasher one piece at a time; the native APIs only hash a fully loaded buffer.
If my friend uses this approach, I’ll report back with some rough timings for looping over a large file and computing the hash this way.


