Shrinking .wasm Code Size

When serving a .wasm file over the network, the smaller it is, the faster the client can download it. Faster .wasm downloads lead to faster page load times, and that leads to happier users.

However, it's important to remember that code size likely isn't the end-all-be-all metric you're interested in, but rather something much more vague and hard to measure like "time to first interaction". While code size plays a large factor in this measurement (you can't do anything if you don't even have all the code yet!), it's not the only factor.

WebAssembly is typically served to users gzip'd, so you'll want to be sure to compare differences in gzip'd size for transfer times over the wire. Also keep in mind that the WebAssembly binary format is quite amenable to gzip compression, often getting over 50% reductions in size.

Furthermore, WebAssembly's binary format is optimized for very fast parsing and processing. Browsers nowadays have "baseline compilers" which parse WebAssembly and emit compiled code as fast as wasm can come in over the network. This means that if you're using `instantiateStreaming`, the second the Web request is done, the WebAssembly module is probably ready to go. JavaScript, on the other hand, can often take longer to not only parse but also get up to speed with JIT compilation and such.

And finally, remember that WebAssembly is also far more optimized than JavaScript for execution speed. You'll want to measure runtime performance of JavaScript versus WebAssembly to factor that in to how important code size is.

All this to say: don't dismay immediately if your .wasm file is larger than expected! Code size may end up only being one of many factors in the end-to-end story. Comparisons between JavaScript and WebAssembly that only look at code size are missing the forest for the trees.

There are a bunch of configuration options we can use to get rustc to make smaller .wasm binaries. In some cases, we are trading longer compile times for smaller .wasm sizes. In other cases, we are trading runtime speed of the WebAssembly for smaller code size. We should be cognizant of the trade-offs of each option, and in the cases where we trade runtime speed for code size, profile and measure to make an informed decision about whether the trade is worth it.

Compiling with Link Time Optimizations (LTO)

In `Cargo.toml`, add `lto = true` in the `[profile.release]` section:
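That is, the release profile with LTO enabled looks like this:

```toml
[profile.release]
lto = true
```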

This gives LLVM many more opportunities to inline and prune functions. Not only will it make the .wasm smaller, but it will also make it faster at runtime! The downside is that compilation will take longer.

    Tell LLVM to Optimize for Size Instead of Speed

LLVM's optimization passes are tuned to improve speed, not size, by default. We can change the goal to code size by modifying the `[profile.release]` section in `Cargo.toml` to this:

```toml
[profile.release]
opt-level = 's'
```

Or, to even more aggressively optimize for size, at further potential speed costs:

```toml
[profile.release]
opt-level = 'z'
```

Note that, surprisingly enough, `opt-level = "s"` can sometimes result in smaller binaries than `opt-level = "z"`. Always measure!

    Use the wasm-opt Tool

The Binaryen toolkit is a collection of WebAssembly-specific compiler tools. It goes much further than LLVM's WebAssembly backend does, and using its `wasm-opt` tool to post-process a .wasm binary generated by LLVM can often get another 15-20% savings on code size. It will often produce runtime speed-ups at the same time!
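For example, assuming Binaryen's `wasm-opt` is installed, a typical size-focused invocation looks something like this (adjust the input and output filenames to your build):

```shell
# -Oz optimizes aggressively for size; -Os is a slightly less aggressive option.
wasm-opt -Oz -o output.wasm input.wasm
```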

One of the biggest contributors to wasm binary size can be debug information and the names section of the wasm binary. The `wasm-pack` tool, however, removes debug info by default. Additionally, `wasm-opt` removes the names section by default unless `-g` is also specified.

If tweaking build configurations to optimize for code size isn't resulting in a small enough .wasm binary, it is time to do some profiling to see where the remaining code size is coming from.

    The twiggy Code Size Profiler

`twiggy` is a code size profiler that supports WebAssembly as input. It analyzes a binary's call graph to answer questions like:

    • Why was this function included in the binary in the first place?

```text
$ twiggy top -n 20 pkg/wasm_game_of_life_bg.wasm
 Shallow Bytes │ Shallow % │ Item
───────────────┼───────────┼────────────────────────────────────────────────────────────────────────────────────────
          9158 ┊    19.65% ┊ "function names" subsection
          3251 ┊     6.98% ┊ dlmalloc::dlmalloc::Dlmalloc::malloc::h632d10c184fef6e8
          2510 ┊     5.39% ┊ <str as core::fmt::Debug>::fmt::he0d87479d1c208ea
          1737 ┊     3.73% ┊ data[0]
          1574 ┊     3.38% ┊ data[3]
          1524 ┊     3.27% ┊ core::fmt::Formatter::pad::h6825605b326ea2c5
          1200 ┊     2.57% ┊ core::fmt::Formatter::pad_integral::h06996c5859a57ced
          1131 ┊     2.43% ┊ core::str::slice_error_fail::h6da90c14857ae01b
          1051 ┊     2.26% ┊ core::fmt::write::h03ff8c7a2f3a9605
           931 ┊     2.00% ┊ data[4]
           864 ┊     1.85% ┊ dlmalloc::dlmalloc::Dlmalloc::free::h27b781e3b06bdb05
           841 ┊     1.80% ┊ <char as core::fmt::Debug>::fmt::h07742d9f4a8c56f2
           813 ┊     1.74% ┊ __rust_realloc
           708 ┊     1.52% ┊ core::slice::memchr::memchr::h6243a1b2885fdb85
           678 ┊     1.45% ┊ <core::fmt::builders::PadAdapter<'a> as core::fmt::Write>::write_str::h96b72fb7457d3062
           631 ┊     1.35% ┊ universe_tick
           631 ┊     1.35% ┊ dlmalloc::dlmalloc::Dlmalloc::dispose_chunk::hae6c5c8634e575b8
           503 ┊     1.08% ┊ <&'a T as core::fmt::Debug>::fmt::hba207e4f7abaece6
```

    Manually Inspecting LLVM-IR

LLVM-IR is the final intermediate representation in the compiler toolchain before LLVM generates WebAssembly. Therefore, it is very similar to the WebAssembly that is ultimately emitted. More LLVM-IR generally means more .wasm size, and if a function takes up 25% of the LLVM-IR, then it generally will take up 25% of the .wasm. While these numbers only hold in general, the LLVM-IR has crucial information that is not present in the .wasm (because of WebAssembly's lack of a debugging format like DWARF): which subroutines were inlined into a given function.

    You can generate LLVM-IR with this cargo command:

```shell
cargo rustc --release -- --emit llvm-ir
```

Then, you can use `find` to locate the `.ll` file containing the LLVM-IR in cargo's `target` directory:
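For example (the exact filename under `target/release` depends on your crate):

```shell
find target/release -type f -name '*.ll'
```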

More Invasive Tools and Techniques

Tweaking build configurations to get smaller .wasm binaries is pretty hands-off. When you need to go the extra mile, however, you are prepared to use more invasive techniques, like rewriting source code to avoid bloat. What follows is a collection of get-your-hands-dirty techniques you can apply to get smaller code sizes.

Avoid String Formatting

`format!`, `to_string`, etc. can bring in a lot of code bloat. If possible, only do string formatting in debug mode, and in release mode use static strings.
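One way to sketch this is with `cfg(debug_assertions)`, which is enabled in debug builds and disabled in release builds; `status_message` here is a hypothetical helper, not something from the book:

```rust
// Debug builds: a detailed, formatted message (pulls in core::fmt machinery).
#[cfg(debug_assertions)]
fn status_message(count: usize) -> String {
    format!("processed {} items", count)
}

// Release builds: a static string, avoiding the formatting code entirely.
#[cfg(not(debug_assertions))]
fn status_message(_count: usize) -> String {
    "processed items".to_string()
}
```

Both versions share a signature, so call sites don't change between build configurations.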

    Avoid Panicking

This is definitely easier said than done, but tools like `twiggy` and manually inspecting LLVM-IR can help you figure out which functions are panicking.

Panics do not always appear as a `panic!()` macro invocation. They arise implicitly from many constructs, such as:

• Indexing a slice panics on out of bounds indices: `my_slice[i]`

• Division will panic if the divisor is zero: `dividend / divisor`

• Unwrapping an `Option` or `Result` will panic on `None` or an `Error`: `opt.unwrap()` or `res.unwrap()`

The first two can be translated into the third. Indexing can be replaced with fallible `my_slice.get(i)` operations. Division can be replaced with `checked_div` calls. Now we only have a single case to contend with.
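A minimal sketch of those fallible alternatives (the function names here are illustrative, not from the book):

```rust
// Instead of `s[0]`, which panics on an empty slice, return an Option.
fn first_byte(s: &[u8]) -> Option<u8> {
    s.get(0).copied()
}

// Instead of `dividend / divisor`, which panics when divisor is zero,
// checked_div returns None in that case.
fn ratio(dividend: u32, divisor: u32) -> Option<u32> {
    dividend.checked_div(divisor)
}
```

Both now funnel into the single "unwrap an `Option`" case discussed next.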

Unwrapping an `Option` or `Result` without panicking comes in two flavors: safe and unsafe.

The safe approach is to abort instead of panicking when encountering a `None` or an `Error`:

```rust
#[inline]
pub fn unwrap_abort<T>(o: Option<T>) -> T {
    use std::process;
    match o {
        Some(t) => t,
        None => process::abort(),
    }
}
```

Ultimately, panics translate into aborts in `wasm32-unknown-unknown` anyways, so this gives you the same behavior but without the code bloat.

Alternatively, the `unreachable` crate provides an unsafe `unchecked_unwrap` extension method for `Option` and `Result` which tells the Rust compiler to assume that the `Option` is `Some` or the `Result` is `Ok`. It is undefined behavior what happens if that assumption does not hold. You really only want to use this unsafe approach when you 110% know that the assumption holds, and the compiler just isn't smart enough to see it. Even if you go down this route, you should have a debug build configuration that still does the checking, and only use unchecked operations in release builds.

    Avoid Allocation or Switch to wee_alloc

Rust's default allocator for WebAssembly is a port of dlmalloc to Rust. It weighs in somewhere around ten kilobytes. If you can completely avoid dynamic allocation, then you should be able to shed those ten kilobytes.

Completely avoiding dynamic allocation can be very difficult. But removing allocation from hot code paths is usually much easier (and usually helps make those hot code paths faster, as well). In these cases, switching to `wee_alloc` should save you most (but not quite all) of those ten kilobytes. `wee_alloc` is an allocator designed for situations where you need *some* kind of allocator, but do not need a particularly fast allocator, and will happily trade allocation speed for smaller code size.
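Switching is a small change once `wee_alloc` is added as a dependency in `Cargo.toml`; this sketch follows the crate's documented usage:

```rust
// Replace the default dlmalloc-based allocator with wee_alloc.
// Requires `wee_alloc` in the [dependencies] section of Cargo.toml.
#[global_allocator]
static ALLOC: wee_alloc::WeeAlloc = wee_alloc::WeeAlloc::INIT;
```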

Use Trait Objects Instead of Generic Type Parameters

When you create generic functions that use type parameters, like this:

```rust
fn whatever<T: MyTrait>(t: T) { ... }
```

Then rustc and LLVM will create a new copy of the function for each `T` type that the function is used with. This presents many opportunities for compiler optimizations based on which particular `T` each copy is working with, but these copies add up quickly in terms of code size.

    If you use trait objects instead of type parameters, like this:
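A minimal, self-contained sketch of the trait-object form; the `MyTrait` trait and `Circle` type here are hypothetical stand-ins:

```rust
trait MyTrait {
    fn describe(&self) -> &'static str;
}

struct Circle;

impl MyTrait for Circle {
    fn describe(&self) -> &'static str {
        "circle"
    }
}

// Only one copy of this function is emitted in the .wasm, no matter how
// many types implement MyTrait; the call goes through a vtable at runtime.
fn whatever(t: &dyn MyTrait) -> &'static str {
    t.describe()
}
```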

Then dynamic dispatch via virtual calls is used, and only a single version of the function is emitted in the .wasm. The downside is the loss of the compiler optimization opportunities and the added cost of indirect, dynamically dispatched function calls.

    Use the wasm-snip Tool

`wasm-snip` replaces a WebAssembly function's body with an `unreachable` instruction. This is a rather heavy, blunt hammer for functions that kind of look like nails if you squint hard enough.

Maybe you know that some function will never be called at runtime, but the compiler can't prove that at compile time? Snip it! Afterwards, run `wasm-opt` again with the `--dce` flag, and all the functions that the snipped function transitively called (which could also never be called at runtime) will get removed too.
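A sketch of that workflow, with hypothetical function names standing in for ones you have identified as dead:

```shell
# Replace the named functions' bodies with an `unreachable` instruction.
wasm-snip input.wasm -o snipped.wasm some_unused_function another_unused_function

# Then dead-code-eliminate everything that only those functions called.
wasm-opt --dce -o output.wasm snipped.wasm
```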

    This tool is particularly useful for removing the panicking infrastructure,since panics ultimately translate into traps anyways.