Rust and JavaScript Interop ↔️ - You Learn Something New Everyday 💭

In recent projects of mine, I’ve been using WebAssembly quite a bit. WebAssembly (Wasm) is “a new binary instruction format for a stack based virtual machine” that lets you use languages besides JavaScript to run code on a web page - usually either for performance reasons or to run code you’d like to share across different platforms. In my opinion, the most promising of these languages, due to its lack of a need for a runtime and great tooling is Rust.

The best way to use Rust with WebAssembly is through Wasm-Bindgen. Wasm-Bindgen makes it easy to write Rust code that compiles down to Web Assembly that is easily interoperable with JavaScript. It is both a library for generating the boilerplate Rust for functions that JavaScript can call as well as a cli tool for generating the boilerplate JavaScript that can easily interop with WebAssembly.

In this post, we’ll be examining how Wasm-Bindgen creates a bridge between Rust and JavaScript. We’ll take a look at numbers and strings and how those types are transformed and made consumable from Rust (as compiled WebAssembly) by JavaScript. This post will take a look at internals to Wasm-Bindgen. What’s written below is true for the current version of Wasm-Bindgen at the time of this writing (0.2.21), but the details might change in the future.

The Problem

Before we can look at how Wasm-Bindgen works, we need to understand the base interoperability between Wasm and JavaScript. Wasm has an extremely limited interoperability story. Wasm code only understands 32 and 64-bit floating point and integer numbers. Beyond this, Wasm is capable of calling JavaScript functions (as long as they expect just 32 and 64 bit floats and integers), and it can expose its own functions to JavaScript (again only accepting those 4 types of arguments). Naturally this makes working with Wasm pretty difficult. How can we write functions that take richer types like strings, arrays, objects and classes?

This is where Wasm-Bindgen comes in. Instead of directly writing Rust and JavaScript that deals with this very limited interoperability directly, the library/tool let’s you write Rust and JavaScript that deals with a much more rich set of types. Wasm-Bindgen then generates the glue code that boils the JavaScript and Wasm interop to only use the 4 allows types.

Now that we have this background we can look at the code that one might write in Rust and JavaScript and then the generated code that makes this possible. I won’t take the time to explain exactly how Wasm-Bindgen is used - the code examples here should be clear enough on their own for our purposes. To learn how to actually create a project using Wasm-Bindgen, you can take a look at their wonderful documentation.

Numbers

The easiest example of how Wasm-Bindgen enables interoperability is through numbers. Since Wasm supports integers and floats (32- and 64-bit), Wasm-Bindgen normally has to do to very little. Despite numbers being relatively easy, they still go through similar machinery as other types and thus make for a good first example.

Take for example the following function:

#[wasm_bindgen]
pub fn add(a: u16, b: u8) ->  u16 {
  a + (b as u16)
}

Here we have a pretty simple rust function named add that has been annotated with the wasm_bindgen attribute. From JavaScript we’ll be able to call this function like so: wasm.add(1, 2). In order to be able to do this though, Wasm-Bindgen will generate both Rust and JavaScript code for us.

Using the awesome tool cargo-expand we can expand the wasm_bindgen macro and see what Rust code is generated for us (remember to use the cargo-expand tool with the wasm32-unknown-unknown target or else you won’t see the right thing!):

// ... imports above

pub fn add(a: u16, b: u8) -> u16 {
  a + (b as u16)
}

#[export_name = "add"]
#[allow(non_snake_case)]
#[cfg(all(target_arch = "wasm32", not(target_os = "emscripten")))]
pub extern "C" fn __wasm_bindgen_generated_add(
  arg0: <u16 as ::wasm_bindgen::convert::FromWasmAbi>::Abi,
  arg1: <u8 as ::wasm_bindgen::convert::FromWasmAbi>::Abi,
) -> <u16 as ::wasm_bindgen::convert::IntoWasmAbi>::Abi {
  ::wasm_bindgen::__rt::link_mem_intrinsics();
  let _ret = {
    let mut __stack = unsafe { ::wasm_bindgen::convert::GlobalStack::new() };
    let arg0 = unsafe { <u16 as ::wasm_bindgen::convert::FromWasmAbi>::from_abi(arg0, &mut __stack) };
    let arg1 = unsafe { <u8 as ::wasm_bindgen::convert::FromWasmAbi>::from_abi(arg1, &mut __stack) };
    add(arg0, arg1)
  };
  <u16 as ::wasm_bindgen::convert::IntoWasmAbi>::into_abi(_ret, &mut unsafe {
    ::wasm_bindgen::convert::GlobalStack::new()
  })
}

// Next comes the wbindgen_describe function ...

Wow, a lot to unpack there. Ok, let’s start at the top. We first have our add function completely unchanged. That’s good to know.

Next comes another function named __wasm_bindgen_generated_add with a lot of attributes above it. The first attribute export_name = "add" is very interesting. This function will be exported with the name add (i.e., the same name as the function we wrote ourselves) and not with the name __wasm_bindgen_generated_add. That means that when someone calls the add function on our WebAssembly module, they’ll actually be calling this generated function. Of course, within the generated function is the original function add that we wrote, so our code will eventually get called. But what’s the generated code doing?

We’re going to skip over the link_mem_intrinsics. This is an interesting function that you can read about here.

First, notice that our arguments are now <u16 as ::wasm_bindgen::convert::FromWasmAbi>::Abi and <u8 as ::wasm_bindgen::convert::FromWasmAbi>::Abi. This means this function will accept values of type Abi which is an associated type defined on a trait name FromWasmAbi (If you’re unfamiliar with associated types you can read about them here). FromWasmAbi is the trait that defines how values are converted from the types that WebAssembly understands (namely 32- and 64-bit integers and floats) to the richer types Rust is capable of dealing with. The associated type Abi is the type of value we’ll be getting passed from JavaScript (i.e. one of the four types WebAssembly understanding). As we’ll see in second when we take a look at the FromWasmAbi implementation for u8 and u16 the Abi associated is in fact u32 (one of the four types WebAssembly understands).

On the fourth and fifth lines of function we can see that the arguments are converted using the from_abi function that FromWasmAbi defines. Again, from_abi will take a value of some type that WebAssembly can understand and turn it into richer types that our Rust code can work with. So what does from_abi look like for u16?

impl FromWasmAbi for u16 {
  type Abi = u32;

  #[inline]
  unsafe fn from_abi(js: u32, _extra: &mut Stack) -> u16 { js as u16 }
}

Interesting - we now know that the argument we’re getting to the exported function is actually a u32. All the from_abi function does is take the u32 and convert it to a u16. Pretty straight forward. This means that if the calling code calls our add function with a number larger than what can fit in a 16 bit integer, the top 16 bits will just be dropped.

The story for our u8 argument is exactly the same with the from_abi function just turning the u32 into a u8 instead of a u16.

But what’s with the weird GlobalStack that’s passed to all these functions? As we see it’s not needed at all for our purposes (none of the from_abi functions are using it!). It is used for richer types than numbers and strings though.

Lastly, the opposite of from_abi - into_abi is called with the result of the call to our original add function. As you might guess the IntoWasmAbi trait is the opposite of the FromWasmAbi trait. It takes Rust types and describes how to turn them into one of the 4 types that WebAssembly actually understands. Let’s take a look at the implementation of this trait for u16:

impl IntoWasmAbi for u16 {
  type Abi = u32;

  #[inline]
  fn into_abi(self, _extra: &mut Stack) -> u32 { self as u32 }
}

As we’d expect the function just takes our u16 and turns it into a u32 that WebAssembly will understand.

And that’s the story from Rust and WebAssembly for numbers. The function will be compiled and can be called from JavaScript as is.

In fact, the generated JavaScript that the wasm-bindgen cli tool produces is just a simple wrapper around a direct call to our WebAssembly module:

export function add(arg0, arg1) {
  return wasm.add(arg0, arg1)
}

What happens if we pass the wrong type to our WebAssembly module (like say a string or an object)? Well, in reality when we compile a WebAssembly module we’re not given direct access to the functions defined in that module. Instead the browser runtime provides wrappers that coerce the arguments to the proper type if possible. These coercions follow the same rules that coercions in normal JavaScript do (e.g., undefined is coerced to 0).

In the above example, if you feed a negative number to the function and have compiled in debug mode, the module will panic, because the negative number is interpreted as a positive number and the addition causes an overflow (which panics in debug Rust). Compiling the Rust code in release mode causes Rust to not panic when a + addition leads to an overflow so everything works as expected.

Strings

Ok numbers are fairly straight forward. What about strings? First, we’ll be taking a look at owned Strings with the following example:

#[wasm_bindgen]
pub fn make_smile(mut a: String) -> String {
  a.push_str(" :)");
  a
}

And expanding this out, we get:

pub fn make_smile(mut a: String) -> String {
  a.push_str(" :)");
  a
}

#[export_name = "make_smile"]
#[allow(non_snake_case)]
#[cfg(all(target_arch = "wasm32", not(target_os = "emscripten")))]
pub extern "C" fn __wasm_bindgen_generated_make_smile(
  arg0: <String as ::wasm_bindgen::convert::FromWasmAbi>::Abi,
  ) -> <String as ::wasm_bindgen::convert::IntoWasmAbi>::Abi {
  ::wasm_bindgen::__rt::link_mem_intrinsics();
  let _ret = {
    let mut __stack = unsafe { ::wasm_bindgen::convert::GlobalStack::new() };
    let arg0 = unsafe {
      <String as ::wasm_bindgen::convert::FromWasmAbi>::from_abi(arg0, &mut __stack)
    };
    make_smile(arg0)
  };
  <String as ::wasm_bindgen::convert::IntoWasmAbi>::into_abi(_ret, &mut unsafe {
    ::wasm_bindgen::convert::GlobalStack::new()
  })
}

As you can tell the example of the String is very similar to the example we saw before (albeit this time with one fewer argument). Let’s see how the from_abi implementation for String is different from u16 and u8.

impl FromWasmAbi for String {
  type Abi = <Vec<u8> as FromWasmAbi>::Abi;

  #[inline]
  unsafe fn from_abi(js: Self::Abi, extra: &mut Stack) -> Self {
    String::from_utf8_unchecked(<Vec<u8>>::from_abi(js, extra))
  }
}

This is interesting. We see here that we’re calling the Rust standard library String::from_utf8_unchecked method which turns a buffer of bytes into a String without checking that it is actually valid utf-8. This means if we manage to pass a buffer of bytes to this function that’s not actually valid utf-8, weird things will happen. We’ll see in a little bit how the generated JavaScript protects against this.

The argument to the String::from_utf8_unchecked function is a Vec<u8> that comes from calling from_abi for Vec<u8>. The implementation is based on the FromWasmAbi for Vec<T> which itself is based on the implementation of FromWasmAbi for Box<[T]>. In the implementation of FromWasmAbi for Box<[u8]> we can finally see the concrete type of Abi is WasmSlice. A WasmSlice is simply a struct that contains len field (i.e., the length of our slice) of type u32 and a ptr field (i.e., a pointer to the beginning of the slice) of type u32. Let’s first have a look at both the FromWasmAbi implementations:

impl<T> FromWasmAbi for Vec<T> where Box<[T]>: FromWasmAbi<Abi = WasmSlice> {
  type Abi = <Box<[T]> as FromWasmAbi>::Abi;

  unsafe fn from_abi(js: Self::Abi, extra: &mut Stack) -> Self {
    <Box<[T]>>::from_abi(js, extra).into()
  }
}

impl FromWasmAbi for Box<[u8]> {
  type Abi = WasmSlice;

  #[inline]
  unsafe fn from_abi(js: WasmSlice, extra: &mut Stack) -> Self {
    let ptr = <*mut u8>::from_abi(js.ptr, extra);
    let len = js.len as usize;
    Vec::from_raw_parts(ptr, len, len).into_boxed_slice()
  }
}

So we can see the implementation for Vec<T> just relies on the implementation for Box<[T]> and the fact that the Rust standard library allows us to turn a Box<[u8]> into a Vec<u8>. The implementation of FromWasmAbi for Box<[u8]> knows how to get a pointer and a length from the WasmSlice it gets passed and then turn those values into a Vec using the standard libraries function Vec::from_raw_parts. It then turns that Vec into a Box<[u8]>.

And if we keep going down the rabbit hole, we can look into the implementation of from_abi for *mut u8, but it’s not very interesting since it just converts the type from u32 to a *mut pointer.

So now we know that our top level exported function takes a WasmSlice which gets converted through Box<[u8]> to Vec<u8> and finally to String which our original function can work on.

Finally, <String as ::wasm_bindgen::convert::IntoWasmAbi>::into_abi gets called. Again, this is just the opposite of the from_abi functions we’ve been looking at. Let’s have a look:

impl IntoWasmAbi for String {
  type Abi = <Vec<u8> as IntoWasmAbi>::Abi;

  #[inline]
  fn into_abi(self, extra: &mut Stack) -> Self::Abi {
    self.into_bytes().into_abi(extra)
  }
}

Here we see our String gets converted into bytes (i.e., a Vec<u8>) and then the into_abi implementation for Vec<u8> is called:

impl<T> IntoWasmAbi for Vec<T> where Box<[T]>: IntoWasmAbi<Abi = WasmSlice> {
  type Abi = <Box<[T]> as IntoWasmAbi>::Abi;

  fn into_abi(self, extra: &mut Stack) -> Self::Abi {
    self.into_boxed_slice().into_abi(extra)
  }
}

impl IntoWasmAbi for Box<[u8]> {
  type Abi = WasmSlice;

  #[inline]
  fn into_abi(self, extra: &mut Stack) -> WasmSlice {
    let ptr = self.as_ptr();
    let len = self.len();
    mem::forget(self);
    WasmSlice { ptr: ptr.into_abi(extra), len: len as u32 }
  }
}

We can see here how the implementation of IntoWasmAbi mirrors FromWasmAbi. One interesting thing of now is the use of mem::forget in the into_abi implementation for Box<[u8]> which tells Rust to not drop the Box<[u8]>. Without this line Rust would automatically deallocate the Box<[u8]> and ptr would be a dangling pointer.

Now we have a pretty good understanding of what’s happening inside the Rust code that’s been compiled to WebAssembly. But to fully understanding what’s going on, we’ll need to look at the generated code on the JS side:

export function make_smile(arg0) {
  const [ptr0, len0] = passStringToWasm(arg0);
  const retptr = globalArgumentPtr();
  wasm.make_smile(retptr, ptr0, len0);
  const mem = getUint32Memory();
  const rustptr = mem[retptr / 4];
  const rustlen = mem[retptr / 4 + 1];
  const realRet = getStringFromWasm(rustptr, rustlen).slice();
  wasm.__wbindgen_free(rustptr, rustlen * 1);
  return realRet;
}

Ok, so when we call the make_smile function on the JavaScript side, we first take the argument to our function (i.e., a JavaScript String) and pass it to a function call passStringToWasm. This function is responsible for converting our String to utf-8 (since JavaScript strings are not normally utf-8, but Rust expects utf-8 strings), allocating space in the Wasm heap, and putting the string there. Let’s take a look:

function passStringToWasm(arg) {
  const buf = cachedEncoder.encode(arg);
  const ptr = wasm.__wbindgen_malloc(buf.length);
  getUint8Memory().set(buf, ptr);
  return [ptr, buf.length];
}

Here we see cachedEncoder (simply a utf-8 encoder that is cached for reuse) changing the string to utf-8. Next we call a function on our wasm module that Wasm-Bindgen has generated for us __wbindgen_malloc that simply allocates memory on the WebAssembly heap. Next we call getUint8Memory which simply gives us a view of the WebAssembly heap as a Uint8Array and set which sets the utf-8 buffer we’ve created at the point in memory we just allocated. Finally we return the pointer to that memory and the length of the buffer.

After, passing the String to Wasm, we call the globalArgumentPtr function to get a pointer to somewhere in the Wasm heap. This is a special place in the heap decided by Wasm-Bindgen where we can put function return values that we need to return by pointer. Why we need to return a value by pointer is something we’ll discuss below. globalArgumentPtr uses another Wasm-Bindgen generated function like __wbindgen_malloc named __wbindgen_global_argument_ptr to accomplish this.

When we finally get to calling make_smile on our WebAssembly module we call the function with three arguments retptr, ptr0 and len0 and we don’t get a return value. But wait a minute - our exported function in the Rust code has only one argument and a return value… Wouldn’t we expect the call signature of our exported function from Rust to match the call signature that we see when we use that function from JavaScript?

The reason for this is the weirdness of what Rust and LLVM decided the “ABI” of function calls for the Wasm target would be. There are two rules that combine together to produce the interesting call signature we see in the JavaScript:

Complex arguments (i.e. arguments that are combinations of the 4 basic types Wasm supports) are “splatted” (meaning passed as separate arguments)
Values that are “too big” are returned by pointer that is passed as the first argument

The first rule means that our WasmSlice which as we’ve seen is a struct composed of two u32s is broken into those two values and those two values are passed as separate arguments. The second rule means that instead of our function returning a WasmSlice it instead puts that WasmSlice as the location specified by the first argument. Why a struct of two u32s is considered too big when Wasm supports 64-bit numbers is a topic for another time…

Once this is done, the function then turns the pointer and the length inside the WasmSlice into a string using getStringFromWasm:

function getStringFromWasm(ptr, len) {
  return cachedEncoder.decode(getUint8Memory().subarray(ptr, ptr + len));
}

Here, we’re again using the cachedEncoder this time to decode the memory stored between ptr and ptr + len as a utf-8 string.

We can then return this to the calling JavaScript code. But before we do this, we must first free the memory where our WasmSlice return value was since we won’t be using it anymore.

And that’s it! We’ve successfully take a JavaScript String, converted it to utf-8, moved it onto the Wasm Heap, and passed a pointer and length to Wasm. Inside of Wasm we’ve converted that pointer and length to a Rust String, called our original make_smile function and then returned back a WasmSlice pointer and length. Finally on the JavaScript side we reformed a JavaScript string from the WasmSlice located in the Wasm heap and finally freed that WasmSlice.

Wow that’s a lot of work!

Conclusion

As you can see, there’s quite a bit of machinery happening to generate Rust and JavaScript code for facilitating interop between JavaScript and WebAssembly. Hopefully this post gave you a basic understanding of how this code works. If you’d like to learn more about how JavaScript and WebAssembly can be made to easily talk to one another, let me know on Twitter!