Skip to content

Commit

Permalink
Add 03-25 meeting notes
Browse files Browse the repository at this point in the history
  • Loading branch information
linclark authored Mar 25, 2021
1 parent bf84e67 commit 1d369e0
Showing 1 changed file with 112 additions and 0 deletions.
112 changes: 112 additions & 0 deletions meetings/2021/WASI-03-25.md
Original file line number Diff line number Diff line change
Expand Up @@ -31,3 +31,115 @@ The meeting will be on a zoom.us video conference.
1. Better OS-agnosticism with I/O Types (Dan Gohman) ([slides])

[slides]: presentations/2021-03-25-gohman-io-types/index.html

## Notes
### Attendees
- Lin Clark
- Till Schneidereit
- Andrew Brown
- Matt Butcher
- Radu Matei
- Luke Wagner
- Dan Gohman
- Peter Huene
- Ralph Squillace
- Piotr Sikora
- Mark McCaskey
- Mingqiu Sun
- Yong He
- Pat Hickey
- Sam Clegg
- David Piepgrass
- Jonnie Birch
- Brian Hardock

**Dan Gohman:** Quick introduction to concepts.

Wasm itself: how is it different than assembly. In Wasm, foundational concept is the module. This is in contrast to modern OSs, which build on processes. All the IO in Wasm goes through imports and exports, rather than scanning through binary to find syscalls. No sneaky ways to get a syscall in. And Wasm is typed.

One of the big things is that makes Wasm good for virtualization. If you’re a wasm app and talking to a host, don’t know whether a module is a host api or a virtualization of it. These virtualization layers have all info.

One of the things this leads to is shared nothing linking, having linked instances together without linking address spaces together, so they dont’ share data directly but can still call each other. You can use calls as a lightweight IPC mechanism. Great for virtualization because we can virtualize calls in a way that’s hard otherwise.

How does an app pass file or any kind of data from one platform to another. Where should I put the file? What conventions does the host have? Same for networking. In order to transfer data, need to know a lot about the env you’re in. So this exposes a lot of info about hosts to applications.

Next big picture concept: data vs metadata. Data is the contents. Metadata is things like inode, filename, etc. This distinction leads to a broader dichotomy between compute and metacompute. Compute is what computers are for, in the literal sense. Metacompute is what you do to enable that—managing access, organizing files. Compute juse does math, data structures, algos, etc. So they are the most independent from host.

Last big idea. POSIX is dynamically typed. If input to a file happens to be a file, then you can do lseek. If it’s a pipe, then lseek will fail. It exposes what kind of thing the data comes from. This brings along some metadata concerns into the compute on the data. Let’s use our static type system to make the most of what Wasm gives us.

All these bg concepts lead us to a place called I/O Types. Part of a larger idea called Wasm-native I/O. POSIX bakes in a lot of assumptions about how the system is set up.

What is an I/O type. A type in the type system. Abstraction for a common I/O pattern. Abstracts away actual data from how the data gets there, how is it stored, etc.

Want to lay out a few simple examples.

i/o streams: They have reliable in order delivery of bytes. TCP streams are the archetype. IT may give us a stream of T. I’ve prototyped this in a library. Major ops are read and write. We also add skip, like read but discards data. Write_zeros, flush. Media_type: sometimes useful info from file name. We can provide string for this without exposing the whole filename. Pseudonym: You give it a stream and it gives you an opaque handle that contains the name. Forward: combine read from one input and write to another. This allows to copy from input to output without copying into the instance. Will be able to construct from an I/O Array.

I/O Arrays: dynamic arrays of bytes, like a Vec in Rust or C++. Not living in linera memory. Examples, device memory, files. Files are tricky because seem to be both streams and arrays. Prototype of this as well. Read_at and write_at is like pread. Advise: inform system of use case, like sequential access. Similar to I/O streams with media_type, pseudonym, etc. Copy is similar to forward in I/O streams.

One of the ways that you’ll get these types is using typed main. System can convert passed in files to streams or arrays. Programs that do this don’t care about the network stack, local address of the machine. Lots of things that we don’t have to care about. Typed main and I/O types avoids all of these complicated issues. The name of the program isn’t the relevant part. Just focus on compute, which we can make quite portable.

We do have some differences to POSIX. Can depend on multiple return values. Instead of int return, can report succeeding partially along with error.

For error types, POSIX has a lot of errno codes. Applications themselves usually don’t care about exactly how it failed. Bubble up to the user. Often the reason why something fails reveals info about the host. FOr us, in many cases, errors will be simple success or failure. We can use opaque handle to pass back up to Wasm env which can be logged to a file, but hide from application itself. This helps with virtualization. Doesn’t need to simulate what kind of error would have occurred.

Wasm doesn’t have signals, and seems like we won’t add them. So we don’t need to have a concept of being interrupted.

Big picture: not going to think of it as file or socket api, so not going to expose some of the weirder details.

POSIX also has concepts of read and write being atomic, but that’s poorly understood. Hard to decide from low level of read call whether it should be atomic. Instead, we want to introduce things like atomic files for things like config files.

Of course WASI already has wasi-filesystem, and wasi-sockets is underway. Relationship we expect is peers to wasi-io. One of the really interesting things is io types takes pressure off of filesystem to provide quite as much portability. For systems that want to have this level of non-determinism and port, use IO types. For other, less portable things, use wasi-filesystem. Wasi-filesystem can then be less aggressive about portability.

Some use cases will always want to know they’re talking to a file or socket. Some use cases will always want to specialize on that level. Some hosts won’t support all that functionality. Libraries can decide which they need and where they can run.

Any questions?

Let’s go more in depth.

Are files streams or arrays? Both, done with a single handle in POSIX. In IO Types, we want to have separate handles. Is lseek a stream or array operation? Both, kind of a bridge in a very subtle way. In IO Types, we can emulate lseek. The fd would have the handles for both the stream and the array. This way, can do lseek in userspace?

What about directories? Some apps want to iterate through directories. Want to avoid exposing names and that’s how dirs usual work. Instead, in IO Types, planning stream of lazy streams. Can check things like media_type to figure out if it wants to open, and then lazily open and it will give either a stream or array.

How does this relate to write once, run anywhere?

We’re starting with focusing on compute and abstracting the metacompute. Not focusing on the whole application. We can think about simpler applications for now and expand out from there. Instead of making whole app portable, growing the portable core and push metacompute to the boundaries. Won’t have business logic mixed in with thinking about how the filesystem works.

What about compat?

Going to have capability to program to IO Types directly, but also important to apply to existing APIs. We can build libc types on top of IO Types. Going to be a way for code that’s already somewhat portable to easily port. This work hasn’t started yet. Idea is that we’d have two different libcs. Do I want to use libc backed by filesystem or by IO Types?

Zero copy is something I get asked about a lot. Want to talk about how we think about that. Forward, skip, write_zeros, etc are examples. Worth mentioning that read is not zero copy. Zero copy for read means shared memory. That means you need to have a protocol on top of that memory. When we add that on, then it’s not very virtualizable. Possible that we’ll introduce zero-copy read in the future with changes to Wasm core, but for now don’t really need it for many use cases, but can instead use things like forward.

**Sam Clegg:** What about avoiding linear memory all together and allowing read to return slices to others memory? Then you could have an object that points to kernel memory.

**Dan Gohman:** Yeah, that’s a cool idea. I think we’d need to add things to core Wasm for that.

**Sam Clegg:** And GC would probably come in.

**Dan Gohman:** Some considerations there, but certainly something that we can think about. If wasm wants to go in direction of views, then WASI will go there too.

Async will be handled by Wasm CG. IO Types are about semantics of the IO and how its abstracted. Independent of how we wait for things to happen. Wasm stack switching subgroup will lead that work.

Data parallelism, don’t have time for that.

Hints of things to come:
- Vectored IO. Optimistic that IT can help with this
- Multiplexed connections, like QUIC. Maybe makes sense to have an abstraction on that
- IO Windows. Network engineers think in this way
- Interactive streams. IO Streams are unidirectional. Interactive is likely a different set of APIs
- Text streams is something I care a lot about. Rule out things like control codes.
- Terminal IO
- A way to do fsync and fdatasync.

**Matt Butcher:** What can we do to help?

**Dan Gohman:** wasi-io, trying to define witx files for these. Once we have this is to take a look at these interfaces and the backing prototypes. Try to use them in an application. File issues in wasi-io repo. That will be a place for open ended discussions. If you want to talk about these things in Discord, or BA Zulip. Reach out and lets talk about things we can do

Are there other patterns we can identify? What are the categories we can make

**Sam Clegg:** You mentioned dont’ want to deal with complexity of platform differences. Where does that complexity end up?

**Dan Gohman:** Some of it is in the host system. And some of it is pushing it out to outer parts of the program. Inner modules might be pure compute, with outer modules using wasi-filesystem. General theme is pushing that all the way out, to either those outer modules or the host.

**Matt Butcher:** Wanted to give unambiguous feedback that we’re really excited and that you’ve hit on an abstraction that we think is really workable.

0 comments on commit 1d369e0

Please sign in to comment.