# Peer-to-peer
Our peer-to-peer technology works at the heart of Spacedrive, allowing all of your devices to seamlessly communicate and share data. This documentation outlines how it works and how to build on top of it.
## Implementing features with P2P
TODO:

- From frontend, to backend
- Including authentication
- Versioning/making breaking changes
- Show using `sd_p2p_tunnel`
## Underlying technology
### Terminology
- Node: An application running Spacedrive's network stack.
  - This could be the Spacedrive app or the P2P relay.
  - If you have multiple Spacedrive installations open on your computer, each one is an independent node.
- Library: A logical collection of your data within Spacedrive.
  - From a theoretical perspective, a library is just the conflict-resolved state of one or more instances, although a lot of the time we don't strictly treat it that way.
- Instance: An instance of a library running on a particular node.
  - Each instance corresponds directly to a single SQLite file.
  - You could technically have more than one instance of a library on a single node, although our core would fall apart, as we identify traffic by library.
### Identity

- `Identity` - A public/private keypair which represents the library or node.
- `RemoteIdentity` - A public key which represents the library or node.
- `PeerId` - The identifier `libp2p` uses. Can be derived from a `RemoteIdentity`.
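As a rough illustration of how these three relate, here is a minimal sketch assuming Ed25519 keys via the `ed25519_dalek` crate; the actual key types live in `sd_p2p` and will differ:

```rust
use ed25519_dalek::{SigningKey, VerifyingKey};
use rand::rngs::OsRng;

fn main() {
    // `Identity`: the full keypair, kept private to the node/library.
    let identity: SigningKey = SigningKey::generate(&mut OsRng);

    // `RemoteIdentity`: just the public half, safe to share with peers.
    let remote_identity: VerifyingKey = identity.verifying_key();

    // A libp2p `PeerId` is derived from the public key alone, which is why
    // knowing a peer's `RemoteIdentity` is enough to address it.
    println!("public key bytes: {:?}", remote_identity.to_bytes());
}
```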
### `sd_p2p` crate
The P2P crate was designed from the ground up to be modular.

The `P2P` struct is the core of the system, and surprisingly doesn't actually do any P2P functionality itself. It's a state manager and event bus, and it provides a hook system for other components of the P2P system to register themselves.

This modular design separates concerns, which makes the entire system easier to comprehend and test.
The `sd_p2p` crate provides hooks for:

- `Mdns` - Local network discovery
- `Quic` - QUIC transport layer built on top of `libp2p`
#### What are hooks?
A hook is very similar to an actor. It's a component which can be registered with the P2P system.

A hook allows for processing events from the P2P system and also ensures that when the P2P system shuts down, the hook is shut down with it.

There are special hooks called listeners. These are implemented as a superset of a regular hook and are able to create and accept connections. A sketch of the hook-as-actor shape is shown below.
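To make that shape concrete, here is a minimal sketch of a hook as a tokio actor; the event type and spawn function are hypothetical stand-ins, not the actual `sd_p2p` traits:

```rust
use tokio::sync::mpsc;

// Hypothetical event type, for illustration only.
enum P2PEvent {
    PeerDiscovered(String),
    Shutdown,
}

// A hook is essentially an actor: a task that consumes events from the
// P2P system until the system tells it to shut down.
fn spawn_example_hook() -> mpsc::Sender<P2PEvent> {
    let (tx, mut rx) = mpsc::channel::<P2PEvent>(64);
    tokio::spawn(async move {
        while let Some(event) = rx.recv().await {
            match event {
                P2PEvent::PeerDiscovered(peer) => {
                    println!("hook saw new peer: {peer}");
                }
                // Shutting down the P2P system also shuts down the hook.
                P2PEvent::Shutdown => break,
            }
        }
    });
    tx
}
```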
Subcrates:

- `sd_p2p_block` - Block protocol based on SyncThing's Block Exchange Protocol v1
- `sd_p2p_proto` - Utilities for zero-fluff encoding and decoding
- `sd_p2p_tunnel` - Encrypt a stream of data between two nodes
### `sd_p2p_proto`
This crate provides utilities for implementing asynchronous deserializers and matching synchronous serializers. The goal of these implementations is to send and receive Rust structs over the network as quickly as possible.

This crate allows for creating implementations faster than other common options, at the cost of some developer experience.

Before building this I originally compared the performance of both msgpack and bincode against manual implementations using `AsyncRead`, and I found that asynchronous deserialization was faster over the network.

This makes sense logically, because to use a synchronous serializer you have to do the following:
- Send the total length of the message
- Allocate a buffer for the message
- Wait asynchronously for the buffer to be filled
- Synchronously copy from the buffer into each of the struct fields
When using an asynchronous deserializer you can skip sending the message's length and allocating the intermediate buffer, because while decoding we can rely on the known length of each field. This is a win for both performance and memory usage.
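For illustration, here is a minimal sketch of the pattern with a hypothetical `Handshake` message, using tokio's `AsyncRead`; the real `sd_p2p_proto` helpers differ:

```rust
use tokio::io::{AsyncRead, AsyncReadExt};

// Hypothetical message type, for illustration only.
struct Handshake {
    version: u32,
    name: String,
}

impl Handshake {
    // Asynchronous deserializer: decode each field straight off the wire,
    // so no whole-message length prefix or intermediate buffer is needed.
    async fn from_stream(stream: &mut (impl AsyncRead + Unpin)) -> std::io::Result<Self> {
        let version = stream.read_u32_le().await?;
        let name_len = stream.read_u16_le().await? as usize;
        let mut name = vec![0u8; name_len];
        stream.read_exact(&mut name).await?;
        let name = String::from_utf8(name)
            .map_err(|e| std::io::Error::new(std::io::ErrorKind::InvalidData, e))?;
        Ok(Self { version, name })
    }

    // Matching synchronous serializer: encode into a buffer to write out.
    fn to_bytes(&self) -> Vec<u8> {
        let mut buf = Vec::with_capacity(4 + 2 + self.name.len());
        buf.extend_from_slice(&self.version.to_le_bytes());
        buf.extend_from_slice(&(self.name.len() as u16).to_le_bytes());
        buf.extend_from_slice(self.name.as_bytes());
        buf
    }
}
```

Note that variable-length fields such as `name` still carry their own length on the wire; the saving is the whole-message length prefix and the intermediate buffer.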
This crate provides utilities to make these implementations less error-prone; however, long term it would be great to replace this with a derive macro, similar to how crates like serde work.

From my research no crate exists that meets these requirements. It is also a difficult problem because you're juggling lifetimes and async, which is rough. I attempted an implementation called binario, however it is still incomplete so we never adopted it. I suspect Rust's recent stabilisation of RPITIT would make this much easier.
## Local Network Discovery
Our local network discovery uses DNS-Based Service Discovery, which itself is built around Multicast DNS (mDNS). This is a really well-established technology and is used by Spotify Connect, Apple AirPlay and many other services you use every day.
### Service Structure
The following is an example of what would be broadcast from a single Spacedrive node:

```
# {remote_identity_of_self}._sd._udp.local.
name=Oscars Laptop        # Shown to the user to select a device
operating_system=macos    # Used for UI purposes
device_model=MacBook Pro  # Used for UI purposes
version=0.0.1             # Spacedrive version

# For each library that's active on the Spacedrive node:
# {library_uuid}={remote_identity_of_self}
d66ed0c3-03ac-4f9b-a374-a927830dfd5b=0l9vTOWu+5aJs0cyWxdfJEGtloEepGRAXcEuDeTDRPk
```
Within `sd-core` this is defined in two parts: the `PeerMetadata` struct takes care of the node metadata, and libraries are inserted by the `libraries_hook`.
### Modes
> **Note:** This section discusses 'Contacts Only', which is not yet fully implemented (refer to ENG-1197).
Within Spacedrive's settings the user is able to choose between three modes for local network discovery:
- Contacts only: Only devices that are in your contacts list will be able to see your device.
- Enabled: All devices on the local network will be able to see your device.
- Disabled: No devices on the local network will be able to see your device.
Enabled and Disabled are implemented by spawning and shutting down the `sd_p2p::Mdns` service as required within `sd-core`.

In 'Contacts only' mode the mDNS service will not contain the `PeerMetadata` fields; instead it will contain a hash of the user's Spacedrive identifier. If a Spacedrive node detects another node on the local network with a hash in its contacts, it can make a request to that node, and if the remote node also has the current node in its contacts, it will respond with the full metadata.
### Implementation
We make use of the `mdns-sd` crate.
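For reference, a minimal sketch of registering and browsing a `_sd._udp` service with `mdns-sd`; the instance name, hostname, IP and port below are placeholders, not Spacedrive's actual values:

```rust
use mdns_sd::{ServiceDaemon, ServiceEvent, ServiceInfo};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let daemon = ServiceDaemon::new()?;

    // Register our own record. The TXT properties mirror the metadata
    // described above.
    let properties = [("name", "Oscars Laptop"), ("operating_system", "macos")];
    let service = ServiceInfo::new(
        "_sd._udp.local.",         // service type
        "remote_identity_of_self", // instance name (placeholder)
        "my-machine.local.",       // hostname (placeholder)
        "192.168.1.10",            // advertised IP (placeholder)
        7373,                      // port (placeholder)
        &properties[..],
    )?;
    daemon.register(service)?;

    // Browse for other Spacedrive nodes on the local network.
    let receiver = daemon.browse("_sd._udp.local.")?;
    while let Ok(event) = receiver.recv() {
        if let ServiceEvent::ServiceResolved(info) = event {
            println!("discovered node: {}", info.get_fullname());
        }
    }
    Ok(())
}
```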
## Manual connection
The user can manually provide a set of `SocketAddr`s for the P2P system to attempt to connect to.
This feature primarily exists for usage in combination with Docker, but it could also be useful for working around difficult network setups.
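For illustration, the input is just standard socket addresses; the addresses and port here are placeholders rather than Spacedrive's actual config format:

```rust
use std::net::SocketAddr;

fn main() {
    // Placeholder addresses, e.g. a node running inside a Docker container.
    let manual_peers: Vec<SocketAddr> = ["10.0.0.5:7373", "[2001:db8::1]:7373"]
        .iter()
        .map(|s| s.parse().expect("invalid socket address"))
        .collect();

    for addr in &manual_peers {
        println!("will attempt to connect to {addr}");
    }
}
```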
### Implementation
TODO
### Problems with Docker
- TODO - mDNS daemon
- TODO - Docker and why it's a pain with mDNS. Explain the current stuff I've done with it.
## Transport layer
- TODO - Quic
- TODO - Explain authentication
## Relay
TODO
### Direct Connect via Relay
TODO
### Authentication
TODO - How we gonna restrict this???
### Billing
TODO - How we gonna bill for this???
## Design Decisions
TODO
### Things I would do differently?
TODO
### Crates
TODO
## Security
### Threat model
TODO - Risks of sharing IPs via discovery, risks of a compromised relay, risks of a compromised local network during pairing
### Authentication
TODO
### Authorization
TODO
### Tracking
TODO - Link to Apple stuff
## Version compatibility and breaking changes
TODO - Compatibility across versions of Spacedrive
## libp2p
- TODO - Why a libp2p fork? Why libp2p can be problematic for what we do
- TODO - How we transpose our certificates to libp2p certificates
## Major issues
- TODO - mDNS issues on Linux
- TODO - The double-up of service discovery when using both local discovery and the relay
- TODO - Question: why does `remote_identity_of_self` show up in both the metadata and the mDNS record itself?
- TODO - Request flow, e.g. incoming goes from Quic to mpsc to the user's code, to the handlers.
- TODO - Resumable uploads/transfers