# Distributed State Coordination Suite
## Introduction
The source for this draft is maintained on GitHub. Suggested changes should be
submitted as pull requests at https://github.com/lijerom/dscs-specs. The original
author is not an expert, so if something seems strange, you're probably right.
The Distributed State Coordination Suite (DSCS) is intended to form a generic
network for distributed applications, providing the same guarantees as a
centralized client/server model with similar or better performance. Concretely,
it is a protocol suite for coordinating state, data distribution, and network
topology in a peer-to-peer network.
## Design
### The flaws of distributed systems
There are four mutually incompatible guarantees that a distributed data store can make:
* Everyone is synchronized (consistency)
* Everyone can access the data (availability)
* Latency is minimized (latency)
* Network splits/partitions are tolerated (partition tolerance)
#### The choice between consistency and latency
The PACELC theorem states that even when there is no network partition, a
distributed system must choose between consistency and latency. This trade-off
can be made acceptable by immediately performing computations on incoming data
and then retroactively integrating latent packets, much like client-side
prediction in modern games. Doing so presents an acceptably imperfect state to
the consumer, or, when perfection is needed, provides a head start on the
relevant computations.
Retroactive integration would, in general, be extremely difficult with stateful
computations. Functional reactive programming, however, derives the body of
state from a body of computations. Since the vast majority of those
computations are independent of one another, conflicts are prevented, and when
an update does need to cascade it can do so easily while still-valid results
are preserved.
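As an illustration, here is a minimal sketch of that idea (the names and the
integer-counter state are purely illustrative): state is derived by folding
over a time-ordered event log, and a latent packet is simply spliced into its
proper position and replayed.
```python
import bisect
from dataclasses import dataclass, field

@dataclass(order=True)
class Event:
    timestamp: float
    delta: int = field(compare=False)  # illustrative payload: a counter change

class PredictedState:
    """Derives state from a time-ordered event log, rollback-style."""

    def __init__(self) -> None:
        self.log: list[Event] = []

    def apply(self, event: Event) -> None:
        # Late arrivals are spliced into their correct position in the log.
        bisect.insort(self.log, event)

    @property
    def state(self) -> int:
        # Pure derivation: recompute from the log. A real implementation
        # would checkpoint and replay only from the insertion point onward.
        total = 0
        for event in self.log:
            total += event.delta
        return total

net = PredictedState()
net.apply(Event(timestamp=2.0, delta=+5))  # arrives first, applied immediately
net.apply(Event(timestamp=1.0, delta=-3))  # latent packet, spliced in retroactively
assert net.state == 2
```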
#### Maintaining trust in the face of network partitions
While the authenticity of a packet to an identity can be verified with digital
signatures, that can not be used to prove the order of state changes
(transactions), or that no transactions are being intentionally left out. In
addition, it's usually impractical to achieve the current state by replaying an
enormous list of transactions, so there must be a way to trust an opaque blob
of state.
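To make the signature half of that concrete, here is a minimal sketch (assuming
Ed25519 via the third-party `cryptography` package; the timestamp binding and
window size are illustrative, anticipating the replay-attack note below):
```python
import time
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

MAX_CLOCK_SKEW = 30.0  # seconds; illustrative replay window

def sign_packet(key: Ed25519PrivateKey, payload: bytes) -> tuple[bytes, bytes]:
    # Bind a timestamp into the signed message so stale packets can be rejected.
    message = str(time.time()).encode() + b"|" + payload
    return message, key.sign(message)

def verify_packet(public_key, message: bytes, signature: bytes) -> bool:
    try:
        public_key.verify(signature, message)  # proves authenticity only
    except InvalidSignature:
        return False
    stamp, _, _ = message.partition(b"|")
    return abs(time.time() - float(stamp.decode())) < MAX_CLOCK_SKEW

key = Ed25519PrivateKey.generate()
message, signature = sign_packet(key, b"state change")
assert verify_packet(key.public_key(), message, signature)
```
Note what the sketch cannot do: nothing in it proves that this packet came
before or after any other, or that other signed packets were not withheld.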
The trust problem can be handled in a variety of ways:
* Don't have state. Static data can be trusted solely by digital signatures.
* Don't care about trust. There is no real persistent state, only the here and
now. An example might be a real-time chat program. Chat logs from one side of
the network can optionally be relayed when the networks rejoin, though they are
not strictly trustworthy. Timestamps can prevent replay attacks.
* Use proof-of-work (i.e. blockchains; sketched below). The longest chain was
too expensive to spoof, so it must be true. However, proof-of-work computations
introduce high latency, and in the event of a network partition any
transactions on the smaller chain are nullified, destroying availability.[1]
This also relies on a certain level of network activity, if my understanding is
correct.
* Use trusted observers. While this sacrifices being perfectly distributed, a
dynamic web-of-trust can allow the verification of all data, as long as enough
people are connected. Any actions that occurred during a period when the chain
of trust was insufficiently established would result in the nullification of
that data or a netsplit.
* Use third-party trusted observers. Rather than a dynamic friend-to-friend
web-of-trust, what is in essence a server is used, probably as a supplement to
the dynamic system. This is the least trustworthy option, but it is no worse
than, and sometimes better than, a client/server model.[2]
The DSCS will support all of these options, and they may be chosen as appropriate
for an application.
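For reference, the proof-of-work option above reduces to a hashcash-style
puzzle. A minimal, purely illustrative sketch (the difficulty constant is far
below anything a real network would use):
```python
import hashlib
import itertools

DIFFICULTY = 16  # leading zero bits required; illustrative only

def prove(data: bytes) -> int:
    """Find a nonce such that sha256(data || nonce) has DIFFICULTY leading zero bits."""
    target = 1 << (256 - DIFFICULTY)
    for nonce in itertools.count():
        digest = hashlib.sha256(data + nonce.to_bytes(8, "big")).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce  # expensive to find...

def verify(data: bytes, nonce: int) -> bool:
    digest = hashlib.sha256(data + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") < (1 << (256 - DIFFICULTY))  # ...cheap to check

nonce = prove(b"transaction batch")
assert verify(b"transaction batch", nonce)
```
The asymmetry between `prove` and `verify` is the entire trust mechanism, and
it is also where the latency comes from: every batch of transactions must wait
for someone to finish the puzzle.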
[1] In many situations this is actually acceptable, since only small, local
portions of the network are likely to disconnect at any given time, which is
why cryptocurrencies like Bitcoin work. If the link between two networks is
inherently unreliable, one can simply run them as two independent networks. In
the case of a cryptocurrency, atomic exchanges between the two can be done
securely while the link is up, keeping currency available on either side while
still allowing trade between them. There was an excellent article on this, but
I seem to have lost it.
[2] At present I foresee only two instances where this would be necessary. One
is when the network is too irregular for a sufficient number of one's trusted
observers to be persistently available, yet nobody can trust each other anyway.
The other is when a secret must be kept to prevent cheating in a game (e.g. a
player's location, or the location of a secret base), in conjunction with the
much more complex system for handling this described below.
#### The choice between consistency and availability
According to the CAP theorem, in the event of a network partition either
consistency or availability must be sacrificed. Sacrificing availability is
equivalent to the client/server model, where the data on the other part of the
network is simply unreachable. Since the entire point of DSCS is distributed
state, implying massive redundancy, this would often not be an issue, except in
the case of soft partitions (described below) or blockchains (described above),
or unless a complete lock is intentionally forced. Otherwise, the two networks
will simply run independently and later merge, at the cost of each side being
inconsistent with the other until that point.
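A minimal sketch of such a merge (assuming events are (timestamp, origin,
payload) tuples, where the origin ID breaks timestamp ties so both sides
converge on the same order; all names are illustrative):
```python
from heapq import merge

def heal_partition(log_a: list, log_b: list) -> list:
    """Deterministically merge two event logs once a partition heals."""
    seen = set()
    merged = []
    for event in merge(sorted(set(log_a)), sorted(set(log_b))):
        if event not in seen:  # events known to both sides appear once
            seen.add(event)
            merged.append(event)
    return merged

side_a = [(1.0, "a", "hello"), (3.0, "a", "still here")]
side_b = [(1.0, "a", "hello"), (2.0, "b", "other side")]
assert heal_partition(side_a, side_b) == [
    (1.0, "a", "hello"), (2.0, "b", "other side"), (3.0, "a", "still here")]
```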
### Performance and scalability
A serious Internet-scale network needs to scale with inevitable exponential
growth, provide high bandwidth, and, very importantly, minimize latency, all
without sacrificing any of its guarantees.
#### Distributed data
In a DSCS network, depending upon the trust system, a very large number of
peers redundantly store data at any given time. If the data is broken up into
chunks, each connected peer can provide a different chunk, resulting in
extremely high throughput, similar to how BitTorrent functions.
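A minimal sketch of such content-addressed chunking (chunk size and names are
illustrative): each chunk is addressed by its hash, so a downloader can fetch
different chunks from different peers and verify each one locally.
```python
import hashlib

CHUNK_SIZE = 256 * 1024  # illustrative

def chunk(blob: bytes):
    """Split a blob into chunks addressed by their SHA-256 digests."""
    index, store = [], {}
    for offset in range(0, len(blob), CHUNK_SIZE):
        piece = blob[offset:offset + CHUNK_SIZE]
        digest = hashlib.sha256(piece).hexdigest()
        index.append(digest)   # ordered manifest, shared with downloaders
        store[digest] = piece  # what each seeding peer serves
    return index, store

def reassemble(index: list, fetch) -> bytes:
    """fetch(digest) may hit a different peer per chunk; verify before trusting."""
    parts = []
    for digest in index:
        piece = fetch(digest)
        assert hashlib.sha256(piece).hexdigest() == digest, "corrupt chunk"
        parts.append(piece)
    return b"".join(parts)

index, store = chunk(b"x" * 600_000)
assert reassemble(index, store.__getitem__) == b"x" * 600_000
```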
#### Routing optimization
DSCS will have an optional protocol for self-organizing network topology, with
the fastest (lowest-latency, highest-bandwidth) nodes carrying proportionally
more traffic; the details are not yet settled.
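One plausible interpretation, purely as an illustrative sketch: weight next-hop
selection inversely by measured round-trip time, so faster nodes statistically
carry more of the traffic.
```python
import random

def pick_next_hop(rtts_ms: dict[str, float]) -> str:
    """Choose a next hop with probability inversely proportional to its RTT."""
    peers = list(rtts_ms)
    weights = [1.0 / rtt for rtt in rtts_ms.values()]
    return random.choices(peers, weights=weights)[0]

# "node-a" is picked roughly 70% of the time with these measurements.
print(pick_next_hop({"node-a": 12.0, "node-b": 48.0, "node-c": 95.0}))
```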
#### Soft partitions
### Anonymity, privacy, and security
#### Friend-to-friend routing
#### Compatibility with existing anonymity overlay networks
#### End-to-end encryption
#### Digital signatures and identities
### Spam and DoS resilience
#### The web-of-trust revisited
#### Proof-of-work
#### Throughput rationing
#### Secret identities (?)
### Adaptability
#### Providing options
#### Fallbacks and stripping
#### Modularity and reusability