Digital and self-sovereign identity – an overview

by Felix Machart, Jan. 19

Identity fundamentally defines who we are and how we interact with the world. In this article, we want to share some of the research we have done around this topic. We cover identity and digital identity in general, self-sovereign identity and related concepts, blockchain-based primitives such as NFTs with regard to identity, as well as some trade-offs, use-cases and project examples.

We are intensively engaged with the Identity vertical – especially since some of our portfolio teams are touching aspects of it, if not building on the forefront of corresponding solutions themselves:

  • Access control
    Layer 1 chains such as Celo, NEAR, and Q provide every user with a public / private key pair as their identity. Safe smart contract accounts (account abstraction) serve as a programmable avatar that can be used for universal access control with potential social recovery built-in.
  • Reputation
    Colony is a DAO framework with a focus on reputation-based governance, i.e. a way to build a context-specific identity and attributes, which is being used for decision-making.
  • Expression
    Other portfolio teams like “The Fabricant” and “Nina” deal with different ways of expressing human identity (or identities) through things like fashion and music.
  • Privacy
    Nym is a crypto network providing metadata privacy & Coconut credentials (privacy-preserving on-chain credentials for authentication and more). The word nym comes from the Greek for “name” and is connected to digital pseudonyms – persistent, unforgeable network personas that nevertheless cannot be linked to the “true names” of their owners. This unlinkability is essential to ensuring free speech, especially regarding controversial opinions. We have been witnessing a trend toward pseudonymous avatarism not only for privacy reasons but also for expression.

Generally, crypto/web3 attempts to give control of identity, as well as ownership of the related data and assets, back to the individual. However, on-chain data is in many instances even more prone to surveillance, which is why we need to focus on privacy-preserving (often off-chain) methods as much as possible.

Taking a step back

Identity is a complex and multifaceted concept. How it is defined and understood can vary greatly from one person to another. It generally refers to attributes that make an individual or a group (collective identity) distinct. 

  1. In a psychological sense,
    identity is an individual’s conception and expression of their own individuality or group affiliations influenced by a variety of factors such as culture, family, personal experiences, and social environment. It is the sense of self that defines who a person is and how they fit into the world around them. Since identity is context-specific, individuals can have multiple distinct or overlapping identities. 
  2. In sociology,
    emphasis is placed on collective identity, in which an individual’s identity is strongly associated with role-behavior or the collection of group memberships that define them.
  3. Identity also refers to identification and authentication documents that establish a person’s name, nationality, and other characteristics, usually issued by central authorities and increasingly shifting into the digital realm.

Identity is decentralized by nature: the attributes that make individuals and groups unique pertain to them. It is just the bureaucratic control structures as well as the client-server paradigm in the digital sphere that made individuals depend on centralized providers ever more for their identities. 

Digital identity

Digital identity refers to the online representation of an individual’s identity in contexts such as social media, ecommerce and finance. The way users create and extend their identities heavily influences their online experiences and interactions. At the same time, people’s identities have become increasingly reliant on large internet monopolies, which monetize their data through ever more surveillance. Furthermore, centralized systems offer a significant attack surface for data leaks. Security is paramount for individuals to protect their personal information against threats such as identity theft, which led to losses of $5.8B in 2021 in the US alone. Thus, data is becoming a liability for corporations to an increasing extent.

There is a wide range of use-cases around identity solutions that can pertain to humans but also objects (e.g. unique identifiers for products and input materials as well as digitally native objects).

Greenfield adaptation from https://w3c.github.io/did-use-cases/

Especially important concepts and primitives that are specifically being discussed in the blockchain ecosystem but also beyond include:

  • Authentication
    proving that an action such as sending a transaction or logging into a website was authorized by an agent that has some identifier, such as an ETH address or another public key, without attempting to say anything else about who or what the agent is.
  • Attestations/credentials
    proving claims about an agent made by other agents (“Bob attests that he knows Alice”, “the government of Germany attests that Max is a citizen”).
  • Names
    establishing consensus that a particular human-readable name (or domain-name) can be used to refer to a particular agent or (web-)object.
  • Proof of personhood/sybil resistance
    proving that an agent is human, and guaranteeing that each human can only obtain one identity through the proof of personhood system (often done with attestations/credentials). This is an important building block for things like governance (1 person – 1 vote / quadratic voting) or UBI (universal basic income); in this context, achieving sybil resistance that is at once self-sovereign/decentralized and privacy-preserving is a hard challenge/trilemma.
  • Key management
    since on-chain identities are demonstrated by public keys (and associated private keys), secure key management with good UX becomes crucial (wallets).
  • Compliance
    in order to be compliant with regulation such as AML (anti-money-laundering), KYC (know-your-customer) processes need to be implemented.
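The authentication primitive from the list above can be illustrated with a challenge-response flow: a verifier sends a random challenge, the agent signs it with a private key, and the verifier checks the signature against the public key that serves as the identifier. The following is a minimal sketch under stated assumptions, not how any particular chain or wallet works. Networks such as Ethereum use elliptic-curve signatures; here we swap in a Lamport one-time signature only because it can be written with Python's standard library. All function names and the login flow are hypothetical.

```python
import hashlib
import secrets

BITS = 256  # one pair of secrets per bit of a SHA-256 digest


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def generate_keypair():
    """Return (private_key, public_key); the public key acts as the identifier."""
    private_key = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(BITS)]
    public_key = [(_h(zero), _h(one)) for zero, one in private_key]
    return private_key, public_key


def _digest_bits(message: bytes):
    """Bits of SHA-256(message), most significant bit of each byte first."""
    digest = _h(message)
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(BITS)]


def sign(private_key, message: bytes):
    """Reveal one secret per digest bit. A Lamport key pair must sign only ONE message."""
    return [private_key[i][bit] for i, bit in enumerate(_digest_bits(message))]


def verify(public_key, message: bytes, signature) -> bool:
    """Check that every revealed secret hashes to the matching public-key entry."""
    if len(signature) != BITS:
        return False
    bits = _digest_bits(message)
    return all(_h(signature[i]) == public_key[i][bits[i]] for i in range(BITS))


# Hypothetical login flow: the verifier issues a fresh random challenge,
# the agent signs it, and the verifier checks it against the known public key.
private_key, public_key = generate_keypair()
challenge = secrets.token_bytes(16)
response = sign(private_key, challenge)
login_ok = verify(public_key, challenge, response)

# The same response does not authenticate a different challenge (replay fails).
replay_ok = verify(public_key, secrets.token_bytes(16), response)
```

Note that the verifier learns only that the holder of the key authorized this one challenge, and nothing about who or what the agent is, which matches the narrow definition of authentication given above.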

Centralized Identity – siloed / traditional

The most basic method of authentication has been username and password. Password managers have been improving the security/usability trade-off, since short and repeatedly used passwords are weak, while sophisticated and ever-changing passwords are hard to remember. Since usernames and passwords need to be stored on centralized servers (albeit usually in hashed form), data leaks at individual services have frequently compromised a user’s other services as well, if passwords are reused or are simple enough to be brute-forced. Password manager services have not been spared from leaks either.

Federated Identity – SSO by 3rd party identity providers

Single sign-on (SSO) is a system that allows users to use a single set of login credentials to access multiple services. Based on their strong network effects, it is largely driven by players such as Google and Facebook to provide a more convenient and seamless user experience, but also to collect more data on connected services.

Decentralized Identity / Self Sovereign Identity (SSI)

An early attempt at decentralizing digital identity was OpenPGP (the open standard derived from PGP, “Pretty Good Privacy”), used mainly to encrypt emails. At crypto parties, participants exchange public keys in person in order to confirm that a digital identity refers to the correct real-world identity, thereby forming a web of trust.

The term self-sovereign identity (SSI) was first coined by Christopher Allen (a co-author of the TLS security standard), and has since set off a movement around 10 principles. Users should not only be the center of the identity process, they should be rulers of their own identity.

Principles of self-sovereign identity - see original article (https://bit.ly/_principles) & 5 years later (https://bit.ly/_principlesII)

Two of the crucial technical standards centered on that vision are DIDs (decentralized identifiers) and VCs (verifiable credentials), both by the W3C.

DIDs should have the following characteristics: 

  • decentralized
    there should be no central issuing agency;
  • persistent
    the identifier should be inherently persistent, not requiring the continued operation of an underlying organization;
  • cryptographically verifiable
    it should be possible to prove control of the identifier cryptographically;
  • resolvable
    it should be possible to discover metadata about the identifier.

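Generally, a DID consists of the scheme did, a DID method name and a method-specific identifier, separated by colons:

did:method-name:method-specific-id

Source: https://www.w3.org/TR/did-core/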

DIDs can be derived from public keys such as PGP keys or from ETH addresses, which makes the standard widely applicable. In the case of an ETH address, a DID could look like the following:

did:ethr:0x6b175474e89094c44da98b954eedeac495271xyz

The DID method, in this case, is ethr with a subsequent Ethereum address as the DID method-specific identifier.
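To make the structure concrete, here is a minimal sketch that splits a DID string into its method and method-specific identifier. It only checks the overall shape (the did scheme, a method name made of lowercase letters and digits, and a non-empty identifier); it does not resolve the DID or validate the identifier for any particular method. The names parse_did and ParsedDid are our own, chosen for illustration.

```python
import string
from typing import NamedTuple

# Characters we assume are allowed in a DID method name: lowercase letters and digits.
_METHOD_CHARS = set(string.ascii_lowercase + string.digits)


class ParsedDid(NamedTuple):
    method: str
    method_specific_id: str


def parse_did(did: str) -> ParsedDid:
    """Split 'did:method:identifier' into its parts, or raise ValueError."""
    scheme, first_colon, rest = did.partition(":")
    method, second_colon, method_specific_id = rest.partition(":")
    if scheme != "did" or not first_colon or not second_colon:
        raise ValueError(f"not a DID: {did!r}")
    if not method or not set(method) <= _METHOD_CHARS:
        raise ValueError(f"invalid DID method name: {method!r}")
    if not method_specific_id:
        raise ValueError("missing method-specific identifier")
    # Further colons stay inside the identifier, since only the
    # first two colons separate scheme, method and identifier.
    return ParsedDid(method, method_specific_id)


parsed = parse_did("did:ethr:0x6b175474e89094c44da98b954eedeac495271xyz")
```

For the example above, parsed.method is "ethr" and parsed.method_specific_id is the address-like string that follows it.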

Verifiable credentials (VCs) transform real-world credentials into digital, interoperable (JSON) files with a couple of key advantages:

  • resistant to fakes
    they are cryptographically signed by the private key of the attester and can be verified against her public key by anyone; 
  • resistant to impersonation
    they are issued to the public key of the subject of the credential, who can prove her identity by cryptographic signature with her private key;
  • flexibility in custody
    they can be stored by anyone, anywhere, and with replication without losing authenticity (e.g. on mobile devices, centralized and/or decentralized clouds, as well as on personal decentralized web nodes) – this increases flexibility, reliability as well as censorship resistance, since no one can stop a user from showing a credential in self-custody; a challenge is, however, that users might be overwhelmed by optionality as well as the responsibility of self-custody;
  • efficiency
    they are cheap to issue, replicate and send; 
  • privacy 
    • since no centralized intermediary is required to store VCs, attack vectors for surveillance and data leakage can be minimized (however, once a VC is revealed to a verifier, the verifier could save, share or publicize it, allowing actors to build a graph of credentials that reveals correlatable information);
    • as opposed to the DID, which could be an on-chain address and should be public so that parties can verify signatures against it, VCs should not be on-chain, since they will mostly contain personal information, which should not be stored publicly forever (on regulatory and ethical grounds);
    • zero-knowledge proofs can be leveraged to prove aspects of a credential without revealing everything;
  • portability
    since DIDs & VCs are an open standard, lock-in to a specific system or platform (and thus dependency) is minimized, which is especially important for such a fundamental piece of infrastructure of people’s lives as their identity.
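As an illustration of what such a JSON file can look like, the sketch below builds an unsigned credential for the “government of Germany attests that Max is a citizen” example used earlier. The property names follow our understanding of the W3C VC Data Model 1.1; the DIDs, the CitizenshipCredential type, the citizenOf claim and the date are hypothetical, and the issuer's cryptographic proof is deliberately left out, so this shows the data shape only, not a verifiable credential.

```python
import json

# Properties we understand the VC Data Model 1.1 to require on every credential.
REQUIRED_PROPERTIES = ("@context", "type", "issuer", "issuanceDate", "credentialSubject")

credential = {
    "@context": ["https://www.w3.org/2018/credentials/v1"],
    "type": ["VerifiableCredential", "CitizenshipCredential"],  # second type is hypothetical
    "issuer": "did:example:government-of-germany",  # hypothetical issuer DID
    "issuanceDate": "2023-01-19T00:00:00Z",  # hypothetical date
    "credentialSubject": {
        "id": "did:example:max",  # hypothetical subject DID
        "citizenOf": "Germany",  # hypothetical claim name
    },
    # A real VC would also carry a "proof" created with the issuer's private key.
}


def missing_properties(vc: dict) -> list:
    """Return the required properties that are absent from a credential."""
    return [name for name in REQUIRED_PROPERTIES if name not in vc]


# Serializing and restoring the credential leaves its content unchanged,
# which is what makes copies on different devices or clouds equivalent.
serialized = json.dumps(credential, sort_keys=True)
restored = json.loads(serialized)
```

Because the credential is plain data, the holder can replicate serialized anywhere; authenticity in a real system would come from the proof, not from where the file is stored.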

Roles and information flows in the VC specification | https://bit.ly/w3_identifiers

VCs represent statements made by an issuer in a tamper-evident and privacy-respecting manner about a subject. Acting as issuer, holder, or verifier requires neither registration nor approval by any authority, as the trust involved is bilateral between parties. An issuer might revoke a verifiable credential, while a holder might delete it. Verifiable presentations allow any verifier to verify the authenticity of verifiable credentials from any issuer. 

A verifiable data registry mediates the creation and verification of identifiers, keys, and other relevant data, such as verifiable credential schemas, revocation registries and issuer public keys, which might be required to use verifiable credentials. These could be various types of systems including trusted corporate databases, government ID databases, as well as public blockchains. The latter permits the best availability and sovereignty for users. 

Using DIDs is, however, problematic with regard to privacy, especially when user data (or hashes thereof) is put on a blockchain, whose records are immutable. This was exemplified by attempts at building covid passports based on such technologies:

Global identity on a blockchain cannot be done in a secure and privacy-preserving manner without advanced cryptography, which DIDs lack. For the use-cases of identity management, there are many well-known cryptographic techniques that offer strong and rigorous guarantees of privacy and security, although they are not used as the foundation of the W3C standards for digital identity, but merely included as an optional afterthought.

Harry Halpin, founder of Nym & privacy, security researcher; see https://arxiv.org/pdf/2012.00136.pdf

Privacy spectrum ranging from fully identified to pseudonymous | https://bit.ly/w3_privacy_spectrum

There are various shades of gray as to how a privacy-preserving DID is set up. The more publicly known and the more frequently used a certain identifier is, the more likely it is that a single party or a set of parties can correlate aspects of identity to a specific individual, leading to a loss of privacy, especially when credentials are shared without cryptographic techniques to protect privacy.

Always on-chain: NFTs & soulbound-tokens

NFTs (non-fungible tokens) on public blockchains have become an important piece of the identity stack as well from several perspectives. They can showcase the personality and interests of an individual (see our posts on avatars & fashion), show a track record of work or attendance with POAPs and serve as an indicator of reputation. Theoretically, all types of credentials that could be issued as a VC could also be issued as an NFT, publicly visible for everyone on-chain (which is however more complex).

Since it certainly makes a difference if an individual earned a certain NFT or if she bought it, non-transferability can be crucial with respect to identity and reputation. Such non-transferable tokens have recently been dubbed soulbound tokens (SBTs) in analogy to non-transferable items in video games. In order to make sure that a given token will truly remain attached to a given person (let alone soul), some mechanism to prevent transfers of public/private key-pairs needs to be in place (such as KYC & 2FA – see humanbound tokens). Most individuals will have a large disincentive to give up their identity (even if it’s pseudonymous, non-KYC), especially if there is a reliable mechanism for proof-of-personhood, which makes it extremely difficult to build up several identities. 

Non-transferable tokens do not need to be non-fungible necessarily. Reputation for example that is tied to a given identity could be represented as tokens as well as other forms of scores that might be decaying to reflect the most recent contributions.
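One simple way to picture a decaying score is exponential decay with a half-life: every contribution loses half of its weight after a fixed period, so recent work dominates. This is a sketch under our own assumptions, not the formula of any specific project; the function name, the 90-day half-life and the point values are made up for illustration.

```python
def decayed_reputation(contributions, now_day, half_life_days=90.0):
    """Sum contribution points, halving each one's weight every half_life_days.

    contributions: iterable of (day_earned, points) pairs, with days as numbers.
    """
    total = 0.0
    for day_earned, points in contributions:
        age = now_day - day_earned
        if age < 0:
            continue  # ignore contributions dated in the future
        total += points * 0.5 ** (age / half_life_days)
    return total


# Hypothetical history: 100 points earned on day 0, 40 points earned on day 90.
history = [(0, 100.0), (90, 40.0)]
score_on_day_90 = decayed_reputation(history, now_day=90)
```

On day 90 the older contribution counts for half of its original points while the newer one still counts in full, so an identity has to keep contributing to keep its score.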

By recording non-transferable credentials on-chain, there is an opportunity to build a more pluralistic ecosystem, a “Decentralized Society” that is defined by social relationships of trust rather than by transferable, financial assets, as in decentralized finance. Based on such non-transferable credentials, governance mechanisms such as quadratic funding or quadratic voting become more feasible in a decentralized, sovereign way (while controlling for potential collusion / correlation of participants). A basic enabler here is to allow individuals to prove their unique personhood in a decentralized way, by accumulating a number of credentials that would be hard to obtain multiple times for fake accounts. Otherwise, projects often need to rely on web2 infrastructure such as social media profiles for sybil resistance.
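A small calculation shows why these mechanisms depend on unique personhood. In the textbook form of quadratic funding, a project's total funding is the square of the sum of the square roots of the individual contributions, and the matching subsidy is that total minus what was actually contributed. The sketch below assumes an unlimited matching pool and uses made-up amounts; the function name is our own.

```python
import math


def quadratic_funding_match(contributions):
    """Subsidy under textbook quadratic funding: (sum of sqrt(c_i))^2 - sum of c_i."""
    root_sum = sum(math.sqrt(amount) for amount in contributions)
    return root_sum ** 2 - sum(contributions)


# One honest person contributing 100 units receives no subsidy ...
honest_match = quadratic_funding_match([100.0])

# ... while the same 100 units split across 100 fake accounts attract a large one.
sybil_match = quadratic_funding_match([1.0] * 100)
```

With the same money, the split contribution attracts a subsidy of roughly 9,900 units while the honest one attracts none, which is why a sybil-resistant proof of personhood is a precondition for running such mechanisms in a decentralized way.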

Proving ownership of a certain credential is possible with off-chain VCs. Proving the absence of an attribute or credential (e.g. not having taken out a loan or not being a member of a certain rivalrous community) is, however, only possible if such credentials are issued on-chain, and it still requires a proof of personhood.

The immutable nature of blockchains and the fact that no one needs to ask for permission to send / issue a token to a given address can make the concept of SBTs problematic with regard to privacy.

Hybrid systems that respect users’ privacy while maximizing interoperability and on-chain composability could include certain on-chain representations, e.g. NFTs that link to VCs stored securely off-chain, while an on-chain address as a DID would generally avoid censorship of entire identities. Using separate on-chain addresses for different activities while aggregating zero-knowledge badges makes it possible to avoid revealing all on-chain transactions.

Examples of projects, approaches and technologies with regard to privacy as well as sovereignty / interoperability, based on demonstrating open standards and/or offering permissionless composability (by no means exhaustive; the selection of projects and their classification are debatable, especially since projects/technologies are work-in-progress and evolving) | Greenfield Matrix

The matrix above shows various projects, grouped according to technologies as well as use-cases to provide examples and highlight the multitude of layers, as well as players in the identity stack. Individual, centralized identity systems provide no interoperability. However, they don’t gather data as widely as data-sharing/selling mega-corporations. Still, such centralized systems have significant risks of data breaches, as seen in the past. Everything on-chain is publicly visible forever, except if there is zero-knowledge cryptography for privacy-preserving proofs in use (which has been making significant progress in recent years but is still in its infancy). DIDs/VCs with data stored off-chain pose a lower attack surface regarding privacy, again while the use of zk-crypto minimizes data disclosure.

Technologies are tools – let’s use them for good

Digital, self-sovereign and decentralized identity technologies offer many great prospects. They allow users to control their identity and own their data. They can enable access for external apps and connections in order to prove aspects about them, and let services be customized to them without creating strong lock-in (such as music taste/playlists across services, or providing privacy-preserving proofs about certain credentials on a dating app). They can provide alternative or quasi-legal identities for marginalized communities and populations lacking a legal identity (for access to economic opportunity and more – see SDGs).

However, the dangers go beyond the privacy risks posed by leaks from centralized systems and by on-chain data: social and economic pressures could lead to an erosion of interpersonal trust and a re-centralization of power despite the best intentions.

It’s beyond the scope of this article to provide a complete discussion of all implications, but a couple of quotes should showcase some of the challenges, which have been under discussion for quite some time:

An acquaintance now quits those ‘old-fashioned’ relationship-building niceties and gets straight to the SSI point. Where do you work? Which college did you go to? Which college did your parents go to? Republican or Democrat? What’s your gender? Your ethnic origins? Do you have this gene or the other one? If you fail to offer up the requisite verifiable claims then you fail to get to ‘trust building’ first base in the SSI century. (Note: this is in fact trust avoidance not trust building.) You are then ignored or indeed rejected. But it’s worse. The new social norm now expects you, expects everyone, and more accurately expects your agents to perform similar examinations as a matter of course.
– see generative-identity.org

… we cannot accept technology as a substitute for taking social, cultural, and political considerations seriously. Decentralized technology does not guarantee decentralized outcomes.
– see Nathan Schneider.

There have been previous attempts at building decentralized web technologies and open standards, some more successful than others. See also Can Blockchains Solve Ten Years of Standardization Failure? by Harry Halpin.

We are optimistic – let’s talk!

We are extremely excited about all the work around user-centric identity approaches and remain optimistic about the progress the space can make collectively in terms of protecting individuals’ rights and sovereignty, while at the same time allowing increasing amounts of opportunity to collaborate and solve collective action problems through public blockchains, zero-knowledge technologies and open standards. We need better solutions on various fronts including improving UX and meeting users where they are.

Tell us about what you are building and how you are viewing the space!