A Database for me

Computing is more personal than ever. It organizes the food we eat and the work we do, and it mediates our social lives. But the substance of these computations, the data, is only growing more impersonal. It's locked away in corporations' databases, available to us only through their tightly controlled apps.

A lot has been said about the negative aspects of this state of affairs. It means corporations can sell your data and cut off your service. Ultimately it means that most of the software we use is bad. Even when it isn't actively working against us, it doesn't work in the ways that would be most useful to us.

Because it's produced and controlled by companies serving millions of customers, this software lacks any awareness of our local, intimate contexts. These companies are collecting mind-boggling amounts of data, but they have no idea what to do with it. In many cases, the most productive thing they can think of is selling it to advertisers!

The person who does know what to do with it is you. You know that it would be useful to have your recipes talk to your grocery list, or to have your calendar automatically tie into your workout manager. Instead, all this data is scattered across different apps that are unable to talk to each other.

The core problem is that software is really hard to produce, and so it's expensive. A company needs to produce the best possible software for the most generic need so it can reach the largest market and make the most money. From its perspective, every unique need is a source of complexity and expense.

But from your perspective, it's a simple problem. So how can we make it possible for you to make and use software from your own perspective?

Keep It Simple Stupid

We need to make software drastically easier to produce. And that means making it drastically simpler. We have one key advantage. We're selfish. We're only concerned with making software for us. Later we'll consider the (very important) question of how our software interacts with others, but for now, let's just focus on ourselves.

So, where does complexity come from in software? In the paper "Out of the Tar Pit", Ben Moseley and Peter Marks point to two root causes: State and Control.

I'm going to focus on State. State is all the information that a program interacts with, and specifically the information that needs to persist and change over time. The mass-produced apps we've been talking about deal with the problem of state by separating it out from their code and putting it in a database.

A database is a piece of software designed specifically to manage data over time and, most importantly, to constrain that data. The core "problem" of data is that, left to its own devices, it is essentially unbounded in complexity. A database ensures that all the data your system operates on fits a certain shape, and it constrains all interactions with that data, so that it's easier for you, the developer, to reason about.

Modern databases are incredibly powerful systems, but they're also quite complex themselves. They're made for the companies producing mass market software, who can employ people solely to manage them. If we want to make use of our local context to simplify software development, the most useful thing we could have is a personal database.

So, we (finally) get to the core question of this essay. What does a personal database for personal software look like? What is to the database as the personal computer is to the mainframe?

Wrong answers

We're not talking about just local applications that run a database on your computer. Those are still removed from you by software designed by a third party. We're looking for a piece of software that manages your data, and that is designed for you to interact with directly.

The Right Answer: A simple database

What we need is a database that is simple itself, and hence, easy to think about.

There are a few different parts of a database, and we can think about how to reduce complexity in each.

The architecture

What is the very core of a database? The simplest abstraction is that a database is just a record of different events that have happened. Every time you take an action, it's recorded, appended to the sequence of all other actions you've taken.

Any questions you have about the data can be answered by looking at that sequence of actions. Want to know what books you've read? Just look over the log for every "read a book" action. You may be worried that reading over the entire log sounds really slow, and you'd be right. Luckily, it's simple to build other data structures (called indexes) that make asking these kinds of questions faster, while still keeping the log of events as the "source of truth".
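Here is a rough sketch of that idea in Python. Everything in it is made up for illustration: the `EventLog` class, its method names, and the sample events are assumptions, not part of any real system. It keeps the log as the only source of truth and maintains one index (action name to log positions) alongside it.

```python
from collections import defaultdict


class EventLog:
    """A hypothetical append-only log with one derived index."""

    def __init__(self):
        self._events = []                    # source of truth; entries are never edited
        self._by_action = defaultdict(list)  # index: action name -> positions in the log

    def append(self, action, **details):
        position = len(self._events)
        self._events.append({"action": action, **details})
        self._by_action[action].append(position)  # keep the index in step with the log
        return position

    def scan(self, action):
        """Answer a question the slow way: read the whole log."""
        return [e for e in self._events if e["action"] == action]

    def lookup(self, action):
        """Answer the same question through the index."""
        return [self._events[i] for i in self._by_action[action]]


log = EventLog()
log.append("read a book", title="Dune")
log.append("went for a run", km=5)
log.append("read a book", title="Walden")

books = [e["title"] for e in log.lookup("read a book")]
print(books)  # ['Dune', 'Walden']
```

The index can always be thrown away and rebuilt by replaying the log, which is what makes it safe to treat as a convenience and not as the data itself.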

This is the append-only log model of databases. It's actually how many current databases work under the hood, but there it's treated as an implementation detail rather than a fundamental primitive. If you're interested in diving in more, I highly recommend the talk "Turning the Database Inside Out" by Martin Kleppmann.

It gives us a mental model of our database that is very simple to reason about. Importantly, it also models time, and how data changes.

The data model

On top of this fundamental architecture we can start adding constraints. You could imagine the log consisting of arbitrary data that you can just add as you go along, but this would get messy really fast. Instead, we need to figure out a way we can structure that data to make it as easy as possible to reason about it.

The two main questions are:

1. What is the fundamental unit of information?
2. How do we want to group different pieces of information together?

We want this model to match how we actually think about information in the real world as closely as possible. It needs to balance expressivity and structure.

We tend to think about data as being about unique "entities", different things in the world. This could be a person, or a school, or a course. So, what if we say that the base unit of our database is an object, and that we organize objects into groups of the same kind of entity? This essentially gives us the SQL data model!

The problem is that while we definitely do think in terms of entities, we don't have very consistent views about them. Think about the difference between the information you have about a passing acquaintance and all the information you have about a sibling or parent. They're both people, but the data about them is anything but consistent.

So instead of viewing an entire entity as the fundamental unit, why don't we take a single piece of information about an entity as the fundamental unit?

Then, instead of grouping them together based on fundamental distinctions, we can group them together based on the different pieces of information we have about them. The "people" in our database are all the entities with the pieces of information we think it's important for a person to have (a name, or maybe a birthday). But any single person could have any number of other pieces of information attached to them.

This data model goes under a bunch of different names, but its unit is most commonly referred to as a "triple", because each piece of information is made up of three parts:

An entity: the thing the information is about
An attribute: the "type" of the information (i.e. is this a person's name, or age, etc.)
A value: the actual data. This could be a number, some text, or a reference to another entity!

Using this data model you can represent all sorts of complex information.
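To make that concrete, here is a small sketch of triples in Python. The `Triple` type, the helper functions, the entity ids, and the sample facts are all hypothetical, chosen only to illustrate the shape of the model. Note how "people" are not a table: they are simply whichever entities have a `name`.

```python
from typing import NamedTuple, Union


class Triple(NamedTuple):
    entity: str              # which thing this fact is about
    attribute: str           # what kind of fact it is
    value: Union[str, int]   # the data itself, possibly another entity's id


# Made-up sample data for illustration only.
facts = [
    Triple("person-1", "name", "Ada"),
    Triple("person-1", "birthday", "1990-04-01"),
    Triple("person-1", "sibling", "person-2"),  # the value refers to another entity
    Triple("person-2", "name", "Grace"),
    Triple("book-1", "title", "Walden"),
]


def entities_with(facts, attribute):
    """Group entities by the information we have about them."""
    return {t.entity for t in facts if t.attribute == attribute}


def describe(facts, entity):
    """Collect everything known about one entity."""
    return {t.attribute: t.value for t in facts if t.entity == entity}


people = entities_with(facts, "name")
print(sorted(people))               # ['person-1', 'person-2']
print(describe(facts, "person-2"))  # {'name': 'Grace'}
```

The sibling has three facts and the acquaintance has one, and nothing about the model has to change to accommodate that.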

The interaction model

Okay, so now we can take this data model and use it to create the interfaces an end-user would actually interact with. These are often the biggest source of complexity, so it's important we get them right.

You can think of there being three kinds of interactions you want to have with a database: reading, writing, and reacting.

For writing, we already have a good abstraction: appending events to the log. We know that our events are about triples, so we can say that each event is just adding or removing a triple.
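As a sketch of what that could look like, assume each event is a pair of an operation (`"add"` or `"remove"`) and a triple. That encoding, the `current_facts` function, and the sample log are assumptions made for this example. The current state of the database is then whatever you get by replaying the log from the start.

```python
def current_facts(log):
    """Replay the log in order and return the set of triples that still hold."""
    facts = set()
    for op, triple in log:
        if op == "add":
            facts.add(triple)
        elif op == "remove":
            facts.discard(triple)
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return facts


# A made-up log: a book goes from "to-read" to "read".
log = [
    ("add", ("book-1", "title", "Walden")),
    ("add", ("book-1", "status", "to-read")),
    ("remove", ("book-1", "status", "to-read")),
    ("add", ("book-1", "status", "read")),
]

state = current_facts(log)
print(sorted(state))
# [('book-1', 'status', 'read'), ('book-1', 'title', 'Walden')]
```

Because nothing is ever overwritten, replaying only a prefix of the log, for example `current_facts(log[:2])`, shows the database as it was at that earlier point.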

Reading is slightly more complicated. You want to be able to ask complex questions about the data in your system, easily and concisely. Importantly, you don't want to have to define how your question gets answered, just what the question actually is. You also want to easily tell what a question is asking, and what the answer is going to look like, just by looking at the question itself.
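One possible way to get there, sketched below, is to write a question as a list of triple patterns with variables in them, and let the database find every way of filling in the variables. The `?name` variable syntax and the `match` and `query` functions are assumptions made for this sketch (loosely in the spirit of Datalog-style queries), and the matcher is deliberately naive.

```python
def is_var(term):
    """Treat any string starting with '?' as a variable (a convention for this sketch)."""
    return isinstance(term, str) and term.startswith("?")


def match(pattern, triple, binding):
    """Try to extend `binding` so that `pattern` equals `triple`; None on failure."""
    binding = dict(binding)
    for term, actual in zip(pattern, triple):
        if is_var(term):
            if binding.setdefault(term, actual) != actual:
                return None  # variable already bound to something else
        elif term != actual:
            return None      # constant in the pattern does not match
    return binding


def query(facts, patterns):
    """Return every variable binding that satisfies all patterns at once."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in facts
            if (extended := match(pattern, triple, binding)) is not None
        ]
    return bindings


# Made-up facts for illustration only.
facts = [
    ("person-1", "name", "Ada"),
    ("person-2", "name", "Grace"),
    ("book-1", "title", "Walden"),
    ("book-1", "read-by", "person-1"),
]

# "Which books have been read, and what is the reader's name?"
results = query(facts, [
    ("?book", "read-by", "?person"),
    ("?book", "title", "?title"),
    ("?person", "name", "?name"),
])
print(results)
```

The question says nothing about how to search, and the variables in it tell you what each answer will contain: a `?book`, a `?person`, a `?title`, and a `?name`.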

Reacting is perhaps even trickier. You want to be able to respond to changes in your database, either by TK

Alright, so what?

Okay, so let's assume we have a piece of software that hits all these marks. It's based on an append-only log structure, uses triples as its data model, and has simple models for transactions, queries, and reactions.

How would this change people's lives?

The rest of the fucking owl

There is of course so much more than the pieces I've described here. For this vision to come to fruition, a few questions need answers.

The first is an interface. I talked about the programmatic interfaces, but higher-level constructs for doing all three of the basic interactions (reading, writing, and reacting) will be necessary.

Secondly, this database needs to have a notion of social interaction. Our lives are not neatly segmented from one another, and neither is our data or our software. We need the ability to reference and interact with data that lives in other databases.

This doesn't mean going straight to the massive central databases we have today. There's a lot of room in the middle to explore smaller, more intimate social contexts.

Finally, we need ways to share not only data, but software. While I shouldn't be forced to use a tool made by someone else, if I want to do the same thing as them, there's a ton of value gained by sharing the code. Re-use is important, but it's also incredibly hard to square with customizability and context-specific code.

Outline

Why do this
- lots of cries for us to own our data, but that means more than a technical architecture, it means the tools for end users to understand and manipulate their data
- Social applications are really hard, but many really useful tools aren't social and we can start there.
- Applications are built on top of data. If I want to build situated software for myself, it needs a database
- What could I use, and why it's not good enough
  - sqlite
  - ssb/hypercore/various other things
- My computer at some point stopped feeling like a place I controlled, and I miss that.
What am I going to use it for
- Collecting things
  - Websites, papers, recordings, books
- People
  - why a CRM isn't the right fit
- Organizing my writing, tweets, newsletter, etc
- Making new habits, rituals, etc
How it works
- The data model
- The query language
- The programs
Some examples
- Books and papers (I'll have to actually implement this!)
- Notes and backlinks
A database in the long now
- Why this is so hard for me
- How to experiment while keeping a foundation
  - Migrations
  - Time travel
Future exploration
- Interface ideas
- The API, limitations and opportunities
- Social
Inspiration
- RDF
- Datomic
- Roam
- Org-mode

Draft
