awarm.space newsletter

A Database for Me

Alright, well, I held off for the first one so y'all would think I'd turned over a new leaf, but, as the world keeps turning, I'm back on my bullshit.

My next 4 newsletters are going to be focused on the design and implementation of a personal database.

What's a personal database for?

Most databases today are designed for "applications". That is, they slice data by use-case. For example, hyperlink.academy stores data related to courses, cohorts, payments, etc. The database is designed to let us explicitly model that domain.

As an individual though, my experience isn't neatly divided into different use-cases. The information that is woven through my daily life, from information about my friends, to notes I write, to this blog, is messy, interconnected, and ever-evolving. A personal database is a tool that can let me store, query, and interact with that kind of data.

What's different this time

Okay yes, I have attempted this before. Fancynote, the project that was woven throughout much of last year, was in many ways a database project, though I never quite got down to calling it that. That's the first thing that's different this time around: I'm explicitly focusing just on the data layer.

The other thing is I learned a lot over the last year, and have what I think is a pretty good approach in mind. Here are some of the constraints I'm thinking of:

1. Use a triple data model

I've become somewhat obsessed with this data model lately, which is a weird thing to say, I know. The core idea is that your entire database is modeled as a set of "facts" about entities. Each "fact" is a "triple", composed of three values: an entity, an attribute, and a value.

The entity is just what this fact is about. It could be a person, an object, an event, or any self-contained unique thing. The attribute is what property this fact is about. For example the "name" attribute means this fact is about a person's name. And finally the value is the actual data of the fact. So if I were to store data about myself, the attribute "name" would have the value "Jared".

For example, you'd get something like: [entity: "me", attribute: "name", value: "Jared"]

I'll be getting into it more over the next couple weeks, but this simple system lets you model a really large set of scenarios. It gets particularly powerful when you have references to other entities as the values and start modelling graphs.
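To make that concrete, here's a rough sketch in Python of facts as triples, including one fact whose value points at another entity. The `Fact` type, the `values_of` helper, the "alex" entity, and the "friend" attribute are all made up for illustration; this isn't the actual design, just the idea in its smallest form.

```python
from typing import NamedTuple


class Fact(NamedTuple):
    """One triple: which entity, which attribute, what value."""
    entity: str
    attribute: str
    value: str


# The whole "database" is just a set of facts.
# "alex" and the "friend" attribute are hypothetical, for illustration only.
facts = {
    Fact("me", "name", "Jared"),
    Fact("alex", "name", "Alex"),
    Fact("me", "friend", "alex"),  # the value refers to another entity
}


def values_of(facts, entity, attribute):
    """Return every value stored for (entity, attribute), sorted."""
    return sorted(
        f.value for f in facts
        if f.entity == entity and f.attribute == attribute
    )


# Following a reference: look up my friends, then look up their names.
friend_names = [
    name
    for friend in values_of(facts, "me", "friend")
    for name in values_of(facts, friend, "name")
]
print(friend_names)
```

The last step is the graph part: the value of one fact ("alex") is used as the entity of the next lookup, so relationships need no special machinery beyond more triples.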

2. Use the file-system

Storing all data as independent text files on the filesystem makes it way easier for other programs to interact with it, both to read and to write. Specifically, I want to use my text editor to work with these files, grep to search them, and git to version them.
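As a sketch of what reading those files could look like, here's a small Python loader. The file format is purely an assumption on my part for this example (one `.txt` file per entity, named after the entity, with one `attribute: value` pair per line), and `load_facts` is a hypothetical helper, not a settled API.

```python
import tempfile
from pathlib import Path


def load_facts(directory):
    """Read every *.txt file in `directory` into a set of (entity, attribute, value)."""
    facts = set()
    for path in sorted(Path(directory).glob("*.txt")):
        entity = path.stem  # assumed convention: the file name is the entity
        for line in path.read_text(encoding="utf-8").splitlines():
            # Split on the first colon only, so values may contain colons.
            attribute, sep, value = line.partition(":")
            if not sep:
                continue  # skip blank lines and lines without "attribute: value"
            facts.add((entity, attribute.strip(), value.strip()))
    return facts


# Demo in a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "me.txt").write_text("name: Jared\nfriend: alex\n", encoding="utf-8")
    Path(tmp, "alex.txt").write_text("name: Alex\n", encoding="utf-8")
    demo_facts = load_facts(tmp)

print(sorted(demo_facts))
```

With a line-oriented format like this, something like `grep -r "friend:" .` would find every friend fact, and a git diff would show exactly which facts changed.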

3. Stable query language and file format

Instead of focusing on the internals of this database, I'm going to focus on getting the external interface, the thing that other programs interact with, as stable as possible.

This means I can change the "backend" over time, without having to re-write any external code, or deal with complex migrations.

You can see some of my initial thoughts on this here
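To illustrate why a stable interface matters, here's a sketch with two interchangeable backends behind the same `query()` call. Everything in it is hypothetical (the class names, the `query` signature, the "None means match anything" pattern style) and it isn't the query language I linked to above; it's just a demonstration that calling code doesn't need to care which backend it's talking to.

```python
from collections import defaultdict
from typing import Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]


def matches(fact: Triple, pattern) -> bool:
    """A pattern slot of None matches anything; otherwise it must be equal."""
    return all(p is None or p == x for p, x in zip(pattern, fact))


class ScanStore:
    """The simplest possible backend: check every fact on every query."""

    def __init__(self, facts: Iterable[Triple]):
        self._facts = list(facts)

    def query(self, entity: Optional[str] = None, attribute: Optional[str] = None,
              value: Optional[str] = None) -> List[Triple]:
        pattern = (entity, attribute, value)
        return sorted(f for f in self._facts if matches(f, pattern))


class IndexedStore:
    """A later, smarter backend with the same query() signature."""

    def __init__(self, facts: Iterable[Triple]):
        self._all = list(facts)
        self._by_entity = defaultdict(list)
        for fact in self._all:
            self._by_entity[fact[0]].append(fact)

    def query(self, entity: Optional[str] = None, attribute: Optional[str] = None,
              value: Optional[str] = None) -> List[Triple]:
        # Narrow by entity via the index when possible, then filter as before.
        candidates = self._all if entity is None else self._by_entity.get(entity, [])
        pattern = (entity, attribute, value)
        return sorted(f for f in candidates if matches(f, pattern))


facts = [("me", "name", "Jared"), ("me", "friend", "alex"), ("alex", "name", "Alex")]

# Calling code only depends on query(), so either backend can sit behind it.
for store in (ScanStore(facts), IndexedStore(facts)):
    print(store.query(attribute="name"))
```

Both stores print the same result, which is the whole point: the scan version can ship first, and the indexed one can replace it later without touching anything that calls `query()`.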

4. Do the stupidest thing possible

That ability to change is really important because there are a lot of ways to optimize a database and a lot of them seem like fun to implement. But I want to start and get things working now, which means intentionally leaving all the interesting, smart optimizations for later.

The Plan

I want to have an MVP of this done by next week. So, reading data from files and querying it. The stretch goal is to also have an API for creating and updating files, but I can just manually create them for now.

Over the next 4 weeks I'll be writing about the implementation challenges, the architecture, and hopefully a couple applications I build on top. It's going to be a learning experience!

Ultimately my goal is to have something that I could conceivably use for the next 5 years at least. That's ambitious (at least by the standards of my previous attempts), but I think I can pull it off.
