Contemplation and Collaboration with CRDTs
A common complaint these days is that our work tools and practices bias much too heavily towards constant collaboration. We’re all drowning in Slack pings and meetings, and what we really desperately need is some time to close the office door and just...think.
While I’m sympathetic, I think this view is too simplistic. Thinking alone and thinking together aren’t two ends of a one-dimensional number line that we can tune with a dial; the interplay is far richer than that!
A couple years ago, I moved from working at a fast-growing YC startup to being a grad student in computer science. Ah, no more Slack pings, and an empty calendar: I could finally just...think. This has actually been quite nice in practice: I can spend a morning reading a research paper on a whim, and the empty space does seem to enable me to think new thoughts. But I also quickly realized that this isn’t sufficient, and collaboration still plays an essential role in nearly all of my work. Whether it’s getting feedback from my advisor, leaning on teammates to make progress, or just having a fun time throwing around ideas, I need to work with others to make progress.
In this context, the question isn’t simply “how can I stem the tide of over-collaboration?” or even “how much time should I go solo vs collaborate?” It’s something more like, “what are the right patterns for combining individual thinking and group collaboration?” This is the subtle question at the heart of many creative processes. A design critique, a brainstorming session, an editor’s notes on a piece: these are tools which, wielded effectively, can nurture a healthy relationship between contemplation and collaboration.
So where do computers fit into this picture?
We all know that computers are great for real-time collaboration these days. We write a document live in Google Docs, we draw diagrams together in Figma, we talk through an idea instantly in Slack. I find these tools incredibly effective when used properly, and they deserve heaps of praise.
What about contemplation? It’s easy to criticize computers here—notifications constantly appearing, Twitter one click away—but I find that it’s actually not too hard to make a computer into a decent tool for contemplation if you’re motivated enough. I’m writing this in a full-screen private Notion document with notifications disabled on my computer, and it’s a fine place to write. Maybe an analog typewriter would reduce the need for self-control, but this works well enough.
I think the most room for improvement lies not in how computers support either of these modes individually, but how they support the space between. How can computers help us better synthesize contemplation and collaboration, moving fluidly between the two as needed?
Collaborative writing woes
Personally, I feel this problem most acutely when doing collaborative writing. As a grad student, I write many papers, essays, and talks with other people. These aren’t just casual internal docs; they’re publications that I’m signing my name to. Even though my collaborators are brilliant and lovely people, I find that writing together is often very difficult—partially because navigating ideas together is just hard work, but also because of tooling friction.
Google Docs is my preferred tool for getting feedback. Inline comment threads and suggestions are fantastically useful. But I don’t like writing in Google Docs; I find it unnerving to do deep writing in a publicly visible place, and I don’t like the look of the editor. The simple collaboration mechanics also start breaking down in larger groups; for example, it’s difficult to get independent feedback from multiple people, and there aren’t nice ways to go try out an edit before committing to it publicly. Overleaf (a similar tool for academic LaTeX publishing) has similar pros and cons, with an uglier UI.
One approach I’ve used to good effect is to write in Markdown documents, shared over git, and compiled to PDF/HTML via Pandoc. If everyone on the project is already familiar with Markdown and git (big if! but usually true in my context), this process has a number of interesting benefits. The most important one for me is space to think. I can write or edit calmly in a local text editor, knowing that no one else can see my work until I share. I’m not quite sure why this makes such a difference, but it’s very noticeable. I suspect part of the reason is that I find writing very cognitively demanding, so small differences in my environment can make a big difference.
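For readers who haven’t used this setup, here is a minimal sketch of the compile step in Python. The file names and the `pandoc_command` and `build` helpers are hypothetical, invented for illustration; the only real pieces are Pandoc’s `-s` (standalone) and `-o` (output file) options. It assumes Pandoc is installed, and PDF output additionally needs a PDF engine such as a LaTeX distribution.

```python
import shutil
import subprocess
from pathlib import Path


def pandoc_command(source: Path, output: Path) -> list[str]:
    """Build a pandoc invocation (hypothetical helper).

    Pandoc infers the output format from the extension of the -o target.
    """
    return ["pandoc", str(source), "-s", "-o", str(output)]


def build(source: Path, outputs: list[Path]) -> None:
    """Compile one Markdown source into each requested output file."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc is not installed or not on PATH")
    for output in outputs:
        # check=True raises CalledProcessError if pandoc exits with an error.
        subprocess.run(pandoc_command(source, output), check=True)
```

Calling `build(Path("essay.md"), [Path("essay.html"), Path("essay.pdf")])` would then produce both outputs from the same local Markdown file, which only becomes visible to collaborators once it is committed and pushed.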
I also like recording granular commit messages explaining the reasoning behind a change to my collaborators, so they can understand and react better. This kind of thing can also be done via a side channel like Slack or email, but it helps to co-locate the rationale with the change history. For example, here’s an explanation I wrote of a change in a recent essay:
commit 937b536b00efd4af7d9a10b42af1c37208b065f7
Author: Geoffrey Litt <gklitt@gmail.com>
Date:   Fri Nov 19 12:26:22 2021 -0500

    Strengthen transition into ops-to-marks section

    I thought that it wasn't obvious enough that ops-to-marks is a key part of the CRDT. We now emphasize in the transition that this is a critical last step. I also added a diagram showing this last phase in the algorithm.
I use branches surprisingly often when using git for writing. It’s a nice way to try out a major restructuring or rewrite of a section, while others are continuing to tweak the main version. The patterns here are somewhat reminiscent of the ways git is used when collaborating on a codebase. With branches, I’m more free to go try something speculative, and if it works out, there’s a path to merging it back into the shared version.
Still, there are many problems with the git + Markdown workflow. I haven’t found a good way to leave inline comments on the document and have those live in the shared repository. Edit suggestions aren’t ergonomic: in Google Docs it’s easy to suggest 10 distinct tiny changes and allow a collaborator to accept/reject them individually; doing 10 tiny pull requests in GitHub wouldn’t make sense. There’s no quick path to live collaboration when it’s occasionally desired (like workshopping a paragraph on a call), which often leads to copy-pasting back and forth from other tools like Google Docs.
And the worst problem of all is that this entire setup is basically only usable for programmers, mainly because git is hard to use, but also because compiling documents with Pandoc isn’t particularly accessible either. Even though I usually collaborate with people who can figure out these tools, it’s still a burden. And ultimately this workflow is a non-starter for broader collaborations involving non-programmers.
I’m speaking from my own experience here because I know it best, but there’s some great research out there about how people use collaborative writing tools. One of my favorite references in this space is Collaborative Writing Across Multiple Artifact Ecologies, by Ida Larsen-Ledet, Henrik Korsgaard, and Susanne Bødker. These researchers studied how people cobble together different writing tools to get work done, and how the social collaboration process interacts with the tools. Table 1 is a real gem, a summary of various workflows like “horizontal divide and conquer” and “joint writing” that people use to write together.
Anyway, I sense that there’s an opportunity for new kinds of collaborative writing tools that do a better job helping people work both alone and together to produce great writing. I’m optimistic that these tools could be quite impactful too, because writing together is one of the best ways to mind-meld across a group and align on a detailed, shared view of some topic.
Peritext, a rich text CRDT
I spent much of this year working with Slim, Martin Kleppmann and Peter van Hardenberg at the Ink & Switch research lab, chipping away at one aspect of this problem. We developed a conflict-free replicated data type (CRDT) for rich text called Peritext.
I’ve been fascinated by CRDTs ever since I first heard about them in the local-first software essay by I&S. They’re a kind of data structure that allows different copies of the same data to diverge and merge back together again, while avoiding irritating merge conflicts and preserving as much user intent as possible. They’re a promising foundation for hybrid offline-online collaboration, as well as flexible branching and merging on shared documents. I’ve found the core mathematical ideas elegant and surprisingly simple to understand; most of the challenges seem to be in making things performant, and in thinking carefully about how end-user expectations map onto the mathematical guarantees.
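To give a flavor of how simple the core idea can be, here is a minimal sketch of one of the classic textbook CRDTs, a grow-only counter. It is not Peritext and has nothing to do with text; the class and replica names are made up for illustration. Each replica only increments its own slot, and merging takes the per-replica maximum.

```python
from dataclasses import dataclass, field


@dataclass
class GCounter:
    """Grow-only counter: one slot per replica, merged by element-wise max."""

    replica_id: str
    counts: dict[str, int] = field(default_factory=dict)

    def increment(self, amount: int = 1) -> None:
        # A replica only ever writes to its own slot.
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Element-wise max is commutative, associative, and idempotent,
        # so replicas converge regardless of merge order or repetition.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)


laptop = GCounter("laptop")
phone = GCounter("phone")

# The two copies diverge while offline.
laptop.increment(3)
phone.increment(2)

# Merging in either direction, any number of times, gives the same state.
laptop.merge(phone)
phone.merge(laptop)
phone.merge(laptop)
```

After the merges both replicas hold the same per-replica counts and agree on the total, with no conflict to resolve. The hard part for richer data like text is choosing a merge rule that also matches what users meant.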
People have done a ton of work on managing plaintext in CRDTs, but surprisingly there’s been very little work on handling rich text with formatting. There are some code implementations out there, the most prominent one being Yjs, a nicely full-featured CRDT system with many rich text editor integrations that people are happily using in production. But until now no one had clearly articulated what properties you’d expect a rich text CRDT to maintain, which makes it difficult to evaluate, much less design, an algorithm.
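To make the merging question concrete, here is a toy illustration of two people concurrently bolding overlapping ranges of the same sentence. To be clear, this is not the Peritext algorithm. The `MarkOp` type, the `render` function, and the rule that the op with the highest ID wins for each character are all simplifying assumptions of mine, and the sketch ignores concurrent insertions and deletions entirely. The one idea it does show is that formatting operations refer to stable character IDs rather than positions, so merging can be a plain set union.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MarkOp:
    op_id: tuple[int, str]  # (logical counter, replica name): unique, totally ordered
    action: str             # "add" or "remove"
    mark: str               # e.g. "bold"
    start: int              # ID of the first character covered (inclusive)
    end: int                # ID of the last character covered (inclusive)


def render(chars: list[tuple[int, str]], ops: set[MarkOp]) -> list[tuple[str, frozenset[str]]]:
    """Derive each character's active marks from an unordered set of ops."""
    position = {char_id: i for i, (char_id, _) in enumerate(chars)}
    result = []
    for i, (_, ch) in enumerate(chars):
        winners: dict[str, MarkOp] = {}
        for op in ops:
            if position[op.start] <= i <= position[op.end]:
                current = winners.get(op.mark)
                # Toy rule: for each mark type, the op with the highest ID wins.
                if current is None or op.op_id > current.op_id:
                    winners[op.mark] = op
        active = frozenset(m for m, op in winners.items() if op.action == "add")
        result.append((ch, active))
    return result


# Character IDs are plain integers here; a real CRDT would need IDs that are
# unique across replicas and stay valid as text is inserted and deleted.
chars = list(enumerate("The fox jumped."))

alice_ops = {MarkOp((1, "alice"), "add", "bold", 0, 6)}   # bolds "The fox"
bob_ops = {MarkOp((1, "bob"), "add", "bold", 4, 13)}      # bolds "fox jumped"

merged = alice_ops | bob_ops  # merging is just a set union
bold_text = "".join(ch for ch, marks in render(chars, merged) if "bold" in marks)
```

Here `bold_text` comes out as "The fox jumped", the union of both people's intent, and it doesn't matter which side's ops are merged first. The genuinely tricky cases, like what should happen to text typed at the edge of a bold span, are exactly what a written-down set of correctness criteria has to pin down.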
As we started digging in and analyzing how we thought rich text formatting should behave when being merged together, we found that no existing implementation worked the way we wanted. So we decided to write down our idea of what should constitute correct behavior, and to design and implement a new CRDT that meets those criteria. Here’s our essay describing the work:
Peritext: A CRDT for Rich-Text Collaboration
Meta note: it was not easy getting to this point! The initial plan was a three-month summer sprint; in reality the project went many months over, and we had to go back to the drawing board at least three or four times to get it right. The current state is still limited in scope—we decided to exclude block formatting (e.g., nested bulleted lists) for now, and we haven’t yet built any serious writing tools with the CRDT, which will surely reveal new challenges. The lab is going to continue pursuing these next steps, but we decided it was worth sharing the initial milestone for now, and I’m hopeful that having these correctness criteria and a reference implementation out there will be useful to the community. Sometimes building solid foundations takes time...
Coming soon
The past few months, I’ve been working with Nicholas Schiefer at MIT on a new project: a “reactive relational” state management framework. We want to make it much easier to build user interfaces by leveraging powerful ideas from databases and spreadsheets. Not much to share publicly yet, but hopefully we'll be sharing a demo and some details within the next few months.
Reading these days
- The Essence of Software, by Daniel Jackson. My PhD advisor released the book he’s been working on for many years! It's about how to systematically design software at a deep, conceptual level; it really helped me clarify some vague intuitions and gave me a fundamentally new lens for design. I’ll write more about this sometime, but for now I recommend giving it a read. Oh, and Fred Brooks called it a “monumental work”. What recommendation could be more epic than that??
- Engineering and the Mind’s Eye, by Eugene Ferguson. On the role of visual thinking and drawing in engineering. It focuses on physical engineering but many of the ideas could have analogues in software too.
- The Aesthetics of the Japanese Lunchbox, by Kenji Ekuan. As someone who grew up in Japan and appreciates a good lunchbox, this book has been a joy. Oblique writing, but maybe that’s unavoidable when trying to explain subtle Japanese ideas around beauty.
That’s it for this dispatch. Hope you enjoy what's left of 2021, and happy new year!