Improving ActivityPub

Hello. This is not an official document. It hasn’t even been endorsed by anyone interesting.

Right now, this is merely the ramblings of a madman (me) trying to get my head around the biggest pain points in ActivityPub development, and what we might do in the future to move things forward.

0. Goals

The goal is to gather ideas and feedback, then publish a series of Fediverse Enhancement Proposals (FEPs) with broad support from the developer community, where we can make the real-world enhancements that ActivityPub desperately needs. Someone proposed that we bring this discussion to the Fediverse (via the apps we’re actively developing). I love this idea, and I’ll link to it from here as soon as it exists.

One last caveat: because I’m still learning this whole environment, I guarantee that there are many incorrect facts in this document. I apologize in advance. So please assume mistakes are because of my ignorance, and not malice, and please help me to make this document more accurate by correcting the things that I’ve gotten wrong, and the ideas I’ve omitted. I don’t intend to step on anyone’s toes; I’m just here dancing in the dark.

You may say I'm a dreamer
But I'm not the only one
I hope someday you'll join us
And the world will live as one

1. Strict Object/Property Schema

With ActivityPub and JSON-LD, objects are so loosely defined that it is difficult to know what any document means – and different applications end up making their own ad-hoc definitions. In addition, properties are very loosely defined, and can often be one of several different types, such as: a URL, a fully populated Object, a partially populated Object, or an array of URLs or Objects. This is madness.

1.1. Proposal - Strict Mode

I believe it’s possible to define a strict subset of ActivityPub’s expansive vocabulary definitions to be used by applications in the future. If the biggest pain point is the very loosely defined types and properties, then we collectively publish a strict definition that is easier to use.

This would be similar to strict mode in HTML, introduced in HTML version 4.0, which limited designers to a subset of HTML tags in exchange for improved performance and reliability in their designs.

Coming back to ActivityPub and ActivityStreams, I suggest that we look at common implementations for the most common values associated with each object/property. (I believe I saw a project at FediForum that’s already doing this work.) We can document best practices, such as: the name property is a string, not an array of Objects. Eventually we can publish a schema that is a strict subset of the existing Activity Vocabulary – 100% compatible with current implementations – and then encourage app developers to use that schema.

Strict definitions of the data types used for each property. Which properties are objects, and which are arrays?
Define standard responses to each kind of activity, such as HTTP status codes, standard error messages, or other JSON object responses.
Specifically call out standard HTTP features for rate limiting and caching, to make otherwise chatty ActivityPub calls as efficient as possible.
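As a rough illustration of the first item in this list, here is a minimal Python sketch of a strict property table and a checker. The STRICT_TYPES table and the strict_violations function are hypothetical names; only the rule that name is a string comes from this document, and the other entries are assumptions made for the example.

```python
# Hypothetical strict-mode table: property name -> the single type allowed.
# Only the "name is a string" rule comes from this document; the remaining
# entries are illustrative assumptions, not a published schema.
STRICT_TYPES = {
    "id": str,
    "type": str,
    "name": str,
    "to": list,
}


def strict_violations(doc: dict) -> list:
    """Return one readable message per property that breaks the table."""
    problems = []
    for prop, expected in STRICT_TYPES.items():
        if prop in doc and not isinstance(doc[prop], expected):
            problems.append(
                f"{prop}: expected {expected.__name__}, got {type(doc[prop]).__name__}"
            )
    return problems
```

An empty result means the document passes this (assumed) table. Properties missing from the document are not reported, which keeps the check compatible with partially populated objects.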

1.2. Migration Path

Since this format would be a strict subset of ActivityPub, it should be possible for servers to SEND data in strict format at any time (with some compatibility testing, of course).

And, while servers may not be able to depend on RECEIVING strict mode data right away, with a common definition of what things should look like, we might be able to make standard adapters that translate “loose mode” ActivityPub into its “strict mode” counterpart.
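One such adapter might look like the sketch below. It assumes, purely for illustration, that strict mode would represent a reference property as a flat list of ID strings; the function name to_id_list is hypothetical.

```python
def to_id_list(value) -> list:
    """Normalize a loose property value into a list of ID strings.

    Accepts the shapes described in section 1: a URL string, an object
    carrying an "id", or an array mixing both. Entries without a usable
    ID are dropped. The list-of-IDs target shape is an assumption.
    """
    if value is None:
        return []
    items = value if isinstance(value, list) else [value]
    ids = []
    for item in items:
        if isinstance(item, str):
            ids.append(item)
        elif isinstance(item, dict) and isinstance(item.get("id"), str):
            ids.append(item["id"])
    return ids
```

With this in place, receiving code can treat every reference property the same way, no matter which of the loose shapes the sender chose.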

Once “enough” instances have adopted strict mode (and, “enough” will be an important topic for discussion) we can start taking steps to remove the cumbersome processing requirements of JSON-LD and start parsing ActivityStreams documents as simple JSON.

2. Rich Interactions in the Inbox

As social applications grow, there are more (and richer) interactions that apps want to perform. Examples of these “rich activities” might be: managing a shared “to do” list, playing a song shared via ActivityPub, checking in to a location, or making a move in an interactive game.

But this currently requires that a user’s home server supports that kind of activity. If (for example) your server software doesn’t support the Question activity that Mastodon uses for polls, then you can’t participate in an online poll.

Looking at the Fediverse as a single, integrated ecosystem, it should be possible for users of any server app to participate in these richer activities from their own inboxes without requiring that their server implements that activity. Or at the very least, to get from one part of the ecosystem to another without having to sign in to yet another service.

2.1. Proposal - Authenticated Activities

It is certainly too much to ask that every server implement UIs for every kind of activity on the Fediverse. So the rich interactions above need to be handled by the “remote server” that generated them in one way or another.

We could accomplish this by sending notifications to the user’s inbox that link them out to the interactive content while still bringing the user’s existing authenticated session with them.

A naive approach could be to include some authentication token with the link, such as the BearCaps currently supported by Mastodon. However, this makes links un-shareable, as anyone who receives a copy of the link could authenticate as the original user.

Instead, I’d like to explore ways to include additional metadata in an activity that requests

FEP-61cf - Open WebAuth may be a solution to this, and is worth exploring more.

2.2. Proposal - Embedded Functionality

I’m less certain about this idea, but I think it’s worth exploring a bit, here. If applications were able to send small bits of interactive content in an Activity, then users would be able to participate in the unified ecosystem regardless of what software they use on their home server.

Obviously, strict limits would need to be in place to prevent malicious actors from breaking users’ timelines. But, there are already numerous ways to embed functionality from one server into another: IFrames and oEmbed are two that I think of first, and there may be better examples to follow. In particular, oEmbed is already partially supported by Mastodon (and others?) so it might be an interesting starting point.

2.3. Migration Path

Adding metadata into an Activity would be fully backwards compatible, so that servers that don’t use this metadata would still be able to display messages as they currently do, but their users would just be required to re-authenticate when they use those links.

Alternatively, if we define something similar to oEmbed, this would also be 100% backwards compatible. Servers that don’t support the embedded content would just display a regular link, while others could provide richer interactions via the embeddable content.
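The fallback behavior described here could be sketched as follows. The embed property and its src field are hypothetical names invented for this example; no such property is defined by ActivityPub or oEmbed.

```python
import html


def render_link(activity: dict, supports_embeds: bool) -> str:
    """Render embeddable content when understood, else a regular link."""
    url = html.escape(activity["url"], quote=True)
    embed = activity.get("embed")  # hypothetical metadata property
    if supports_embeds and isinstance(embed, dict) and isinstance(embed.get("src"), str):
        src = html.escape(embed["src"], quote=True)
        # A sandboxed iframe is one possible container; real limits would
        # need to be much stricter than this.
        return f'<iframe sandbox src="{src}"></iframe>'
    # Servers that ignore the extra metadata fall through to a plain link.
    return f'<a href="{url}">{url}</a>'
```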

3. Remote Interactions (Activity Intents)

In working to make the Fediverse a truly integrated ecosystem, the converse to issue #2 above is also true. Users don’t experience the whole Internet through the compressed lens of their inboxes. They discover information and services out there, on the open web. But once there, it is difficult to get back to their inbox to use their social identity and initiate social workflows.

I’ve detailed this problem in FEP-3b86 Activity intents and I believe this is an important part of bringing the entire social web closer together.

3.1. Proposal

FEP-3b86 Activity intents provides a simple way for users’ home servers to publish the social activities that they support, and for remote servers to initiate social workflows remotely. It builds on the old concept of “remote follows” created by oStatus, and generalizes and expands the kinds of interactions that are possible.

3.2. Migration Path

Activity Intents are backwards compatible and don’t break any existing workflows. But, they can fail if a user’s home server does not support a particular intent. For example, if a user clicks “Like” on a post on a remote server, they may reach a dead end if their home server does not publish an endpoint for this activity. The current FEP could be enhanced to support this, and better exception handling could be built into implementations.
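The dead-end case can be made concrete with a small sketch. It assumes that intents are published as links in the user’s WebFinger document, each with a rel value and a URL template containing {placeholders}; the exact rel strings and parameter names used below are assumptions for illustration, so check FEP-3b86 for the real ones.

```python
from typing import Optional
from urllib.parse import quote


def intent_url(webfinger: dict, rel: str, params: dict) -> Optional[str]:
    """Build the home-server URL for an intent, or None if it is not published."""
    for link in webfinger.get("links", []):
        if link.get("rel") == rel and isinstance(link.get("template"), str):
            url = link["template"]
            for key, value in params.items():
                # Percent-encode each value before substituting it in.
                url = url.replace("{" + key + "}", quote(value, safe=""))
            return url
    # No matching link: the dead end described above. Callers should
    # show a useful fallback instead of a broken button.
    return None
```

Returning None rather than raising lets the remote server decide how to degrade, for example by hiding the “Like” button or by offering a copyable link.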

4. HTTP signatures

Implementing HTTP signatures is one of the most difficult and error-prone aspects of Fediverse development. The list of reasons is long. Here are a few:

There is no mechanism for updating or rotating public/private keys.
Private keys are typically held by the server, which limits the ability to perform end-to-end encryption (E2EE)
Error messages from most Fediverse apps are typically useless
Errors are difficult to troubleshoot because there are so many moving parts. It’s hard to tell if the signature broke, or something else.

4.1. Proposal - Object Integrity Proofs

FEP-8b32 - Object Integrity Proofs aims to solve many of the issues above with signatures embedded into documents themselves. It may be the best alternative to HTTP signatures. However, it still requires complicated cryptography and public key infrastructure, so I’m not personally thrilled about implementing it.

4.2. Proposal - Lookup Validation

IndieWeb protocols typically authenticate documents by re-loading them from the original server. We could reduce the overhead of signing documents (and of validating those signatures) if ActivityPub worked more like WebMentions, where servers receive a notification that a document exists, then simply retrieve and validate that document via HTTPS directly from the source.

Example: In this model, Activities themselves become very small, possibly consisting only of the actor, activity, and IDs/URLs of the various ActivityStreams documents involved. Here’s an activity that adds a document to a collection and simply references the two objects involved.

{
	"actor": "https://me.com/@me",
	"activity": "Add",
	"subject": "https://me.com/my-document",
	"object": "https://me.com/some-collection"
}
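A receiving server might handle such a notification roughly like this. This is a minimal sketch, not a vetted security design: the same-origin rule between actor and subject is an assumed policy, and fetch is a caller-supplied function (hypothetical here) that performs the HTTPS GET and returns parsed JSON.

```python
from typing import Callable
from urllib.parse import urlsplit


def same_origin(a: str, b: str) -> bool:
    """True when two URLs share scheme and host (including any port)."""
    pa, pb = urlsplit(a), urlsplit(b)
    return (pa.scheme, pa.netloc) == (pb.scheme, pb.netloc)


def validate_by_lookup(notification: dict, fetch: Callable[[str], dict]) -> dict:
    """Re-fetch the referenced document instead of trusting the payload."""
    actor = notification["actor"]
    subject = notification["subject"]
    if urlsplit(subject).scheme != "https":
        raise ValueError("subject must be an https URL")
    # Assumed policy: an actor may only announce documents on its own origin.
    if not same_origin(actor, subject):
        raise ValueError("actor and subject are hosted on different origins")
    document = fetch(subject)
    # The fetched document must identify itself as the URL we asked for.
    if document.get("id") != subject:
        raise ValueError("fetched document does not match the subject URL")
    return document
```

Because fetch is injected, the same function can be exercised in tests with a stub and wired to a real HTTP client in production.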

4.3. Migration Path

HTTP signatures are too intertwined in the Fediverse to simply remove without breaking compatibility with nearly all existing software. So, nothing is going to change until “enough” servers form a critical mass around a better alternative, then set a sunset date to force the remaining apps to move to the new standard.

However, individual apps could implement this type of lookup validation as a primary method of receiving documents, while still signing outbound documents for now, and could choose to stop using HTTP signatures once “enough” instances support a better standard.

5. Reply Collections and Moderation

I received several comments about replies and reply collections. For example:

Original post owners should have some control over who replies to their articles, or at least, which replies they allow on their timeline and share with their followers
Replies to posts may not be listed on all servers because of Federation limitations. I believe this issue is called “phantom replies” and no other self-respecting application would operate this way.
Mastodon’s requirement that every person in a thread be @mentioned is also annoying. This is clunky and stupid, because it puts an upper limit on the number of people who can participate before Mastodon’s 500 character limit is reached.

Fortunately, I believe the Forums and Threaded Discussions Task Force is making good progress on these issues (among others). To summarize what I understand of their current direction:

Applications should use the standard ActivityPub context property as the URL of an OrderedCollection that is managed by the originating server.
Replies are added to this collection, and shared back with members of the group via an Add activity.

This mechanism provides a good way for remote servers to “back fill” missing replies because they can just traverse the replies collection.
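A back-fill traversal could be sketched as below. It assumes the collection follows the usual ActivityStreams paging shape (first, next, orderedItems); fetch is a caller-supplied function (hypothetical here) that returns the parsed JSON for a URL, and max_pages is an arbitrary safety cap.

```python
from typing import Callable, Iterator


def _item_ids(node: dict) -> Iterator[str]:
    """Yield IDs from orderedItems, whether listed as URLs or objects."""
    for item in node.get("orderedItems", []):
        if isinstance(item, str):
            yield item
        elif isinstance(item, dict) and isinstance(item.get("id"), str):
            yield item["id"]


def iter_reply_ids(collection_url: str, fetch: Callable[[str], dict],
                   max_pages: int = 50) -> Iterator[str]:
    """Walk a paged OrderedCollection and yield every reply ID in order."""
    collection = fetch(collection_url)
    yield from _item_ids(collection)  # small collections may inline items
    page_ref = collection.get("first")
    seen_urls = set()
    pages_read = 0
    while page_ref is not None and pages_read < max_pages:
        if isinstance(page_ref, str):
            if page_ref in seen_urls:
                break  # guard against a "next" link that loops
            seen_urls.add(page_ref)
            page = fetch(page_ref)
        else:
            page = page_ref  # page object embedded inline
        pages_read += 1
        yield from _item_ids(page)
        page_ref = page.get("next")
```

A server that already holds some replies can compare the yielded IDs against its local store and fetch only the ones it is missing.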

This mechanism gives the original poster the option to turn off all replies, or to filter/moderate replies before they are added to the collection, or to remove them after the fact if necessary. In addition, this does not prevent others from side-posting replies to their own followers, which seems fine to me. You’re allowed to talk behind my back. I’m just not required to boost your comments if I don’t want to.

5.1. Proposal

I have nothing better to add than what the Forums and Threaded Discussions Task Force is already doing. All app developers should follow their progress and implement as many of their FEPs as possible.

5.2. Migration Path

I believe this work is all backwards compatible, so it will not break existing implementations that use the context property differently.

6. Account Portability

I don’t know that this is entirely in scope for me, but it’s a big-enough, and popular-enough issue that we should at least list it here. I believe there’s already a W3C group working on account portability and a few draft specs have already been published so maybe that’s enough for now? IDK. Is there a reason to duplicate the existing efforts?

Here are some notes I’ve found on the existing effort:

7. Client API

I believe most developers agree that the ActivityPub C2S API was a nice architectural aspiration at the time, but was never ready for real-world use.

Currently, most apps simply mimic the Mastodon JSON API, and shoehorn their behavior into Mastodon’s format. But ActivityPub needs a real client API that is not bound to Mastodon’s use cases.

I don’t have a specific proposal here, aside from “someone should try greenfielding this, and listen to input from the developer community”.

Fortunately, the existing C2S API is so underused that migration issues are not really a problem. When we have a new spec for a JSON API, we can just roll it out.

8. Community Resources

We already have several existing resources and groups that could help to organize this. Let’s make sure we work together with:

Fedidevs.org
Fedi.foundation
Coding.social
More? Please help me complete this list.

9. Migration: What is “Enough”?

While many of the proposals here are backwards compatible and do not break existing apps or functionality, a small number of them do. Notably, any change to HTTP signatures could not be finalized until “enough” applications support a new verification method. This situation is echoed in other proposals to a lesser degree.

As a community, we should pick a percentage of active users served on the Fediverse – for instance 90% or 95% – as a threshold for making a switch to any new toolsets that are not backwards compatible, then monitor our progress towards rolling out new features, then announce a sunset date 12 months after we reach this threshold.
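The arithmetic behind this threshold is simple enough to sketch. The function names and the (active_users, supports_feature) input shape are assumptions for illustration; where the numbers come from is a separate question.

```python
import calendar
from datetime import date


def coverage(instances: list) -> float:
    """Share of active users on instances that support the new convention.

    Each entry is a tuple of (active_users, supports_feature).
    """
    total = sum(users for users, _ in instances)
    if total == 0:
        return 0.0
    supported = sum(users for users, ok in instances if ok)
    return supported / total


def sunset_date(threshold_reached: date, months: int = 12) -> date:
    """Date `months` after the threshold was reached, clamped to a valid day."""
    index = threshold_reached.month - 1 + months
    year = threshold_reached.year + index // 12
    month = index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(threshold_reached.day, last_day))
```

For example, if instances holding 900 of 1,000 active users support a feature, coverage is 0.9, which meets a 90% threshold but not a 95% one.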

9.1. NodeInfo

This means that we should provide some way for instances to announce their support for any new conventions we want to introduce so that we can make a reasonable choice about when to sunset various standards.

The NodeInfo standard already serves this purpose, and it makes sense to me that we begin publishing this metadata via various NodeInfo properties. This would allow us to index this information and publish our progress towards our goals (similar to Are We HS2019 Yet?).
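Reading this information from a NodeInfo document might look like the sketch below. The usage.users.activeMonth field and the free-form metadata object are part of the NodeInfo schema; the conventions key inside metadata is a hypothetical name chosen for this example, since no such key has been agreed on.

```python
def supported_conventions(nodeinfo: dict) -> set:
    """Return the conventions an instance announces in its metadata."""
    meta = nodeinfo.get("metadata") or {}
    values = meta.get("conventions", [])  # hypothetical key, not standardized
    if not isinstance(values, list):
        return set()
    return {v for v in values if isinstance(v, str)}


def monthly_active_users(nodeinfo: dict) -> int:
    """Read usage.users.activeMonth, treating missing or odd values as 0."""
    users = (nodeinfo.get("usage") or {}).get("users") or {}
    value = users.get("activeMonth", 0)
    return value if isinstance(value, int) else 0
```

An indexer could combine the two results per instance to report what share of active users is already covered by a given convention.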

9.2. Big Applications

If we choose this method for rolling out changes, this means we would have to have support from the most popular applications: Mastodon, Misskey, Lemmy, PeerTube, and PixelFed according to Fedidb.org.

Making changes without bringing these apps along would split the Fediverse, and given the fragile nature of this alternative network (and the massive forces of centralization that we struggle against) that’s an outcome I could not bear to see.

10. More/TODOs

Here are some additional comments from the Fediverse that I still need to ingest into this document:

Scan Mastodon’s Issue list for commonly upvoted ideas: https://github.com/mastodon/mastodon/issues?q=is:issue%20state:open%20sort:reactions-%2B1-desc
One of my pet peeves: Guarantee that the “fetch”-based sequence of activities is the same as the “POST”-based sequence. In other words, guarantee that if I missed receiving a POST, I can recover by browsing the outbox.
Lack of reference implementation. A spec without a reference implementation or usable test suite hands control of the spec to the largest implementer.
Lack of easy extensibility. A successor needs a clearly documented capability for extensions.
Lack of opinion on implementation. This is a controversial one, but leads to implementations that are spec-conformant but not interoperable. The spec should provide a baseline set of operations that may/must be implemented upon receiving a message, with a set of expected responses.
Feature discoverability. When your protocol allows for wildly different implementations, feature discovery is essential to allow interoperability. This allows servers to negotiate for the largest implemented subset of features instead of defensively assuming the smallest.
Trust at the server level. A server verifies actors it owns, no individual certs. The verification mechanism must be baked into the spec and not left to implementers.
Batching.
Client API. C2S is almost impossible to implement. A replacement should be an optional, lightweight, minimum-surface REST API.
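The outbox guarantee requested in the list above would make recovery a simple comparison. A minimal sketch, assuming the receiver keeps the set of activity IDs it has already processed; the function name is hypothetical.

```python
def missed_activity_ids(outbox_ids: list, received_ids: set) -> list:
    """IDs listed in the sender's outbox that never arrived by POST.

    The result keeps outbox order, so missed activities can be replayed
    in the same sequence in which they were originally published.
    """
    return [activity_id for activity_id in outbox_ids if activity_id not in received_ids]
```

This only works if the outbox really contains every activity that was delivered by POST, which is exactly the guarantee being asked for.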