For those who don’t know it, Syn (short for synonym) is a global Process Registry and Process Group manager for Erlang and Elixir that I wrote a few years back. It has been around for some time now, and it has served me well on many occasions. Through the years, I’ve built up a series of considerations and ideas for improvement, and now the time has come to put all of these back into Syn.
Syn v2 has been rewritten from the ground up, and I’d like to share the reasoning and the architectural choices behind this work. Since Syn is written in Erlang, the few portions of code shown below use Erlang syntax. Don’t let that discourage you if you are an Elixir developer though, as the same principles apply.
Things I wanted to keep
I’ve always treated a Process Registry as a specialized subset of a Key / Value store. Yes, the Key is the process registered name and the Value is the pid(), but the important bit here is that a process inherently belongs to a node – because it runs there. This changes the game quite significantly.
Contrary to standard Key / Value stores where you want to keep all of your data when a node leaves or crashes, why would you want to keep the name reference to a process, if that process runs on the node that left, or worse, crashed (and the process with it)? If a node gets added, why would you want to handoff a running process’ registration handling to this new node (so, not the node the process is running on)? In general, why would you want to decouple the node that handles the registered name of a process from the process itself?
If your answer is that in case of nodes leaving / crashing you need to keep the process’ information, then what you need is a persistent data storage, not a Process Registry. If your answer is that you want to spawn and load balance processes across a cluster (i.e. workers, that also happen to end up registered by name), then maybe what you need is a Job Queue, not necessarily a Process Registry.
Now, if what you want from your Process Registry is to register existing processes, then maybe you can agree that a process’ registration is strongly tied to the node the process runs on. If the node dies, then the process dies as well, and keeping its registration name around probably doesn’t make much sense. If you agree to this, then my suggestion is that in a distributed registry every node should be handling the registration of the processes that run on it.
In this scenario, we have a different paradigm from that of standard Key / Value stores. For instance, you do not need a Hash Ring to load balance the processes in your cluster and create replicas for fault tolerance of data: if a process or the node it runs on dies, you want to let its registration name go. You could still consider using a hash ring or consensus algorithms such as Raft to register a name, because you might experience race conditions when the same name is registered simultaneously on different nodes. However, for these cases Syn takes the approach of using a conflict resolution mechanism similar to the one used to resolve net splits.
This is even more true in the context where Syn was born. In IoT applications, you generally have a process that handles an external (TCP/UDP) socket to a physical device. It only makes sense that the process runs on the same node as the socket it handles, because in this scenario (maybe it’s just me) I cannot see the sense of having an external socket managed by a process that runs on a different node:
- If the node the socket runs on dies, you most probably want the process that handles it to die as well.
- Conversely, if the socket handler process dies, you’d lose all of its state and you’d probably want to disconnect the device from the socket anyway.
To sum it up, I wanted to keep Syn v1’s paradigm, which is: every node is the authority for registering the processes that run on it. Load balancing is not part of this Process Registry; rather, it has to do with whatever causes processes to be spawned in the first place (e.g. external sockets get created on a specific node based on TCP Load Balancing mechanisms).
Finally, I also wanted to keep a full registry replica on every node of the cluster, so that Syn is optimized for read-intensive operations rather than write-intensive ones. This also seems to make sense: if you do want to register a process by name, it’s probably because you want it to live long enough to warrant an alias and to be tracked system-wide.
Things I wanted to improve
1. Dynamic node membership
Contrary to what some forum users seem to think, Syn v1 does manage dynamic node addition pretty well. The only caveat is that Syn v1 needs to be initialized on a node after that node joins a cluster: your application needs to have the logic to connect to the other nodes, and only then issue a call to syn:init/0, which initializes the mnesia tables on the node. This is because Syn v1 uses mnesia’s replication features, and mnesia enforces a specific order of operations when creating / adding a node to existing replicated tables.
For instance, the call to mnesia:change_config/2 to configure extra_db_nodes needs to happen before the creation of a table (via mnesia:create_table/2) or the addition of a node to existing replicated tables (via mnesia:add_table_copy/3). If you’re curious about this process, you can head to syn_backbone.erl where you can get the gist of how Syn v1 sets up dynamic replication via mnesia.
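As a rough illustration of this ordering constraint, here is a minimal sketch of a node joining existing replicated tables. The module name, the function names and the arguments are hypothetical and this is not the code found in syn_backbone.erl; only the mnesia calls themselves are real. It assumes the local node is already connected to the nodes in ClusterNodes and that the table already exists there.

```erlang
-module(mnesia_join_sketch).
-export([join/2, classify/1]).

%% Hypothetical helper: ClusterNodes already run mnesia and hold Table.
join(ClusterNodes, Table) ->
    ok = mnesia:start(),
    %% 1. First tell the local mnesia about the existing cluster.
    {ok, _Connected} = mnesia:change_config(extra_db_nodes, ClusterNodes),
    %% 2. Only then ask for a local in-memory copy of the table.
    classify(mnesia:add_table_copy(Table, node(), ram_copies)).

%% Normalizes the result of mnesia:add_table_copy/3.
classify({atomic, ok}) -> ok;
classify({aborted, {already_exists, _Table, _Node}}) -> ok;
classify({aborted, Reason}) -> {error, Reason}.
```

Swapping steps 1 and 2 is exactly the kind of mistake that the required call order is meant to prevent.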
This could also potentially lead to a rare race condition, which BTW never happened to me in all of these years. If two nodes of a cluster were to be started during a net split so that they do not see each other on boot, they would initialize mnesia separately which would result in them having their own version of Syn tables, with their own fingerprint, that cannot be merged afterwards (due to mnesia internals).
Finally, while mnesia does support node addition, it does not as easily support node removal. The equivalent command to remove a table copy, mnesia:del_table_copy/2, has some other caveats (for example it requires mnesia to be stopped). It is certainly doable, but the question here would be when to do it in an automated way. For instance, a node could get notified that another node left the cluster, but should it then remove the missing node’s table copy? To do that, it would first have to stop mnesia, and lose all of the ram data. And then, what if the other node was simply down due to a temporary network failure that caused a net split? For these reasons, node removal was never implemented in Syn v1. If a node left, the local mnesia tables would still have it included in the extra_db_nodes that mnesia uses for replication, which is not a big deal.
Therefore, even if Syn v1 does support dynamic node addition, there are these caveats to keep in mind – and I can see why some users might have misinterpreted them.
2. Net Splits
Even though these events are rare, they do happen. Moreover, I felt that a mechanism that would solve those would also basically implement a fully working dynamic addition / removal of nodes.
Mnesia does not handle net splits very well. To address this issue, Ulf Wiger experimented and created his unsplit framework (he talks about it in this post on the Erlang questions mailing list). However, I’ve had mixed results with the mechanism that he uses in there.
Basically, Ulf’s solution works by first subscribing to mnesia events. If mnesia triggers an inconsistent database event for a remote node (meaning that the network has been partitioned), the unsplit code will check whether the remote node is already part of mnesia’s local running nodes. If it is not, it will manually connect the local node to the remote node using the undocumented mnesia_monitor:connect_nodes/1 function; while doing so, it will inject some custom resolution code that performs dirty read & write operations on the local and on the remote node’s mnesia tables. I took this mechanism and made a specialized and simplified version for Syn v1, and if you’re up for it you can check this implementation in syn_consistency.erl.
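The first two steps of this mechanism (subscribing to mnesia events and checking the running nodes) can be sketched as follows. This is a simplified illustration, not the code of unsplit or of syn_consistency.erl: the module and function names are hypothetical, and the undocumented connect / merge step is deliberately left out and represented by the OnSplit fun passed in by the caller.

```erlang
-module(split_watch_sketch).
-export([start/1, needs_stitching/2]).

%% Hypothetical watcher process. OnSplit is a fun of one argument (the
%% remote node) standing in for the resolution step, not reproduced here.
%% mnesia must already be running on the local node.
start(OnSplit) ->
    spawn(fun() ->
        {ok, _Node} = mnesia:subscribe(system),
        loop(OnSplit)
    end).

loop(OnSplit) ->
    receive
        {mnesia_system_event, {inconsistent_database, _Context, Remote}} ->
            Running = mnesia:system_info(running_db_nodes),
            case needs_stitching(Remote, Running) of
                true -> OnSplit(Remote);
                false -> ok   % mnesia already lists Remote as running
            end,
            loop(OnSplit);
        _Other ->
            loop(OnSplit)
    end.

%% The remote node only gets stitched back if mnesia does not already
%% consider it one of its running nodes.
needs_stitching(Remote, RunningDbNodes) ->
    not lists:member(Remote, RunningDbNodes).
```

Note that whenever mnesia already reports the remote node as running, the resolution step is skipped entirely.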
While it works nicely in a 2-node cluster, I have unfortunately encountered inconsistent behavior in bigger clusters. In a cluster of 3 nodes I would see the inconsistent database events triggered correctly by mnesia, but unfortunately mnesia would randomly consider the remote node that triggered the event as already part of its local running nodes. Thus, the resolution part of the code that would perform the read & write operations and solve the split brain situation wouldn’t get a chance to run. My hunch is that mnesia is able to reconnect to parted nodes before this whole mechanism gets the chance to be called, especially in partial net split scenarios. I’ve also tried forcing the resolution code to run without the checks, but I got the error not_merged as the result of the merge fun in mnesia_controller:connect_nodes/1, more precisely here.
I don’t know whether this random failure to resolve net splits in bigger clusters results from using mnesia in an unintended way by taking advantage of Ulf’s mechanism (it seems that Ulf wanted to test it with more than 2 nodes himself), or if I might have missed something in my version of the implementation. That said, handling net splits is not a trivial task, especially if you’re using a library that is not meant to do so. My feeling is that mnesia’s replication mechanisms are not intended to handle them, and hacking your way through might have unexpected results.
3. Behavior and customization
Syn v1 only allows a process to be registered with one name at a time, a behavior consistent with Erlang’s global module. However, I felt this was a limitation and I wanted to allow a process to be registered with multiple aliases.
Also, I wanted a cleaner approach to support customization callbacks, i.e. a single callback module with its syn_event_handler behavior that would be triggered depending on a developer’s choices.
Finally, in case of registry conflicts during a net split resolution (i.e. when two processes have registered the same name on different nodes during the net split), Syn v1 decides which process to keep and which one to discard. I wanted to provide developers with the ability to define their own conflict resolution method. The developers could, for instance, save vector clock data into the meta data of each process and use it to choose which process to keep in case of conflict.
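To give an idea of what such a custom method could look like, here is a minimal sketch of a resolver that receives the conflicting name and the two {Pid, Meta} candidates, and returns the pid to keep. The module name, the function name, its signature and the registered_at meta data key are all assumptions made for illustration and are not taken from Syn’s API. For brevity it compares a plain timestamp stored in the meta data rather than vector clocks, which is a much weaker criterion since clocks on different nodes may drift.

```erlang
-module(conflict_sketch).
-export([resolve/3]).

%% Hypothetical resolver: each process is assumed to store the time of its
%% registration under the `registered_at` key of its meta data (a map).
%% Returns the pid to keep: the most recently registered one, with ties
%% going to the first candidate.
resolve(_Name, {Pid1, #{registered_at := T1}}, {_Pid2, #{registered_at := T2}})
  when T1 >= T2 ->
    Pid1;
%% Fallback clause: covers T1 < T2, and also meta data without the key.
resolve(_Name, {_Pid1, _Meta1}, {Pid2, _Meta2}) ->
    Pid2.
```

The process that is not returned would be the one to discard.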
Given all of the above, the choices were simple and clear, and so was the implementation.
Generic Register / Unregister operations’ flow
- When a registration request comes in for a Name and Pid, this request is routed to the node that the Pid is running on. In most cases, a registration request will be done from a process itself, which means that communication stays on the same node.
- Every registration request is handled by a single (gen server) process (the registry process) on every node. This registry process starts monitoring the newly registered process, and also guarantees that registration / unregistration requests are consistent, since there’s a single registry process per node that handles them sequentially.
- The registry process writes the Name and related Pid to local ETS tables (edit in v2.1: these replace the local, not replicated, mnesia table that was used before). These are in-memory only tables that get created on application start and destroyed on application stop. Before v2.1, mnesia was still used only because of its secondary index feature, as I need to be able to search for table entries both by Name and by Pid.
- The registry process then sends the registration / unregistration information to all of the other nodes in the cluster. This is not done by sending a message to the registry processes of the other nodes; rather, it issues a remote procedure call (RPC) that directly writes to the local ETS tables of the remote nodes. This frees the other registry processes from all inter-node syncing operations.
- Edit in v2.0.1:
- When a node receives registration information from another node, it checks locally whether there’s a name conflict. If one is found, the node will try to gain a global lock for the conflicting name to run a specific portion of code that merges the conflicting data, since other registry processes might see this conflict as well.
- The registry process that succeeds in gaining the lock will compare the received registration data and will resolve the conflict between itself and the node that sent the registration information.
- Once done, the registry process will free the lock, and any other nodes that experienced the same name conflict will then resolve it in turn.
- When a node joins a cluster, all the registry processes of the nodes in the cluster receive a NODE UP event for that node. Simultaneously, the joining node’s registry process receives a NODE UP event for all of the existing nodes in the cluster.
- Every registry process that receives a NODE UP event for a remote node will try to gain a global lock to run a specific portion of code that deals with merging remote data.
- The registry process that succeeds in gaining the lock will issue an RPC on the remote node and request the registry data for all of the processes that run on it, for which the remote node is the authority. It will then write this information to its local ETS tables. If this happens after a net split, some naming conflicts might happen at this moment, and a choice on which process to keep will be made depending on the specified logic. By default, Syn will keep the local process, kill the remote process and remove the latter from the ETS tables on the remote node (via an RPC).
- Once done, the registry process will free the lock. It will not send its own data, as that will be requested by the other registry processes when they get their turn to grab the lock.
- The code for all of this at the time of writing can be seen here.
- When a node leaves a cluster (voluntarily, because it crashed or because of a net split), all the registry processes of the nodes in the cluster receive a NODE DOWN event for that node.
- The registry processes that receive this event proceed to remove from their local tables all of the processes that ran on the disconnected node.
- This also works well in the context of partial net splits (A <--> B <--> C, where node A can see B, B can see A and C, and C can only see B), since every node keeps locally only the information of the nodes it can see.
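To make the local storage part of this flow more concrete, here is a minimal sketch of in-memory tables that can be searched both by Name and by Pid, and purged when a node goes down. It is an illustration only and not Syn’s actual code: the module, function and table names are hypothetical, and the owning node is stored explicitly in each entry to keep the example self-contained.

```erlang
-module(registry_tables_sketch).
-export([new/0, add/4, find_by_name/2, find_by_pid/2, purge_node/2]).

%% Two in-memory tables, so entries can be found both by Name and by Pid.
%% The by-pid table is a bag, so that one pid can hold several names.
new() ->
    ByName = ets:new(by_name_sketch, [set, public]),
    ByPid = ets:new(by_pid_sketch, [bag, public]),
    {ByName, ByPid}.

%% Records that Name refers to Pid, which runs on Node.
add({ByName, ByPid}, Name, Pid, Node) ->
    true = ets:insert(ByName, {Name, Pid, Node}),
    true = ets:insert(ByPid, {Pid, Name, Node}),
    ok.

find_by_name({ByName, _ByPid}, Name) ->
    case ets:lookup(ByName, Name) of
        [{Name, Pid, _Node}] -> Pid;
        [] -> undefined
    end.

find_by_pid({_ByName, ByPid}, Pid) ->
    [Name || {_Pid, Name, _Node} <- ets:lookup(ByPid, Pid)].

%% On NODE DOWN: forget every process that ran on the disconnected node.
purge_node({ByName, ByPid}, Node) ->
    true = ets:match_delete(ByName, {'_', '_', Node}),
    true = ets:match_delete(ByPid, {'_', '_', Node}),
    ok.
```

Since every node only ever purges the entries of the node it lost sight of, the same function covers both full and partial net splits.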
These are the results of the rewrite.
- Addition & removal of nodes in a cluster is done in a completely dynamic and transparent way, with no caveats.
- Automated recovery from net splits has now been taken to a new level. You can check the existing test suites to see what is being covered.
- Finally, there now is a single callback module with the syn_event_handler behavior, and the ability for the developer to use a custom function to resolve naming conflicts after net splits.
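The dynamic handling of nodes relies on the NODE UP / NODE DOWN events and on the global lock described in the flow above. A minimal sketch of how these two pieces can fit together in Erlang follows. It assumes the standard net_kernel:monitor_nodes/1 and global:trans/2 calls; the module name, the function names, the lock identifier and the MergeFun argument are hypothetical, and this is not Syn’s actual code.

```erlang
-module(node_events_sketch).
-export([start/1, with_merge_lock/1]).

%% Hypothetical listener. MergeFun is a fun of one argument (the node that
%% came up) standing in for the code that fetches and merges remote data.
%% The local node must be distributed for node monitoring to be available.
start(MergeFun) ->
    spawn(fun() ->
        ok = net_kernel:monitor_nodes(true),
        loop(MergeFun)
    end).

loop(MergeFun) ->
    receive
        {nodeup, Node} ->
            %% Merge under a cluster-wide lock, one registry at a time.
            with_merge_lock(fun() -> MergeFun(Node) end),
            loop(MergeFun);
        {nodedown, _Node} ->
            %% Local entries of the lost node would be purged here.
            loop(MergeFun)
    end.

%% Runs Fun while holding a global lock and returns its result.
%% global:trans/2 retries until the lock has been acquired.
with_merge_lock(Fun) ->
    global:trans({node_merge_sketch, self()}, Fun).
```

Because the lock is global, only one registry process at a time runs its merge code, which is what lets every node pull the data it needs in turn.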
There still might be corner cases that I haven’t considered in terms of consistency. If some arise, I will do my best to tackle them in future improvements of Syn.