We also mentioned that the channel module, in which callback functions such as
handle_out are defined, is an actual OTP GenServer. An instance of the channel module is started by the OTP framework when its
start_link function is called.
It possible for the developer to store arbitrary state in the channle instance by using the
assign mechanism. Unfortunately, this state can be lost when the channel instance crashes.
In this post, we are going to see a way to conserve transient assign state between channel instance crashes, by transient we mean that the state should be retained between crashes but not after a regular shutdown.
First let’s get a sense of how the
assign mechanism works.
storing state in a channel instance
One of the characteristics of GenServers is that they are able to maintain state, consider the following listing:
You can see that all the functions in the previous GenServer accepts a
state argument, the state is passed to each defined callback function and must be included in the term it returns.
It is possible to create a mutated copy of the state inside the callback functions(
handle_call) and to return it instead.
When using phoenix channels, there is always a
socket argument passed to all of its callback functions(
socket argument contains state (usually a
Phoenix.Socket struct) needed by the phoenix framework to manage the execution of the channel instance so usually we don’t mess around with its content.
Phoenix.Socket struct contains the
assigns field which is a map that contain arbitrary erlang terms, consider the following listing:
assign function we can get a new
Phoenix.Socket copy with a key-value pair inserted in the
assigns field(in the
join function) which can be later accessed directly in order to read the stored state(in the
channel instances can crash
When developing in erlang/elixir, we tend to follow the let it crash principle. This means that we do not code defensively and we expect our OTP behavior to be restarted via a Supervisor when crashes happen.
When the state maintained by an OTP behavior is important it needs to be recovered when the behavior instance restarts.
Functions defined in channel modules can cause channel instance crashes and if the state stored under
socket.assigns should persist crashes then we need a mechanism to restore it when the channel instance gets restarted.
There is one thing to note about channel instances: they are not supervised. In fact, when a channel instance crashes it is the responsibility of the channel client to reestablish the two way communication causing a new channel instance to be created.
In OTP, there is no automatic way to recover GenServers state even when the instance is supervised. It is the responsibility of the developer to handle state recovery.
conserving state with join, terminate and ets tables
One way to conserve channel instance state between crashes is to backup the important state somewhere when the channel crashes and to fetch it and pass it to the new channel instance when it is created.
The previous assumes that we have a callback that is executed when the channel instance crashes and a global storage space. Fortunately, in OTP we have both.
the terminate callback
OTP GenServers modules can define the
terminate/2 callback that is called when an instance is about to stop and that takes two arguments:
- reason which is usually a tuple representing the error that happened
- state which is the actual instance state when the crash happened
terminate callback allows us to basically store the important state when the
reason is an error, or to perform some other processing when the
reason is a normal shutdown(when the client disconnects explicitly for example).
Be advised that the
terminate callback is not guaranteed to execute in some circumstances, for example when the instance is stopped with a brutal kill. It is usually discouraged to handle resource de-allocation inside the
terminate callback for the previous reason. I encourage you to read the
terminate’s function description in the elixir documentation.
In our case, we are handling transient state conservation i.e. state that should persist unexpected crashes but that is discarded when a normal shutdown happens so we assume it is OK to just use the
Ets tables can be thought of as a key-value store local to an erlang VM, consider the following snippet:
Processes can create and delete tables in which they can store and retrieve key-value pairs and which can be accessed by multiple processes.
- Access to ets tables can be restricted
- Keys and values can be any erlang term
- We can choose the underlying data structure that a table uses
Consider the following channel module:
join function verifies whether a state as been backed up in the
:holder table, if it is the case the channel is initialized with the backed up state otherwise, it is initialized with the default state(
). This allows us to recover previous state.
:holder table is created under
ChanState.start i.e. when the OTP application starts.
terminate function when called in a crash situation, will backup the state in the
:holder table. It will cleanup the relevant state from the
:ets table in the event of a normal shutdown. This allows us to backup the important state.
In the chan_state, you will found a test demonstrating the state conservation under
test/channels/stateful_channel_test.exs, check it out!
There are other methods to conserve state that exists. We can for example create a companion process for each channel instance in which it backs up and recovers its state.
It is also possible to return the state of a crashed GenServer to a Supervisor and to build proper recovery in the Supervisor it self.
In our current project, we decided to stick with the ets tables solution because it is in our opinion the simplest thing that could possibly work.