Extending Ceras with a custom formatter

Usually Ceras supports any type automatically through dynamically generated formatters, like EnumFormatter<> for enums or DynamicObjectFormatter<> for any user-provided classes and structs.

But there are scenarios where you may want to create a more specialized formatter.
Optimization could be one reason, or maybe you just want more explicit control over types that you cannot change (and so cannot annotate with Ignore, Include, or MemberConfig attributes).

Let’s start with a really simple example:
By default Ceras uses 7-bit encoding, which often drastically reduces the output size.
But what if you always want to use a standard fixed 4-byte encoding instead?
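To get a feel for the size difference, here is a minimal sketch of both encodings. It is not Ceras’ code: EncodingSketch and its methods are hypothetical names made up for this example, and the zig-zag and varint logic follows the commonly used scheme, which may differ in detail from what SerializerBinary does internally.

```csharp
// Hypothetical helper, not part of Ceras: lets us compare the two encodings by size.
static class EncodingSketch
{
	// Zig-zag maps small negative numbers to small unsigned ones: 0→0, -1→1, 1→2, -2→3, ...
	public static uint ZigZag(int value) => (uint)((value << 1) ^ (value >> 31));

	// Varint: 7 payload bits per byte, the high bit is set while more bytes follow.
	// Returns the number of bytes written. Assumes the buffer is large enough.
	public static int WriteVarInt(byte[] buffer, int offset, uint value)
	{
		int start = offset;
		while (value >= 0x80)
		{
			buffer[offset++] = (byte)(value | 0x80);
			value >>= 7;
		}
		buffer[offset++] = (byte)value;
		return offset - start;
	}

	// Fixed: always 4 bytes, written in little-endian order.
	public static int WriteFixed(byte[] buffer, int offset, int value)
	{
		buffer[offset + 0] = (byte)value;
		buffer[offset + 1] = (byte)(value >> 8);
		buffer[offset + 2] = (byte)(value >> 16);
		buffer[offset + 3] = (byte)(value >> 24);
		return 4;
	}
}
```

With this sketch a small value like 5 takes a single byte as a varint, while the fixed variant always takes four. The flip side is that values close to int.MinValue or int.MaxValue need five bytes as a varint.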

Reasons why you might want to do that

This example is somewhat contrived for the sake of simplicity.

Using the normal “dumb” 4-byte encoding (and thereby skipping zig-zag encoding and varint encoding) would give you a small to moderate speed-up, assuming your data mostly consists of ints.

And if it’s really performance we’re concerned about, then maybe we should also ensure the data stays 4-byte aligned… but then how would we do that if we’re not dealing with an array? It’s certainly possible to do, but we’d need to go much deeper…

But that’d definitely bring us out of the “simple example” zone 😛

Our own formatter

Whenever Ceras wants to read or write any object it uses an IFormatter<T> where T is the type that is getting written/read.
Ceras has hand-written formatters for all commonly used .NET types.
For user types Ceras generates new code at runtime (using the DynamicObjectFormatter). Same goes for enums, lists, dictionaries, collections in general, all generic types, …

So formatters are the basic building blocks: every type has its own specialized formatter.
The only things “below” a formatter are a few pre-defined methods in SerializerBinary, where the “primitives” are handled:
All numeric primitives like byte, short, int, … and string.
While implementing your own formatters you probably want to use those methods very often as they are optimized pretty well and also handle things like size-checks for you (checking if the target array has enough space, resizing it if not, …).

Of course you don’t have to use them. You can just directly mess around with the given byte-array as well if that’s your style 😛

So in order to implement our own formatter we just create a new class that implements this interface.
IFormatter<T> has only two methods:

Serialize

This method is called when an existing value/object has to be serialized.
The parameters are pretty straightforward:

  • buffer is the target byte-array where we should write our data to.
    Why is it passed by ref?
    That’s because the array might not be large enough for whatever we want to write into it.
    So it’s our responsibility to check if there is still enough space left.
    If there’s not enough space, we would allocate a new array and copy the existing content over.
    That means any formatter can swap out the array for a new one, so it has to be passed by reference.

  • offset is just the current “write cursor”, so it tells you where in the buffer to start writing.
    That means buffer[offset] will be the first unused byte.
    Once you have written your data, you have to adjust offset so the next formatter knows where to continue (and won’t overwrite what you just wrote).

  • value is the value that’s being written. This one should be obvious.
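If you do write into the buffer by hand instead of going through SerializerBinary, the three parameters come together roughly like this. This is only a sketch: ManualWriteSketch, EnsureCapacity and WriteRawInt32 are hypothetical names written for this example, not Ceras’ own helpers, and the growth strategy is just one plausible choice.

```csharp
using System;

// Hypothetical stand-ins for illustration; Ceras ships its own optimized helpers.
static class ManualWriteSketch
{
	// Makes sure at least 'needed' bytes are free starting at 'offset'.
	// The buffer is passed by ref because we may swap it for a larger array.
	public static void EnsureCapacity(ref byte[] buffer, int offset, int needed)
	{
		if (buffer.Length - offset >= needed)
			return;

		int newSize = Math.Max(buffer.Length * 2, offset + needed);
		Array.Resize(ref buffer, newSize); // allocates a new array and copies the old content over
	}

	public static void WriteRawInt32(ref byte[] buffer, ref int offset, int value)
	{
		EnsureCapacity(ref buffer, offset, 4);

		// buffer[offset] is the first unused byte
		buffer[offset + 0] = (byte)value;
		buffer[offset + 1] = (byte)(value >> 8);
		buffer[offset + 2] = (byte)(value >> 16);
		buffer[offset + 3] = (byte)(value >> 24);

		// Move the write cursor so the next formatter continues after our data.
		offset += 4;
	}
}
```

Note that the caller’s array variable may point to a different array after the call, which is exactly why every formatter receives the buffer by reference.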

Deserialize

The deserialize method is mostly the same as Serialize, except that now offset is used as the read-cursor and value is passed by ref!
So why is value passed by ref?
There are actually multiple reasons:

  • It’s much faster when T is a large value-type
  • If the user has given us an already existing object to populate, we’ll receive the already existing value!
    That’s extremely useful for collections, and reference types in general. For example for lists we could just clear and add the elements to the existing list which improves performance quite a lot.
  • Reference handling. This point is pretty involved so I won’t go into it here.
    Essentially it enables Ceras to handle references (and all the many problems that come with them) for you.
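To illustrate the second point, here is a sketch of a list formatter that reuses the instance it is handed. It assumes the IFormatter<T> shape shown in this post; the interface is redeclared locally only so the snippet compiles on its own, and ReusingIntListFormatter is a hypothetical name. A real formatter would use SerializerBinary for the reads and writes instead of BitConverter.

```csharp
using System;
using System.Collections.Generic;

// Local stand-in with the same shape as the interface described in this post.
interface IFormatter<T>
{
	void Serialize(ref byte[] buffer, ref int offset, T value);
	void Deserialize(byte[] buffer, ref int offset, ref T value);
}

// Hypothetical example formatter; assumes the buffer is already large enough.
class ReusingIntListFormatter : IFormatter<List<int>>
{
	public void Serialize(ref byte[] buffer, ref int offset, List<int> value)
	{
		WriteInt(buffer, ref offset, value.Count);
		foreach (int item in value)
			WriteInt(buffer, ref offset, item);
	}

	public void Deserialize(byte[] buffer, ref int offset, ref List<int> value)
	{
		int count = ReadInt(buffer, ref offset);

		if (value == null)
			value = new List<int>(count); // nothing to reuse, so create a list
		else
			value.Clear();                // reuse the list the caller gave us

		for (int i = 0; i < count; i++)
			value.Add(ReadInt(buffer, ref offset));
	}

	static void WriteInt(byte[] buffer, ref int offset, int v)
	{
		BitConverter.GetBytes(v).CopyTo(buffer, offset);
		offset += 4;
	}

	static int ReadInt(byte[] buffer, ref int offset)
	{
		int v = BitConverter.ToInt32(buffer, offset);
		offset += 4;
		return v;
	}
}
```

Because value is passed by ref, the formatter can either fill the existing list or assign a brand-new one, and the caller sees the result either way.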

The int formatter

This is what the formatter would look like:

class MyIntFormatter : IFormatter<int>
{
	public void Serialize(ref byte[] buffer, ref int offset, int value)
	{
		SerializerBinary.WriteInt32Fixed(ref buffer, ref offset, value);
	}

	public void Deserialize(byte[] buffer, ref int offset, ref int value)
	{
		value = SerializerBinary.ReadInt32Fixed(buffer, ref offset);
	}
}

Normally Ceras uses its built-in Int32Formatter which calls SerializerBinary.WriteInt32 (which does the zigzag and varint encoding).
And in our custom version we swapped the WriteInt32 for WriteInt32Fixed which represents the “classic” encoding that just writes the 4 bytes of the int directly into the buffer.

Ok, this example was pretty boring, but it should have given you a pretty clear picture of how to write a formatter.

One last thing before we move on: How would we actually use that formatter?

Making Ceras use our formatter

That’s simple: we just add a handler to OnResolveFormatter.
Ceras will go through all the handlers in there whenever it looks for the formatter of some type.
(The first argument is the CerasSerializer, the second argument is the type Ceras wants to read/write.)

SerializerConfig config = new SerializerConfig();
config.OnResolveFormatter.Add((s, t) =>
{
	if (t == typeof(int))
		return new MyIntFormatter();
	return null; // null means "keep searching"
});

No need to cache the formatter instance. Ceras will never try to resolve a formatter for the same type twice; caching is done automatically.
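The lookup behaviour described here (handlers are asked in order, null means “keep searching”, and each type is only resolved once) can be pictured with a tiny stand-alone model. To be clear, this is not how Ceras is implemented; ResolverSketch is a hypothetical class that only mimics the observable behaviour.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical mini-resolver that mimics the behaviour described in this post.
class ResolverSketch
{
	public readonly List<Func<Type, object>> Handlers = new List<Func<Type, object>>();
	readonly Dictionary<Type, object> _cache = new Dictionary<Type, object>();

	public object GetFormatter(Type type)
	{
		// Each type is resolved only once; later lookups are served from the cache.
		if (_cache.TryGetValue(type, out object cached))
			return cached;

		foreach (var handler in Handlers)
		{
			object formatter = handler(type);
			if (formatter != null) // null means "keep searching"
			{
				_cache[type] = formatter;
				return formatter;
			}
		}

		// Ceras would fall back to its built-in formatters here; the sketch just gives up.
		throw new InvalidOperationException("No formatter found for " + type);
	}
}
```

In this model a handler that returns a new formatter instance is only ever invoked once per type, which is why caching the instance yourself gains nothing.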

With that done we can move on to more interesting stuff.

A custom formatter for a user-type

Let’s take the classic Person this time.
Of course Ceras would usually just auto-generate a dynamic formatter (which in some cases might even be faster than a handwritten one!) for this type, but for the sake of this guide we’ll try implementing an IFormatter<Person> ourselves!

This is what we’re dealing with:

class Person
{
	public string Name;
	public int Health;
	public Person BestFriend;
}

This time we’ve got a problem. If we wanted to write a formatter for our Person class, we’d eventually arrive at the 3rd field (BestFriend)…

In Serialize, when confronted with how to serialize value.BestFriend you might be tempted to just recursively call yourself (Serialize) again.
Which would be fine in many cases, but what if someone is their own best friend? 😀
Or more realistically: What if two people are each other’s best friends? (This situation is known as a “reference loop”.)
In that case we’d end up in an infinite recursion, and eventually our program would crash with a StackOverflowException.

So how would we do this?
It’s actually surprisingly simple!

Ceras can actually do all the work for us completely automatically.
Let’s take a look at what that Person formatter would look like.

Automatic Dependencies (Ceras’ dependency injection)

class MyCustomPersonFormatter : IFormatter<Person>
{
	public IFormatter<Person> PersonFormatter;

	public void Serialize(ref byte[] buffer, ref int offset, Person value)
	{
		SerializerBinary.WriteString(ref buffer, ref offset, value.Name);
		SerializerBinary.WriteInt32(ref buffer, ref offset, value.Health);
		PersonFormatter.Serialize(ref buffer, ref offset, value.BestFriend);
	}

	public void Deserialize(byte[] buffer, ref int offset, ref Person value)
	{
		// Ceras normally creates the Person instance before calling us, so value is not null here.
		value.Name = SerializerBinary.ReadString(buffer, ref offset);
		value.Health = SerializerBinary.ReadInt32(buffer, ref offset);
		PersonFormatter.Deserialize(buffer, ref offset, ref value.BestFriend);
	}
}

The first thing that you’ll probably notice is the IFormatter<Person> PersonFormatter field, which also strangely doesn’t seem to get assigned from anywhere.

Just by simply having a public field of a type that Ceras knows about you can make use of Ceras’ “dependency injection system”.
Ceras checks your formatters and tries to assign the right objects into them.

The following field types will be automatically injected:

  • Any IFormatter<> (this is what we’re using with IFormatter<Person>).
  • CerasSerializer, in case you ever need a reference to the serializer itself.
  • Or any explicit type that implements IFormatter<>, so for example MyCustomPersonFormatter.

So why are we using IFormatter<Person> instead of MyCustomPersonFormatter then?
In general you should use the IFormatter<> variant whenever possible.

That is because if you request an IFormatter<Person> Ceras will actually give you something other than your own MyCustomPersonFormatter!
It will actually put a ReferenceFormatter<Person> into that field!

This ReferenceFormatter<> is used for reference types (duh), and it deals with the problem we talked about earlier (reference loops).
Otherwise, if we had just defined the field as public MyCustomPersonFormatter PersonFormatter; we’d be back to square one, because when we get to .BestFriend we’d essentially just call our Serialize method recursively (which doesn’t work, as we’ve already established).

Explaining how the ReferenceFormatter<> works and what magic it does to resolve this problem is a huge topic in itself, but if you’re interested you can either read the code or just drop me a message on Discord!
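Just to give a rough idea of the general technique (remembering which objects were already written and emitting a back-reference instead of recursing again), here is a heavily simplified sketch. It is not Ceras’ ReferenceFormatter<>, which is far more involved; ReferenceTrackingSketch is a hypothetical class, and it writes readable strings instead of bytes to keep things short.

```csharp
using System;
using System.Collections.Generic;

// Same shape as the Person class from this post.
class Person
{
	public string Name;
	public int Health;
	public Person BestFriend;
}

// Hypothetical sketch of reference tracking; the real ReferenceFormatter<> does much more.
class ReferenceTrackingSketch
{
	// Person does not override Equals, so the dictionary compares by reference.
	readonly Dictionary<Person, int> _seen = new Dictionary<Person, int>();
	public readonly List<string> Output = new List<string>();

	public void Write(Person p)
	{
		if (p == null)
		{
			Output.Add("null");
			return;
		}

		if (_seen.TryGetValue(p, out int id))
		{
			// Already written: emit a back-reference instead of recursing again.
			Output.Add("ref#" + id);
			return;
		}

		// Register the object *before* writing its fields, so loops are caught.
		id = _seen.Count;
		_seen.Add(p, id);
		Output.Add("new#" + id + ":" + p.Name);
		Write(p.BestFriend);
	}
}
```

If two people are each other’s best friends, writing the first one walks to the second and then hits the back-reference for the first, so the recursion ends instead of overflowing the stack.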

And with that our MyCustomPersonFormatter is done. The built-in DynamicObjectFormatter<> (which Ceras would have used by default) would have generated a similar implementation.

Update

Since this blogpost was written a new feature was added:

You can now call CerasSerializer.AddFormatterConstructedType(...).
Usually Ceras creates instances of objects for you, but sometimes you don’t want that! (ImmutableCollections, Tuples, …)
In that case you’d just call the method above and Ceras will not construct an object before calling your formatter (so you get null / default(T)).
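What such a formatter might look like for an immutable type is sketched below. Everything in it is an assumption made for illustration: Position and PositionFormatter are hypothetical, the IFormatter<T> interface is redeclared locally so the snippet compiles on its own, and the registration call itself is left out. The point is only that Deserialize assigns a new object to value instead of filling in an existing one.

```csharp
using System;

// Local stand-in with the same shape as the IFormatter<T> used in this post.
interface IFormatter<T>
{
	void Serialize(ref byte[] buffer, ref int offset, T value);
	void Deserialize(byte[] buffer, ref int offset, ref T value);
}

// Hypothetical immutable type: its fields can only be set through the constructor.
sealed class Position
{
	public readonly int X;
	public readonly int Y;
	public Position(int x, int y) { X = x; Y = y; }
}

// Hypothetical formatter; assumes the buffer is already large enough.
class PositionFormatter : IFormatter<Position>
{
	public void Serialize(ref byte[] buffer, ref int offset, Position value)
	{
		BitConverter.GetBytes(value.X).CopyTo(buffer, offset);
		BitConverter.GetBytes(value.Y).CopyTo(buffer, offset + 4);
		offset += 8;
	}

	public void Deserialize(byte[] buffer, ref int offset, ref Position value)
	{
		int x = BitConverter.ToInt32(buffer, offset);
		int y = BitConverter.ToInt32(buffer, offset + 4);
		offset += 8;

		// value arrives as null here, so the formatter builds the object itself.
		value = new Position(x, y);
	}
}
```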
