Easy network code with Ceras – [Part 2]

Continuing from Part 1 we’ll explore how the demo project works, and what things there are that could be optimized in a real project.

> Link to the demo project

So, we concluded that serialization is the main problem in trying to build a comfortable networking system.
Ceras helps us a lot by allowing us to easily send whole objects over TCP or UDP.

How does it work?

I’ll just talk about the parts relevant to networking.
There are many more useful features though that will get their own guide / tutorial.

Packet length as prefix

In the demo we’re using TCP, so we need to know where each packet begins and ends.
The most straightforward way is just writing the length of the packet, and then the packet itself.

You can see how this works in Ceras\Extensions\Extensions.cs in the demo project.

First we serialize the given object, then we use Ceras’ efficient integer encoding to write the length using only as many bytes as needed (instead of always writing the full 4 bytes for an Int32). Finally we can just write the serialized object’s bytes as they are.

On the receiving end we keep reading individual bytes until we can reconstruct the length.
Then we just read the remaining block and deserialize it.
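
To make the framing a bit more concrete, here is a minimal sketch of the same idea using only System.IO. It is not the code from Extensions.cs: the names LengthPrefixFraming, WriteFrame and ReadFrame are made up for illustration, and the 7-bit variable-length encoding is just an assumption about what an "efficient integer encoding" can look like. Ceras’ exact format may differ.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of Ceras or the demo project.
static class LengthPrefixFraming
{
    // Writes the payload length as a 7-bit variable-length integer,
    // followed by the payload bytes themselves.
    public static void WriteFrame(Stream stream, byte[] payload)
    {
        uint length = (uint)payload.Length;
        while (length >= 0x80)
        {
            // Low 7 bits plus a "more bytes follow" flag in the top bit.
            stream.WriteByte((byte)(length | 0x80));
            length >>= 7;
        }
        stream.WriteByte((byte)length);
        stream.Write(payload, 0, payload.Length);
    }

    // Reads single bytes until the length is complete, then reads
    // exactly that many payload bytes.
    public static byte[] ReadFrame(Stream stream)
    {
        uint length = 0;
        int shift = 0;
        while (true)
        {
            int b = stream.ReadByte();
            if (b < 0) throw new EndOfStreamException();
            length |= (uint)(b & 0x7F) << shift;
            if ((b & 0x80) == 0) break;      // top bit clear: last length byte
            shift += 7;
            if (shift > 28) throw new InvalidDataException("Length prefix too long");
        }

        var payload = new byte[length];
        int read = 0;
        while (read < payload.Length)
        {
            // A network stream may return fewer bytes than requested, so loop.
            int n = stream.Read(payload, read, payload.Length - read);
            if (n == 0) throw new EndOfStreamException();
            read += n;
        }
        return payload;
    }
}
```

With this scheme a 5-byte payload only costs 1 extra byte for the prefix, and a 300-byte payload costs 2. On the receiving side the returned byte array is what you would hand to the deserializer.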

Type Encoding

Since both sides (client and server) can just send whatever object they want at any time, the receiver always needs to know what kind of object it is getting.
That means the type has to be included in the packet somehow.
In the example we use Ceras in “fully automatic” mode, which is safe and always works, but we could use a more efficient option!

How does this automatic mode work?

The root-object (the object you’re calling .Serialize() on) is always forced to be object by the WriteToStream and ReadFromStream methods.
That’s done so Ceras will always write the type of the object that’s getting serialized.

Otherwise Ceras would do an automatic optimization: If the generic type matches the given object type exactly, no type is emitted.

What does that mean?

It means that Ceras gives you different results for these two calls:

// First example
int x = 5;
var bytes = ceras.Serialize(x);

// Second example
object x = 5;
var bytes = ceras.Serialize(x);

First example: we’re serializing an int, and the type of the variable is int as well, so no type is emitted.

Second example: this time we’re serializing an object, which happens to contain an int. If we deserialized the result again, Ceras would need to know how to read the encoded data, so the type gets embedded.

The “automatic” part here is that Ceras will never embed any type information unless it is absolutely needed.
So when is the type “absolutely needed”? Whenever the object-type doesn’t exactly match the variable-type. So whenever there’s any polymorphism or boxing going on.

An additional example:
class Example
{
	public int Number;
	public string Text;
}

When serializing an object of the type above, Number and Text will never have their type embedded.
Just like in the first example, ceras.Serialize(new Example()) would not embed any type at all, since it is assumed that the deserialize call would look like ceras.Deserialize<Example>(bytes).
In contrast, ceras.Serialize<object>(new Example()) would force Ceras to embed the root-type (Example), and after that no more types are needed (since by knowing Example we also know that an int and a string will follow, so their types are implicit / obvious).
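
The rule can be boiled down to a single comparison. The following is only a toy model of the decision described above, not Ceras’ actual code; TypeEmbeddingRule and NeedsTypeInfo are hypothetical names.

```csharp
using System;

class Example
{
    public int Number;
    public string Text;
}

// Toy model of the rule, not how Ceras implements it internally.
static class TypeEmbeddingRule
{
    // Type information is only needed when the runtime type of the value
    // differs from the declared (generic) type T.
    public static bool NeedsTypeInfo<T>(T value)
    {
        return value != null && value.GetType() != typeof(T);
    }
}

// TypeEmbeddingRule.NeedsTypeInfo(5)                      -> false (int declared, int given)
// TypeEmbeddingRule.NeedsTypeInfo<object>(5)              -> true  (boxing)
// TypeEmbeddingRule.NeedsTypeInfo(new Example())          -> false (exact match)
// TypeEmbeddingRule.NeedsTypeInfo<object>(new Example())  -> true  (root forced to object)
```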

Optimization: Always omit type information

Always sending the type sucks, is there no better way?
Yes, there are two ways we can do better!

  1. Keeping the type cache
    So when we’re serializing a List<Person> then Ceras will embed the type for Person, but only once, since after that the type is known. Further Person instances in the serialization will automatically use the already existing type definition, which saves a ton of space.
    Now the big idea is to keep this type-cache even after the Serialize call is finished.
    That would make it so Ceras essentially “learns” types and the next serialization would keep reusing the previously learned types.
    So only on the first occurrence of a new type Ceras would emit the type information and from then on we’d never send the lengthy type-name over the network!

    Keep in mind that this can only work when client and server each have two serializers (one for sending, one for receiving).
    Why?
    Well firstly because you want to receive and send at the same time (so you need two serializer instances anyway),
    and secondly to optimize even further: Imagine the client sending something like ClientLogin and the server sending ServerLoginResponse. We wouldn’t want the serializers in the Server->Client direction to “learn” the ClientLogin type and reserve space for it, that’d be wasteful.

  2. “KnownTypes”
    Keeping the type cache is a good idea when you’re in a situation where you don’t even know what you’ll send beforehand (or you’re just busy experimenting and just want things to work fast and not invest time in in-depth optimization yet).
    We get a huge advantage of never sending any type more than once, but we’re still sending it once!
    We can do better.
    You can add your types to the config.KnownTypes collection, making it as if you had already serialized the type previously.
    So we’ve reached maximum optimization: All types are encoded as a single byte, there will never be a single type-name written by Ceras!
    (Unless you forget to add some type, in which case it will be sent once)
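
Both ideas can be illustrated with a small toy model. To be clear, this is not how Ceras implements its type cache, and none of these class names exist in Ceras: TypeCacheWriter and TypeCacheReader are made up, the marker bytes are arbitrary, and the ID is written as a plain 4-byte int to keep the sketch short. It only shows the principle: the full type name goes over the wire once, and pre-registered types never send a name at all.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Toy model only. One writer/reader pair per direction, kept alive between messages.
class TypeCacheWriter
{
    readonly Dictionary<Type, int> _ids = new Dictionary<Type, int>();

    // Pre-registered types play the role of "KnownTypes" in this sketch.
    public TypeCacheWriter(params Type[] knownTypes)
    {
        foreach (var t in knownTypes) _ids[t] = _ids.Count;
    }

    public void WriteType(BinaryWriter writer, Type type)
    {
        if (_ids.TryGetValue(type, out int id))
        {
            writer.Write((byte)1);   // marker: already known, only the ID follows
            writer.Write(id);
        }
        else
        {
            _ids[type] = _ids.Count; // "learn" the type for all later messages
            writer.Write((byte)0);   // marker: new type, full name follows
            writer.Write(type.AssemblyQualifiedName);
        }
    }
}

class TypeCacheReader
{
    readonly List<Type> _types = new List<Type>();

    // Must be given the same known types, in the same order, as the writer.
    public TypeCacheReader(params Type[] knownTypes)
    {
        _types.AddRange(knownTypes);
    }

    public Type ReadType(BinaryReader reader)
    {
        if (reader.ReadByte() == 1)
            return _types[reader.ReadInt32()];

        var type = Type.GetType(reader.ReadString(), throwOnError: true);
        _types.Add(type);            // learned in the same order as on the writer side
        return type;
    }
}
```

The important detail is that the writer on one side and the reader on the other side have to stay in sync, which is one more reason to keep a dedicated pair per direction instead of sharing one instance for everything.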

Recycle serialization buffers

This one is pretty easy.
In WriteToStream we’re already recycling the two buffers we use: _lengthPrefixBuffer and _streamBuffer.
The demo project has client and server in one process (so they’re sharing the static variable), that’s why it is not easily possible to share buffers in ReadFromStream as well. In a real-world project that wouldn’t be any issue though.

This point is just a reminder to recycle read and write buffers whenever possible, it’s super easy to do and makes a big difference.
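
As a rough illustration of what "recycling" means here, the sketch below keeps one byte array around and only grows it when a message doesn’t fit. It is not the demo’s WriteToStream code: ReusableBuffer is a hypothetical class, and UTF-8 encoding of a string stands in for the actual serialization step.

```csharp
using System;
using System.Text;

// Hypothetical helper; one instance per connection and direction, not thread-safe.
class ReusableBuffer
{
    byte[] _buffer = new byte[16];

    // The current backing array; only the first N bytes of the last Write are valid.
    public byte[] Buffer => _buffer;

    // "Serializes" the text into the reused buffer and returns the number of bytes written.
    public int Write(string text)
    {
        int needed = Encoding.UTF8.GetByteCount(text);
        if (_buffer.Length < needed)
        {
            // Grow at least geometrically so repeated large messages don't reallocate every time.
            Array.Resize(ref _buffer, Math.Max(needed, _buffer.Length * 2));
        }
        return Encoding.UTF8.GetBytes(text, 0, text.Length, _buffer, 0);
    }
}
```

After the buffer has grown to the size of the largest message seen so far, sending causes no further allocations. Because the instance is not thread-safe, sharing a single one between client and server in the same process runs into exactly the kind of problem mentioned above.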

Next steps

Splitting

The first of those “next steps” would be to investigate a new problem that is very rarely even talked about: object splitting.

So normally when you give an object to a serializer it “drags in” everything.
It follows all the references, collecting all the objects into one huge “object graph”.
Or in other words the “sub-objects” or “references” also get serialized.

And usually that’s exactly what you want! (Actually it’s pretty rare that you’d want anything else.)

But sometimes it would be super useful to serialize objects by some ID instead!
Just like how database objects reference other objects through a “foreign key”.

As far as I know Ceras is the only serializer in existence that provides a way to do that.

For example in a game it would be awesome if we could send objects that reference other objects that already exist on the other side!
Or putting different object types into their own files.

Other Improvements

Of course the example is simple (otherwise it would suck as an example, right? :P), so there’s tons of other problems.
Or rather, tons of things that could be improved.
Over time I’ll eventually go over each of those in detail, but if you are interested then I’d suggest taking a look at the “code tutorial” in the Ceras repo here: https://github.com/rikimaru0345/Ceras/blob/master/samples/LiveTesting/Tutorial.cs
It will show you most of the things that can be done…

There are arguments to be made for specific performance improvements (depending on what exactly it is you’re actually doing), reliability, multi-threading concerns, and so on…

But all of that is something for another guide / blog post…

If you have any questions, suggestions, feedback or whatever, I’d love to answer them.
Just send me a message (preferably on discord, link at the top of the Ceras main github page), or open an issue on the Ceras github page.
