Unbalanced trees and the usefulness of trivial

Late last night I was walking home through the empty city. It was dark and very cold.

I remember I was thinking about the redundancy. Then I thought about teaching binary trees and how to balance then. My imaginary student asked me laughingly if unbalanced trees are of any use.

I answered him – mentally of course – that we can suppose we are trying to express a sequence of things where some things appear more often than the others as shortly as possible. Text for example.

You can think we’d need eight bits for every character, and when the text is n characters long, it will take n * 8 bits. However, if the text consists mostly of a relatively small set of characters (‘a’, ‘t’, whitespace, …) while some others are rare, it would be smart to give these common characters a shorter code than the others. While we don’t want to use bits to denote the end of a character or a beginning of the other, we need a way to tell which string of bits is a character and which isn’t.

This is simple: make it so that no character is a prefix of another. However, it might be hard to see readily how to attain this, but if we think it as a binary tree where each character is a leaf node and bits as instructions to go through the tree from root on, like “0 stands for turn left and 1 for turn right”.

Now the most common character might be ’00’ (left left), the second ’01’ (left right), the third ‘101’, (right left right) where each code leads to an unambiguous leaf node and no code is a prefix of another.

The less evenly the characters are distributed, the less balanced the tree should be. In some extreme cases it might be even degenerated into a list (0, 10, 110, 1110, etc.). Of course the tree hasn’t to be uniform. It the frequqncy distribution reminds of, say, power law, we can give the most common atoms very short codes while the long tail might be a pretty much balanced subtree on a right.

Of course this all is pretty much trivial to everyone who knows anything about data compression and probably too abstract to most who don’t, so it isn’t something you’d tell to your mates in pub. However, it was very delightful to come up with (while walking slightly bored in a cold weather) and I actually think the most important things to think about (both psychologically and pragmatically) are things that aren’t probably worth telling to anyone.

There are more important things than impressing your audience.

There are no comments on this post.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: