A relatively simple method for compressing data works by creating a so-called Huffman tree for a file and using it to compress and decompress the data it contains. For most applications, binary Huffman trees are used (i.e., each node is either a leaf or has exactly two sub-nodes). One can, however, construct Huffman trees with an arbitrary number of sub-trees (i.e, ternary or, in general, N-ary trees).
A Huffman tree for a file containing Z different characters has Z leaves. The path from the root to a leaf that represents a certain character determines its encoding; each step towards the leaf determines an encoding character (which can be 0, 1,...,(N - 1)). By placing often-occurring characters closer to the root, and less often occurring characters further away from the root, the desirable compression is achieved. Strictly speaking, such a tree is a Huffman tree only if the resulting encoding takes the minimal number of N-ary symbols to encode the complete file.
For this problem, we only consider trees where each node is either an internal node, or a leaf encoding a character; there are no dangling leaves that do not encode for a character.
Answered by Anju Singh
at
7:53 PM on October 31, 2008