
Variable-Size Codes
Consider the four-symbol a1, a2, a3, and a4. If they appear in our
data strings with equal probabilities (= 0.25), then the entropy of
the data is −4(0.25 log2 0.25) = 2. Two is the smallest number of bits
needed, on the average, to represent each symbol in this case. We can
simply assign our symbols the four 2-bit codes 00, 01, 10, and 11.
Since the probabilities are equal, the redundancy is zero and the data
cannot be compressed below 2 bits/symbol. Next, consider the case
where the four symbols occur with different probabilities as shown in
Table 1, where a1 appears in the data (on average) about half the
time, a2 and a3 have equal probabilities, and a4 is rare. In this
case, the data has entropy −(0.49 log2 0.49 + 0.25 log2 0.25 + 0.25 log2 0.25 + 0.01 log2 0.01) ≈ −(−0.505 − 0.5 − 0.5 − 0.066) ≈ 1.57. The smallest number of
bits needed, on average, to represent each symbol has dropped to 1.57.
Symbol  Prob.  Code1  Code2
  a1     .49     1      1
  a2     .25     01     01
  a3     .25    010    000
  a4     .01    001    001

Table 1: Variable-Size Codes.
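The two entropy figures are easy to verify; this small Python sketch (illustrative, not part of the original text) computes both:

```python
import math

# Entropy of four equally likely symbols (p = 0.25 each).
h_equal = -sum(p * math.log2(p) for p in [0.25, 0.25, 0.25, 0.25])

# Entropy of the skewed distribution of Table 1.
probs = [0.49, 0.25, 0.25, 0.01]
h_skewed = -sum(p * math.log2(p) for p in probs)

print(round(h_equal, 2))   # 2.0
print(round(h_skewed, 2))  # 1.57
```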
If we again assign our symbols the four 2-bit codes 00, 01, 10, and
11, the redundancy would be R = −1.57 + log2 4 = 0.43. This suggests
assigning variable-size codes to the symbols. Code1 of Table 1 is
designed such that the most common symbol, a1, is assigned the
designed such that the most common symbol, a1, is assigned the
shortest code. When long data strings are transmitted using Code1, the
average size (the number of bits per symbol) is 1 * 0.49 + 2 * 0.25 +
3 * 0.25 + 3 * 0.01 = 1.77, which is very close to the minimum. The
redundancy in this case is R = 1.77 -1.57 = 0.2 bits per symbol. An
interesting example is the 20-symbol string a1a3a2a1a3a3a4a2a1a1a2a2a1a1a3a1a1a2a3a1,
where the four symbols occur with (approximately) the right
frequencies. Encoding this string with Code1 yields the 37 bits:
1|010|01|1|010|010|001|01|1|1|01|01|1|1|010|1|1|01|010|1
(the vertical bars are shown only to separate codewords and are not part of the code). Using 37 bits to encode 20 symbols yields
an average size of 1.85 bits/symbol, not far from the calculated
average size. (The reader should bear in mind that our examples are
short. To get results close to the best that’s theoretically possible,
an input stream with at least thousands of symbols is needed.)
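The 37-bit figure can be reproduced directly; a small sketch (the variable names are mine) that encodes the 20-symbol string with Code1:

```python
# Code1 from Table 1 (symbol -> codeword).
code1 = {"a1": "1", "a2": "01", "a3": "010", "a4": "001"}

# The 20-symbol example string.
symbols = ["a1", "a3", "a2", "a1", "a3", "a3", "a4", "a2", "a1", "a1",
           "a2", "a2", "a1", "a1", "a3", "a1", "a1", "a2", "a3", "a1"]

encoded = "".join(code1[s] for s in symbols)
print(len(encoded))                 # 37
print(len(encoded) / len(symbols))  # 1.85
```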
However, when we try to decode the binary string above, it becomes
obvious that Code1 is bad. The first bit is 1, and since only a1 is
assigned this code, it (a1) must be the first symbol. The next bit is
0, but the codes of a2, a3, and a4 all start with a 0, so the decoder
has to read the next bit. It is 1, but the codes of both a2 and a3
start with 01. The decoder does not know whether to decode the string
as 1|010|01 . . ., which is a1a3a2 . . ., or as 1|01|001 . . ., which
is a1a2a4 . . .. Code1 is thus ambiguous. In contrast, Code2, which
has the same average size as Code1, can be decoded unambiguously.
The property of Code2 that makes it so much better than Code1 is
called the prefix property. This property requires that once a certain
bit pattern has been assigned as the code of a symbol, no other codes
should start with that pattern (the pattern cannot be the prefix of
any other code). Once the string “1” was assigned as the code of a1,
no other codes could start with 1 (i.e., they all had to start with
0). Once “01” was assigned as the code of a2, no other codes could
start with 01. This is why the codes of a3 and a4 had to start with
00. Naturally, they became 000 and 001.
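Because of the prefix property, a greedy left-to-right scan decodes any Code2 bit string without ambiguity. A minimal sketch (the function name is mine, not the author's):

```python
# Code2 from Table 1; no codeword is a prefix of another.
code2 = {"a1": "1", "a2": "01", "a3": "000", "a4": "001"}
decode_map = {v: k for k, v in code2.items()}

def decode(bits):
    """Greedy decoder: grow a buffer until it matches a codeword."""
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in decode_map:            # the prefix property guarantees
            out.append(decode_map[buf])  # a match can never be extended
            buf = ""
    return out

# a1 a3 a2 encoded with Code2 is 1|000|01; decoding is unambiguous.
print(decode("100001"))  # ['a1', 'a3', 'a2']
```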
Designing variable-size codes thus involves following two principles: (1)
assign short codes to the more frequent symbols, and (2) obey the
prefix property. Following these principles produces short,
unambiguous codes, but not necessarily the best (i.e., shortest) ones.
In addition to these principles, an algorithm is needed that always
produces a set of shortest codes (ones with minimum average size). The
only input to such an algorithm is the frequencies (or the
probabilities) of the symbols of the alphabet.
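The classic algorithm with this property is Huffman's. A minimal sketch using Python's heapq, computing only the codeword lengths (the tie-breaking counter is an implementation detail added here to keep heap entries comparable; it is not part of the source text):

```python
import heapq

def code_lengths(probs):
    """Huffman sketch: return optimal codeword lengths for given probabilities."""
    # Heap entries: (probability, tie-breaker, {symbol: depth so far}).
    heap = [(p, i, {s: 0}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    tick = len(heap)
    while len(heap) > 1:
        p1, _, d1 = heapq.heappop(heap)  # two least-probable subtrees
        p2, _, d2 = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**d1, **d2}.items()}
        heapq.heappush(heap, (p1 + p2, tick, merged))
        tick += 1
    return heap[0][2]

probs = {"a1": 0.49, "a2": 0.25, "a3": 0.25, "a4": 0.01}
lengths = code_lengths(probs)
avg = sum(probs[s] * l for s, l in lengths.items())
print(round(avg, 2))  # 1.77, the same average size as Code1 and Code2
```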
Prefix Codes
A prefix code is a variable-size code that satisfies the prefix property.
The binary representation of the integers does not satisfy the prefix
property. Another disadvantage of this representation is that the size
n of the set of integers has to be known in advance, since it
determines the code size, which is ⌈log2 n⌉. In some applications, a
prefix code is required to code a set of integers whose size is not
known in advance. Several such codes are presented here.
Four such prefix codes are described in this section. We use B(n) to
denote the binary representation of integer n. Thus |B(n)| is the
length, in bits, of this representation. We also use B-(n) to denote B(n)
without its most significant bit (which is always 1).
Code C1 is made of two parts. To code the positive integer n, we first
generate the unary code of |B(n)| (the size of the binary
representation of n), then append B−(n) to it. An example is n = 16 =
10000₂. The size of B(16) is 5, so we start with the unary code 11110
(or 00001) and append B−(16) = 0000. The complete code is thus 11110|0000
(or 00001|0000). Another example is n = 5 = 101₂, whose code is 110|01.
The length of C1(n) is 2⌊log2 n⌋ + 1 bits.
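A sketch of a C1 encoder (the function name is mine; Python's bin(n)[2:] plays the role of B(n)):

```python
def c1(n):
    """Code C1 sketch: unary code of |B(n)|, then B(n) without its leading 1."""
    b = bin(n)[2:]                    # B(n)
    unary = "1" * (len(b) - 1) + "0"  # unary code of |B(n)|
    return unary + b[1:]              # append B-(n)

print(c1(16))  # 111100000  (11110|0000)
print(c1(5))   # 11001      (110|01)
```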
Code C2 is a rearrangement of C1 where each of the ⌊log2 n⌋ 1-bits of
the first part (the unary code) of C1 is followed by one of the bits
of the second part, and the terminating 0 of the unary code ends the
codeword. Thus, code C2(16) = 101010100 and C2(5) = 10110.
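The interleaving is a one-liner in Python; a sketch (names are mine):

```python
def c2(n):
    """Code C2 sketch: each unary 1 of C1 is followed by one bit of B-(n)."""
    data = bin(n)[3:]                             # B-(n): B(n) minus its leading 1
    interleaved = "".join("1" + bit for bit in data)
    return interleaved + "0"                      # terminating 0 of the unary part

print(c2(16))  # 101010100
print(c2(5))   # 10110
```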
Code C3 starts with |B(n)| coded in C2, followed by B−(n). Thus, 16 is
coded as C2(5) = 10110 followed by B−(16) = 0000, and 5 is coded as
C2(3) = 110 followed by B−(5) = 01. The size of C3(n) is
⌊log2 n⌋ + 2⌊log2(1 + ⌊log2 n⌋)⌋ + 1 bits.
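A C3 sketch (the C2 helper is repeated so the snippet stands alone; names are mine):

```python
def c2(n):
    """C2 helper: interleave unary 1s with the bits of B-(n), end with 0."""
    return "".join("1" + bit for bit in bin(n)[3:]) + "0"

def c3(n):
    """Code C3 sketch: C2 code of |B(n)|, followed by B-(n)."""
    b = bin(n)[2:]
    return c2(len(b)) + b[1:]

print(c3(16))  # 101100000  (C2(5) = 10110 followed by 0000)
print(c3(5))   # 11001      (C2(3) = 110 followed by 01)
```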
Code C4 consists of several parts. We start with B(n). To the left of
this we write the binary representation of |B(n)| − 1 (the length of B(n),
minus 1). This continues recursively, until a 2-bit number is written.
A zero is then added to the right of the entire number, to make it
decodable. To encode 16, we start with 10000, add |B(16)| − 1 = 4 =
100 to the left, then |B(4)| − 1 = 2 = 10
to the left of that and finally, a zero on the right. The result is 10|100|10000|0.
To encode 5, we start with 101, add |B(5)| −1 = 2 = 10 to the left,
and a zero on the right. The result is 10|101|0.
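The recursive prepending above can be sketched as a short loop (names are mine):

```python
def c4(n):
    """Code C4 sketch: recursively prepend length-minus-1 fields, append a 0."""
    parts = [bin(n)[2:]]               # start with B(n)
    length = len(parts[0]) - 1         # |B(n)| - 1
    while length >= 2:                 # stop once a 2-bit group has been written
        parts.insert(0, bin(length)[2:])
        length = len(parts[0]) - 1
    return "".join(parts) + "0"        # trailing 0 makes the code decodable

print(c4(16))  # 10100100000  (10|100|10000|0)
print(c4(5))   # 101010       (10|101|0)
```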
