Click here to advance to the next essay.

School of Computing  |  Napier  |  Cisco  |  CNDS  |  NOS  |  Code Snippets |  Old page


Most transmission channels have a restricted bandwidth, either because of the limitations of the channel or because the bandwidth is shared between many users. Many forms of data have redundancy in it, thus if the redundant information was extracted the transmitted data would make better use of the bandwidth. This extraction of the information is normally achieved with compression.

Our cat Cookie - see below, and you'll see what she looks like with only four colorsWhen compressing data it is important to take into account three important characteristics: whether it is possible to lose elements of the data; the type of data that is being compressed; and how it is interpreted by the user. These three factors are normally interrelated. For example type of data normally defines whether it is possible to lose elements of the data. For computer data, normally if a single bit is lost, then all of the data is corrupted, and cannot be recovered. In an image, it is normally possible to lose data, but it could still contain the required information. For example, when we look at a photograph of someone, do we really care that every single pixel displays the correct color. In most cases the eye is much more sensitive to changes in brightness, and less sensitive to changes in color. Thus in compressing an image, we could actually reduce the amount of data in an image, by actually losing some of the original data. Compression which loses some of the elements of the data is defined as lossy compression, where lossless compression defines compression which does not lose any of the original data. Video and sound images are normally compressed with a lossy compression whereas computer-type data has a lossless compression. The basic definitions are:

Lossless compression. Where the data, once uncompressed, will be identical to the original uncompressed data. This will obviously be the case with computer-type data, such as data files, computer programs, and so on, as any loss of data may cause the file to be corrupted.

Lossy compression. Where the data, once uncompressed, cannot be fully recovered. It normally involves analyzing the data and determining which data has little effect on the perceived information. For example, there is little difference, to the eye, between an image with 16.7 million colors (24-bit color information) and an image stored with 1024 colors (10-bit color information), but the storage will be reduced to 41.67% (many computer systems cannot even display 24-color information in certain resolutions). Compression of an image might also be achieved by reducing the resolution of the image. Again, the human eye might compensate for the loss of resolution (and the eye might never require the high-resolution, if the image is view from a distance).

For example the following shows four pictures taken with 16.7 million colors, 32 colors, 8 colors and 4 colors, respectively:

16.7 colors (JPEG format) - 3.62KB32 colors (GIF format) - 2.42KB 8 colors (GIF format) - 1.10KB4 colors (GIF format) - 564B

Even when the colors have been reduced, you can still see that the image is of a cat. The sizes of the files produced are 3.62KB, 2.42KB, 1.10KB and 562B. It can be seen that reducing the number of colors in the image has a considerable effect on the file size (and the bandwidth used, of course, if the image is being sent over a communications channel). The reason that the file sizes reduce is that the file are compressed using an algorithm which detects long sequences of the same color, and replaces it with a special code. Thus the fewer the colors, the more likely these sequences will occur, thus the smaller the file size will be.

It can only be seen that once the number of colors have been reduced, it will not be possible to recover the original image with all its colors.

Apart from lossy and lossless compression, an important parameter is the encoding of the data. This is normally classified into two main areas:

Entropy coding

Entropy coding does not take into account any of the characteristics of the data and treats all the bits in the same way. As it does not know which parts of the data can be lost, it produces lossless coding. As an example, imagine that you had a system which received results from sports matches, around the world. With this there would be no way of knowing the scores that would be expected, and the maximum size of the name that is stored for the result. For example a UK soccer match might have a result of Mulchester 3 Madchester 2, and the results of an American football match might be Smellmore Wanderers 60 Drinksome Wanderers 23. Thus, all the information would have to be stored as characters, where each character is stored in an ASCII format (such as with 8 bits for each character), with some way to delimit the end of the score. We could compress this, though, even not knowing the content of the result. For example there are repeated sequences in the scores: chester and Wanderers. These we could store each of these once, and then just make reference to it in some way. This technique of finding repeated sequences is a typical one in compression, and is used in ZIP compression (which is a general-purpose compression technique).

Typically entropy coding uses:

Statistical encoding - where the coding analyses the statistical pattern of the data. For example if a source of text contains many more 'e' characters than 'z' characters then the character 'e' could be coded with very few bits and the character 'z' with many bits.

Suppressing repetitive sequences - many sources of information contain large amounts of repetitive data. For example this page contains large amounts of 'white space'. If the image of this page were to be stored, a special character sequence could represent long runs of 'white space'.

Source encoding.
Source encoding normally takes into account characteristics of the information. For example images normally contain many repetitive sequences, such as common pixel colors in neighboring pixels. This can be encoded as a special coding sequence. In video pictures, also, there are very few changes between one frame and the next. Thus typically the data encoded only stores the changes from one frame to the next. In our example of sports matches we could identify the type of sport, and then compress the data based on this. For example in a UK soccer match we could have a table of all the names of the sports clubs that we were storing the results for. Thus, for example, we may have 256 professional clubs, which would require 8 bits to store a reference value for each of these (number 0 to 256). So that 0 could be Mudchester, 1 could be Malchester, 2 could be Readyever United, and so on. We can also compress the scores, as we know that the goals scored for a team is very unlikely to be more than 31, thus we could use 5 bits to encode the score (0 to 31). Thus each score could be sent as an 8-bit reference for the home team, followed by a 5-bit value for their score, followed by an 8-bit reference for the away team, and finally by a 5-bit value for their score. Thus we only need 26 bits to store each of the scores, which is a large saving.

In fact we could compress the data even more, as we know that the most probable goals scored will be 0, 1 or 2. Thus we could store zero goals as 00 (in binary), one goal as 01 (binary), two goals as 10 (in binary), three goals as 110 (in binary), four goals as 1110 (in binary), and so on. Thus, as most scores will only have between zero and two goals scored for each team, the average number of bit will be just over 2 bits. For example if the scores were: 0, 1, 0, 2, 3, 2, 2, 1 and 1 (which would be stored as 00, 01, 00, 10, 110, 10, 10, 01 and 01), this would require 19 bits, which is an average of 2.11 bits. This is a reduction from 5 bits for each score. This technique is known as Huffman coding.







My favorite Edinburgh sites (these are real sites, and not WWW sites, of course) are:
Edinburgh Castle, especially looking up from Princes Street (or from upstairs in McDonalds - there are many McDonalds in the world where you can see such a beautiful backdrop).
[Live] [Again]
Charlton Hill and look down over Princes Street.
Arthur's Seat, from anywhere, especially coming in Edinburgh from the South.
Craiglockhart Campus, where the School of Computing are currently based, but will be moving soon (Boo, hoo!).
Queen's Street, which is beautiful on a hot Summer's day.
Tynecastle Park, as it's such an exciting place on match day, especially walking back up Gorgie Road after the match.
North Berwick Beach, which is quite a long way out of Edinburgh, but it's such a beautiful place, especially when the sun shines (and the haar doesn't come in from the East Coast).
Gullane Beach, which is heaven-on-earth (when it's warm and the sun is shining).
Forth Rail Bridge, which is always a great one to see when you come back into Edinburgh from the North.
Waveley Station, this was the first place that I ever saw in Edinburgh, and the place still gives me tingles up the back of neck.
Other notable mentions:
Nicolson Street, which is such a vibrant place, and where I've lived a few times.
The Meadows, which, as a student, I tramped across every day, and was amazing with its changing faces.
The Royal Mile, which looks great, no matter which season. I especially like it in the Wintertime. [Link]
Princes Street, which recently has had to compete with lots of out-of-town shopping, but hopefully it will survive as one of the most beautiful shopping streets in the world. At New Year it has the most amazing atmosphere, as it's full of people from every part of the world, having fun! [Link]



School of Computing  |  Napier  |  Cisco  |  CNDS  |  NOS  |  Code Snippets |  Old page

Page maintained by bill