Security Now 205
Topic: Lempel-Ziv Data Compression Algorithm
Recorded: July 1, 2009
Published: July 16, 2009
Security Now 205: Lempel-Ziv
An examination of Lempel-Ziv data compression, one of the most prevalent computer algorithm inventions in history.
News & Errata
02:35 - 03:10
- No news or errata this week; the show was recorded in advance while Leo was in China
- Next week will be a mega security news update, though
05:55 - 08:36 Jacob Janzen (Unknown)
- Four years ago PartitionMagic failed while working on his drive and he couldn't access his data. His brother told him to buy SpinRite to fix the drive, and it did. The Canadian Armed Forces also used SpinRite to save a hard disk drive in the field.
Lempel-Ziv Data Compression Algorithm
12:24 - 44:07
12:24 - 16:20
- MP3 and JPEG are lossy compression formats: when you decompress, you do not get an exact copy of the original, but it looks or sounds the same
- With lossless compression you get an exact copy when you decompress
- One test of the quality of pseudo-random data is how compressible it is, because you can't compress truly random data
16:21 - 19:04
- Run-length compression is the simplest type of compression; it is what fax machines used in the early days
- If a line of black went all the way across the page, instead of sending individual bytes of black code, the machine would send the data once and then say how many times it needed to be repeated
- This is only useful in certain situations
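The run-length idea above can be sketched in a few lines of Python; this is an illustrative toy, not the actual fax encoding (the 1728-pixel scan line is just an example width):

```python
def rle_encode(data: bytes) -> list[tuple[int, int]]:
    """Encode bytes as (value, run_length) pairs."""
    runs = []
    i = 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i]:
            j += 1  # extend the current run of identical bytes
        runs.append((data[i], j - i))
        i = j
    return runs

def rle_decode(runs: list[tuple[int, int]]) -> bytes:
    """Expand (value, run_length) pairs back to the original bytes."""
    return bytes(value for value, count in runs for _ in range(count))

# A hypothetical all-black scan line compresses to a single pair.
line = bytes([0x00] * 1728)
assert rle_encode(line) == [(0x00, 1728)]
assert rle_decode(rle_encode(line)) == line
```

As the notes say, this only wins when the data actually contains long runs; on varied data it can even expand the input, since every byte becomes a pair.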
19:05 - 25:40
- The next fundamental technology was invented by David Huffman
- He realised that one way to represent data in a smaller space is to represent the things that occur most often with a shorter code and the things that occur less often with a longer code
- In the 1840s Samuel Morse used the same idea: he had a telegraph key, a wire heading out West somewhere, and a little clanker at the other end (an electromagnet with an armature), and all he could do was press the key and go clankety-clank-clank at the other end
- So he invented a way of using this to send messages, and he needed it to be efficient
- So he represented the letters which occur most often with shorter patterns; for example, an E is a single dot
- What Huffman came up with was a way to take a block of text, count the occurrences of each of the different characters, and build the provably most efficient encoding, such that the characters that occur most often have the shortest representations
- When simple Huffman coding is applied, it essentially re-encodes the fixed-length eight bits per character as variable-length tokens
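The counting-and-merging step described above can be sketched with Python's heap module; this is a minimal illustration of the idea (it returns the bit strings only, not a packed bitstream):

```python
import heapq
from collections import Counter

def huffman_codes(text: str) -> dict[str, str]:
    """Build Huffman codes: frequent characters get shorter bit strings."""
    freq = Counter(text)
    # Heap entries: (frequency, tiebreaker, {char: code_so_far}).
    heap = [(n, i, {ch: ""}) for i, (ch, n) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        # Repeatedly merge the two least frequent subtrees.
        n1, _, c1 = heapq.heappop(heap)
        n2, _, c2 = heapq.heappop(heap)
        merged = {ch: "0" + code for ch, code in c1.items()}
        merged.update({ch: "1" + code for ch, code in c2.items()})
        heapq.heappush(heap, (n1 + n2, tie, merged))
        tie += 1
    return heap[0][2]

codes = huffman_codes("aaaabbc")
# 'a' occurs most often, so its code is never longer than 'b' or 'c'.
assert len(codes["a"]) <= len(codes["b"]) <= len(codes["c"])
```

Because the codes come from a binary tree, no code is a prefix of another, which is what lets the decoder split the variable-length tokens back apart without separators.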
25:41 - 30:38
- The next advancement involved the use of dictionaries
- If both ends have the same dictionary, then instead of sending the words in a message you can just send pointers to locations in the dictionary
- The problem with this is that both ends need the same dictionary, so you would have to send the dictionary as well as all the pointers
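The pointer idea is simple enough to show directly; here is a toy sketch assuming a tiny pre-agreed word list (the word list itself is hypothetical):

```python
# A shared dictionary both ends agree on in advance (illustrative only).
DICTIONARY = ["the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog"]
INDEX = {word: i for i, word in enumerate(DICTIONARY)}

def encode(message: str) -> list[int]:
    """Replace each word with its dictionary index (a small pointer)."""
    return [INDEX[w] for w in message.split()]

def decode(pointers: list[int]) -> str:
    """Look each pointer back up in the shared dictionary."""
    return " ".join(DICTIONARY[p] for p in pointers)

assert encode("the lazy dog") == [0, 6, 7]
assert decode(encode("the lazy dog")) == "the lazy dog"
```

The catch the notes describe is visible here: the scheme only works because both sides already hold `DICTIONARY`, and shipping that dictionary alongside the pointers can cost more than it saves.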
30:39 - 39:00
- The next advancement was what Lempel & Ziv came up with
- They devised a way of never having to send the dictionary
- Both the sender and receiver start with an empty dictionary
- The sender checks whether the next word is in the dictionary; if it is not, they send the word literally and add it to their dictionary
- The receiver checks whether the word is in their dictionary and, if it is not, adds it
- If the word is already in the dictionary, the sender sends a pointer to it instead of the word
- The dictionary is a fixed size, and older words get knocked out as time passes
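The steps above can be sketched at the word level; this toy version keeps the dictionary unbounded for simplicity, whereas the scheme described in the show caps its size and evicts old entries:

```python
def lz_encode(words: list[str]) -> list[tuple[str, object]]:
    """Adaptive-dictionary sketch: both sides start empty; a word's first
    occurrence is sent literally, later occurrences as pointers."""
    dictionary: dict[str, int] = {}
    out = []
    for w in words:
        if w in dictionary:
            out.append(("ptr", dictionary[w]))   # seen before: send a pointer
        else:
            out.append(("lit", w))               # new word: send it literally
            dictionary[w] = len(dictionary)      # and remember it
    return out

def lz_decode(tokens: list[tuple[str, object]]) -> list[str]:
    dictionary: list[str] = []
    out = []
    for kind, val in tokens:
        if kind == "ptr":
            out.append(dictionary[val])
        else:
            out.append(val)
            dictionary.append(val)  # receiver rebuilds the same dictionary
    return out

text = "to be or not to be".split()
assert lz_decode(lz_encode(text)) == text
```

The key property is that the receiver's dictionary is built from the very words it receives, in the same order the sender built its own, so the dictionary never has to be transmitted.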
39:01 - 44:07
- A refinement of this technique was invented by Terry Welch in 1984
- He preloaded the dictionary with a set of symbols, so you only ever have to send pointers
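Welch's refinement is usually called LZW; a minimal sketch, with the dictionary preloaded with all 256 single-byte values so the output is codes (pointers) only, might look like this:

```python
def lzw_encode(data: bytes) -> list[int]:
    """LZW sketch: the table starts with every single byte, so no literal
    ever needs to be sent, only codes."""
    table = {bytes([i]): i for i in range(256)}
    out = []
    seq = b""
    for b in data:
        candidate = seq + bytes([b])
        if candidate in table:
            seq = candidate              # keep extending the known sequence
        else:
            out.append(table[seq])       # emit code for the longest match
            table[candidate] = len(table)  # grow the table with the new sequence
            seq = bytes([b])
    if seq:
        out.append(table[seq])
    return out

def lzw_decode(codes: list[int]) -> bytes:
    table = {i: bytes([i]) for i in range(256)}
    prev = table[codes[0]]
    out = bytearray(prev)
    for code in codes[1:]:
        # A code can reference an entry created just one step ago.
        entry = table.get(code, prev + prev[:1])
        out += entry
        table[len(table)] = prev + entry[:1]  # mirror the encoder's table growth
        prev = entry
    return bytes(out)

sample = b"TOBEORNOTTOBEORTOBEORNOT"
assert lzw_decode(lzw_encode(sample)) == sample
```

On this 24-byte sample the encoder emits only 16 codes; in a real implementation each code is wider than 8 bits, but the repeated phrases still come out far smaller than sending them again.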
- Ad Times: 0:35-0:49 and 9:32-12:23
- Recorded Date: July 1, 2009
- Release Date: July 16, 2009
- Duration: 45:50
- Log line:
- Edited by: Tony
|This area is for use by TWiT staff only. Please do not add or edit any content within this section.|