On Compressing the English Language

Someone picked my brain the other day looking for a technique to compress language files.

After walking away to think about it, my method was to reassign the character codes so that letters are ordered by their frequency, followed by the most common words ordered by their frequency.

Lowercase e stored as an ASCII value uses 1 byte:
ASCII e = 0x65 = 0b1100101 = 7 bits
vs
APK e = 0x1 = 0b1 = 1 bit

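… this method stores an e in 1 bit. This is similar to Huffman coding, with the addition of whole words being included in the code. One difference: these codes are not prefix-free the way Huffman codes are, so a decoder would need some extra way to tell where one code ends and the next begins.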
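
As a rough sketch of the idea (not a finished encoder), the code for a symbol can be taken as the plain binary form of its position in the frequency-ordered table. The names `TABLE`, `apk_code`, `apk_bits` and `possible_decodings` are made up for this illustration, and `TABLE` is only the first eight entries of the table. The last function lists every way a bit string can be split into valid codes, which shows why code boundaries matter.

```python
# Sketch only: TABLE is a short excerpt of the frequency-ordered table,
# and all names here are hypothetical.
TABLE = ["space", "e", "t", "a", "o", "i", "n", "s"]

def apk_code(rank):
    """Binary string for a table position, with no leading zeros."""
    return format(rank, "b")

def apk_bits(rank):
    """Number of bits in apk_code(rank); position 0 still takes one bit."""
    return len(apk_code(rank))

def possible_decodings(bits):
    """Return every way to split `bits` into valid codes from TABLE."""
    if not bits:
        return [[]]
    results = []
    for rank, symbol in enumerate(TABLE):
        code = apk_code(rank)
        if bits.startswith(code):
            for rest in possible_decodings(bits[len(code):]):
                results.append([symbol] + rest)
    return results

print(apk_code(1), apk_bits(1))      # e: code "1", 1 bit
print(apk_code(120), apk_bits(120))  # table position 120: code "1111000", 7 bits
print(possible_decodings("11"))      # [['e', 'e'], ['a']]
```

The bit string 11 can be read as "ee" or as "a", so a real encoder would need length markers, separators, or a prefix-free code assignment (which is what Huffman coding produces).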

For example:

“because” is the 94th most used word in the English language and, in this method, is stored in 7 bits.

I don’t know if this has been done before… but I would imagine it could compress language files substantially.

I have also thought about a third addition: codes for the most commonly used two- or three-letter combinations.
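
One way to find candidates for those letter combinations is simply to count them in a body of text. The sketch below is an assumption about how that might be done, not part of the method itself; `top_ngrams` and the sample sentence are made up for illustration, and a real ranking would need a much larger corpus.

```python
from collections import Counter

def top_ngrams(text, n, k):
    """Return the k most common n-letter runs found inside words of `text`."""
    counts = Counter()
    for word in text.lower().split():
        # Keep letters only, so punctuation does not end up in a combination.
        letters = "".join(ch for ch in word if ch.isalpha())
        for i in range(len(letters) - n + 1):
            counts[letters[i:i + n]] += 1
    return counts.most_common(k)

sample = "the other brother gathered them there"  # hypothetical sample text
print(top_ngrams(sample, 2, 3))  # most common two-letter combinations
print(top_ngrams(sample, 3, 3))  # most common three-letter combinations
```

The highest-ranked combinations could then be given table positions after the letters and words, in the same way the common words are.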

APK ORDER   LETTER/WORD   LETTER FREQ or WORD RANK   APK BIN   APK HEX   APK BITS USED
0 space 0 0 1
1 e 12.70% 1 1 1
2 t 9.06% 10 2 2
3 a 8.17% 11 3 2
4 o 7.51% 100 4 3
5 i 6.97% 101 5 3
6 n 6.75% 110 6 3
7 s 6.33% 111 7 3
8 h 6.09% 1000 8 4
9 r 5.99% 1001 9 4
10 d 4.25% 1010 A 4
11 l 4.03% 1011 B 4
12 c 2.78% 1100 C 4
13 u 2.76% 1101 D 4
14 m 2.41% 1110 E 4
15 w 2.36% 1111 F 4
16 f 2.23% 10000 10 5
17 g 2.02% 10001 11 5
18 y 1.97% 10010 12 5
19 p 1.93% 10011 13 5
20 b 1.49% 10100 14 5
21 v 0.98% 10101 15 5
22 k 0.77% 10110 16 5
23 j 0.15% 10111 17 5
24 x 0.15% 11000 18 5
25 q 0.10% 11001 19 5
26 z 0.07% 11010 1A 5
27 the 1 11011 1B 5
28 be 2 11100 1C 5
29 to 3 11101 1D 5
30 of 4 11110 1E 5
31 and 5 11111 1F 5
32 a 6 100000 20 6
33 in 7 100001 21 6
34 that 8 100010 22 6
35 have 9 100011 23 6
36 I 10 100100 24 6
37 it 11 100101 25 6
38 for 12 100110 26 6
39 not 13 100111 27 6
40 on 14 101000 28 6
41 with 15 101001 29 6
42 he 16 101010 2A 6
43 as 17 101011 2B 6
44 you 18 101100 2C 6
45 do 19 101101 2D 6
46 at 20 101110 2E 6
47 this 21 101111 2F 6
48 but 22 110000 30 6
49 his 23 110001 31 6
50 by 24 110010 32 6
51 from 25 110011 33 6
52 they 26 110100 34 6
53 we 27 110101 35 6
54 say 28 110110 36 6
55 her 29 110111 37 6
56 she 30 111000 38 6
57 or 31 111001 39 6
58 an 32 111010 3A 6
59 will 33 111011 3B 6
60 my 34 111100 3C 6
61 one 35 111101 3D 6
62 all 36 111110 3E 6
63 would 37 111111 3F 6
64 there 38 1000000 40 7
65 their 39 1000001 41 7
66 what 40 1000010 42 7
67 so 41 1000011 43 7
68 up 42 1000100 44 7
69 out 43 1000101 45 7
70 if 44 1000110 46 7
71 about 45 1000111 47 7
72 who 46 1001000 48 7
73 get 47 1001001 49 7
74 which 48 1001010 4A 7
75 go 49 1001011 4B 7
76 me 50 1001100 4C 7
77 when 51 1001101 4D 7
78 make 52 1001110 4E 7
79 can 53 1001111 4F 7
80 like 54 1010000 50 7
81 time 55 1010001 51 7
82 no 56 1010010 52 7
83 just 57 1010011 53 7
84 him 58 1010100 54 7
85 know 59 1010101 55 7
86 take 60 1010110 56 7
87 people 61 1010111 57 7
88 into 62 1011000 58 7
89 year 63 1011001 59 7
90 your 64 1011010 5A 7
91 good 65 1011011 5B 7
92 some 66 1011100 5C 7
93 could 67 1011101 5D 7
94 them 68 1011110 5E 7
95 see 69 1011111 5F 7
96 other 70 1100000 60 7
97 than 71 1100001 61 7
98 then 72 1100010 62 7
99 now 73 1100011 63 7
100 look 74 1100100 64 7
101 only 75 1100101 65 7
102 come 76 1100110 66 7
103 its 77 1100111 67 7
104 over 78 1101000 68 7
105 think 79 1101001 69 7
106 also 80 1101010 6A 7
107 back 81 1101011 6B 7
108 after 82 1101100 6C 7
109 use 83 1101101 6D 7
110 two 84 1101110 6E 7
111 how 85 1101111 6F 7
112 our 86 1110000 70 7
113 work 87 1110001 71 7
114 first 88 1110010 72 7
115 well 89 1110011 73 7
116 way 90 1110100 74 7
117 even 91 1110101 75 7
118 new 92 1110110 76 7
119 want 93 1110111 77 7
120 because 94 1111000 78 7
121 any 95 1111001 79 7
122 these 96 1111010 7A 7
123 give 97 1111011 7B 7
124 day 98 1111100 7C 7
125 most 99 1111101 7D 7
126 us 100 1111110 7E 7
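
To get a feel for the possible savings, here is a rough estimate of the average bits per letter using the letter frequencies from the table. It is only a sketch under some assumptions: `average_bits_per_letter` is a made-up helper, spaces and whole words are left out, and the cost of marking where each code ends is ignored, so a real file would not compress this well.

```python
# Letter frequencies in percent, copied from the table in APK order.
# A letter's position in this list plus 1 is its APK order (space is 0 and
# has no frequency listed, so it is left out of the estimate).
LETTER_FREQ = [
    ("e", 12.70), ("t", 9.06), ("a", 8.17), ("o", 7.51), ("i", 6.97),
    ("n", 6.75), ("s", 6.33), ("h", 6.09), ("r", 5.99), ("d", 4.25),
    ("l", 4.03), ("c", 2.78), ("u", 2.76), ("m", 2.41), ("w", 2.36),
    ("f", 2.23), ("g", 2.02), ("y", 1.97), ("p", 1.93), ("b", 1.49),
    ("v", 0.98), ("k", 0.77), ("j", 0.15), ("x", 0.15), ("q", 0.10),
    ("z", 0.07),
]

def average_bits_per_letter(freqs):
    """Frequency-weighted average of the bits used per letter."""
    total = sum(f for _, f in freqs)
    weighted = sum(
        f * rank.bit_length()  # bits needed to write `rank` in binary
        for rank, (_, f) in enumerate(freqs, start=1)
    )
    return weighted / total

avg = average_bits_per_letter(LETTER_FREQ)
print(f"average bits per letter: {avg:.2f} (ASCII stores each letter in 8)")
```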

Using a POD 2 switcher

I wanted to add some components to my home listening system, so a switcher was needed. I’ve decided to use a POD 2 switcher…
The POD-2 is a six-input stereo switcher that has the ability to sum all inputs. The output of the switcher feeds a stereo Penny and Giles fader, then goes directly to the Mackie HR824 speakers.
I’ve still got to find a panel… but as far as a home-friendly user interface goes, this is as simple as it gets.
1 Turntable
2 Radio
3 Mac Mini
4 Cassette
5
6 Airport Express out

A Social Experiment

I’ve been thinking about a social experiment.
I often take calls from people looking for services. I’ve met a lot of great people in this industry, and I stand by their great work and contributions. So instead of fielding calls and answering the questions, I’ve created a Wizard Services section at http://audioaholics.com/wizard_services

I am asking you, my friends, to use it to work, get work, share work, and do work together. I would like to consider it my Rolodex.

I’ve got permission to post a few contacts and will be doing so this week.

A project I’ve been working on has come… full circle ;-)

Congrats to Kevin Park and the Lacquer Channel team for getting the Neumann VMS70 back in the game. A fantastic project to have been a part of! We recapped all of the audio electronics for Kevin before he had lathe aficionado Chris Muth calibrate it. It’s a fine precision piece of audio electronics, and Kevin is one HELL of a lathe mechanic and cutting engineer! It, and he, are in good hands over at Toronto’s Lacquer Channel. I’m excited to cut a few plates myself!

Photo and story here: http://www.thegridto.com/culture/music/play-it-as-it-lathes/

Did you see it?
Many thanks to Dave, Jose, and Tyler for their help and enthusiasm about the project!