On Compressing the English Language

Someone picked my brain the other day looking for a technique to compress language files.

After walking away to think about it… my method was to re-order the codes: letters are numbered by their frequency, followed by the most common words, also numbered by their frequency.

Where a lowercase e is stored as an ASCII value using 1 byte:
ASCII e = 0x65 = 0b1100101 = 7 bits (8 bits once stored in a byte)
vs
APK e = 0x1 = 0b1 = 1 bit

… this method stores an e in 1 bit. This is similar to Huffman coding, with the addition of whole words being included in the code.

For example:

“because” is the 94th most used word in the English language, and in this method it is stored in 7 bits (code 120, 0b1111000) instead of the 56 bits (7 bytes) it takes in ASCII.
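The bit counts in these two examples can be checked with a few lines of Python. This is a sketch that assumes 8-bit ASCII storage (one byte per character) and takes the APK order numbers (1 for e, 120 for because) from the table below; the helper names are hypothetical.

```python
def ascii_bits(text: str) -> int:
    """Bits used when each character is stored as one 8-bit ASCII byte."""
    return 8 * len(text.encode("ascii"))

def apk_bits(order: int) -> int:
    """Bits in the plain binary form of an APK order number (0 still needs 1 bit)."""
    return max(1, order.bit_length())

print(hex(ord("e")), bin(ord("e")))          # 0x65 0b1100101
print(ascii_bits("e"), apk_bits(1))          # 8 1
print(ascii_bits("because"), apk_bits(120))  # 56 7
```

The `max(1, …)` is there because order 0 (space) has no significant bits but still has to be written as a single 0.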

I don’t know if this has been done before… but I would imagine it could compress language files substantially.
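One detail worth checking before counting the savings: Huffman codes are prefix-free (no code is the start of another), but the plain binary numbers in this table are not. The code for e (1) is the start of the codes for t (10) and a (11), so the bit string 10 could mean t, or e followed by a space (0). Without a separator or length field, an encoded stream can't be split back into symbols. The Python sketch below (function names hypothetical) tests that property and, for comparison, builds a standard Huffman code from the table's frequencies for e, t, a and o. Huffman coding is swapped in here purely for illustration; it is not the method described above.

```python
import heapq
from itertools import count

def huffman_code(freqs: dict[str, float]) -> dict[str, str]:
    """Build a prefix-free Huffman code: frequent symbols get short bit strings."""
    tiebreak = count()  # unique ints so the heap never has to compare dicts
    heap = [(f, next(tiebreak), {sym: ""}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {sym: "0" for sym in freqs}  # a lone symbol still needs one bit
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # the two least frequent subtrees
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]

def is_prefix_free(codes) -> bool:
    """After sorting, any prefix sits directly before a string it starts."""
    words = sorted(codes)
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))

# APK codes for e, t, a, o: the plain binary of their order numbers.
apk = {"e": "1", "t": "10", "a": "11", "o": "100"}
print(is_prefix_free(apk.values()))    # False

# Letter frequencies (percent) for the same four letters, from the table.
codes = huffman_code({"e": 12.70, "t": 9.06, "a": 8.17, "o": 7.51})
print(codes, is_prefix_free(codes.values()))
```

With only four letters of similar frequency, every Huffman code here comes out 2 bits long; the variable lengths appear once the rarer letters and the words are added.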

I have also thought about a third addition: codes for the most common two- and three-letter combinations.
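Finding candidates for that third addition is a counting job. A minimal Python sketch, assuming combinations are counted only inside words (lower-cased, letters a to z only); the sample sentence is a hypothetical stand-in for the large body of text a real table would be built from.

```python
import re
from collections import Counter

def top_ngrams(text: str, n: int, k: int = 5) -> list[tuple[str, int]]:
    """Return the k most common n-letter sequences, counted inside words only."""
    counts = Counter()
    for word in re.findall(r"[a-z]+", text.lower()):
        for i in range(len(word) - n + 1):
            counts[word[i:i + n]] += 1
    return counts.most_common(k)

sample = "the other three think that there is nothing in the theatre"
print(top_ngrams(sample, 2))  # two-letter combinations, "th" first
print(top_ngrams(sample, 3))  # three-letter combinations, "the" first
```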

APK ORDER  LETTER/WORD  LETTER FREQ  WORD RANK  APK BIN  APK HEX  APK BITS USED
0          space        -            -          0        0        1
1          e            12.70%       -          1        1        1
2          t            9.06%        -          10       2        2
3          a            8.17%        -          11       3        2
4          o            7.51%        -          100      4        3
5          i            6.97%        -          101      5        3
6          n            6.75%        -          110      6        3
7          s            6.33%        -          111      7        3
8          h            6.09%        -          1000     8        4
9          r            5.99%        -          1001     9        4
10         d            4.25%        -          1010     A        4
11         l            4.03%        -          1011     B        4
12         c            2.78%        -          1100     C        4
13         u            2.76%        -          1101     D        4
14         m            2.41%        -          1110     E        4
15         w            2.36%        -          1111     F        4
16         f            2.23%        -          10000    10       5
17         g            2.02%        -          10001    11       5
18         y            1.97%        -          10010    12       5
19         p            1.93%        -          10011    13       5
20         b            1.49%        -          10100    14       5
21         v            0.98%        -          10101    15       5
22         k            0.77%        -          10110    16       5
23         j            0.15%        -          10111    17       5
24         x            0.15%        -          11000    18       5
25         q            0.10%        -          11001    19       5
26         z            0.07%        -          11010    1A       5
27         the          -            1          11011    1B       5
28         be           -            2          11100    1C       5
29         to           -            3          11101    1D       5
30         of           -            4          11110    1E       5
31         and          -            5          11111    1F       5
32         a            -            6          100000   20       6
33         in           -            7          100001   21       6
34         that         -            8          100010   22       6
35         have         -            9          100011   23       6
36         I            -            10         100100   24       6
37         it           -            11         100101   25       6
38         for          -            12         100110   26       6
39         not          -            13         100111   27       6
40         on           -            14         101000   28       6
41         with         -            15         101001   29       6
42         he           -            16         101010   2A       6
43         as           -            17         101011   2B       6
44         you          -            18         101100   2C       6
45         do           -            19         101101   2D       6
46         at           -            20         101110   2E       6
47         this         -            21         101111   2F       6
48         but          -            22         110000   30       6
49         his          -            23         110001   31       6
50         by           -            24         110010   32       6
51         from         -            25         110011   33       6
52         they         -            26         110100   34       6
53         we           -            27         110101   35       6
54         say          -            28         110110   36       6
55         her          -            29         110111   37       6
56         she          -            30         111000   38       6
57         or           -            31         111001   39       6
58         an           -            32         111010   3A       6
59         will         -            33         111011   3B       6
60         my           -            34         111100   3C       6
61         one          -            35         111101   3D       6
62         all          -            36         111110   3E       6
63         would        -            37         111111   3F       6
64         there        -            38         1000000  40       7
65         their        -            39         1000001  41       7
66         what         -            40         1000010  42       7
67         so           -            41         1000011  43       7
68         up           -            42         1000100  44       7
69         out          -            43         1000101  45       7
70         if           -            44         1000110  46       7
71         about        -            45         1000111  47       7
72         who          -            46         1001000  48       7
73         get          -            47         1001001  49       7
74         which        -            48         1001010  4A       7
75         go           -            49         1001011  4B       7
76         me           -            50         1001100  4C       7
77         when         -            51         1001101  4D       7
78         make         -            52         1001110  4E       7
79         can          -            53         1001111  4F       7
80         like         -            54         1010000  50       7
81         time         -            55         1010001  51       7
82         no           -            56         1010010  52       7
83         just         -            57         1010011  53       7
84         him          -            58         1010100  54       7
85         know         -            59         1010101  55       7
86         take         -            60         1010110  56       7
87         people       -            61         1010111  57       7
88         into         -            62         1011000  58       7
89         year         -            63         1011001  59       7
90         your         -            64         1011010  5A       7
91         good         -            65         1011011  5B       7
92         some         -            66         1011100  5C       7
93         could        -            67         1011101  5D       7
94         them         -            68         1011110  5E       7
95         see          -            69         1011111  5F       7
96         other        -            70         1100000  60       7
97         than         -            71         1100001  61       7
98         then         -            72         1100010  62       7
99         now          -            73         1100011  63       7
100        look         -            74         1100100  64       7
101        only         -            75         1100101  65       7
102        come         -            76         1100110  66       7
103        its          -            77         1100111  67       7
104        over         -            78         1101000  68       7
105        think        -            79         1101001  69       7
106        also         -            80         1101010  6A       7
107        back         -            81         1101011  6B       7
108        after        -            82         1101100  6C       7
109        use          -            83         1101101  6D       7
110        two          -            84         1101110  6E       7
111        how          -            85         1101111  6F       7
112        our          -            86         1110000  70       7
113        work         -            87         1110001  71       7
114        first        -            88         1110010  72       7
115        well         -            89         1110011  73       7
116        way          -            90         1110100  74       7
117        even         -            91         1110101  75       7
118        new          -            92         1110110  76       7
119        want         -            93         1110111  77       7
120        because      -            94         1111000  78       7
121        any          -            95         1111001  79       7
122        these        -            96         1111010  7A       7
123        give         -            97         1111011  7B       7
124        day          -            98         1111100  7C       7
125        most         -            99         1111101  7D       7
126        us           -            100        1111110  7E       7
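The APK BIN, APK HEX and APK BITS USED columns follow mechanically from the APK order number. A short Python sketch (the helper name `apk_row` is hypothetical) that rebuilds them for the first few symbols, and compares the word code for "the" (order 27) with spelling it out letter by letter (t, h, e = orders 2, 8, 1):

```python
def apk_row(order: int) -> tuple[str, str, int]:
    """Return (APK BIN, APK HEX, APK BITS USED) for an APK order number."""
    binary = format(order, "b")       # plain binary, no leading zeros
    hexa = format(order, "X")         # uppercase hex, as in the table
    return binary, hexa, len(binary)  # bits used = length of the binary string

symbols = [" ", "e", "t", "a", "o", "i", "n", "s", "h"]  # orders 0 to 8
for order, sym in enumerate(symbols):
    print(order, repr(sym), *apk_row(order))

as_word = apk_row(27)[2]                             # "the" as one word code
as_letters = sum(apk_row(o)[2] for o in (2, 8, 1))   # t + h + e
print(as_word, as_letters)                           # 5 7
```

Whole-word codes pay off whenever the word's code is shorter than the sum of its letters' codes, as with "the" here.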

Using a POD 2 switcher

I wanted to add some components to my home listening system, so I needed a switcher. I’ve decided to use a POD-2 switcher…
The POD-2 is a six-input stereo switcher that can also sum all of its inputs. The output of the switcher feeds a stereo Penny and Giles fader and then goes directly to the Mackie HR824 speakers.
I’ve still got to find a panel… but as far as a home-friendly user interface goes, this is as simple as it gets. The inputs are:
1 Turntable
2 Radio
3 Mac Mini
4 Cassette
5
6 AirPort Express out

A project I’ve been working on has come… full circle ;-)

Congrats to Kevin Park and the Lacquer Channel team for getting the Neumann VMS70 back in the game. A fantastic project to have been a part of! We recapped all of the audio electronics for Kevin before he had lathe aficionado Chris Muth calibrate it. It’s a fine precision piece of audio electronics, and Kevin is one HELL of a lathe mechanic and cutting engineer! It, and he, are in good hands over at Toronto’s Lacquer Channel. I’m excited to cut a few plates myself!

Photo and story here: http://www.thegridto.com/culture/music/play-it-as-it-lathes/

Did you see it?

Many thanks to Dave, Jose, and Tyler for their help and enthusiasm about the project!

Transferring Records in Reverse

I was given a 78 to transfer. When I transfer something one-of-a-kind, I do three passes. The first is dry: dusted, but otherwise as the client delivered it.

The second is wet (in an ultrasonic bath if possible).

The third is in reverse. I find the pops and skips come through differently. For example, a scratch that is audible in one direction MAY not show up in the reversed transfer. Edit the passes together and whaddaya get… bitty bopity boo.
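The passes are edited together by hand here. Purely as an illustration of the idea, this is a hypothetical automated version in Python: flip the reversed pass back so it runs forward, then for each short block keep whichever pass has the smaller jump between neighbouring samples (a crude click measure). It assumes both passes are already sample-aligned and the same length, which real transfers only approximate because of speed drift and different start points; all names and the toy data are made up.

```python
def unreverse(samples: list[float]) -> list[float]:
    """Flip a pass that was played backwards so it runs forward again."""
    return samples[::-1]

def click_score(block: list[float]) -> float:
    """Crude click measure: the largest jump between neighbouring samples."""
    return max((abs(b - a) for a, b in zip(block, block[1:])), default=0.0)

def splice(forward: list[float], reverse_pass: list[float], block_size: int = 4) -> list[float]:
    """Build one take by keeping, block by block, whichever pass is smoother.

    ASSUMPTION: both passes are sample-aligned and equal in length.
    """
    backward = unreverse(reverse_pass)
    if len(backward) != len(forward):
        raise ValueError("passes must be aligned and the same length")
    out = []
    for i in range(0, len(forward), block_size):
        a, b = forward[i:i + block_size], backward[i:i + block_size]
        out.extend(a if click_score(a) <= click_score(b) else b)
    return out

# Toy data: a click at index 5 of the forward pass that the reversed pass missed.
clean = [0.0, 0.1, 0.2, 0.3, 0.4, 0.45, 0.5, 0.6]
forward = [0.0, 0.1, 0.2, 0.3, 0.4, 0.9, 0.5, 0.6]
reverse_pass = clean[::-1]  # what a turntable playing backwards would capture
print(splice(forward, reverse_pass) == clean)  # True
```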

  • Here is the same scratch transferred forward (top), and transferred in reverse then flipped back in Pro Tools (bottom).

  • I use a Stanton STR8-100 with S/PDIF out. It has a button that simply reverses the playing direction, and it is rock solid. It has the usual direct-drive faults, but it nonetheless sounds great.

Modern Recording

Shared from The Rukkus Room

This is a great illustration of where quality has gone (and a good drawing of the actual gear helps). We, the content creators and broadcasters, care about this long-gone thing called fidelity and dynamic range, but we’re at the mercy of Apple right now. Lossy compression is moot given even today’s data transfer rates. If you’re releasing a record, I must now suggest skipping CDs and going straight to hi-fi uncompressed downloads, or vinyl (or both). CDs are just something I rip and shelve; I’d rather have LPs on my shelf. I have limited choices when it comes to portable music, and it may as well be a 24-bit/96 kHz master. I’m not holding my breath for a 1-bit portable player and 100 Gb/µs fiber to my door.

PS: I don’t take credit for this guy’s work; I just haven’t figured out how to shut off the watermark on individual pics. My bad.
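The data-rate point in the post above is easy to put numbers on: uncompressed PCM costs bit depth × sample rate × channels bits per second. A small Python sketch, assuming stereo, that compares CD audio (16-bit, 44.1 kHz) with the 24-bit/96 kHz master the post mentions; the function name is hypothetical.

```python
def pcm_bitrate(bits: int, sample_rate: int, channels: int = 2) -> int:
    """Uncompressed PCM data rate in bits per second."""
    return bits * sample_rate * channels

cd = pcm_bitrate(16, 44_100)     # CD audio, stereo
hires = pcm_bitrate(24, 96_000)  # 24-bit / 96 kHz stereo master
print(cd, hires)                 # 1411200 4608000

# Storage for one minute of the 24/96 master, in megabytes (10**6 bytes).
print(hires * 60 / 8 / 1_000_000)
```

That works out to about 4.6 Mbit/s for the 24/96 master, roughly 3.3 times the CD rate, or about 34.6 MB per minute of stereo audio.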