
ON HEURISTIC METHODS FOR DECIPHERING THE ZODIAC 340-CIPHER

Ryan Garlick <garlick@unt.edu>, Visiting Assistant Professor, University of North Texas, Department of Computer Science and Engineering, Denton, TX 76205

and

Branden L. <bgl0008@unt.edu>, Jeremy Allman <jeremya78@yahoo.com>, Allen Arnold <jaa0141@unt.edu>, John Daniel Leon <johndleon@gmail.com>, Joey Liechty <joeyliechty@gmail.com>, Tanner Oakes <tanneroakes@gmail.com>, Corey Rosemurgy <cpr0011@unt.edu>, Rutvik Pandya <rutvikpandya@yahoo.com>, Arthur Williams <retekin@gmail.com>, Russell A. Yermal <yermal@unt.edu>

ABSTRACT
We present several heuristic methods for attempting to decipher the unsolved Zodiac Killer's 340-character cipher. Based on previous communication, the message is possibly a homophonic substitution cipher. Due to the large number of potential keys, heuristic methods are employed. These algorithms are presented, along with their motivation in relation to the characteristics of the cipher. Results, applications of distributed processing, and notes on future direction are also presented.

Index Terms (See section 12): Applied Cryptography, Substitution Cipher, Zodiac, Genetic Algorithm, Intelligent Brute Force Algorithm, Random Brute Force Algorithm, Character Frequency Analysis, Natural Language Processing, Distributed Processing

1 INTRODUCTION
The Zodiac was a notorious serial murderer from the greater San Francisco area during the 1960s and 70s. The Zodiac was confirmed to have seven victims with five left dead, and claimed to have killed dozens more. He is most well remembered for threatening to blow up school buses full of children, skillful manipulation of the media, and of most interest to us, for his taunting of the police and public with letters and ciphers.

On July 31, 1969, less than a month after the Zodiac's attack on Mike Mageau and his third confirmed murder of Darlene Ferrin, the Zodiac submitted his first cipher. It was sent to the Vallejo Times Herald, the San Francisco Chronicle, and the San Francisco Examiner in three equal parts with no obvious distinction as to which order they were to be read. Along with each part of the cipher were near-identical hand-written letters threatening to kill a dozen people in a weekend-long rampage if his cipher was not printed on the front page of the very next newspaper edition. Each portion of the cipher was eventually published, though not all initially on the front pages and not all on the requested date. The combined cipher was 17 characters across and 24 characters long, for a total of 408 characters.

Although code breakers from the FBI, NSA, CIA, and Naval Intelligence all tried to solve the cipher, it was Donald Harden, a forty-one-year-old high school history teacher, and his wife Betty Harden, who finally cracked it. In addition to some prior knowledge of basic code breaking techniques, Harden was able to solve the cipher by making a very insightful and very lucky assumption that the coded message began with "I like killing". Harden also made a number of other basic assumptions, namely that it was a homophonic cipher (one letter being potentially represented by many symbols) and that the message read from left to right and top to bottom like a book.

The Zodiac's second cipher was received on November 8th, 1969 by the San Francisco Chronicle. The cipher came less than one month after the Zodiac's fifth and final confirmed murder, of cab driver Paul Stine. Contained in the envelope with the cipher were a greeting card and a bloody piece of Paul Stine's shirt taken from the scene of his murder. This second cipher was again 17 characters across, but only 20 characters long, resulting in 340 characters total (68 characters fewer than the previous cipher). In addition, this new 340 cipher had 63 unique symbols, more than the previous one. The combination of more unique symbols and a shorter message gave this cipher a much higher degree of difficulty.

Although the Zodiac sent three more ciphers to the newspapers, the third and fourth are very short and therefore no conclusive analysis can yet be done on them. Also, Robert Graysmith, author of the books Zodiac and Zodiac Unmasked, claimed to have solved the 340 cipher. However, it is our belief that even though his solution is the best to date, it is neither a complete, a correct, nor a satisfactory solution. We have worked to solve this cipher once and for all.

Upon examination of the problem, one can quickly calculate the total number of possible keys to the cipher. As there are 63 unique symbols to account for the 26 letters of the English alphabet (we are assuming the decoded text is English), there are a total of $63^{26}$ combinations. This is roughly $6.07 \times 10^{46}$ possible keys.
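As a quick arithmetic check, the short Python sketch below reproduces the figure quoted in the text. It also prints, purely as an assumption that is not stated in the paper, the count obtained if a key is modeled as one letter chosen independently for each of the 63 symbols. The helper name `sci` is made up for this illustration.

```python
SYMBOLS = 63   # unique symbols in the 340-cipher (from the text)
LETTERS = 26   # letters of the English alphabet


def sci(n):
    """Format a large positive integer as 'm.mm x 10^e'."""
    exponent = len(str(n)) - 1
    mantissa = n / 10 ** exponent
    return f"{mantissa:.2f} x 10^{exponent}"


# Figure quoted in the text: 63 raised to the 26th power.
quoted = SYMBOLS ** LETTERS

# Assumption for comparison only: one of 26 letters per symbol.
per_symbol_assignments = LETTERS ** SYMBOLS

if __name__ == "__main__":
    print("63^26 =", sci(quoted))
    print("26^63 =", sci(per_symbol_assignments))
```

Either way, the space is far too large to enumerate, which is the motivation for the heuristic methods that follow.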

FIGURE 1 The Zodiac 340-Cipher

2 Previous Efforts
Ciphers of every variation span the gamut of human history and use. Mostly they were used for military and political reasons for the purpose of secrecy. In modern times, they have taken on the guise of a murderer trying to prove his superiority over law enforcement. The Zodiac Killer utilized the homophonic cipher variant to taunt police with his name and vignette, but has also left one best known cipher in his wake. To this day, no credible solution exists, but history has provided many tools to the cryptologist for breaking such codes. These are frequency analysis of symbols and letters, bi- and trigram recognition, and the identification of "cribs."

Frequency analysis dates back to al-Kindi, an Arabic scholar who used the technique to analyze religious texts to validate the works of Mohammad [a]. It relies on the fact that, for any given language, each letter appears with a characteristic frequency. This is an interesting challenge for the Zodiac 340-cipher, as the Zodiac Killer was fond of misspelling words in his documents. To have an accurate frequency chart, we took all published correspondence that the Zodiac Killer gave to the newspapers and police [2] and then transcribed these documents to a text-based digital format. This allows us to calculate the frequency of letters in his work, and includes other benefits that are discussed below.

This analysis becomes more difficult because of his believed use of a homophonic cipher [b]. The typical strategy involves taking the frequency percentages and using a number of symbols equal to the percentage of every letter. For example, the letter "e" in English is used approximately 12% of the time. Thus, twelve different symbols should be used. When properly used, this reduces the frequency of each symbol in the cipher text to roughly 1%.

In his previous cipher, the Zodiac Killer used seven symbols for the letter "e" alone. However, many other letters were left with either one or two symbols. This opened up his cipher to be analyzed for bigrams and trigrams that are common to his writing. We mentioned above that we placed his writing into a digital format, which allowed us to analyze the frequency of his letter usage. It also allowed us to find all the common bigrams and trigrams that he used. Many of his bigrams were identified in his original cipher because of his use of words like "kill" and "killing" that used a double "l", while he only used two symbols to represent the letter "l". This caused the two symbols for "l" to appear next to each other frequently. By analyzing his previous work, we can tag many of the potential bigrams and trigrams inherent to his writing style, and use them to identify the possible places for words in his 340-cipher.

Finally, from the work of others, we have the wisdom to look for "cribs", or common words used for a given group or subject, in his works. The Zodiac Killer talks mainly about collecting "slaves" and how he enjoys "killing." These words, along with his misspellings such as "Christmass" and "Fry" for Friday, are all signatures of his. If we can identify where these words are likely to show up, then we will be farther along in our solving of the 340-cipher.

In our attempt, we also looked at other algorithmic methods to utilize computers to do the majority of the work for us. We reviewed one method in particular, that of Brax Cisco. The method developed by Brax Cisco utilizes a hill climber algorithm that starts from a solution key for the cipher text that is either developed by guesswork or generated randomly. It takes the generated solution for the given key and scores it based on a preloaded set of bi-, tri-, tetra-, and pentagraphs common to the English language. It then performs a shuffle on the key (randomly exchanging the values of two characters in the key), decodes again, and scores the new variation of the key. If the new key scores higher, it replaces the older key, and the algorithm runs again by switching another pair and comparing the new key against the saved one.

[a] http://www.vectorsite.net/ttcode_01.html#m1 (as of 11/26/2007)
[b] Since there are 63 unique symbols used in the 340-cipher, his previous use of homophonic substitution ciphers, and the lack of anything that would mark it as any other type of code or cipher, we went on the assumption that this was indeed another homophonic substitution cipher.
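The swap-and-compare loop of a hill climber can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Brax Cisco's program: the tiny `COMMON_BIGRAPHS` table stands in for the preloaded n-graph set, the cipher is assumed to be a list of integer symbol ids, and all function names are hypothetical.

```python
import random
import string

# Toy stand-in for a preloaded n-graph table (assumption, for illustration only).
COMMON_BIGRAPHS = {"TH", "HE", "IN", "ER", "AN", "RE", "ND", "ON", "EN", "AT"}


def decode(cipher, key):
    """Map each integer symbol id in the cipher to its letter in the key."""
    return "".join(key[symbol] for symbol in cipher)


def score(text):
    """Count how many adjacent letter pairs are common bigraphs."""
    return sum(1 for i in range(len(text) - 1) if text[i:i + 2] in COMMON_BIGRAPHS)


def hill_climb(cipher, n_symbols, steps=2000, seed=0):
    """Start from a random key, swap two key entries, keep swaps that score higher."""
    rng = random.Random(seed)
    key = [rng.choice(string.ascii_uppercase) for _ in range(n_symbols)]
    best = score(decode(cipher, key))
    for _ in range(steps):
        i, j = rng.sample(range(n_symbols), 2)
        key[i], key[j] = key[j], key[i]          # exchange two key values
        candidate = score(decode(cipher, key))
        if candidate > best:
            best = candidate                      # keep the improved key
        else:
            key[i], key[j] = key[j], key[i]      # revert the swap
    return "".join(key), best
```

A call such as `hill_climb([0, 1, 2, 0, 1, 3, 4, 2], n_symbols=5)` returns the best key found and its score. A real run would use the 340 symbol ids and a much richer scoring table.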

3 DISTRIBUTED PROCESSING IMPLEMENTATION
The client is implemented with the feature of sending keys above a constant threshold score value to a dedicated server. The server stores the keys in a database and keeps track of the number of keys sent to it. The keys may then be checked by a human reader to verify correctness in the cipher. The human reader may also see that if the solution is almost correct except for one symbol in the cipher, it is a near-correct solution. The edited or unedited key may be sent back to the server as a solution that is worth looking at. These keys are stored in a separate database which may be looked over at a later date.

4 LINGUISTIC PROPERTIES
One of the first things we wanted to do was find the frequencies of the symbols and letters in the 340-cipher to see if there was a pattern to be found. To help us find a pattern, we decided to use something that has already been solved: the Zodiac's 408-cipher. First we gave every symbol in the 408-cipher a unique number to make it easier to read and compute. This also helps when we plug it in to our program to find the frequencies. Next we ran another frequency test to see if there was anything to be found. Since the 408-cipher was a homophonic substitution cipher, we were hoping there would be some similarities between the two ciphers.

One pattern we did find was a consistency within his cipher. There would be 5 to 7 symbols that would represent the same letter, and in his writing the Zodiac would cycle through them repeatedly. For instance, say the letters 7, B, T, S, and O all represent the same letter in the decoded cipher. The encrypted cipher could read something like this: $A#;HA3#TPCP/%%V1. After the slash, the encrypted cipher would restart, with 7 appearing before B, appearing before T, and so on, and the Zodiac would continue this pattern with all the other symbols in the cipher. Unfortunately this pattern does not seem to exist in the 340 cipher.

Another direction we took was to make a vocabulary composed of all of his letters and ciphers. We made a program to find the words he used most frequently in his previous writings. This, along with some of the most frequent words used in the English language, makes up our vocabulary, which we use to find out what he could possibly be saying in the 340 cipher.

In considering the linguistic properties of the 340-cipher, we performed a thorough analysis of both the 340-cipher and the original, three-part 408-cipher. In the 340-cipher, there are 63 unique characters, most of them being Latin characters, mirrored Latin characters, and shapes, such as a circle with a cross (which the Zodiac used as his signature), and circles, squares, and triangles with different designs in them. In order to process the individual characters, each symbol was given a two-digit number that corresponded to its position in the cipher where it was first used. For example, the first character is 00, the second character 01, and so on.

We performed a frequency analysis on the cipher using a custom program (see Figure 2). The results showed that we could not easily use a frequency analysis to break the cipher. Using a separate custom program, we analyzed the n-grams of the cipher (where n = 2, 3, 4, 5). The results showed that the Zodiac was very good at keeping away from patterns. Very few bi-grams or trigrams were repeated in the cipher. We also used the program to analyze a large body of English text. See the Scoring section for how the scoring algorithm uses these statistics to score the solutions. Also, a separate program was developed to take the body of Zodiac letters and keep statistics of what words were used by the Zodiac, including misspellings, and use this in our dictionary for the client.
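The preprocessing and counting steps described above might look roughly like the following Python sketch. It assumes that "position where it was first used" means numbering symbols in order of first appearance, which is one reading of the text. The function names are illustrative and are not those of the authors' custom programs.

```python
from collections import Counter


def number_symbols(cipher_symbols):
    """Replace each symbol by an integer id assigned in order of first appearance."""
    ids = {}
    numbered = []
    for symbol in cipher_symbols:
        if symbol not in ids:
            ids[symbol] = len(ids)
        numbered.append(ids[symbol])
    return numbered


def symbol_frequencies(numbered):
    """Count how often each symbol id occurs in the cipher."""
    return Counter(numbered)


def repeated_ngrams(numbered, n):
    """Return the n-grams of symbol ids that occur more than once, with their counts."""
    counts = Counter(tuple(numbered[i:i + n]) for i in range(len(numbered) - n + 1))
    return {gram: count for gram, count in counts.items() if count > 1}
```

For example, `number_symbols("ABAC")` gives `[0, 1, 0, 2]`, which can be printed as two-digit labels with `f"{i:02d}"`. Running `repeated_ngrams` for n = 2 through 5 is one way to check how rarely symbol sequences repeat.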

FIGURE 2 Frequency Analysis of the 340 Cipher

5 SCORING
An automated process for quickly determining if a given key is of interest to us, that is, whether it appears to be close to a valid solution to the problem, is necessary given the vast number of possible keys that exist. Our attempt to create such a process resulted in our scoring algorithm, which is described below.

The first step towards scoring a key is simply to decode the cipher using the given key. This consists of mapping the key's letter representation of each symbol to each occurrence of that symbol in the cipher. We assume that the cipher is a homophonic substitution cipher in which each key letter maps directly to its symbol in the cipher and that there can be more than one symbol per letter. This leaves us with a block of English letters which we can now analyze and assign a score to.

FIGURE 3 Example of decoding cipher text with key

With our scoring algorithm, we are basically scanning a string of letters and trying to determine if it is what we consider to be readable English text. More specifically, our routine assigns a score to each key that lets us know how close or far the solution is to our standards for English text. This relative scoring is important in that it allows us to set a threshold for flagging any good keys that we generate to be looked at. Scoring is also necessary for running algorithms such as the genetic algorithm, which needs to know which keys are better than others.

As we read through a block of letters, our scoring algorithm bases the final score on three different aspects of English text: letter frequency, n-graph frequency, and word frequency. Once scores have been determined for each of these individual aspects, they are combined to create a final score which tells us the key's relative likeness to our standards for English text. The process for each of these procedures is explained below.

Letter Scoring
The most basic aspect of our scoring algorithm is to analyze the individual frequency of each letter of the English alphabet in the decoded text. We started by generating an average frequency that we could expect for each letter. This was done by analyzing Zodiac and English text files and generating the average frequency for each letter. These generated values are considered to be our average letter frequencies for English text. We then score the decoded text by comparing its letter frequency values with our standard values. A perfect score is given when the frequency of each letter in the text block exactly matches our standard values, and from there, the larger the deviation from the standard, the lower the score. Letter frequency is an important aspect of scoring in that it does not reward repetitive text, such as a block that consists only of the word "there" repeated over and over.
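To make the decoding and letter-scoring steps concrete, here is a minimal Python sketch. The paper does not give its deviation formula, so the one used here (100 minus a scaled sum of absolute frequency differences) is an assumption chosen only so that an exact match scores 100 and a completely disjoint distribution scores 0. The cipher is assumed to be a list of integer symbol ids, and all names are hypothetical.

```python
from collections import Counter
import string


def decode(cipher, key):
    """Apply a key (one letter per symbol id) to a cipher given as integer symbol ids."""
    return "".join(key[symbol] for symbol in cipher)


def letter_freqs(text):
    """Relative frequency of each letter A-Z in the text (non-letters are ignored)."""
    letters = [c for c in text.upper() if c in string.ascii_uppercase]
    counts = Counter(letters)
    total = len(letters) or 1
    return {c: counts[c] / total for c in string.ascii_uppercase}


def letter_score(text, standard):
    """Score 0-100: 100 when frequencies match the standard exactly (assumed formula)."""
    observed = letter_freqs(text)
    deviation = sum(abs(observed[c] - standard[c]) for c in string.ascii_uppercase)
    return 100.0 * (1.0 - deviation / 2.0)
```

In practice the `standard` table would be built with `letter_freqs` from a large body of Zodiac and English text, as described above, and `letter_score(decode(cipher, key), standard)` would give the letter component for one key.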

N-Graph Scoring
The next part of our analysis is to generate a score for the n-graph frequency of the text. Our model for this process is based on the work done by Brax Cisco in his Zodiac 340 hill climber program. N-graph is a term we use to represent all possible letter combinations of length n. The n-graphs we use for our scoring routine are bi-graphs, tri-graphs, tetra-graphs, and penta-graphs. These are letter combinations of two, three, four, and five letters respectively. For example, the set of all possible bi-graphs is {aa, ab, ..., zz}, and so on for all other n-graphs. We used a large sample of English text to generate a list of all n-graphs that occurred and then calculated a frequency for each that is used for n-graph scoring.

Word: KILLING
Bi-Graphs: KI, IL, LL, LI, IN, NG
Tri-Graphs: KIL, ILL, LLI, LIN, ING
Tetra-Graphs: KILL, ILLI, LLIN, LING
Penta-Graphs: KILLI, ILLIN, LLING
FIGURE 4 Example of breaking text into n-graphs
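Breaking text into n-graphs and building a frequency table from a sample can be sketched as follows. The function names and the table layout (a dictionary keyed by n) are assumptions made for this illustration.

```python
from collections import Counter


def ngraphs(text, n):
    """Return every run of n consecutive letters in the text, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def ngraph_frequencies(sample, sizes=(2, 3, 4, 5)):
    """For each n, map every n-graph seen in the sample to its relative frequency."""
    table = {}
    for n in sizes:
        counts = Counter(ngraphs(sample, n))
        total = sum(counts.values()) or 1
        table[n] = {gram: count / total for gram, count in counts.items()}
    return table
```

Calling `ngraphs("KILLING", 2)` yields the six bi-graphs listed in the figure, and `ngraphs("KILLING", 5)` yields the three penta-graphs.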

The model we use for generating an n-graph score is to step through each index of the text block and increment the score based on the frequency of the n-graphs found at each index. The more frequently the given n-graph occurs in English, the larger the increment to the total n-graph score. Using this procedure, we then generated a standard n-graph score for sections of English text of length 340. This was done by breaking up samples of Zodiac and English text into sections of length 340 and then averaging the n-graph scores found for each section (note that this procedure can be scaled to any size problem). From here, we can give decoded keys an n-graph score by comparing the processed n-graph score to our standard score. A perfect score is given to keys with an n-graph score that equals or exceeds our standard, and from there, the larger the deviation from the standard, the lower the score.

N-graph scoring is advantageous to solving our problem in that it rewards the partial formation of English words. This is important for increasing the score of keys that may be close to a possible solution, and for key generation algorithms such as a genetic algorithm, which will attempt to complete these partially found words to increase its score. Also, this works especially well for the Zodiac cipher because the Zodiac killer is known to purposefully misspell certain words. So, even though these misspellings may receive no word score, they would still get an increased score due to their partial completion.

Word Scoring
The third aspect of our scoring system is to determine a word score for our decoded texts. While the basic idea is to search through the text for words, giving texts with more words a higher score, there are many options and variables for doing so, and we describe a number of our efforts below.

The first necessity for word scoring is a dictionary for word lookup. There are a wide number of reasonable possibilities for which dictionary to use, and each has its own advantages and disadvantages for our particular problem. First and foremost, a large number of Zodiac-specific words were chosen and included in every dictionary. This includes common words used by the Zodiac killer and even his known misspellings. These words were also given a higher score value than common words in an attempt to find Zodiac-like solutions. From there, we have tried multiple dictionaries for word scoring, such as the entire English language or the 1000 most commonly used words. Although there are an infinite number of possibilities for this choice, we assume that the best dictionary for solving the problem includes all of the words that the Zodiac has used and a reasonable selection of those he hasn't used but possibly would.

Another question for generating word scores is how to search through the text. Once again, there are any number of options available for doing so, and our best efforts are described here. Our first attempt simply advanced through the text, searching for words starting from each index. This allows the score to be incremented multiple times for each index, which may not produce the desired results. Our most successful attempt also searches from each index, but if it finds a word, the index is advanced to the end of the longest found word. Using the latter procedure, the sum of the lengths of all found words divided by the cipher size gives us the percentage of the given text that is covered by words. A text that is completely filled with words generates a perfect score, and so on. The word score is perhaps the most important aspect of scoring in that it ties together both the letter frequency and n-graph scores to flag what we hope to find in coherent English text.

Total Score
Once we have generated each of the three individual scores, we then simply tie them all together to create a total score for each text. We have set up our percentages so that texts that approach our standards for English text also approach a score of 100, while texts that appear to be furthest from anything readable generate a score of 0.
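The longest-match word search described in the Word Scoring discussion can be sketched as below. The `min_len` parameter is an assumption added for the sketch (to let a caller ignore very short words) and is not from the paper. The dictionary is assumed to be a plain set of uppercase words.

```python
def word_coverage(text, dictionary, min_len=2):
    """Scan left to right, taking the longest dictionary word at each index.

    Returns (percentage of the text covered by found words, list of found words).
    """
    max_len = max((len(word) for word in dictionary), default=0)
    index = 0
    covered = 0
    found = []
    while index < len(text):
        match = None
        longest_possible = min(max_len, len(text) - index)
        for length in range(longest_possible, min_len - 1, -1):
            candidate = text[index:index + length]
            if candidate in dictionary:
                match = candidate
                break
        if match:
            found.append(match)
            covered += len(match)
            index += len(match)   # jump past the longest word found
        else:
            index += 1            # no word starts here; move on one letter
    percentage = 100.0 * covered / len(text) if text else 0.0
    return percentage, found
```

With the dictionary `{"I", "LIKE", "KILL", "KILLING"}` and `min_len=1`, the text `"ILIKEKILLING"` is fully covered by the words I, LIKE, and KILLING, so it receives a perfect word score of 100.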

FIGURE 5 Example of total score generation
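One way the three component scores could be combined is a weighted average, sketched below. The specific weights (0.3, 0.3, 0.4) are illustrative assumptions, chosen only so that the word score counts slightly more than the other two. The helper `relative_to_standard` is likewise a hypothetical way of turning a raw n-graph score into a 0 to 100 value that is perfect once it meets or exceeds the standard.

```python
def relative_to_standard(raw, standard):
    """Scale a raw score to 0-100; meeting or exceeding the standard gives 100."""
    if standard <= 0:
        return 0.0
    return 100.0 * min(1.0, raw / standard)


def total_score(letter, ngraph, word, weights=(0.3, 0.3, 0.4)):
    """Weighted average of three 0-100 component scores (weights are assumed)."""
    w_letter, w_ngraph, w_word = weights
    weighted = w_letter * letter + w_ngraph * ngraph + w_word * word
    return weighted / (w_letter + w_ngraph + w_word)
```

Because the result is divided by the sum of the weights, the total stays on the same 0 to 100 scale as its components, which keeps a single flagging threshold meaningful when the weights are tuned.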

One challenge is to determine what weight to assign to each element of the total score. Our efforts have shown improved results when the word score is given the greatest weight of the three, with the n-graph and letter scores weighted slightly less. This gives the best scores to texts which have a large number of words with little repetition and the possibility of some partial or misspelled words.

6 KEY GENERATION ALGORITHMS

IBF
The Intelligent Brute Force or IBF algorithm is a random solution generator for a given poly-alphabetic substitution cipher. It uses parameters provided by an n-gram frequency analysis of English written texts to attempt a more accurate modeling of the English language. IBF uses six hard-coded constant parameters (2 strings and 4 integers). The two strings represent the names of .txt files containing the n-gram/frequency list and the target cipher. T (for tokens) represents the sum of all n-gram frequency counts. C (for cipher) represents the size of, or number of symbols in, the target cipher. K (for key) represents the number of unique symbols in the cipher and, subsequently, the size of the key. N (for n-grams) indicates the size, or character length, of the n-grams.

IBF begins by initializing the Token[String] array from the text file of n-gram tokens and integer frequency counts. Each n-gram is stored in an array of size T and occupies the same number of cells as its respective frequency count. Next, IBF initializes the Cipher[int] array from the cipher text file. This text file needs to represent the cipher as a series of integers from 0 to K-1.

Note: The following procedure should be looped to produce a constant stream of potential solutions (hence the name "Brute Force"). An array Key[char] represents the current solution key and must be initialized to all '*' (asterisk). Until there are no more asterisks in Key[], the following operations will loop. Using a random number between 0 and T-1, a token is selected from the Token[] array. This token then attempts to "fit" in each position of the cipher until it is successful or runs out of space in the cipher. The conditions of a successful "fit" are as follows:
1.) Each character of the token can only be placed in an empty ('*') Key[] position, or a position occupied by the same letter.
2.) A token must not be placed on top of the exact same token (this is to avoid stemming repetition).
For example, the token "THE" can be placed into positions represented by "***" or "T**" but not into "H**" or "THE". Once Key[] is "full" (all '*' have been replaced with characters), the key or the solution may be output, and the algorithm may reset the key and find another solution.

Genetic Algorithm
One attempt to intelligently generate good cipher keys is our creation and implementation of a genetic algorithm: more specifically, one with a particular emphasis on solving the Zodiac 340 cipher. A genetic algorithm is a global search heuristic that uses techniques inspired by simple biological evolution to find exact or approximate solutions to search and optimization problems. We implemented our genetic algorithm in a computer application with the intent of finding good cipher keys that may lead us to the solution. In general, a genetic algorithm has two specific requirements: a genetic representation of the solution, and a fitness function to evaluate the solution. From there, the intent of the genetic algorithm is to slowly evolve the solution from an initial population. This process consists of evolutionary selection, reproduction, and mutation.
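The IBF token-fitting procedure can be sketched in Python as follows. Several details are assumptions made to keep the sketch runnable rather than parts of the described algorithm: all tokens are assumed to have the same length N, a window that contains the same symbol twice must receive the same letter both times, and a `max_attempts` cap with a random fill for any leftover '*' entries is added so the loop always terminates. All names are hypothetical.

```python
import random
import string


def expand_tokens(frequency_counts):
    """Build the token array: each n-gram occupies as many cells as its count."""
    tokens = []
    for gram, count in frequency_counts.items():
        tokens.extend([gram] * count)
    return tokens


def try_fit(token, position, cipher, key):
    """Return {symbol id: letter} if the token fits at this cipher position, else None."""
    placed = {}
    for offset, letter in enumerate(token):
        symbol = cipher[position + offset]
        current = placed.get(symbol, key[symbol])
        if current not in ("*", letter):
            return None                     # condition 1 violated
        placed[symbol] = letter
    if all(key[symbol] == letter for symbol, letter in placed.items()):
        return None                         # condition 2: same token already there
    return placed


def ibf_key(cipher, n_symbols, tokens, rng, max_attempts=10000):
    """Generate one candidate key by repeatedly fitting randomly chosen tokens."""
    key = ["*"] * n_symbols
    n = len(tokens[0])
    attempts = 0
    while "*" in key and attempts < max_attempts:
        attempts += 1
        token = tokens[rng.randrange(len(tokens))]
        for position in range(len(cipher) - n + 1):
            placed = try_fit(token, position, cipher, key)
            if placed is not None:
                for symbol, letter in placed.items():
                    key[symbol] = letter
                break
    # Assumed fallback: fill anything still empty with random letters.
    return "".join(c if c != "*" else rng.choice(string.ascii_uppercase) for c in key)
```

Calling `ibf_key` in a loop with `rng = random.Random()` produces the stream of candidate keys, each of which would then be decoded and scored.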

This process then continues until any one or more of the predetermined halting criteria are reached. For example, these criteria could be reaching a desired fitness score or conducting a maximum number of iterations.

Our genetic organism is represented by a character array representing the unique characters of the given cipher. The scoring algorithm that we use to assign fitness values to each organism is the same as the one described above. We begin a simulation by filling each array with random letters of the alphabet until we have reached our desired population size. Next, we evaluate our population and assign a fitness score to each member. We then select the individual organisms with the highest fitness scores to move on to the next generation. The number of organisms that survive each generation ($S_p$) is determined by the total population ($T_p$) and the survival rate ($S_r$):

$$S_p = T_p \cdot S_r, \quad \text{where } 0 < S_r < 1$$

The most fit individuals, up to this number, are selected to pass on to the next generation. The rest are assumed to be unable to adapt to the requirements of the environment and are therefore discarded. Next, we use the members of the surviving population to breed the next generation of individuals. Our reproduction process randomly assigns members from the surviving population to be the parents of the new offspring.

To make a child, it is randomly decided whether the mother or the father will be the donor for each position. To be a donor for the nth position in the array means that the mother or father places the character in the nth position of its array in the nth position of the child's array. The reproduction process continues until the desired number of children has been created. Our desired number of children is given by:

$$\text{children} = (T_p - T_p \cdot S_r) \cdot \text{breed rate}$$

FIGURE 6 Example of Key Mating Process

The mutation process is used to fill out the new population so that it is equal to the size of the starting population. The members kept for reproduction are part of the new population, as are the children created. The number that will be mutated is given by:

$$\text{mutated} = (T_p - T_p \cdot S_r) \cdot \text{mutate rate}, \quad \text{where mutate rate} = 1 - \text{breed rate}$$

Together, the children and the mutated members fill the $T_p - S_p$ places left by the discarded members. Only the parent members are mutated. For each mutation, a parent member is chosen. Next, a random number of characters are chosen to be changed. Then, for each character chosen to be changed, a position in the array is selected and a letter is randomly chosen to be put in that spot. The same position can be selected multiple times, and a letter may be replaced by the same letter.

Our GA is set to repeat this process until it has gone 100 generations (the default number of generations it is allowed to run) without finding a new best score. Every time it finds a new best score, the number of generations passed is reset to zero. Our GA is set to the Zodiac 340 cipher by default, but we can set it to solve another cipher by supplying the cipher size, cipher width, cipher height, number of unique characters in the cipher, and a text file location of the cipher.

For fine tuning our GA, we can change the survival rate, breed rate, size of the total population, and number of generations to run. Changing the breed rate will also change the mutate rate, as the mutate rate is 1 - breed rate. The maximum number of generations our GA is allowed to run is 1000000 and the maximum size of the population is 10000.

We have successfully implemented our GA but have not yet found a viable solution to the Zodiac 340 cipher. We have come to the conclusion that there is nothing wrong with our GA model and that our GA is finding what we have asked it to find according to our fitness function. Therefore we need to make a few changes to the fitness function. We have started this process and things are looking better.

Random Brute Force
Originally designed to be the main focus of our distributed processing approach, the Random Brute Force algorithm simply generates 63 random letters and plugs them into the solution. Obviously, the Random Algorithm is not nearly as effective as the GA or IBF.

We combined these methods into a single 'client' available for download to facilitate distributed processing.
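The genetic algorithm's selection, breeding, and mutation loop can be sketched as below. This is a minimal illustration under assumptions: the fitness function is passed in by the caller, mutation is applied to copies of surviving parents, at least two survivors are always kept so that breeding is possible, and the default parameter values are placeholders rather than the authors' settings.

```python
import random
import string

ALPHABET = string.ascii_uppercase


def breed(mother, father, rng):
    """For each key position, randomly pick the mother or the father as donor."""
    return "".join(rng.choice((m, f)) for m, f in zip(mother, father))


def mutate(parent, rng):
    """Change a random number of positions (repeats allowed) to random letters."""
    chars = list(parent)
    for _ in range(rng.randint(1, len(chars))):
        chars[rng.randrange(len(chars))] = rng.choice(ALPHABET)
    return "".join(chars)


def evolve(fitness, key_len, total_pop=100, survival_rate=0.2,
           breed_rate=0.7, patience=100, seed=0):
    """Run until `patience` generations pass without a new best score."""
    rng = random.Random(seed)
    population = ["".join(rng.choice(ALPHABET) for _ in range(key_len))
                  for _ in range(total_pop)]
    best_key, best_score, stale = None, float("-inf"), 0
    while stale < patience:
        population.sort(key=fitness, reverse=True)
        top_score = fitness(population[0])
        if top_score > best_score:
            best_key, best_score, stale = population[0], top_score, 0
        else:
            stale += 1
        survivors = population[:max(2, int(total_pop * survival_rate))]
        n_children = int((total_pop - len(survivors)) * breed_rate)
        children = []
        for _ in range(n_children):
            mother, father = rng.sample(survivors, 2)
            children.append(breed(mother, father, rng))
        n_mutants = total_pop - len(survivors) - n_children
        mutants = [mutate(rng.choice(survivors), rng) for _ in range(n_mutants)]
        population = survivors + children + mutants
    return best_key, best_score
```

For the Zodiac cipher, `fitness` would decode the 340 symbols with the candidate key and return the total score, and `key_len` would be 63.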

7 CLIENT DETAILS
With our approach, we actually use two clients. The first client, which is available to the public, is the Zodiac Decoder Client. This client can run any of the algorithms, it works on a multi-threaded basis, and whenever one of the algorithms generates a key over a certain threshold, it sends this key to our server, which will be discussed in the next section.

The second client is called the "SuperClient", or Human Reader. Unlike the main program, the Human Reader is accessible only to those who are to review the keys. The Super Client receives keys from the server and allows the user to view the words found in the current key or edit the current key, if necessary. After reviewing the key, the Human Reader can send the key back to the server. Keys sent by the Human Reader are stored in a separate list of proposed "good keys." The Human Reader can also read keys from text files, in case the main client could not send keys to the server. This program is necessary as it allows us to actually see and edit the results of a key and determine if the key creates a believable solution, as opposed to a random string of words or n-graphs.

FIGURE 7 A Running Instance of the Zodiac Client

Optimizing the Client
Of course, crunching such a massive amount of data required quite a bit of optimization. The first thing we needed to optimize was our dictionary for word lookup. We used a hash table class as a base for our dictionary. The hash table gives us the most efficient lookup time, constant time, O(1). The dictionary class hashes each word in a text file to a memory location. In the memory location for each word, an integer score for that word is stored. Scores are initially assigned according to word length, but custom scores can be set for specified words. The dictionary serves two similar purposes for our program. First, it tells us in constant time whether or not a combination of characters forms a word, and second, how much weight that word holds.

Windows threads were implemented in order to provide smooth functioning of the program as a whole. When a specific algorithm is chosen to run, it is run in a new thread. This gives the operating system control over the hardware-intensive algorithm. If multiple processors are available, the operating system may choose to move the thread to a less utilized CPU.
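A hash-table-backed dictionary with per-word scores might look like the Python sketch below. Python's built-in `dict` is a hash table with average constant-time lookup, so it stands in here for the authors' hash table class. The class name, the uppercase normalization, and the example custom score are assumptions for illustration.

```python
class ScoredDictionary:
    """Word lookup table: membership test and per-word score in average O(1) time."""

    def __init__(self, words, custom_scores=None):
        # Default score is the word length, as described in the text.
        self._scores = {word.upper(): len(word) for word in words}
        # Specified words (for example Zodiac-specific terms) can override the default.
        for word, score in (custom_scores or {}).items():
            self._scores[word.upper()] = score

    def __contains__(self, chars):
        """True if this combination of characters forms a known word."""
        return chars.upper() in self._scores

    def score(self, chars):
        """Weight of the word, or 0 if it is not in the dictionary."""
        return self._scores.get(chars.upper(), 0)
```

For instance, `ScoredDictionary(["kill", "slaves"], {"christmass": 20})` scores "KILL" as 4 and the known misspelling "CHRISTMASS" as 20, while unknown strings score 0.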

8 SERVER PROPERTIES
Our server has three primary functions. First, our server receives keys over the defined threshold from the main client and stores these keys in our database, along with essential data such as which algorithm generated the key, the number of words in the key-generated solution, the percentage of n-graphs in the solution, and the fit of our generated character frequencies. Second, the server sends keys generated by the client to a running instance of the Super Client, so the keys may be reviewed by an authorized person. Finally, the server can receive feedback from running Super Clients regarding certain keys.
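The three server functions can be illustrated with a small storage sketch using Python's standard `sqlite3` module. The table layout, column names, and function names are hypothetical, and networking between client and server is left out entirely. The sketch only shows the stored fields named in the text and the review cycle.

```python
import sqlite3

# Hypothetical schema holding the fields listed in the text.
SCHEMA = """
CREATE TABLE IF NOT EXISTS keys (
    id INTEGER PRIMARY KEY,
    key TEXT NOT NULL,
    algorithm TEXT NOT NULL,
    word_count INTEGER,
    ngraph_pct REAL,
    letter_fit REAL,
    reviewed INTEGER DEFAULT 0,
    feedback TEXT
)
"""


def open_store(path=":memory:"):
    """Open (or create) the key database."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def submit_key(conn, key, algorithm, word_count, ngraph_pct, letter_fit):
    """Function 1: store a key received from the main client with its statistics."""
    cursor = conn.execute(
        "INSERT INTO keys (key, algorithm, word_count, ngraph_pct, letter_fit) "
        "VALUES (?, ?, ?, ?, ?)",
        (key, algorithm, word_count, ngraph_pct, letter_fit),
    )
    conn.commit()
    return cursor.lastrowid


def next_for_review(conn):
    """Function 2: hand the oldest unreviewed key to a Super Client, or None."""
    return conn.execute(
        "SELECT id, key FROM keys WHERE reviewed = 0 ORDER BY id LIMIT 1"
    ).fetchone()


def record_feedback(conn, key_id, feedback):
    """Function 3: store a reviewer's feedback and mark the key as reviewed."""
    conn.execute(
        "UPDATE keys SET reviewed = 1, feedback = ? WHERE id = ?", (feedback, key_id)
    )
    conn.commit()
```

Using the default in-memory database keeps the example self-contained. A real deployment would pass a file path and add authentication for the Super Client.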

9 RESULTS
As the tuning process for our Zodiac Decoder program is ongoing, the obvious desired result of a 100% certainly cracked cipher is still far from obtained. As you examine our progress, though, I believe you will realize that our attempt has been one of the most sophisticated to date. The most important result we have collected is that a genetically mutating algorithm is the strongest decrypting process we have utilized. Our work now will deal mainly with fine-tuning our scoring methods. Without proper scoring, even the most fit algorithm is useless. While we have come up with a multitude of ideas for the best way to score, we have concluded that a combination of bi-gram, tri-gram, and word frequencies is paramount. We can expect there will be a few key "solving words" that will undoubtedly aid the cipher cracking process. These "words" come mainly from our massive collection of writings confirmed to be from the Zodiac Killer himself. Because our efforts include the killer's personal writings as a source lookup dictionary, we can expect some of these "solving words" to be the catalyst that will break the decryption wide open (much in the way that "Ilikekilling" gave a foothold to the first solution). On that note, certain phrases have been appearing in our key->solution attempts, which is very encouraging; we can only expect to see more and more logical phrases appearing in the future. Ultimately, though, our results are not definitive enough to rule out the looming thought that Zodiac made a "junk" cipher merely to infuriate a team of college students, some 30 years after his time.

10 CONCLUSIONS AND FUTURE DIRECTION
Obviously, our work requires more tweaking, creative inspiration, and hard work before any hard conclusions may be reached. Is the 340-cipher nothing but junk? For the hundreds of hours I have spent on the problem, I hope not. Now, of course, we have discovered many more directions which we will take while working towards a solution. The cipher may read top-to-bottom, bottom-to-top, even as a spiral, for all we know. The cipher may actually be two distinct ciphers, divided at line 10, which both starts and ends with the symbol "-". The cipher may even be half a real cipher, and half junk. We will continue to work to examine all possibilities.

Our next step is to create a better scoring algorithm. Our plan is to use a Genetic Algorithm to breed a scoring algorithm. We will do this by creating a multitude of properties which may impact the score; the more, the better. We will then randomly generate a large population of different combinations of aspects, wherein each aspect has a different weight to contribute to the score. By running the large population of scoring algorithms against many bodies of English text, we can breed a scoring algorithm which is maximized over as many bodies of text as possible. Such a scoring algorithm will approach an algorithm for determining whether a body of text is English or not. It will therefore give us better results as to whether or not we have created a good key, and will be extremely powerful when coupled with our key-generating GA.

For now, our strongest conclusion is quite similar to our starting point: we have much to do, and the willpower to do it. We look forward to our future progress, and to eventually discovering the truth behind the Zodiac Killer's 340-character cipher.
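The plan of breeding a scoring algorithm, in which a genetic algorithm tunes the weights given to bi-gram, tri-gram, and word frequencies, might look roughly like the sketch below. Everything concrete here is an assumption made for illustration: the tiny n-gram and word sets, the three-feature representation, the fitness definition (the margin between the lowest-scoring English sample and the highest-scoring non-English sample), and the GA parameters. The paper does not specify any of these.

```python
import random

# Tiny illustrative samples, not the corpora the authors used.
COMMON_BIGRAMS = {"TH", "HE", "IN", "ER", "AN", "RE", "ON", "AT", "EN", "ND"}
COMMON_TRIGRAMS = {"THE", "AND", "ING", "ENT", "ION", "HER", "FOR", "THA"}
WORDS = {"THE", "AND", "KILL", "LIKE", "THAT", "THIS", "HAVE", "WITH"}


def features(text):
    """Return (bigram rate, trigram rate, word-hit rate) for unspaced upper-case text."""
    n = max(len(text), 1)
    bi = sum(text[i:i + 2] in COMMON_BIGRAMS for i in range(len(text) - 1)) / n
    tri = sum(text[i:i + 3] in COMMON_TRIGRAMS for i in range(len(text) - 2)) / n
    words = sum(text.count(w) for w in WORDS) / n
    return (bi, tri, words)


def score(text, weights):
    """Weighted average of the three features; the weights are what the GA evolves."""
    total = sum(weights) or 1.0
    return sum(w * f for w, f in zip(weights, features(text))) / total


def fitness(weights, english, noise):
    """Assumed fitness: worst English score minus best non-English score."""
    return min(score(t, weights) for t in english) - max(score(t, weights) for t in noise)


def evolve(english, noise, generations=30, size=20, seed=0):
    """Evolve a weight vector; returns (best weights, best fitness per generation)."""
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(3)] for _ in range(size)]
    history = []
    for _ in range(generations):
        population.sort(key=lambda w: fitness(w, english, noise), reverse=True)
        history.append(fitness(population[0], english, noise))
        parents = population[: size // 2]  # elitism: the top half survives unchanged
        children = []
        while len(parents) + len(children) < size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            i = rng.randrange(3)
            # Mutate one weight, clamped to the range [0, 1].
            child[i] = min(1.0, max(0.0, child[i] + rng.gauss(0, 0.1)))
            children.append(child)
        population = parents + children
    best = max(population, key=lambda w: fitness(w, english, noise))
    return best, history
```

Because the best individuals are carried over unchanged each generation, the best fitness can never decrease. A fuller version would add many more properties than these three, as the text proposes, and evaluate against much larger bodies of English text.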

11 INDEX

Applied Cryptography: the use of algorithms to decrypt messages.
Substitution Cipher: an encoded message in which symbols transcribe to meaningful symbols.
Zodiac Killer: infamous 1970s serial killer responsible for the cipher we work to solve.
Genetic Algorithm: an algorithm which follows the model of evolution and nature in an attempt to breed a desired result.
Intelligent Brute Force Algorithm: an algorithm which uses a brute force approach to place n-graphs in a block of text in a way in which they fit.
Random Brute Force Algorithm: an algorithm which uses a brute force approach to randomly generate a solution.
Character Frequency Analysis: an analysis of the frequencies of all characters in a block of text.
Natural Language Processing: a subfield of artificial intelligence and computational linguistics. It studies the problems of automated generation and understanding of natural human languages. Natural language generation systems convert information from computer databases into normal-sounding human language, and natural language understanding systems convert samples of human language into more formal representations that are easier for computer programs to manipulate. [4]
Distributed Processing: a computational method in which the computational work required to solve a given problem is distributed between a number of processors.

12 SPECIAL THANKS
The authors wish to thank, first and foremost, our talented group of students. We'd also like to thank Kevin Knight, Rada Mihalcea, Brax Cisco, Robert Graysmith, Mr. Chopsticks, the community at www.zodiackiller.com, all of our friends and family who have put up with us while we dedicate ourselves to the cause, Shiner Bock, and the College of Engineering at the University of North Texas.

13 REFERENCES
[1] Graysmith, R., "Zodiac", 1986.
[2] www.zodiackiller.com
[3] Brax Cisco
[4] http://en.wikipedia.org/wiki/Natural_language_processing, as of 11/30/07
