
UNIVERSITY OF CINCINNATI

Date: May 24, 2004

I, Julia Michelle Taylor, hereby submit this work as part of the requirements for the degree of:

Master of Science

in:

Computer Science

It is entitled:

Computational Recognition of Humor in a Focused Domain

This work and its defense approved by:

Chair: Dr. Lawrence Mazlack
Dr. Carla Purdy
Dr. John Schlipf
Dr. Michele Vialet

Computational Recognition Of Humor In A Focused Domain


A thesis submitted to the Division of Research and Advanced Studies of the University of Cincinnati

in partial fulfillment of the requirements for the degree of

MASTER OF SCIENCE

in the Department of Electrical and Computer Engineering and Computer Science of the College of Engineering

2004

by

Julia Taylor

B.S., University of Cincinnati, 1999
B.A., University of Cincinnati, 1999

Committee Chair: Dr. Lawrence Mazlack

Abstract. With advancing developments of artificial intelligence, humor researchers have begun to look at approaches for computational humor. Although there appears to be no complete computational model for recognizing verbally expressed humor, it may be possible to recognize jokes based on statistical language recognition techniques. This is an investigation into computational humor recognition. It considers a restricted set of all possible jokes that have wordplay as a component and examines the limited domain of Knock Knock jokes. The method uses Raskin's Theory of Humor for its theoretical foundation. The original phrase and the complementary wordplay have two different scripts that overlap in the setup of the joke. The algorithm deployed learns statistical patterns of text in N-grams and provides a heuristic focus for the location where wordplay may occur. It uses a wordplay generator to produce an utterance that is similar in pronunciation to a given word, and the wordplay recognizer determines if the utterance is valid by using N-grams. Once a possible wordplay is discovered, a joke recognizer determines if the found wordplay transforms the text into a joke.

Acknowledgments

I would like to express my sincere gratitude to Dr. Lawrence Mazlack, who not only made this project possible, but also very enjoyable. His advice, patience, ideas, and many late evenings of arguments and inventions are only a few reasons in a very long list. Thank you!

I would like to thank the Thesis committee, Dr. John Schlipf, Dr. Michele Vialet and Dr. Carla Purdy. This work has greatly benefited from your suggestions.

Thanks are due to the Electronic Text Center at the University of Virginia Library for the permission to use their texts in the experiments. To Dr. Graeme Ritchie, thank you for your comments in the initial stage of the project and for making your research available. I would also like to thank Adam Hoffman for allowing the flexibility in time that made it possible to complete this thesis. The list would not be complete without G.I. Putiy, who has been an inspiration for many years.

I would like to thank my parents, Michael and Tatyana Slobodnik, and my brother Simon for their love, encouragement, and support in too many ways to describe.

Last but not least, a sincere thank you to my husband, Matthew Taylor, without whose love, help, understanding and support I would be completely lost.

Table of Contents

List of Tables .......................................................... 4
1 Introduction ........................................................... 5
2 Background ............................................................. 7
  2.1 Theories of Humor .................................................. 7
    2.1.1 Incongruity-Resolution Theory .................................. 8
    2.1.2 Script-based Semantic Theory of Humor ......................... 12
    2.1.3 General Theory of Verbal Humor ................................ 17
    2.1.4 Veatch's Theory of Humor ...................................... 21
  2.2 Wordplay Jokes .................................................... 24
  2.3 Structure of Jokes ................................................ 26
    2.3.1 Structural Ambiguity in Jokes ................................. 26
      2.3.1.1 Plural and Non-Count Nouns as Ambiguity Enablers .......... 26
      2.3.1.2 Conjunctions as Ambiguity Enablers ........................ 28
      2.3.1.3 Construction "A Little" as Ambiguity Enabler .............. 28
      2.3.1.4 Can, Could, Will, Should as Ambiguity Enablers ............ 28
    2.3.2 The Structure of Punchline .................................... 29
  2.4 Computational Humor ............................................... 35
    2.4.1 LIBJOG ........................................................ 35
    2.4.2 JAPE .......................................................... 36
    2.4.3 Elmo .......................................................... 37
    2.4.4 WISCRAIC ...................................................... 38
    2.4.5 Ynperfect Pun Selector ........................................ 40
    2.4.6 HAHAcronym .................................................... 41
    2.4.7 MSG ........................................................... 42
    2.4.8 Tom Swifties .................................................. 43
    2.4.9 Jester ........................................................ 44
    2.4.10 Applications in Japanese ..................................... 44
3 Statistical Measures in Language Processing ........................... 46
  3.1 N-grams ........................................................... 46
  3.2 Distant N-grams ................................................... 49
4 Possible Methods for Joke Recognition ................................. 50
  4.1 Simple Statistical Method ......................................... 50
  4.2 Punchline Detector ................................................ 51
  4.3 Restricted Context ................................................ 52
5 Experimental Design ................................................... 54
6 Generation of Wordplay Sequences ...................................... 56
7 Wordplay Recognition .................................................. 61
8 Joke Recognition ...................................................... 64
  8.1 Wordplay in the Beginning of a Punchline .......................... 65
  8.2 Wordplay at the End of a Punchline ................................ 66
  8.3 Wordplay in the Middle of a Punchline ............................. 67
9 Training Text ......................................................... 67
  9.1 First Approach .................................................... 67
  9.2 Second Approach ................................................... 68
  9.3 Third Approach .................................................... 69
  9.4 Fourth Approach ................................................... 71
  9.5 Fifth Approach .................................................... 72
10 Experimentation and Analysis ......................................... 73
  10.1 Training Set ..................................................... 73
  10.2 Alternative Training Set Data Test ............................... 76
  10.3 General Joke Testing ............................................. 76
    10.3.1 Jokes in the Test Set with Wordplay in the Beginning of Punchline ... 79
    10.3.2 Jokes in the Test Set with Wordplay in the Middle of a Punchline .... 81
  10.4 Testing Non-Jokes ................................................ 82
11 Summary .............................................................. 86
12 Possible Extensions .................................................. 88
13 Conclusion ........................................................... 90
Bibliography ............................................................ 92
Appendix A: Training texts ............................................. 97
Appendix B: Jokes Used in the Training Set ............................. 101
Appendix C: Jokes Used in the Test Set ................................. 105
Appendix D: KK Recognizer Algorithm Description ........................ 115
Appendix E: A table of Similarity of English consonant pairs using the natural classes model, developed by Stefan Frisch ... 117
Appendix F: Cost Table developed by Christian Hempelmann .............. 118

List of Tables

Table1: The three-level scale ........................................... 23
Table2: Subset of entries of the Similarity Table, showing similarity of sounds in words between different letters ... 58
Table3: Examples of strings received after replacing one letter from the word water and their similarity value to water ... 60
Table4: Training Jokes results .......................................... 74
Table5: Unrecognized jokes in the training set .......................... 75
Table6: Results of the Joke Test Set .................................... 79
Table7: Non-joke results ................................................ 84

1 Introduction

Thinkers from the ancient times of Aristotle and Plato to the present day have striven to discover and define the origins of humor. Most commonly, early definitions of humor relied on laughter: what makes people laugh is humorous. Recent works on humor separate laughter and make it its own distinct category of response. Today there are almost as many definitions of humor as theories of humor; as in many cases, definitions are derived from theories [Latta, 1999]. Still, we are unsure of the complete dimensions of the concept [Keith-Spiegel, 1972]. Some researchers say that not only is there no definition that covers all aspects of humor, but also that humor is impossible to define [Attardo, 1994].

Humor is an interesting subject to study not only because it is difficult to define, but also because the sense of humor varies from person to person. Moreover, the same person may find something funny one day but not the next, depending on what mood this person is in, or what has happened to him or her recently. These factors, among many others, make humor recognition challenging.

Although most people are unaware of the complex steps involved in humor recognition, a computational humor recognizer has to consider all these steps in order to approach the same ability as a human being.

A common form of humor is verbal, or verbally expressed, humor. Verbally expressed humor can be defined as humor conveyed in language, as opposed to physical or visual humor, but not necessarily playing on the form of the language [Ritchie, 2000]. Verbally expressed humor is easier to computationally analyze, as it involves reading and understanding texts. While understanding the meaning of a text may be difficult for a computer, reading it is not an issue.

One of the subclasses of verbally expressed humor is the joke. Hetzron [1991] defines a joke as "a short humorous piece of literature in which the funniness culminates in the final sentence." Most researchers agree that jokes can be broken into two parts, a setup and a punchline. The setup is the first part of the joke, usually consisting of most of the text, which establishes certain expectations. The punchline is a much shorter portion of the joke, and it causes some form of conflict. It can force another interpretation on the text, violate an expectation, or both [Ritchie, 1998]. As most jokes are relatively short and, therefore, do not carry a lot of information, it should be possible to recognize them computationally.

Computational recognition of jokes seems to be possible, but it is not easy. An intelligent joke recognizer requires world knowledge to understand most jokes.

Computational work in natural language has a long history. Areas of interest have included: translation, understanding, database queries, summarization, indexing, and retrieval. There has been very limited success in achieving true computational understanding.

A focused area within natural language understanding is verbally expressed humor. As Ritchie [1998] states, "It will probably be some time before we develop a sufficient understanding of humour, and of human behaviour, to permit even limited form of jokes to lubricate the human-computer interface. The goal of creating a robot that is sufficiently human to use humour in a way that makes sense or appears amusing is a long term one."

2 Background

Joke examples in all sections are taken from the papers discussed, unless specifically noted. The examples are unmodified from their appearance in the papers.

2.1 Theories of Humor

Just as there are many definitions of humor, there are many theories that cover the different aspects of humor. The three major classes of these theories are: incongruity-based, disparagement-based (sometimes called the superiority theory), and release-based.

Incongruity-based theories suggest that humor arises from something that violates an expectation. Many supporters of incongruity in humor have emphasized the importance of surprise in a joke [Raskin, 1985].

Disparagement, or Superiority, theories are based on the observation that people laugh at other people's infirmities, especially if they are enemies [Suls, 1976]. This class of theories of humor goes back to Plato (Philebus) and Aristotle (Poetics), who maintained that people laugh at the misfortunes of others for joy that they do not share them [Raskin, 1985] [Attardo, 1994].

Release/relief theories explain the link between humor and laughter. The principle for release-based theory is that laughter provides relief for mental, nervous and psychic energy, and this ensures homeostasis after a struggle, tension, and strain [Raskin, 1985]. The most influential proponent of a release theory is certainly Freud [Attardo, 1994, p.50].

2.1.1 Incongruity-Resolution Theory

There are many incongruity theories. All theories state in one way or another that humor consists of the juxtaposition of the incongruous. There is a debate whether incongruity alone is sufficient for laughter. Some researchers argue that incongruity is the first step of a multi-stage process, and that a retrieval of information resulting in satisfactory resolution of incongruity is a necessary step for a humorous response [Suls, 1976] [Ritchie, 1999]. This theory is called Incongruity-Resolution (IR) theory, since it requires an extra step, resolution of the incongruity: to get the joke, one must make an indirect connection between the incongruity and the resolution.

There are different ways to create and resolve incongruity. Ritchie [1999] addresses two different models for incongruity resolution: the two-stage model of Suls [1972] and the surprise disambiguation (SD) model.

The surprise disambiguation model [Ritchie, 1999] states that the setup part of a joke has two interpretations, one of which is obvious, the other more vague. Once the punchline is reached, the audience becomes aware of the second, hidden meaning of the setup. The meaning of the punchline conflicts with the first, obvious interpretation but is compatible with the second meaning, so the audience is forced into adopting the second meaning.

Joke1:
Postmaster: "Here's your five-cent stamp."
Shopper (with arms full of bundles): "Do I have to stick it on myself?"
Postmaster: "Nope. On the envelope."

The first more obvious meaning of the setup of Joke1 is that the shopper needs help putting the stamp on the envelope. When the punchline is read, the second, hidden meaning is evoked: putting a stamp on the shopper, as opposed to putting it on the envelope.

Ritchie argues that to process a joke similar in format to Joke1, the processor must be able to analyze the setup. It should then predict the meaning of a likely continuation of the text. It must also detect the punchline, no matter how long it is. After detecting the punchline, it must process it and find the hidden meaning of the setup. Based on the predicted meaning, the hidden meaning, and the punchline, the processor determines whether the text is humorous.

The SD model makes use of ambiguity. Suls' [1972] two-stage model does not demand any ambiguity to be present in the setup. Instead, the punchline creates incongruity, and then "a cognitive rule must be found which enables the content of the punch line to follow naturally from the information established in the setup" [Ritchie, 1999]. The following algorithm is used to process a joke using the two-stage model [Ritchie, 1999]:

As a text is read, make predictions
While no conflict with prediction, keep going
If input conflicts with prediction:
    If not ending: PUZZLEMENT
    If is ending, try to resolve:
        No rules found: PUZZLEMENT
        Cognitive rules found: HUMOR
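
For illustration only, this control flow can be sketched in Python. The predicates predict, conflicts, and find_cognitive_rule below are placeholders supplied by the caller, not part of Suls' or Ritchie's formulation; a real system would need substantial language understanding to implement them.

    # Schematic rendering of the two-stage model's control flow.
    # The three callables are stubs; real implementations require NLU.
    from typing import Callable, List

    def two_stage(segments: List[str],
                  predict: Callable[[List[str]], str],
                  conflicts: Callable[[str, str], bool],
                  find_cognitive_rule: Callable[[List[str], str], bool]) -> str:
        context: List[str] = []
        for i, segment in enumerate(segments):
            expected = predict(context)           # make predictions as text is read
            if conflicts(segment, expected):      # input conflicts with prediction
                if i < len(segments) - 1:         # conflict before the ending
                    return "PUZZLEMENT"
                if find_cognitive_rule(context, segment):
                    return "HUMOR"                # cognitive rule found at the ending
                return "PUZZLEMENT"               # no rule resolves the conflict
            context.append(segment)
        return "NO CONFLICT"                      # text read without any conflict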

Since no ambiguity has to exist in the setup, the two-stage model can be used to analyze Joke2. The SD model cannot be used, due to the absence of ambiguity in the joke.


Joke2: Fat Ethel sat down at the lunch counter and ordered a whole fruit cake. "Shall I cut it into four or eight pieces?" asked the waitress. "Four," said Ethel, "I'm on a diet."

Even though the absence of the required ambiguity seems like an advantage, the two-stage model has some problems. It is unclear how much of a conflict a text should contain to make it count as a punchline. There is no clear definition of a cognitive rule. Finally, what type of cognitive rule governs the creation of a humorous resolution rather than just a misunderstanding? This question, slightly modified, can be directed to most IR theories: how can one tell if a stated ambiguity created a joke or just a misunderstanding [Ritchie, 1999]?

Some researchers [Rothbart, 1976] suggest that incongruity and resolution should have a further classification system. It is suggested that there should be at least two categories of incongruity, possible and impossible incongruity, and two categories of resolution, complete and incomplete resolution. In Rothbart [1976] the terms are defined as follows:

Impossible Incongruity: elements that are unexpected and also impossible given one's current knowledge of the world.
Possible Incongruity: elements that are unexpected or improbable but possible.


Complete Resolution: the initial incongruity follows completely from resolution information.
Incomplete Resolution: the initial incongruity follows from resolution information in some way, but is not made completely meaningful because the situation remains impossible.

To illustrate this idea, consider Joke3:

Joke3: "Why did the cookie cry?" "Because its mother had been a wafer so long."

The authors provide this explanation:

There are two elements of incongruity: the fact that cookies don't cry, and the initial incongruity or surprisingness of the answer to the riddle. The answer contains its own resolution, the phonological ambiguity of "a wafer" (i.e., "away for"), but also adds the additional incongruity of a cookie having a mother.

2.1.2 Script-based Semantic Theory of Humor

One of the leading theories of verbal humor is Victor Raskin's Script-based Semantic Theory of Humor (SSTH) [Raskin, 1985]. It is designed to be neutral with respect to the three groups of humor theories. SSTH is easily compatible with most theories from the three groups.

SSTH is a script-based theory. A script is an enriched, structured chunk of semantic information, associated with word meaning and evoked by specific words [Raskin, 1985, p.99]. Attardo [1994] cites Raskin when saying that the main hypothesis of the theory is:

A text can be characterized as a single-joke-carrying text if both of the [following] conditions are satisfied:
(i) The text is compatible, fully or in part, with two different scripts.
(ii) The two scripts with which the text is compatible are opposite.
The two scripts with which some text is compatible are said to overlap fully or in part on this text. The set of two conditions [above] is proposed as necessary and sufficient conditions for a text to be funny.

From the hypothesis above, it is clear that verbal humor is based on ambiguity that is deliberately created. However, ambiguity itself is not enough: the scripts must not only be opposed, the opposition must also be unexpected. Some examples of different oppositions are: good : bad, real : unreal, money : no money, life : death, etc. [Raskin, 1985].


To show how SSTH works, Raskin [1985] analyzes Joke4.

Joke4: "Is the doctor at home?" the patient asked in his bronchial whisper. "No," the doctor's young and pretty wife whispered in reply. "Come right in."

The first step in analyzing the joke is listing the scripts evoked by the text of the joke, clause by clause [Attardo, 1994] [Raskin, 1985]:

Word:               Script:

(i)   IS=BE v.      1. EQUAL OR BELONG TO A SET  2. EXIST  3. SPATIAL  4. MUST
(ii)  THE det       1. DEFINITE  2. UNIQUE  3. GENERIC
(iii) DOCTOR n      1. ACADEMIC  2. MEDICAL  3. MATERIAL  4. MECHANICAL  5. INSECT
(iv)  AT prep       1. SPATIAL  2. TARGET  3. OCCUPATION  4. STATE  5. CAUSE
(v)   HOME n        1. RESIDENCE  2. SOCIAL  3. HABITAT  4. ORIGIN  5. DISABLED  6. OBJECTIVE

The second step is the launch of combinatorial rules [Raskin, 1985] [Attardo, 1994]. It means that each clause is looked at for scripts that are evoked by two or more words. In this example, the script SPATIAL is evoked by the words IS and AT. SPATIAL is the only script that is evoked more than once by the five words above. After combinatorial rules are defined, the inferences they trigger are looked at. Since AT is SPATIAL, HOME can be either RESIDENCE or HABITAT, but nothing else. Combinatorial rules and inferences continue to be applied until the possible meanings of the first sentence are found. In the example, the two possible meanings are [Attardo, 1994]:

(i) Question: The unique proprietor of a family residence who is a physician is physically present in the residence.
(ii) Question: The unique proprietor of a family residence who has the doctoral degree is physically present in the residence.

Since there is more than one meaning to the first sentence, the combinatorial rules will register the ambiguity.

The second step is repeated until the entire text is analyzed, applying combinatorial rules and inferences. In this example, a script for DOCTOR will be found. The analysis of combinatorial rules will also produce a question: why does the doctor's wife want the patient to come in? To answer this, combinatorial rules will have to be used to go back and search for another possible script. If a second script is found, and the second script is opposite in meaning to the first one, there is a script opposition, and the text is a joke. If the second script is not found, or it does not oppose the first one in any way, the text is not a joke. In this example, the second script LOVER will be found. Since LOVER is opposite in meaning to DOCTOR, the text is a joke [Attardo, 1994] [Raskin, 1985].
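
As a rough illustration, the final overlap-and-opposition test can be sketched in a few lines of Python, under the strong simplifying assumptions that scripts are plain string labels and that a fixed list of opposite script pairs is available; Raskin's combinatorial rules and inferences are far richer than this.

    # Toy SSTH test: a text is a joke candidate if it is compatible with two
    # scripts that overlap on the text and appear in a known opposition list.
    OPPOSITIONS = {frozenset(pair) for pair in [("DOCTOR", "LOVER"),
                                                ("GOOD", "BAD"),
                                                ("LIFE", "DEATH")]}

    def is_joke_candidate(compatible_scripts: set) -> bool:
        for a in compatible_scripts:
            for b in compatible_scripts:
                if a != b and frozenset((a, b)) in OPPOSITIONS:
                    return True   # two overlapping scripts that are opposed
        return False

    # Joke4 is compatible with both DOCTOR and LOVER, which are opposed:
    print(is_joke_candidate({"DOCTOR", "LOVER", "RESIDENCE"}))   # True
    print(is_joke_candidate({"DOCTOR", "RESIDENCE"}))            # False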

SSTH is the first formal theory of humor [Ruch, 2001]. A formal theory [Simpson, 2001] is a structure of primitives, axioms, and theorems. Primitives are a small collection of predicates which are regarded as basic for a given field of study. Primitives delimit the scope of the theory. Axioms are basic or self-evident formulas. Theorems are formulas that are logical consequences of the axioms.

According to Attardo [1994, p.208], SSTH has some drawbacks. They become evident when an attempt is made to apply it to texts other than jokes. The other limitation is that SSTH is a semantic theory only. While it can determine if a short text is a joke, it cannot tell how similar two jokes are. The General Theory of Verbal Humor answers these questions.

2.1.3 General Theory of Verbal Humor

Unlike Raskin's humor theory, which is semantic, the General Theory of Verbal Humor (GTVH) is a linguistic theory.

The General Theory of Verbal Humor [Attardo, 1991] is a combination of the Script-based Semantic Theory of Humor and Attardo's Five-Level joke representation model. The five levels in the Five-Level Model are: Surface, Language, Target + Situation, Template, and Basic script opposition and Logical mechanism.

GTVH describes each joke in terms of six Knowledge Resources [Attardo, 1994]:

Script Opposition (SO): deals with the script opposition presented in SSTH.

Logical Mechanism (LM): accounts for the way in which the two senses (scripts, etc.) in the joke are brought together; corresponds to the resolution phase of the incongruity/resolution model.

Situation (SI): the "props" of the joke, the textual materials evoked by the scripts of the joke that are not necessarily funny.

Target (TA): any individual or group from whom humorous behavior is expected. Target is the only optional parameter among the six KRs.

Narrative Strategy (NS): the genre of the joke, such as riddle, 1-2-3 structure, question and answer, etc.; it is the rhetorical structure of the text.

Language (LA): the actual lexical, syntactic, phonological, etc., choices at the linguistic level that instantiate all the other choices. LA is responsible for the position of the punchline.

Having all six Knowledge Resources defined, a joke, according to Attardo [1994, p.226], can be looked at as a 6-tuple, specifying the instantiation of each parameter.

Joke: {LA, SI, NS, TA, SO, LM}

Two jokes are different if at least one parameter of the six above is different in the jokes.

The other very important aspect of GTVH is the ordering of knowledge resources. The Knowledge Resources are ordered in the following manner:

Script Opposition
Logical Mechanism
Situation
Target
Narrative Strategy
Language

The ordering was experimentally tested in a human study described in Ruch [1993]. The study concluded that there is a linear increase in similarity between pairs of jokes selected along the Knowledge Resources hierarchy, with the exception of Logical Mechanism. This means that jokes that have all the parameters the same except Script Opposition are less similar than jokes that have all the parameters the same except Situation, which are less similar than jokes that have all the parameters the same except Target, which are less similar than jokes that have all the parameters the same except Narrative Strategy, which are less similar than jokes that have all the parameters the same except Language.

In addition, the larger the number of parameters that the jokes have in common, the more similar they are.
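
This similarity relation is easy to sketch in Python, under the simplifying assumption that each Knowledge Resource value can be compared for plain equality; the "wording A"/"wording B" labels below are invented stand-ins for the two surface realizations.

    # Knowledge Resources ordered from most to least "deep", per GTVH.
    KRS = ["SO", "LM", "SI", "TA", "NS", "LA"]

    joke5 = {"SO": "dumb/smart", "LM": "figure-ground reversal",
             "SI": "light bulb", "TA": "Poles", "NS": "riddle",
             "LA": "wording A"}
    joke6 = dict(joke5, SI="car wash")     # differs only in Situation
    joke7 = dict(joke5, LA="wording B")    # differs only in Language

    def shared_krs(j1, j2):
        """Count of Knowledge Resources two jokes have in common."""
        return sum(j1[kr] == j2[kr] for kr in KRS)

    def differing_krs(j1, j2):
        return [kr for kr in KRS if j1[kr] != j2[kr]]

    # Joke5/Joke7 differ only in LA (lowest in the hierarchy): most similar.
    # Joke5/Joke6 differ only in SI (higher up): less similar.
    # Joke6/Joke7 differ in both SI and LA: least similar of the three pairs.
    print(differing_krs(joke5, joke7))   # ['LA']
    print(differing_krs(joke5, joke6))   # ['SI']
    print(shared_krs(joke6, joke7))      # 4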


The ordering of Knowledge Resources not only affects the similarity of jokes, but also affects the choice of parameters in the hierarchy: Script Opposition limits the choice of Logical Mechanism, which limits the choice of Situation, which limits the choice of Target, which limits the choice of Narrative Strategy, which limits the choice of Language. An example that Attardo [1994] uses is: the choice of the Script Opposition of Dumb/Smart will determine the choice of the Target (in North America, Poles, etc.).

To illustrate GTVH, Joke5 and Joke6 are examined:

Joke5: How many Poles does it take to screw in a light bulb? Five. One to hold the light bulb and four to turn the table he's standing on.

Joke6: How many Poles does it take to wash a car? Two. One to hold the sponge and one to move the car back and forth.

Both jokes have the same Script Opposition (Dumb/Smart), same Logical Mechanism (figure-ground reversal), same Target (Poles), same Narrative Strategy (riddle), same Language, but different Situation: in Joke5 the Situation is light bulb, but in Joke6 it is car wash. Conclusion: as the two jokes only differ in one Knowledge Resource, they are considered very similar.

Consider another joke:


Joke7: The number of Polacks needed to screw in a light bulb? Five. One holds the bulb and four turn the table.

This joke has the same parameters as Joke5 except Language. Conclusion: Joke5 and Joke7 are very similar since they only differ in one Knowledge Resource. However, since Language comes after Situation in the hierarchy, Joke5 and Joke6 are less similar than Joke5 and Joke7. On the other hand, Joke6 and Joke7 have two different parameters, Situation and Language. They have less similarity than Joke5 and Joke7 or Joke5 and Joke6.

2.1.4 Veatch's Theory of Humor

Veatch's Theory of Humor [Veatch, 1998] is based on the concept that humor is a form of pain that does not hurt. It requires three conditions that are, individually, necessary and, jointly, sufficient for humor to occur. The conditions of this theory describe a subjective state of apparent emotional absurdity, where the perceived situation is seen as normal and where, simultaneously, some affective commitment of the perceiver to the way something in the situation ought to be is violated. The three conditions are violation (V), normal (N), and simultaneity of V and N. They are defined as:

V: The perceiver has in mind a view of the situation as constituting a violation of some affective commitment of the perceiver to the way something in the situation ought to be. That is, a subjective moral principle of the perceiver is violated.

N: The perceiver has in mind a predominating view of the situation as being normal.

Simultaneity: The N and V understandings are present in the mind of the perceiver at the same instant in time.

Veatch [1998] defines a subjective moral principle as a principle that an individual both has an affective commitment to and believes ought to hold. According to the theory, a person laughs at something if he finds that there is something wrong, or a violation of a norm or taboo, yet this wrong is perceived as being acceptable. If there is no violation, or if the violation creates a conflict that a person is not comfortable with, a situation is not humorous. An example that Veatch provides: a person who has recently suffered a car accident will not think that traffic violation jokes are funny; however, somebody who takes them lightly will. This explains why a joke can be funny to one group and not funny to another, or to the same person at different times. The theory offers two predictions:

Prediction 1: If X finds a situation funny where some principle is violated, and Y instead finds it to be offensive, frightening, or threatening, then we should find that Y is more attached to the principle violated than X, not vice versa.

Prediction 2: If, on the other hand, some perceiver Z finds the aforementioned situation unremarkable, then we should find that Z has no personal moral attachment to the principles violated; we should not find, for example, that Z is more attached to them than the X who finds it funny.

It is possible to tell something about one's sense of humor, and about one's values, using the above predictions. Given some information about one's moral principles, this theory should provide an answer to whether a person will find a joke funny. Veatch [1998] constructs a table of perceiver reactions to a joke, given the perceiver's commitment to the violated principles:

Level      Logic          Commitment    Gets the Joke    Is Offended    Sees Humor
Level 1    Not-V          None          No               No             No
Level 2    V and N        Weak          Yes              No             Yes
Level 3    V and not-N    Strong        Yes              Yes            No

Table1: The three-level scale*
* Taken from Veatch [1998]

Table1 shows that if there is no violation of principles in a joke, a perceiver does not find the joke humorous. A joke is also not humorous if the commitment to a violated principle is strong and the perceiver does not find the situation normal; although the perceiver understands that structurally it is a joke, he is offended by it and does not see any humor. The only situation in which a perceiver sees humor is when there is a violation, yet the perceiver's commitment to it is weak; therefore, he is not offended by it and sees humor.
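
Table1 is small enough to state directly in code. Below is a sketch in Python, under the (large) assumption that violation, normality, and strength of commitment can somehow be judged for a given perceiver and situation.

    def perceiver_reaction(violation: bool, normal: bool, commitment: str):
        """Return (gets_joke, is_offended, sees_humor) per Table1.
        commitment: 'none', 'weak', or 'strong' attachment to the principle."""
        if not violation:                             # Level 1: Not-V
            return (False, False, False)
        if normal and commitment == "weak":           # Level 2: V and N
            return (True, False, True)
        if not normal and commitment == "strong":     # Level 3: V and not-N
            return (True, True, False)
        return (True, False, False)   # combinations outside the three levels

    print(perceiver_reaction(True, True, "weak"))     # (True, False, True)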

Although this theory is difficult to use as a basis for computational humor, it can be used to predict whether a joke is funny to someone whose moral principles are known.

2.2 Wordplay Jokes

Wordplay jokes, or jokes involving verbal play, are a class of jokes depending on words that are similar in sound but are used in two different meanings. The difference between the two meanings creates a conflict or breaks expectation, and is humorous. The wordplay can be created between two words with the same pronunciation and spelling, between two words with different spelling but the same pronunciation, or between two words with different spelling and similar pronunciation. For example, in Joke8 the same word is the focus of the joke. This means that the conflict is created because the word has two meanings, while the pronunciation and the spelling stay the same. In Joke9 and Joke10 the wordplay is between words that sound nearly alike.

Joke8: Cliford: "The Postmaster General will be making the TOAST." Woody: "Wow, imagine a person like that helping out in the kitchen!"

Joke9: Diane: "I want to go to Tibet on our honeymoon." Sam: "Of course, we will go to bed."

Joke10: Emil is not something you name your kid; it is something that you eat.

Sometimes it takes world knowledge to recognize which word is subject to wordplay. For example, in Joke9 there is a wordplay between Tibet and to bed. However, to understand the joke, the wordplay by itself is not enough; world knowledge is required to link honeymoon with Tibet and to bed.

Some jokes require more world knowledge than others. For example, Joke11 requires an understanding that "From Concentrate" is written on most orange juice containers. Joke12 requires knowledge that patients frequently wait long periods of time before being admitted to see a doctor:

Joke11: Butch: "Hey, Stupid Steve, why are you staring at that carton of orange juice?" Stupid Steve: "Because it says Concentrate."

Joke12: Nurse: "I need to get your weight today." Impatient patient: "Three hours and twelve minutes."

Joke8, Joke9, and Joke10 are taken from the TV show Cheers. Joke11 and Joke12 are taken from The Original 365 Jokes, Puns & Riddles Calendar, 2004.


A focused form of wordplay jokes is Knock Knock jokes. In Knock Knock jokes, wordplay is what leads to the humor. The structure of the Knock Knock jokes provides pointers to the wordplay.

2.3 Structure of Jokes

After looking at numerous theories, an observation can be made that many jokes, but not all, are based on ambiguity. Sometimes ambiguity is based on one word; sometimes an entire sentence is ambiguous. For a joke to exist, there has to be a secondary meaning to it. In script-based theory, the overlap of two scripts is the ambiguity; in the IR theory ambiguity has to exist in the setup.

2.3.1 Structural Ambiguity in Jokes

Oaks [1994] discusses some devices for creating structural ambiguity in jokes. All jokes in the section are examples used by Oaks, taken from Clark [1968].

2.3.1.1 Plural and Non-Count Nouns as Ambiguity Enablers

One of the ambiguity enablers is the set of plural and non-count nouns. Since these nouns do not require indefinite articles, it is easy to confuse them with other parts of speech, such as verbs and adverbs. An example that Oaks [1994] provides is the headline "British Left Waffles On Falkland Islands". It is unclear whether the political left in England can't make up its mind on the Falklands or the mess sergeants didn't clean up after breakfast. Depending on whether the word left is a noun or a verb, the word waffles becomes either a verb or a noun, resulting in different meanings of the sentence.

The second reason to use plural nouns is that it is difficult to delineate between Verb + Noun and Adjective + Noun. Consider this example:

Joke13: Question: What's worse than raining cats and dogs? Answer: Hailing taxis.

If a singular noun were used (taxi), it would be possible to distinguish between hailing a taxi (Verb + Noun) and a hailing taxi (Adjective + Noun).

Non-count nouns are as powerful in creating ambiguity:

Joke14: Diner: "This coffee is like mud." Waiter: "Well, it was ground this morning!"

In Joke14 a non-count noun ground can be either a verb or a noun.

http://www.yourdictionary.com/library/ling004.html


2.3.1.2 Conjunctions as Ambiguity Enablers

Conjunctions may also play the role of ambiguity enablers. In Joke15 it is unclear whether the word flies is used as a verb or a noun. Notice that flies is in plural form.

Joke15: Question: What has four wheels and flies? Answer: A garbage truck.

2.3.1.3 Construction "A Little" as Ambiguity Enabler

The next ambiguity enabler discussed by Oaks [1994] is the construction "a little".

Joke16: Romeo (as he threw stones into the stream): "I am merely a pebble in your life." Juliet: "Why don't you try being a little boulder [bolder]?"

In Joke16, "a little" together with the use of the suffix "-er" creates an ambiguity as to the form of boulder: it can be read as the noun "boulder" or the adjective "bolder".

2.3.1.4 Can, Could, Will, Should as Ambiguity Enablers

Modals such as can, could, will, should, etc., are also important enablers. To quote Oaks [1994], they are important "not only because they do not themselves carry an inflectional ending even when used with a third person singular, but because they must also be followed by a verb in its bare infinitival form." Tense shifting is another good way to create ambiguity, since there is no difference between past tense singular and past tense plural, except for the forms of "be"; there is also no difference between future tense singular and plural.

It should be possible to recognize structural ambiguity in texts using the enablers discussed in this section. However, very few jokes are based on structural ambiguity. While structural ambiguity may be a sufficient condition for a joke, it is not a necessary condition. Therefore, if structural ambiguity is not found in a text, a conclusion cannot be drawn that the text is not a joke. On the other hand, if structural ambiguity is present, the text contains a setup and a punchline, and the ambiguity is found in the setup, then the given text is humorous.

2.3.2 The Structure of Punchline

Hetzron [1991] offers an analysis of punchlines in jokes.

Joke17: Russian officers are served beer in an Eastern European tavern. They order beer. The waiter places coasters on the table and serves the beer. Later they order another round. The waiter returning with the beer finds no coasters. "OK," he tells himself, "these are collectors," and puts down another set of coasters. When the third round is ordered and brought out, there are again no coasters. Angry, the waiter puts the beer down on the table, but places no more coasters. One of the Russian officers protests: "What is it? No more crackers?"

According to Hetzron [1991], Joke17 works because, at the level of the narrative, the audience does not know why the coasters disappeared. Until the very end of the text, the audience is led to believe that the Russian officers are collectors; and, what is more important, until the very end there is no hint that somebody may find coasters edible. Hetzron argues that what makes a text a joke is that "the tension should reach its highest level at the very end. No continuation relieving this tension should be added" [Hetzron, 1991, p.66]. He defines a joke as a short humorous piece of literature in which the funniness culminates in the final sentence, called the punchline.

Jokes can be divided into three categories: straight-line, dual, and rhythmic. A straight-line joke is a joke where one successive episode culminates in a punchline [Hetzron, 1991].

Joke18: A woman goes to the rabbi: "Rabbi, what shall I do so that I wouldn't become pregnant again?" The rabbi says: "Drink a glass of cold water!" "Before or after?" The rabbi replies: "Instead!"

The answer of the rabbi in Joke18 is unexpected. The audience expects to hear some advice about protection, but he suggests water. Water is not something that prevents pregnancies; however, until the very last word of the joke is heard, the audience does not think that the given advice is about abstinence. In this joke the expectation grows in a linear fashion: there are no quick drops right before or after the high point of the joke.

The second type of joke is the dual joke: it contains two pulses, often in contrast [Hetzron, 1991].

Joke19: The Parisian Little Moritz is asked in school: "How many deciliters are there in a liter of milk?" He replies: "One deciliter of milk and nine deciliters of water." In France, this is a good joke; in Hungary, this is good milk.

This joke has a punchline (the first pulse) and a commentary afterwards (the second pulse), which turns out to be funnier than the punchline. The other kind of dual joke is the Mixing Domains [Hetzron, 1991] joke. Hetzron gives an example of the good news/bad news joke:

Joke20: Mr. Rabinowitz's son studies at Berkeley. When his neighbor goes for a visit in the area, Rabinowitz asks him to look up his son. The neighbor comes back telling: "I have good news and bad news. Which one do you want to hear first?" Mr. R. first wants the bad news. The neighbor says: "I am sorry to say, your son has become gay. This is the bad news. The good news is: he has found for himself such a bekhoved [respectable] Jewish doctor."

Joke20 works because of the contrast that occurs between the two pulses; such contrast can also occur in other shapes.

The third type of joke is a rhythmic joke [Hetzron, 1991]. Rhythmic jokes contain at least three pulses. The number three is very popular in jokes. Very often three-pulse jokes contain repetitions. The first two pulses set a pattern for the third pulse, while the third pulse breaks the pattern. There are jokes where a sensation of automation is produced [Hetzron, 1991]. For example, the same answer to different questions creates an even rhythm, yet the answers may be contradictory. Sometimes automation is produced by trap-questions, where the answers come from a real audience, and all correct answers but the last sound similar. The audience gets into a pattern of answering in rhythm, and gives a rhythmic but incorrect answer to the last question. Sometimes pulses in jokes are less and less expected as the joke progresses: each pulse is funny, but the next one is even funnier. Sometimes pulses are not separate episodes, but parts of an enumeration [Hetzron, 1991]. An example of a joke with separate pulses is Joke21:

Joke21: A newspaper reporter goes around the world with his/her investigation. He/she stops people on the street and asks them: "Excuse me Sir/Madam, what is your opinion of the meat shortage?" An American asks: "What is shortage?" The Russian asks: "What is opinion?" The Pole asks: "What is meat?" The New York taxi driver asks: "What is excuse me?"

The other topic discussed in Hetzron's [1991] paper is devices that make punchlines work. Each joke can have one or more of such devices. Jokes can also be divided into two classes: one where an intended absolute meaning is taken to be relative, and one where a relative meaning is viewed as if it were absolute [Hetzron, 1991]. Joke22 illustrates an expected absolute identity replaced by a relative one [Hetzron, 1991].

Joke22: Upon seeing a funeral procession: "Who died?" "I believe the one in the hearse."

The next joke illustrates inherently relative used as absolute [Hetzron, 1991].

Joke23: A man approached a policeman walking his beat in the street: "Could you tell me please, where is the other side of the street?" The policeman points to it: "Over there." The man says: "It can't be there, they told me it was over here."

Joke2 is an example of expected relative identity used as absolute [Hetzron, 1991].

Besides being divided into classes, jokes can be divided into different categories [Hetzron, 1991]:


Reassignment in time
Reassignment to a universe of fulfilled frustration
Recombination
Dependency reversal
Independent existence claimed for dependent element
Knowledge of attachment preceded knowledge of base
Co-dependent > sub-dependent
Retro-communication
Foiled expectation
Indirect communication, information left unsaid
Mixing domains
Internal contradiction

Some of the devices are used in wordplay jokes. One of these devices is Recombination. Recombination can be described as "change your partner", or AB-CD > AD-CB [Hetzron, 1991]. An example that Hetzron provides is "Right wing" vs. "White ring".

Retro-Communication is another example where wordplay can be used:

Joke24: An American asks a Chinese man: "How often do you have elections?" The latter answers: "Evely day."


The devices can be combined in some jokes. For example, here is Hetzron's [1991] analysis of Joke17: First of all, this is a rhythmic joke. Then we have Retro-Communication concerning the anomalous fact that the officers mistake coasters for crackers. Indirect Communication behind this explains the mysterious disappearance of the coasters: they must have been eaten. The communication is also Retroactive.

2.4 Computational Humor

There is no single formalized theory that can be used by a software program to recognize or generate jokes. Several attempts have been made to write programs that can recognize [Yokogawa, 2002] [Takizawa, 1996] or generate [Attardo, 1993] [Binsted, 1996a] [McKay, 2002] [Stock, 2002] [McDonough, 2001] [Lessard, 1992] [Binsted, 1998] humorous text. They are a long way off from a computer making its mark as a stand-up comedian.

2.4.1 LIBJOG

Attardo and Raskin [Attardo, 1993] created a simple Light Bulb Joke Generator, LIBJOG. The computation of LIBJOG is theory-free. The program uses an entry for commonly stereotyped groups and Template1:

Template1: How many [lexical entry head] does it take to change a light bulb? [number1]. [number1 - number2] to [activity1] and [number2] to [activity2].

Entry1 is an illustration of the algorithm:

Entry1:

(Poles
  (activity1  hold the light bulb)
  (number1    five)
  (activity2  turn the table he's standing on)
  (number2    four))

This entry produces Joke25.

Joke25: How many Poles does it take to change a light bulb? Five. One to hold the light bulb and four to turn the table he's standing on.
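
The template-filling step is simple enough to sketch in Python. The dictionary entry below mirrors Entry1, while the number-word table and field names are assumptions made for the example; the actual LIBJOG implementation is not described at this level of detail.

    # Toy LIBJOG-style generator: fill Template1 from an entry.
    entry = {"group": "Poles",
             "activity1": "hold the light bulb", "number1": 5,
             "activity2": "turn the table he's standing on", "number2": 4}

    NUMBER_WORDS = {1: "One", 4: "four", 5: "Five"}

    def light_bulb_joke(e):
        diff = e["number1"] - e["number2"]   # the [number1 - number2] slot
        return ("How many {group} does it take to change a light bulb? "
                "{n1}. {d} to {activity1} and {n2} to {activity2}.").format(
                    group=e["group"], n1=NUMBER_WORDS[e["number1"]],
                    d=NUMBER_WORDS[diff], n2=NUMBER_WORDS[e["number2"]],
                    activity1=e["activity1"], activity2=e["activity2"])

    print(light_bulb_joke(entry))   # reproduces Joke25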

The joke-generating mechanism is very limited: while this joke is technically computer generated, it does not assemble or analyze any features of the joke.

2.4.2 JAPE

JAPE is a computer program created by Binsted [1996a] that generates simple punning riddles. JAPE uses word substitution, syllable substitution, and metathesis to create phonological ambiguity. One of the main features of JAPE is its humor-independent lexicon. Each produced joke has a setup and a punchline. The components that make up JAPE are: the lexicon, consisting of fifty-nine words and twenty-one NPs; the homophone base; six schemata; fourteen templates; and a post-production checker [Ritchie, 2003]. Some examples of the jokes produced by JAPE are:

Joke26: What do you call a quirky quantifier? An odd number.

Joke27: What's the difference between money and a bottom? One you spare and bank, the other you bare and spank.

Binsted and Ritchie [1994] note that there is not a lot of humor theory behind JAPE. However, Attardo argues that the program is largely congruent with GTVH [Attardo, 2002b].

2.4.3 Elmo

Elmo [Loehr, 1996] is a natural language robot. Dan Loehr integrated the JAPE pun generator and Elmo. The two are integrated on four different levels [Loehr, 1996]:

Upon a request to tell a joke or a riddle, Elmo issues a query for a pun from JAPE.
An attempt to make Elmo produce humor relevant to arbitrary user input.
Make Elmo produce humor relevant to user input, using a pre-chosen pun.
Make Elmo return a carefully scripted reply to achieve a smoother response.

An example of the third level of integration is given below [Loehr, 1996]:

No more help for me?
You say, "No more help for me?"
Elmo says, "I don't know what else to say, Dan."
Elmo says, "What do you call a useless assistant?"
Elmo says, "a lemon aide."

While the integration of the pun generator with the natural language interface was successful, it had difficulties producing relevant humor on arbitrary input.

2.4.4 WISCRAIC

Witty Idiomatic Sentence Creation Revealing Ambiguity In Context (WISCRAIC) [McKay, 2002] is a joke generator that focuses on witticisms based around idioms. This program produces jokes and explanations for the created jokes, making it possible for the program to be used as an aid for teaching English idioms to non-native speakers [McKay, 2002].

The program consists of three modules [McKay, 2002]:


Joke Constructer: the module that contains information about elements of a joke. This module uses a dictionary of idioms, a dictionary of professions, a general dictionary, and a lexicon.

Surface-form Generator: the module that uses a grammar to convert an input from the Joke Constructer into a joke.

Explanation Generator: this module takes the elements provided by the Joke Constructer and uses a grammar to generate an explanation of the relations between the elements.

An example of a WISCRAIC joke and explanation, given below, is taken from McKay [2002]:

The friendly gardener had thyme for the woman!

The word "time", which is part of the idiom [have, time, for, someone], is a homonym of the word "thyme". A HOMONYM is a word that sounds like another word.

LINK between thyme and gardener:
    thyme is a type of plant
    a gardener works with plants

"friendly", which is associated with the idiom [have, time, for, someone], was selected from other adjectives as it has the highest imagability score: 439

The program was tested with an 84% success rate on the WISCRAIC-produced jokes.

2.4.5 Ynperfect Pun Selector

Hempelmann [2003a] proposes the Ynperfect Pun Selector (YPS) as a complement to a general pun generator based on the General Theory of Verbal Humor. YPS would use heterophonic puns: puns that use a similar sound sequence. It would take any English word as its input and generate a set of words similar in sound, ordered by their phonological similarity. This output could then be entered into a general pun generator for evaluation of the semantic possibilities of the choices produced by YPS.

For example, "dime" can denote not just a 10-cent coin [daym] but paradigmatically also the meaning of "damn" [dæm], as in the slogan "Public transportation: It's a dime good deal." YPS's purpose here is to generate a range of phonologically possible puns given a target word: for example, to target "damn" one could use not only "dam" (barrier across a waterway) as a homophonic pun, but also the heterophonic candidates "dime" (as in the example above), "dome", "dumb", "damp", "tame", etc. In addition, the selector will evaluate the possible puns in a certain order of phonological proximity to their target [Hempelmann, 2003a].


The phonological comparison of a homophonic pun to its target is the only part of YPS that has been implemented.
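
The candidate-generation idea can be caricatured in Python by ranking vocabulary words by Levenshtein edit distance over spelling, a crude stand-in for the phonological similarity YPS computes over sounds; the word list below is invented for the example.

    # Rank vocabulary words by edit distance to a target word, as a rough
    # proxy for ordering pun candidates by phonological proximity.
    def edit_distance(a: str, b: str) -> int:
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            cur = [i]
            for j, cb in enumerate(b, 1):
                cur.append(min(prev[j] + 1,            # deletion
                               cur[j - 1] + 1,         # insertion
                               prev[j - 1] + (ca != cb)))  # substitution
            prev = cur
        return prev[-1]

    def rank_candidates(target: str, vocabulary):
        return sorted(vocabulary, key=lambda w: edit_distance(target, w))

    words = ["dam", "dime", "dome", "dumb", "damp", "tame", "table"]
    print(rank_candidates("damn", words))
    # closest spellings come first; a real system would compare phonemes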

2.4.6 HAHAcronym

HAHAcronym: Humorous Agents for Humorous Acronyms is the first computational humor project sponsored by the European Commission. One of the purposes of the project is to show that, using standard resources and suitable linguistic theories of humor, it is possible to implement a working prototype [Stock, 2002]. The main tool used is an incongruity detector/generator. To create a successful output, it is important to detect any variances in meanings between the acronym and its context. A basic resource for the incongruity generator is an independent structure of domain oppositions, such as Religion vs. Technology, Sex vs. Religion, etc. An ANT-based parsing technique was used to parse word sequences.

The project inputs an existing acronym and, after comparing the actual meaning and context, comes up with a humorous parody of it, using the algorithm below [Stock, 2002]:

Acronym parsing and construction of a logical form
Choice of what to keep unchanged (typically the head of the highest-ranking NP) and what to modify (e.g., adjectives)
Look for possible substitutions:
    o Using semantic field opposition
    o Keeping the initial letter, rhyme, and rhythm (the modified acronym should sound as similar to the original as possible)
    o For adjectives, basing reasoning mainly on WordNet antonymy clustering
    o Using an a-semantic dictionary

Some of the examples of the acronym reanalysis are [Stock, 2002]:

MIT (Massachusetts Institute of Technology) → Mythical Institute of Theology

ACM (Association for Computing Machinery) → Association for Confusing Machinery

At the time the paper was published, there was no evaluation of the competed prototype.

2.4.7 MSG

The Mnemonic Sentence Generator (MSG) [McDonough, 2001] is a program built by Craig McDonough that converts any alphanumeric password into a humorous sentence. One of the reasons for creating the program is that passwords are now an integral part of everyday life. The simpler a password is, the easier it is to guess. Good passwords consisting of both alphabetic and numeric characters are difficult to remember. The system takes an eight-character alphanumeric string and turns it into a sentence, possibly making it easier to remember than the password itself, which is one of the main requirements. The sentence template consists of two clauses of four words each [McDonough, 2001]:

Template2: (W1 = Person-Name) + (W2 = Positive-Verb) + (W3 = Person-Name + 's) + (W4 = Common-Noun) + ", while" + (W5 = Person-Name) + (W6 = Negative-Verb) + (W7 = Person-Name + 's) + (W8 = Common-Noun)

The program combines opposite scripts by using a positive verb in the first clause and a negative verb in the second clause. An example of what the program can do is the following sentence, generated from the password AjQA3Jtv: "Arafat joined Quayle's Ant, while TARAR Jeopardized thurmond's vase" [McDonough, 2001].
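
A toy sketch of the idea in Python follows. The per-character word tables below are hypothetical stand-ins for MSG's dictionaries, seeded so that the example password reproduces the published sentence; the real system's selection rules are not described here.

    # Hypothetical MSG-style generator: each password character picks a word
    # from a per-character list, slotted into the two-clause template.
    import random

    NAMES =     {"A": ["Arafat"], "Q": ["Quayle"], "3": ["TARAR"], "t": ["thurmond"]}
    POS_VERBS = {"j": ["joined"]}
    NEG_VERBS = {"J": ["Jeopardized"]}
    NOUNS =     {"A": ["Ant"], "v": ["vase"]}

    def mnemonic(pw: str) -> str:
        assert len(pw) == 8, "MSG expects an eight-character password"
        pick = lambda table, ch: random.choice(table.get(ch, ["?" + ch]))
        w = [pick(NAMES, pw[0]), pick(POS_VERBS, pw[1]),
             pick(NAMES, pw[2]) + "'s", pick(NOUNS, pw[3]),
             pick(NAMES, pw[4]), pick(NEG_VERBS, pw[5]),
             pick(NAMES, pw[6]) + "'s", pick(NOUNS, pw[7])]
        return "{} {} {} {}, while {} {} {} {}".format(*w)

    print(mnemonic("AjQA3Jtv"))
    # "Arafat joined Quayle's Ant, while TARAR Jeopardized thurmond's vase"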

2.4.8 Tom Swifties

Lessard and Levison [Lessard, 1992] have made an attempt to model a particular type of linguistic humor, Tom Swifties, by means of a sentence generator. Tom Swifties are pun-like utterances ascribed to the character Tom, in which a manner adverb enters into a formal and semantic relation with the other elements in the sentence.

Joke28: "I hate seafood," Tom said crabbily.

43

Everything produced by this generator is in the form of Template3:

Template3: SENTENCE said Tom ADV[manner]

The adverb in Template3 must have a phonetic link to the meaning of at least one word in the SENTENCE and be semantically related to it. Thus, in Joke28, there is a semantic relationship between the words "seafood" and "crab" and a phonetic link between the words "crab" and "crabbily" [Lessard, 1992].

2.4.9 Jester

Jester [Goldberg, 2001] is an online joke-recommending system based on collaborative filtering. A user is given fifteen jokes to rate. After the user reads and evaluates the jokes, the system uses statistical techniques to recommend jokes based on the user's ratings of the sample. To rate items, users are asked to click their mouse on a horizontal ratings bar, which returns scalar values [Goldberg, 2001]. The system then finds a nearest neighbor to the user's ratings and recommends the next joke from the list of that nearest neighbor. The program does not use any linguistic theories of humor, but does take into account a user's sense of humor.

2.4.10 Applications in Japanese

There has been some work in computational humor in Japanese.


Yokogawa [2002] proposes a Japanese pun analyzer, which is based on the hypothesis that similarity of articulation matches similarity of sounds. The system has four steps:

Morphological analysis
Connection check
Generation of similar expressions
Pun candidate check

Experimental results show that the system is able to recognize about 50% of ungrammatical pun sentences.

Binsted and Takizawa [Binsted, 1998] implemented a simple model of puns in a program BOKE, which generates puns in Japanese. BOKE is similar to JAPE: the programs differ in the lexicon and the templates that are used to generate the text, but the punning mechanisms are the same.

Takizawa has also implemented a pun-detecting program for Japanese, which accepts a sequence of phonemic symbols and produces possible analyses of it in terms of sequences of Japanese words, rating each word sequence with the likelihood that it is a pun, based on various heuristics [Ritchie, 1998].


3 Statistical Measures in Language Processing

A joke generator has to be able to construct meaningful sentences, while a joke recognizer has to recognize them. While joke generation involves limited world knowledge, joke recognition requires much more extensive world knowledge.

To be able to recognize or generate jokes, a computer should be able to process sequences of words. A tool for this activity is the N-gram, one of the oldest and most broadly useful practical tools in language processing [Jurafsky, 2000]. An N-gram is a model that uses conditional probability to predict the Nth word based on the N-1 previous words. N-grams can be used to store sequences of words for a joke generator or a recognizer.

3.1 N-grams

N-grams are typically constructed from statistics obtained from a large corpus of text, using the co-occurrences of words in the corpus to determine word sequence probabilities [Brown, 2001]. As a text is processed, the probability of the next word N is calculated, taking into account the end of a sentence if it occurs before word N.

The probabilities in a statistical model like an N-gram come from the corpus it is trained on. This training corpus needs to be carefully designed. If the training corpus is too specific to the task or domain, the probabilities may be too narrow and not generalize well to new sentences. If the training corpus is too general, the probabilities may not do a sufficient job of reflecting the task or domain [Jurafsky, 2000].

A bigram is an N-gram with N=2, a trigram is an N-gram with N=3, etc. A bigram model will use one previous word to predict the next word, and a trigram will use two previous words to predict the word.

As bigram probability is conditional, the formula for bigram probability is:

p(A|B) = p(A and B)/p(B)**

To calculate p(B), the following formula can be used:

p(B) = (number of occurrences of B in the text)/(number of words in the text)

Similarly,

p(A and B) = (number of occurrences of A and B in the text)/(number of words in the text)

This means that p(A|B) is:

p(A|B) = (number of occurrences of A and B in the text)/(number of occurrences of B in the text)

** For more information on conditional probability, see http://www.mathgoodies.com/lessons/vol6/conditional.html


To show how N-grams work, consider Joke18 and Joke21 as a training corpus. To simplify the example, the characters ., !, and ? are replaced with an end-of-sentence tag. Quotes, commas, and colons are dropped. The corpus becomes:

A newspaper reporter goes around the world with his/her investigation He/she stops people on the street and asks them Excuse me Sir/Madam what is your opinion of the meat shortage An American asks What is shortage The Russian asks What is opinion The Pole asks What is meat The New York taxi-driver asks What is excuse me A woman goes to the rabbi Rabbi what shall I do so that I wouldn't become pregnant again The rabbi says Drink a glass of cold water Before or after The rabbi replies Instead

Suppose the word that most probably follows the word what should be found. To find this word, bigrams can be used.

After examining the training corpus, the following sequences are discovered: what is, what is, what is, what is, what is, what shall. This means that what is occurs 5 times, what shall occurs once, and what occurs 6 times. Plugging this into the formula for p(A|B), the probabilities are: p(is|what) = 5/6; p(shall|what) = 1/6. Therefore, using the corpus of Joke18 and Joke21, the word is should follow the word what.


If the above example used only Joke18 as its training corpus, the answer would, of course, be different, as what is would not be present in the corpus. This shows that the choice of a training corpus is crucial to the results of experiments.
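The calculation above is easy to automate. What follows is a minimal Python sketch of bigram probability estimation over a whitespace-tokenized corpus; the toy text merely stands in for the Joke18/Joke21 corpus, and the function name is illustrative:

    from collections import Counter

    def bigram_probability(words, prev, nxt):
        # p(nxt|prev) = count(prev nxt) / count(prev), as in Section 3.1
        pair_counts = Counter(zip(words, words[1:]))
        word_counts = Counter(words)
        if word_counts[prev] == 0:
            return 0.0
        return pair_counts[(prev, nxt)] / word_counts[prev]

    # Toy stand-in for the two-joke corpus:
    text = ("what is shortage what is opinion what is meat "
            "what is excuse what is your what shall I do").split()

    print(bigram_probability(text, "what", "is"))     # 5/6
    print(bigram_probability(text, "what", "shall"))  # 1/6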

3.2 Distant N-grams

The N-gram model is used to predict a word based on the preceding N-1 words because most of the relevant syntactic information can be expected to lie in the immediate past. However, some relevant words may be at a greater distance than a regular N-gram model covers [Huang, 1993]. Distant or skip N-grams are used to cover long-range dependencies with N-gram models with a small N. This is done by introducing a gap of a certain length between a word and its history [Brown, 2001].

Consider a word sequence from Joke4: doctor's young and pretty wife. Unlike regular N-grams, distant N-grams with a gap of two will be able to capture not only the pretty wife dependency, but also young wife.

To illustrate how distant N-grams work, consider the training corpus of Joke18 and Joke21. The task is to find the most probable word after the word what, as was done with regular N-grams. This time, a distant bigram with a gap of one will be used.

In addition to what is and what shall, the following sequences will be taken into account: what your, what shortage, what opinion, what meat, what excuse, what I. This means that p(is|what) = 5/12, p(shall|what) = 1/12, p(your|what) = 1/12, p(shortage|what) = 1/12, p(opinion|what) = 1/12, p(meat|what) = 1/12, p(excuse|what) = 1/12, p(I|what) = 1/12. In this example, the result stays the same: the word is should follow what. However, it may not always be the case.
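A sketch of the same estimate with a skip component follows; it counts adjacent pairs together with pairs one position apart, which reproduces the 5/12 result above (the helper name is illustrative):

    from collections import Counter

    def distant_bigram_probability(words, prev, nxt, gap=1):
        # Count pairs separated by 1 .. gap+1 positions.
        pairs = Counter()
        history = Counter()
        for d in range(1, gap + 2):
            for a, b in zip(words, words[d:]):
                pairs[(a, b)] += 1
                history[a] += 1
        return pairs[(prev, nxt)] / history[prev] if history[prev] else 0.0

    text = ("what is shortage what is opinion what is meat "
            "what is excuse what is your what shall I do").split()

    print(distant_bigram_probability(text, "what", "is"))  # 5/12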

4 Possible Methods for Joke Recognition

Section 2.4 shows that there are very few approaches to computational humor generation, and almost none to computational recognition of humor. This may be partly due to the absence of a well-described algorithm.

4.1 Simple Statistical Method

One approach to computational joke recognition is building a simple statistical joke recognizer. Using Suls' two-stage model [Suls, 1972], a joke recognizer may be able to tell if texts are jokes by checking for a conflict with prediction. To check if there is a conflict with the prediction, N-grams can be used. For example, in Joke29, the expected reply is the room number, not the lobby. The N-grams may be able to predict the room number, whereas the lobby may have a low probability of occurrence.

Joke29: Hotel clerk: Welcome to our hotel.
Max: Thank you. Can you tell me what room I'm in?
Clerk: The lobby.

The first step is to determine which N will give the most accurate result. This can be done by inputting a sample of jokes into a learner. Once N is determined, the N-gram model can be used to check the probability of the next word or the next phrase. If the probability is low enough, assume that there is a conflict with prediction, and call the text a joke.
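A minimal sketch of this test follows; the threshold value is an assumption, and bigram_prob stands for any probability function such as the one sketched in Section 3.1:

    def looks_like_joke(words, bigram_prob, threshold=0.01):
        # Flag the text if some next-word transition is improbable
        # enough to suggest a broken prediction.
        return any(bigram_prob(prev, nxt) < threshold
                   for prev, nxt in zip(words, words[1:]))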

4.2 Punchline Detector

A second possible approach is a punchline detector. Once again, Suls' [1972] two-stage model is used. However, while the first approach starts with a text and concludes whether the text is or is not a joke, this approach starts with a known joke and finds a punchline in it.

A human expert checks if the punchline is correctly identified. In this case, a punchline is defined as a sentence that breaks the prediction. Once again, an N-gram model can be used. The punchline of a joke will then be the sentence containing the utterance with the lowest probability.

Results can be tested by comparing the output of a punchline recognizer with the results from a program that randomly chooses the second or third sentence of a joke to be the punchline. A recognizer is considered successful if the number of punchlines correctly identified by the recognizer exceeds the number of punchlines drawn correctly in a random sample.

(Joke29 taken from The Original 365 Jokes, Puns & Riddles Calendar, 2004)


4.3 Restricted Context

For an initial investigation, the first two approaches would be overly broad. To restrict the problem, a restricted-context approach was used, and the limited domain of Knock Knock jokes was examined.

A typical Knock Knock (KK) joke is a dialog between two people that uses wordplay in the punchline. Recognizing humor in a KK joke arises from recognizing the wordplay. A KK joke can be summarized using the following structure:

Line1: Knock, Knock
Line2: Who is there?
Line3: any phrase
Line4: Line3 followed by who?
Line5: One or several sentences containing one of the following:
  Type1: Line3
  Type2: a wordplay on Line3
  Type3: a meaningful response to Line3

Joke30 is an example of Type1, Joke31 is an example of Type2, and Joke32 is an example of Type3.

Joke30: Knock, Knock
Who is there?
Water
Water who?
Water you doing tonight?

Joke31: Knock, Knock
Who is there?
Ashley
Ashley who?
Actually, I don't know.

Joke32: Knock, Knock
Who is there?
Tank
Tank who?
You are welcome.
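The five-line structure above is regular enough to check mechanically. The following is a rough Python sketch of such a format check; the regular expressions and the function name are assumptions for illustration, not the code used in this thesis:

    import re

    def parse_kk_format(lines):
        # Return (keyword, punchline) if the text matches the KK
        # structure above, otherwise None.
        if len(lines) < 5:
            return None
        l1, l2, l3, l4 = (l.strip().lower() for l in lines[:4])
        if not re.fullmatch(r"knock,?\s*knock!?", l1):
            return None
        if not re.fullmatch(r"who'?s? ?(is )?there\??", l2):
            return None
        if not re.fullmatch(re.escape(l3) + r"\s*who\??", l4):
            return None
        return lines[2].strip(), lines[4].strip()

    # parse_kk_format(["Knock, Knock", "Who is there?", "Water",
    #                  "Water who?", "Water you doing tonight?"])
    # returns ("Water", "Water you doing tonight?")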

From a theoretical point of view, both Raskin's [1985] and Suls' [1972] approaches can explain why Joke30 is a joke. From Raskin's approach, the explanation is: water and what are belong to different scripts that overlap in the phonetic representation of water, but also oppose each other. From Suls' approach, the explanation is: what are conflicts with the prediction. In this approach, a cognitive rule can be described as a function that finds a phrase that is similar in sound to the word water, and that fits correctly into the beginning of the final sentence's structure. This phrase is what are for Joke30.

Source: http://www.azkidsnet.com/JSknockjoke.htm

5 Experimental Design

A further tightening of the focus was to attempt to recognize only one type of KK joke. In this case, not all Knock Knock jokes were expected to be recognized. The program was only expected to recognize Type1 jokes. This means that it did not recognize a joke unless Line5 contained Line3 when the joke was read, and a wordplay of Line3 made sense in Line5. In the context of this program, a wordplay is defined as a meaningful phrase that sounds similar to the original phrase, but has a different spelling. The original phrase, in this case Line3, is referred to as the keyword. For example, only Joke30 from the set of KK jokes in Section 4.3 was expected to be recognized.

There are at least three ways of determining sound alike short utterances:

Use a dictionary of sound-alike utterances
Dynamically access a pronouncing dictionary, such as The American Heritage Dictionary, and search it
Computationally build up sound-alike utterances as needed

The only feasible method for this project was building up utterances as needed.


The joke recognition process has four steps:

Step1: joke format validation
Step2: generation of wordplay sequences
Step3: wordplay sequence validation
Step4: last sentence validation

Once Step1 is completed, the wordplay generator generates utterances similar in pronunciation to Line3. Step3 only checks if the wordplay makes sense, without touching the rest of the punchline. It uses a bigram table for validation. Only meaningful wordplays are passed from Step3 to Step4.

If the wordplay is not at the end of the punchline, Step4 takes the last two words of the wordplay and checks if they make sense with the first two words of the text following the wordplay in the punchline, using two trigram sequences. If the wordplay occurs at the end of the sentence, the last two words before the wordplay and the first two words of the wordplay are used for joke validation. If Step4 fails, the process goes back to Step3 or Step2 and continues the search for another meaningful wordplay.

It is possible that the first three steps return valid results, but Step4 fails; in this case, the text is not considered a joke.
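Under these definitions, the four steps chain together roughly as follows. This is glue code only; the helper functions are placeholders for the components described in Sections 6 through 8 and are passed in as parameters:

    def recognize_joke(text_lines, parse_format, generate_wordplays,
                       is_meaningful_wordplay, punchline_is_valid):
        parsed = parse_format(text_lines)                       # Step1
        if parsed is None:
            return False
        keyword, punchline = parsed
        for wordplay, _similarity in generate_wordplays(keyword):  # Step2
            if not is_meaningful_wordplay(wordplay):                # Step3
                continue
            if punchline_is_valid(wordplay, punchline):             # Step4
                return True
        return False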


The punchline recognizer is designed so that it does not have to validate the grammatical structure of the punchline. Moreover, it is assumed that Line5 is meaningful when the expected wordplay is found, if the text is a joke; and that Line5 is meaningful as is, if the text is not a joke. In other words, a human expert should be able either to find a wordplay such that the last sentence makes sense, or to conclude that the last sentence is meaningful without any wordplay. It is assumed that the last sentence is not a combination of words without any meaning.

The joke recognizer is to be trained on a number of jokes, and tested on twice that number of jokes. The jokes in the test set are previously unseen by the computer. This means that any joke identical to a joke in the set of training jokes is not included in the test set.

6 Generation of Wordplay Sequences

Given a spoken utterance A, it is possible to find an utterance B that is similar in pronunciation by changing letters from A to form B. Sometimes, the corresponding utterances have different meanings. Sometimes, in some contexts, the differing meanings might be humorous if the words were interchanged.

A repetitive replacement process is used for generation of wordplay sequences. Suppose a letter a1 from A is replaced with b1 to form B. For example, in Joke33, if the letter i in the word kip is replaced with ee, the new word, keep, sounds similar to kip.


Joke33: --Knock, Knock!
--Who's there?
--Kip
--Kip who?
--Kip your hands off me.

A table containing combinations of letters that sound similar in some words, along with their similarity values, was used. In this paper, this table will be referred to as the Similarity Table. Table2 shows a subset of its entries. The Similarity Table was derived from a table developed by Frisch [1996]. Frisch's table, shown in Appendix E, contains cross-referenced English consonant pairs along with a similarity for each pair based on the natural classes model. Frisch's table was heuristically modified and extended into the Similarity Table by translating phonemes to letters and adding pairs of vowels that are close in sound. Other phonemes, translated to combinations of letters, were added to the table as needed to recognize wordplay from the set of training jokes, shown in Appendix B.

The resulting Similarity Table approximately shows the similarity of sounds between different letters, or between letters and combinations of letters. A heuristic metric indicating how closely they sound to each other was either taken from Frisch's table or assigned a value close to the average of Frisch's similarity values. The purpose of the Similarity Table is to help computationally develop sound-alike utterances that have different meanings.*** The Similarity Table should be taken as a collection of heuristic satisficing values that might be refined through additional iteration.

*** The complete table can be seen at homepages.uc.edu/~slobodjm/thesis/sim.table.pdf

Letters to be replaced   Letters to be replaced with   Similarity
a                        e                             0.23
a                        o                             0.23
e                        a                             0.23
e                        i                             0.23
e                        o                             0.23
en                       e                             0.23
k                        sh                            0.11
l                        r                             0.56
r                        m                             0.44
r                        re                            0.23
t                        d                             0.39
t                        th                            0.32
t                        z                             0.17
w                        m                             0.44
w                        r                             0.42
w                        wh                            0.23

Table2: Subset of entries of the Similarity Table, showing similarity of sounds in words between different letters

When an utterance A is read by the wordplay generator, each letter in A is replaced with each corresponding replacement from the Similarity Table. Each new string is assigned its similarity to the original word A.

All new words are inserted into a heap, ordered according to their similarity value, greatest on top. When only one letter in a word is replaced, its similarity value is taken from the Similarity Table. The similarity value of the strings is calculated using the following heuristic formula:

similarity of string = number of unchanged letters + sum of similarities of each replaced entry from the table

Note that the similarity values of letters are taken from the Similarity Table. These values differ from the similarity values of strings.

Once all possible one-letter replacement strings are found and inserted into the heap according to their string similarity, the first step is complete.

The next step is to remove the top element of the heap. This element has the highest similarity to the original word. If this element can be decomposed into an utterance that makes sense, this step is complete. If the element cannot be decomposed, each letter of the string, except for the letter that was replaced originally, is replaced again.

Once all possible replacements of a second letter are done, and all newly constructed strings are inserted according to their similarity, the top element is removed. Just as in step one, if the string from the top element can be decomposed into a meaningful phrase, the step is complete. If it cannot, the unchanged letters of the top element are replaced. If all letters have already been replaced, the next top element is removed.
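The replacement search can be summarized in a few lines of Python. The sketch below is a simplification under stated assumptions: only single-letter sources from the Similarity Table are shown, the SIMILARITY subset and the is_meaningful hook are stand-ins, and already-replaced segments are never replaced again, as described above:

    import heapq

    SIMILARITY = {("w", "wh"): 0.23, ("e", "a"): 0.23, ("r", "re"): 0.23,
                  ("a", "e"): 0.23, ("t", "th"): 0.32}  # tiny subset

    def generate_wordplays(keyword, is_meaningful):
        # Each state is a tuple of (segment, was_replaced) pairs.
        start = tuple((c, False) for c in keyword)
        heap = [(-float(len(keyword)), start)]   # max-heap via negation
        seen = {start}
        while heap:
            neg_sim, segs = heapq.heappop(heap)
            word = "".join(s for s, _ in segs)
            if is_meaningful(word):
                yield word, -neg_sim
                continue
            for i, (seg, done) in enumerate(segs):
                if done:
                    continue  # never re-replace a replaced letter
                for (src, dst), sim in SIMILARITY.items():
                    if seg == src:
                        cand = segs[:i] + ((dst, True),) + segs[i + 1:]
                        if cand not in seen:
                            seen.add(cand)
                            # one kept letter becomes a replaced entry
                            heapq.heappush(heap, (neg_sim + 1 - sim, cand))

    # With a suitable is_meaningful check, "water" eventually yields
    # "whatare", decomposable into "what are".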


Consider Joke30 as an example. The joke fits the typical KK joke pattern. The next step is to generate utterances similar in pronunciation to water.

Table3 shows some of the strings received after one-letter replacements of water in Joke30. The second column shows the similarity of each string to the original word.

New String   String Similarity to water
watel        4.56
mater        4.44
watem        4.44
rater        4.42
wader        4.39
wather       4.32
watar        4.23
watir        4.23
wator        4.23
weter        4.23
whater       4.23
woter        4.23
wazer        4.17

Table3: Examples of strings received after replacing one letter from the word water, and their similarity values to water

Suppose the top element of the heap is watel, with a similarity value of 4.56. Watel cannot be decomposed into a meaningful utterance. This means that each letter of watel except for l will be replaced again. The newly formed strings will be inserted into the heap in order of their similarity values. The letter l will not be replaced, as it is not an original letter from water. The string similarity of the newly constructed strings will most likely be less than 4. (The only way the similarity of a newly constructed string can be greater than 4 is if the similarity of the replaced letter is above 0.44, which is unlikely.) This means that they will be placed below wazer. The next top string, mater, is removed. Mater is a word; however, it does not work in the sentence Mater you doing. (See Sections 7 and 8 for further discussion.) The process continues until whater is the top string. The replacement of e in whater with a will result in whatar. Eventually, whatar will become the top string, at which point r will be replaced with re to produce whatare. Whatare can be decomposed into what are by inserting a space between t and a. The next step will be to check if what are is a valid word sequence.

Generated wordplays that were successfully recognized by the wordplay recognizer, and their corresponding keywords, are stored for future use by the program. When the wordplay generator receives a new request, it first checks if wordplays have been previously found for the requested keyword. New wordplays are generated only if there is no wordplay match for the requested keyword, or the already-found wordplays do not make sense in the new joke.

7 Wordplay Recognition

A wordplay sequence is generated by replacing letters in the keyword (as defined in Section 4.3). The keyword is examined because, if there is a joke based on wordplay, the phrase that the wordplay is based on will be found in Line3; Line3 is the keyword. The wordplay generator generates a string that is similar in pronunciation to the keyword.


This string, however, may contain real words that do not make sense together. The wordplay recognizer determines if the output of the wordplay generator is meaningful.

A database with a bigram table is used to contain every discovered two-word sequence along with the number of its occurrences, also referred to as the count. Any sequence of two words will be referred to as a word-pair. Another table in the database, the trigram table, contains each three-word sequence and its count.

The wordplay recognizer queries the bigram table. The joke recognizer, discussed in Section 8, Joke Recognition, queries the trigram table.

To construct the database, several large focused texts were used. The focus was at the core of the training process. Each selected text contained a wordplay on the keyword (Line3) and the two words from the punchline that follow the keyword for at least one joke from the set of training jokes. If more than one text containing a given wordplay was found, the text with the closest overall meaning to the punchline was selected. Arbitrary texts were not used, as they did not contain a desired combination of wordplay and part of a punchline.

The bigram table was constructed such that every pair of words occurring in the selected text was entered into the table. If the table did not contain the newly input pair, it was inserted with 1 for the count. If the table already contained the pair, the count was incremented.
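This insert-or-increment logic might look as follows; the SQLite schema and the function name are assumptions, since the thesis only specifies word pairs stored with counts:

    import sqlite3

    def build_bigram_table(words, conn):
        conn.execute("""CREATE TABLE IF NOT EXISTS bigram
                        (w1 TEXT, w2 TEXT, count INTEGER,
                         PRIMARY KEY (w1, w2))""")
        for w1, w2 in zip(words, words[1:]):
            # Insert a new pair with count 1, or increment an existing one.
            conn.execute("""INSERT INTO bigram VALUES (?, ?, 1)
                            ON CONFLICT(w1, w2)
                            DO UPDATE SET count = count + 1""", (w1, w2))
        conn.commit()

    conn = sqlite3.connect(":memory:")
    build_bigram_table("what are you doing what are".split(), conn)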


The concept of this wordplay recognizer is similar to an N-gram. An N-gram is a model where the next word depends on a number of previous words. For a wordplay recognizer, the bigram model is used (an N-gram with N equal to two).

The output from the wordplay generator was used as input for the wordplay recognizer. An utterance produced by the wordplay generator is decomposed into a string of words. Each word, together with the following word, is checked against the database. Suppose a wordplay to be checked is an m-word string. The first two words, w1 and w2, are checked first. If the sequence <w1, w2> occurs in the database, the second and third words are checked. The wordplay is invalid if <wi-1, wi> does not occur in the database for any 1 < i <= m.

An N-gram determines for each string the probability of that string in relation to all other strings of the same length. As a text is examined, the probability of the next word is calculated. The wordplay recognizer keeps the number of occurrences of each word sequence, which can be used to calculate the probability. A sequence of words is considered valid if there is at least one occurrence of the sequence anywhere in the text. The count and the probability are used if there is more than one possible wordplay. In this case, the wordplay with the highest probability will be considered first.

For example, in Joke30, what are is a valid combination if are occurs immediately after what somewhere in the text.
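Given the bigram table above, the pairwise check can be sketched directly (again assuming the illustrative SQLite schema):

    def bigram_valid(wordplay_words, conn):
        # Every adjacent pair must occur at least once in the table.
        for w1, w2 in zip(wordplay_words, wordplay_words[1:]):
            row = conn.execute(
                "SELECT count FROM bigram WHERE w1 = ? AND w2 = ?",
                (w1, w2)).fetchone()
            if row is None:
                return False
        return True

    # bigram_valid(["what", "are"], conn) is True when "what are"
    # occurred somewhere in the training texts.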


8 Joke Recognition

A text with valid wordplay is not a joke if the rest of the punchline does not make sense. For example, if the punchline of Joke30 is replaced with just Water, a text for which the wordplay is still valid, the resulting text is not a joke. Therefore, there has to be a mechanism that can validate that the found wordplay is compatible with the rest of the punchline and makes it a meaningful sentence.

A concept similar to a trigram was used to validate the last sentence. A trigram is an N-gram with N equal to three. All three-word sequences are stored in the trigram table.

The same training set was used for both the wordplay and joke recognizers. The difference between the wordplay recognizer and the joke recognizer was that the wordplay recognizer used pairs of words for its validation, while the joke recognizer used three words at a time. As the training text was read, each newly read word and the two following words were inserted into the trigram table. If the newly read combination was in the table already, the count was incremented.

As the wordplay recognizer had already determined that the wordplay sequences existed, there was no reason to revalidate the wordplay. The wordplay could occur at the beginning of the punchline, in the middle of the punchline, or at the end of the punchline. Depending on the location of the wordplay, different steps were taken to validate the punchline.

8.1 Wordplay in the Beginning of a Punchline

To check if wordplay makes sense at the beginning of a punchline, the last two words of the wordplay, wwp1 and wwp2, are used, for a wordplay that is at least two words long. If the punchline is valid, the sequence of wwp1, wwp2, and the first word of the remainder of the sentence, ws, should be found in the training text. If the sequence <wwp1 wwp2 ws> occurs in the trigram table, this combination is found in the training set, and the three words together make sense. If the sequence is not in the table, either the training set is not accurate, or the wordplay does not make sense in the punchline. In either case, the computer does not recognize the joke. If the previous check was successful, or if the wordplay has only one word, the last check can be performed. The last step involves the last word of the wordplay, wwp, and the first two words of the remainder of the sentence, ws1 and ws2. If the sequence <wwp ws1 ws2> occurs in the trigram table, the punchline is valid, and the wordplay fits with the rest of the final sentence.

For example, Joke30 has a wordplay at the beginning of the punchline. To examine if the joke is valid, the sequences <what are you> and <are you doing> are checked. If both sequences occur in the trigram table, the joke is valid. It may seem that both sequences are common. However, no matter how common the sequences are, they may not be in the text. Then the joke is considered invalid, regardless of the correctly identified wordplay.
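This two-lookup check can be sketched directly, assuming a trigram table in the same illustrative SQLite schema used earlier:

    def trigram_in_table(w1, w2, w3, conn):
        return conn.execute(
            "SELECT 1 FROM trigram WHERE w1 = ? AND w2 = ? AND w3 = ?",
            (w1, w2, w3)).fetchone() is not None

    def punchline_valid_at_start(wordplay_words, rest_words, conn):
        wp, ws = wordplay_words, rest_words
        # Two-word or longer wordplays: <wwp1 wwp2 ws> must be known.
        if len(wp) >= 2 and not trigram_in_table(wp[-2], wp[-1], ws[0], conn):
            return False
        # Always: <wwp ws1 ws2> must be known.
        return trigram_in_table(wp[-1], ws[0], ws[1], conn)

    # For Joke30: punchline_valid_at_start(["what", "are"],
    # ["you", "doing", "tonight"], conn) checks <what are you>
    # and <are you doing>.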

Suppose the wordplay recognizer returned the word waiter as the wordplay for water. The joke recognizer would examine <waiter you doing> and conclude that it does not produce a meaningful sentence. In this case, the wordplay recognizer would have to search for another wordplay.

If the wordplay recognizer found more than one wordplay that produced a joke, the wordplay resulting in the highest trigram sequence probability was used.

8.2 Wordplay at the End of a Punchline

To check if wordplay makes sense at the end of a punchline, the last two words of the punchline before the wordplay, ws1 and ws2, and the first word of the wordplay, wwp, are used. If the sequence <ws1 ws2 wwp> occurs in the trigram table, the combination is valid. If the found wordplay is at least two words long, the first two words of the wordplay, wwp1 and wwp2, and the last word of the punchline before the wordplay, ws, are used to determine if the entire punchline makes sense. If the sequence <ws wwp1 wwp2> is in the trigram table, this combination was found in the training set, and the punchline is valid. If either the first or the second sequence is not in the table, the training text did not have the combination that was needed, and the computer will not recognize the joke.


8.3 Wordplay in the Middle of a Punchline

To check if a punchline with wordplay in the middle is valid, the checks for wordplay at the beginning and at the end of the punchline are both performed. If any of these checks fail, the computer does not recognize the joke.

9 Training Text

A collection of texts is used to populate the database with bigram and trigram tables.

As a text is read, every word and the following word, or two following words, are inserted into the database. The number of words in a sequence depends on whether the text is used for the wordplay or the punchline recognizer. If the newly read combination already exists in the database, the count is incremented. If the combination is not in the database yet, it is entered.

A simple statistical approach to sentence parsing was used. It was decided to ignore the punctuation marks that normally do not terminate sentences.

9.1 First Approach

If a new word is followed by one of the following non-sentence-terminating characters: "," "-" ":" "--", the word is inserted into the database without the character. If a word is followed by one of the following terminating characters: "." "!" "?" ";", the sentence is considered to be terminated. The next sequence to be inserted will start from the beginning of the next sentence. For example:

A B, C D E. F G H I!

The punchline recognizer uses three-word sequences. The above sentences will result in the following database insertions for use by the punchline recognizer:

(A B C) (B C D) (C D E) (F G H) (G H I)
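A small Python sketch reproducing these insertions follows; the splitting regexes are assumptions consistent with the character lists above:

    import re

    def first_approach_trigrams(text):
        triples = []
        # Split into sentences on . ! ? ; and drop , - : -- inside them.
        for sentence in re.split(r"[.!?;]", text):
            words = re.sub(r"--|[,:\-]", " ", sentence).split()
            triples.extend(zip(words, words[1:], words[2:]))
        return triples

    # first_approach_trigrams("A B, C D E. F G H I!") yields
    # (A B C) (B C D) (C D E) (F G H) (G H I)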

9.2 Second Approach

An alternative to the above is the following insertions:

(A B ,) (B , C) (C D E) (D E .) (E . F) (F G H) (G H I) (H I !)

The difference between the first and the second approaches is in entering punctuation marks into the database. The first does not enter any punctuation marks into the database: it ignores them in the middle of a sentence and stops when the end of the sentence is reached. The drawback of the first approach is the absence of a relationship between E and F. The second approach enters punctuation marks into the database as valid words if they occur in the second or third position. While it catches the end of a sentence, and in some cases intonation (when ! is entered), it does not enter some of the combinations entered by the first model. (B C D) is an example of a combination not inserted into the database by the second approach.

9.3 Third Approach

A third alternative is the first and the second combined. It contains the largest number of entries to the database:

(A B C) (A B ,) (B C D) (B , C) (C D E) (D E .) (E . F) (F G H) (G H I) (H I !)

As commas are sometimes optional, they can be dropped, resulting in the following insertions: (A B C) (B C D) (C D E) (D E .) (E . F) (F G H) (G H I) (H I !)

The larger number of combinations makes it more likely that valid word combinations are identified, but it also slows down the program.


9.4 Fourth Approach

Entering punctuation marks into the database has another drawback. Sometimes, different texts used for bigram and trigram table generation do not follow standard punctuation rules. Punctuation rules are not necessarily followed in Knock Knock jokes. A sequence P Q. R S in one joke or text may become P Q; R S in another joke or text. For this reason, the fourth approach is used. This approach replaces all punctuation marks with an arbitrarily chosen symbol x. It also removes entries with x in the middle of a sequence, as such an entry shows a dependence that may not hold. There may be sentences Y and Z in a text T that are interchangeable; that is, the relative position of Y and Z does not change the meaning of T. Adding the sequence (last word of Y, x, first word of Z) does not reflect an actual dependency, just as adding (last word of Z, x, first word of Y) does not.

After removing sequences with x in the second position, the fourth approach results in the following insertions:

(A B C) (B C D) (C D E) (D E x) (F G H) (G H I) (H I x)


The joke recognizer uses the fourth approach for the N-gram database, as it seems to be the most flexible and the most reasonable of the four discussed.
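One way to reproduce the fourth-approach insertions in Python is to append the placeholder after each sentence. This is a sketch under the assumption that sequences restart after each terminator, which matches the example output above:

    import re

    def fourth_approach_trigrams(text):
        triples = []
        for sentence in re.split(r"[.!?;]", text):
            words = re.sub(r"--|[,:\-]", " ", sentence).split()
            if words:
                words.append("x")  # placeholder for the terminator
                triples.extend(zip(words, words[1:], words[2:]))
        return triples

    # fourth_approach_trigrams("A B, C D E. F G H I!") yields
    # (A B C) (B C D) (C D E) (D E x) (F G H) (G H I) (H I x)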

9.5 Fifth Approach

The fifth approach is to use distant N-grams. This approach was used only for the distant N-gram database. The distant N-grams are stored in a different database, which also has bigram and trigram tables. The distant N-grams used in this program have a gap of two. This means that, as the text is read, the sequences are inserted into the database with a possible distance of up to two positions between the words of the sequence in the sentence. For example, if the sentence J K, L M. is read, the following will be inserted into the database:

(J K L) (J K M) (J K x) (J L M) (J L x) (J M x)

In addition, if one word from one of the groups below is part of a sequence to be inserted into any of the four tables (for distant and regular N-grams), sequences with all members of the group will be inserted. The groups are:


is, was, are, were, will, shall
do, did, shall, will
I, she, he, it, we, you, they
my, his, her, its, your, our, their
a, an, the

For example, if the sequence (N M a) is read from the text, the sequences (N M a), (N M an), (N M the) will be inserted into the database.
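A sketch of this group expansion follows; the function name is illustrative, and the group members are taken from the list above:

    GROUPS = [
        ["is", "was", "are", "were", "will", "shall"],
        ["do", "did", "shall", "will"],
        ["I", "she", "he", "it", "we", "you", "they"],
        ["my", "his", "her", "its", "your", "our", "their"],
        ["a", "an", "the"],
    ]

    def expand_sequence(seq):
        # For each word that belongs to a group, add one variant of the
        # sequence per group member.
        variants = {seq}
        for i, word in enumerate(seq):
            for group in GROUPS:
                if word in group:
                    for member in group:
                        variants.add(seq[:i] + (member,) + seq[i + 1:])
        return variants

    # expand_sequence(("N", "M", "a")) contains ("N", "M", "a"),
    # ("N", "M", "an"), and ("N", "M", "the").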

10 Experimentation and Analysis

10.1 Training Set

A set of 65 jokes from the 111 Knock Knock Jokes website (http://www.azkidsnet.com/JSknockjoke.htm) and one joke taken from The Original 365 Jokes, Puns & Riddles Calendar was used as a training set. (See Appendix B: Training Jokes.) The Similarity Table, discussed in Section 6, Generation of Wordplay Sequences, was modified with new entries until correct wordplay sequences could be generated for all 66 jokes. The training texts inserted into the bigram and trigram tables were chosen based on the punchlines of jokes from the set of training jokes. The training texts were inserted into the database according to the rules described in Section 9, Training Text. The jokes were run against the standard N-grams and the distant N-grams with a gap of two.

The maximum length of the keyword in the training jokes was calculated by taking the longest keyword in the sample, not counting white space characters or punctuation marks. The resulting maximum length of the keywords in the training set was eight characters.

The results are described in Table4:

                      Regular N-grams         Distant N-grams
Jokes                 Number   Percentage     Number   Percentage
Recognized Jokes      59       89.39%         59       89.39%
Unrecognized Jokes    7        10.61%         7        10.61%

Table4: Training Jokes results

While Table4 contains results for both regular and distant N-grams, it was unnecessary to run the latter. The distant N-grams are usually used to find two- or three-word sequences that may be meaningful, but whose words are not adjacent to each other in the text. This method makes sense for the set of test jokes, as the joke texts are unknown ahead of time. The text of the training jokes, on the other hand, was known. Moreover, the texts used to populate the database were chosen so that they did contain the sequences of words in the first part of the punchlines of all jokes. This means that the distant N-grams will not recognize more jokes if the expected wordplay was found.


The program was unable to recognize KKjoke9 from Appendix B: Training Jokes, as there was no text in the database to recognize the bugs pray sequence. This is the only joke from the training set that did not have a punchline sequence inserted into the database. The sequence was not inserted because a search on the bugs pray that sequence did not return any texts that were non-jokes. However, when bugs pray that snakes was inserted into the database (and removed immediately after), the program recognized the joke.

The other six jokes, shown in Table5, were not recognized, as the wordplay recognizer was not able to find wordplays based on their keywords. The possible cause is the length of Line3 combined with the number of replacements required to find wordplays for these jokes. The first column of Table5 gives the number of each unrecognized joke from Appendix B: Training Jokes. The second column shows the length of the keyword. The third column shows the number of letter replacements needed to find a wordplay. (See Section 6, Generation of Wordplay Sequences, for an explanation of letter replacements.)

Joke Number   Length of Line3   Number of Replacements for Wordplay
17            6                 3
23            6                 3
28            7                 5
51            6                 5
57            7                 4
66            6                 4

Table5: Unrecognized jokes in the training set

If all jokes in Table5 were to be recognized, the Similarity Table would have to be modified so that each wordplay would require at most two replacements of letters in the keyword. If such modifications were made, the joke recognizer would recognize 65 jokes from the training set.

10.2 Alternative Training Set Data Test

A sample of training jokes was also run using Hempelmann's [2003b] cost table. The table, shown in Appendix F, reflects the cost of going from a letter in the first column of the table to a letter in the first row. Hempelmann's cost table was modified to include all combinations of letters used in the Similarity Table. Heuristically, the cost values of the newly inserted pairs were set to the average of the cost table values.

The sample set consisted of the first eleven jokes from the training set. Using the cost table, eight jokes of the sample set were recognized as jokes, while using the Similarity Table ten jokes were recognized. Neither approach recognized the bugs pray joke, as the needed word sequence is not in the database.

10.3 General Joke Testing

The program was run against a test set of 130 KK jokes, and a set of 66 non-jokes that have a structure similar to the KK jokes. The non-jokes are discussed in Section 10.4.


The test jokes were taken from 3650 Jokes, Puns & Riddles [Kostick, 1998]. These jokes had punchlines corresponding to any of the three KK joke structures discussed in Section 4.3. Namely:

Type1: Punchlines containing Line3 -- To recognize the joke, Line3 will have to be substituted with its wordplay in the punchline.
Type2: Punchlines containing a wordplay on Line3 -- To recognize the joke, the generated wordplay sequence will have to match the punchline version of the wordplay.
Type3: Punchlines containing a meaningful response to Line3 -- The program is not expected to recognize these jokes.

To test if the program finds the expected wordplay, each joke had an additional line, Line6, added after Line5. Line6 is not a part of any joke; it only exists so that the wordplay found by the joke recognizer can be compared against the expected wordplay. Line6 consists of the punchline with the expected wordplay instead of the punchline with Line3.

The results are categorized as follows:

Jokes identified as jokes
  o found wordplay matches the expected wordplay
  o found wordplay does not match the expected wordplay
    - the punchline is meaningful
    - the punchline does not make sense
Jokes identified as non-jokes
  o the structure of the joke is unrecognizable by the program
  o the structure of the joke should have been recognized
    - correct wordplay found, but the punchline was not recognized
    - correct wordplay not found

As the training set contained only jokes where the length of Line3 did not exceed eight characters, the test set only had jokes with Line3 at most eight characters long.

As discussed in Section 5, the jokes in the test set were previously unseen by the computer. This means that if the book contained a joke identical to a joke in the set of training jokes, that joke was not included in the test set.

Some jokes, however, were very similar to jokes in the training set, but not identical. These jokes were included in the test set, as they were not the same. As it turned out, some jokes may look very similar to jokes in the training set to a human, but are treated as completely different jokes by the computer.

An example of this is KKjoke63 from the training set and KKjoke60 from the test set. Both jokes are based on the line She'll be coming round the mountain. The joke in the training set has Shelby as its keyword, while the joke from the test set has Sheila. The joke recognizer treats the two jokes differently, as the keywords differ. Another example is KKjoke99 from the test jokes and KKjoke53 from the training jokes. Both jokes are based on the keyword cargo, and both have car go as their wordplay. But the punchline of one is Car go beep, beep, and the punchline of the other is Car go honk, honk. Once again, the jokes are treated as different jokes by the computer, as it has to find different trigram sequences for them.

The set of test jokes was run using regular N-grams and distant N-grams. The results are shown in Table6:

                                                           Regular N-grams    Distant N-grams
                                                           Number    %        Number    %
Identified    Expected wordplay found                      12        9.23     16        12.31
as jokes      Unexpected        Punchline is meaningful    2         1.54     2         1.54
              wordplay found    Punchline not meaningful   3         2.31     10        7.69
Identified    Wrong structure                              8         6.15     8         6.15
as            Correct           Wordplay found             68        52.31    62        47.69
non-jokes     structure         No wordplay                37        28.46    32        24.62

Table6: Results of the Joke Test Set

Table6 shows that, out of 130 previously unseen jokes, the program was not expected to recognize eight jokes. These jokes are not expected to be recognized because the program is not expected to recognize their structure.

10.3.1 Jokes in the Test Set with Wordplay in the Beginning of a Punchline

Using regular N-grams, the program recognized only seventeen jokes as jokes out of the 122 that it could potentially recognize. Twelve of these jokes have punchlines that matched the expected punchlines. Two jokes have meaningful punchlines that were not expected. Three jokes were identified as jokes by the computer, but their punchlines do not make sense to the investigator.

When the program used distant N-grams, twenty-eight jokes were recognized. In sixteen jokes, the expected punchline was found. Note that regular N-grams recognized only 75% of the jokes with the expected punchline that were found by the distant N-grams. The recognized jokes with meaningful unexpected punchlines found by distant N-grams match the jokes found by regular N-grams. Ten jokes were recognized as jokes by distant N-grams while their punchlines did not make sense to the investigator.

One of the incorrectly identified jokes is KKjoke108. The joke is based on the keyword Oscar and the expected wordplay ask her. The wordplay generator failed to find the expected wordplay, but it returned offer instead. The punchline of the joke should read Ask her for a date. However, the joke recognizer uses only the two words that follow the keyword for its validation. The two words are for a. The sequence <offer for a> was found in the trigram table; therefore, the joke was identified as a joke. Not only was the sequence found in the table, it is actually a meaningful sequence. Without the word date, it would be difficult for a human to decide between ask her for a and offer for a. This suggests that the trigram model does not carry enough information for the task.


The program was able to find wordplay in 85 jokes using regular N-grams, and 90 jokes using distant N-grams. Some of the jokes with found wordplay were not recognized as jokes because the database did not contain the needed sequences. When a wordplay was found, but the needed sequences were not in the database, the program did not recognize the jokes as jokes, as indicated in Section 5. For example, the wordplay when he was found in KKjoke1, based on the keyword Winnie. However, the sequences <when he finally> and <he finally shows> were not in the trigram table of either the regular N-gram or the distant N-gram. Therefore, this joke was not recognized as a joke, even though the expected wordplay was found.

In many cases, the found wordplay matched the intended wordplay. This suggests that the rate of successful joke recognition would be much higher if the database contained all the needed word sequences.

10.3.2 Jokes in the Test Set with Wordplay in the Middle of a Punchline

Out of the 130 jokes used for testing, only one joke had wordplay in the middle of the punchline. The joke recognizer was unable to recognize this joke, as the wordplay recognizer did not find an acceptable wordplay. The joke is based on the wordplay between Watson and What's new.

If the entry <on, new, 0.23> were inserted into the Similarity Table, the wordplay generator and the wordplay recognizer would be able to generate and recognize the expected wordplay. If the sequences needed for joke recognition, but missing from the trigram table, were inserted, the joke recognizer would be able to recognize this joke as a joke.

10.4 Testing Non-Jokes

The program was run with 66 non-jokes. The only difference between jokes and non-jokes was the punchline. The punchlines of the non-jokes were intended to make sense with Line3, but not with a wordplay of Line3. The non-jokes were generated from the training joke set. The punchline in each joke was substituted with a meaningful sentence that starts with Line3.

If the keyword was a name, the rest of the sentence was taken from the texts in the training set. For example, Joke34 became Text1 by replacing time for dinner with awoke in the middle of the night.

Joke34: Knock, Knock
Who is there?
Justin
Justin who?
Justin time for dinner.

Text1: Knock, Knock
Who is there?
Justin
Justin who?
Justin awoke in the middle of the night.

A segment, awoke in the middle of the night, was taken from one of the training texts (http://tisa.stdragon.com/chapters10.htm) that was inserted into the bigram and trigram tables. Part of the text is:

Eric awoke in the middle of the night. He had been feeling increasingly better every day and had even started walking around and doing little chores. His broken ribs still bothered him a bit, but his burns had basically reduced themselves to faint pink scars.

The name Eric in the first sentence was replaced by Justin, producing Line5 of Text1. By using a sentence that had already been inserted into the database, the possibility is decreased that a text is recognized as a non-joke only because its word sequence is unknown to the trigram table.

The non-joke results were categorized as follows:

Non-jokes identified as non-jokes
Non-jokes identified as jokes
  o punchline with wordplay is meaningful
  o punchline does not make sense with wordplay



The non-joke set was run using N-grams and distant N-grams. The results are shown in Table7:

Text Recognized As                                   Regular N-grams     Distant N-grams
                                                     Number    %         Number    %
Non-Jokes                                            62        93.94%    58        87.88%
Jokes    Punchline makes sense with wordplay         1         1.52%     3         4.55%
         Punchline does not make sense with wordplay 3         4.55%     5         7.58%

Table7: Non-joke results

The non-joke texts incorrectly identified as jokes using regular N-grams were based on jokes number 20, 39, 49, and 50 from Appendix B: Training Jokes.

The text based on KKjoke39 was identified as a joke, and its punchline made sense with the found wordplay. However, the wordplay did not sound similar to Line3. Line3 of KKjoke39 is Ken, and the found wordplay was she. This happened because of two entries in the Similarity Table used during wordplay generation. Recall that the Similarity Table has three columns: (letters to be replaced, letters to be replaced with, similarity between them). The Similarity Table contains these entries: <k, sh, 0.11> and <en, e, 0.23>. (See Table2.) In some cases, the replacement of k with sh results in valid utterances that sound similar to the original one. This is the reason why <k, sh, 0.11> was entered into the Similarity Table. For the same reason, <en, e, 0.23> is in the Similarity Table. If one of the two entries were removed from the Similarity Table, the wordplay generator would not generate she as a wordplay on Ken, and the text based on KKjoke39 might not be recognized as a joke with this wordplay. It should be noted that it might then be recognized as a joke with a different wordplay.


The text based on KKjoke20 was also incorrectly identified as a joke. In this case, the wordplay, said I, was correctly found, as it can be argued that it sounds similar to Line3, Sadie. However, the punchline, Said I did not want to hear anything, makes no sense. The program counted the punchline as valid because it was able to find the sequences said I did and I did not in the database. Removing at least one of the sequences from the database would result in the text not being identified as a joke using the wordplay said I. Notice that I said I did not want to hear anything does make sense. However, if one or both sequences were removed, the program would not be able to tell that I said I did not want to hear anything is a meaningful sentence.

The text based on KKjoke50 was identified as a joke by the joke recognizer as well. However, the wordplay found by the program does not sound similar to Line3, and the punchline does not make sense with the found wordplay. The reasons for incorrectly identifying this text are similar to the ones discussed in the two preceding paragraphs.

It can be argued that the text based on KKjoke49 is a joke with a meaningful punchline and correctly found wordplay. Line3 of KKjoke49 is Amanda, and the wordplay found by the program is a man to. This results in the following punchline: A man to put her hand up to shield her eyes. While the sentence is meaningful, it is not of a typical form for a KK joke punchline.

In addition to the texts that were identified as jokes using regular N-grams, four more texts were identified as jokes using distant N-grams. These texts were based on the jokes with the following words in Line3: Wade, Winnie, Ammonia, and Europe. It can be argued that two of the texts identified as jokes (Wade and Winnie) have meaningful last sentences. As in the case of KKjoke49, while the sentences are meaningful, they are not of a typical format for a KK joke punchline.

11 Summary
Computational work in natural language has a long history. Areas of interest have included translation, understanding, database queries, summarization, indexing, and retrieval. There has been very limited success in achieving true computational understanding.

A focused area within natural language is verbally expressed humor. Some work has been achieved in computational generation of humor. Little has been accomplished in understanding. There are many linguistic descriptive tools, such as formal grammars. But, so far, there are no robust understanding tools and methodologies. True success will probably first come in a narrowly focused area.

The KK joke recognizer is a first step towards computational recognition of jokes. It is intended to recognize Knock Knock jokes that are based on wordplay. The recognizer's theoretical foundation is based on Raskin's Script-based Semantic Theory of Verbal Humor, which states that each joke is compatible with two scripts that oppose each other. Line3 and the wordplay on Line3 carry the two scripts. The scripts overlap in pronunciation, but differ in meaning.


The joke recognition process can be summarized in four steps:

Step1: joke format validation
Step2: generation of wordplay sequences
Step3: wordplay sequence validation
Step4: last sentence validation

The result of the KK joke recognizer heavily depends on:

the choice of appropriate letter pairs for the Similarity Table
the selection of training texts

The KK joke recognizer learns from the previously recognized wordplays when it considers the next joke. Unfortunately, unless the needed (keyword, wordplay) pair is an exact match with one of the found (keyword, wordplay) pairs, the previously found wordplays will not be used for the joke. Moreover, if one of the previously recognized jokes contains a (keyword, wordplay) pair that is needed for the new joke, but the two words that follow or precede the keyword in the punchline differ, the new joke may not be recognized, regardless of how close the new joke and the previously recognized jokes are.

The program successfully found and recognized wordplay in most of the jokes. It also successfully recognized texts that are not jokes, but have the format of a KK joke. It was not successful in recognizing most punchlines in jokes. The failure to recognize punchlines is due to the limited size of the texts used to build the trigram table of the N-gram database.

While the program checks the format of the first four lines of a joke, it assumes that all jokes that are entered have a grammatically correct punchline, or at least that the punchline is meaningful. It is unable to discard jokes with a poorly formed punchline. It may recognize a joke with a poorly formed punchline as a meaningful joke because it checks at most four words of the punchline that differ from Line3.

12 Possible Extensions

The results suggest that most jokes were not recognized either because the texts entered did not contain the necessary information for the jokes to work, or because N-grams are not suitable for true understanding of text. One of the simpler experiments would be to test whether more jokes are recognized when the databases contain more sequences. This would require inserting much larger texts into the trigram table. A larger text may contain more word sequences, which would mean more data for the N-grams to recognize some jokes.

It is possible that no matter how large the inserted texts are, simple N-grams will not be able to understand jokes. The simple N-grams were used to understand, or to analyze, the punchline. Most jokes were not recognized due to failures in sentence understanding. A more sophisticated tool for analyzing a sentence may be needed to improve the joke recognizer. Some of the options for the sentence analyzer are:

an N-gram with stemming
a sentence parser

A simple parser that can recognize, for example, nouns and verbs, and analyze the sentence based on parts of speech rather than exact spelling, may significantly improve the results. On the other hand, giving N-grams a stemming ability would make them treat, for example, color and colors as one entity, which may also significantly help.

The wordplay generator produced the desired wordplay in most jokes, but not all. After steps are taken to improve the sentence understander, the next improvement should be a more sophisticated wordplay generator. The existing wordplay generator is unable to find wordplay that is based on a word longer than six characters and requires more than three substitutions. A better answer than letter substitution is phoneme comparison and substitution. Using phonemes, the wordplay generator would be able to find matches that are more accurate.

The joke recognizer may be able to recognize jokes other than KK jokes, if the new jokes are based on wordplay and their structure can be defined. However, it is unclear whether recognizing jokes with other structures will be successful with N-grams.


13 Conclusion

An effort was made to computationally understand humor in a tightly focused domain. The domain was a particular aspect of wordplay. The experimental joke recognizer was designed to recognize jokes based on wordplay in the focused domain of Knock Knock jokes.

The investigation needed to do three things:

generate wordplay based on a phrase
recognize meaningful wordplay
recognize wordplay that made sense in the punchline

A statistics-based method was used. Strings were acquired from text corpora, counted, and placed into databases.

The joke recognizer was trained on 66 Knock Knock jokes; and tested on 130 Knock Knock jokes and 66 non-jokes with a structure similar to Knock Knock jokes.

The results show that the program was successful in correctly recognizing non-jokes and wordplay. However, it was not successful in recognizing most jokes after the wordplay was correctly found.


In conclusion, the method was reasonably successful in recognizing wordplay. However, it was less successful in recognizing when an utterance might be valid.


Bibliography

Salvatore Attardo and Victor Raskin [1991] Script Theory Revis(it)ed: Joke Similarity And Joke Representation Model, HUMOR: International Journal of Humor Research, 4:3-4, pp. 293-347

Salvatore Attardo and Victor Raskin [1993] Nonliteralness And Non-bona-fide In Language: Approaches To Formal And Computational Treatments Of Humour And Irony, unpublished paper

Salvatore Attardo [1994] Linguistic Theories of Humor, Mouton de Gruyter, Berlin

Salvatore Attardo, Donalee Attardo, Paul Baltes and Marnie Petray [1994] "The Linear Organization Of Jokes: Statistical Analysis Of Two Thousand Texts," HUMOR: International Journal of Humor Research, 7:1, pp. 27-54

Salvatore Attardo [1997] The Semantic Foundations Of Cognitive Theories Of Humor, HUMOR: International Journal of Humor Research, 10:4, pp. 395-420

Salvatore Attardo [1998] The Analysis Of Humorous Narratives, HUMOR: International Journal of Humor Research, 11:3, pp. 231-260

Salvatore Attardo [2002a] Cognitive Stylistics Of Humorous Texts, in Elena Semino and Jonathan Culpeper (Eds.) Cognitive Stylistics: Language And Cognition In Text Analysis, Benjamins, Amsterdam, pp. 231-250

Salvatore Attardo [2002b] Formalizing Humor Theory, Proceedings of Twente Workshop on Language Technology 20, Enschede, University of Twente, The Netherlands, pp. 1-10

Salvatore Attardo, Christian F. Hempelmann, and Sara DiMaio [2002c] Script Oppositions And Logical Mechanisms: Modeling Incongruities And Their Resolutions, HUMOR: International Journal of Humor Research, 15:1, pp. 3-46

Kim Binsted and Graeme Ritchie [1994] An Implemented Model Of Punning Riddles, Proceedings of the Twelfth National Conference on Artificial Intelligence, Seattle, WA, pp. 633-638

Kim Binsted [1996a] Machine Humour: An Implemented Model Of Puns, doctoral dissertation, University of Edinburgh, Scotland, UK

Kim Binsted and Graeme Ritchie [1996b] Speculations On Story Puns, Proceedings of Twente Workshop on Language Technology 12, Enschede, University of Twente, The Netherlands, pp. 151-160


Kim Binsted and Graeme Ritchie [1997] Computational Rules For Punning Riddles, HUMOR: International Journal of Humor Research, 10:1, pp. 25-76
Kim Binsted and Osamu Takizawa [1998] BOKE: A Japanese Punning Riddle Generator, The Journal of the Japanese Society for Artificial Intelligence, 13:6 (in Japanese)
Michael K. Brown, Andreas Kellner and Dave Raggett [2001] Stochastic Language Models (N-Gram) Specification, W3C Working Draft 3, http://www.w3.org/TR/ngram-spec/
David Allen Clark [1968] Jokes, Puns and Riddles, Doubleday, New York
Delia Chiaro [1992] The Language Of Jokes: Analyzing Verbal Play, Routledge, London
Paul De Palma and Judith Weiner [1992] Riddles: Accessibility And Knowledge Representation, Proceedings of the 15th International Conference on Computational Linguistics, pp. 1121-1125
Matt Freedman and Paul Hofman [1980] How Many Zen Buddhists Does It Take To Screw In A Light Bulb?, St. Martin's Press, New York
Sigmund Freud [1905] Der Witz Und Seine Beziehung Zum Unbewussten, Deuticke, Leipzig and Vienna; translated by James Strachey and reprinted as Jokes And Their Relation To The Unconscious, 1960, W. W. Norton, New York
Stefan Frisch [1996] Similarity And Frequency In Phonology, doctoral dissertation, Northwestern University
Ken Goldberg, Theresa Roeder, Dhruv Gupta and Chris Perkins [2001] Eigentaste: A Constant Time Collaborative Filtering Algorithm, Information Retrieval Journal, pp. 133-151
Christian F. Hempelmann [2003a] The Ynperfect Pun Selector For Computational Humor, Workshop at CHI 2003, Fort Lauderdale, Florida
Christian F. Hempelmann [2003b] Paronomasic Puns: Target Recoverability Towards Automatic Generation, doctoral dissertation, Purdue University, Indiana
Robert Hetzron [1991] On The Structure Of Punchlines, HUMOR: International Journal of Humor Research, 4:1, pp. 61-108
Xuedong Huang, Fileno Alleva, Hsiao-Wuen Hon, Mei-Yuh Hwang and Ronald Rosenfeld [1992] The SPHINX-II Speech Recognition System: An Overview, Computer Speech and Language, 7:2, pp. 137-148


Daniel Jurafsky and James Martin [2000] Speech And Language Processing, Prentice-Hall, New Jersey
Patricia Keith-Spiegel [1972] Early Concepts Of Humor: Varieties And Issues, in Jeffrey H. Goldstein and Paul E. McGhee (Eds.) The Psychology Of Humor: Theoretical Perspectives And Empirical Issues, Academic Press, New York and London, pp. 4-39
Anne Kostick, Charles Foxgrover and Michael Pellowski [1998] 3650 Jokes, Puns & Riddles, Black Dog & Leventhal Publishers, New York
Robert Latta [1999] The Basic Humor Process, Mouton de Gruyter, Berlin
Greg Lessard and Michael Levison [1992] Computational Modelling Of Linguistic Humour: Tom Swifties, ALLC/ACH Joint Annual Conference, Oxford
Dan Loehr [1996] An Integration Of A Pun Generator With A Natural Language Robot, Proceedings of Twente Workshop on Language Technology 12, Enschede, University of Twente, The Netherlands, pp. 161-172
Craig J. McDonough [2001] Mnemonic String Generator: Software To Aid Memory Of Random Passwords, CERIAS Technical Report, West Lafayette, IN, 9 pp.
Justin McKay [2002] Generation Of Idiom-Based Witticisms To Aid Second Language Learning, Proceedings of Twente Workshop on Language Technology 20, Enschede, University of Twente, The Netherlands, pp. 77-88
Dallin D. Oaks [1994] Creating Structural Ambiguities In Humor: Getting English Grammar To Cooperate, HUMOR: International Journal of Humor Research, 7:4, pp. 377-401
Daniel Perlmutter [2000] Tracing The Origin Of Humor, HUMOR: International Journal of Humor Research, 13:4, pp. 457-468
Daniel Perlmutter [2002] On Incongruities And Logical Inconsistencies In Humor: The Delicate Balance, HUMOR: International Journal of Humor Research, 15:2, pp. 155-168
Victor Raskin [1985] The Semantic Mechanisms Of Humor, Reidel, Dordrecht, The Netherlands
Victor Raskin [1996] Computer Implementation Of The General Theory Of Verbal Humor, Proceedings of Twente Workshop on Language Technology 12, Enschede, University of Twente, The Netherlands, pp. 9-19


Victor Raskin [1999] Laughing At And Laughing With: The Linguistics Of Humor And Humor In Literature, in Rebecca S. Wheeler (Ed.) The Workings Of Language: From Prescriptions To Perspectives, Praeger, Westport, CT
Willibald Ruch, Salvatore Attardo and Victor Raskin [1993] Towards An Empirical Verification Of The General Theory Of Verbal Humor, HUMOR: International Journal of Humor Research, 6:2, pp. 123-136
Willibald Ruch [2001] The Perception Of Humor, in A. W. Kaszniak (Ed.) Emotion, Qualia, And Consciousness, World Scientific Publishing, Tokyo, pp. 410-425
Graeme Ritchie [1998] Prospects For Computational Humour, Proceedings of the 7th IEEE International Workshop on Robot and Human Communication, Takamatsu, Japan, pp. 283-291
Graeme Ritchie [1999] Developing The Incongruity-Resolution Theory, Proceedings of the AISB 99 Symposium on Creative Language: Humour and Stories, Edinburgh, Scotland, pp. 78-85
Graeme Ritchie [2000] Describing Verbally Expressed Humour, Proceedings of the AISB Symposium on Creative and Cultural Aspects and Applications of AI and Cognitive Science, Birmingham, pp. 71-78
Graeme Ritchie [2001] Current Directions In Computational Humour, Artificial Intelligence Review, 16:2, pp. 119-135
Graeme Ritchie [2003] The JAPE Riddle Generator: Technical Specification, Technical Report EDI-INF-RR-0158
Mary K. Rothbart and Diana Pien [1976] Elephants And Marshmallows: A Theoretical Synthesis Of Incongruity-Resolution And Arousal Theories Of Humour, in Anthony J. Chapman and Hugh C. Foot (Eds.) It's A Funny Thing, Humour, Pergamon Press, Oxford and New York
Oliviero Stock and Carlo Strapparava [2002] Humorous Agent For Humorous Acronyms: The HAHAcronym Project, Proceedings of Twente Workshop on Language Technology 20, Enschede, University of Twente, The Netherlands, pp. 125-136
Jerry M. Suls [1972] A Two-Stage Model For The Appreciation Of Jokes And Cartoons: An Information-Processing Analysis, in Jeffrey H. Goldstein and Paul E. McGhee (Eds.) The Psychology Of Humor, Academic Press, New York, pp. 81-100
Jerry M. Suls [1976] Cognitive And Disparagement Theories Of Humour: A Theoretical And Empirical Synthesis, in Anthony J. Chapman and Hugh C. Foot (Eds.) It's A Funny Thing, Humour, Pergamon Press, Oxford and New York


Osamu Takizawa, Masuzo Yanagida, Akira Ito and Hitoshi Isahara [1996] On Computational Processing Of Rhetorical Expressions: Puns, Ironies And Tautologies, Proceedings of Twente Workshop on Language Technology 12, Enschede, University of Twente, The Netherlands, pp. 39-52
Thomas C. Veatch [1998] A Theory Of Humor, HUMOR: International Journal of Humor Research, 11:2, pp. 161-175
T. Yokogawa [2002] Japanese Pun Analyzer Using Articulation Similarities, Proceedings of FUZZ-IEEE, Honolulu, HI


Appendix A: Training Texts

Texts used as training corpora (one or more sources per training joke):

From the Tropical Belize Tours, http://www.tropicalbelize.com/adv_cayo.asp
Jerusalem Quarterly File, Issue 5, 1999, http://www.jqfjerusalem.org/1999/jqf5/sarsar.html
Mihoko Takahashi Mathis, Exploring Gender Issues in the Foreign Language Classroom: An Ethnographic Approach, http://members.at.infoseek.co.jp/gender_lang_ed/articles/mathis.html
http://www.mste.uiuc.edu/courses/ci332fa03/folders/cohort5/eyanaki/math%20lesson%201.htm
Letters of Anton Chekhov, http://etext.library.adelaide.edu.au/c/c51lt/chap72.html
Ben Myatt, How Can I Go On Without You?, http://mooseofdoom.freewebpage.org/hgwy_2.htm
Case Log: Dr. Kip Redford. Patient: Ralph Fitzpatrick, http://andrewsfantasy.homestead.com/files/orpheus3.htm
Kyle Meador, Mirrors, in Reflections of Christ, June 2003, http://reflectionsofchrist.org/Articles/Mirrors%20-%20June%202003.htm
Voices Of Youth: Ismael's Story, http://www.hrwcalifornia.org/Student%20Task%20Force/BeahSTORY.htm
Not found: http://www.angelfire.com/boybands/nyckyshosted/chrissymoffatts1-5.html
Fish Griwkowsky, No pants? Even funnier, http://www.canoe.ca/JamEdmontonFringe/review99_bushed.html
http://www.angelfire.com/oh/Acie/page7.html
http://www.guidelines.org/commentaries/mar07_03.asp
Theria, Kamikaze Slayer Luna, http://slayers.aaanime.net/~linazel/fanfics/fanfic.asp?fanfic=kamikaze&part=1
Bronte, Emily: Wuthering Heights, http://etext.virginia.edu/etcbin/ot2www-ebooks?specfile=/texts/english/ebooks/ebooks.o2w&act=text&offset=25704290&textreg=1&query=I+advised+her+to+value+him+the+more+for+his+affection&id=BroWuth
http://www.angelfire.com/oh/Acie/page7.html
Tisa's Online Stories, http://tisa.stdragon.com/chapters10.htm
Nanaki, The Aftermath Of Mount Woe, Chapter 31: The Courtship Of Princess Schala, www.icybrian.com/fanfic/nanaki/omw31.html
http://www.kididdles.com/mouseum/r003.html
http://www.tatianart.com/prose_sketches.htm
Linda Larsen, Say the Magic Words, http://www.lindalarsen.com/2003/html/articles/pt61.html
SVSY Books, http://www.springnet.cc/~cinemakatie/price10.html
Tim Wildmon, Call 'Em Like You See 'Em, http://www.family.org/fofmag/pf/a0026173.cfm
Mihoko Takahashi Mathis, Exploring Gender Issues in the Foreign Language Classroom: An Ethnographic Approach, http://members.at.infoseek.co.jp/gender_lang_ed/articles/mathis.html
http://www.tatianart.com/prose_sketches.htm
Eating Contests, from Tales Of Old Winster, http://users2.ev1.net/~earthlings/Tales_of_Old_Winster_2.htm
http://www.recmusic.org/lieder/get_text.html?TextId=12516
http://www.greenmanreview.com/tradruss.html
Karen Bliss, Seinfeld stand-up not just for laughs, http://www.canoe.ca/TheatreReviewsI/imtellingyouforthelast.html
http://www.adequacy.net/reviews/t/texlahoma.shtml
http://www.epinions.com/content_103569329796
Albany, George: The Queen's Musketeer; or, Thisbe, the Princess Palmist, Dime Library No. 76, Chapter I, http://www.niulib.niu.edu/badndp/albany_george.html
Western New York Railroad Archive, http://wnyrails.railfan.net/news/c0000037.htm
http://www.mystery-pages.de/takethat/ttrev4.htm
Mark L. Warner, Locks and Wandering, http://www.econline.net/Knowledge/Articles/wandering1.html
Amy Argetsinger, Loss of Innocence On U.S. Campuses, Washington Post, February 5, 2001, http://gutches.net/deafhotnews/murder2/washpost_feb5a.pdf
Julie Ovenell-Carter, Hurry up and slow down, http://virtualu.sfu.ca/mediapr/sfu_news/archives/sfunews03040402.htm
http://www.mystery-pages.de/takethat/ttrev4.htm
Hermie Harmsen, Microbes are everywhere and they will follow us into space, http://www.desc.med.vu.nl/NL-taxi/SAMPLE/SAM-page1.htm
Bill Simmons, Stop me before I get nostalgic, http://espn.go.com/page2/s/simmons/021113.html
http://www.angolotesti.it/testi/sum41/viewlyrics.asp?id=5
http://www.sing365.com/music/lyric.nsf/Since-The-Last-Time-I-Saw-You-lyrics-Yolanda-Adams/5C6351D7D817994948256C6B000ECF0E
http://curious.astro.cornell.edu/question.php?number=5
http://www.adequacy.net/reviews/t/texlahoma.shtml
Freeman, Mary Eleanor Wilkins: The Copy-Cat, http://etext.virginia.edu/etcbin/ot2www-ebooks?specfile=/texts/english/ebooks/ebooks.o2w&act=surround&offset=108933057&tag=+Freeman,+Mary+Eleanor+Wilkins,+1852-1930+:+The+Copy-Cat,+&+Other+Stories+/+Mary+E.+Wilkins+Freeman++1910+&query=Lucy+took+refuge+in+her+little+harbor+of+ignorance&id=FreCopy
Jerusalem Quarterly File, Issue 5, 1999, http://www.jqfjerusalem.org/1999/jqf5/sarsar.html
Twain, Mark: Innocents Abroad, http://etext.virginia.edu/etcbin/ot2www-ebooks?specfile=/texts/english/ebooks/ebooks.o2w&act=text&offset=272460371&textreg=1&query=The+island+in+sight+was+Flores&id=TwaInno
Delany, Martin: Blake; or the Huts of America, Part I, http://etext.virginia.edu/etcbin/ot2www-ebooks?specfile=/texts/english/ebooks/ebooks.o2w&act=text&offset=78100673&textreg=1&query=Fastened+by+the+unyielding+links+of+the+iron+cable+of+despotism&id=DelBlak
Twain, Mark: Innocents Abroad, http://etext.virginia.edu/etcbin/ot2www-ebooks?specfile=/texts/english/ebooks/ebooks.o2w&act=surround&offset=272815237&tag=+Twain,+Mark,+1835-1910+:+Innocents+Abroad++&query=At+Pisa+we+climbed+up+to+the+top+of+the+strangest+structure+the+world+has+any+knowledge&id=TwaInno
Pete Harrison, Diving's for losers, who says otherwise?, http://www.divernet.com/safety/talkph0100.htm
Verne, Jules: Off on a Comet, http://etext.virginia.edu/etcbin/ot2www-ebooks?specfile=/texts/english/ebooks/ebooks.o2w&act=text&offset=279365028&textreg=1&query=Servadac+listened+attentively&id=VerOffo
SVSY Books, http://www.springnet.cc/~cinemakatie/price10.html
Alcott, Louisa May: Little Women, http://etext.virginia.edu/etcbin/ot2www-ebooks?specfile=/texts/english/ebooks/ebooks.o2w&act=text&offset=2132056&textreg=1&query=Like+bees+swarming+after+their+queen&id=AlcLitt
Twain, Mark: A Connecticut Yankee in King Arthur's Court, http://etext.virginia.edu/etcbin/ot2www-ebooks?specfile=/texts/english/ebooks/ebooks.o2w&act=text&offset=276551787&textreg=1&query=seeing+me+pacific+and+unresentful,+no+doubt+judged+that&id=TwaYank
FanFics: Harry Potter and The Queen of Ice, Jenn, http://www.fronskiefeint.com/queenoficefic4.html
Alcott, Louisa May: Little Women, http://etext.virginia.edu/etcbin/ot2www-ebooks?specfile=/texts/english/ebooks/ebooks.o2w&act=text&offset=1842138&textreg=1&query=Gondola+after+gondola+swept+up&id=AlcLitt
Doyle, Arthur Conan: The White Company, http://etext.virginia.edu/etcbin/ot2www-ebooks?specfile=/texts/english/ebooks/ebooks.o2w&act=text&offset=92581864&textreg=1&query=The+highway+had+lain+through+the+swelling+vineyard&id=DoyWhit
Austen, Jane: Pride and Prejudice, http://etext.virginia.edu/etcbin/ot2www-ebooks?specfile=/texts/english/ebooks/ebooks.o2w&act=text&offset=11400964&textreg=1&query=Elizabeth,+as+they+drove+along,+watched+for+the+first+appearance&id=AusPrid
Potter, Beatrix: The Tale of Mr. Jeremy Fisher, http://etext.virginia.edu/etcbin/ot2www-ebooks?specfile=/texts/english/ebooks/ebooks.o2w&act=text&offset=211976476&textreg=1&query=boat+was+round&id=PotFish
http://music.hyperreal.org/artists/brian_eno/HCTWJlyrics.html
Dreiser, Theodore: Sister Carrie, http://etext.virginia.edu/etcbin/ot2www-ebooks?specfile=/texts/english/ebooks/ebooks.o2w&act=text&offset=94098182&textreg=1&query=Drouet+took+on+a+slightly+more+serious+tone&id=DreSist
Trollope, Anthony: Can You Forgive Her?, http://etext.virginia.edu/etcbin/ot2www-ebooks?specfile=/texts/english/ebooks/ebooks.o2w&act=text&offset=266515793&textreg=1&query=said+the+old+man+to+her+when&id=TroForg
http://judy.jteers.net/koalition/change.html
Dickens, Charles: Oliver Twist, http://etext.virginia.edu/etcbin/ot2www-ebooks?specfile=/texts/english/ebooks/ebooks.o2w&act=text&offset=82641735&textreg=1&query=Another+morning.+The+sun+shone+brightly:+as+brightly+as+if+it+looked&id=DicOliv
http://www.bananacafe.ca/0312/fr-life-1c-0312.htm
Austen, Jane: Mansfield Park, http://etext.virginia.edu/etcbin/ot2www-ebooks?specfile=/texts/english/ebooks/ebooks.o2w&act=text&offset=9716482&textreg=1&query=Sir+Thomas,+meanwhile,+went+on+with+his+own+hopes&id=AusMans
Trollope, Anthony: Can You Forgive Her?, http://etext.virginia.edu/etcbin/ot2www-ebooks?specfile=/texts/english/ebooks/ebooks.o2w&act=text&offset=266515793&textreg=1&query=said+the+old+man+to+her+when&id=TroForg
http://goanna.info/tall_women's_clothing.html
Doyle, Arthur Conan: The Captain of the Polestar and Other Tales, http://etext.virginia.edu/etcbin/ot2www-ebooks?specfile=/texts/english/ebooks/ebooks.o2w&act=text&offset=86297933&textreg=1&query=It+was+nine+o'clock+on+a+Wednesday+morning&id=DoyCapt
www.blackmask.com/books80c/silfox.htm
Shakespeare, William: The First Part of King Henry IV, http://etext.virginia.edu/etcbin/ot2www-ebooks?specfile=/texts/english/ebooks/ebooks.o2w&act=text&offset=185061198&textreg=1&query=Nay,+then+I+cannot+blame+his+cousin+king&id=Mob1He4
www.is1.org/dreaming.html
Davis, Richard Harding: The Red Cross Girl, http://etext.virginia.edu/etcbin/ot2www-ebooks?specfile=/texts/english/ebooks/ebooks.o2w&act=text&offset=75849285&textreg=1&query=Latimer+went+on+his+way+without+asking+any+sympathy&id=DavRedC
Sahar Huneidi, The Holiday Season: Ready Or Not, Here It Comes!, http://www.psychicsahar.com/artman/publish/article_102.shtml
Burnett, Frances Hodgson: The Lost Prince, http://etext.virginia.edu/etcbin/ot2www-ebooks?specfile=/texts/english/ebooks/ebooks.o2w&act=text&offset=31608129&textreg=1&query=A+note+of+terror+broke+into+his+voice&id=BurLoPr
www.anandamarga.or.id/contents.asp?cntn=360
www.sibs.org.uk/sians_story.html
Jane Austen, Northanger Abbey, Chapter 11, http://etext.library.adelaide.edu.au/a/a93n/chap11.html
http://www.hendersonville-pd.org/nurseryround.html
Martin Overby, What's on the Tube?, http://www.worldinvisible.com/newsltr/yr1998/05-07/9805over.htm
http://gibby16.diaryland.com/030614_47.html
Willa Sibert Cather, The Professor's Commencement, http://etext.lib.virginia.edu/etcbin/browse-mixed-new?id=CatComm&tag=public&images=images/modeng&data=/texts/english/modeng/parsed and www.unl.edu/Cather/works/short/professor.htm

Appendix B: Jokes Used in the Training Set


1. Knock, Knock. Who is there? Justin. Justin who? Justin time for dinner.
2. Knock, Knock. Who is there? Jess. Jess who? Jess me, open the door.
3. Knock, Knock. Who is there? Howie. Howie who? Howie going to figure this out?
4. Knock, Knock. Who is there? Ima. Ima who? Ima coming in, so open up.
5. Knock, Knock. Who is there? Juana. Juana who? Juana come out and play?
6. Knock, Knock. Who is there? Thumping. Thumping who? Thumping slimy is on your leg.
7. Knock, Knock. Who is there? Disease. Disease who? Disease pants fit you?
8. Knock, Knock. Who is there? Oliver. Oliver who? Oliver you there are bugs.
9. Knock, Knock. Who is there? Bugspray. Bugspray who? Bugs pray that snakes won't eat them.
10. Knock, Knock. Who is there? Holden. Holden who? Holden I'll go see.
11. Knock, Knock. Who is there? Panther. Panther who? Panther no pants, I am going swimming.
12. Knock, Knock. Who is there? Butcher. Butcher who? Butcher arms around me.
13. Knock, Knock. Who is there? Jamaica. Jamaica who? Jamaica good grade on your test?
14. Knock, Knock. Who is there? Olive. Olive who? Olive right next door to you.
15. Knock, Knock. Who is there? Olive. Olive who? Olive you darling.
16. Knock, Knock. Who is there? Irish. Irish who? Irish you would let me in.
17. Knock, Knock. Who is there? Atunna. Atunna who? Atunna trouble if you don't let me in.
18. Knock, Knock. Who is there? Wayne. Wayne who? Wayne, wayne, go away, come again another day.
19. Knock, Knock. Who is there? Canary. Canary who? Canary come out and play?
20. Knock, Knock. Who is there? Sadie. Sadie who? Sadie magic words, and I'll tell you.
21. Knock, Knock. Who is there? Alltell. Alltell who? Alltell mom if you don't let me in.
22. Knock, Knock. Who is there? Butter. Butter who? Butter up! I am throwing a fast ball.
23. Knock, Knock. Who is there? Arthur. Arthur who? Arthur more jokes here?
24. Knock, Knock. Who is there? Candy. Candy who? Candy come out and play?
25. Knock, Knock. Who is there? Butter. Butter who? Butter not tell you.
26. Knock, Knock. Who is there? Keith. Keith who? Keith me sweetheart.
27. Knock, Knock. Who is there? Samoa. Samoa who? Samoa of these bad jokes and I am gone.
28. Knock, Knock. Who is there? Italian. Italian who? Italian you for the last time: open up.
29. Knock, Knock. Who is there? Avenue. Avenue who? Avenue heard this joke before?
30. Knock, Knock. Who is there? Dwayne. Dwayne who? Dwayne the bathtub, I am dwowning.
31. Knock, Knock. Who is there? Stan. Stan who? Stan back, I think I am going to sneeze.
32. Knock, Knock. Who is there? Freeze. Freeze who? Freeze a jolly good fellow.
33. Knock, Knock. Who is there? Lettuce. Lettuce who? Lettuce in, we are freezing.
34. Knock, Knock. Who is there? Doris. Doris who? Doris locked, that's why I am knocking.
35. Knock, Knock. Who is there? Harry. Harry who? Harry up and answer the door.
36. Knock, Knock. Who is there? Beef. Beef who? Beef-ore I get mad you better let me in.
37. Knock, Knock. Who is there? Max. Max who? Max no difference to you, just let me in.
38. Knock, Knock. Who is there? Celeste. Celeste who? Celeste time I am telling you open up.
39. Knock, Knock. Who is there? Ken. Ken who? Ken you tell me some good jokes?
40. Knock, Knock. Who is there? Heaven. Heaven who? Heaven you heard enough Knock Knock jokes?
41. Knock, Knock. Who is there? Police. Police who? Police tell me some Knock Knock jokes.
42. Knock, Knock. Who is there? Luke. Luke who? Luke out! Here's another one!
43. Knock, Knock. Who is there? Dawn. Dawn who? Dawn by the station, early in the morning.
44. Knock, Knock. Who is there? Wade. Wade who? Wade down upon the Swanee River.
45. Knock, Knock. Who is there? Isabel. Isabel who? Isabel out of order? I had to knock.
46. Knock, Knock. Who is there? Alison. Alison who? Alison to you after you listen to me.
47. Knock, Knock. Who is there? Dismay. Dismay who? Dismay not be a funny joke.
48. Knock, Knock. Who is there? Waiter. Waiter who? Waiter dress is unbuttoned.
49. Knock, Knock. Who is there? Amanda. Amanda who? Amanda fix the refrigerator is here.
50. Knock, Knock. Who is there? Noah. Noah who? Noah good place to find more jokes?
51. Knock, Knock. Who is there? Althea. Althea who? Althea later, Alligator.
52. Knock, Knock. Who is there? Winnie. Winnie who? Winnie is good, he is very, very good.
53. Knock, Knock. Who is there? Cargo. Cargo who? Cargo beep beep, varoom.
54. Knock, Knock. Who is there? Wendy. Wendy who? Wendy last time you took a bath?
55. Knock, Knock. Who is there? Anna. Anna who? Anna body knows some more jokes?
56. Knock, Knock. Who is there? Statue. Statue who? Statue that laughed a minute ago?
57. Knock, Knock. Who is there? Mary Lee. Mary Lee who? Mary Lee, Mary Lee, life is but a dream. Row...
58. Knock, Knock. Who is there? Ammonia. Ammonia who? Ammonia trying to be funny.
59. Knock, Knock. Who is there? Radio. Radio who? Radio not, here I come.
60. Knock, Knock. Who is there? Amos. Amos who? Amos-quito bit me.
61. Knock, Knock. Who is there? Andy. Andy who? Andy bit me again.
62. Knock, Knock. Who is there? Vera. Vera who? Vera few people think these jokes are funny.
63. Knock, Knock. Who is there? Shelby. Shelby who? Shelby coming round the mountain when...
64. Knock, Knock. Who is there? Les. Les who? Les hear another joke.
65. Knock, Knock. Who is there? Alaska. Alaska who? Alaska one more time, then jokes start over.
66. Knock, Knock. Who is there? Europe. Europe who? Europe early this morning, aren't you?

Appendix C: Jokes Used in the Test Set


The expected punchline for each joke is shown in parentheses. Jokes with no parenthetical punchline are marked [not expected to be recognized].

1. Knock, Knock. Who is there? Winnie. Winnie who? Winnie finally shows up for work, tell him he's fired. (When he finally shows up for work, tell him he's fired.)
2. Knock, Knock. Who is there? Freddy. Freddy who? Freddy or not, here I come! (Ready or not, here I come!)
3. Knock, Knock. Who is there? Hughs. Hughs who? Hughs cars aren't brand new. (Used cars aren't brand new.)
4. Knock, Knock. Who is there? Texas. Texas who? Texas are high in this country. (Taxes are high in this country.)
5. Knock, Knock. Who is there? Hank. Hank who? You are welcome. [not expected to be recognized]
6. Knock, Knock. Who is there? A.C. A.C. who? A.C. come, A.C. go. (Easy come, easy go.)
7. Knock, Knock. Who is there? Luke. Luke who? Luke out the window and see. (Look out the window and see.)
8. Knock, Knock. Who is there? Recycle. Recycle who? Recycle around town on our bikes. (We cycle around town on our bikes.)
9. Knock, Knock. Who is there? Thelma. Thelma who? Thelma I went out for pizza. (Tell ma I went out for pizza.)
10. Knock, Knock. Who is there? Kenya. Kenya who? Kenya give me a hand? (Can you give me a hand?)
11. Knock, Knock. Who is there? Avis. Avis who? Avis-itor from Mars! (A visitor from Mars!)
12. Knock, Knock. Who is there? Sari. Sari who? Sari I was sarong! (Sorry I was sarong!)
13. Knock, Knock. Who is there? Sikkim. Sikkim who? Sikkim and you'll find him. (Seek him and you'll find him.)
14. Knock, Knock. Who is there? Olive. Olive who? Olive me, why not take olive me. (All of me, why not take olive me.)
15. Knock, Knock. Who is there? Ammonia. Ammonia who? Ammonia bird in a gilded cage. (I'm only a bird in a gilded cage.)
16. Knock, Knock. Who is there? Samoa. Samoa who? Samoa coffee, please! (Some more coffee, please!)
17. Knock, Knock. Who is there? Uganda. Uganda who? Uganda come in without knocking. (You can't come in without knocking.)
18. Knock, Knock. Who is there? Chuck. Chuck who? Chuck-ago, Chuckago, that wonderful town. (Chicago, Chicago, that wonderful town.)
19. Knock, Knock. Who is there? Jose. Jose who? Jose, can you see? (Oh, say, can you see?)
20. Knock, Knock. Who is there? Amazon. Amazon who? Amazon of a gun. (I'm a son of a gun.)
21. Knock, Knock. Who is there? Ptolemy. Ptolemy who? Ptolemy that you love me. (Tell me that you love me.)
22. Knock, Knock. Who is there? Wanda Way. Wanda Way who? Wanda Way, and you'll be lost. (Wander away, and you'll be lost.)
23. Knock, Knock. Who is there? Jewel. Jewel who? Jewel know who when you open the door. (You will know who when you open the door.)
24. Knock, Knock. Who is there? Fido. Fido who? Fido known you were coming, I'd've baked a cake. (If I'd known you were coming, I'd've baked a cake.)
25. Knock, Knock. Who is there? Irish. Irish who? Irish you a merry Christmas. (I wish you a merry Christmas.)
26. Knock, Knock. Who is there? Atlas. Atlas who? Atlas the sun's come out. I'll just stay out here. (At last the sun's come out. I'll just stay out here.)
27. Knock, Knock. Who is there? Eileen. Eileen who? Eileen too hard on this door and it'll break; better open up! (I lean too hard on this door and it'll break; better open up!)
28. Knock, Knock. Who is there? Marjorie. Marjorie who? Marjorie found me guilty and now I'm in jail. (My jury found me guilty and now I'm in jail.)
29. Knock, Knock. Who is there? Theodore. Theodore who? Theodore is locked and I can't get in. (The door is locked and I can't get in.)
30. Knock, Knock. Who is there? Your maid. Your maid who? Your maid your bed, now lie in it. (You made your bed, now lie in it.)
31. Knock, Knock. Who is there? Euell. Euell who? Euell miss out on a big opportunity if you don't open the door soon. (You'll miss out on a big opportunity if you don't open the door soon.)
32. Knock, Knock. Who is there? Al B. Al B. who? Al B. back. (I'll be back.)
33. Knock, Knock. Who is there? Tarzan. Tarzan who? Tarzan stripes forever. (Stars and stripes forever.)
34. Knock, Knock. Who is there? Irish. Irish who? Irish I could carry a tune. (I wish I could carry a tune.)
35. Knock, Knock. Who is there? Gnu. Gnu who? Gnu Zealand is a cool place to visit. (New Zealand is a cool place to visit.)
36. Knock, Knock. Who is there? Amish. Amish who? That's funny. You don't look like a shoe. [not expected to be recognized]
37. Knock, Knock. Who is there? Sweden. Sweden who? Sweden the lemonade, it's bitter. (Sweeten the lemonade, it's bitter.)
38. Knock, Knock. Who is there? Decanter. Decanter who? Decanter at my temple is almost eighty years old. (The cantor at my temple is almost eighty years old.)
39. Knock, Knock. Who is there? Dishes. Dishes who? Dishes the end of the world. Good bye to all! (This is the end of the world. Good bye to all!)
40. Knock, Knock. Who is there? Amanda. Amanda who? Amanda fix your TV set! (A man to fix your TV set!)
41. Knock, Knock. Who is there? C.D. C.D. who? C.D. badge I'm holding? This is police. Open up! (See the badge I'm holding? This is police. Open up!)
42. Knock, Knock. Who is there? Aussie. Aussie who? Aussie you later, mate. (I'll see you later, mate.)
43. Knock, Knock. Who is there? Candice. Candice who? Candice be true love at long last? (Can this be true love at long last?)
44. Knock, Knock. Who is there? Hour. Hour who? Hour you today? I'm pretty good myself. (How are you today? I'm pretty good myself.)
45. Knock, Knock. Who is there? Marie. Marie who? Marie Christmas to all! (Merry Christmas to all!)
46. Knock, Knock. Who is there? Water. Water who? Water our chances of winning the lottery? (What are our chances of winning the lottery?)
47. Knock, Knock. Who is there? Hatch. Hatch who? Bless you. [not expected to be recognized]
48. Knock, Knock. Who is there? Army. Army who? Army and my friends invited to your Halloween party? (Are me and my friends invited to your Halloween party?)
49. Knock, Knock. Who is there? Demure. Demure who? Demure I get, demure I want. (The more I get, the more I want.)
50. Knock, Knock. Who is there? Whale. Whale who? Whale meet you in the bar around the corner. (We'll meet you in the bar around the corner.)
51. Knock, Knock. Who is there? Cairo. Cairo who? Cairo the boat for awhile? (Can I row the boat for awhile?)
52. Knock, Knock. Who is there? Hugo. Hugo who? Hugo and see for yourself. (You go and see for yourself.)
53. Knock, Knock. Who is there? Wooden. Wooden who? Wooden it be nice to have Mondays off? (Wouldn't it be nice to have Mondays off?)
54. Knock, Knock. Who is there? Abby. Abby who? Abby Birthday to you! (Happy Birthday to you!)
55. Knock, Knock. Who is there? Thesis. Thesis who? Thesis a stickup! (This is a stickup!)
56. Knock, Knock. Who is there? Ale. Ale who? Ale! Ale! The gang's all here. (Hail! Hail! The gang's all here.)
57. Knock, Knock. Who is there? Ox. Ox who? Ox me for a date and I may say yes. (Ask me for a date and I may say yes.)
58. Knock, Knock. Who is there? Mary. Mary who? Mary me, my darling. (Marry me, my darling.)
59. Knock, Knock. Who is there? Frosting. Frosting who? Frosting in the morning brush your teeth. (First thing in the morning brush your teeth.)
60. Knock, Knock. Who is there? Oil. Oil who? Oil change, just give me a chance. (I'll change, just give me a chance.)
61. Knock, Knock. Who is there? Gnats. Gnats who? Gnats not funny! Open up! (That's not funny! Open up!)
62. Knock, Knock. Who is there? Vericose. Vericose who? Vericose knit family. We stick together. (We're a close-knit family. We stick together.)
63. Knock, Knock. Who is there? Police. Police who? Police open the door. I'm tired of knocking. (Please open the door. I'm tired of knocking.)
64. Knock, Knock. Who is there? Dee. Dee who? Dee-livery. Open up, your pizza's getting cold. (Delivery. Open up, your pizza's getting cold.)
65. Knock, Knock. Who is there? Zeus. Zeus who? Zeus house is this anyway? (Whose house is this anyway?)
66. Knock, Knock. Who is there? Ivan. Ivan who? Ivan to come in. It's cold out here. (I want to come in. It's cold out here.)
67. Knock, Knock. Who is there? Asia. Asia who? Asia father home? He owes me money. (Is your father home? He owes me money.)
68. Knock, Knock. Who is there? Selma. Selma who? Selma shares in the company. The stock is going down. (Sell me shares in the company. The stock is going down.)
69. Knock, Knock. Who is there? Harriet. Harriet who? Harriet too much. There's nothing left for me. (Harry ate too much. There's nothing left for me.)
70. Knock, Knock. Who is there? Dewey. Dewey who? Dewey have to go to the dentist? (Do we have to go to the dentist?)
71. Knock, Knock. Who is there? Stan. Stan who? Stan up straight and stop slouching! (Stand up straight and stop slouching!)
72. Knock, Knock. Who is there? Irving. Irving who? Irving a good time on vacation. Wish you were here. (Having a good time on vacation. Wish you were here.)
73. Knock, Knock. Who is there? Maryanne. Maryanne who? Maryanne and live happily ever after. (Marry Anne and live happily ever after.)
74. Knock, Knock. Who is there? Mandy. Mandy who? Mandy lifeboats. The ship is sinking! (Man the lifeboats. The ship is sinking!)
75. Knock, Knock. Who is there? Wayne. Wayne who? Wayne are we gonna eat? I'm starving. (When are we gonna eat? I'm starving.)
76. Knock, Knock. Who is there? Wiley. Wiley who? Wiley was sleeping my wife packed my things and moved me out of the apartment. (While I was sleeping my wife packed my things and moved me out of the apartment.)
77. Knock, Knock. Who is there? Boo. Boo who? Don't cry, sweetie pie. [not expected to be recognized]
78. Knock, Knock. Who is there? Watson. Watson who? Nothing much. Watson with you? (Nothing much. What's new with you?)
79. Knock, Knock. Who is there? Thermos. Thermos who? Thermos be someone home, I see a light on. (There must be someone home, I see a light on.)
80. Knock, Knock. Who is there? Sahara. Sahara who? Sahara you dune? (What in the hell are you doing?)
81. Knock, Knock. Who is there? Shannon. Shannon who? Shannon, Shannon harvest moon, up in the sky. (Shine on, shine on harvest moon, up in the sky.)
82. Knock, Knock. Who is there? Iowa. Iowa who? Iowa lot to my brother. (I owe a lot to my brother.)
83. Knock, Knock. Who is there? Hugh. Hugh who? Hi, there. [not expected to be recognized]
84. Knock, Knock. Who is there? Wallabee. Wallabee who? Wallabee sting if you sit on it? (Will a bee sting if you sit on it?)
85. Knock, Knock. Who is there? Catch. Catch who? Gesundheit! [not expected to be recognized]
86. Knock, Knock. Who is there? Shelby. Shelby who? Shelby coming round the mountain when she comes. (She'll be coming round the mountain when she comes...)
87. Knock, Knock. Who is there? Yah. Yah who? Gosh, I am glad to see you too. [not expected to be recognized]
88. Knock, Knock. Who is there? Elsie. Elsie who? Elsie you later! (I'll see you later!)
89. Knock, Knock. Who is there? Annie. Annie who? Annie-body home? (Anybody home?)
90. Knock, Knock. Who is there? Dots. Dots who? Dots for me to know, and for you to find out! (That's for me to know, and for you to find out!)
91. Knock, Knock. Who is there? Demons. Demons who? Demons are a ghoul's best friend. (Diamonds are a girl's best friend.)
92. Knock, Knock. Who is there? Surreal. Surreal who? Surreal pleasure to be here. (It's a real pleasure to be here.)
93. Knock, Knock. Who is there? Harmony. Harmony who? Harmony times do I have to knock before you let me in? (How many times do I have to knock before you let me in?)
94. Knock, Knock. Who is there? Detail. Detail who? Detail-aphone man! (The telephone man!)
95. Knock, Knock. Who is there? Pencil. Pencil who? Pencil fall down without suspenders. (Pants will fall down without suspenders.)
96. Knock, Knock. Who is there? Wayne. Wayne who? Wayne drops keep falling on my head... (Rain drops keep falling on my head...)
97. Knock, Knock. Who is there? Avoid. Avoid who? Avoid to the vise is sufficient. (A word to the wise is sufficient.)
98. Knock, Knock. Who is there? Avenue. Avenue who? Avenue met me somewhere before? (Haven't you met me somewhere before?)
99. Knock, Knock. Who is there? Cargo. Cargo who? Cargo honk, honk. (Car go honk, honk.)
100. Knock, Knock. Who is there? Tarzan. Tarzan who? Tarzan tripes forever! (Stars and stripes forever!)
101. Knock, Knock. Who is there? Turnip. Turnip who? Turnip the stereo. I love this song. (Turn up the stereo. I love this song.)
102. Knock, Knock. Who is there? Canada. Canada who? Canada boys come over to play poker? (Can the boys come over to play poker?)
103. Knock, Knock. Who is there? Recent. Recent who? Recent you a bill the first of the month. (We sent you a bill the first of the month.)
104. Knock, Knock. Who is there? Heavenly. Heavenly who? Heavenly met somewhere before? (Haven't we met somewhere before?)
105. Knock, Knock. Who is there? Omar. Omar who? Omar darling Clementine. (Oh my darling Clementine.)
106. Knock, Knock. Who is there? Barn. Barn who? Barn to be wild! (Born to be wild!)
107. Knock, Knock. Who is there? Tamara. Tamara who? Tamara I have an important meeting. (Tomorrow I have an important meeting.)
108. Knock, Knock. Who is there? Oscar. Oscar who? Oscar for a date and maybe she'll go out with you! (Ask her for a date and maybe she'll go out with you!)
109. Knock, Knock. Who is there? Otto. Otto who? Otto theft is a serious crime. (Auto theft is a serious crime.)
110. Knock, Knock. Who is there? I'm Helen. I'm Helen who? I'm Helen wheels. VAROOM! (I'm hell on wheels. VAROOM!)
111. Knock, Knock. Who is there? Vision. Vision who? Vision you a happy New Year! (Wishing you a happy New Year!)
112. Knock, Knock. Who is there? Rabbit. Rabbit who? Rabbit up nice. It's a Christmas gift. (Wrap it up nice. It's a Christmas gift.)
113. Knock, Knock. Who is there? I, Felix. I, Felix who? I, Felix-cited. (I feel excited.)
114. Knock, Knock. Who is there? Urn. Urn who? Urn your keep by finding a job. (Earn your keep by finding a job.)
115. Knock, Knock. Who is there? Venice. Venice who? Venice pay day? I'm broke. (When is pay day? I'm broke.)
116. Knock, Knock. Who is there? Market. Market who? Market paid in full. (Mark it paid in full.)
117. Knock, Knock. Who is there? Laurie. Laurie who? Laurie, Laurie hallelujah. (Glory, glory hallelujah.)
118. Knock, Knock. Who is there? Butcher. Butcher who? Butcher arms around me and give me a big hug. (Put your arms around me and give me a big hug.)
119. Knock, Knock. Who is there? Ferris. Ferris who? Ferris fair, so don't cheat. (Fair is fair, so don't cheat.)
120. Knock, Knock. Who is there? Ammonia. Ammonia who? Ammonia lost person looking for directions. (I am only a lost person looking for directions.)
121. Knock, Knock. Who is there? Midas. Midas who? Midas well sit down and relax. (Might as well sit down and relax.)
122. Knock, Knock. Who is there? Gopher. Gopher who? Gopher a swim. It will refresh you. (Go for a swim. It will refresh you.)
123. Knock, Knock. Who is there? Rice. Rice who? Rice and shine, Sleepyhead! (Rise and shine, Sleepyhead!)
124. Knock, Knock. Who is there? Jonas. Jonas who? Jonas for a cocktail after work. (Join us for a cocktail after work.)
125. Knock, Knock. Who is there? Turnip. Turnip who? Turnip the TV. I can't hear the news. (Turn up the TV. I can't hear the news.)
126. Knock, Knock. Who is there? Ken. Ken who? Ken you open the door already? (Can you open the door already?)
127. Knock, Knock. Who is there? Myron. Myron who? Myron around the park made me tired. (My run around the park made me tired.)
128. Knock, Knock. Who is there? Rita. Rita who? Rita good book lately? (Read a good book lately?)
129. Knock, Knock. Who is there? Ida. Ida who? Ida written sooner, but I lost your address. (I'd have written sooner, but I lost your address.)
130. Knock, Knock. Who is there? Menu. Menu who? Menu wish upon a star, good things happen. (When you wish upon a star, good things happen.)

Appendix D: KK Recognizer Algorithm Description


IsItJoke
1 if ValidateJokeStructure returns that the joke has correct structure
  1.1 if the keyword used in Line3 is known to the program, i.e., the program has known wordplay for Line3
    1.1.1 for each known wordplay based on Line3
      1.1.1.1 if wordplay is at least two words long
        1.1.1.1.1 if pairs of words in wordplay exist in the bigram table
          1.1.1.1.1.1 if ValidatePunchline(wordplay)
            1.1.1.1.1.1.1 return JOKE FOUND
      1.1.1.2 else
        1.1.1.2.1 if ValidatePunchline(wordplay)
          1.1.1.2.1.1 return JOKE FOUND
  1.2 call GenerateWordplay(Keyword)
  1.3 for each generated wordplay
    1.3.1 if wordplay is at least two words long
      1.3.1.1 if pairs of words in wordplay exist in the bigram table
        1.3.1.1.1 if ValidatePunchline(wordplay)
          1.3.1.1.1.1 return JOKE FOUND
    1.3.2 else
      1.3.2.1 if ValidatePunchline(wordplay)
        1.3.2.1.1 return JOKE FOUND
2 return JOKE NOT FOUND

ValidateJokeStructure
1 if Line1 OR Line2 is not valid
  1.1 return false
2 read Line3
3 set Keyword to Line3 without spaces or punctuation marks
4 if Line4 is not valid OR Line5 does not contain Line3
  4.1 return false
5 return true

GenerateWordplay
input TopPhrase: an element with a structure containing a string and a similarity value
1 for each Letter in the string of TopPhrase
  1.1 if Letter has not been replaced from the original keyword
    1.1.1 for each line in the similarity table
      1.1.1.1 if Letter is the same as the entry in the first column of the similarity table
        1.1.1.1.1 copy TopPhrase into NewPhrase
        1.1.1.1.2 replace Letter in the string of NewPhrase with the entry from the second column of the similarity table
        1.1.1.1.3 set similarity of NewPhrase to similarity of TopPhrase - 1 + entry from the third column of the similarity table
        1.1.1.1.4 insert NewPhrase into the heap
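For illustration only, a possible rendering of this procedure in Python is sketched below: the heap is ordered so that the candidate with the highest remaining similarity is expanded first. The three-row similarity table is a toy stand-in for the full letter-pair tables (Appendices E and F), and the scoring assumes each unreplaced letter contributes 1 to the similarity.

# Sketch of GenerateWordplay: best-first search over single-letter
# substitutions, ordered by a similarity score.
import heapq

SIMILARITY = [("w", "r", 0.23), ("d", "t", 0.32), ("s", "z", 0.61)]

def generate_wordplay(keyword, limit=10):
    # Heap is ordered by negative similarity so the closest
    # candidate phrase is expanded first.
    heap = [(-float(len(keyword)), keyword, frozenset())]
    results = []
    while heap and len(results) < limit:
        neg_sim, phrase, replaced = heapq.heappop(heap)
        if phrase != keyword:
            results.append((phrase, -neg_sim))
        for i, letter in enumerate(phrase):
            if i in replaced:          # each position replaced at most once
                continue
            for src, dst, sim in SIMILARITY:
                if letter == src:
                    new = phrase[:i] + dst + phrase[i + 1:]
                    # Replacing a letter trades its full match (1)
                    # for the table's letter-pair similarity.
                    heapq.heappush(heap, (neg_sim + 1 - sim, new,
                                          replaced | {i}))
    return results

print(generate_wordplay("water"))  # e.g. [("rater", 4.23)]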


ValidatePunchline
input phrase: a sequence of words
1 set firstWord to the first word in phrase
2 set secondWord to the second word in phrase
3 set lastWord to the last word in phrase
4 set sndToLast to the second-to-last word in phrase
5 set firstAfter to the first word in the punchline after phrase
6 set secondAfter to the second word in the punchline after phrase
7 set firstBefore to the first word in the punchline before phrase (if it exists)
8 set secondBefore to the second word in the punchline before phrase (if it exists)
// phrase at the beginning or middle of the punchline
9 if sndToLast exists and (sndToLast, lastWord, firstAfter) is not in the database
  9.1 return false
10 if (lastWord, firstAfter, secondAfter) is not in the database
  10.1 return false
// phrase at the end or middle of the punchline
11 if secondWord exists and (firstBefore, firstWord, secondWord) is not in the database
  11.1 return false
12 if (secondBefore, firstBefore, firstWord) is not in the database
  12.1 return false
13 return true
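For illustration only, the trigram checks above might be implemented as follows; the plain Python set stands in for the real trigram database, and the guard conditions are slightly more defensive than the pseudocode.

# Sketch of ValidatePunchline: the trigrams spanning the edges of the
# wordplay must all appear in the trigram database.
def validate_punchline(punchline_words, start, end, trigrams):
    """punchline_words: tokenized punchline; the wordplay occupies
    positions start..end (inclusive)."""
    phrase = punchline_words[start:end + 1]
    after  = punchline_words[end + 1:end + 3]          # up to two words after
    before = punchline_words[max(0, start - 2):start]  # up to two words before

    # Trigrams spanning the right edge of the wordplay.
    if len(phrase) >= 2 and len(after) >= 1:
        if (phrase[-2], phrase[-1], after[0]) not in trigrams:
            return False
    if len(after) >= 2:
        if (phrase[-1], after[0], after[1]) not in trigrams:
            return False
    # Trigrams spanning the left edge of the wordplay.
    if len(phrase) >= 2 and len(before) >= 1:
        if (before[-1], phrase[0], phrase[1]) not in trigrams:
            return False
    if len(before) >= 2:
        if (before[-2], before[-1], phrase[0]) not in trigrams:
            return False
    return True

# "Ken" -> "can": check the punchline "can you open the door" against
# a toy set of known trigrams.
known = {("can", "you", "open"), ("you", "open", "the")}
print(validate_punchline(["can", "you", "open", "the", "door"],
                         0, 0, known))  # True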


Appendix E: A Table of Similarity of English Consonant Pairs Using the Natural Classes Model, Developed by Stefan Frisch


Appendix F: Cost Table Developed by Christian Hempelmann

