
phonological phenomena * multilingual text extraction * topic tracking * machine translation

digital libraries * text mining * tutorial dialogue systems * multimodal interfaces * speech synthesis * dialog systems * minority languages
computer-assisted language learning * question answering * information theory * technology supported education * language modeling

Language Technologies Institute

Carnegie Mellon

information retrieval * computational linguistics * machine learning * biosequence modeling


ランゲージ テクノロジー (Language Technologies, Japanese)
Sprachtechnologie (Language Technology, German)
Tecnologías del lenguaje (Language Technologies, Spanish)
Technologies de la langue (Language Technologies, French)

“Thus it may be true that the way to translate from Chinese to Arabic, or
from Russian to Portuguese, is not to attempt the direct route... Perhaps the
way is to descend, from each language, down to the common base of human
communication--the real but as yet undiscovered universal language--and
then re-emerge by whatever particular route is convenient.”
- Warren Weaver
Contents

Overview 4
Ongoing Research 6
Academic Programs 18
LTI Courses and Admissions 20
Faculty 22

Language Technologies Institute
© 2004 Carnegie Mellon University
Edited by C. Adèle Weitz
Overview



The Language Technologies Institute (LTI) in the School of Computer Science at Carnegie Mellon University conducts research and provides graduate education in all aspects of language technologies, including computational linguistics, machine translation, speech recognition and synthesis, statistical language modeling, information retrieval and web search engines, text mining, information management, digital libraries, intelligent tutoring, and more recently bio-sequence/bio-language, structure and function analysis (genome, proteome). The LTI combines linguistic approaches with machine learning and corpus-based methods, depending on the scientific questions investigated and project needs.

The LTI was established in 1996, combining the Center for Machine Translation (CMT), which was founded in 1986, and other areas of computational language research. The LTI contains a unique mix of theoretical and systems-building researchers specializing in various aspects of computer science, artificial intelligence, computational linguistics and machine learning, and provides a rich and diverse environment for collaboration among faculty, graduate students, visiting scholars, and research staff. As part of the School of Computer Science at CMU, LTI faculty and students collaborate closely with members of the Computer Science Department, the Center for Automated Learning and Discovery, the Robotics Institute, the Institute for Software Research International, and the Human-Computer Interaction Institute. Collaborative research areas include mobile computing, computational biology, multi-agent systems, cognitive modeling, intelligent tutoring systems, multi-media interfaces, text and data mining, artificial intelligence systems, and machine learning theory and algorithms.

The LTI offers both a Masters and a PhD in Language Technologies.


The curricula of the graduate programs are based on a set of core courses that include linguistic and statistical methods for language analysis, fundamental computer science, and in-depth coverage of focus areas in language technology such as machine translation, information retrieval, and speech recognition. Students benefit from a modular set of laboratory courses, in which they learn the basics of natural language technology through intensive hands-on practice. In addition, students have the opportunity to expand their education with courses from the other institutes in the School of Computer Science (listed above), including courses in algorithms, artificial intelligence, computer systems, machine learning, statistics, and computational biology.

In addition to fundamental and theoretical research, the LTI very much focuses on large-scale challenges of consequence to industry, government or society at large, often spawning start-up companies or international-scale projects. The original LYCOS search engine, for instance, was created at CMU, as was the original VIVISIMO meta-search engine. The C-STAR international speech-translation consortium and the Universal Library project were also initiated at CMU – the goal of the latter being the dissemination of the collected works of humankind worldwide with free universal access.

The LTI offers and encourages collaboration with industry, both national and international, ranging from industrial affiliates and in-residence visiting researchers to targeted industrial education programs and longer-term joint R&D projects. Such projects have produced practical results including high-accuracy machine translations for Caterpillar Inc. and the Condor search engine used in Korea.
Carnegie Mellon University and Pittsburgh
Located in Pittsburgh, Pennsylvania, Carnegie Mellon University is well known for its interdisciplinary research centers and institutes, such as the Pittsburgh Supercomputing Center, the Software Engineering Institute, the Data Storage Systems Center, the Information Networking Institute, the Engineering Design Research Center, the Institute for Complex Engineered Systems, the Robotics Institute, the Human-Computer Interaction Institute, the Language Technologies Institute, the Systems and Security Center and the Entertainment Technology Center. In these organizations, researchers and faculty from diverse disciplines collaborate on problems, benefiting from varied viewpoints and scientific approaches.

Once the greatest steel-producing city in the United States, today Pittsburgh is a medium-size modern city. With growing high-technology industries such as computer science and biotechnology, Pittsburgh's energy reflects the vision of the future more than the shadow of the past. Many high-technology companies are locating research labs in Pittsburgh, including Intel, Seagate, Rand, Hyundai, and others. Some might be surprised to learn that Pittsburgh is economically diverse, visually exciting, architecturally active, ethnically rich and educationally innovative.

Carnegie Mellon is well integrated into its Pittsburgh surroundings. Ten minutes east of the downtown business district, the 103-acre campus is situated in Oakland, the educational and medical mecca of the city. In addition to Carnegie Mellon, there are four other institutions of higher education in this section of the city, which provide a wide range of educational opportunities. Adjacent to campus is the 500-acre Schenley Park, complete with public golf course, tennis courts, outdoor pool, ice-skating rink, and numerous jogging, mountain biking and cross-country skiing trails.

Ongoing Research

Research Methodology

The modeling of human language lies at the confluence of linguistics, artificial intelligence, statistics, machine learning, and cognitive science, and has been the focus of intense research in the past few decades.

Research efforts at the LTI pursue a variety of approaches, from linguistic and knowledge-based methods, to supervised and unsupervised statistical, corpus-driven machine learning techniques.

In all of these approaches, domain knowledge and theory (linguistic or statistical) inspire the general technical approach to solving a language technology problem, and suggest the overall structure of the model. Data is then used to estimate model parameters, and to further refine the model, suggest salient features, optimize parameters, and ultimately, assess the quality and viability of the approach. The general research paradigm is much like that used in other areas of science and engineering: a model is formulated and used to make predictions, and those predictions are then evaluated in an effort to improve the model in an iterative process.

Fig 1: Research methodology integrating data, knowledge and model (data and knowledge feed a model, which produces predictions).

A Statistical Model that can be refined with expert knowledge

As one example, sequential models such as hidden Markov models are of great use in language technologies such as speech recognition and information extraction from text and web documents. Although often effective, hidden Markov models make strong independence assumptions that can limit their predictive performance. Research by LTI faculty has led to alternative frameworks, such as Conditional Random Fields, that make fewer independence assumptions and allow more expert knowledge to be incorporated as features into the model. These models have been applied to information extraction, parsing, and even image analysis, and have more recently been adapted to biological sequence analysis problems such as gene finding and protein super-secondary structure prediction.

Fig 2: Biosequence to 3D structure predictions in Biolinguistics (amino acid sequence and protein structural knowledge feed a learned structural knowledge mapping that produces structural predictions).

A Linguistic Model that is refined with Machine Learning

Another example is the framework developed within the AVENUE project for inferring transfer rules for Machine Translation from bilingual data. The learned rules are mappings between syntactic structures of the two languages, and are used in a runtime Machine Translation system to produce translations. Researchers at the LTI have created a theoretical framework that defines the kinds of models, i.e. grammar rules, that can be learned from data. A collection of specific instances of such rules, constituting a "transfer grammar" for a particular pair of languages, is then inferred from a balanced, word-aligned small corpus of human-translated sentences. The framework addresses questions about how rules can combine to cover more complex structures, what types of linguistic features can be expressed in the rules and passed from one rule to another, and so forth. The resulting "model" (i.e. set of transfer rules) for a particular language pair can be refined in an active learning feedback loop where users correct the output of the Machine Translation system, and the system automatically uses the corrections, augmented with active learning methods, to refine the underlying rule set.

Fig 3: Data, knowledge, model and predictions in AVENUE's translation rule learning system (bilingual data and a theoretical learning framework yield learned translation rules that produce Machine Translation output).
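To make the contrast between the two model families concrete, they can be written in their standard textbook forms (shown here as general background, not as a description of any particular LTI system). An HMM defines a joint distribution over a label sequence y and an observation sequence x using local transition and emission probabilities, while a linear-chain CRF models the conditional distribution directly and can draw on arbitrary, overlapping features f_k of the whole input:

$$P_{\mathrm{HMM}}(y, x) \;=\; \prod_{t=1}^{T} P(y_t \mid y_{t-1})\, P(x_t \mid y_t)$$

$$P_{\mathrm{CRF}}(y \mid x) \;=\; \frac{1}{Z(x)} \exp\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k\, f_k(y_{t-1}, y_t, x, t) \Big)$$

Because the CRF only normalizes over label sequences (via Z(x)), expert knowledge can be added as features without having to model how the observations themselves were generated.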
Machine Translation

Translation of human language was one of the very first tasks attempted by the developers of the first digital computers in the 1950s. Over fifty years later, fully automatic Machine Translation (MT) remains one of the most difficult and challenging topics of research within Artificial Intelligence. With the emergence of universal access to information enabled by today's internet, language has become a critical barrier to global information access and communication, and the need for MT is greater than ever before. The LTI originated as the Center for Machine Translation in the mid-1980s, and MT continues to be a prominent sub-discipline of research within the LTI. The LTI is unique in the breadth of MT problems and approaches that are being investigated and pursued in the context of a variety of research projects, and in the number of faculty and researchers involved in MT research.

Part of the excitement of MT as a research field lies in its wide range of challenges, from cutting-edge applications that are commercially feasible today, through techniques that could have practical application within a few years, to problems that will not be fully solved until the advent of true Artificial Intelligence. The ultimate goal of this area can be characterized as machine translation that is: (1) general purpose (any topic or domain); (2) high quality (human quality or better); and (3) fully automatic. Remarkably, our current MT capabilities can reasonably satisfy any two of these three criteria, but we cannot yet meet all three at once. Our KANT project produces fully automatic, high quality translations for information dissemination in well-defined technical domains such as electric power utility management and heavy equipment technical documentation (as in the Catalyst application for Caterpillar). Our Example-Based MT and Statistical MT systems can produce fully automatic translation in broad or unlimited domains, but have not yet approached or surpassed human quality levels. In other projects such as BABYLON and Diplomat/Tongues, successful multi-lingual communication is achieved by augmenting this limited-quality MT with human interaction in order to help resolve translation errors.

There is currently active research being conducted within the LTI on all of the major approaches to MT. Each of these approaches has some unique strengths but also inherent weaknesses and limitations. Consequently, different approaches are suitable for different scenarios. Our main research thrusts are in machine learning approaches to MT, including corpus-based approaches such as Generalized Example-Based MT and Statistical MT systems that have focused primarily on Chinese-to-English and Arabic-to-English MT. Additionally, we conduct ongoing work on Multi-Engine MT (MEMT), combining the results of different MT techniques in order to exploit each technique's strong points.

Since traditional rule-based approaches to MT require lengthy development cycles, and corpus-driven MT requires large amounts of pre-translated parallel text for training, the LTI is investigating alternative MT paradigms for minority languages, such as Quechua and Mapudungún. The goal of the AVENUE project is to produce MT systems requiring neither extended human development cycles (too costly) nor huge parallel corpora (not available). We are investigating methods such as unsupervised learning of complex morphology, and transfer-rule induction from limited numbers of selected word-aligned phrases and sentences, via machine learning methods such as seeded version spaces. Although we have had initial success, much of the challenge remains before us.

Another aspect in which MT systems differ is in their input/output modes: text versus speech. The JANUS project, for instance, combines speech recognition with language translation in the large. Other projects, such as the Speechalator, produced limited-scope, speech-to-speech MT on a hand-held device that also includes fluent speech synthesis. The area of speech-to-speech MT is still young and growing, with technical difficulties such as how to translate from a lattice of sentence recognition hypotheses, produced by the speech recognizer, rather than simply a single known sentence as input. Human factors issues include clarification dialogs when the recognition or the translation is problematic, and how to train users implicitly to use the system for maximum effectiveness.

Finally, we have some non-traditional projects, such as investigating Dolphin language (we kid thee not), and whether it can be interpreted.

Figure: Types of Machine Translation – the interlingua approach (semantic analysis and sentence planning), the transfer approach (syntactic parsing, transfer rules, text generation), and the direct approach (SMT, EBMT), mapping a source language such as Arabic to a target language such as English.
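To give a flavor of what a learned transfer rule looks like, the sketch below pairs a source-language syntactic pattern with a target-language pattern and feature constraints. The rule format, feature names, and language pair are purely illustrative assumptions for this brochure, not AVENUE's actual notation.

# Hypothetical example of the kind of syntactic transfer rule that can be
# induced from word-aligned sentence pairs (illustrative only; not the
# project's actual rule formalism).
transfer_rule = {
    "source_pattern": ["NP", "VP"],        # e.g. Spanish sentence: NP followed by VP
    "target_pattern": ["NP", "VP"],        # corresponding English sentence structure
    "alignment": {0: 0, 1: 1},             # source constituent i maps to target constituent j
    "constraints": [
        ("source.NP.number", "==", "source.VP.number"),  # subject-verb agreement on the source side
        ("target.NP.number", "=", "source.NP.number"),   # pass the number feature across languages
    ],
}

A runtime translator would apply many such rules recursively, composing them to cover larger structures, which is exactly the kind of combination behavior the theoretical framework described above is designed to characterize.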
Tutoring Systems
In collaboration with the Human-Computer Interaction Institute, the LTI faculty carries out research in intelligent tutoring, focusing on explorations of the role of language in learning and learning research, and specifically how language technology can be used to support that endeavor. A major thrust of this research is to explore issues related to eliciting and responding to productive student explanation behavior. It involves many broader issues such as influencing student expectations, motivation, and learning orientation. This interdisciplinary research agenda involves five primary foci:

* Controlled experimentation and analysis of student interactions with each other as well as with human tutors and computer tutors, in order to explore the stimuli that encourage productive student behavior, appropriate learning orientation, and ultimately effective learning

* Analysis of think-aloud protocols in learning scenarios in order to better understand the process of learning

* Basic research in language technology to support the semi-automatic analysis of language interactions in learning scenarios (text classification, automatic essay grading, etc.)

* Basic research in dialogue technology to enable interaction in natural language in tutorial environments between humans and computer tutors, or to support interactions in natural language between human learners in collaborative settings (robust language understanding, dialogue management, etc.)

* Development of easy-to-use tools for building scalable language interaction interfaces and tutorial environments more generally (semantic knowledge source authoring tools, etc.)

Tutorial dialogue is a unique, intensely dynamic form of instruction that allows for a high degree of expressiveness and sensitivity, both in terms of the tutor's adaptation of material to the individual needs of students and in the opportunity it creates for students to make their thinking transparent to the tutor. State-of-the-art tutorial dialogue systems have focused on leading students through directed lines of reasoning to support conceptual understanding, clarifying procedures, or coaching the generation of explanations for justifying solutions, problem solving steps, predictions about complex systems, or understanding of computer architecture. Evaluations of state-of-the-art tutorial dialogue systems provide a powerful proof-of-concept, demonstrating conclusively that the language technology exists for supporting productive learning interactions in natural language. Carnegie Mellon researchers are at the forefront of this movement, both in terms of producing landmark systems and widely used resources.

A current thrust of this work at the LTI involves pushing this technology into new areas, such as supporting design activities in an exploratory learning environment. An important part of this work involves the development of the DReSDeN tutorial dialogue planner to manage a range of mixed-initiative tutorial dialogue strategies, including negotiation dialogue for encouraging students to develop the skills to ask themselves important questions leading to a thoughtful, reflective decision making process. Another current thrust of this work involves the optimization of time management in tutorial dialogue interactions.

In order for tutorial dialogue systems to become widespread and have a real impact on education, it is imperative that they become easier to build. Another active area of research at the LTI is building tools to facilitate this process, such as Carmel-Tools for authoring domain-specific knowledge sources for robust processing of student explanations. Thus, beyond producing technology to be used to address our own theoretical questions about learning interactions, we aim to produce reusable technology that can facilitate the work of other researchers pursuing their own related questions. Our ultimate goal is to produce resources that are simple enough to be used by non-AI researchers and practitioners, such as education researchers, domain experts, and instructors, and thus to put the power of tutorial dialogue in the hands of those with the pedagogical expertise to maximize its effectiveness and meet the real needs of students.
CALL - Computer Aided Language Learning
Learning a new language is a process of trial and error where the student observes language, tries to imitate it, and finds out how good the imitation was. Until recently, the best ways to learn a new language were either by going to the country or by having a personal tutor. First attempts to create language learning software lacked feedback and authenticity. This is where language technologies are starting to provide a potent alternative. The use of speech recognition has enabled students to speak to a system and find out exactly what phonetic and prosodic errors were made and how to correct them. Using modeling of native and non-native speech and knowledge of the native language of the student, we have developed powerful pinpointing techniques to discover where errors lie in elicited speech. The use of natural grammars allows us to analyze student writing and suggest corrections. Natural language grammars also allow us to read along with a student and give help in understanding a passage upon request. By using information retrieval, we can find appropriate texts for a student's level of reading, lexical, and grammar knowledge, and can give other researchers the tools to determine how hard a new text can be and still be effective: for example, what percentage of new words can be in a text which still allows the student to generalize the meaning. For learning research, members of the LTI faculty have projects with experts in Intelligent Tutoring from the Human-Computer Interaction Institute as well as with psychologists and language learning specialists at Carnegie Mellon and the University of Pittsburgh.

The Universal Library

The central goal of the Universal Library is to digitize, index and make universally available all published works of humankind, including books, periodicals, artwork and music. A further goal is to provide value-added information services such as automated summarization, reading assistants, full content search and translation over the internet in any language. Imagine the situation in which every researcher, every teacher, every citizen and even every schoolchild would have everything ever written at her fingertips, regardless of what country she lives in, her economic status, the school she attended, or her native language. The amplification of human potential would be vast, and would lead to the ultimate democratization of information and knowledge. Like Rome, the Universal Library is not built in a day, but rather it is a project for the ages, requiring constant improvement and enrichment.

We estimate that approximately 100 million different books have been published in the history of the world, but only a tiny fraction are available digitally. The remainder must first be scanned – a labor-intensive chore. Toward that end, with the support of the National Science Foundation, we have organized the Million Book Project, a joint effort of CMU and the governments of India and China, with other countries such as Egypt and Turkey joining the cause. Hundreds of digital scanners have been distributed to centers in those countries, where personnel furnished by their governments scan books for two shifts per day. As of fall 2004 about 100,000 volumes have been scanned. These are passed through optical character recognition software, fully indexed, and added to the Universal Library. Approximately half of the scanned books are in English; the remainder are in a wide range of other languages. The first million books are expected to be complete by the end of 2006, at which point we will embark on the Ten Million Book Project.

The Language Technologies Institute and ISRI provide indexing software and storage infrastructure for the Million Book Project. The CMU Libraries furnish metadata, archiving and copyright clearance support. A number of research projects are underway to explore applications and uses for the Universal Library. One of them, the Universal Dictionary, is an effort to build a database of every word in every language. This will serve as a basic resource for machine translation and multilingual searching. We are also exploring new methods of navigating huge text spaces. As the size of the collection grows, the limitations of keyword search, particularly for multilingual queries, become severe. What is needed is a language-independent search method that is able to retrieve based on concepts rather than specific terms, which suggests a multimodal rather than a pure text-based interface.

Copyright is a major barrier to free distribution of content. The vast majority of works ever published are still in copyright. Of these, more than 90% are out of print, which means that they produce no revenue either for the author or the publisher. We are endeavoring to encourage publishers to allow the Universal Library to scan their out-of-print books and permit them to be viewed on the Internet and retrieved through search engines. Publishers who do this often find an increased demand for their books. We are also working with the government of India to develop a new copyright statute that would provide funds, analogous to the UK Public Lending Right, to be distributed to copyright owners whose works are accessed on the Internet, with micropayments to be provided through public funds. Eventually the availability of such payments could remove further obstacles to offering copyrighted material.

The Universal Library enjoys cooperative relationships with other institutions, including the Internet Archive, the Digital Library Federation, and the National Academy Press.
Computational Biology

Large amounts of genomic and protein sequence data for homo sapiens and other organisms have become available, together with a growing body of correlated protein structure and function data, creating an opportunity for addressing the sequence mapping and structure folding problems with increasingly sophisticated data-driven (statistical and computational) methods to discover, characterize and model regularities and outliers in the biological data. Machine learning methods, with large amounts of data, led to multiple breakthroughs in language technologies such as automatic speech recognition, document classification, information extraction, statistical machine translation and other challenging natural language processing tasks. Our research exploits the analogy with mapping words to meaning via syntax in order to decipher the fundamental meaningful building blocks of a biological sequence language, via its structure to its underlying function.

The goal is to derive new hypotheses to correlate these building blocks with structural, dynamic and functional "meaning" for different living organisms in terms of folding, activity, interactions, and pathways. For example, one of the important challenges we try to tackle is the prediction of super-secondary structures such as the beta helix (see Figure). We work with super-secondary protein structure experimentalists so that the hypotheses generated from this computational approach can be tested by wet lab experiments.

In another attempt to relate protein primary sequence to its structure and function, we have engaged in a project to infer evolutionary selectional pressure from residue conservation in multiple sequence alignments of protein families. Many measures have been proposed for quantifying the overall degree of sequence conservation in a multiple sequence alignment of proteins. However, these measures fail to identify which particular properties are conserved at each position in the alignment. We derived an algorithm for systematically identifying the conservation of specific physical-chemical properties in individual positions in a multiple sequence alignment. We have applied our method to the diverse GPCR family and demonstrate the computational significance of the properties we have identified by successfully using them to predict whether specific amino acids will occur in particular positions in the alignment. We have also used our method to annotate Rhodopsin, a well characterized member of the GPCR family, with a selectional pressure profile, which allowed us to biologically interpret our findings. We further applied our method to a multiple sequence alignment of an HIV-I protein, and are gearing up to apply it to a large set of protein families, including crystallins and various globins. Looking ahead, we plan to refine our method by incorporating phylogenetic histories, and separating mutation.

The above examples illustrate the power of combining computational linguistics, statistical machine learning and bio-sequence/structure/function discovery; we expect to tackle other interesting problems in this field, such as protein-protein interaction predictions and aspects of immune system modeling at the molecular level.

Figure: Crystal structure of pectate lyase (pdb id 1EE6), a triple beta helix protein (www.rcsb.org).
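The core idea of property-based conservation can be sketched in a few lines: for each column of a multiple sequence alignment, measure how consistently the residues share a given physical-chemical class. The property table, scoring, and example below are simplifying assumptions made for illustration, not the published LTI algorithm (which also accounts for background frequencies and statistical significance).

import statistics

# Minimal sketch: per-column conservation of one physical-chemical property
# (here, a coarse hydrophobicity class) in a multiple sequence alignment.
HYDROPHOBIC = set("AVLIMFWC")

def property_conservation(msa, prop=HYDROPHOBIC):
    """msa: equal-length aligned sequences (strings, '-' marks a gap).
    Returns, per column, the fraction of non-gap residues in the property class."""
    scores = []
    for col in range(len(msa[0])):
        residues = [seq[col] for seq in msa if seq[col] != "-"]
        frac = sum(r in prop for r in residues) / len(residues) if residues else 0.0
        scores.append(frac)
    return scores

# Columns scoring near 1.0 are candidates for positions where the property,
# rather than a specific amino acid, appears to be under selective pressure.
msa = ["MIVAL-", "MLVAI-", "MIVGLK"]
print(property_conservation(msa), statistics.mean(property_conservation(msa)))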
RADAR: Reflective Agent with Distributed Adaptive Reasoning
The RADAR project is an example of a large-scale interdisciplinary research project conducted at the LTI. RADAR is a five-year research project that spans many units in Carnegie Mellon's School of Computer Science, including many LTI researchers. The overall goal is to develop a software-based "cognitive personal assistant" who could anticipate the needs of his superiors. RADAR will help busy managers work more effectively, by coping adaptively with various tasks ranging from routine to complex problem solving. This new technology should be equally valuable to managers in industry, academia, and government. RADAR will help its human master in many ways: scheduling meetings, allocating resources, maintaining a project web site, producing coherent reports from disorganized snippets of information, and dealing with the constant flood of e-mail. Additionally, all of the capabilities of RADAR will be tested jointly in a crisis management task, such as re-organizing a conference on very short notice after losing its planned venue.

The key scientific challenge is to endow RADAR with enough flexibility, learning capabilities and general knowledge to handle all these diverse tasks, including requests and situations that were not anticipated by RADAR's designers. When faced with a surprising new request, RADAR might not know how to proceed, but it should do something sensible. Perhaps it can weave together fragments of old plans to create a new one that fits the current situation. Perhaps it can simply ask for advice. But even turning free-form advice into an executable plan of action is a difficult research problem. The technologies underlying RADAR range from planning and problem solving (e.g. unifying hard and soft time and space constraints), to natural language processing (e.g. extracting useful information from email streams), to dynamically adaptable user interaction (e.g. when and how to ask for advice or offer suggestions). All of these capabilities depend on machine learning technology: learning from human advice, learning by observation, learning by active experimentation, and transferring the results of learning across problems and domains.
Speech Processing

Speech is the most natural way for humans to communicate, and we find it so easy to use that most of us are surprised to learn how complex the processing of spoken language actually is. Since one of our goals at the LTI is to make speech communication with and through computers more useful, we work on improving the fundamental technologies of automatic speech processing, i.e. speech recognition and speech synthesis. We also develop new technologies using those components, such as speech-to-speech translation, spoken dialog systems, audio-based information extraction and retrieval, and computer aided language learning.

Speech Recognition

Automatic Speech Recognition is the process of decoding a spoken speech signal into a written form, that is, a sequence of words. To do this, the analog speech signal needs to be digitized and then, for efficiency reasons, reduced to its essential relevant information, which is mainly done by a form of frequency decomposition. (A spectrogram representation of conversationally spoken speech displays the energy at each frequency over time, with the brightness of the colors indicating the energy level present at a given frequency.) The final representation of speech in the computer is a stream of parameter vectors over time. These vectors will be classified into phonemes, the smallest linguistically distinct sounds of a language. For this purpose, prototypes of these phonemes (so-called acoustic models) will be trained beforehand. With the help of the pronunciation dictionary, which relates each word to a concatenation of phonemes, the speech decoder can find possible word candidates, and in combination with the language model the most likely sequence of words is chosen to transcribe the spoken speech signal.

Humans are able to understand equally well the articulated read speech of a TV news anchor and our over-excited friend calling from a loud party. What seems so easy for us is very difficult for machines. Part of the difficulty lies in the fact that the speech signal can be heavily affected by background noises, channel distortions, or cross-talk, but also that spoken speech varies in speaking style, speed, and content. More difficulties arise in speech recognition because different words might be pronounced the same (as in "two", "to", and "too"), one word might be pronounced differently (such as "the" in "the teen" vs. "the adult"), and also because speech is spoken continuously, so it provides no natural segmentation. For instance, the same phonetic sequence can be segmented into two different word sequences: "This machine can recognize speech," or "This machine can wreck a nice beach." Which sequence will be picked depends on the expectation of the listener.

In order to learn the knowledge in the components of the automatic speech recognizer, namely the acoustic models, the pronunciation dictionary, and the language model, today's speech recognition algorithms must use data from which those models are trained. Thus, the acoustic model learns the most likely way people pronounce particular phonemes in particular contexts. The pronunciation dictionary models the most likely sequences of phonemes to build words, and the language model learns the most likely sequences of words to build sentences. The language model is statistically trained and scores all of the possible phrases that could have been spoken. It differs from more traditional parsing techniques, although they may overlap, since speech is less likely to be in traditional linguistic sentences.
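The interplay of these components is often summarized by the standard decoding rule (given here in its familiar textbook form, not as a formula specific to any LTI recognizer): for acoustic observations X, the recognizer searches for the word sequence W that maximizes the product of the acoustic model score and the language model score:

$$\hat{W} \;=\; \arg\max_{W} P(W \mid X) \;=\; \arg\max_{W} P(X \mid W)\, P(W)$$

The acoustic model and pronunciation dictionary together supply P(X | W), while the language model supplies P(W).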
Speech Synthesis

Speech synthesis is the process of generating natural sounding and appropriate speech from text or other computer-readable formats. The task can be viewed in three parts:

• Text Analysis: General text may contain numbers, abbreviations, and other non-standard words that require proper treatment if they are to be pronounced intelligibly. In English, the string of digits "1984" has several pronunciations depending on whether it is a year ("nineteen eighty four"), a quantity ("one thousand nine hundred (and) eighty four"), or a telephone number (which can be pronounced "one nine eight four").

• Linguistic Analysis: Once given the words, we still require the pronunciations. This can be done by a pronunciation dictionary. However, no matter how large the dictionary is, we will still encounter words outside of its vocabulary due to neologisms, names, etc. Therefore, a letter-to-sound rules system is also required. Prosody, including tune, duration, and phrasing, comprises the components that make speech interesting. There are many ways to pronounce words; recreating the prosody and style (i.e. polite or urgent) makes the speech more understandable and more acceptable to users.

• Waveform Generation: Currently, the most common form of constructing waveforms from phonetic and prosodic descriptions is by concatenating short pieces of pre-recorded natural speech and modifying their prosody to match the desired form. Traditional approaches record all phone-phone transitions in a language (called diphones). Although this technique is robust, more general unit selection synthesis, in which the database contains more varied speech with multiple examples of phones in various contexts, together with an appropriate selection algorithm, seems to offer the promise of much higher quality speech.

Speech Processing Research Projects

Throughout the LTI's speech processing research there are always two directions that influence the work: knowledge driven and data driven. A substantial amount of knowledge is required in order to build such systems. Knowledge of acoustic phonetics, pronunciation, linguistics, signal processing, etc. is needed to define the framework within which we are working: for example, how to find the phoneme set for languages that have not yet been studied, how to find pronunciations of words that are not found in a dictionary, or how to include knowledge of syntax and semantics to aid speech processing. Human language complexity and variability is such that no hand-written rules can cover all cases; thus our knowledge-based techniques are also closely coupled with statistically based methods. A common theme appearing throughout all the LTI's work is developing and applying machine learning techniques to appropriately defined knowledge-driven frameworks to improve the usefulness of the work. These interdisciplinary approaches encourage sharing of techniques over different projects: language modeling techniques may also be used in text summarization; novel machine learning techniques may be applied to speech problems. The LTI's speech research allows standard components to be used in other larger projects, thus making them more useful, but also offering greater challenges in the application of speech and language that lead to more fundamental research. In the following we list a selection of projects and applications which are currently under development in the LTI:

• Speech-to-Speech Translation: The Consortium for Speech Translation Research (C-STAR) develops speech-to-speech translation jointly between CMU and international partners from Japan, Korea, Italy, France, and China. Here, speech recognition must deal with many languages, recognize and translate in real-time, and handle many different users. Recent work investigates deploying such systems on resource-constrained mobile devices, and improving the robustness and quality of domain-dependent translation. In the project STR-Dust ("Speech Translation for Domain Unlimited Spontaneous Communication Tasks") we push the limits of today's speech translation coverage to rather unlimited domains, such as meetings, lectures, and news.

• Multilingual Speech Recognition: Since speech recognition is the most natural means to allow communication across language and culture barriers, speech recognizers in many different languages are an essential prerequisite for making speech-driven communication applications attractive and available to the public. The project GlobalPhone focuses on the rapid deployment of speech recognizers in many languages, i.e. by reducing the required effort in terms of time and cost to build such recognizers, thus enabling support for languages for which few or no resources are available.

• Meeting Summarization: A microphone records a multi-person meeting. Off-line speech recognition technology transcribes the meeting, including the difficult task of separating the voices and identifying the speakers. Information retrieval technology is then used to index the data so we can answer queries such as "Find the part where Bob and Jane talked about next year's budget."

• Dialog Systems: The CMU DARPA Communicator project allows experiments in mixed-initiative dialog between humans and machines via the telephone, in the domain of flight, hotel and car rental information. This requires accurate, real-time recognition across the reduced-bandwidth and potentially noisy telephone channel, real-time access to networked information, natural language generation, parsing, synthesis, and a dialog manager.

• Computer Aided Language Learning: The Fluency project applies speech recognition techniques to aid non-natives in pronunciation.

• Language Modeling: Classical language modeling techniques collect statistics on word tri-grams (three-word sequences), but because there are many words in the language and even more tri-grams, collecting enough data to find all examples is difficult. Thus, smoothing and back-off techniques are often required. A number of new language modeling techniques are being investigated within the LTI, including class-based language models that consider whole word classes, not just individual words, for n-grams. Additionally, new statistical modeling techniques such as maximum entropy are being developed to give better estimates of the probability distribution of word sequences.

• Synthesis using Festival, FestVox, and Flite: To ensure that our speech synthesis work is available to the widest range of users, we work with the University of Edinburgh's Festival Speech Synthesis System, a free software synthesis toolkit and engine. We have also produced the FestVox tools for building new voices in new languages, allowing the construction of both general voices and domain-specific voices. Also, with the small-footprint CMU Flite system, synthesis can be used on virtually any platform.

• Robust Speech Recognition: Current automatic speech recognition systems are limited in their ability to adapt to the effects of new speakers, difficult acoustical environments, non-native accents, and spontaneous speech production. Researchers at the LTI are carrying out a broad program of research to improve the robustness of automatic speech recognition using a variety of techniques.
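The back-off idea mentioned under Language Modeling can be illustrated with a deliberately simplified sketch: when a trigram was never observed in training data, the model falls back to bigram and then unigram estimates. The backoff weights below are arbitrary placeholders; real recognizers use principled discounting schemes rather than these constants.

from collections import Counter

# Toy backoff language model (illustrative only; not the smoothing used in
# LTI recognizers). Falls back from trigram to bigram to unigram counts.
corpus = "the meeting is on monday the meeting starts at noon".split()
uni = Counter(corpus)
bi = Counter(zip(corpus, corpus[1:]))
tri = Counter(zip(corpus, corpus[1:], corpus[2:]))

def prob(w3, w1, w2):
    if tri[(w1, w2, w3)] > 0:
        return tri[(w1, w2, w3)] / bi[(w1, w2)]
    if bi[(w2, w3)] > 0:
        return 0.4 * bi[(w2, w3)] / uni[w2]      # 0.4: placeholder backoff weight
    return 0.1 * uni[w3] / sum(uni.values())     # 0.1: placeholder backoff weight

print(prob("is", "the", "meeting"))    # trigram was seen in the toy corpus
print(prob("starts", "a", "meeting"))  # unseen trigram: backs off to the bigram estimate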

Information Retrieval and Text Mining
Web search engines are a well-known type of Information Retrieval (IR) system that uses statistical inference to locate documents that satisfy an information need. Greater search engine accuracy is always useful, but finding information is no longer the most important IR problem. Today's tools often deliver much more information than a person can read easily. IR and Text Mining research at the LTI involves a wide range of issues related to finding, validating, organizing, summarizing, analyzing, and communicating large amounts of information. Our goal is to enable people to routinely base decisions on far more information than is practical today.

Machines cannot understand the meaning of a multimedia document in the way that a human can, but many useful tasks can be accomplished with limited forms of understanding. Statistical corpus analysis, probabilistic inference, and machine learning are the tools of IR and Text Mining research. Research at the LTI is grounded in theory, and tested in large-scale applications. Consequently, research projects focus on everything from basic theory to software engineering. Several representative examples are described below.

Figure: Distributed IR (federated search) in the real world – peer-to-peer networks contain searchable digital libraries (leaf nodes) and directory services (hub nodes) that route messages and merge results from different sites.

Research on Advanced IR Architectures develops systems that combine standard search queries with detailed, long-term user and task models and highly structured documents. Document structure may indicate how the document is organized (e.g., XML), or it may be provided by language analysis tools (e.g., named entities, part-of-speech, syntactic parsing). This research supports LTI projects on open-corpus language tutoring, such as the REAP reading comprehension project. Much of this research is distributed via the open-source Lemur Toolkit.

Translingual Information Retrieval uses queries in one language (e.g., English) to find documents in other languages (e.g., German, Chinese and Arabic). Traditional machine translation methods do not work well when queries are short, out of context, and not sentences. Our research focuses on corpus-based translation of query terms by learning empirical associations among multilingual lexicons from translation mates (documents, paragraphs, passages or sentences), and by mapping queries and documents to a conceptual "interlingua" that bridges the language barrier.

The Distributed Information Retrieval project studies environments, such as the Web, large corporate networks, and peer-to-peer networks, in which thousands of search engines are available. Cooperation cannot be assumed, so robust techniques are required for automatically characterizing search engines, selecting among them, searching them, and integrating results retrieved from different sources.

The Language Technologies Institute pioneered research in automated Text Summarization with the Maximum Marginal Relevance metric and its application to user-profile-relevant document summarization. Research also focuses on summarizing dialog, clusters of topically related documents, and automated generation of briefings from corpora.
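In its commonly cited form, the Maximum Marginal Relevance criterion selects the next document or passage D_i from the retrieved set R that is relevant to the query Q yet maximally dissimilar to the already selected summary set S, with the parameter lambda trading relevance against novelty:

$$\mathrm{MMR} \;=\; \arg\max_{D_i \in R \setminus S} \Big[ \lambda\, \mathrm{Sim}_1(D_i, Q) \;-\; (1 - \lambda) \max_{D_j \in S} \mathrm{Sim}_2(D_i, D_j) \Big]$$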

Figure: The REAP project uses word histograms to model how children use language at different ages. These allow the Lemur search engine to select texts that use vocabulary a particular child is likely to understand.
The Lemur Toolkit for Language Modeling and Information Retrieval is an open-source software toolkit developed at the LTI and the University of Massachusetts. The heart of Lemur is a set of indexing methods that support a wide range of IR capabilities. Lemur includes multilingual search engines that are based on several probabilistic and vector-space retrieval models. Its powerful query language and support for text annotations and document structure make it particularly useful for question answering, language tutoring and other research at the LTI. It also includes a variety of other IR capabilities, such as federated search, text summarization and document clustering. Lemur is used in universities and research laboratories around the world. For more information see www.lemurproject.org.

Text mining research at the LTI consists of four major related categories, listed below.

* Text categorization and filtering is a form of supervised machine learning, where texts are assigned categories – e.g. emails to folders, web pages to taxonomic classes, books to catalog codes – first by training statistical classifiers from example corpora, and then classifying new texts with the trained classifiers. Filtering is a form of dynamic text categorization, where the categories are defined implicitly and evolve over time.

* Information extraction from free text involves tasks such as detecting entities (people, places, organizations, etc.), detecting roles (e.g. Clinton as senator, or Carter as peace envoy), and detecting relations (who does what to whom). One concrete accomplishment is a software package called Minor Third for learning-based information extraction (using HMMs, CRFs and other techniques). Information extraction is used in question answering, in topic detection and tracking, and in cognitive agents (e.g. in the RADAR project). Information extraction may also be used to instantiate tables (e.g. what product is sold by whom at what price) for classical data mining.

* Topic detection and tracking (TDT) utilizes supervised learning to track topics or events defined by example texts, and unsupervised learning to detect the emergence of new topics or events in news streams. The latter entails detecting novelty – i.e. detecting in which ways a textual description of an event indicates a novel one, even if it may contain points in common with earlier, different events of like type. TDT is inherently a dynamic time-series learning challenge, where topics may drift over time, and events initiate, fade from the news, or morph into other events.

* Large public comment databases are a feature of modern democratic societies. Email and the Web make it easy for people to express their opinions about proposed government policies and regulations, consumer products, and a wide range of other topics. Popular and controversial topics can quickly draw hundreds of thousands of comments. Text mining research at the LTI includes interactive methods of organizing, summarizing, and exploring large databases of unstructured text comments. Tools of this type can increase the responsiveness and transparency of democratic governments, and allow companies to better track customer opinions about products and services.

Part of text mining relates strongly to information retrieval, using notions such as inverted indexing and text-to-text similarity metrics: cosine similarity in high-dimensional vector spaces, and generative similarity in statistical language modeling approaches. The LTI is active in all aspects of text mining, including computational methods that scale to large real-world challenges, and that apply to different languages.
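The cosine similarity mentioned above is simple to state as code. The sketch below computes it over raw bag-of-words vectors; production systems would add tokenization, stemming, stop-word removal, and tf-idf weighting, so treat this as the core computation only.

import math
from collections import Counter

# Minimal sketch of text-to-text cosine similarity over bag-of-words vectors.
def cosine_similarity(text_a, text_b):
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in a)                      # overlap on shared terms
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

print(cosine_similarity("public comments on the new policy",
                        "comments about the policy proposal"))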


Knowledge-Based NLP and Question Answering

The LTI has a long history of research in knowledge-based natural language processing and computational linguistics, dating back to Carbonell's work on knowledge-based interlingual machine translation and Tomita's work on efficient natural language parsing techniques, when the precursor of the LTI was Carnegie Mellon's "Center for Machine Translation." Of particular note are the KANT and KANTOO systems developed by Nyberg, Mitamura and Carbonell, which brought high-accuracy interlingua machine translation to large-scale practical use for translating technical literature at Caterpillar Inc. into several languages. This line of work is characterized by careful linguistic analysis, large-scale knowledge engineering, and solid system building. More recently, knowledge-based systems are combined with machine learning, such as in the AVENUE project, where translation transfer rules are learned from a minimal number of word-aligned translation pairs via new techniques such as seeded-version-space learning. The current "pure" knowledge-based projects at the LTI are:

Knowledge Acquisition from Natural Language Text

The knowledge acquisition bottleneck has long been decried as one of the limiting factors for applications of artificial intelligence – how can we get all of the appropriate world knowledge into the computer so that it can solve problems of practical significance in a new domain? In our research on knowledge acquisition from text, we are working to define a formal mapping between specific structures in natural language and corresponding meaning representations in a formal representation (e.g. frame logic). The goal of CMU's contribution to the HALO-II project is to reduce the cost of encoding knowledge for a problem-solving system by making it possible to acquire knowledge directly from an existing text (e.g., from a textbook). Current work focuses on acquiring various types of knowledge (ontologies, rules, processes, etc.) from college textbooks in domains such as Biology, Chemistry, and Physics.

Open-Domain Question Answering for Multi-Lingual Text Collections

As the size of available on-line text collections grows ever larger, simple search engines are becoming less and less effective at helping users to find answers in on-line documents. More advanced question-answering (QA) systems, such as the LTI's JAVELIN project, use NLP techniques (segmentation, stemming, parsing, semantic interpretation, unification, etc.) to a) understand the underlying meaning of the questions they are posed, and b) find the most likely answers in the target collection(s). Information gathering becomes a collaborative process, where the system and user work together in an ongoing dialog to refine the search for ever-better answers. In addition to research on basic parsing and interpretation of unrestricted text, we are also actively working on: a) information gathering dialogs; b) sophisticated multi-layer retrieval strategies; c) a range of approaches to information extraction (pattern-based, statistical, etc.); and d) synthesis of answers from multiple answer candidates. The team is also investigating the use of dynamic planning in QA (e.g., to explore first the strategies that are most likely to yield a good answer when processing time is limited), and the use of reasoning and belief networks to piece together individual pieces of information from several documents.
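To make the pipeline idea concrete, the sketch below runs a toy question through three stages: question analysis, passage retrieval, and typed answer extraction. The stages, heuristics, and data here are illustrative assumptions only; JAVELIN's actual components (full parsing, semantic interpretation, multi-layer retrieval, answer synthesis) are far more sophisticated.

import re

# Schematic question-answering pipeline: analyze -> retrieve -> extract.
DOCS = [
    "The Language Technologies Institute was established in 1996.",
    "The Center for Machine Translation was founded in 1986 at Carnegie Mellon.",
]

def analyze(question):
    """Guess the expected answer type from the question word (toy heuristic)."""
    return "DATE" if question.lower().startswith("when") else "OTHER"

def retrieve(question, docs):
    """Rank documents by simple keyword overlap with the question."""
    q_terms = set(re.findall(r"\w+", question.lower()))
    return max(docs, key=lambda d: len(q_terms & set(re.findall(r"\w+", d.lower()))))

def extract(answer_type, passage):
    """Pull out a candidate answer matching the expected type (toy pattern)."""
    if answer_type == "DATE":
        match = re.search(r"\b(1[89]\d\d|20\d\d)\b", passage)
        return match.group(0) if match else None
    return passage

question = "When was the Language Technologies Institute established?"
print(extract(analyze(question), retrieve(question, DOCS)))  # expected candidate: 1996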

New Bill of Rights
Get the right information
To the right people
At the right time
On the right medium
In the right language
With the right level of detail
Industrial Programs

At the LTI we welcome industrial participation in our research, in the form of industrial affiliates, customized education programs, guest researchers participating in common projects, and funded R&D efforts. The LTI collaborates with many industrial partners, national and international, with several of these partnerships actively ongoing and others having successfully concluded. LTI industrial partners, past and present, include: IBM, Intel, SRI, Hitachi, Fujitsu, Siemens, Denso, Vulcan, Ontoprise, Lycos, Northrop Grumman, Hewlett Packard, Boeing, Justsystems, CNRI, Dong-A-Seetech, Caterpillar, Carnegie Group, Dynamix, General Electric, and ATR. We engage in the following types of relations:

Affiliateships – where industry participates in the results of LTI research, including publications, data sets, software, and briefings.

In-residence fellows – where industrial researchers or engineers join the LTI for a specified duration to receive full-immersion training in new technologies, to work on joint projects, or both.

Customized intensive courses – ranging from executive one- or two-day sessions to much longer and more technical targeted offerings on selected topics in Language Technologies and its applications.

Sponsored R&D projects – where LTI technologies are extended, integrated and/or applied to address challenging problems in industry or government.

Map: Countries represented in the Language Technologies Institute (languages of study, faculty, staff, students and/or visiting researchers).
Academic Programs
The LTI currently offers two graduate degree programs: a PhD in Language Technologies and a Masters in Language Technologies. We also participate in an undergraduate Linguistics minor (language technologies track). The LTI also offers shorter certificate programs in Language Technologies; please contact us for further information on these.
Financial Support

All PhD and most Masters students accepted into the Language Technologies Institute are awarded a Research Fellowship for the academic year, covering full tuition and a living allowance, usually renewable for the duration of the program, as long as the student maintains good standing.

Students are encouraged to apply for support from outside Carnegie Mellon (fellowships, foreign government grants, etc.). As an incentive to seek funding from other sources, a supplement is provided to the stipend of any student who obtains outside support.

Student Evaluation

Following the long-standing SCS tradition, the LTI does not focus only on courses or exams, and does not have a fixed timeline for completion of the PhD degree, although our target is five years. Instead, we carry out an individualized student evaluation at the end of each semester based on research performance, classes and other contributions. For each student, we write a letter indicating whether they are making "satisfactory progress" towards completing their degree. Students are in good standing as long as they are making satisfactory progress.

PhD in Language & Information Technologies

The PhD in Language and Information Technologies is a research-oriented degree program consisting of the following
components: successful completion of a set of courses, mastery of certain proficiencies, and a program of research, directed
by a faculty advisor, culminating in a PhD thesis.

Ph.D. Curriculum

A student working towards a PhD in Language and Information Technologies must successfully complete at least six courses in the LTI and two courses from any department in the School of Computer Science (for more course information, see “LTI Courses”).

Of these eight courses, the student must take at least one from each of the LTI Focus Areas (Linguistic, Computer Science, Task Orientation, and Statistical/Learning) and must take at least two lab courses, which involve hands-on work in one of four different areas (Speech, Machine Translation, Information Retrieval, and Natural Language Processing). The lab modules are self-paced, with Teaching Assistant and faculty guidance. Students are encouraged to consider taking additional elective courses beyond the eight required. Students may select additional courses from the LTI, from related courses in the Computer Science Department, or from other related CMU or University of Pittsburgh departments. Areas of possible interest include Speech, Linguistics, Statistics, and Human-Computer Interaction.

Proficiencies

The following skills must be demonstrated in the course of graduate study, with flexibility in the form and timing of their demonstration:

Writing: Satisfied by producing a peer-reviewed conference paper or a written report that at least two SCS faculty certify as being of conference-paper quality. The topic of the paper may be the student’s research results, a comprehensive survey of a research area, a linguistic analysis paper, or any other pertinent topic.

Presentation: Satisfied via a public presentation of good quality, such as an external conference presentation or an internal seminar presentation reviewed by several faculty members.

Programming: Satisfied by demonstrating competence in computer programming of language technology; this is normally satisfied in the course of the student’s research, but could also be satisfied via explicit apprenticeship if desired.

Teaching: Satisfied by two successful Teaching Assistantships (TA), as determined by the faculty members for whom the student serves as TA. Typical TA responsibilities include planning a portion of the syllabus, developing exercises, and delivering some lectures under faculty supervision. Of the two TA-ships, typically one will be for an undergraduate class and one will be for a graduate class.
Research and PhD Thesis

It is expected that all PhD students engage in active research from their first semester. Moreover, advisor selection should occur within 1-2 months of entering the PhD program, with the option to change at a later time. Roughly half of a student’s time should be allocated to research and half to courses until the coursework is completed.

Once the coursework is completed, the student should begin to move towards a thesis topic, in consultation with the student’s advisor. Once a suitable topic is defined, the student prepares and presents a dissertation proposal, based on their initial work on that topic. The dissertation proposal is normally expected at the end of the third year, and describes the general area of investigation and the specific problem(s) to be addressed, a clear argument for the significance of the problem, relevant past work, expected scientific contributions of the proposed work, and a projected timeline for completion. A dissertation committee consisting of the advisor, at least two other CMU faculty in language technologies, and at least one external member should be formed prior to the proposal. The dissertation itself, normally completed during the fifth year, includes a detailed description of all the work done, including its clear evaluation and the final scientific contributions. The thesis is then defended in a public oral presentation. A successful defense results in the awarding of the PhD degree.

Master of Language Technologies

The Master of Language Technologies (MLT) is a professional degree that is normally completed in two years. Students
choose an individualized curriculum from a flexible set of courses and self-paced laboratory modules that cover linguistic
and statistical approaches and basic computer science. The curriculum is usually tailored to emphasize a specialty in one of
three language technology areas: Machine Translation, Information Retrieval, or Speech Technology. Directed research is
an integral part of the MLT program; each MLT student carries out research under the guidance of a faculty advisor.
With some modifications and enhancements, the MLT curriculum also forms the course-based component of the PhD Pro-
gram. The more research-oriented MLT students are encouraged to apply for continuing studies in the PhD program, with
most of their MLT courses and hands-on work being credited towards the PhD.

Master of LT Curriculum

The curriculum for the MLT consists of a minimum of 120 course units at a senior or graduate level. Of these 120 units, six courses must be LTI courses and two other courses must be SCS courses. There are additional constraints on course selection in order to meet SCS-wide Masters requirements. A concentrated form of this degree may be completed in one year without the research component.

Master of LT Thesis Option


A Masters Thesis Option is available for students who wish to demonstrate independent research ability during their enrollment in the LTI Masters program. Students who choose the Masters Thesis Option will be expected to follow thesis guidelines that are similar in character to those for the LTI PhD. The Masters thesis requirements are less rigorous, however, since the Masters dissertation is expected to be defined, completed, and publicly defended in less than one year.



Courses and Admissions

LTI Courses

Sample Course Descriptions


We briefly describe here the main focus of a sample of our courses; we list these in numerical order. To see a complete list
of current courses and course descriptions, please see: http://www.lti.cs.cmu.edu/Courses/

11-682 Human Language Technologies (“Words for Nerds”): During the last decade computers have begun to understand human languages. Web search engines, language analysis programs, machine translation systems, speech recognition, and speech synthesis are used every day by tens of millions of people in a wide range of situations and applications. This course covers the fundamental statistical and symbolic algorithms that enable computers to work with human language, from text processing to understanding speech and language.

11-711 Algorithms for NLP: A graduate-level course on the computational properties of natural languages and the fundamental algorithms for the symbolic processing of natural languages.

11-717 LT for Computer-Aided Language Learning: This course studies the design and implementation of CALL systems that use Language Technologies such as Speech Synthesis and Recognition, Machine Translation, and Information Retrieval.

11-721 Grammar and Lexicon: A graduate-level course on linguistic data analysis and theory, focusing on methodologies that are suitable for computational implementations. The course covers major syntactic and morphological phenomena in a variety of languages. The emphasis is on examining both the diversity of linguistic structures and the constraints on variation across languages.

11-723 Formal Semantics: A graduate-level course on formal linguistic semantics: Given a syntactic analysis of a natural language utterance, how can one assign the correct meaning representation to it, using a formal logical system?

11-731 Machine Translation: A graduate-level course surveying the history, techniques, and research topics in the field of Machine Translation.

11-741 Information Retrieval: This course studies the theory, design, and implementation of text-based information systems. The IR core components of the course include important retrieval models (Boolean, vector space, probabilistic, inference net, language modeling), clustering algorithms, automatic text categorization, and experimental evaluation. A variety of current research topics are also covered, including cross-lingual retrieval, document summarization, machine learning, and topic detection and tracking.

11-743 Advanced IR Seminar/Lab: This is a seminar that focuses on current research in Information Retrieval. The seminar covers recent research on subjects such as retrieval models, text classification, information gathering, fact extraction, information visualization, summarization, text data-mining, information filtering, collaborative filtering, question answering systems, and portable information systems.

11-751 Speech Recognition: This course provides an introduction to the theoretical foundations, essential algorithms, major approaches, experimental strategies, and current state-of-the-art systems in speech recognition.

11-752 Phonetics, Prosody, Perception and Synthesis: This course offers insight into how human perception of speech relates to the physical properties of the signals. It covers practical aspects of speech, including projects. The second half of this course concentrates on speech synthesis and the building of synthetic voices.

11-761 Language and Statistics: This course covers some of the central themes and techniques that have emerged in statistical methods for language technologies and natural language processing.

11-791/792 Software Engineering for Language Technologies I/II: This two-course sequence combines classroom material and assignments in the fundamentals of software engineering (11-791) with a self-paced, faculty-supervised directed project (11-792). The two courses cover all elements of project design, implementation, evaluation, and documentation.
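As a small, purely illustrative sketch of one of the retrieval models covered in 11-741 (the vector space model), the following code ranks a toy document collection against a query using TF-IDF term weights and cosine similarity. The corpus, weighting formula, and tokenization are assumptions made for this sketch; they are not course materials or LTI software.

    import math
    from collections import Counter

    docs = {
        "d1": "machine translation of speech and text".split(),
        "d2": "statistical language modeling for speech recognition".split(),
        "d3": "text retrieval and text mining".split(),
    }

    N = len(docs)
    # Document frequency: how many documents contain each term.
    df = Counter(term for words in docs.values() for term in set(words))

    def tfidf_vector(words):
        """Vector space model: weight each term by tf * idf (skip unseen terms)."""
        tf = Counter(words)
        return {t: tf[t] * math.log(N / df[t]) for t in tf if df.get(t)}

    def cosine(u, v):
        dot = sum(u[t] * v.get(t, 0.0) for t in u)
        norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
        return dot / norm if norm else 0.0

    doc_vectors = {name: tfidf_vector(words) for name, words in docs.items()}
    query = tfidf_vector("speech translation".split())
    ranking = sorted(doc_vectors, key=lambda name: cosine(query, doc_vectors[name]), reverse=True)
    print(ranking)  # documents ordered by similarity to the query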
Example Course Sequences
For purposes of illustration, the following tables give possible curricula for a PhD student specializing in Machine Translation, an MLT student following the two-year track, and an MLT student following the condensed one-year track. For other PhD students, specializations in Speech, Information Retrieval, and Multimedia Systems might be similar in structure, with appropriate course substitutions. The two-year MLT track allows a student to work on a research project, to work as a Teaching Assistant, or to engage in other compensated work. The one-year MLT track requires full concentration and does not usually allow the student to engage in any other work. Note that these are only examples and are not meant to apply to every case, or even most cases.

MLT Two-Year Track

Year 1, First Semester: Grammar & Lexicon; Algorithms for NLP; Directed Study; Laboratory
Year 1, Second Semester: Machine Translation; Artificial Intelligence; Directed Study; Laboratory
Year 2, First Semester: Software Engineering for LT 1; Principles of Translation; Elective; Directed Study; Laboratory
Year 2, Second Semester: Software Engineering for LT 2; Elective; Directed Study; Laboratory

MLT Concentrated 12-Month Track

Spread over Semester 1 (Fall), Semester 2 (Spring), and Semester 3 (Summer), this track includes: Linguistic Basis of NLP; Speech Recognition; Statistical Methods in NLP; Information Retrieval; Speech Understanding; Machine Translation; Software Engineering for IT (1) and (2); Electives (including Directed Study); and self-paced Laboratory modules.

PhD Sample Track for MT Concentration

Year 1, First Semester: Grammar and Lexicon; Algorithms for NLP; Self-paced Lab; Research
Year 1, Second Semester: Machine Translation; Artificial Intelligence; Self-paced Lab; Research
Year 2, First Semester: Software Engineering for LT 1; Language and Statistics; Research
Year 2, Second Semester: Software Engineering for LT 2; Principles of Translation; Research
Year 3, First Semester: Teaching (TA); Research
Year 3, Second Semester: Thesis Proposal; Research
Year 4, both semesters: Elective or Seminar; Research
Year 5, First Semester: Research
Year 5, Second Semester: Thesis Defense

Admissions

The ideal applicant to either the Masters or PhD program would have a strong background in computer science and either linguistics or statistics, depending on the specific area of interest. Many successful students are of course stronger in one area than another, but admission to our programs is highly competitive, so any lack of background in one area must be compensated for by excellence in other areas. A clearly articulated interest in some specific aspect of language technologies is also important. For detailed information on application requirements and procedures, visit:
http://www.lti.cs.cmu.edu/About/how-to-apply.html



Faculty

Alan W Black
Associate Research Professor
BS Computer Science, Coventry Polytechnic, 1984
MS Knowledge Based Systems, University of Edinburgh, 1986
PhD Artificial Intelligence, University of Edinburgh, 1993

Alan Black has created practical implementations of computational theories of speech and language. After a wide background in morphology, language modeling in speech recognition, and computational semantics, he now works in all aspects of speech generation. As an author of the free software Festival Speech Synthesis System, he researched text analysis, prosodic modeling, waveform generation, and architectural issues in synthesis systems. His work targets data-driven computational models that allow synthesizers to capture speaker style. Specifically, he studies data-driven prosodic models and the automatic building of voices in English and other languages. To allow spoken output anywhere, he also deploys this work on handheld computers, specifically addressing rapid development of voices in new languages, modeling of speaker individuality, and evaluation of voice quality.

Professor Black’s teaching is very practical; thus his courses involve significant exercises that allow students to gain experience in building synthetic voices, statistically trained models, etc. After some practical experience it is easier to understand the underlying theoretical issues and their relative importance.

www.cs.cmu.edu/~awb

speech synthesis • speech to speech translation • spoken dialog systems

Ralf Brown
Senior Systems Scientist
BS Computer Science, Towson University, 1986
PhD Computer Science, Carnegie Mellon University, 1993

Ralf Brown's research interests cover several areas of language technology, such as reference resolution, disambiguation, corpus-based machine translation, cross-language information retrieval, and topic tracking in news. His recent research has focused on Example-Based Machine Translation and its applications, particularly in the context of multi-engine translation systems, and on topic tracking in news. He also works with machine-learning techniques for extracting patterns from parallel text in order to build translation systems with less training material.

Current and recent projects include RADD (Rapidly-Adaptable Data-Driven Machine Translation), AVENUE (machine translation for languages with few resources), Topical Novelty Detection in the TDT (Topic Detection and Tracking) program for detecting new events in the news and tracking their evolution, TONGUES (rapid development of bi-directional speech-to-speech translation systems), and MUCHMORE (cross-language information retrieval in the medical domain).

www.cs.cmu.edu/~ralf

machine translation • cross-language information retrieval • topic tracking

Jamie Callan
Associate Professor
BA Applications of Computer Science, Univ. of Connecticut, 1984
MS Computer & Information Science, Univ. of Massachusetts, 1987
PhD Computer Science, Univ. of Massachusetts, 1993

Jamie Callan is interested in a wide range of information retrieval and text mining topics. In recent years his research has focused on the four problems listed below.

• Federated Search (“Distributed IR”): Provide access to many search engines through a single search interface; this includes peer-to-peer search. Research topics include learning what each engine contains, selecting which engines to search, searching them, and integrating results from different sources (a small illustrative sketch appears below).

• Adaptive Document Filtering: Monitor information streams to find documents that satisfy an information need. The system should learn a person’s information needs, rapidly identify desired documents, and distinguish between novel and redundant information.

• Large-Scale Text Analysis: Develop tools for rapidly analyzing large text datasets. For example, when a government agency receives 100,000 comments about a new regulation, it needs to know which groups commented, what topics were discussed, and what supporting evidence was cited.

• IR for Language Applications: Search engines are increasingly used in question answering and language tutoring systems. Such applications require rich text annotation (e.g., syntax, named entities), complex queries, and retrieval models that combine varied forms of evidence.

His students initially work closely with him to study specific ideas while learning research skills and IR. As students gain expertise, they develop their own interests and have more freedom in exploring them.
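The sketch below is a minimal, hypothetical illustration of one small step in federated search as described above: merging ranked result lists returned by several independent engines. The engine names, scores, and min-max score normalization are assumptions made for illustration, not the methods used in this research.

    def merge_results(result_lists, k=10):
        """result_lists: {engine_name: [(doc_id, raw_score), ...]}; return top-k merged."""
        merged = {}
        for engine, results in result_lists.items():
            if not results:
                continue
            scores = [s for _, s in results]
            lo, hi = min(scores), max(scores)
            for doc_id, s in results:
                # Normalize each engine's scores to [0, 1] so they are comparable.
                norm = (s - lo) / (hi - lo) if hi > lo else 1.0
                # Keep the best normalized score seen for each document.
                merged[doc_id] = max(merged.get(doc_id, 0.0), norm)
        return sorted(merged.items(), key=lambda item: item[1], reverse=True)[:k]

    example = {
        "engine_A": [("d1", 12.0), ("d2", 7.5), ("d3", 3.1)],
        "engine_B": [("d2", 0.91), ("d4", 0.55)],
    }
    print(merge_results(example, k=3))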
www.cs.cmu.edu/~callan

information retrieval • adaptive information filtering • text data mining


Jaime Carbonell
Allen Newell Professor; Director, LTI
BS Physics and Mathematics, MIT, 1975
MS Computer Science, Yale University, 1976
PhD Computer Science, Yale University, 1978

Jaime Carbonell is the Director of the Language Technologies Institute and Allen Newell Professor of Computer Science at Carnegie Mellon University. He received his BS degrees in Physics and Mathematics from MIT, and his MS and PhD degrees in Computer Science from Yale University. His current research interests span several areas of artificial intelligence and language technologies, including: machine learning, data and text mining, natural language processing, machine translation, and automated summarization (where he invented MMR search-diversity technology). Professor Carbonell's most recent research directions are structural computational biology, using machine learning and language technologies to predict structure and function from protein sequences and biophysical knowledge, and autonomous assistant agents that learn via observation, experience, and NL communication.

Professor Carbonell is a key participant in the RADAR project and serves in key advisory positions external to CMU.

www.cs.cmu.edu/~jgc

artificial intelligence • natural language processing • machine learning/translation

Maxine Eskenazi
Associate Teaching Professor
BA Modern Languages, Carnegie Mellon University, 1973
DEA Linguistics, University of Paris VII, 1981
Doctorat de Troisième Cycle, Computer Science, University of Paris XI, 1984

Maxine Eskenazi's research interests lie in the variability of the speech signal, whether it be to aid non-native speakers in learning a language, to enable systems to dialogue with the elderly, or to process the speech of any other group of speakers whose production differs greatly from the average. At the LTI, she has created the Fluency project, which develops basic algorithms and systems for language learning. The systems are used for foreign language learning as well as for learning American English dialects. They are also used to test pedagogical theories about language learning. Work on this project has spun off a company, Carnegie Speech™, which has created products based on Fluency algorithms. She also works on the use of authentic materials for language learning. Here we characterize the language learner's knowledge and the knowledge to be acquired (curriculum) and then determine which texts, from a very large database of texts taken off the Web, should be shown to the learner next. Teacher-created curricula and learners' interests add to the power of adaptation to the individual's needs.

Dr. Eskenazi views teaching as a constant dialogue. It is an occasion for all individuals concerned to come together and learn something, question something, change something.
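The following is a minimal, hypothetical sketch of the kind of text selection described in the profile above: rank candidate Web texts by how many not-yet-known curriculum words they introduce, while penalizing other unknown vocabulary. The scoring rule, weights, and word-overlap matching are assumptions for illustration only, not the algorithms actually used in the Fluency work.

    def pick_next_text(texts, known_words, target_words, penalty=0.5):
        """texts: {title: list of words}; return the most suitable next text."""
        best_title, best_score = None, float("-inf")
        for title, words in texts.items():
            vocab = {w.lower() for w in words}
            new_targets = len(vocab & (target_words - known_words))   # curriculum words to learn
            other_unknown = len(vocab - known_words - target_words)   # distracting unknown words
            score = new_targets - penalty * other_unknown
            if score > best_score:
                best_title, best_score = title, score
        return best_title

    known = {"the", "a", "is", "house", "red", "near"}
    targets = {"bicycle", "library", "weather"}
    candidates = {
        "t1": "the library is near the house".split(),
        "t2": "the nebulous stratosphere is vast".split(),
    }
    print(pick_next_text(candidates, known, targets))  # -> "t1"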
www.lti.cs.cmu.edu/~max

computer-aided language learning • speech processing • speech recognition

Scott E. Fahlman
Research Professor
BS Electrical Engineering and Computer Science, MIT, 1973
MS Electrical Engineering and Computer Science, MIT, 1973
PhD Artificial Intelligence, MIT, 1977

Scott Fahlman is responsible for the knowledge-representation research effort on the RADAR Project, a large DARPA-funded research effort whose goal is to build an automated cognitive assistant for busy managers, making extensive use of AI and machine-learning techniques.

As a researcher, he is primarily interested in Artificial Intelligence and its applications. Currently, he is working on SCONE, a practical system that can represent a large body of real-world knowledge and that can efficiently perform the kinds of search and inference that seem so effortless for us humans. He believes that such "knowledge base" systems will be important tools in the future, perhaps used in even more ways than database systems are used today.

With respect to natural language understanding, the field has made considerable progress focusing on superficial aspects of language, but Professor Fahlman believes that future progress depends on our ability to extract and represent the actual meaning of a piece of text, and to use large amounts of background knowledge in understanding the text, using powerful new tools for knowledge representation and inference.

www.cs.cmu.edu/~sef

artificial intelligence • knowledge representation • machine learning


Eugene Fink
Systems Scientist
BS, Mount Allison University, 1991
MS, University of Waterloo, 1992
PhD, Carnegie Mellon University, 1999

Eugene Fink's research interests are in various aspects of artificial intelligence, including machine learning, planning, problem solving, automated problem reformulation, e-commerce applications, medical applications, and theoretical foundations of artificial intelligence. His interests also include computational geometry and algorithm theory.

He is currently working on an intelligent system for automated allocation of offices and related resources, in both crisis and routine situations. This work is part of the RADAR project, aimed at creating a general-purpose assistant for office managers. He is also working on techniques for identification of both known and surprising patterns in large-scale databases, and applying these techniques to homeland security. This work is part of the ARGUS project, which is a joint research project involving Carnegie Mellon and Dynamix Technologies.

www.cs.cmu.edu/~eugene

artificial intelligence • machine learning • computational geometry

Robert Frederking
Senior Systems Scientist; Director, LTI Graduate Programs
BS Computer Engineering, Case Western Reserve University, 1977
PhD Computer Science (AI), Carnegie Mellon University, 1986

Bob Frederking's primary research area has been machine translation applications that do not currently permit the use of purely knowledge-based techniques. This includes rapidly developing Machine Translation (MT) for new languages and translating text and speech that are not limited to a narrow, well-defined domain. His main technical approach in this area is Multi-Engine MT (MEMT). MEMT applies several different MT techniques to the same text, and then attempts to select the best results from each technique. He developed and implemented the initial chart-based dynamic-programming technique for merging the results from the different engines, as well as the current merging technique, which uses statistical language modeling to select among the different technique outputs. He has also been involved in LTI projects in Cross-Language Information Retrieval, Question Answering, and Information Extraction from email, among other things.
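As a toy illustration of the selection step just described, the sketch below scores candidate translations from several engines with a simple unigram language model and keeps the most fluent one; real systems would use higher-order n-gram models. The probability table, smoothing floor, engine names, and whitespace tokenization are assumptions for illustration, not the actual MEMT merging code.

    import math

    # Hypothetical unigram probabilities estimated from some target-language corpus.
    UNIGRAM_P = {"the": 0.05, "meeting": 0.002, "starts": 0.001, "at": 0.02, "noon": 0.0005}
    UNSEEN_P = 1e-6  # crude floor probability for words not in the table

    def lm_score(sentence):
        """Average log-probability per word under the unigram model."""
        words = sentence.lower().split()
        logp = sum(math.log(UNIGRAM_P.get(w, UNSEEN_P)) for w in words)
        return logp / max(len(words), 1)

    def select_best(candidates):
        """candidates: {engine_name: translation}; pick the most fluent output."""
        return max(candidates.items(), key=lambda item: lm_score(item[1]))

    outputs = {
        "engine_1": "the meeting starts at noon",
        "engine_2": "meeting the noon at starting",
    }
    print(select_best(outputs))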
Professor Frederking believes that successful advising and teaching hinge largely on successful communication: presenting advice (or a lecture), understanding what (if anything) the student is having trouble with, and then providing the information or guidance that he or she needs to resolve any difficulties. As the Chair of the LTI's graduate programs, he is the default advisor for students who are not project-supported.

www.cs.cmu.edu/~ref/

speech-to-speech MT • rapid-development wide-coverage MT • question answering

Alex Hauptmann
Senior Systems Scientist
BA Psychology, Johns Hopkins University, 1982
MA Psychology, Johns Hopkins University, 1982
Diplom Computer Science, Technische Universität Berlin, 1984
PhD Computer Science, Carnegie Mellon University, 1991

Alex Hauptmann's research aims to design and build intelligent programs that process large volumes of multimedia data, including text, image, video, and audio, and make the data useful for other applications, so as to improve speech recognition, image understanding, NLP, machine learning, question answering, and IR. The challenge is to find the right data, to process it into a suitable form for training, learning, or re-use, and to build mechanisms that can successfully utilize this data.

This work takes place in the context of the Informedia digital video project, which aims to achieve machine understanding of video and film media, including all aspects of search, retrieval, visualization, and summarization in both current and archival content collections. The base technology developed under Informedia combines speech, image, and natural language understanding to automatically transcribe, segment, and index linear video for intelligent search and image retrieval.

www.cs.cmu.edu/~alex

multimedia analysis • multimedia interfaces • Informedia digital video library
Judith Klein-Seetharaman
Assistant Professor, Department of Pharmacology, Univ. of Pittsburgh School of Medicine; Research Scientist, LTI
Diplom in Biology, Univ. of Cologne, Germany, 1995
Diplom in Chemistry, Univ. of Cologne, Germany, 1996
PhD Biological Chemistry, MIT, 2000

How does sequence map to structure and function of proteins in different organisms? Dr. Klein-Seetharaman takes a linguistically inspired view of this question, in analogy to “How do words map to meaning in natural languages?”, using stochastic language modeling technologies. Computational models are validated experimentally by interdisciplinary (biochemical and biophysical, in particular NMR spectroscopic) studies of purified proteins and model peptide sequences. The emphasis lies on testing the predicted sequence dependence of structural and dynamic aspects of folding/misfolding and functional properties of proteins. Specific proteins that are expressed, purified, and studied experimentally in Dr. Klein-Seetharaman’s laboratory include the G-protein coupled receptor rhodopsin, the glutamate receptors, and the epidermal growth factor receptor. These systems function in diverse signal transduction pathways, but resemble each other in their mechanism of action. Each receptor undergoes substantial conformational changes during the signaling process, and the investigation of the precise molecular details of these changes is instrumental to elucidating the molecular mechanism of signaling by these molecules.

www.cs.cmu.edu/~judithks

computational biology/bioinformatics • biochemistry/biophysics • structural biology

John Lafferty
Professor (CSD, LTI)
BA, Middlebury College, 1982
MS, Princeton University, 1984
PhD Mathematics, Princeton University, 1986

The central focus of John Lafferty's research is machine learning, including algorithms, theory, and statistical methods for learning from data. The motivating applications for this work most often come from text and natural language processing, information retrieval, and other areas of language technologies. For example, in recent work with his colleagues he has studied approximate inference algorithms for a family of mixture models appropriate for document collections, and applied the algorithms to automatically extract the subtopic structure of scientific articles. Over several years Professor Lafferty has been involved in the development of a language modeling approach to information retrieval, including a general approach to IR based on decision theory. In other work he is researching learning algorithms for sequential and graph-structured data, using a framework called conditional random fields for combining the strengths of graphical models with discriminative classification methods such as support vector machines and logistic regression.

www.cs.cmu.edu/~lafferty

natural language processing • machine learning • information theory

Alon Lavie
Associate Research Professor
BA Computer Science, Israel Institute of Technology, 1987
MS Computer Science, Carnegie Mellon University, 1993
PhD Computer Science, Carnegie Mellon University, 1996

Alon Lavie's main areas of research are Machine Translation (MT) of both text and speech, and Spoken Language Understanding (SLU). His most active current research is on the design and development of new approaches to Machine Translation for languages with limited amounts of data resources. He has also worked extensively on the design and development of Speech-to-Speech Machine Translation systems and on robust parsing algorithms for analysis of spoken language.

Professor Lavie is co-PI of the AVENUE project (funded by NSF/ITR), which is developing a general framework for building prototype MT systems for languages for which only scarce amounts of data and linguistic resources are available. He also works on parsing algorithms for spoken language analysis of databases of transcribed spoken language (such as CHILDES). He was co-PI of the Nespole! and C-STAR speech translation projects and of the LingWear and Babylon mobile speech translation projects, where he directed the design and development of the analysis and translation components.

He is the principal instructor of the graduate-level course on "Algorithms for NLP". He also teaches the section on "Natural Language Processing" for the "Introduction to Human Language Technologies" course, and supervises the Lab in NLP (11-712) course at the LTI.

www.cs.cmu.edu/~alavie

machine translation • spoken language understanding • machine learning
Lori Levin
Associate Research Professor
BA Linguistics, University of Pennsylvania, 1979
PhD Linguistics, MIT, 1986

Lori Levin works on linguistic issues in machine translation of spoken and written language. Her career-long research interest is the design of multi-lingual systems that accommodate typologically diverse languages. LTI's AVENUE project focuses on translation of languages with scarce data resources. Developing a machine translation system typically requires a level of economic and human resources that may not be available for all languages. Research on MT for minor languages combines linguistic typology and machine learning to automate the production of machine translation systems for new languages.

She is also part of a consortium for designing semantic interlingual representations of text meaning. We are using multi-parallel corpora (multiple versions of the same text) to home in on what is common among sentences that are supposed to convey the same meaning. In addition to the interlingua design, the consortium is producing annotated multi-parallel corpora, tools for annotation, and evaluation metrics.

Her other interests include computer-assisted language learning, especially tools to assist second language readers with comprehension of authentic texts.

www.cs.cmu.edu/~lsl

minority languages • machine translation • interlingua representations • lexicons

Teruko Mitamura
Associate Research Professor; LTI Finance Director
MA Linguistics, University of Pittsburgh, 1985
PhD Linguistics, University of Pittsburgh, 1989

Professor Mitamura's research focuses on the following projects:

• JAVELIN-II (open-domain, multilingual question answering): A system which combines NLP, planning, IR, and MT to answer natural language questions and refine the search strategy in consultation with the user.

• CAMMIA (Conversational Agent for Multilingual Mobile Information Access): A system which extends VoiceXML with NLP and dialog management to support dynamic multi-task dialogs in Japanese and English.

• KANT (Knowledge-based Accurate Natural Language Translation): A project founded in 1991 for the research and development of large-scale, practical translation systems for technical documentation. KANT uses a controlled vocabulary and grammar for each source language, and explicit yet focused semantic models for each technical domain, to achieve very high accuracy in translation.

Teruko Mitamura teaches the courses Machine Translation, Grammars and Lexicons, and LT for CALL.

www.cs.cmu.edu/~teruko

knowledge-based MT • question answering • Japanese NLP and dialog systems

Eric Nyberg
Associate Professor
BA Computer Science, Boston University, 1983
PhD Computational Linguistics, Carnegie Mellon University, 1992

Eric Nyberg's research at the LTI is currently focused on three main areas:

• Open-Domain Question Answering. The JAVELIN project combines natural language dialog, information retrieval, text understanding, fact extraction, and probabilistic reasoning to answer complex questions about entities, relationships, and events expressed in unstructured text.

• Conversational Agents for Mobile Multilingual Information Access. The CAMMIA project is creating speech dialog systems for robust, multi-task dialogs in mobile environments such as car navigation systems.

• Knowledge-Based Machine Translation. Since the late 1980s he has worked on controlled language, document checking, and machine translation for technical documentation; the current system, KANTOO, is now in use at Caterpillar, Inc.

Professor Nyberg also teaches a two-course series on software engineering and information technology, where students learn about software analysis, design, and construction in the context of real-world team projects.

www.cs.cmu.edu/~ehn

machine translation • integrated information management • software engineering


Carolyn Penstein Rose
Research Scientist
BS Computer Science, University of California at Irvine, 1992
MS Computational Linguistics, Carnegie Mellon University, 1994
PhD Language and Information Technologies, Carnegie Mellon University, 1997

Carolyn Penstein Rose's primary research objective is to develop and apply language technology (i.e., robust language understanding technology and dialogue management technology) to enable effective computer-based and computer-supported instruction. The important role of students making their thinking explicit through verbal explanation is well established. Thus, a major thrust of her current research is to explore issues related to eliciting and responding to student explanation behavior. However, many of the underlying issues, such as influencing student expectations, motivation, and learning orientation, transcend the specific input modality. She is the PI for two tutorial dialogue projects, namely CycleTalk for thermodynamics tutoring and Calculategy for calculus tutoring. She is also co-PI for a physics tutoring project headed by Kurt VanLehn at the University of Pittsburgh.

She has served as a co-instructor for Grammar Formalisms and the Master's of HCI Project Course. Professor Penstein Rose is also the primary instructor of the Conversational Interfaces course, which is jointly listed in LTI and HCI.

www.cs.cmu.edu/~cprose/

robust language understanding • technology supported education

Roni Rosenfeld
Professor
BS Mathematics and Physics, Tel-Aviv University, 1985
MS Computer Science, Carnegie Mellon University, 1991
PhD Computer Science, Carnegie Mellon University, 1994

Professor Rosenfeld's research spans two key areas:

• Computational Molecular Biology, and more specifically Computational Biolinguistics. Many of the problems in this area involve statistical modeling of long sequences of building blocks (nucleotides or amino acids) and their relationship to proteins and their function. This is very similar to the problem of modeling natural language: long sequences of letters or words, and their relationship to the deep structure and meaning of sentences (a small illustrative sketch follows this list). He is currently working to detect and characterize specific selectional pressure in proteins.

• Speech interaction with PDAs, web portals, and robots is now feasible. But what is the ideal style for human-machine speech communication? Natural language interfaces are easy for people, yet they are brittle, difficult to develop, and they strain recognition technology. Furthermore, by trying to emulate people, they fail to communicate the functional limitations of the machine. Are there better alternatives? The Speech Graffiti (aka USI) project is designing and evaluating new speech-based interaction paradigms.
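To make the analogy in the first bullet concrete, the sketch below estimates an add-one-smoothed bigram model over two kinds of sequences: a protein-like string of amino-acid letters and an English sentence. The toy data and the smoothing choice are assumptions for illustration, not the models used in this research.

    from collections import Counter

    def bigram_counts(sequence):
        """Count adjacent-pair statistics; works for any sequence of symbols."""
        return Counter(zip(sequence, sequence[1:]))

    def bigram_prob(seq, counts, unigrams, vocab_size):
        """Probability of a sequence under an add-one-smoothed bigram model."""
        p = 1.0
        for a, b in zip(seq, seq[1:]):
            p *= (counts[(a, b)] + 1) / (unigrams[a] + vocab_size)
        return p

    # The same machinery applies to amino-acid letters and to English words.
    protein = list("MKTAYIAKQR")                   # symbols are residues
    sentence = "the cat sat on the mat".split()    # symbols are words

    for seq in (protein, sentence):
        counts = bigram_counts(seq)
        unigrams = Counter(seq)
        print(bigram_prob(seq, counts, unigrams, vocab_size=len(unigrams)))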
www.cs.cmu.edu/~roni

statistical language modeling • speech recognition/interfaces • machine learning

Alex Rudnicky
Principal Systems Scientist
BS Psychology, McGill University, 1975
MS Psychology, Carnegie Mellon University, 1976
PhD Psychology, Carnegie Mellon University, 1980

Alex Rudnicky's research centers on interactive systems that use speech. He is interested in the following problems:

• Speech systems that learn: his research attempts to develop a process that, given an abstract specification of capabilities, supports the automatic configuration of a speech system for an interactive task, and then supports incremental learning over the life of the application.

• Automatic detection and recovery from error: Automatic systems cannot easily detect and recover from communication breakdowns. We can, however, use features of recognition, understanding, and dialog to predict the likelihood of misunderstanding at a given instance, and then apply heuristic strategies for guiding the conversation back onto track (a small illustrative sketch follows this list).

• A theory of language design for speech-based interactive systems: Speech-mode communication predisposes the user to choose certain words and grammatical preferences. Understanding the underlying principles of these preferences (and how these are influenced by the system's language) leads to better language design for interactive systems.

• The role of speech in the computer interface: We can analyze an interface in terms of its intended task(s), costs of interactions, and the perceived user value. We've studied models based on time, system error, and task structure, which are useful for simple systems and appear to be extensible to more complex systems.
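As a toy illustration of the error-detection idea in the second bullet, the sketch below combines a few per-utterance features into a misunderstanding score with a hand-set logistic model and, above a threshold, switches to a clarification strategy. The feature names, weights, and threshold are assumptions for illustration, not the models used in this research.

    import math

    # Hypothetical per-utterance features (all in [0, 1]) and hand-set weights.
    WEIGHTS = {"asr_confidence": -4.0, "parse_coverage": -2.0, "off_topic_rate": 3.0}
    BIAS = 2.0
    THRESHOLD = 0.5

    def misunderstanding_probability(features):
        """Logistic combination of dialog features into a risk estimate."""
        z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
        return 1.0 / (1.0 + math.exp(-z))

    def next_action(features):
        """Choose a repair strategy when the risk of misunderstanding is high."""
        p = misunderstanding_probability(features)
        return "ask_clarification" if p > THRESHOLD else "continue_dialog"

    print(next_action({"asr_confidence": 0.35, "parse_coverage": 0.4, "off_topic_rate": 0.6}))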
www.cs.cmu.edu/~rudnicky

spoken language interaction • speech recognition • interface design


Tanja Schultz
Research Scientist
MS Mathematics and Physical Education, University of Heidelberg
MS Computer Science, University of Karlsruhe, 1995
PhD Computer Science, University of Karlsruhe, 2000

Tanja Schultz believes that automatic speech recognition systems are the most natural front-end for applications which allow human communication across language and culture barriers. Therefore, her research focuses on developing techniques and algorithms to construct human-human communication and human-machine interaction applications that can function robustly in multilingual environments. Furthermore, her work involves the rapid deployment of speech recognizers in new tasks and languages. A massive reduction of effort, in terms of time and cost, is necessary to speed up the development of recognizers for new tasks and languages. It is her belief that this is an essential prerequisite for making speech-driven applications attractive and available to the public, and for including speakers of languages for which only few or no resources are available.

In her teaching, Tanja combines an introduction to theoretical foundations, essential algorithms, and state-of-the-art system strategies with experimental practice. Combining these two facets, students have the opportunity to gain a deep understanding of the theory as well as hands-on expertise in developing speech recognition and understanding systems. Her goal is to provide a foundation and help them explore their own ideas.

www.cs.cmu.edu/~tanja

speech recognition • human-human communication • human-machine communication

Michael Shamos
Distinguished Career Professor; Co-Director, Institute for eCommerce; Director, Universal Library
AB Physics, Princeton University, 1968
MA Physics, Vassar College, 1970
PhD Computer Science, Yale University, 1978
JD, Duquesne University, 1981

Michael Shamos' research interests include digital libraries, language identification, electronic voting, electronic negotiation, Internet law and policy, and experimental mathematics. As Co-Director of the Institute for eCommerce, he runs the technology side of the Masters of Science in Electronic Commerce program. Additionally, he teaches the courses Ecommerce Technology, Electronic Payment Systems, and Ecommerce Law and Regulation. He is also the Director of the Universal Library project.

Professor Shamos' business and consulting experience includes serving as an expert witness in computer software and electronic voting cases, as an examiner of electronic voting systems, as a consultant on electronic voting, and as an arbitrator in computer-related disputes for the American Arbitration Association. Additionally, he was a Supervisory Programmer with the National Cancer Institute from 1970 to 1972 while a commissioned officer in the United States Public Health Service.

www.ecom.cmu.edu/shamos.html

digital libraries • language identification • Internet policy • electronic negotiation

Richard Stern
Professor of Electrical and Computer Engineering
BS Electrical Engineering, MIT, 1970
MS Electrical Engineering and Computer Sciences, University of California, Berkeley, 1972
PhD Electrical Engineering and Computer Science, MIT, 1977

Rich Stern's research group develops techniques that improve the accuracy of speech recognition systems in difficult acoustical environments. They deal with problems in recognition accuracy resulting from additive noise sources, background music, competing talkers, room reverberation, and other sources of degradation such as non-native accents or spontaneous speech production. His group has been developing creative solutions to these problems using classical statistical compensation techniques, microphone arrays, and signal processing based on auditory physiology and perception. He has also worked in the areas of language modeling, the integration of phonetic, syntactic, and semantic information, and multimodal fusion of information. In addition to his speech recognition work, he has also maintained an active research program in psychoacoustics, where he is best known for theoretical work in binaural perception.

Professor Stern has taught the ECE courses in digital signal processing and signals and systems for many years, along with other courses in the general areas of communication theory and acoustics. He frequently lectures on speech recognition, signal processing, speech perception, and speech production for various LTI courses.

www.ece.cmu.edu/~rms

robust automatic speech recognition • auditory perception • signal processing


Alex Waibel
Professor; Associate Director, LTI; Director, Interactive Systems Lab
BS Electrical Engineering, MIT, 1979
MS Computer Science, Carnegie Mellon University, 1980
PhD Computer Science, Carnegie Mellon University, 1986

Alex Waibel holds joint appointments in the LTI, the Human-Computer Interaction Institute (HCII), the Computer Science Department, and the Robotics Institute. He also directs the Interactive Systems Laboratories (ISL), both here at Carnegie Mellon University and at Karlsruhe University in Germany. In the ISL, they aim at building more flexible and natural Multimodal Human-Computer Interfaces and Computer Mediated Human-Human Interfaces. His speech-related research interests include language processing, language translation, and machine learning. His efforts led to the creation of JANUS, one of the first and most advanced speech-to-speech translation programs, and of C-STAR (the Consortium for Speech Translation Advanced Research). His former multimodal and speech projects included the Meeting Room, the Genoa Meeting recognizer, the Meeting Browser, LingWear (a wearable tourist assistant and medical translation system), and NESPOLE!, a multi-language E-commerce application. Some of his latest related research projects are TC-Star and Str-Dust (unlimited domain speech translation) and CHIL (Computers in the Human Interaction Loop).

www-2.cs.cmu.edu/~ahw/

perceptual user interfaces • speech recognition • multimodal interaction

Eric Xing
Assistant Professor
MS Computer Science, Rutgers University, 1998
PhD Molecular Biology and Biochemistry, Rutgers University, 1999
PhD Computer Science, University of California, Berkeley, 2004

The major theme of Professor Xing's research is understanding and modeling how living systems function and evolve based on mathematical principles, and developing probabilistic inference and learning algorithms for computational biology and for generic intelligent systems in a wide range of applications such as vision, IR, and NLP. Currently, his projects largely fall into two categories:

• Computational Biology, with an emphasis on developing formal models and algorithms that address problems of practical biological and medical concern, such as: 1) modeling genome-microenvironment interactions in cancer development and embryogenesis via joint analysis of genomic, proteomic, cytogenetic, and pathway signaling data; 2) statistical inference of haplotype, linkage, and pedigree for genetic, clinical, and forensic applications; and 3) modeling substitution, recombination, selection, and genome rearrangement for comparative genomic analysis.

• Statistical Machine Learning, emphasizing theory and algorithms for learning complex probabilistic models, learning with prior knowledge, and reasoning under uncertainty. We focus on: 1) variational inference/learning theory and algorithms; 2) algorithms and applications of Bayesian nonparametrics and hierarchical Bayesian models in data mining; and 3) probabilistic and optimization-theoretic methods for semi-supervised learning and kernel machines.

www.cs.cmu.edu/~epxing

computational biology • machine learning • computational statistics

Yiming Yang
Professor
MS Computer Science, Kyoto University, 1982
PhD Computer Science, Kyoto University, 1986

Yiming Yang is a Professor in the Language Technologies Institute and the Computer Science Department at Carnegie Mellon University. Professor Yang received her B.S. degree in Electrical Engineering, and her Ph.D. degree in Computer Science (Kyoto University, Japan). Her research has centered on statistical classification methods and their applications to a variety of challenging problems in the real world, including automated text classification, novelty detection and event tracking, protein sequence analysis, cross-language information retrieval, web mining for multimedia question answering, and intelligent email filtering and organization.

Professor Yang teaches the Information Retrieval and Advanced Statistical Learning courses.

www.cs.cmu.edu/~yiming

text categorization • translingual information retrieval • event detection and tracking


Adjunct Faculty

David Evans
founder Clairvoyance
former Director of the Laboratory for Computational Linguistics, CMU
former Director of the Academic Computational Linguistics Program, CMU

Phil Hayes
co-founder Carnegie Software Partners
former faculty member Computer Science Department, CMU

Michael “Fuzzy” Mauldin


creator of LYCOS
Chairman of the Board of Conversive, Inc.
former faculty member LTI, CMU

Vibhu O. Mittal
Senior Scientist, Google, Inc.

Raúl E. Valdés-Pérez
co-founder and President of Vivisimo Inc
former faculty member Computer Science Department, CMU