
[Cover map labels: CNCC Grand Hotel, Best Western OL CNCC, Stadium Hotel, InterContinental Beijing Beichen]
Handbook assembled by Xianpei Han, Kang Liu and Zhuoyu Wei
Cover designed by XinXing Deng
Contents

Table of Contents i
In Memoriam . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

1 Conference Information 19
Message from the General Chair . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
Message from the Program Committee Co-Chairs . . . . . . . . . . . . . . . . . . . . . . 21
Organizing Committee . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
Program Committee . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
Meal Info . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

2 Tutorials: Sunday, July 26 27


T1: Successful Data Mining Methods for NLP . . . . . . . . . . . . . . . . . . . . . . . 29
T2: Structured Belief Propagation for NLP . . . . . . . . . . . . . . . . . . . . . . . . . 31
T3: Sentiment and Belief: How to Think about, Represent, and Annotate Private States . . 32
T4: Corpus Patterns for Semantic Processing . . . . . . . . . . . . . . . . . . . . . . . . 34
T5: Matrix and Tensor Factorization Methods for Natural Language Processing . . . . . . 36
T6: Scalable Large-Margin Structured Learning: Theory and Algorithms . . . . . . . . . 38
T7: Detecting Deceptive Opinion Spam using Linguistics, Behavioral and Statistical Mod-
eling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
T8: What You Need to Know about Chinese for Chinese Language Processing . . . . . . 41
Welcome Reception . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

3 Main Conference: Monday, July 27 45


Session 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
Student Lunch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
Session 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
Session 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
Session 4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
Poster session P1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
System Demonstrations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88

4 Main Conference: Tuesday, July 28 91


Keynote Address: Marti A. Hearst . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
Session 5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
Session 6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
Session 7 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
Poster session P2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
Student Research Workshop . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130

Social Event . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132

5 Main Conference: Wednesday, July 29 133


Keynote Address: Jiawei Han . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
Session 8 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
ACL Business Meeting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
Session 9 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
Best Paper Session . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149

6 Co-located Conference: CoNLL: Thursday-Friday, July 30-31 151

7 Workshops: Thursday-Friday, July 30-31 159


W1: Eighth SIGHAN Workshop on Chinese Language Processing . . . . . . . . . . . . . 160
W2: Arabic Natural Language Processing . . . . . . . . . . . . . . . . . . . . . . . . . . 163
W3: Grammar Engineering Across Frameworks (GEAF 2015) . . . . . . . . . . . . . . . 165
W4: Eighth Workshop on Building and Using Comparable Corpora . . . . . . . . . . . . 166
W5: Semantics-Driven Statistical Machine Translation: Theory and Practice . . . . . . . 168
W6: Novel Computational Approaches to Keyphrase Extraction . . . . . . . . . . . . . . 169
W7: SIGHUM Workshop on Language Technology for Cultural Heritage, Social Sciences,
and Humanities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
W8: BioNLP 2015 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
W9: The Fifth Named Entities Workshop . . . . . . . . . . . . . . . . . . . . . . . . . . 174
W10: Third Workshop on Continuous Vector Space Models and their Compositionality . . 176
W11: Fourth Workshop on Hybrid Approaches to Translation . . . . . . . . . . . . . . . 177
W12: Fourth Workshop on Linked Data in Linguistics: Resources and Applications . . . . 179
W13: Workshop on Noisy User-generated Text . . . . . . . . . . . . . . . . . . . . . . . 180
W14: The 2nd Workshop on Natural Language Processing Techniques for Educational
Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
W15: The First Workshop on Computing News Storylines . . . . . . . . . . . . . . . . . 185

8 Anti-harassment policy 187

Author Index 189

9 Local Guide 205

Local Organizer
The Chinese Information Processing Society of China (CIPS) was officially founded in June 1981, with the objective of uniting domestic and overseas researchers in the field of Chinese information processing. Currently, CIPS has 15 professional committees, 5 working committees, 61 group members and more than 3,000 individual members. The Journal of Chinese Information Processing is the journal of CIPS.
Website: http://www.cipsc.org.cn/

Local Supporters (sorted A-Z)


- Beijing Institute of Technology
- Beijing Language and Culture University
- Fudan University
- Harbin Institute of Technology
- Institute of Automation, Chinese Academy of Sciences
- Institute of Computing Technology, Chinese Academy of Sciences
- Institute of Software, Chinese Academy of Sciences
- Peking University
- Tsinghua University
- University of International Relations
Samsung Electronics leads the global market in high-tech electronics
manufacturing and digital media. SRC-Beijing (Samsung R&D Institute
China - Beijing) is one of Samsung's overseas R&D centers, located in
Beijing, China. SRC-Beijing focuses on the research & development of next-generation communication, advanced multimedia, big data, 3D displays, mobile terminals and other emerging technologies. We foster innovative
ideas that advance technology, produce new products and improve the
everyday lives of our customers.


Google Inc. http://research.google.com

Google's mission is to organize the world's information and make it universally accessible and useful. Perhaps as remarkable as two Stanford research students having the ambition to found a company with such a lofty objective is the progress the company has made to that end. Ten years ago, Larry Page and Sergey Brin applied their research to an interesting problem and invented the world's most popular search engine. The same spirit holds true at Google today. The mission of research at Google is to deliver cutting-edge innovation that improves Google products and enriches the lives of all who use them. We publish innovation through industry standards, and our researchers are often helping to define not just today's products but also tomorrow's.
Noah's Ark Lab of Huawei Technologies
The Noah's Ark Lab is a research lab of Huawei Technologies, located in Hong Kong and Shenzhen.
The mission of the lab is to make significant contributions to both the company and society by innovating in data mining, artificial intelligence, and related fields. While mainly driven by long-term, high-impact projects, research in the lab also tries to advance the state of the art in these fields as well as to support the products and services of the company at each stage of the innovation process. As a world-class research lab, we are pushing the frontier of research and development in all areas that we work in. We dare to address both the challenges and opportunities in this era of big data, to revolutionize the ways in which people work and live, and the ways in which companies do business, through the intelligentization of all processes, with the slogan 'from big data to deep knowledge'.
Research areas of the lab mainly include machine learning, data mining, speech and language processing, intelligent systems, information and knowledge management, and human-computer interaction.
Founded in 2012, the lab has now grown to be a research organization with many significant
achievements in both academia and industry. We welcome talented researchers and engineers to
join us to realize their dreams.
In Memoriam

IN MEMORIAM: ADAM KILGARRIFF


1960-2015

Obituary written by Roger Evans

The Association for Computational Linguistics mourns the passing of Adam Kilgarriff, a remarkable computational linguist and long-term ACL member.

A long time ago now (maybe 1988?), Gerald (Gazdar) and I supervised Adam's DPhil at the University of Sussex. Adam was my age, give or take a year, having come to academia a little late, and was my first doctoral student. Adam's topic was polysemy, and I'm not really sure that much supervision was actually required, though I recall fun exchanges trying to model the subtleties of word meaning using symbolic knowledge representation techniques - an experience that was clearly enough to convince Adam later that this was a bad idea. In fact, Adam's thesis title itself was "Polysemy". Much as we encourage concision in thesis titles, pulling off the one-word title is a tall order, requiring a unique combination of focus and coverage, breadth and depth, and most of all, authority. Adam completely nailed it, at least from the perspective of the pre-empirical Computational Linguistics of the early nineties.
Three years later, after a spell working for dictionary publishers, Adam joined me as a research
fellow, now at the University of Brighton. I had a project to explore the automatic enrichment of
lexical databases to support the latest trends in language analysis, and in particular, task-specific
lexical resources. I was really pleased and excited to recruit Adam - he had lost none of his intellectual
independence, a quality I particularly valued. Within a few weeks he came to me with his own plan
for the research - a detour, as he put it, from the original workplan. I still have the email, dated
April 6, 1995, in which he proposed that, instead of chasing a prescriptive notion of a single lexical
resource that needed to be customised to a domain, we should let the domain determine the lexicon,
providing lexicographic tools to explore words, and particularly word senses, that were significant
for that domain. In that email, Computational Lexicography at Brighton was born.
Over the next eight years or so, Computational Lexicography became a key part of our group's success, increasingly under Adam's direct leadership. The key project, WASPS, developed the WASPbench
- the direct precursor to the Sketch Engine, recruiting David (Tugwell) to the team. In addition,
Adam was one of the founding organisers of SENSEVAL, an initiative to bring international teams of
researchers together to work in friendly competition on a pre-determined word sense disambiguation
task (and which has now transformed into SEMEVAL). Together we secured funding to support the
first two rounds of SENSEVAL. Each round required the preparation of standardised datasets, guided by Adam's highly tuned intuitions about lexical data preparation and management. And we engaged
somewhat in the European funding merry-go-round, most fondly in the CONCEDE project, working
on dictionaries for Central European languages with amazing teams from the MULTEXT-EAST
consortium, and with Georgian and German colleagues in the GREG project.
But Adam was not entirely comfortable in academia, or at least not in a version of academia which didn't share his drive for the practical as well as the theoretical. He didn't have tenure, nor any clear route to achieve tenure, which meant that he could not apply for and hold grants in his own right (although he freely admitted he was happy not to have the associated administrative responsibilities); he set up a high-quality master's programme in Computational Lexicography, which ran for a couple of years, but the funding model didn't really work, and it quickly evolved into the highly successful, but independent, Lexicom workshop series; and he couldn't engage university support for developing the WASPbench as a commercial product. So in 2003, he spread his wings, left the university, and set up Lexical Computing Ltd.
For many people, Lexical Computing and the Sketch Engine are what Adam is best known for. He
spent eleven years tirelessly developing the company, the software, the methodology, the resources,
the discipline. It was an environment in which he seemed completely at ease, sometimes the shame-
less promoter of his wares, sometimes the astute academic authority, often the source of practical
solutions to real problems, and the instigator of new initiatives, and always the generous facilitator,
educator and friend. For me personally, though, this was a time when our friendship was more promi-
nent than our professional relationship. We would meet for the odd drink, usually in the Constant
Service pub (Adam's favourite), and chat about life, family, sometimes work, and occasional schemes for new collaborations, though the company didn't leave him very much time for that. It was one of
those relaxed undemanding friendships that just picks up whenever and wherever we find the time to
meet, but remains strong nevertheless.
Adam's illness was as unexpected to him as to anyone. Over the summer of 2014, he was making plans for new directions and projects. And then, a brief hiatus in communication before we heard the news in early November. And yet, already, he seemed reconciled - not resigned, but resolved, calm, dignified. I was upset, angry, helpless - hopeless really, and feeling very selfish in my distress. I saw Adam three times after he became ill and they are all good, strong memories, and that is more to his credit than mine.
The first was in his kitchen, with early spring sunshine, drinking strong coffee he had made very
meticulously, watching the winter birds scavenging in the garden, just chatting about nothing in
particular, and gossiping about work for a couple of hours. The second was a surprise trip to the pub
- the surprise being that Adam was strong enough to get there (and back) on his own, and drink a
couple of pints too. We went to the Constant Service, as always, and it was one of our occasional
NLP group outings, so a good crowd was there. The third was back in his kitchen, this time for work, a few weeks later. Ironically, the university system that struggled to engage with Adam's practical drive is now fully signed up to demonstrating the Impact of its research. Adam's work on Computational Lexicography, at Brighton and afterwards through Lexical Computing, featured as an Impact Case Study in recent national evaluations, and has subsequently been selected for a wider national initiative showcasing Computer Science research. Adam was happy to cooperate with this, in part to alleviate boredom, and we arranged a Skype call with a technical author from his kitchen. Adam was on excellent form describing his work, his passion, and still full of ideas for gentle academic engagement if his retirement would allow it.
Shortly after that meeting we heard the news of Adam's relapse and his decision not to continue treatment. Like everyone else, I followed the blog, and also emailed a little privately. I arranged to go and visit again, but Adam wasn't well enough, so we cancelled. Like everyone else, I waited for
the inevitable blog post.
Adam's funeral was in a modest church in the village of Rottingdean, just along the coast from Brighton. A beautiful setting and a sunny afternoon. The church was absolutely packed - standing room only - we estimate about 250 people: family, friends and colleagues from far and wide. Adam was a committed atheist, so the service focused on fond memories of him from those closest to him, with just one hymn - "Immortal, Invisible", as all his blog readers will understand. A beautiful and fitting farewell to a man who, it seems, was to everyone a friend first, and a colleague, boss, or antagonist, second.
There have been many comments on Adam's blog, on Twitter, and in academic forums, which say much more and so much better than I can. Some have said that Adam will be remembered for the Sketch
Engine and the amazing data resources that have been built up around it. I would say that his real
legacy is much more deeply intellectual than that. Adam would probably smile with satisfaction that
the two things can co-exist so comfortably - a rare combination of the intellectual and practitioner, a
real giant of the field.


IN MEMORIAM: JANE J. ROBINSON


Obituary written by Barbara J. Grosz (Harvard University), Eva Hajicova (Charles University in
Prague), Aravind Joshi (University of Pennsylvania)

The Association for Computational Linguistics (ACL) mourns the passing of Jane J. Robinson, for-
mer president of the ACL.

Jane Robinson, a pioneering computational linguist, made major contributions to machine transla-
tion, natural language, and speech systems research programs at the RAND Corporation, IBM, and
in the AI Center at SRI International. She served as ACL president in 1982.
Jane became a computational linguist accidentally. She had a Ph.D. in history from UCLA, but
could not obtain a faculty position in that field because those were reserved for men. Instead, she
took positions teaching English, first at UCLA and then at California State College, Los Angeles.
While at LA State, where she was tasked with teaching engineers how to write, Jane noticed an
announcement for a talk on Chomsky's transformational grammar. She went to the talk thinking
this work on grammar might help her teach better. Although its subject matter did not match her
expectations, the talk marked a turning point in her career.
In the late 1950s, Jane became a consultant to the RAND Corporation group working on ma-
chine translation under Dave Hays (ACL president, 1964). From the beginning, Jane was concerned
with identifying connections between different traditions in formal grammars and their correspond-
ing detailed linguistic realizations. Her 1965 International Conference on Computational Linguistics
(COLING) paper, "Endocentric Constructions and the Cocke Parsing Logic" [Robinson1965], is a
beautiful example of connecting specific linguistic phenomena to parsing strategies in a way that
preserves the nature of the linguistic phenomena, endocentric constructions. While at RAND, Jane
became colleague and friend to many in the machine translation and emerging computational linguis-
tics world, including Susumu Kuno (ACL president, 1967), Martin Kay (ACL president, 1969), Joyce
Friedman (ACL president, 1971), and Karen Sparck Jones (ACL president, 1994).
In the late 1960s Jane moved to the Automata Theory and Computability Group at the IBM
Thomas J. Watson Research Center, Yorktown Heights, NY. She used her knowledge of formal
work on grammars and parsing to draw correspondences between Dependency Grammars and Phrase
Structure Grammars. Although Jane came from the Dependency Grammar tradition, her balanced,
careful analysis of tradeoffs enabled others to bridge the approaches. Her 1967 COLING paper,
"Methods for Obtaining Corresponding Phrase Structure and Dependency Grammars" [Robinson1967],
is a wonderful example of her understanding of the seminal issues underlying these different systems.
She subsequently published the classic paper connecting dependency structure and transformational
rules, "Dependency Structures and Transformational Rules" [Robinson1970a]. This paper exem-
plifies Jane's scholarship and her deftness in dealing with the formal and computational issues of
language processing in a very fair and informative manner. Her 1970 paper, "Case, category and
configuration" [Robinson1970b], demonstrated in a very convincing way the possibility of formally
interpreting Fillmore's case grammar in terms of dependencies, in a more economical fashion and
without any loss of information.
In 1973, Don Walker (ACL secretary-treasurer, 1976-1993) recruited Jane to the speech group in the AI Center (AIC) at SRI International. Jane remained a key member of the AIC's natural language group until she retired in the mid-1980s. She made major contributions to a wide range of research,
ranging from grammars for speech understanding systems and dialogues to such discourse issues as
codes and clues in contexts. For several of the NLP systems SRI developed in the 1970s and 1980s,
the grammar was the coordinating point for all knowledge about the language, so Jane interacted
with everyone developing any component of the system, from the architecture through semantics and
discourse. Speech processing was a bit "deaf" in those days, and Jane frequently remarked that she
had to write not only a grammar for English, but also its dual (one for non-English to rule out bad
parsings). During her time at SRI, Jane wrote some of the most comprehensive grammars for NLP
systems.
One of us (Barbara Grosz) notes that Jane's contributions to the AIC's natural language group went far beyond her official grammar-writing responsibilities. She served as mentor (before that word was widely used in academia) for a large group of "young Turks", as she referred to those of us in the younger cohort involved in building NLP systems; she was our in-house expert in linguistics;
provided critiques of drafts of papers, making them shorter, clearer and more scholarly; debugged
our ideas across the full spectrum of system components; and introduced us to the most senior people
in linguistics and computational linguistics.
Another of us (Aravind Joshi, ACL president, 1975) recalls meeting Jane in September 1975 at
an NLP workshop at the University of Michigan. At that time, he and his students were working
on two separate areas of CL. One of them concerned the minimal formal machinery needed for
representing structural aspects of language and the other one dealt with some aspects of cooperation
in NL interfaces to databases. After just a brief discussion with Jane, when she asked what he was
doing, it became clear to him, "that I was in the presence of someone who had already worked on
such diverse areas. From thereon, whenever I had an opportunity to meet with Jane, I took advantage
of her deep understanding."
And the third of us (Eva Hajicova, ACL president, 1998) recalls the important ties Jane formed with
Praguian linguists interested in formal grammar starting in 1965, when the founder of the Prague
group, Petr Sgall, first visited the United States and met Jane at RAND. Their common research
interests in computational linguistics, particularly Dependency Grammar, forged deep personal relationships between Jane and Sgall's linguistics group in Prague. Jane visited Prague twice, once
before and once after the change of the communist regime. During difficult political times, Jane
provided the Prague group with linguistics literature published in the West, and she introduced them
to her colleagues and students, yielding additional important connections, which have continued to
this day. Eva notes that, "only those who have the same historical experience as we in Prague have
had can appreciate fully how important such activities were for our research and for our students."
Jane read broadly and her training as a historian made her a careful and deep scholar. Throughout
her life she would go to talks that seemed a bit far afield and come back with new ideas. For those
who worked with her she was an invaluable source of out-of-the-box thinking as well as the go-to
person for what to read in linguistics. Jane often said that had she been born in a later generation, she
would have become an astronomer. Given her love of exploration, she might have been an astronaut. (You can see this bent in the poem she wrote for her poetry class friend's funeral, "Time To Go" [Robinson2008].) Lucky for all of us, she wound up in Computational Linguistics.
Jane was the mother of four children and the proud grandmother of two grandsons, one now a
lawyer, the other an actor. She extended her family to encompass her colleagues, building cama-
raderie through dinners at her home (lamb stew, mostly) and picnics at Foothills Park in Palo Alto,
activities which drew the families of AIC researchers together and yielded many lifelong friendships.
Jane built such friendships throughout her career. In responding to our questions about Jane's time at RAND, Dave Hays's wife Rita noted that Jane remained close friends with her and Dave even after Dave went to SUNY Buffalo and Jane to IBM Yorktown Heights and then SRI. Rita and Jane were traveling and hiking companions into Jane's 90s.
To celebrate her 60th birthday, Jane "got in shape" to hike in the Himalayas around Annapurna.
She came back with beautiful photos and the desire to join the young Turks who went backpacking in
Yosemite. (Although she had been a regular visitor to Yosemite since her 40s, she had not backpacked
before.) She offered to drive everyone in the huge Chrysler Imperial she had gotten to feel safe driving
in New York. Jane backpacked into her late 70s, then switched to the luxury of the High Sierra Camp
tent cabins and later to a small cabin with an inside shower. For those lucky enough to visit Yosemite
with her, she was as much a guide to the mountains as she had been a guide to linguistics.


For people who worked with Jane at SRI, she was a towering figure in the field, a wonderful
colleague who imparted deep wisdom as well as linguistic facts, and a dear friend. She was senior
to most of the members of the AIC. Looking back on her arrival, Peter Hart (a director of the AIC)
noted that Jane's presence changed the tone of the early SRI AI Center. She brought not only a keen
intellect and depth of knowledge, but "also a gentleness, openness, and generosity of spirit." Gary
Hendrix (who led the NLP group at SRI for several years) remembers that for many who worked
with her at SRI, "Jane was like a second mother, loving and giving and nurturing." Jerry Hobbs (ACL
president, 1990) recalls that, "one of the things I learned from her, though imperfectly, was how to be
tough with grace." Several remember Jane as one of a handful of elder statesmen that their generation
could look up to.
Jane was a colleague and friend of Ray Perrault (ACL president, 1983, and current AIC direc-
tor) from the time he was a Ph.D. student at the University of Michigan. In the early 1970s, Jane
frequently visited Joyce Friedman (ACL president, 1971), her old friend and his advisor. Ray re-
members Jane was a gentle but firm critic of his thesis, and he fondly recalls her passing through in her huge Chrysler on her move from IBM to SRI. Ray was ACL vice-president when Jane was
president, and she was delighted when he decided to join the young Turks at SRI. Subsequently, she
"even tolerated me as her manager until her retirement and became doting godmother to my son."
Jane made a difference in people's lives, not just their research. Her death marks the end of an era
and the passing of an icon. We will miss her greatly.
Acknowledgments: This remembrance was a composite of the reminiscences of a number of peo-
ple, including Tom Garvey, Marguerite Hays, Peter Hart, Gary Hendrix, Jerry Hobbs, David Israel,
Ray Perrault, Candy Sidner, and Marty Tenenbaum.

References
[Robinson1965] Robinson, Jane J. 1965. Endocentric constructions and the Cocke parsing logic. In Proceedings of the 1965 Conference on Computational Linguistics, COLING '65, pages 1-23.
[Robinson1967] Robinson, Jane J. 1967. Methods for obtaining corresponding phrase structure and dependency grammars. In Proceedings of the 1967 Conference on Computational Linguistics, COLING '67, pages 1-25.
[Robinson1970a] Robinson, Jane J. 1970. Dependency structures and transformational rules. Language, 46(2):259-285.
[Robinson1970b] Robinson, Jane J. 1970. Case, category, and configuration. Journal of Linguistics, 6(1):57-80.
[Robinson2008] Robinson, Jane J. 2008. Time to go. http://poemsofjanerobinson.blogspot.com/2008/06/time-to-go.html.

IN MEMORIAM: PAUL CHAPIN
1938-2015

Memorial statement prepared by Tom Bever, Merrill Garrett, and Cecile McKee (University of
Arizona)

The Association for Computational Linguistics (ACL) mourns the July 1, 2015 death of Paul
Chapin, former ACL president (1977).

The language sciences lost a truly valued defender and friend on July 1, 2015, with the death of
Paul Gipson Chapin from Acute Myeloid Leukemia, in Tucson, Arizona.
Paul was born December 27, 1938, in El Paso, Texas, son of John Letcher Chapin and Velma
Gipson Chapin. In 1962, he married Susan Levy of New York, his beloved spouse of 53 years
who survives him and will continue to reside in Tucson. Other survivors are: sister Clare Ratliff
of Santa Fe, NM; children Ellen Endress of Beltsville, MD, John Chapin of Alexandria, VA, Robin
Chapin of Honolulu, HI, and Noelle Green of Sherman Oaks, CA; and grandchildren Kasey Chapin
of Woodland, WA and Malia Green of Sherman Oaks, CA.
Paul received his B.A. from Drake University in 1960 and his Ph.D. from MIT in 1967 as a student
of Noam Chomsky. He served as Assistant Professor at UCSD from 1967 to 1975, in a newly form-
ing department. During this period, he developed a particular interest in psycholinguistics, and made
early contributions to its initial growth. He then had an opportunity at NSF to be of use as a broad
organizer of the language sciences in general. He directed the National Science Foundation's Linguistics Program from 1975 to 1999, declining several offers to move up to higher positions in NSF. Between 1999 and his retirement in 2001, Paul supported cross-directorate activities at NSF. When he retired, NSF gave him the Director's Superior Accomplishment Award. The Linguistic Society of America gave him the first Victoria A. Fromkin Award for Distinguished Service to the Profession that same year. He later served as Secretary-Treasurer of the LSA from 2008 to 2013. He was elected a fellow of the Linguistic Society of America, the American Association for the Advancement of Science, and the Association for Psychological Science.
As a 1967 Ph.D. graduate from MIT, Paul could have expressed a particular professional bias for
generative grammar while at NSF. But even as a student, he was eclectic and was as interested in lan-
guages as in their structure. His first graduate program was in Philosophy at Harvard for a year. He
subsequently became a student at MIT, but worked for most of that time in the MITRE Corporation's pioneering lab in computational linguistics; indeed, he later became President of the Association for Computational Linguistics in 1977. His dissertation, "On the syntax of word-derivation in English", argued against the then prevailing view that transformations preserve meaning; he showed that in the terms of the current theory (essentially an early version of the Aspects model), there is a cycle of transformations internal to complex words that modify their meaning: this can be seen as a premonition of Generative Semantics' treatment of lexical items, and today's principles of Derivational Morphology. During his period as an assistant professor in the linguistics program at UCSD, Paul published papers on a range of topics, including articles from his dissertation, analyses of Samoan, the history of Polynesian languages, methodological papers on computational topics (e.g., automatic morpheme stripping), experimental studies of sentence comprehension (e.g., on click location during sentence comprehension), and several important review articles. Theoretical frameworks for his
investigations included transformational grammar, case grammar, and generative semantics, among
others.
Thus, no one could have been more broadly trained to take on the task of managing the funding
of NSF's Linguistics Program. As much as any leading academic, he must be credited with shaping
the field as it is today. His positive influence on linguistics cannot be overstated. Since NSF is the
primary source of government support for the field, the NSF program director has a supremely impor-
tant influence: Paul used this influence with a great sense of critical judgment but with an equal sense
of impartiality in a field rife with academic conflicts. The 25-year period of his stewardship of the program witnessed some of the most extreme disagreements within the language sciences, pitting rationalists against associationists, structuralists against functionalists, nativists against empiricists: they all were struggling for NSF support during a time of increasingly limited resources. Paul stood
above these arguments, and insisted on supporting any affordable proposal that had promise for im-
portant results, both theoretical and empirical, whatever the philosophical stripe of the investigators.
He could see past the intellectual commercials accompanying a project into its value in propelling
the field forward. Without him, research support could easily have fallen into one camp or another
with an ultimate loss for everyone.
Paul was first known to many of us professionally in his capacity as director of the NSF Linguistics
Program. He was a consistent chaperone of ideas and research projects, gentle with his advice,
generous with assistance in helping applicants modify their proposals to become more successful.
His mellifluous basso profundo voice on the telephone (the earlier days were prior to email) still
resonates in our memory of discussions about why our most recent proposal did not get funded,
or needed some changes, or in fact did get funded. His tone was always the same, quiet, factual,
friendly, and concerned to be helpful. Indeed, a few years after his official retirement, he published
an extraordinary book on how to write grant proposals and use them to formulate coherent research
programs.
Paul was a witty and engaging personal friend, with wide-ranging interests. He had a lifelong love of music, as a flute player, a singer, and in retirement serving on the board of the Desert Chorale in Santa Fe, NM. He enjoyed great food, whether haute cuisine or ethnic. He could always tell you where and when he had eaten his favorite version of any particular dish. He found most published crossword puzzles too easy. He collaborated with an online community from 2003 to 2012 to follow the Samuel Pepys diary on a day-by-day basis.
Family and friends remember Paul for his deep caring for others and his lifelong commitment to
social progress. He will be sorely missed.
Memorial donations may be made to the MD Anderson Cancer Center in Houston, TX.

1 Conference Information
Message from the General Chair

ACL first came to Asia fifteen years ago, in 2000. The conference in Hong Kong was a very exciting one and attracted lots of people. It was a great opportunity for a number of Asian NLP researchers to meet face-to-face in such a large-scale meeting. Establishment of AFNLP (the Asian Federation of Natural Language Processing) was discussed soon after this wonderful event, and AFNLP then started IJCNLP (the International Joint Conference on Natural Language Processing) as its biennial flagship conference. ACL's three-year regional rotation and IJCNLP's two-year cycle meet every six years, and this is the second joint ACL-IJCNLP conference, following the first held in Singapore in 2009. ACL meetings in Asia and IJCNLPs are now a propelling force of NLP research in the Asian region, and provide valuable experiences especially to young researchers and students attending a conference of this size for the first time.
The success of ACL-IJCNLP owes a great deal to the hard work and dedication of many people. I
would like to thank all of them for their time and contribution to this joint ACL-AFNLP conference.
Priscilla Rasmussen (the ACL Business Manager), Gertjan van Noord (ACL Past President), Chris
Manning (ACL President), Graeme Hirst (ACL Treasurer), Dragomir Radev (ACL Secretary), Keh-
Yih Su (AFNLP Past President), Kam-Fai Wong (AFNLP President), all other ACL and AFNLP
Executive Committee members and ACL-AFNLP Conference Coordinating Committee members
(forgive me for not listing all their names) have always been very helpful, guiding me whenever I missed something or fell behind schedule, and giving me appropriate advice. Without their help, I could not have fulfilled even half of my duty.
I was very lucky to have a wonderful team of chairs, who have done a fantastic job of making this conference an invaluable one. I would like to express my deepest gratitude to Michael Strube
and Chengqing Zong (Program Committee Co-Chairs), Le Sun and Yang Liu (Local Arrangement
Co-Chairs), Hang Li and Sebastian Riedel (Workshop Co-Chairs), Kevin Duh and Eneko Agirre (Tu-
torial Co-Chairs), Hsin-Hsi Chen and Katja Markert (System Demonstration Co-Chairs), Wanxiang
Che and Guodong Zhou (Publications Co-Chairs), Stephan Oepen, Chin-Yew Lin and Emily Bender
(Student Research Workshop Faculty Advisors), Kuan-Yu Chen, Angelina Ivanova and Ellie Pavlick
(Student Research Workshop Co-Chairs), Francis Bond (Mentoring Chair), Xianpei Han and Kang
Liu (Publicity Co-Chairs), Zhiyuan Liu (Webmaster), and all the team members of the Local Orga-
nizing Committee. Thanks to their dedicated efforts, we now have a great conference consisting of
the Presidential Address (by Chris Manning), two Keynote Addresses (by Marti Hearst and Jiawei
Han), 173 long and 145 short papers, 13 TACL papers, 7 Student Research Workshop papers, 25
system demonstrations, 8 tutorials, 15 workshops, one co-located conference (CoNLL-2015), and the speech of a not-yet-announced Lifetime Achievement Awardee.
I am also grateful to our sponsors for their generous contributions. ACL-IJCNLP 2015 is supported by six Platinum Sponsors (CreditEase, Baidu, Tencent, Alibaba Group, SAMSUNG, and Microsoft), four Gold Sponsors (Google, Facebook, SinoVoice, and Huawei), three Silver Sponsors (Nuance, Amazon, and Sogou), one Bronze Sponsor (Yandex), one Overseas Student Fellowship Sponsor (Baobab), and one Best Paper Sponsor (IBM). I would like to express special thanks to Yiqun Liu (Local Sponsorship Chair) and all members of the International Sponsorship Committee (Ting Liu, Hideto Kazawa, Asli Celikyilmaz, Julia Hockenmaier, and Alessandro Moschitti).
Finally, I would like to thank the two keynote speakers, the area chairs of the main conference, the workshop organizers, the tutorial presenters, the authors of main conference and demo papers, the reviewers for their contributions, and all the attendees for their participation. I hope everyone has a great time and enjoys this conference.

Yuji Matsumoto, Nara Institute of Science and Technology


ACL-IJCNLP 2015 General Chair

Message from the Program Committee Co-Chairs
Welcome to the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th
International Joint Conference on Natural Language Processing of the Asian Federation of Natural
Language Processing (ACL-IJCNLP)! This year ACL-IJCNLP received 692 long paper submissions and 648 short paper submissions, which sets a new record for ACL for both long and short papers! We are pleased to observe that our community continues to grow. Of the long papers, 173 were accepted for presentation at ACL-IJCNLP: 105 as oral and 68 as poster presentations. Of the short papers, 145 were accepted: 50 as oral and 95 as poster presentations. In addition, ACL-IJCNLP also features 12 presentations of papers accepted in the Transactions of the Association for Computational Linguistics (TACL).
The submissions were reviewed under different categories and using different review forms for empirical/data-driven, theoretical, applications/tools, resources/evaluation, and survey papers. This year we introduced the item "MENTORING" to the review form, to indicate whether a paper needs the help of a mentor in its writing, organization or presentation. For short papers, following up on last year's successful experience, we also welcomed submissions describing negative results. We are glad to see that the community is becoming more open towards negative results, so that such papers have the chance to be published and other researchers do not fall into the same trap.
We view posters as an integral part of ACL-IJCNLP. Half of the papers have been accepted as posters. Hence, we spent a great deal of time ensuring that the poster sessions will be a good experience for poster presenters and their audience. Following last year's exciting poster session, we again organized the posters in two large poster sessions to accommodate the high-quality submissions accepted in poster presentation format. We hope attendees and authors will benefit from this additional time to present and to discuss with each other.
ACL-IJCNLP 2015 will have two distinguished invited speakers: Marti Hearst (professor at UC Berkeley in the School of Information and EECS) and Jiawei Han (Abel Bliss Professor at the University of Illinois at Urbana-Champaign). We are grateful that they accepted our invitation.
There are many individuals to thank for their contributions to ACL-IJCNLP 2015. We would like
to thank the 37 area chairs for their hard work on recruiting reviewers, leading the discussion pro-
cess, and carefully ranking the submissions. We would also like to thank the 751 primary and the 137
secondary reviewers on whose efforts we depended to select high-quality and timely scientific work.
This year we specifically acknowledge around 18.2% of the reviewers who went the extra mile and
provided extremely helpful reviews (their names are marked with a * in the organization section of
the proceedings). The ACL coordinating committee members, including Dragomir Radev, Graeme
Hirst, Jian Su, and Gertjan van Noord were invaluable on various issues relating to the organization.
We would like to thank the prior conference chairs Kristina Toutanova and Hua Wu and their prede-
cessors for their advice. We are very grateful for the guidance and support of the general chair Yuji
Matsumoto, to the ACL Business Manager Priscilla Rasmussen, who knew practically everything, to the local chairs Le Sun and Yang Liu, the publication chairs Wanxiang Che and Guodong Zhou, and the webmaster Zhiyuan Liu. We would also like to thank Jiajun Zhang, who helped with reviewer
assignment and numerous other tasks. Rich Gerber and Paolo Gai from Softconf were extremely
responsive to all of our requests, and we are grateful for that.
We are indebted to the best paper award committee, which consisted of Eneko Agirre, Tim Baldwin, Philipp Koehn, Joakim Nivre, and Yue Zhang. They read the candidate papers, ranked them, and provided comments on very short notice.
We hope you will enjoy ACL-IJCNLP 2015 in Beijing!

ACL-IJCNLP 2015 Program Co-Chairs


Chengqing Zong, Chinese Academy of Sciences
Michael Strube, Heidelberg Institute for Theoretical Studies

Organizing Committee
General Chair
Yuji Matsumoto, Nara Institute of Science and Technology
Program Committee Co-chairs
Chengqing Zong, Institute of Automation, Chinese Academy of Sciences
Michael Strube, Heidelberg Institute for Theoretical Studies
Local Co-chairs
Le Sun, Institute of Software, Chinese Academy of Sciences
Yang Liu, Tsinghua University
Workshop Co-chairs
Hang Li, Huawei
Sebastian Riedel, University College London
Tutorial Co-chairs
Kevin Duh, Nara Institute of Science and Technology
Eneko Agirre, University of the Basque Country
Publications Co-chairs
Wanxiang Che, Harbin Institute of Technology
Guodong Zhou, Soochow University
Demonstration Co-Chairs
Hsin-Hsi Chen, National Taiwan University
Katja Markert, University of Leeds
Sponsorship Chair
Yiqun Liu, Tsinghua University
Publicity Co-Chairs
Xianpei Han, Institute of Software, Chinese Academy of Sciences
Kang Liu, Institute of Automation, Chinese Academy of Sciences
Student Research Workshop Co-chairs
Student Co-chairs
Kuan-Yu Chen, National Taiwan University
Angelina Ivanova, University of Oslo
Ellie Pavlick, University of Pennsylvania
Faculty Advisors
Stephan Oepen, University of Oslo
Chin-Yew Lin, Microsoft Research Asia
Emily Bender, University of Washington
Mentoring Chair
Francis Bond, Nanyang Technological University
Student Volunteer Co-Chairs
Erhong Yang, Beijing Language and Culture University
Dong Yu, Beijing Language and Culture University
Webmasters
Zhiyuan Liu, Tsinghua University
Qi Zhang, Fudan University
Entertainment Chair
Binyang Li, University of International Relations
Space Management Co-Chairs
Jiajun Zhang, Institute of Automation, Chinese Academy of Sciences
Wenbin Jiang, Institute of Computing Technology, Chinese Academy of Sciences
Qiuye Zhao, Institute of Computing Technology, Chinese Academy of Sciences

Graphic Design
Yi Han, Tsinghua University
Ying Lin, Beijing University of Posts and Telecommunications
Business Manager
Priscilla Rasmussen

Program Committee

Program Committee Co-chairs


Chengqing Zong, Chinese Academy of Sciences
Michael Strube, Heidelberg Institute for Theoretical Studies
Area Chairs
Discourse, Coreference and Pragmatics
Pascal Denis, INRIA Lille
Marta Recasens, Google Research
Information Retrieval
Jian-Yun Nie, Université de Montréal
Information Extraction and Text Mining
Razvan Bunescu, Ohio University
Mausam, IIT Delhi
Steven Bethard, University of Alabama at Birmingham
Language and Vision
John Kelleher, Dublin Institute of Technology
Language Resources and Evaluation
Rashmi Prasad, University of Wisconsin-Milwaukee
Jin-Dong Kim, Database Center for Life Science, Research Organization of Information and
Systems
Lexical Semantics and Ontology
Qin Lu, The Hong Kong Polytechnic University
Zornitsa Kozareva, Yahoo! Labs
Linguistic and Psycholinguistic Aspects of CL
Suzanne Stevenson, University of Toronto
Machine Learning and Topic Models for Language Processing
Daichi Mochihashi, The Institute of Statistical Mathematics
Eric Xing, Carnegie Mellon University
Regina Barzilay, MIT
Machine Translation and Multilinguality
Fei Huang, Facebook
Taro Watanabe, Google
Min Zhang, Soochow University
NLP Applications and NLP-enabled Technology
Joel Tetreault, Yahoo! Labs
Srinivas Bangalore, Interactions
NLP for the Web and Social Media
Shou-De Lin, National Taiwan University
Alice Oh, KAIST
Phonology, Morphology and Word Segmentation
Greg Kondrak, University of Alberta
Jianfeng Gao, Microsoft Research
Question Answering
James Fan, IBM
Semantics
Chris Biemann, TU Darmstadt
Raquel Fernandez, University of Amsterdam
Tom Kwiatkowski, Google Research

Sentiment Analysis and Opinion Mining
Jing Jiang, Singapore Management University
Lun-Wei Ku, Academia Sinica
Spoken Language Processing, Dialogue and Interactive Systems, and Multimodal NLP
David Schlangen, University of Bielefeld
Julia Hirschberg, Columbia University
Summarization and Generation
Xiaojun Wan, Peking University
Mirella Lapata, University of Edinburgh
Tagging, Chunking, Syntax and Parsing
Yusuke Miyao, National Institute of Informatics
Anders Soegaard, University of Copenhagen
Mark Dras, Macquarie University

Meal Info

Date Meal Time Item Location

7.26 Lunch 11:30-13:30 Chinese bento Function Hall B
7.26 Lunch 11:30-13:30 Western bento Function Hall B
7.26 Dinner 18:00-21:00 Buffet Ballroom C
7.27 Lunch 11:30-13:30 Student Lunch Function A+B
7.27 Lunch 11:30-13:30 Chinese bento 306AB/307AB
7.27 Lunch 11:30-13:30 Western bento 308
7.27 Dinner Buffet + Poster session Plenary Hall B
7.28 Lunch 11:30-13:30 Chinese bento 306AB/307AB
7.28 Lunch 11:30-13:30 Western bento 308
7.28 Dinner Buffet + Poster session Plenary Hall B
7.29 Lunch 11:30-13:30 Chinese bento 307AB
7.29 Lunch 11:30-13:30 Western bento 308
7.30 Lunch 11:30-13:30 Chinese bento 403/405
7.30 Lunch 11:30-13:30 Western bento 406
7.31 Lunch 11:30-13:30 Chinese bento 403/405
7.31 Lunch 11:30-13:30 Western bento 406

2 Tutorials: Sunday, July 26
Overview

7:30 - 18:00 Registration 3rd floor


9:00 - 12:30 Morning Tutorials
Successful Data Mining Methods for NLP 301A+B
Jiawei Han, Heng Ji, and Yizhou Sun

Structured Belief Propagation for NLP 302A


Matthew R. Gormley and Jason Eisner

Sentiment and Belief: How to Think about, Represent, and Annotate Private States 302B
Owen Rambow and Janyce Wiebe

Corpus Patterns for Semantic Processing 305


Patrick Hanks, Elisabetta Jezek, Daisuke Kawahara, and Octavian Popescu

10:30 - 11:00 Coffee break


12:30 - 14:00 Lunch break
14:00 - 17:30 Afternoon Tutorials
Matrix and Tensor Factorization Methods for Natural Language Processing 301A+B
Guillaume Bouchard, Jason Naradowsky, Sebastian Riedel, Tim Rocktaschel, and
Andreas Vlachos

Scalable Large-Margin Structured Learning: Theory and Algorithms 302A


Liang Huang and Kai Zhao

Detecting Deceptive Opinion Spam using Linguistics, Behavioral and Statistical


Modeling 302B
Arjun Mukherjee

What You Need to Know about Chinese for Chinese Language Processing 305
Chu-Ren Huang

15:30 - 16:00 Coffee break


18:00 - 21:00 Welcome Reception Ballroom C


Tutorial 1

Successful Data Mining Methods for NLP

Jiawei Han, Heng Ji, and Yizhou Sun

Sunday, July 26, 2015, 9:00-12:30

301A+B

Historically, Natural Language Processing (NLP) has focused on understanding unstructured data (speech and text), while Data Mining (DM) has mainly focused on massive, structured or semi-structured datasets. The general research directions of these two fields have also followed different philosophies and principles. For example, NLP aims at deep understanding of individual words, phrases and sentences ("micro-level"), whereas DM aims to conduct a high-level understanding, discovery and synthesis of the most salient information from a large set of documents when working on text data ("macro-level"). But they share the same goal of distilling knowledge from data. In the past five years, these two areas have had intensive interactions and thus mutually enhanced each other through many successful text mining tasks. This positive progress mainly benefits from some innovative intermediate representations such as "heterogeneous information networks" [Han et al., 2010, Sun et al., 2012b].

However, successful collaborations between any two fields require substantial mutual understanding, patience and passion among researchers. Similar to the applications of machine learning techniques in NLP, there is usually a gap of at least several years between the creation of a new DM approach and its first successful application in NLP. More importantly, many DM approaches such as gSpan [Yan and Han, 2002] and RankClus [Sun et al., 2009a] have demonstrated their power on structured data, but they remain relatively unknown in the NLP community, even though there are many obvious potential applications. On the other hand, compared to DM, the NLP community has paid more attention to developing large-scale data annotations, resources, and shared tasks which cover a wide range of genres and domains. NLP can also provide the basic building blocks for many DM tasks such as text cube construction [Tao et al., 2014]. Therefore, in many scenarios, for the same approach the NLP experiment setting is often much closer to real-world applications than its DM counterpart.

We would like to share the experiences and lessons from our extensive inter-disciplinary collaborations in the past five years. The primary goal of this tutorial is to bridge the knowledge gap between these two fields and speed up the transition process. We will introduce two types of DM methods: (1) state-of-the-art DM methods that have already been proven effective for NLP; and (2) some newly developed DM methods that we believe will fit some specific NLP problems. In addition, we aim to suggest some new research directions in order to better marry these two areas and lead to more fruitful outcomes. The tutorial will thus be useful for researchers from both communities. We will try to provide a concise roadmap of recent perspectives and results, as well as point to the related DM software and resources, and NLP data sets that are available to both research communities.

Jiawei Han is Abel Bliss Professor in the Department of Computer Science at the University of Illinois. He has been researching data mining, information network analysis, and database systems, with over 600 publications. He served as the founding Editor-in-Chief of ACM Transactions on Knowledge Discovery from Data (TKDD). He has received the ACM SIGKDD Innovation Award (2004), IEEE Computer Society Technical Achievement Award (2005), IEEE Computer Society W. Wallace McDowell Award (2009), and Daniel C. Drucker Eminent Faculty Award at UIUC (2011). He is a Fellow of ACM and a Fellow of IEEE. He is currently the Director of the Information Network Academic Research Center (INARC) supported by the Network Science-Collaborative Technology Alliance (NS-CTA) program of the U.S. Army Research Lab and also the Director of KnowEnG, an NIH Center of Excellence in big data computing as part of the NIH Big Data to Knowledge (BD2K) initiative. His co-authored textbook "Data Mining: Concepts and Techniques" (Morgan Kaufmann) has been adopted worldwide. He has delivered tutorials at many reputed international conferences, including WWW'14, SIGMOD'14 and KDD'14.

Heng Ji is Edward H. Hamilton Development Chair Associate Professor in the Computer Science Department of Rensselaer Polytechnic Institute. She received the "AI's 10 to Watch" Award in 2013, an NSF CAREER award in 2009, Google Research Awards in 2009 and 2014, and IBM Watson Faculty Awards in 2012 and 2014. In the past five years she has done extensive collaborations with Prof. Jiawei Han and Prof. Yizhou Sun on applying data mining techniques to NLP problems and has jointly published 15 papers, including a "Best of SDM2013" paper and a "Best of ICDM2013" paper. She has delivered tutorials at COLING 2012, ACL 2014 and NLPCC 2014.

Yizhou Sun is an assistant professor in the College of Computer and Information Science of Northeastern University. She received her Ph.D. in Computer Science from the University of Illinois at Urbana-Champaign in 2012. Her principal research interest is in mining information and social networks, and more generally in data mining, database systems, statistics, machine learning, information retrieval, and network science, with a focus on modeling novel problems and proposing scalable algorithms for large scale, real-world applications. Yizhou has over 60 publications in books, journals, and major conferences. Tutorials based on her thesis work on mining heterogeneous information networks have been given in several premier conferences, including EDBT 2009, SIGMOD 2010, KDD 2010, ICDE 2012, VLDB 2012, and ASONAM 2012. She received the 2012 ACM SIGKDD Best Student Paper Award, 2013 ACM SIGKDD Doctoral Dissertation Award, and 2013 Yahoo ACE (Academic Career Enhancement) Award.


Tutorial 2

Structured Belief Propagation for NLP

Matthew R. Gormley and Jason Eisner

Sunday, July 26, 2015, 9:00-12:30

302A

Statistical natural language processing relies on probabilistic models of linguistic structure. More
complex models can help capture our intuitions about language, by adding linguistically meaningful
interactions and latent variables. However, inference and learning in the models we want often pose a serious computational challenge. Belief propagation (BP) and its variants provide an attractive approximate solution, especially using recent training methods. These approaches can handle joint models of interacting components, are computationally efficient, and have extended the state of the art on a number of common NLP tasks, including dependency parsing, modeling of morphological paradigms, CCG parsing, phrase extraction, semantic role labeling, and information extraction
(Smith and Eisner, 2008; Dreyer and Eisner, 2009; Auli and Lopez, 2011; Burkett and Klein, 2012;
Naradowsky et al., 2012; Stoyanov and Eisner, 2012).
This tutorial delves into BP with an emphasis on recent advances that enable state-of-the-art perfor-
mance in a variety of tasks. Our goal is to elucidate how these approaches can easily be applied
to new problems. We also cover the theory underlying them. Our target audience is researchers in
human language technologies; we do not assume familiarity with BP. In the first three sections, we
discuss applications of BP to NLP problems, the basics of modeling with factor graphs and message
passing, and the theoretical underpinnings of "what BP is doing" and how it relates to other inference
techniques. In the second three sections, we cover key extensions to the standard BP algorithm to
enable modeling of linguistic structure, efficient inference, and approximation-aware training. We
survey a variety of software tools and introduce a new software framework that incorporates many of
the modern approaches covered in this tutorial.
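To make the message-passing machinery concrete, here is a minimal sketch of sum-product belief
propagation on a chain-structured factor graph with dense tabular potentials. It is illustrative
only: the structured BP covered in the tutorial handles factors such as whole parse forests with
dynamic programming rather than dense tables, and all names below are ours.

    import numpy as np

    # Minimal sum-product belief propagation on a chain factor graph:
    # variables x0..x{n-1}, unary potentials phi[i] (shape [K]) and pairwise
    # potentials psi[i] between x{i} and x{i+1} (shape [K, K]). On a chain
    # (a tree), BP computes exact marginals.
    def chain_bp(phi, psi):
        n, K = len(phi), phi[0].shape[0]
        fwd = [np.ones(K) for _ in range(n)]   # messages flowing left-to-right
        bwd = [np.ones(K) for _ in range(n)]   # messages flowing right-to-left
        for i in range(1, n):
            m = (fwd[i - 1] * phi[i - 1]) @ psi[i - 1]
            fwd[i] = m / m.sum()               # normalize for numerical stability
        for i in range(n - 2, -1, -1):
            m = psi[i] @ (bwd[i + 1] * phi[i + 1])
            bwd[i] = m / m.sum()
        beliefs = [fwd[i] * phi[i] * bwd[i] for i in range(n)]
        return [b / b.sum() for b in beliefs]  # per-variable marginals

    phi = [np.array([0.7, 0.3]), np.array([0.4, 0.6]), np.array([0.5, 0.5])]
    psi = [np.array([[0.9, 0.1], [0.1, 0.9]])] * 2
    print(chain_bp(phi, psi))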

Matt Gormley is a PhD student at Johns Hopkins University working with Mark Dredze and Jason
Eisner. His current research focuses on joint modeling of multiple linguistic strata in learning settings
where supervised resources are scarce. He has authored papers in a variety of areas including topic
modeling, global optimization, semantic role labeling, relation extraction, and grammar induction.
http://www.cs.jhu.edu/~mrg/
Jason Eisner is a Professor in Computer Science and Cognitive Science at Johns Hopkins University,
where he has received two school-wide awards for excellence in teaching. His 90+ papers have
presented many models and algorithms spanning numerous areas of NLP. His goal is to develop the
probabilistic modeling, inference, and learning techniques needed for a unified model of all kinds of
linguistic structure. In particular, he and his students introduced structured belief propagation (which
incorporates classical NLP models and their associated dynamic programming algorithms), as well as
loss-calibrated training for use with belief propagation. http://www.cs.jhu.edu/~jason/

31
Tutorials

Tutorial 3

Sentiment and Belief: How to Think about, Represent, and Annotate Private States

Owen Rambow and Janyce Wiebe

Sunday, July 26, 2015, 9:00–12:30

302B

Over the last ten years, there has been an explosion in interest in sentiment analysis, with many
interesting and impressive results. For example, the first twenty publications on Google Scholar
returned for the query "sentiment analysis" all date from 2003 or later, and have a total citation
count of 12,140. The total number of publications is in the thousands. Partly, this interest is driven
by the immediate commercial applications of sentiment analysis.
Sentiment is a "private state" (Wiebe 1990). However, it is not the only private state that has received
attention in the computational literature; others include belief and intention. In this tutorial, we pro-
pose to provide a deeper understanding of what a private state is. We will concentrate on sentiment
and belief. Belief is very closely related to factuality, and also to notions such as veridicality, modal-
ity, and hedging. We will provide background that will allow the tutorial participants to understand
the notion of a private state as a cognitive phenomenon, which can be manifested linguistically in
various ways. We will explain the formalization in terms of a triple of state, source, and target. We
will discuss how to model the source and the target. We will then explain in some detail the anno-
tations that have been made. The issue of annotation is crucial for private states: while the MPQA
corpus (Wiebe et al. 2005) has been around for some time, most research using it does not make
use of many of its features. We believe this is because the MPQA annotation is quite complex and
requires a deeper understanding of the phenomenon of "private state", which is what the annotation
is getting at. Furthermore, there are currently several efforts underway to create new versions of
the annotations, which we will also present.
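As a concrete, purely illustrative picture of the state/source/target triple discussed above, a single
private-state annotation might be represented as a small record; the field names below are ours,
not the MPQA scheme's.

    from dataclasses import dataclass

    # Purely illustrative record for a private-state triple; field names
    # are ours, not the MPQA annotation scheme's.
    @dataclass
    class PrivateState:
        state: str    # e.g., "sentiment" or "belief"
        value: str    # e.g., "negative", or a belief/factuality value
        source: str   # who holds the state; sources can nest
        target: str   # what the state is about

    example = PrivateState(
        state="sentiment",
        value="negative",
        source="writer->John",   # John's sentiment, as reported by the writer
        target="the new policy",
    )
    print(example)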

Owen Rambow is a Senior Research Scientist at the Center for Computational Learning Systems
at Columbia University. He is also the co-chair of the Center for New Media at the Data Science
Institute at Columbia University. He has been interested in modeling cognitive states in relation
to language for a long time, initially in the context of natural language generation (Rambow 1993;
Walker and Rambow 1994). More recently, he has studied belief in the context of recognizing beliefs
in language (Diab et al. 2009, Prabhakaran et al. 2010, Danlos and Rambow 2011, Prabhakaran et al.
2012). He is currently leading the DARPA DEFT Belief group, working with other researchers and
with the LDC to define annotation standards and evaluations. He was recently involved in the pilot
evaluation for belief recognition (in English) in the DARPA DEFT program.
Janyce Wiebe is Professor of Computer Science and Co-Director of the Intelligent Systems Program
at the University of Pittsburgh. She has worked on issues related to private states for some
time, originally in the context of tracking point of view in narrative (Wiebe 1994), and later in the
context of recognizing sentiment in other genres such as news articles (Wilson et al. 2005). She
has approached the area from the perspective of corpus annotation (Wiebe et al. 2005, Deng et al.
2013), lexical semantics (Wiebe and Mihalcea 2006), and discourse (Somasundaran et al. 2009). In
addition to continuing these lines of research, she has recently begun investigating implicatures in
opinion analysis (Deng and Wiebe 2014). http://people.cs.pitt.edu/~wiebe/

32
Sunday, July 26, 2015

The larger goal of this tutorial is to allow the tutorial participants to gain a deeper understanding of
the role of private states in human communication, and to encourage them to use this deeper under-
standing in their computational work. The immediate goal of this tutorial is to allow the participants
to make more complete use of available annotated resources. These include the MPQA corpus, the
LU Corpus (Diab et al. 2009), FactBank (Saurí and Pustejovsky 2009), and the corpora under de-
velopment at the LDC which include sentiment and belief. We propose to achieve these goals by
concentrating on annotated corpora, since this will allow participants to both understand the under-
lying content (achieving the larger goal) and the technical details of the annotations (achieving the
immediate goal).

33
Tutorials

Tutorial 4

Corpus Patterns for Semantic Processing

Patrick Hanks, Elisabetta Jezek, Daisuke Kawahara, and Octavian Popescu

Sunday, July 26, 2015, 14:00–17:30

305

This tutorial presents a pattern-based empirical approach to meaning representation and computation.
It is a response to the finding by corpus linguists that "most meanings require the presence of more
than one word for their normal realization". The tutorial shows how patterns are built from corpus
evidence using machine learning methods, and discusses potential applications of patterns. It is
intended for an audience with heterogeneous competences but with a common interest in corpus
linguistics and computational models for meaning-related tasks in NLP. The goal is to equip the
audience with a better understanding of the role played by patterns in natural language, an operative
command of the methodology used to acquire patterns, and a forum in which to discuss their utility
in NLP applications.
The relatively recent explosion of corpus-driven research has shown that intermediate text represen-
tations (ITRs), built from the bottom up, using corpus examples towards a complex representation
of phrases, play an important role in dealing with the meaning disambiguation problem. It has been
shown that it is possible to identify and to learn corpus patterns that encode the information that
accounts for the senses of the verb and its arguments in the context. These patterns link the syntactic
structure of verbal phrases and the semantic types of their argument fillers via the role that each of
these play in the disambiguation of the phrase as a whole. The available solutions developed so far
range from supervised to totally unsupervised approaches. The patterns obtained encode the neces-
sary information for handling the meaning of each word individually as well as that of the phrase
as a whole. As such, they are instrumental in building better language models for the contexts
matched by such patterns. The semantic types used in pattern representation play a discriminative
role, therefore the patterns are sense discriminative and as such they can be used in word sense dis-
ambiguation and other meaning-related tasks. The meaning of a pattern as a whole is expressed as a
set of basic implicatures. The implicatures are instrumental in textual entailment, semantic similarity

Patrick Hanks is a lexicographer and corpus linguist. He currently holds two research professorships: one at
the Research Institute of Information and Language Processing in the University of Wolverhampton, the other
at the Bristol Centre for Linguistics in the University of the West of England (UWE, Bristol).
Elisabetta Jezek has been teaching Syntax and Semantics and Applied Linguistics at the University of Pavia
since 2001. Her research interests and areas of expertise are lexical semantics, verb classification, theory of
Argument Structure, event structure in syntax and semantics, corpus annotation, and computational lexicography.
Daisuke Kawahara is an Associate Professor at Kyoto University. He is an expert in the areas of parsing,
knowledge acquisition and information analysis. He teaches graduate classes in natural language processing.
His current work is focused on automatic induction of semantic frames, semantic parsing, verb polysemic
classes, and verb sense disambiguation.
Octavian Popescu is a researcher at IBM's T. J. Watson Research Center, Yorktown, working on computational
semantics with a focus on corpus patterns for meaning processing. His work is focused on models for word
sense disambiguation, textual entailment and paraphrase acquisition. He has taught various NLP graduate
courses in computational semantics at the University of Trento (IT), the University of Colorado at Boulder
(US), and the University of Bucharest (RO).

34
Sunday, July 26, 2015

and paraphrase generation, etc.


The corpus patterns methodology is designed to offer a viable solution to meaning representation.
The techniques we present are widely applicable in NLP and they deal efficiently with data sparseness
and open domain expression of semantic relationships. We show how including a corpus pattern
module into an NLP system is beneficial for accurately and consistently resolving textual entailment,
paraphrase generation and semantic similarity.
The tutorial is divided into three main parts, which are interconnected: (1) Probabilities and Patterns:
the Corpus Pattern Theory of Norms and Exploitations; (2) Inducing Semantic Types and Semantic Task-
Oriented Ontologies; and (3) Machine Learning and Applications of Corpus Patterns.
1. Discovering Computable Semantic Properties of Verb Phrases: Why do we need patterns? How
to analyse corpus data; lexical statistics; the theory of linguistic norms and exploitations; the Sketch
Engine; sense-discriminative patterns.
2. Semantic Types and Ontologies: Inducing semantic types and ontologies; grouping and processing
lexical sets according to their semantic types; argument structures, semantic frames, and patterns.
3. Statistical Models for Corpus Pattern Recognition and Extraction from Corpora: Finite-state
Markov chains and branching processes; Naive Bayes and Gaussian random fields for comput-
ing conditional probabilities over semantic types; Latent Dirichlet Allocation for unsupervised pattern
extraction; probably approximately correct models and statistical query models; the Joint Source-
Channel model for recognition of normal patterns in text; recognising exploitations; using patterns in
tasks such as computing textual entailment, paraphrase generation, and measuring textual similarity.

35
Tutorials

Tutorial 5

Matrix and Tensor Factorization Methods for Natural Language Processing

Guillaume Bouchard, Jason Naradowsky, Sebastian Riedel, Tim Rocktäschel, and Andreas Vlachos

Sunday, July 26, 2015, 14:00–17:30

301A+B

Guillaume Bouchard is a senior researcher in statistics and machine learning at Xerox, focusing on
statistical learning using low-rank models for large relational databases. His research includes text
understanding, user modeling, and social media analytics. The theoretical part of his work relates
to efficient algorithms for computing high-dimensional integrals, essential for dealing with uncertainty
(missing and noisy data, latent variable models, Bayesian inference). The main application areas of
his work include the design of virtual conversational agents, link prediction (predictive algorithms
for relational data), social media monitoring, and transportation analytics. His web page is available
at http://www.xrce.xerox.com/people/bouchard.
Jason Naradowsky is a postdoc at the Machine Reading group at UCL. Having previously obtained
a PhD at UMass Amherst under the supervision of David Smith and Mark Johnson, his current
research aims to improve natural language understanding by performing task-specific training of
word representations and parsing models. He is also interested in semi-supervised learning, joint
inference, and semantic parsing. His web page is available at http://narad.github.io/.
Sebastian Riedel is a senior lecturer at University College London and an Allen Distinguished In-
vestigator, leading the Machine Reading Lab. Before, he was a postdoc and research scientist with
Andrew McCallum at UMass Amherst, a researcher at Tokyo University and DBCLS with Tsujii
Junichi, and a PhD student with Ewan Klein at the University of Edinburgh. He is interested in
teaching machines how to read and works at the intersection of Natural Language Processing (NLP)
and Machine Learning, investigating various stages of the NLP pipeline, in particular those that
require structured prediction, as well as fully probabilistic architectures of end-to-end reading and
reasoning systems. Recently he became interested in new ways to represent textual knowledge using
low-rank embeddings and how to reason with such representations. His web page is available at
http://www.riedelcastro.org/.
Tim Rocktäschel is a PhD student in Sebastian Riedel's Machine Reading group at University Col-
lege London. Before that he worked as a research assistant in the Knowledge Management in Bioin-
formatics group at Humboldt-Universität zu Berlin, where he also obtained his Diploma in Computer
Science. He is broadly interested in representation learning (e.g. matrix/tensor factorization, deep
learning) for NLP and automated knowledge base completion, and how these methods can take ad-
vantage of symbolic background knowledge. His webpage is available at http://rockt.github.io/.
Andreas Vlachos is a postdoc at the Machine Reading group at UCL working with Sebastian Riedel
on automated fact-checking using low-rank factorization methods. Before that he was a postdoc at
the Natural Language and Information Processing group at the University of Cambridge and at the
University of Wisconsin-Madison. He is broadly interested in natural language understanding (e.g.
information extraction, semantic parsing) and in machine learning approaches that would help us
towards this goal. He has also worked on active learning, clustering and biomedical text mining. His
web page is available at http://sites.google.com/site/andreasvlachos/.

36
Sunday, July 26, 2015

Tensor and matrix factorization methods have attracted a lot of attention recently thanks to their
successful applications to information extraction, knowledge base population, lexical semantics and
dependency parsing. In the first part, we will first cover the basics of matrix and tensor factorization
theory and optimization, and then proceed to more advanced topics involving convex surrogates and
alternative losses. In the second part we will discuss recent NLP applications of these methods and
show the connections with other popular methods such as transductive learning, topic models and
neural networks. The aim of this tutorial is to present in detail applied factorization methods, as well
as to introduce more recently proposed methods that are likely to be useful to NLP applications.
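As a toy illustration of the basic building block, the sketch below fits a binary matrix (e.g., entity
pairs by relations) with a logistic loss via stochastic gradient descent. The data, dimensions, and
hyperparameters are arbitrary; the tutorial covers far richer losses, convex surrogates, and tensor
variants.

    import numpy as np

    # Toy low-rank matrix factorization with a logistic loss: observed cells
    # of an (entity-pair x relation) matrix are fit by dot products of learned
    # row and column embeddings. Dimensions/hyperparameters are arbitrary.
    rng = np.random.default_rng(0)
    n_rows, n_cols, rank, lr, reg = 50, 20, 8, 0.1, 1e-4
    U = 0.1 * rng.standard_normal((n_rows, rank))   # row embeddings
    V = 0.1 * rng.standard_normal((n_cols, rank))   # column embeddings

    # (row, col, label) cells; label 1 = observed fact, 0 = sampled negative
    data = [(rng.integers(n_rows), rng.integers(n_cols), rng.integers(2))
            for _ in range(500)]

    for epoch in range(20):
        for i, j, y in data:
            p = 1.0 / (1.0 + np.exp(-U[i] @ V[j]))  # predicted probability
            g = p - y                                # gradient of the log loss
            U[i], V[j] = (U[i] - lr * (g * V[j] + reg * U[i]),
                          V[j] - lr * (g * U[i] + reg * V[j]))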

37
Tutorials

Tutorial 6

Scalable Large-Margin Structured Learning: Theory and Algorithms

Liang Huang and Kai Zhao

Sunday, July 26, 2015, 14:00–17:30

302A

Much of NLP tries to map structured input (sentences) to some form of structured output (tag se-
quences, parse trees, semantic graphs, or translated/paraphrased/compressed sentences). Thus struc-
tured prediction and its learning algorithms are of central importance to us NLP researchers. However,
when applying traditional machine learning to structured domains, we often face scalability issues
for two reasons:
1. Exact structured inference (such as parsing and translation) is too slow for repeated use on the
training data, but approximate search (such as beam search) unfortunately breaks down the nice
theoretical properties (such as convergence) of existing machine learning algorithms.
2. Even with inexact search, the scale of the training data in NLP still makes pure online learning
(such as perceptron and MIRA) too slow on a single CPU.
This tutorial reviews recent advances that address these two challenges. In particular, we will cover
principled machine learning methods that are designed to work under vastly inexact search, and
parallelization algorithms that speed up learning on multiple CPUs. We will also extend structured
learning to the latent variable setting, where in many NLP applications, such as translation, the gold-
standard derivation is hidden.
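As a schematic picture of one of the techniques covered, here is a sketch of the structured perceptron
with the "early update" strategy under beam search: when the gold derivation falls off the beam, the
model is updated on the prefixes and search stops. The expand, gold_prefix, and phi callables are
placeholders for a task-specific transition system, not part of any particular toolkit.

    # Structured perceptron with "early update" under beam search.
    # `expand(x, s)` proposes next actions, `gold_prefix(gold, t)` returns the
    # gold derivation truncated to t actions, and `phi(x, s)` extracts features;
    # all three are task-specific placeholders.
    def perceptron_early_update(w, x, gold, expand, gold_prefix, phi, beam_size=8):
        beam = [()]                                   # start from the empty derivation
        for step in range(len(gold)):
            cands = [s + (a,) for s in beam for a in expand(x, s)]
            cands.sort(key=lambda s: sum(w.get(f, 0.0) for f in phi(x, s)),
                       reverse=True)
            beam = cands[:beam_size]
            g = gold_prefix(gold, step + 1)
            if g not in beam:                         # gold fell off the beam:
                for f in phi(x, g):                   # update on the prefixes
                    w[f] = w.get(f, 0.0) + 1.0
                for f in phi(x, beam[0]):
                    w[f] = w.get(f, 0.0) - 1.0
                return w                              # ... and stop early
        if beam[0] != gold:                           # standard update otherwise
            for f in phi(x, gold):
                w[f] = w.get(f, 0.0) + 1.0
            for f in phi(x, beam[0]):
                w[f] = w.get(f, 0.0) - 1.0
        return w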

Liang Huang is an Assistant Professor at the City University of New York (CUNY). He graduated
in 2008 from Penn and has worked as a Research Scientist at Google and a Research Assistant
Professor at USC/ISI. His work is mainly on the theoretical aspects (algorithms and formalisms) of
computational linguistics, as well as theory and algorithms of structured learning. He has received a
Best Paper Award at ACL 2008, several best paper nominations (ACL 2007, EMNLP 2008, and ACL
2010), two Google Faculty Research Awards (2010 and 2013), and a University Graduate Teaching
Prize at Penn (2005). He has given three tutorials at COLING 2008, NAACL 2009 and ACL 2014.
James Cross is a Ph.D. candidate at the City University of New York (CUNY) Graduate Center,
working under the direction of Liang Huang in the area of natural language processing. He has
undergraduate degrees in computer science and French from Auburn University, a law degree from
New York University, and a master's degree in computer science from the City College of New York.
He was a summer intern at the IBM T.J. Watson Research Center in 2014 and is a summer research
intern at Facebook in 2015.

38
Sunday, July 26, 2015

Tutorial 7

Detecting Deceptive Opinion Spam using Linguistics, Behavioral and Statistical Modeling

Arjun Mukherjee

Sunday, July 26, 2015, 14:00–17:30

302B

With the advent of Web 2.0, consumer reviews have become an important resource for public opinion
that influence our decisions over an extremely wide spectrum of daily and professional activities:
e.g., where to eat, where to stay, which products to purchase, which doctors to see, which books to
read, which universities to attend, and so on. Positive/negative reviews directly translate to financial
gains/losses for companies. This unfortunately gives strong incentives for opinion spamming which
refers to illegal human activities (e.g., writing fake reviews and giving false ratings) that try to mislead
customers by promoting/demoting certain entities (e.g., products and businesses). The problem has
been widely reported in the news. Despite the recent research efforts on detection, the problem is far
from solved. What is worse is that opinion spamming is widespread: while credit card fraud is as
rare as 0.2%, deceptive opinion spam is far more prevalent.
Major review hosting sites and e-commerce vendors have already made some progress in detecting
fake reviews. However, the task is still extremely challenging because it is very difficult to obtain
large-scale ground truth samples of deceptive opinions for algorithm development and for evaluation,
or to conduct large-scale domain ex-pert evaluations. Further, in contrast to other kinds of spamming
(e.g., Web and link spam, social/blog spam, email spam, etc.) opinion spam has a very unique flavor
as it involves fluid sentiments of users and their evaluations. Thus, they require a very different treat-
ment. Since our first paper in 2007 (Jindal and Liu, 2007) on the topic, our group and many other
researchers have pro-posed several algorithms and bridged algorithmic methodologies from various
scientific disciplines including computational linguistics (Ott et al., 2011), social and behavioral sci-
ences (Jindal and Liu, 2008; Mukherjee et al., 2013a), machine learning, data mining and Bayesian
statistics (Mukherjee et al., 2012; Fei et al., 2013; Mukherjee et al., 2013b; Li et al., 2014b; Li et
al., 2014a) to solve the problem. The field of deceptive opinion spam has gained a lot of interest
in communications (Hancock et al., 2008), psycho-linguistics communities (Gokhman et al., 2012),
and economic analysis (Wang, 2010) apart from mainstream NLP and Web mining as attested by
publications in top-tier venues in their respective communities. The problem has far-reaching
implications in various allied NLP topics including Lie Detection, Forensic Linguistics, Opinion Trust
and Veracity Verification, and Plagiarism Detection. However, owing to the inherent nature of the
problem, a unique blend of NLP, data mining, machine learning, social, behavioral, and statistical
techniques is required, with which many NLP researchers may not be familiar.
In this tutorial, we aim to cover the problem in its full depth and breadth, surveying the diverse algorithms

Arjun Mukherjee is an Assistant Professor in the Department of Computer Science at the University of Hous-
ton. He is an active researcher in the area of opinion spam, sentiment analysis and Web mining. He is the
lead author behind several influential works on opinion spam research. These include group opinion spam,
commercial fake-review filters (e.g., Yelp), and various statistical models for detecting singular opinion spam-
mers, burstiness patterns, and campaigns. His work on opinion mining, including deception detection, has also
received significant media attention (e.g., ACM Tech News, NYTimes, LATimes, Business Week, CNet, etc.).
Mukherjee has also served as program committee members of WWW, ACL, EMNLP, and IJCNLP.

39
Tutorials

that have been developed over the past 7 years. The most attractive quality of these techniques is that
many of them can be adapted for cross-domain and unsupervised settings. Some of the methods are
even in use by startups and established companies. Our focus is on insight and understanding, using
illustrations and intuitive deductions. The goal of the tutorial is to make the inner workings of these
techniques transparent, intuitive and their results interpretable.

40
Sunday, July 26, 2015

Tutorial 8

What You Need to Know about Chinese for Chinese Language Processing

Chu-Ren Huang

Sunday, July 26, 2015, 14:00–17:30

305

The synergy between language sciences and language technology has been an elusive one for the
computational linguistics community, especially when dealing with a language other than English.
The reasons are two-fold: the lack of an accessible, comprehensive, and robust account of a specific
language that would allow strategic linking of a processing task to linguistic devices, and the lack
of successful computational studies taking advantage of such links. With a fast growing number of
available online resources, as well as a rapidly increasing number of members of the CL community
who are interested in and/or working on Chinese language processing, the time is ripe to take a
serious look at how knowledge of Chinese can help Chinese language processing.
The tutorial will be organized according to the structure of linguistic knowledge of Chinese, starting
from the basic building block to the use of Chinese in context. The first part deals with characters
as the basic linguistic unit of Chinese in terms of phonology, orthography, and basic concepts. An
ontological view of how the Chinese writing system organizes meaningful content as well as how
this onomasiological decision affects Chinese text processing will also be discussed. The second
part deals with words and presents basic issues involving the definition and identification of words
in Chinese, especially given the lack of conventional marks of word boundaries. The third part deals
with parts of speech and focuses on definition of a few grammatical categories specific to Chinese,
as well as distributional properties of Chinese PoS and tagging systems. The fourth part deals with
sentence and structure, focusing on how to identify grammatical relations in Chinese as well as a
few Chinese-specific constructions. The fifth part deals with how meanings are represented and
expressed, especially how different linguistic devices (from lexical choice to information structure)
are used to convey different information. Lastly, the sixth part deals with the ranges of different
varieties of Chinese in the world and the computational approaches to detect and differentiate these
varieties. For each topic, an empirical foundation of linguistic facts is clearly explicated with a
robust generalization, and the linguistic generalization is then accounted for in terms of its function
in the knowledge representation system. Finally, this knowledge representation role is exploited

Chu-Ren Huang is currently a Chair Professor at the Hong Kong Polytechnic University. He is a
Fellow of the Hong Kong Academy of the Humanities, a permanent member of the International
Committee on Computational Linguistics, and President of the Asian Association of Lexicography.
He currently serves as Chief Editor of the journal Lingua Sinica, as well as of Cambridge University
Press's Studies in Natural Language Processing series. He is an associate editor of both the Journal of
Chinese Linguistics and Lexicography. He has served in advisory and/or organizing roles for conferences
including ALR, ASIALEX, CLSW, CogALex, COLING, IsCLL, LAW, OntoLex, PACLIC, RO-
CLING, and SIGHAN. Chinese language resources constructed under his direction include the CKIP
lexicon and ICG, the Sinica Corpus, Sinica Treebank, Sinica BOW, Chinese WordSketch, Tagged Chinese Giga-
word Corpus, Hantology, Chinese WordNet, and Emotion Annotated Corpus. He is the co-author of
a Chinese Reference Grammar (Huang and Shi 2016), and a book on Chinese Language Processing
(Lu, Xue and Huang in preparation).

41
Tutorials

in terms of the aims of specific language technology tasks. In terms of references, in addition to
language resources and various relevant papers, the tutorial will make reference to Huang and Shi's
(2016) reference grammar for a linguistic description of Chinese.

42
Monday, July 27, 2015

Welcome Reception

Sunday, July 26, 2015, 18:00–21:00

China National Convention Center (CNCC)


Ballroom C

Catch up with your colleagues at the Welcome Reception! It will be held immediately following the
Tutorials on Sunday, July 26 at 6:00pm in Ballroom C of CNCC.

43
44
3 Main Conference: Monday, July 27
Overview

7:30–18:00 Registration (1st and 3rd floors)
8:45–9:00 Welcome to ACL-IJCNLP 2015 (Plenary Hall B)
9:00–9:40 Presidential Address: Christopher D. Manning (Plenary Hall B)
9:40–10:10 Coffee Break
10:10–11:50 Session 1: Machine Translation: Neural Networks (Plenary Hall B); Language and Vision/NLP Applications (309A); Semantics: Embeddings (310); Machine Learning (311A); Information Extraction 1 (311B)
11:50–13:20 Lunch Break; Student Lunch
13:20–15:00 Session 2: Machine Translation (309A); Question Answering (309B); Semantics: Distributional Approaches (310); Parsing: Neural Networks (311A); Information Extraction 2 (311B)
15:00–15:30 Coffee Break
15:30–16:45 Session 3: Language Resources (309A); Sentiment Analysis: Cross-/Multi-Lingual (309B); Natural Language Generation (310); Spoken Language Processing and Understanding (311A); Information Extraction 3/Information Retrieval (311B)
16:45–17:00 Short Break
17:00–18:00 Session 4: Semantics (309A); Sentiment Analysis (309B); Summarization and Generation (310); Discourse, Coreference (311A); Language and Vision (311B)
18:00–21:00 Poster and Dinner Session 1: TACL Papers, Long Papers, System Demonstrations (Plenary Hall B)

45
Session 1

Session 1 Overview Monday, July 27, 2015

Track A: Machine Translation: Neural Networks (Plenary Hall B)
Track B: Language and Vision/NLP Applications (309A)
Track C: Semantics: Embeddings (310)
Track D: Machine Learning (311A)
Track E: Information Extraction 1 (311B)

10:10
A: On Using Very Large Target Vocabulary for Neural Machine Translation (Jean, Cho, Memisevic, and Bengio)
B: Describing Images using Inferred Visual Dependency Representations (Elliott and De Vries)
C: [TACL] Improving Distributional Similarity with Lessons Learned from Word Embeddings (Levy, Goldberg, and Dagan)
D: Joint Models of Disagreement and Stance in Online Debate (Sridhar, Foulds, Huang, Getoor, and Walker)
E: Compositional Vector Space Models for Knowledge Base Completion (Neelakantan, Roth, and McCallum)

10:35
A: Addressing the Rare Word Problem in Neural Machine Translation (Luong, Sutskever, Le, Vinyals, and Zaremba)
B: Text to 3D Scene Generation with Rich Lexical Grounding (Chang, Monroe, Savva, Potts, and Manning)
C: Semantically Smooth Knowledge Graph Embedding (Guo, Wang, Wang, Wang, and Guo)
D: Low-Rank Regularization for Sparse Conjunctive Feature Spaces: An Application to Named Entity Classification (Primadhanty, Carreras, and Quattoni)
E: Event Extraction via Dynamic Multi-Pooling Convolutional Neural Networks (Chen, Xu, Liu, Zeng, and Zhao)

11:00
A: Encoding Source Language with Convolutional Neural Network for Machine Translation (Meng, Lu, Wang, Li, Jiang, and Liu)
B: MultiGranCNN: An Architecture for General Matching of Text Chunks on Multiple Levels of Granularity (Yin and Schütze)
C: SensEmbed: Learning Sense Embeddings for Word and Relational Similarity (Iacobacci, Pilehvar, and Navigli)
D: Learning Word Representations by Jointly Modeling Syntagmatic and Paradigmatic Relations (Sun, Guo, Lan, Xu, and Cheng)
E: Stacked Ensembles of Information Extractors for Knowledge-Base Population (Viswanathan, Rajani, Bentor, and Mooney)

11:25
A: Statistical Machine Translation Features with Multitask Tensor Networks (Setiawan, Huang, Devlin, Lamar, Zbib, Schwartz, and Makhoul)
B: Weakly Supervised Models of Aspect-Sentiment for Online Course Discussion Forums (Ramesh, Kumar, Foulds, and Getoor)
C: Revisiting Word Embedding for Contrasting Meaning (Chen, Lin, Chen, Chen, Wei, Jiang, and Zhu)
D: Learning Dynamic Feature Selection for Fast Sequential Prediction (Strubell, Vilnis, Silverstein, and McCallum)
E: Generative Event Schema Induction with Entity Disambiguation (Nguyen, Tannier, Ferret, and Besançon)

46
Monday, July 27, 2015

Parallel Session 1

Session 1A: Machine Translation: Neural Networks


Plenary Hall B Chair: Taro Watanabe
On Using Very Large Target Vocabulary for Neural Machine Translation
Sébastien Jean, Kyunghyun Cho, Roland Memisevic, and Yoshua Bengio 10:10–10:35
Neural machine translation, a recently proposed approach to machine translation based purely on neural net-
works, has shown promising results compared to the existing approaches such as phrase-based statistical ma-
chine translation. Despite its recent success, neural machine translation has its limitation in handling a larger
vocabulary, as training complexity as well as decoding complexity increase proportionally to the number of
target words. In this paper, we propose a method based on importance sampling that allows us to use a very
large target vocabulary without increasing training complexity. We show that decoding can be efficiently done
even with the model having a very large target vocabulary by selecting only a small subset of the whole target
vocabulary. The models trained by the proposed approach are empirically found to match, and in some cases
outperform, the baseline models with a small vocabulary as well as the LSTM-based neural machine transla-
tion models. Furthermore, when we use an ensemble of a few models with very large target vocabularies, we
achieve performance comparable to the state of the art measured by BLEU on both the English-German and
English-French translation tasks of WMT'14.
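The core idea, normalizing over a small sampled subset of the vocabulary rather than all of it, can
be pictured with the toy sketch below. This is a simplification of the paper's importance-sampling
estimator, not the authors' code; sizes are arbitrary.

    import numpy as np

    # Toy sketch: normalize the softmax over the correct word plus a small
    # sampled candidate set instead of the full vocabulary.
    rng = np.random.default_rng(0)
    V, d, n_sampled = 50_000, 64, 200
    W = 0.01 * rng.standard_normal((V, d))       # output word embeddings
    h = rng.standard_normal(d)                    # decoder hidden state
    target = 12345

    cand = np.unique(np.append(rng.integers(V, size=n_sampled), target))
    logits = W[cand] @ h                          # scores for candidates only
    logp = logits - np.log(np.exp(logits).sum())
    loss = -logp[np.searchsorted(cand, target)]   # approximate cross-entropy
    print(loss)                                   # full softmax would touch all V rows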

Addressing the Rare Word Problem in Neural Machine Translation


Thang Luong, Ilya Sutskever, Quoc V. Le, Oriol Vinyals, and Wojciech Zaremba 10:35–11:00
Neural Machine Translation (NMT) is a new approach to machine translation that has shown promising results
that are comparable to traditional approaches. A significant weakness in conventional NMT systems is their
inability to correctly translate very rare words: end-to-end NMTs tend to have relatively small vocabularies with
a single unk symbol that represents every possible out-of-vocabulary (OOV) word. In this paper, we propose and
implement an effective technique to address this problem. We train an NMT system on data that is augmented
by the output of a word alignment algorithm, allowing the NMT system to emit, for each OOV word in the
target sentence, the position of its corresponding word in the source sentence. This information is later utilized
in a post-processing step that translates every OOV word using a dictionary. Our experiments on the WMT'14
English to French translation task show that this method provides a substantial improvement of up to 2.8 BLEU
points over an equivalent NMT system that does not use this technique. With 37.5 BLEU points, our NMT
system is the first to surpass the best result achieved on a WMT'14 contest task.
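The dictionary-based post-processing step can be pictured roughly as follows; the positional
placeholder notation "<unk:+k>" is ours, invented for illustration, and the alignment convention is
simplified.

    import re

    # Illustrative post-processing: positional unknown-word tokens point back
    # to the aligned source word, which is translated with a dictionary or
    # copied verbatim.
    def replace_unks(output_tokens, source_tokens, dictionary):
        result = []
        for i, tok in enumerate(output_tokens):
            m = re.fullmatch(r"<unk:([+-]?\d+)>", tok)
            if m:
                src = source_tokens[i + int(m.group(1))]  # relative alignment
                result.append(dictionary.get(src, src))   # translate or copy
            else:
                result.append(tok)
        return result

    print(replace_unks(["La", "<unk:+1>", "est", "grande"],
                       ["The", "Eiffel", "tower", "is", "tall"],
                       {"tower": "tour"}))   # -> ['La', 'tour', 'est', 'grande']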

Encoding Source Language with Convolutional Neural Network for Machine Translation
Fandong Meng, Zhengdong Lu, Mingxuan Wang, Hang Li, Wenbin Jiang, and Qun Liu 11:00–11:25
The recently proposed neural network joint model (NNJM; Devlin et al., 2014) augments the n-gram target
language model with a heuristically chosen source context window, achieving state-of-the-art performance
in SMT. In this paper, we give a more systematic treatment by summarizing the relevant source information
through a convolutional architecture guided by the target information. With different guiding signals during
decoding, our specifically designed convolution+gating architectures can pinpoint the parts of a source sentence
that are relevant to predicting a target word, and fuse them with the context of the entire source sentence to form
a unified representation. This representation, together with target language words, is fed to a deep neural
network (DNN) to form a stronger NNJM. Experiments on two NIST Chinese-English translation tasks show
that the proposed model can achieve significant improvements over the previous NNJM by up to +1.08 BLEU
points on average.

47
Session 1

Statistical Machine Translation Features with Multitask Tensor Networks


Hendra Setiawan, Zhongqiang Huang, Jacob Devlin, Thomas Lamar, Rabih Zbib, Richard Schwartz,
and John Makhoul 11:25–11:50
We present a three-pronged approach to improving Statistical Machine Translation (SMT), building on recent
success in the application of neural networks to SMT. First, we propose new features that rely on neural net-
works to model various non-local translation phenomena. Second, we augment the architecture of our neural
network with tensor layers that capture important higher-order interaction between the network units. Third,
we apply multitask learning to estimate the neural network parameters jointly. Each of our proposed methods
results in significant improvements that are complementary. The overall improvement is +2.7 and +1.8 BLEU
points for Arabic-English and Chinese-English translation over a state-of-the-art system that already includes
neural network features.

48
Monday, July 27, 2015

Session 1B: Language and Vision/NLP Applications


309A Chair: Joel Tetreault
Describing Images using Inferred Visual Dependency Representations
Desmond Elliott and Arjen De Vries 10:10–10:35
The Visual Dependency Representation (VDR) is an explicit model of the spatial relationships between objects in
an image. In this paper we present an approach to training a VDR Parsing Model without the extensive human
supervision used in previous work. Our approach is to find the objects mentioned in a given description using a
state-of-the-art object detector, and to use successful detections to produce training data. The description of an
unseen image is produced by first predicting its VDR over automatically detected objects, and then generating
the text with a template-based generation model using the predicted VDR. The performance of our approach is
comparable to a state-of-the-art multimodal deep neural network in images depicting actions.

Text to 3D Scene Generation with Rich Lexical Grounding


Angel Chang, Will Monroe, Manolis Savva, Christopher Potts, and Christopher D. Manning 10:35–11:00
The ability to map descriptions of scenes to 3D geometric representations has many applications in areas
such as art, education, and robotics. However, prior work on the text to 3D scene generation task has used
manually specified object categories and language that identifies them. We introduce a dataset of 3D scenes
annotated with natural language descriptions and learn from this data how to ground textual descriptions to
physical objects. Our method successfully grounds a variety of lexical terms to concrete referents, and we
show quantitatively that our method improves 3D scene generation over previous work using purely rule-
based methods. We evaluate the fidelity and plausibility of 3D scenes generated with our grounding approach
through human judgments. To ease evaluation on this task, we also introduce an automated metric that strongly
correlates with human judgments.

MultiGranCNN: An Architecture for General Matching of Text Chunks on Multiple Levels of Granularity
Wenpeng Yin and Hinrich Schütze 11:00–11:25
We present MultiGranCNN, a general deep learning architecture for matching text chunks. MultiGranCNN
supports multigranular comparability of representations: shorter sequences in one chunk can be directly com-
pared to longer sequences in the other chunk. MultiGranCNN also contains a flexible and modularized match
feature component that is easily adaptable to different types of chunk matching. We demonstrate state-of-the-art
performance of MultiGranCNN on clause coherence and paraphrase identification tasks.

Weakly Supervised Models of Aspect-Sentiment for Online Course Discussion Forums


Arti Ramesh, Shachi H. Kumar, James Foulds, and Lise Getoor 11:25–11:50
Massive open online courses (MOOCs) are redefining the education system and transcending boundaries posed
by traditional courses. With the increase in popularity of online courses, there is a corresponding increase in the
need to understand and interpret the communications of the course participants. Identifying topics or aspects of
conversation and inferring sentiment in online course forum posts can enable instructor interventions to meet
the needs of the students, rapidly address course-related issues, and increase student retention. Labeled aspect-
sentiment data for MOOCs are expensive to obtain and may not be transferable between courses, suggesting
the need for approaches that do not require labeled data. We develop a weakly supervised joint model for
aspect-sentiment in online courses, modeling the dependencies between various aspects and sentiment using a
recently developed scalable class of statistical relational models called hinge-loss Markov random fields. We
validate our models on posts sampled from twelve online courses, each containing an average of 10,000 posts,
and demonstrate that jointly modeling aspect with sentiment improves the prediction accuracy for both aspect
and sentiment.

49
Session 1

Session 1C: Semantics: Embeddings


310 Chair: Edward Grefenstette

1 [TACL] Improving Distributional Similarity with Lessons Learned from Word Embeddings
Omer Levy, Yoav Goldberg, and Ido Dagan 10:10–10:35
Recent trends suggest that neural network-inspired word embedding models outperform traditional count-based
distributional models on word similarity and analogy detection tasks. We reveal that much of the performance
gains of word embeddings are due to certain system design choices and hyperparameter optimizations, rather
than the embedding algorithms themselves. Furthermore, we show that these modifications can be transferred
to traditional distributional models, yielding similar gains. In contrast to prior reports, we observe mostly local
or insignificant performance differences between the methods, with no global advantage to any single approach
over the others.
Semantically Smooth Knowledge Graph Embedding
Shu Guo, Quan Wang, Bin Wang, Lihong Wang, and Li Guo 10:35–11:00
This paper considers the problem of embedding Knowledge Graphs (KGs), consisting of entities and relations,
into low-dimensional vector spaces. Most of the existing methods perform this task based solely on observed
facts. The only requirement is that the learned embeddings should be compatible within each individual fact. In
this paper, aiming at further discovering the intrinsic geometric structure of the embedding space, we propose
Semantically Smooth Embedding (SSE). The key idea of SSE is to take full advantage of additional semantic
information and enforce the embedding space to be semantically smooth, i.e., entities belonging to the same
semantic category will lie close to each other in the embedding space. Two manifold learning algorithms,
Laplacian Eigenmaps and Locally Linear Embedding, are used to model the smoothness assumption. Both
are formulated as geometrically based regularization terms to constrain the embedding task. We empirically
evaluate SSE in two benchmark tasks of link prediction and triple classification, and achieve significant and
consistent improvements over state-of-the-art methods. Furthermore, SSE is a general framework. The smooth-
ness assumption can be imposed to a wide variety of embedding models, and it can also be constructed using
other information besides entities' semantic categories.
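The smoothness idea can be pictured as a graph-Laplacian penalty added to whatever fact-based loss
the base embedding model uses. The sketch below (toy adjacency, illustrative dimensions) corresponds
to a Laplacian Eigenmaps-style regularizer; it is not the authors' implementation.

    import numpy as np

    # Sketch of the smoothness term: a graph-Laplacian penalty that pulls
    # embeddings of entities sharing a semantic category together.
    rng = np.random.default_rng(0)
    n, d = 6, 4
    E = rng.standard_normal((n, d))                 # entity embeddings
    A = np.zeros((n, n))
    A[0, 1] = A[1, 0] = A[2, 3] = A[3, 2] = 1.0     # two toy categories

    L = np.diag(A.sum(axis=1)) - A                  # graph Laplacian
    smoothness = np.trace(E.T @ L @ E)              # = 1/2 sum_ij A_ij ||E_i - E_j||^2
    grad_E = 2.0 * L @ E                            # gradient for SGD updates
    print(smoothness)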

SensEmbed: Learning Sense Embeddings for Word and Relational Similarity


Ignacio Iacobacci, Mohammad Taher Pilehvar, and Roberto Navigli 11:00–11:25
Word embeddings have recently gained considerable popularity for modeling words in different Natural Lan-
guage Processing (NLP) tasks including semantic similarity measurement. However, notwithstanding their suc-
cess, word embeddings are by their very nature unable to capture polysemy, as different meanings of a word
are conflated into a single representation. In addition, their learning process usually relies on massive corpora
only, preventing them from taking advantage of structured knowledge. We address both issues by proposing
a multi-faceted approach that transforms word embeddings to the sense level and leverages knowledge from
a large semantic network for effective semantic similarity measurement. We evaluate our approach on word
similarity and relational similarity frameworks, reporting state-of-the-art performance on multiple datasets.

Revisiting Word Embedding for Contrasting Meaning


Zhigang Chen, Wei Lin, Qian Chen, Xiaoping Chen, Si Wei, Hui Jiang, and Xiaodan Zhu 11:25–11:50
Contrasting meaning is a basic aspect of semantics. Recent word-embedding models based on the distributional
semantics hypothesis are known to be weak for modeling lexical contrast. We present in this paper embedding
models that achieve an F-score of 92% on this task.

1
At the suggestion of the TACL co-editors-in-chief, the decision between talk and poster for TACL papers was made
using random selection, with the exception of two papers that specifically requested a poster presentation.

50
Monday, July 27, 2015

Session 1D: Machine Learning


311A Chair: Rui Xia
Joint Models of Disagreement and Stance in Online Debate
Dhanya Sridhar, James Foulds, Bert Huang, Lise Getoor, and Marilyn Walker 10:10–10:35
Online debate forums present a valuable opportunity for the understanding and modeling of dialogue. To
understand these debates, a key challenge is inferring the stances of the participants, all of which are interrelated
and dependent. While collectively modeling users' stances has been shown to be effective (Walker et al., 2012c;
Hasan and Ng, 2013), there are many modeling decisions whose ramifications are not well understood. To
investigate these choices and their effects, we introduce a scalable unified probabilistic modeling framework
for stance classification models that (1) are collective, (2) reason about disagreement, and (3) can model stance at
either the author level or at the post level. We comprehensively evaluate the possible modeling choices on
eight topics across two online debate corpora, finding accuracy improvements of up to 11.5 percentage points
over a local classifier. Our results highlight the importance of making the correct modeling choices for online
dialogues, and having a unified probabilistic modeling framework that makes this possible.

Low-Rank Regularization for Sparse Conjunctive Feature Spaces: An Application to Named Entity Classification
Audi Primadhanty, Xavier Carreras, and Ariadna Quattoni 10:35–11:00
Entity classification, like many other important problems in NLP, involves learning classifiers over sparse high-
dimensional feature spaces that result from the conjunction of elementary features of the entity mention and
its context. In this paper we develop a spectral regularization framework for training max-entropy models
in such sparse conjunctive feature spaces. Our approach handles conjunctive feature spaces using matrices
and induces an implicit low-dimensional representation via low-rank constraints. We show that when learning
entity classifiers under minimal supervision, using a seed set, our approach is more effective in controlling
model capacity than standard techniques for linear classifiers.

Learning Word Representations by Jointly Modeling Syntagmatic and Paradigmatic Relations


Fei Sun, Jiafeng Guo, Yanyan Lan, Jun Xu, and Xueqi Cheng 11:00–11:25
Vector space representation of words has been widely used to capture fine-grained linguistic regularities, and
proven to be successful in various natural language processing tasks in recent years. However, existing models
for learning word representations focus on either syntagmatic or paradigmatic relations alone. In this paper, we
argue that it is beneficial to jointly model both relations so that we can not only encode different types of
linguistic properties in a unified way, but also boost the representation learning due to the mutual enhancement
between these two types of relations. We propose two novel distributional models for word representation
using both syntagmatic and paradigmatic relations via a joint training objective. The proposed models are
trained on a public Wikipedia corpus, and the learned representations are evaluated on word analogy and word
similarity tasks. The results demonstrate that the proposed models can perform significantly better than all the
state-of-the-art baseline methods on both tasks.
Learning Dynamic Feature Selection for Fast Sequential Prediction
Emma Strubell, Luke Vilnis, Kate Silverstein, and Andrew McCallum 11:25–11:50
We present paired learning and inference algorithms for significantly reducing computation and increasing
speed of the vector dot products in the classifiers that are at the heart of many NLP components. This is accom-
plished by partitioning the features into a sequence of templates which are ordered such that high confidence
can often be reached using only a small fraction of all features. Parameter estimation is arranged to maximize
accuracy and early confidence in this sequence. Our approach is simpler and better suited to NLP than other
related cascade methods. We present experiments in left-to-right part-of-speech tagging, named entity recog-
nition, and transition-based dependency parsing. On the typical benchmarking datasets we can preserve POS
tagging accuracy above 97% while substantially reducing run-time.
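The template-cascade idea can be sketched as follows: feature templates are scored in a fixed order,
and computation stops once the running margin between the two best labels is large enough. All
names are illustrative placeholders, not the authors' code.

    # Schematic template cascade with early stopping on a confidence margin.
    def cascaded_score(x, labels, weights, templates, margin=1.0):
        scores = {y: 0.0 for y in labels}
        for template in templates:
            for f, v in template(x):                 # (feature, value) pairs
                for y in labels:
                    scores[y] += weights.get((f, y), 0.0) * v
            top, second = sorted(scores.values(), reverse=True)[:2]
            if top - second > margin:                # confident: skip the rest
                break
        return max(scores, key=scores.get)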

51
Session 1

Session 1E: Information Extraction 1


311B Chair: Heng Ji
Compositional Vector Space Models for Knowledge Base Completion
Arvind Neelakantan, Benjamin Roth, and Andrew McCallum 10:10–10:35
Knowledge base (KB) completion adds new facts to a KB by making inferences from existing facts, for example
by inferring with high likelihood nationality(X,Y) from bornIn(X,Y). Most previous methods infer simple one-
hop relational synonyms like this, or use as evidence a multi-hop relational path treated as an atomic feature,
like bornIn(X,Z) ∧ containedIn(Z,Y). This paper presents an approach that reasons about conjunctions of multi-hop
relations non-atomically, composing the implications of a path using a recursive neural network (RNN) that takes
as inputs vector embeddings of the binary relations in the path. Not only does this allow us to generalize to paths
unseen at training time, but also, with a single high-capacity RNN, to predict new relation types not seen when
the compositional model was trained (zero-shot learning). We assemble a new dataset of over 52M relational
triples, and show that our method improves over a traditional classifier by 11%.
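The path-composition idea can be pictured with the toy sketch below: relation embeddings along a
path are folded together by a simple recurrence, and the resulting vector is scored against a
candidate target relation. The tanh recurrence and dimensions are illustrative, not the paper's
exact parameterization.

    import numpy as np

    # Toy composition of a multi-hop relation path with a simple recurrence.
    rng = np.random.default_rng(0)
    d = 8
    rel_emb = {r: 0.1 * rng.standard_normal(d)
               for r in ["bornIn", "containedIn", "nationality"]}
    W = 0.1 * rng.standard_normal((d, d))

    def compose(path):
        h = np.zeros(d)
        for r in path:                     # h_t = tanh(W h_{t-1} + r_t)
            h = np.tanh(W @ h + rel_emb[r])
        return h

    path_vec = compose(["bornIn", "containedIn"])
    score = path_vec @ rel_emb["nationality"]   # high score: path implies relation
    print(score)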

Event Extraction via Dynamic Multi-Pooling Convolutional Neural Networks


Yubo Chen, Liheng Xu, Kang Liu, Daojian Zeng, and Jun Zhao 10:35–11:00
Traditional approaches to the task of ACE event extraction primarily rely on elaborately designed features
and complicated natural language processing (NLP) tools. These traditional approaches lack generalization,
take a large amount of human effort, and are prone to error propagation and data sparsity problems. This
paper proposes a novel event-extraction method, which aims to automatically extract lexical-level and sentence-
level features without using complicated NLP tools. We introduce a word-representation model to capture
meaningful semantic regularities for words and adopt a framework based on a convolutional neural network
(CNN) to capture sentence-level clues. However, a CNN can only capture the most important information in
a sentence and may miss valuable facts when considering multiple-event sentences. We propose a dynamic
multi-pooling convolutional neural network (DMCNN), which uses a dynamic multi-pooling layer according
to event triggers and arguments to reserve more crucial information. The experimental results show that our
approach significantly outperforms other state-of-the-art methods.
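The dynamic multi-pooling idea, as opposed to a single max over the whole feature map, can be
pictured as follows; the feature map and split positions are toy values.

    import numpy as np

    # Toy dynamic multi-pooling: split the convolutional feature map at the
    # trigger/argument positions and max-pool each segment separately instead
    # of taking a single max over the whole sentence.
    def dynamic_multi_pool(feature_map, split_positions):
        bounds = [0] + sorted(split_positions) + [feature_map.shape[0]]
        pools = [feature_map[a:b].max(axis=0)
                 for a, b in zip(bounds[:-1], bounds[1:]) if b > a]
        return np.concatenate(pools)

    fm = np.random.default_rng(0).standard_normal((12, 5))       # [length, filters]
    print(dynamic_multi_pool(fm, split_positions=[3, 7]).shape)  # (15,) = 3 x 5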

Stacked Ensembles of Information Extractors for Knowledge-Base Population


Vidhoon Viswanathan, Nazneen Fatema Rajani, Yinon Bentor, and Raymond Mooney 11:00–11:25
We present results on using stacking to ensemble multiple systems for the Knowledge Base Population English
Slot Filling (KBP-ESF) task. In addition to using the output and confidence of each system as input to the stacked
classifier, we also use features capturing how well the systems agree about the provenance of the information
they extract. We demonstrate that our stacking approach outperforms the best system from the 2014 KBP-ESF
competition as well as alternative ensembling methods employed in the 2014 KBP Slot Filler Validation task
and several other ensembling baselines. Additionally, we demonstrate that including provenance information
further increases the performance of stacking.

Generative Event Schema Induction with Entity Disambiguation


Kiem-Hieu Nguyen, Xavier Tannier, Olivier Ferret, and Romaric Besançon 11:25–11:50
This paper presents a generative model for event schema induction. Previous methods in the literature only use
head words to represent entities. However, elements other than head words contain useful information. For
instance, "an armed man" is more discriminative than "man". Our model takes this information into account and
precisely represents it using probabilistic topic distributions. We illustrate that such information plays an impor-
tant role in parameter estimation. Mostly, it makes topic distributions more coherent and more discriminative.
Experimental results on a benchmark dataset empirically confirm this enhancement.

52
Monday, July 27, 2015

Student Lunch

Monday, July 27, 2015, 11:50–13:20

Function A+B

Join your fellow students for a students-only lunch on Monday, July 27 at 11:50 in the Function A+B
at the CNCC. This is a chance to get to know others who share similar interests and goals and who
may become your lifelong colleagues.

53
Session 2

Session 2 Overview Monday, July 27, 2015

Track A: Machine Translation (309A)
Track B: Question Answering (309B)
Track C: Semantics: Distributional Approaches (310)
Track D: Parsing: Neural Networks (311A)
Track E: Information Extraction 2 (311B)

13:20
A: Syntax-based Simultaneous Translation through Prediction of Unseen Syntactic Constituents (Oda, Neubig, Sakti, Toda, and Nakamura)
B: Learning Answer-Entailing Structures for Machine Comprehension (Sachan, Dubey, Xing, and Richardson)
C: Hubness and Pollution: Delving into Cross-Space Mapping for Zero-Shot Learning (Lazaridou, Dinu, and Baroni)
D: Neural CRF Parsing (Durrett and Klein)
E: Leveraging Linguistic Structure For Open Domain Information Extraction (Angeli, Premkumar, and Manning)

13:45
A: Efficient Top-Down BTG Parsing for Machine Translation Preordering (Nakagawa)
B: Learning Continuous Word Embedding with Metadata for Question Retrieval in Community Question Answering (Zhou, He, Zhao, and Hu)
C: [TACL] Learning a Compositional Semantics for Freebase with an Open Predicate Vocabulary (Krishnamurthy and Mitchell)
D: An Effective Neural Network Model for Graph-based Dependency Parsing (Pei, Ge, and Chang)
E: Joint Information Extraction and Reasoning: A Scalable Statistical Relational Learning Approach (Wang and Cohen)

14:10
A: Online Multitask Learning for Machine Translation Quality Estimation (De Souza, Negri, Ricci, and Turchi)
B: Question Answering over Freebase with Multi-Column Convolutional Neural Networks (Dong, Wei, Zhou, and Xu)
C: A Generalisation of Lexical Functions for Composition in Distributional Semantics (Bride, Cruys, and Asher)
D: Structured Training for Neural Network Transition-Based Parsing (Weiss, Alberti, Collins, and Petrov)
E: A Knowledge-Intensive Model for Prepositional Phrase Attachment (Nakashole and Mitchell)

14:35
A: A Context-Aware Topic Model for Statistical Machine Translation (Su, Xiong, Liu, Han, Lin, Yao, and Zhang)
B: [TACL] Higher-order Lexical Semantic Models for Non-factoid Answer Reranking (Fried, Jansen, Hahn-Powell, Surdeanu, and Clark)
C: Simple Learning and Compositional Application of Perceptually Grounded Word Meanings for Incremental Reference Resolution (Kennington and Schlangen)
D: Transition-Based Dependency Parsing with Stack Long Short-Term Memory (Dyer, Ballesteros, Ling, Matthews, and Smith)
E: A Convolution Kernel Approach to Identifying Comparisons in Text (Tkachenko and Lauw)

54
Monday, July 27, 2015

Parallel Session 2

Session 2A: Machine Translation


309A Chair: Yves Lepage
Syntax-based Simultaneous Translation through Prediction of Unseen Syntactic Constituents
Yusuke Oda, Graham Neubig, Sakriani Sakti, Tomoki Toda, and Satoshi Nakamura 13:20–13:45
Simultaneous translation is a method to reduce the latency of communication through machine translation (MT)
by dividing the input into short segments before performing translation. However, short segments pose prob-
lems for syntax-based translation methods, as it is difficult to generate accurate parse trees for sub-sentential
segments. In this paper, we perform the first experiments applying syntax-based SMT to simultaneous trans-
lation, and propose two methods to prevent degradations in accuracy: a method to predict unseen syntactic
constituents that help form a complete parse tree, and a method that waits for more input when the current
utterance is not enough to generate a fluent translation. Experiments on English-Japanese translation show that
the proposed methods allow for improvements in accuracy, particularly with regards to word order of the target
sentences.
Efficient Top-Down BTG Parsing for Machine Translation Preordering
Tetsuji Nakagawa 13:45–14:10
We present an efficient incremental top-down parsing method for preordering based on Bracketing Transduction
Grammar (BTG). The BTG-based preordering framework (Neubig et al., 2012) can be applied to any language
using only parallel text, but has the problem of computational efficiency. Our top-down parsing algorithm
allows us to use the early update technique easily for the latent variable structured perceptron algorithm with
beam search, and solves the problem. Experimental results showed that the top-down method is more than 10
times faster than a method using the CYK algorithm. A phrase-based machine translation system with the
top-down method had statistically significantly higher BLEU scores for 7 language pairs without relying on
supervised syntactic parsers, compared to baseline systems using existing preordering methods.
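
The early update rule referenced above is a general training device for beam-search structured perceptrons, not
specific to BTG parsing. A minimal Python sketch (toy feature function and data; the paper's actual model and
features differ):

    # Beam-search structured perceptron with early update: if the gold prefix
    # falls off the beam, update immediately instead of decoding to the end.
    def features(x, prefix, action):
        prev = prefix[-1] if prefix else "<s>"
        return {(x[len(prefix)], action): 1.0, (prev, action): 1.0}

    def score(weights, feats):
        return sum(weights.get(f, 0.0) * v for f, v in feats.items())

    def early_update_train(x, gold, weights, actions, beam_size=4):
        beam = [((), 0.0)]
        for t in range(len(gold)):
            candidates = [(prefix + (a,), s + score(weights, features(x, prefix, a)))
                          for prefix, s in beam for a in actions]
            candidates.sort(key=lambda c: -c[1])
            beam = candidates[:beam_size]
            gold_prefix = tuple(gold[:t + 1])
            if all(prefix != gold_prefix for prefix, _ in beam):
                pred_prefix = beam[0][0]   # best (wrong) partial hypothesis
                for i in range(t + 1):     # reward gold features, penalise predicted
                    for f, v in features(x, gold_prefix[:i], gold_prefix[i]).items():
                        weights[f] = weights.get(f, 0.0) + v
                    for f, v in features(x, pred_prefix[:i], pred_prefix[i]).items():
                        weights[f] = weights.get(f, 0.0) - v
                return True                # early update made
        return False

    w = {}
    early_update_train(["a", "b", "c"], ["S", "R", "S"], w, actions=["S", "R"])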

Online Multitask Learning for Machine Translation Quality Estimation


José G. C. de Souza, Matteo Negri, Elisa Ricci, and Marco Turchi 14:10–14:35
We present a method for predicting machine translation output quality geared to the needs of computer-assisted
translation. These include the capability to: (i) continuously learn and self-adapt to a stream of data coming
from multiple translation jobs, (ii) react to data diversity by exploiting human feedback, and (iii) leverage data
similarity by learning and transferring knowledge across domains. To achieve these goals, we combine two
supervised machine learning paradigms, online and multitask learning, adapting and unifying them in a single
framework. We show the effectiveness of our approach in a regression task (HTER prediction), in which online
multitask learning outperforms the competitive online single-task and pooling methods used for comparison.
This indicates the feasibility of integrating in a CAT tool a single QE component capable of simultaneously
serving and continuously learning from multiple translation jobs involving different domains and users.

A Context-Aware Topic Model for Statistical Machine Translation


Jinsong Su, Deyi Xiong, Yang Liu, Xianpei Han, Hongyu Lin, Junfeng Yao, and Min Zhang 14:35–15:00
Lexical selection is crucial for statistical machine translation. Previous studies separately exploit sentence-level
contexts and document-level topics for lexical selection, neglecting their correlations. In this paper, we propose
a context-aware topic model for lexical selection, which not only models local contexts and global topics but
also captures their correlations. The model uses target-side translations as hidden variables to connect document
topics and source-side local contextual words. In order to learn hidden variables and distributions from
data, we introduce a Gibbs sampling algorithm for statistical estimation and inference. A new translation
probability based on distributions learned by the model is integrated into a translation system for lexical selection.
Experiment results on NIST Chinese-English test sets demonstrate that (1) our model significantly outperforms
previous lexical selection methods and (2) modeling correlations between local words and global topics can
further improve translation quality.


Session 2B: Question Answering


309B Chair: Alessandro Moschitti
Learning Answer-Entailing Structures for Machine Comprehension
Mrinmaya Sachan, Kumar Dubey, Eric Xing, and Matthew Richardson 13:20–13:45
Understanding open-domain text is one of the primary challenges in NLP. Machine comprehension evaluates
the system's ability to understand text through a series of question-answering tasks on short pieces of text such
that the correct answer can be found only in the given text. For this task, we posit that there is a hidden latent
structure that explains the relation between the question, correct answer, and text. We call this the answer-
entailing structure; given the structure, the correctness of the answer is evident. Since the structure is latent,
it must be inferred. We present a unified max-margin framework that learns to find these hidden structures
given a corpus of question-answer pairs, and uses what it learns to answer machine comprehension questions
on novel texts. We extend this framework to incorporate multi-task learning on the different sub-tasks that
are required to perform machine comprehension. Evaluation on a publicly available dataset shows that our
framework outperforms various IR and neural-network baselines, achieving an overall accuracy of 67.8% (vs.
59.9%, the best previously-published result).

Learning Continuous Word Embedding with Metadata for Question Retrieval in Community
Question Answering
Guangyou Zhou, Tingting He, Jun Zhao, and Po Hu 13:45–14:10
Community question answering (cQA) has become an important issue due to the popularity of cQA archives on
the web. This paper is concerned with the problem of question retrieval. Question retrieval in cQA archives
aims to find the existing questions that are semantically equivalent or relevant to the queried questions. However,
the lexical gap problem brings about a new challenge for question retrieval in cQA. In this paper, we propose
to learn continuous word embeddings with metadata of category information within cQA pages for question
retrieval. To deal with the variable size of word embedding vectors, we employ the Fisher kernel framework
to aggregate them into fixed-length vectors. Experimental results on a large-scale real-world cQA data set
show that our approach can significantly outperform state-of-the-art translation models and topic-based models
for question retrieval in cQA.

Question Answering over Freebase with Multi-Column Convolutional Neural Networks


Li Dong, Furu Wei, Ming Zhou, and Ke Xu 14:10–14:35
Answering natural language questions over a knowledge base is an important and challenging task. Most
existing systems typically rely on hand-crafted features and rules to conduct question understanding and/or
answer ranking. In this paper, we introduce multi-column convolutional neural networks (MCCNNs) to understand
questions from three different aspects (namely, answer path, answer context, and answer type) and learn their
distributed representations. Meanwhile, we jointly learn low-dimensional embeddings of entities and relations
in the knowledge base. Question-answer pairs are used to train the model to rank candidate answers. We also
leverage question paraphrases to train the column networks in a multi-task learning manner. We use Freebase
as the knowledge base and conduct extensive experiments on the WebQuestions dataset. Experimental results
show that our method achieves better or comparable performance compared with baseline systems. In addition,
we develop a method to compute the salience scores of question words in different column networks. The
results help us intuitively understand what MCCNNs learn.

[TACL] Higher-order Lexical Semantic Models for Non-factoid Answer Reranking


Daniel Fried, Peter Jansen, Gustave Hahn-Powell, Mihai Surdeanu, and Peter Clark 14:35–15:00
Lexical semantic models provide robust performance for question answering, but, in general, can only capitalize
on direct evidence seen during training. For example, monolingual alignment models acquire term alignment
probabilities from semi-structured data such as question-answer pairs; neural network language models learn
term embeddings from unstructured text. All this knowledge is then used to estimate the semantic similarity
between question and answer candidates. We introduce a higher-order formalism that allows all these lexical
semantic models to chain direct evidence to construct indirect associations between question and answer texts,
by casting the task as the traversal of graphs that encode direct term associations. Using a corpus of 10,000
questions from Yahoo! Answers, we experimentally demonstrate that higher-order methods are broadly ap-
plicable to alignment and language models, across both word and syntactic representations. We show that an
important criterion for success is controlling for the semantic drift that accumulates during graph traversal. All
in all, the proposed higher-order approach improves five out of the six lexical semantic models investigated,
with relative gains of up to +13% over their first-order variants.
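
As a rough illustration of the chaining idea (not the authors' exact formulation; the association matrix, damping
factor, and normalisation below are all invented), indirect term associations can be built by walking a graph of
direct associations with decaying weights, where the decay plays the role of controlling semantic drift:

    import numpy as np

    terms = ["rain", "umbrella", "wet", "dry"]
    A = np.array([[0.0, 0.6, 0.4, 0.0],   # toy direct-association strengths
                  [0.5, 0.0, 0.2, 0.3],
                  [0.7, 0.1, 0.0, 0.2],
                  [0.1, 0.3, 0.6, 0.0]])

    def higher_order(A, lam=0.3, max_hops=3):
        """Mix k-hop associations A^k with geometrically decaying weights."""
        out, walk, weight = np.zeros_like(A), A.copy(), 1.0
        for _ in range(max_hops):
            out += weight * walk
            walk = walk @ A        # one more hop along the association graph
            weight *= lam          # damp longer chains to limit semantic drift
        return out / out.sum(axis=1, keepdims=True)

    # "rain" and "dry" have no direct association but acquire an indirect one
    print(higher_order(A)[terms.index("rain"), terms.index("dry")])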


Session 2C: Semantics: Distributional Approaches


310 Chair: Jing-Shin Chang
Hubness and Pollution: Delving into Cross-Space Mapping for Zero-Shot Learning
Angeliki Lazaridou, Georgiana Dinu, and Marco Baroni 13:20–13:45
Zero-shot methods in language, vision and other domains rely on a cross-space mapping function that projects
vectors from the relevant feature space (e.g., visual-feature-based image representations) to a large semantic
word space induced in an unsupervised way from corpus data, where the entities of interest (e.g., the objects
images depict) are labeled with the words associated to the nearest neighbours of the mapped vectors. Zero-shot
cross-space mapping methods hold great promise as a way to scale up annotation tasks well beyond the labels
in the training data (e.g., recognizing objects that were never seen in training). However, the current performance
of cross-space mapping functions is still quite low, so that the strategy is not yet usable in practical applica-
tions. In this paper, we explore some general properties, both theoretical and empirical, of the cross-space
mapping function, and we build on them to propose better methods to estimate it. In this way, we attain large
improvements over the state of the art, both in cross-linguistic word translation and cross-modal image labeling
zero-shot experiments.
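
For readers new to the setup, a minimal sketch of a cross-space mapping pipeline (a ridge regression fit on
random stand-in data; estimating this mapping better is precisely where the paper's contribution lies):

    import numpy as np

    rng = np.random.default_rng(0)
    n_train, d_vis, d_word = 200, 50, 30
    W_true = rng.normal(size=(d_vis, d_word))
    X = rng.normal(size=(n_train, d_vis))                        # "visual" vectors
    Y = X @ W_true + 0.1 * rng.normal(size=(n_train, d_word))    # their word vectors

    # Closed-form ridge solution: W = (X'X + aI)^{-1} X'Y
    alpha = 1.0
    W = np.linalg.solve(X.T @ X + alpha * np.eye(d_vis), X.T @ Y)

    vocab = rng.normal(size=(1000, d_word))   # word space, incl. unseen labels
    query = rng.normal(size=(1, d_vis))       # an image of an unseen object
    mapped = query @ W
    sims = (vocab @ mapped.T).ravel() / (np.linalg.norm(vocab, axis=1)
                                         * np.linalg.norm(mapped))
    print("predicted label id:", int(sims.argmax()))   # nearest-neighbour label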

[TACL] Learning a Compositional Semantics for Freebase with an Open Predicate Vocabulary
Jayant Krishnamurthy and Tom M. Mitchell 13:45–14:10
We present an approach to learning a model-theoretic semantics for natural language tied to Freebase. Crucially,
our approach uses an open predicate vocabulary, enabling it to produce denotations for phrases such as
"Republican front-runner from Texas" whose semantics cannot be represented using the Freebase schema. Our
approach directly converts a sentence's syntactic CCG parse into a logical form containing predicates derived
from the words in the sentence, assigning each word a consistent semantics across sentences. This logical
form is evaluated against a probabilistic database containing a learned denotation for each textual predicate. A
training phase produces this probabilistic database using a corpus of entity-linked text and probabilistic ma-
trix factorization. We evaluate our approach on a compositional question answering task where it outperforms
several competitive baselines. We also compare our approach against manually annotated Freebase queries,
finding that our open predicate vocabulary enables us to answer many questions that Freebase cannot.

A Generalisation of Lexical Functions for Composition in Distributional Semantics


Antoine Bride, Tim Van de Cruys, and Nicholas Asher 14:10–14:35
Over the last two decades, numerous algorithms have been developed that successfully capture something
of the semantics of single words by looking at their distribution in text and comparing these distributions in a
vector space model. However, it is not straightforward to construct meaning representations beyond the level of
individual words (i.e., the combination of words into larger units) using distributional methods. Our contribution
is twofold. First of all, we carry out a large-scale evaluation, comparing different composition methods within
the distributional framework for the cases of both adjective-noun and noun-noun composition, making use of
a newly developed dataset. Secondly, we propose a novel method for composition, which is a generalization
of lexical functions. The performance of our novel method is also evaluated on our new dataset and proves
competitive with the best methods.

Simple Learning and Compositional Application of Perceptually Grounded Word Meanings


for Incremental Reference Resolution
Casey Kennington and David Schlangen 14:35–15:00
An elementary way of using language is to refer to objects. Often, these objects are physically present in
the shared environment and reference is done via mention of perceivable properties of the objects. This is
a type of language use that is modelled well neither by logical semantics nor by distributional semantics,
the former focusing on inferential relations between expressed propositions, the latter on similarity relations
between words or phrases. We present an account of word and phrase meaning that is perceptually grounded,
trainable, compositional, and dialogue-plausible in that it computes meanings word-by-word. We show that
the approach performs well with an accuracy of 65
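
A toy sketch of the underlying words-as-classifiers idea (weights and features are invented here; the real model
trains a classifier per word from referring-expression data):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Perceptual features per candidate object: [redness, greenness, size]
    objects = np.array([[0.9, 0.1, 0.2],    # small red object
                        [0.1, 0.8, 0.9],    # big green object
                        [0.8, 0.2, 0.9]])   # big red object

    word_weights = {                         # one linear classifier per word
        "red":   np.array([4.0, -2.0, 0.0]),
        "green": np.array([-2.0, 4.0, 0.0]),
        "big":   np.array([0.0, 0.0, 4.0]),
    }

    def resolve(expression):
        """Incremental, word-by-word combination of per-word object scores."""
        combined = np.ones(len(objects))
        for word in expression.split():
            combined *= sigmoid(objects @ word_weights[word])  # word-fits-object score
        return int(combined.argmax())

    print(resolve("big red"))   # resolves to object 2, the big red one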


Session 2D: Parsing: Neural Networks


311A Chair: Yusuke Miyao
Neural CRF Parsing
Greg Durrett and Dan Klein 13:20–13:45
This paper describes a parsing model that combines the exact dynamic programming of CRF parsing with
the rich nonlinear featurization of neural net approaches. Our model is structurally a CRF that factors over
anchored rule productions, but instead of linear potential functions based on sparse features, we use nonlinear
potentials computed via a feedforward neural network. Because potentials are still local to anchored rules,
structured inference (CKY) is unchanged from the sparse case. Computing gradients during learning involves
backpropagating an error signal formed from standard CRF sufficient statistics (expected rule counts). Using only
dense features, our neural CRF already exceeds a strong baseline CRF model (Hall et al., 2014). In combination
with sparse features, our system achieves 91.1 F1 on section 23 of the Penn Treebank, and more generally
outperforms the best prior single parser results on a range of languages.
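
A sketch of the central move, with invented dimensions (a ReLU hidden layer is used here for concreteness; the
paper's exact architecture may differ): the linear potential over sparse features is swapped for a small
feedforward network over dense span features, and CKY consumes the resulting local scores unchanged.

    import numpy as np

    rng = np.random.default_rng(1)
    d_in, d_hidden, n_rules = 20, 32, 50
    W1, b1 = rng.normal(scale=0.1, size=(d_hidden, d_in)), np.zeros(d_hidden)
    W2, b2 = rng.normal(scale=0.1, size=(n_rules, d_hidden)), np.zeros(n_rules)

    def rule_potentials(span_features):
        """Nonlinear potentials for all anchored rules over one span; in a
        classic CRF parser this would be a sparse dot product w . f(rule, span)."""
        h = np.maximum(0.0, W1 @ span_features + b1)   # hidden layer
        return W2 @ h + b2                             # one potential per rule

    phi = rule_potentials(rng.normal(size=d_in))
    print(phi.shape)   # (50,) local scores, pluggable into standard CKY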

An Effective Neural Network Model for Graph-based Dependency Parsing


Wenzhe Pei, Tao Ge, and Baobao Chang 13:45–14:10
Most existing graph-based parsing models rely on millions of hand-crafted features, which limits their
generalization ability and slows down the parsing speed. In this paper, we propose a general and effective Neural
Network model for graph-based dependency parsing. Our model can automatically learn high-order feature
combinations using only atomic features by exploiting a novel activation function, tanh-cube. Moreover, we
propose a simple yet effective way to utilize phrase-level information that is expensive to use in conventional
graph-based parsers. Experiments on the English Penn Treebank show that parsers based on our model perform
better than conventional graph-based parsers.
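
Assuming the tanh-cube activation has the form tanh(x^3 + x) (our reading; consult the paper for the exact
definition), the cubic term is what lets a single layer form products of up to three input features, which is the
source of the high-order feature combinations mentioned above:

    import numpy as np

    def tanh_cube(x):
        # Cube builds third-order feature interactions; tanh bounds the output.
        return np.tanh(x ** 3 + x)

    print(tanh_cube(np.linspace(-2.0, 2.0, 5)))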

Structured Training for Neural Network Transition-Based Parsing


David Weiss, Chris Alberti, Michael Collins, and Slav Petrov 14:10–14:35
We present structured perceptron training for neural network transition-based dependency parsing. We learn
the neural network representation using a gold corpus augmented by a large number of automatically parsed
sentences. Given this fixed network representation, we learn a final layer using the structured perceptron with
beam-search decoding. On the Penn Treebank, our parser reaches 94.26% unlabeled and 92.41% labeled
attachment accuracy, which to our knowledge is the best accuracy on Stanford Dependencies to date. We
also provide in-depth ablative analysis to determine which aspects of our model provide the largest gains in
accuracy.

Transition-Based Dependency Parsing with Stack Long Short-Term Memory


Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith 14:35–15:00
We propose a technique for learning representations of parser states in transition-based dependency parsers.
Our primary innovation is a new control structure for sequence-to-sequence neural networks: the stack LSTM.
Like the conventional stack data structures used in transition-based parsing, elements can be pushed to or
popped from the top of the stack in constant time, but, in addition, an LSTM maintains a continuous space
embedding of the stack contents. This lets us formulate an efficient parsing model that captures three facets of
a parser's state: (i) unbounded look-ahead into the buffer of incoming words, (ii) the complete history of actions
taken by the parser, and (iii) the complete contents of the stack of partially built tree fragments, including their
internal structures. Standard backpropagation techniques are used for training and yield state-of-the-art parsing
performance.
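
A toy illustration of the control structure (a plain tanh RNN cell stands in for the LSTM, and weights are
random): because the stack stores snapshots of the recurrent state, push() extends the encoding of the stack
contents and pop() reverts to the previous summary in constant time.

    import numpy as np

    rng = np.random.default_rng(2)
    d = 8
    Wx, Wh = rng.normal(scale=0.3, size=(d, d)), rng.normal(scale=0.3, size=(d, d))

    class StackRNN:
        def __init__(self):
            self.states = [np.zeros(d)]   # states[i] summarises the first i pushes

        def push(self, x):
            self.states.append(np.tanh(Wx @ x + Wh @ self.states[-1]))

        def pop(self):
            self.states.pop()             # O(1): just drop the latest snapshot

        def embedding(self):
            return self.states[-1]        # continuous summary of stack contents

    s = StackRNN()
    s.push(rng.normal(size=d)); s.push(rng.normal(size=d))
    before = s.embedding().copy()
    s.push(rng.normal(size=d)); s.pop()
    print(np.allclose(before, s.embedding()))   # True: pop restored the summary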


Session 2E: Information Extraction 2


311B Chair: Doug Downey
Leveraging Linguistic Structure For Open Domain Information Extraction
Gabor Angeli, Melvin Jose Johnson Premkumar, and Christopher D. Manning 13:20–13:45
Relation triples produced by open domain information extraction (open IE) systems are useful for question
answering, inference, and other IE tasks. Traditionally these are extracted using a large set of patterns; however,
this approach is brittle on out-of-domain text and long-range dependencies, and gives no insight into the sub-
structure of the arguments. We replace this large pattern set with a few patterns for canonically structured sen-
tences, and shift the focus to a classifier which learns to extract self-contained clauses from longer sentences.
We then run natural logic inference over these short clauses to determine the maximally specific arguments
for each candidate triple. We show that our approach outperforms a state-of-the-art open IE system on the
end-to-end TAC-KBP 2013 Slot Filling task.

Joint Information Extraction and Reasoning: A Scalable Statistical Relational Learning Ap-
proach
William Yang Wang and William W. Cohen 13:45–14:10
A standard pipeline for statistical relational learning involves two steps: one first constructs the knowledge base
(KB) from text, and then performs the learning and reasoning tasks using probabilistic first-order logics. However,
a key issue is that information extraction (IE) errors from text affect the quality of the KB, and propagate to
the reasoning task. In this paper, we propose a statistical relational learning model for joint information extraction
and reasoning. More specifically, we incorporate context-based entity extraction with structure learning
(SL) in a scalable probabilistic logic framework. We then propose a latent context invention (LCI) approach to
improve the performance. In experiments, we show that our approach outperforms state-of-the-art baselines
over three real-world Wikipedia datasets from multiple domains; that joint learning and inference for IE and
SL significantly improves both tasks; and that latent context invention further improves the results.

A Knowledge-Intensive Model for Prepositional Phrase Attachment


Ndapandula Nakashole and Tom M. Mitchell 14:10–14:35
Prepositional phrases (PPs) express crucial information that knowledge base construction methods need to
extract. However, PPs are a major source of syntactic ambiguity and still pose problems in parsing. We present a
method for resolving ambiguities arising from PPs, making extensive use of semantic knowledge from various
resources. As training data, we use both labeled and unlabeled data, utilizing an expectation maximization
algorithm for parameter estimation. Experiments show that our method yields improvements over existing
methods, including a state-of-the-art dependency parser.

A Convolution Kernel Approach to Identifying Comparisons in Text


Maksim Tkachenko and Hady Lauw 14:35–15:00
Comparisons in text, such as in online reviews, serve as useful decision aids. In this paper, we focus on the
task of identifying whether a comparison exists between a specific pair of entity mentions in a sentence. This
formulation is transformative, as previous work only seeks to determine whether a sentence is comparative,
which is presumptuous in the event the sentence mentions multiple entities and is comparing only some, not
all, of them. Our approach leverages not only lexical features such as salient words, but also structural features
expressing the relationships among words and entity mentions. To model these features seamlessly, we rely on
a dependency tree representation, and investigate the applicability of a series of tree kernels. This leads to the
development of a new context-sensitive tree kernel: the Skip-node Kernel (SNK). We further describe both its exact
and approximate computations. Through experiments on real-life datasets, we evaluate the effectiveness of our
kernel-based approach for comparison identification, as well as the utility of SNK and its approximations.


Session 3 Overview Monday, July 27, 2015

Track A: Language Resources (309A)
  15:30 [TACL] A New Corpus and Imitation Learning Framework for Context-Dependent Semantic Parsing (Vlachos and Clark)
  15:55 It Depends: Dependency Parser Comparison Using A Web-based Evaluation Tool (Choi, Tetreault, and Stent)
  16:20 Generating High Quality Proposition Banks for Multilingual Semantic Role Labeling (Akbik, Chiticariu, Danilevsky, Li, Vaithyanathan, and Zhu)

Track B: Sentiment Analysis: Cross-/Multi Lingual (309B)
  15:30 Aligning Opinions: Cross-Lingual Opinion Mining with Dependencies (Almeida, Pinto, Figueira, Mendes, and Martins)
  15:55 Learning to Adapt Credible Knowledge in Cross-lingual Sentiment Analysis (Chen, Li, Lei, Liu, and He)
  16:20 Learning Bilingual Sentiment Word Embeddings for Cross-language Sentiment Classification (Zhou, Chen, Shi, and Huang)

Track C: Natural Language Generation (310)
  15:30 Content Models for Survey Generation: A Factoid-Based Evaluation (Jha, Finegan-Dollak, King, Coke, and Radev)
  15:55 Training a Natural Language Generator From Unaligned Data (Dušek and Jurčíček)
  16:20 Event-Driven Headline Generation (Sun, Zhang, Zhang, and Ji)

Track D: Spoken Language Processing and Understanding (311A)
  15:30 New Transfer Learning Techniques for Disparate Label Sets (Kim, Stratos, Sarikaya, and Jeong)
  15:55 Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding (Chen, Wang, Gershman, and Rudnicky)
  16:20 Efficient Disfluency Detection with Transition-based Parsing (Wu, Zhang, Zhou, and Zhao)

Track E: Information Extraction 3/Information Retrieval (311B)
  15:30 S-MART: Novel Tree-based Structured Learning Algorithms Applied to Tweet Entity Linking (Yang and Chang)
  15:55 [TACL] Design Challenges for Entity Linking (Ling, Singh, and Weld)
  16:20 Entity Retrieval via Entity Factoid Hierarchy (Lu, Lam, and Liao)


Parallel Session 3

Session 3A: Language Resources


309A Chair: Key-Sun Choi
[TACL] A New Corpus and Imitation Learning Framework for Context-Dependent Semantic
Parsing
Andreas Vlachos and Stephen Clark 15:30–15:55
Semantic parsing is the task of translating natural language utterances into a machine-interpretable meaning
representation. Most approaches to this task have been evaluated on a small number of existing corpora which
assume that all utterances must be interpreted according to a database and typically ignore context. In this
paper we present a new, publicly available corpus for context-dependent semantic parsing. The meaning representation
language (MRL) used for the annotation was designed to support a portable, interactive tourist information system. We develop
a semantic parser for this corpus by adapting the imitation learning algorithm DAGGER without requiring
alignment information during training. DAGGER improves upon independently trained classifiers by 9.0 and
4.8 points in F-score on the development and test sets respectively.

It Depends: Dependency Parser Comparison Using A Web-based Evaluation Tool


Jinho D. Choi, Joel Tetreault, and Amanda Stent 15:55–16:20
The last few years have seen a surge in the number of accurate, fast, publicly available dependency parsers.
At the same time, the use of dependency parsing in NLP applications has increased. It can be difficult for a
non-expert to select a good off-the-shelf parser. We present a comparative analysis of nine leading statistical
dependency parsers on a multi-genre corpus of English. For our analysis, we developed a new web-based
tool that gives a convenient way of comparing dependency parser outputs. Our analysis will help practitioners
choose a parser to optimize their desired speed/accuracy tradeoff, and our tool will help practitioners examine
and compare parser output.

Generating High Quality Proposition Banks for Multilingual Semantic Role Labeling
Alan Akbik, Laura Chiticariu, Marina Danilevsky, Yunyao Li, Shivakumar Vaithyanathan, and
Huaiyu Zhu 16:20–16:45
Semantic role labeling (SRL) is crucial to natural language understanding as it identifies the predicate-argument
structure in text with semantic labels. Unfortunately, resources required to construct SRL models are expensive
to obtain and simply do not exist for most languages. In this paper, we present a two-stage method to enable
the construction of SRL models for resource-poor languages by exploiting monolingual SRL and multilingual
parallel data. Experimental results show that our method outperforms existing methods. We use our method
to generate Proposition Banks with high to reasonable quality for 7 languages in three language families and
release these resources to the research community.


Session 3B: Sentiment Analysis: Cross-/Multi Lingual


309B Chair: Jing Jiang
Aligning Opinions: Cross-Lingual Opinion Mining with Dependencies
Mariana S. C. Almeida, Claudia Pinto, Helena Figueira, Pedro Mendes, and André F. T. Martins
15:30–15:55
We propose a cross-lingual framework for fine-grained opinion mining using bitext projection. The only re-
quirements are a running system in a source language and word-aligned parallel data. Our method projects
opinion frames from the source to the target language, and then trains a system on the target language using
the automatic annotations. Key to our approach is a novel dependency-based model for opinion mining, which
we show, as a byproduct, to be on par with the current state of the art for English, while avoiding the need
for integer programming or reranking. In cross-lingual mode (English to Portuguese), our approach compares
favorably to a supervised system with scarce labeled data, and to a delexicalized model trained using universal
tags and bilingual word embeddings.

Learning to Adapt Credible Knowledge in Cross-lingual Sentiment Analysis


Qiang Chen, Wenjie Li, Yu Lei, Xule Liu, and Yanxiang He 15:55–16:20
Cross-lingual sentiment analysis is the task of identifying sentiment polarities of texts in a low-resource language
by using sentiment knowledge in a resource-abundant language. While most existing approaches are driven by
transfer learning, their performance does not reach a promising level due to transferred errors. In this
paper, we propose to integrate into knowledge transfer a knowledge validation model, which aims to prevent
the negative influence from the wrong knowledge by distinguishing highly credible knowledge. Experiment
results demonstrate the necessity and effectiveness of the model.

Learning Bilingual Sentiment Word Embeddings for Cross-language Sentiment Classification


Huiwei Zhou, Long Chen, Fulin Shi, and Degen Huang 16:20–16:45
Sentiment classification performance relies on high-quality sentiment resources. However, these resources
are imbalanced in different languages. Cross-language sentiment classification (CLSC) can leverage the rich
resources in one language (the source language) for sentiment classification in a resource-scarce language (the
target language). Bilingual embeddings could eliminate the semantic gap between two languages for CLSC, but
ignore the sentiment information of text. This paper proposes an approach to learning bilingual sentiment word
embeddings (BSWE) for English-Chinese CLSC. The proposed BSWE incorporate sentiment information of
text into bilingual embeddings. Furthermore, we can learn high-quality BSWE by simply employing labeled
corpora and their translations, without relying on large-scale parallel corpora. Experiments on NLPCC 2013
CLSC dataset show that our approach outperforms the state-of-the-art systems.


Session 3C: Natural Language Generation


310 Chair: Jackie Chi Kit Cheung
Content Models for Survey Generation: A Factoid-Based Evaluation
Rahul Jha, Catherine Finegan-Dollak, Ben King, Reed Coke, and Dragomir Radev 15:30–15:55
We present a new factoid-annotated dataset for evaluating content models for scientific survey article generation
containing 3,425 sentences from 7 topics in natural language processing. We also introduce a novel HITS-based
content model for automated survey article generation called HitSum that exploits the lexical network structure
between sentences from citing and cited papers. Using the factoid-annotated data, we conduct a pyramid
evaluation and compare HitSum with two previous state-of-the-art content models: C-Lexrank, a network
based content model, and TopicSum, a Bayesian content model. Our experiments show that our new content
model captures useful survey-worthy information and outperforms C-Lexrank by 4

Training a Natural Language Generator From Unaligned Data


Ondřej Dušek and Filip Jurčíček 15:55–16:20
We present a novel syntax-based natural language generation system that is trainable from unaligned pairs of
input meaning representations and output sentences. It is divided into sentence planning, which incrementally
builds deep-syntactic dependency trees, and surface realization. The sentence planner is based on A* search with
a perceptron ranker that uses novel differing subtree updates and a simple future promise estimation; surface
realization uses a rule-based pipeline from the Treex NLP toolkit. Our first results show that training from
unaligned data is feasible; the outputs of our generator are mostly fluent and relevant.

Event-Driven Headline Generation


Rui Sun, Yue Zhang, Meishan Zhang, and Donghong Ji 16:20–16:45
We propose an event-driven model for headline generation. Given an input document, the system identifies
a key event chain by extracting a set of structural events that describe them. Then a novel multi-sentence
compression algorithm is used to fuse the extracted events, generating a headline for the document. Our
model can be viewed as a novel combination of extractive and abstractive headline generation, combining the
advantages of both methods using event structures. Standard evaluation shows that our model achieves the best
performance compared with previous state-of-the-art systems.


Session 3D: Spoken Language Processing and Understanding


311A Chair: David Schlangen
New Transfer Learning Techniques for Disparate Label Sets
Young-Bum Kim, Karl Stratos, Ruhi Sarikaya, and Minwoo Jeong 15:30–15:55
In natural language understanding (NLU), a user utterance can be labeled differently depending on the domain
or application (e.g., weather vs. calendar). Standard domain adaptation techniques are not directly applicable
to take advantage of the existing annotations because they assume that the label set is invariant. We propose a
solution based on label embeddings induced from canonical correlation analysis (CCA) that reduces the problem
to a standard domain adaptation task and allows use of a number of transfer learning techniques. We also
introduce a new transfer learning technique based on pretraining of hidden-unit CRFs (HUCRFs). We perform
extensive experiments on slot tagging on eight personal digital assistant domains and demonstrate that the
proposed methods are superior to strong baselines.

Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language
Understanding
Yun-Nung Chen, William Yang Wang, Anatole Gershman, and Alexander Rudnicky 15:55–16:20
Spoken dialogue systems (SDS) typically require a predefined semantic ontology to train a spoken language
understanding (SLU) module. In addition to the annotation cost, a key challenge for designing such an ontology
is to define a coherent slot set while considering their complex relations. This paper introduces a novel matrix
factorization (MF) approach to learn latent feature vectors for utterances and semantic elements without the
need of corpus annotations. Specifically, our model learns the semantic slots for a domain-specific SDS in an
unsupervised fashion, and carries out semantic parsing using latent MF techniques. To further consider the
global semantic structure, such as inter-word and inter-slot relations, we augment the latent MF-based model
with a knowledge graph propagation model based on a slot-based semantic graph and a word-based lexical
graph. Our experiments show that the proposed MF approaches produce better SLU models that are able to
predict semantic slots and word patterns taking into account their relations and domain-specificity in a joint
manner.
Efficient Disfluency Detection with Transition-based Parsing
Shuangzhi Wu, Dongdong Zhang, Ming Zhou, and Tiejun Zhao 16:20–16:45
Automatic speech recognition (ASR) outputs often contain various disfluencies. It is necessary to remove these
disfluencies before processing downstream tasks. In this paper, an efficient disfluency detection approach based
on right-to-left transition-based parsing is proposed, which can efficiently identify disfluencies and keep ASR
outputs grammatical. Our method exploits a global view to capture long-range dependencies for disfluency
detection by integrating a rich set of syntactic and disfluency features with linear complexity. The experimental
results show that our method outperforms state-of-the-art work and achieves an 85.1% F-score on the commonly
used English Switchboard test set. We also apply our method to in-house annotated Chinese data and achieve
a significantly higher F-score compared to a CRF-based baseline.


Session 3E: Information Extraction 3/Information Retrieval


311B Chair: Jun Zhao
S-MART: Novel Tree-based Structured Learning Algorithms Applied to Tweet Entity Linking
Yi Yang and Ming-Wei Chang 15:30–15:55
Non-linear models have recently received a lot of attention as people are starting to discover the power of statistical
and embedding features. However, tree-based models are seldom studied in the context of structured learning
despite their recent success on various classification and ranking tasks. In this paper, we propose S-MART, a
tree-based structured learning framework based on multiple additive regression trees. S-MART is especially
suitable for handling tasks with dense features, and can be used to learn many different structures under various
loss functions. We apply S-MART to the task of tweet entity linking (a core component of tweet information
extraction), which aims to identify and link name mentions to entities in a knowledge base. A novel inference
algorithm is proposed to handle the special structure of the task. The experimental results show that S-MART
significantly outperforms state-of-the-art tweet entity linking systems.

[TACL] Design Challenges for Entity Linking


Xiao Ling, Sameer Singh, and Daniel S. Weld 15:55–16:20
Recent research on entity linking (EL) has introduced a plethora of promising techniques, ranging from deep
neural networks to joint inference. But despite numerous papers there is surprisingly little understanding of
the state of the art in EL. We attack this confusion by analyzing differences between several versions of the EL
problem and presenting a simple yet effective, modular, unsupervised system, called VINCULUM, for entity
linking. We conduct an extensive evaluation on nine data sets, comparing VINCULUM with two state-of-
the-art systems, and elucidate key aspects of the system that include mention extraction, candidate generation,
entity type prediction, entity coreference, and coherence.

Entity Retrieval via Entity Factoid Hierarchy


Chunliang Lu, Wai Lam, and Yi Liao 16:20–16:45
We propose that entity queries are generated via a two-step process: users first select entity facts that can
distinguish target entities from the others; and then choose words to describe each selected fact. Based on this
query generation paradigm, we propose a new entity representation model named as entity factoid hierarchy.
An entity factoid hierarchy is a tree structure composed of factoid nodes. A factoid node describes one or more
facts about the entity in different information granularities. The entity factoid hierarchy is constructed via a
factor graph model, and the inference on the factor graph is achieved by a modified variant of the Multiple-try
Metropolis algorithm. Entity retrieval is performed by decomposing entity queries and computing the query
likelihood on the entity factoid hierarchy. Using an array of benchmark datasets, we demonstrate that our
proposed framework significantly improves the retrieval performance over existing models.


Session 4 Overview Monday, July 27, 2015

Track A: Semantics (309A)
  17:00 A Framework for the Construction of Monolingual and Cross-lingual Word Similarity Datasets (Camacho-Collados, Pilehvar, and Navigli)
  17:15 On metric embedding for boosting semantic similarity computations (Subercaze, Gravier, and Laforest)
  17:30 Improving Distributed Representation of Word Sense via WordNet Gloss Composition and Context Clustering (Chen, Xu, He, and Wang)
  17:45 A Multitask Objective to Inject Lexical Contrast into Distributional Semantics (Pham, Lazaridou, and Baroni)

Track B: Sentiment Analysis (309B)
  17:00 Semi-Stacking for Semi-supervised Sentiment Classification (Li, Huang, Wang, and Zhou)
  17:15 Deep Markov Neural Network for Sequential Data Classification (Yang)
  17:30 Semantic Analysis and Helpfulness Prediction of Text for Online Product Reviews (Yang, Yan, Qiu, and Bao)
  17:45 Document Classification by Inversion of Distributed Language Representations (Taddy)

Track C: Summarization and Generation (310)
  17:00 Using Tweets to Help Sentence Compression for News Highlights Generation (Wei, Liu, Li, and Gao)
  17:15 Domain-Specific Paraphrase Extraction (Pavlick, Ganitkevitch, Chan, Yao, Van Durme, and Callison-Burch)
  17:30 Simplifying Lexical Simplification: Do We Need Simplified Corpora? (Glavaš and Štajner)
  17:45 Zoom: a corpus of natural language descriptions of map locations (Altamirano, Ferreira, Paraboni, and Benotti)

Track D: Discourse, Coreference (311A)
  17:00 Generating overspecified referring expressions: the role of discrimination (Paraboni, Galindo, and Iacovelli)
  17:15 Using prosodic annotations to improve coreference resolution of spoken text (Roesiger and Riester)
  17:30 Spectral Semi-Supervised Discourse Relation Classification (Fisher and Simmons)
  17:45 I do not disagree: leveraging monolingual alignment to detect disagreement in dialogue (Gokcen and De Marneffe)

Track E: Language and Vision (311B)
  17:00 Language Models for Image Captioning: The Quirks and What Works (Devlin, Cheng, Fang, Gupta, Deng, He, Zweig, and Mitchell)
  17:15 A Distributed Representation Based Query Expansion Approach for Image Captioning (Yagcioglu, Erdem, Erdem, and Cakici)
  17:30 Learning language through pictures (Chrupała, Kádár, and Alishahi)
  17:45 Exploiting Image Generality for Lexical Entailment Detection (Kiela, Rimell, Vulic, and Clark)


Parallel Session 4

Session 4A: Semantics


309A Chair: Jonathan Berant
A Framework for the Construction of Monolingual and Cross-lingual Word Similarity Datasets
José Camacho-Collados, Mohammad Taher Pilehvar, and Roberto Navigli 17:00–17:15
Despite being one of the most popular tasks in lexical semantics, word similarity has often been limited to
the English language. Other languages, even those that are widely spoken such as Spanish, do not have a
reliable word similarity evaluation framework. We put forward robust methodologies for the extension of
existing English datasets to other languages, both at monolingual and cross-lingual levels. We propose an
automatic standardization for the construction of cross-lingual similarity datasets, and provide an evaluation,
demonstrating its reliability and robustness. Based on our procedure and taking the RG-65 word similarity
dataset as a reference, we release two high-quality Spanish and Farsi (Persian) monolingual datasets, and fifteen
cross-lingual datasets for six languages: English, Spanish, French, German, Portuguese, and Farsi.

On metric embedding for boosting semantic similarity computations


Julien Subercaze, Christophe Gravier, and Frédérique Laforest 17:15–17:30
Computing pairwise word semantic similarity is widely used and serves as a building block in many tasks in
NLP. In this paper, we explore the embedding of the shortest-path metrics from a knowledge base (WordNet)
into the Hamming hypercube, in order to enhance the computation performance. We show that, although an
isometric embedding is intractable, it is possible to achieve good non-isometric embeddings. We report a
speedup of three orders of magnitude for the task of computing Leacock and Chodorow (LCH) similarities while
keeping strong correlations (r = .819; ρ = .826).
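
To see where the speedup comes from, here is a minimal sketch with random stand-in bit codes (constructing
codes whose Hamming distances actually approximate the WordNet metric is the hard part the paper addresses):
once each word is a short bit string, a similarity comparison reduces to an XOR plus a popcount.

    import random

    random.seed(0)
    BITS = 128
    codes = {w: random.getrandbits(BITS) for w in ["cat", "dog", "car"]}

    def hamming(a, b):
        return bin(a ^ b).count("1")   # XOR + popcount: cheap integer ops

    def similarity(w1, w2):
        return 1.0 - hamming(codes[w1], codes[w2]) / BITS

    print(similarity("cat", "dog"), similarity("cat", "car"))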

Improving Distributed Representation of Word Sense via WordNet Gloss Composition and
Context Clustering
Tao Chen, Ruifeng Xu, Yulan He, and Xuan Wang 17:30–17:45
In recent years, there has been an increasing interest in learning a distributed representation of word sense.
Traditional context clustering based models usually require careful tuning of model parameters, and typically
perform worse on infrequent word senses. This paper presents a novel approach which addresses these limitations
by first initializing the word sense embeddings through learning sentence-level embeddings from WordNet
glosses using a convolutional neural network. The initialized word sense embeddings are used by a context
clustering based model to generate the distributed representations of word senses. Our learned representations
outperform the publicly available embeddings on 2 out of 4 metrics in the word similarity task, and 6 out of 13
subtasks in the analogical reasoning task.

A Multitask Objective to Inject Lexical Contrast into Distributional Semantics


Nghia The Pham, Angeliki Lazaridou, and Marco Baroni 17:45–18:00
Distributional semantic models have trouble distinguishing strongly contrasting words (such as antonyms) from
highly compatible ones (such as synonyms), because both kinds tend to occur in similar contexts in corpora. We
introduce the multitask Lexical Contrast Model (mLCM), an extension of the effective Skip-gram method that
optimizes semantic vectors on the joint tasks of predicting corpus contexts and making the representations of
WordNet synonyms closer than that of matching WordNet antonyms. mLCM outperforms Skip-gram both on
general semantic tasks and on synonym/antonym discrimination, even when no direct lexical contrast infor-
mation about the test words is provided during training. mLCM also shows promising results on the task of
learning a compositional negation operator mapping adjectives to their antonyms.
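
An illustrative sketch of the lexical-contrast half of the multitask objective only (margin, learning rate, and
vectors are invented; in the real model this gradient is added to Skip-gram's usual context-prediction updates):

    import numpy as np

    rng = np.random.default_rng(3)
    vec = {w: rng.normal(size=10) for w in ["hot", "warm", "cold"]}

    def contrast_step(w, syn, ant, margin=0.4, lr=0.1):
        """Hinge loss asking sim(w, syn) to exceed sim(w, ant) by a margin."""
        loss = margin - vec[w] @ vec[syn] + vec[w] @ vec[ant]
        if loss > 0:   # violated: move w toward the synonym, away from the antonym
            vec[w] += lr * (vec[syn] - vec[ant])
        return max(loss, 0.0)

    for _ in range(20):
        contrast_step("hot", syn="warm", ant="cold")
    print(vec["hot"] @ vec["warm"] > vec["hot"] @ vec["cold"])   # True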


Session 4B: Sentiment Analysis


309B Chair: Fei Liu
Semi-Stacking for Semi-supervised Sentiment Classification
Shoushan Li, Lei Huang, Jingjing Wang, and Guodong Zhou 17:00–17:15
In this paper, we address semi-supervised sentiment learning via semi-stacking, which integrates two or more
semi-supervised learning algorithms from an ensemble learning perspective. Specifically, we apply meta-learning
to predict the unlabeled data given the outputs from the member algorithms, and propose N-fold cross
validation to guarantee a suitable size of the data for training the meta-classifier. Evaluation on four domains
shows that such a semi-stacking strategy performs consistently better than its member algorithms.

Deep Markov Neural Network for Sequential Data Classification


Min Yang 17:15–17:30
We present a general framework for incorporating sequential data and arbitrary features into language model-
ing. The general framework consists of two parts: a hidden Markov component and a recursive neural network
component. We demonstrate the effectiveness of our model by applying it to a specific application: predicting
topics and sentiments in dialogues. Experiments on real data demonstrate that our method is substantially more
accurate than previous methods.

Semantic Analysis and Helpfulness Prediction of Text for Online Product Reviews
Yinfei Yang, Yaowei Yan, Minghui Qiu, and Forrest Bao 17:30–17:45
Predicting the helpfulness of product reviews is a key component of many e-commerce tasks such as review
ranking and recommendation. However, previous work mixed review helpfulness prediction with those outer-layer
tasks and, by using non-text features, produced less transferable models. This paper solves the problem from
a new angle by hypothesizing that helpfulness is an internal property of text. Purely using review text, we
isolate review helpfulness prediction from its outer layer tasks, employ two interpretable semantic features, and
use human scoring of helpfulness as ground truth. Experimental results show that the two semantic features
can accurately predict helpfulness scores and greatly improve the performance compared with using features
previously used. Cross-category tests further show that the models trained with semantic features generalize
more easily to reviews of different product categories. The models we built are also highly interpretable and
align well with human annotations.

Document Classification by Inversion of Distributed Language Representations


Matt Taddy 17:45–18:00
There have been many recent advances in the structure and measurement of distributed language models: those
that map from words to a vector-space that is rich in information about word choice and composition. This
vector-space is the distributed language representation. The goal of this note is to point out that any distributed
representation can be turned into a classifier through inversion via Bayes' rule. The approach is simple and
modular, in that it will work with any language representation whose training can be formulated as optimizing
a probability model. In our application to 2 million sentences from Yelp reviews, we also find that it performs
as well as or better than complex purpose-built algorithms.
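
The inversion recipe itself is short enough to sketch; here simple unigram language models stand in for the
paper's inverted distributed representations, but the modular argmax over log P(doc | class) + log P(class) is
the same:

    import math
    from collections import Counter

    train = {
        "pos": ["great food great service", "loved the friendly staff"],
        "neg": ["terrible food", "rude staff terrible service"],
    }

    models, priors, vocab = {}, {}, set()
    for c, docs in train.items():
        models[c] = Counter(w for d in docs for w in d.split())
        priors[c] = len(docs) / sum(len(ds) for ds in train.values())
        vocab |= set(models[c])

    def log_lik(doc, c):
        counts, total = models[c], sum(models[c].values())
        # add-one smoothing so unseen words do not zero out the likelihood
        return sum(math.log((counts[w] + 1) / (total + len(vocab) + 1))
                   for w in doc.split())

    def classify(doc):
        return max(train, key=lambda c: log_lik(doc, c) + math.log(priors[c]))

    print(classify("the service was great"))   # -> 'pos'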


Session 4C: Summarization and Generation


310 Chair: Simone Teufel
Using Tweets to Help Sentence Compression for News Highlights Generation
Zhongyu Wei, Yang Liu, Chen Li, and Wei Gao 17:00–17:15
We explore using relevant tweets of a given news article to help sentence compression for generating com-
pressive news highlights. We extend an unsupervised dependency-tree based sentence compression approach
by incorporating tweet information to weight the tree edge in terms of informativeness and syntactic impor-
tance. The experimental results on a public corpus that contains both news articles and relevant tweets show
that our proposed tweets guided sentence compression method can improve the summarization performance
significantly compared to the baseline generic sentence compression method.

Domain-Specific Paraphrase Extraction


Ellie Pavlick, Juri Ganitkevitch, Tsz Ping Chan, Xuchen Yao, Benjamin Van Durme, and Chris
Callison-Burch 17:15–17:30
The validity of applying paraphrase rules depends on the domain of the text that they are being applied to. We
develop a novel method for extracting domain-specific paraphrases. We adapt the bilingual pivoting paraphrase
method to bias the training data to be more like our target domain of biology. Our best model results in higher
precision while retaining complete recall, giving a 10

Simplifying Lexical Simplification: Do We Need Simplified Corpora?


Goran Glavaš and Sanja Štajner 17:30–17:45
Simplification of lexically complex texts, by replacing complex words with their simpler synonyms, helps non-
native speakers, children, and language-impaired people understand text better. Recent lexical simplification
methods rely on manually simplified corpora, which are expensive and time-consuming to build. We present an
unsupervised approach to lexical simplification that makes use of the most recent word vector representations
and requires only regular corpora. Results of both automated and human evaluation show that our simple
method is as effective as systems that rely on simplified corpora.

Zoom: a corpus of natural language descriptions of map locations


Romina Altamirano, Thiago Ferreira, Ivandré Paraboni, and Luciana Benotti 17:45–18:00
This paper describes an experiment to elicit referring expressions from human subjects for research in natural
language generation and related fields, and preliminary results of a computational model for the generation of
these expressions. Unlike existing resources of this kind, the resulting data set - the Zoom corpus of natural
language descriptions of map locations - takes into account a domain that is significantly closer to real-world
applications than what has been considered in previous work, and addresses more complex situations of refer-
ence, including contexts with different levels of detail, and instances of singular and plural reference produced
by speakers of Spanish and Portuguese.


Session 4D: Discourse, Coreference


311A Chair: Vincent Ng
Generating overspecified referring expressions: the role of discrimination
Ivandré Paraboni, Michelle Galindo, and Douglas Iacovelli 17:00–17:15
We present an experiment to compare a standard, minimally distinguishing algorithm for the generation of
relational referring expressions with two alternatives that produce overspecified descriptions. The experiment
shows that discrimination - which normally plays a major role in the disambiguation task - is also a major
influence in referential overspecification, even though disambiguation is in principle not relevant.

Using prosodic annotations to improve coreference resolution of spoken text


Ina Roesiger and Arndt Riester 17:15–17:30
This paper is the first to examine the effect of prosodic features on coreference resolution in spoken discourse.
We test features from different prosodic levels and investigate which strategies can be applied. Our results on
the basis of manual prosodic labelling show that the presence of an accent is a helpful feature in a machine-
learning setting. Including prosodic boundaries and determining whether the accent is the nuclear accent further
increases results.
Spectral Semi-Supervised Discourse Relation Classification
Robert Fisher and Reid Simmons 17:30–17:45
Discourse parsing is the process of discovering the latent relational structure of a long-form piece of text, and it
remains a significant open challenge. One of the most difficult tasks in discourse parsing is the classification
of implicit discourse relations. Most state-of-the-art systems do not leverage the great volume of unlabeled
text available on the web; they rely instead on human-annotated training data. By incorporating a mixture
of labeled and unlabeled data, we are able to improve relation classification accuracy and reduce the need for
annotated data, while still retaining the capacity to use labeled data to ensure that specific desired relations are
learned. We achieve this using a latent variable model that is trained in a reduced dimensionality subspace
using spectral methods. Our approach achieves an F1 score of 0.485 on the implicit relation labeling task for
the Penn Discourse Treebank.
I do not disagree: leveraging monolingual alignment to detect disagreement in dialogue
Ajda Gokcen and Marie-Catherine De Marneffe 17:45–18:00
A wide array of natural dialogue discourse can be found on the internet. Previous attempts to automatically
determine disagreement between interlocutors in such dialogue have mostly relied on n-gram and grammatical
dependency features taken from respondent text. Agreement-disagreement classifiers built upon these baseline
features tend to do poorly, yet have proven difficult to improve upon. Using the Internet Argument Corpus,
which comprises quote and response post pairs taken from an online debate forum with human-annotated
agreement scoring, we introduce semantic environment features derived by comparing quote and response
sentences which align well. We show that this method improves classifier accuracy relative to the baseline
method namely in the retrieval of disagreeing pairs, which improves from 69


Session 4E: Language and Vision


311B Chair: Desmond Elliott
Language Models for Image Captioning: The Quirks and What Works
Jacob Devlin, Hao Cheng, Hao Fang, Saurabh Gupta, Li Deng, Xiaodong He, Geoffrey Zweig, and
Margaret Mitchell 17:00–17:15
Two recent approaches have achieved state-of-the-art results in image captioning. The first uses a pipelined
process where a set of candidate words is generated by a convolutional neural network (CNN) trained on images,
and then a maximum entropy (ME) language model is used to arrange these words into a coherent sentence.
The second uses the penultimate activation layer of the CNN as input to a recurrent neural network (RNN)
that then generates the caption sequence. In this paper, we compare the merits of these different language
modeling approaches for the first time by using the same state-of-the-art CNN as input. We examine issues
in the different approaches, including linguistic irregularities, caption repetition, and data set overlap. By
combining key aspects of the ME and RNN methods, we achieve a new record performance over previously
published results on the benchmark COCO dataset. However, the gains we see in BLEU do not translate to
human judgments.

A Distributed Representation Based Query Expansion Approach for Image Captioning


Semih Yagcioglu, Erkut Erdem, Aykut Erdem, and Ruket Cakici 17:15–17:30
In this paper, we propose a novel query expansion approach for improving transfer-based automatic image cap-
tioning. The core idea of our method is to translate the given visual query into a distributional semantics based
form, which is generated by the average of the sentence vectors extracted from the captions of images visu-
ally similar to the input image. Using three image captioning benchmark datasets, we show that our approach
provides more accurate results compared to the state-of-the-art data-driven methods in terms of both automatic
metrics and subjective evaluation.

Learning language through pictures


Grzegorz Chrupała, Ákos Kádár, and Afra Alishahi 17:30–17:45
We propose Imaginet, a model of learning visually grounded representations of language from coupled textual
and visual input. The model consists of two Gated Recurrent Unit networks with shared word embeddings,
and uses a multi-task objective by receiving a textual description of a scene and trying to concurrently predict
its visual representation and the next word in the sentence. Mimicking an important aspect of human lan-
guage learning, it acquires meaning representations for individual words from descriptions of visual scenes.
Moreover, it learns to effectively use sequential structure in semantic interpretation of multi-word phrases.

Exploiting Image Generality for Lexical Entailment Detection


Douwe Kiela, Laura Rimell, Ivan Vulic, and Stephen Clark 17:45–18:00
We exploit the visual properties of concepts for lexical entailment detection by examining a concept's generality.
We introduce three unsupervised methods for determining a concept's generality, based on its related images,
and obtain state-of-the-art performance on two standard semantic evaluation datasets. We also introduce
a novel task that combines hypernym detection and directionality, significantly outperforming a competitive
frequency-based baseline.


Poster session P1

Poster session P1.01

Time: 18:00–21:00 Location: Plenary Hall B


1-1 Encoding Distributional Semantics into Triple-Based Knowledge Ranking for Document
Enrichment
Muyu Zhang, Bing Qin, Mao Zheng, Graeme Hirst, and Ting Liu
Document enrichment focuses on retrieving relevant knowledge from external resources, which is essential
because text is generally replete with gaps. Since conventional work primarily relies on special resources, we
instead use triples of (Subject, Predicate, Object) as knowledge and incorporate distributional semantics to rank
them. Our model first extracts these triples automatically from raw text and converts them into real-valued
vectors based on the word semantics captured by Latent Dirichlet Allocation. We then represent these triples,
together with the source document that is to be enriched, as a graph of triples, and adopt a global iterative
algorithm to propagate relevance weight from source document to these triples so as to select the most relevant
ones. Evaluated as a ranking problem, our model significantly outperforms multiple strong baselines. More-
over, we conduct a task-based evaluation by incorporating these triples as additional features into document classification, which enhances the performance by 3.02%.

1-2 A Strategic Reasoning Model for Generating Alternative Answers


Jon Stevens, Anton Benz, Sebastian Reuße, and Ralf Klabunde
We characterize a class of indirect answers to yes/no questions, alternative answers, where information is
given that is not directly asked about, but which might nonetheless address the underlying motivation for the
question. We develop a model rooted in game theory that generates these answers via strategic reasoning
about possible unobserved domain-level user requirements. We implement the model within an interactive
question answering system simulating real estate dialogue. The system learns a prior probability distribution
over possible user requirements by analyzing training dialogues, which it uses to make strategic decisions about
answer selection. The system generates pragmatically natural and interpretable answers which make for more
efficient interactions compared to a baseline.

1-3 Modeling Argument Strength in Student Essays


Isaac Persing and Vincent Ng
While recent years have seen a surge of interest in automated essay grading, including work on grading essays
with respect to particular dimensions such as prompt adherence, coherence, and technical quality, there has been
relatively little work on grading the essay dimension of argument strength, which is arguably the most important
aspect of argumentative essays. We introduce a new corpus of argumentative student essays annotated with
argument strength scores and propose a supervised, feature-rich approach to automatically scoring the essays
along this dimension. Our approach significantly outperforms a baseline that relies solely on heuristically
applied sentence argument function labels by up to 16.1%.

Poster session P1.02

Time: 18:00–21:00 Location: Plenary Hall B


1-4 Summarization of Multi-Document Topic Hierarchies using Submodular Mixtures
Ramakrishna Bairi, Rishabh Iyer, Ganesh Ramakrishnan, and Jeff Bilmes
We study the problem of summarizing DAG-structured topic hierarchies over a given set of documents. Example applications include automatically generating Wikipedia disambiguation pages for a set of articles, and generating candidate multi-labels for preparing machine learning datasets (e.g., for text classification, functional genomics, and image classification). Unlike previous work, which focuses on clustering the set of
documents using the topic hierarchy as features, we directly pose the problem as a submodular optimization
problem on a topic hierarchy using the documents as features. Desirable properties of the chosen topics include document coverage, specificity, topic diversity, and topic homogeneity, each of which, we show, is naturally
modeled by a submodular function. Other information, provided say by unsupervised approaches such as LDA
and its variants, can also be utilized by defining a submodular function that expresses coherence between the
chosen topics and this information. We use a large-margin framework to learn convex mixtures over the set
of submodular components. We empirically evaluate our method on the problem of automatically generating Wikipedia disambiguation pages using human-generated clusterings as ground truth. We find that our framework improves upon several baselines according to a variety of standard evaluation metrics including the Jaccard Index, F1 score and NMI, and moreover, can be scaled to extremely large-scale problems.
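
To make the objective concrete (a generic sketch of the setup described above, not the authors' exact components), the selected set of topics S maximizes a weighted mixture

    F(S) = \sum_i w_i f_i(S), \qquad w_i \ge 0,

where each f_i is a submodular function scoring one desirable property (coverage, specificity, diversity, homogeneity). Since nonnegative mixtures preserve monotone submodularity, greedy selection carries the classical (1 - 1/e) approximation guarantee under a cardinality constraint.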

1-5 Learning to Explain Entity Relationships in Knowledge Graphs


Nikos Voskarides, Edgar Meij, Manos Tsagkias, Maarten De Rijke, and Wouter Weerkamp
We study the problem of explaining relationships between pairs of knowledge graph entities with human-
readable descriptions. Our method extracts and enriches sentences that refer to an entity pair from a corpus
and ranks the sentences according to how well they describe the relationship between the entities. We model
this task as a learning to rank problem for sentences and employ a rich set of features. When evaluated on a
large set of manually annotated sentences, we find that our method significantly improves over state-of-the-art
baseline models.

Poster session P1.03

Time: 18:00–21:00 Location: Plenary Hall B


1-6 [TACL] Exploiting Parallel News Streams for Unsupervised Event Extraction
Congle Zhang, Stephen Soderland, and Daniel S. Weld
Most approaches to relation extraction, the task of extracting ground facts from natural language text, are based
on machine learning and thus starved by scarce training data. Manual annotation is too expensive to scale to
a comprehensive set of relations. Distant supervision, which automatically creates training data, only works
with relations that already populate a knowledge base (KB). Unfortunately, KBs such as FreeBase rarely cover
event relations (e.g. "person travels to location"). Thus, the problem of extracting a wide range of events
(e.g., from news streams) is an important, open challenge. This paper introduces NewsSpike-RE, a novel,
unsupervised algorithm that discovers event relations and then learns to extract them. NewsSpike-RE uses a
novel probabilistic graphical model to cluster sentences describing similar events from parallel news streams.
These clusters then comprise training data for the extractor. Our evaluation shows that NewsSpike-RE gener-
ates high quality training sentences and learns extractors that perform much better than rival approaches, more
than doubling the area under a precision-recall curve compared to Universal Schemas.

1-7 Bring you to the past: Automatic Generation of Topically Relevant Event Chronicles
Tao Ge, Wenzhe Pei, Heng Ji, Sujian Li, Baobao Chang, and Zhifang Sui
An event chronicle provides people with easy and fast access to learn about the past. In this paper, we propose the first approach to automatically generate a topically relevant event chronicle during a certain period given a reference chronicle during another period. Our approach consists of two core components: a time-aware hierarchical Bayesian model for event detection, and a learning-to-rank model to select the salient events to
construct the final chronicle. Experimental results demonstrate our approach is promising to tackle this new
problem.

1-8 Context-aware Entity Morph Decoding


Boliang Zhang, Hongzhao Huang, Xiaoman Pan, Sujian Li, Chin-Yew Lin, Heng Ji, Kevin Knight,
Zhen Wen, Yizhou Sun, Jiawei Han, and Bulent Yener
People create morphs, a special type of fake alternative names, to achieve certain communication goals such
as expressing strong sentiment or evading censors. For example, Black Mamba, the name for a highly ven-
omous snake, is a morph Kobe Bryant created for himself due to his agility and aggressiveness in playing
basketball games. This paper presents the first end-to-end context-aware entity morph decoding system that
can automatically identify, disambiguate, and verify morph mentions based on specific contexts, and resolve them to target entities. Our approach is based on an absolute cold start: it does not require any candidate morph
or target entity lists as input, nor any manually constructed morph-target pairs for training. We design a semi-
supervised collective inference framework for morph mention extraction, and compare various deep learning based approaches for morph resolution. Our approach achieved significant improvement over the state-of-the-
art method (Huang et al., 2013), which used a large amount of training data.

1-9 Multi-Objective Optimization for the Joint Disambiguation of Nouns and Named Entities
Dirk Weissenborn, Leonhard Hennig, Feiyu Xu, and Hans Uszkoreit
In this paper, we present a novel approach to joint word sense disambiguation (WSD) and entity linking (EL) that
combines a set of complementary objectives in an extensible multi-objective formalism. During disambiguation
the system performs continuous optimization to find optimal probability distributions over candidate senses.
The performance of our system on nominal WSD as well as EL improves state-of-the-art results on several
corpora. These improvements demonstrate the importance of combining complementary objectives in a joint
model for robust disambiguation.

1-10 Building a Scientific Concept Hierarchy Database (SCHBase)


Eytan Adar and Srayan Datta
Extracted keyphrases can enhance numerous applications ranging from search to tracking the evolution of sci-
entific discourse. We present SCHBase, a hierarchical database of keyphrases extracted from large collections
of scientific literature. SCHBase relies on a tendency of scientists to generate new abbreviations that extend ex-
isting forms as a form of signaling novelty. We demonstrate how these keyphrases/concepts can be extracted,
and their viability as a database in relation to existing collections. We further show how keyphrases can be
placed into a semantically-meaningful phylogenetic structure and describe key features of this structure. The
complete SCHBase dataset is available at: http://cond.org/schbase.html.

1-11 Sentiment-Aspect Extraction based on Restricted Boltzmann Machines


Linlin Wang, Kang Liu, Zhu Cao, Jun Zhao, and Gerard De Melo
Aspect extraction and sentiment analysis of reviews are both important tasks in opinion mining. We propose a
novel sentiment and aspect extraction model based on Restricted Boltzmann Machines to jointly address these
two tasks in an unsupervised setting. This model reflects the generation process of reviews by introducing a
heterogeneous structure into the hidden layer and incorporating informative priors. Experiments show that our
model outperforms previous state-of-the-art methods.

1-12 Classifying Relations by Ranking with Convolutional Neural Networks


Cicero dos Santos, Bing Xiang, and Bowen Zhou
Relation classification is an important semantic processing task for which state-of-the-art systems still rely on costly handcrafted features. In this work we tackle the relation classification task using a convolutional neural network that performs classification by ranking (CR-CNN). We propose a new pairwise ranking loss function that makes it easy to reduce the impact of artificial classes. We perform experiments using the SemEval-2010 Task 8 dataset, which is designed for the task of classifying the relationship between two nominals marked in a sentence. Using CR-CNN, we outperform the state-of-the-art for this dataset and achieve an F1 of 84.1 without using any costly handcrafted features. Additionally, our experimental results show that: (1) our approach is more effective than a CNN followed by a softmax classifier; (2) omitting the representation of the artificial class Other improves both precision and recall; and (3) using only word embeddings as input features is enough to achieve state-of-the-art results if we consider only the text between the two target nominals.
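
For reference, a pairwise ranking loss of the kind described can be written as (our rendering; \gamma is a scaling factor and m^+, m^- are margins):

    L = \log\left(1 + e^{\gamma (m^+ - s_\theta(x)_{y^+})}\right) + \log\left(1 + e^{\gamma (m^- + s_\theta(x)_{c^-})}\right)

where s_\theta(x)_{y^+} is the score of the correct class and c^- is the highest-scoring incorrect class; giving an artificial class such as Other no representation simply removes it from both terms.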

1-13 Semantic Representations for Domain Adaptation: A Case Study on the Tree Kernel-based
Method for Relation Extraction
Thien Huu Nguyen, Barbara Plank, and Ralph Grishman
We study the application of word embeddings to generate semantic representations for the domain adaptation problem of relation extraction (RE) in the tree kernel-based method. We systematically evaluate various techniques to generate the semantic representations and demonstrate that they are effective to improve the generalization performance of a tree kernel-based relation extractor across domains (up to 7%).

1-14 Omnia Mutantur, Nihil Interit: Connecting Past with Present by Finding Corresponding
Terms across Time
Yating Zhang, Adam Jatowt, Sourav Bhowmick, and Katsumi Tanaka
In the current fast-paced world, people tend to possess limited knowledge about things from the past. For example, some young users may not know that the Walkman played a similar function to what the iPod does nowadays. In this paper, we approach the temporal correspondence problem in which, given an input term (e.g., iPod) and a target time (e.g., the 1980s), the task is to find the counterpart of the query that existed in the target time. We propose an approach that transforms word contexts across time based on their neural network representations. We then experimentally demonstrate the effectiveness of our method on the New York Times Annotated Corpus.

1-15 Negation and Speculation Identification in Chinese Language


Bowei Zou, Qiaoming Zhu, and Guodong Zhou
Identifying negative or speculative narrative fragments from fact is crucial for natural language processing (NLP) applications. Previous studies on negation and speculation identification in Chinese suffer from two problems: corpus scarcity and the bottleneck in fundamental Chinese information processing. To resolve these problems, this paper constructs a Chinese corpus which consists of three sub-corpora from different resources. In order to detect negative and speculative cues, a sequence labeling model is proposed. Moreover, a bilingual cue expansion method is proposed to increase the coverage in cue detection. In addition, this paper presents a new syntactic structure-based framework to identify the linguistic scope of a cue, instead of the traditional chunking-based framework. Experimental results justify the usefulness of our Chinese corpus and the appropriateness of our syntactic structure-based framework, which obtains significant improvement over the state-of-the-art on negation and speculation identification in Chinese.

1-16 Learning Relational Features with Backward Random Walks


Ni Lao, Einat Minkov, and William W. Cohen
The path ranking algorithm (PRA) has been recently proposed to address relational classification and retrieval
tasks at large scale. We describe Cor-PRA, an enhanced system that can model a larger space of relational rules,
including longer relational rules and a class of first order rules with constants, while maintaining scalability. We
describe and test faster algorithms for searching for these features. A key contribution is to leverage backward
random walks to efficiently discover these types of rules. An empirical study is conducted on the tasks of graph-
based knowledge base inference, and person named entity extraction from parsed text. Our results show that
learning paths with constants improves performance on both tasks, and that modeling longer paths dramatically
improves performance for the named entity extraction task.
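
In PRA-style models (a standard formulation, not specific to the extensions above), each relation path \pi is a feature whose value is the probability that a random walk from source s following \pi reaches target t, and candidates are scored linearly:

    \mathrm{score}(s, t) = \sum_{\pi \in \Pi} \theta_\pi \, P(s \to t \,;\, \pi)

Evaluating such walks backward from the target, as described above, is what makes the enlarged space of longer paths and paths with constants affordable.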

Poster session P1.04

Time: 18:00–21:00 Location: Plenary Hall B


1-17 Learning the Semantics of Manipulation Action
Yezhou Yang, Yiannis Aloimonos, Cornelia Fermuller, and Eren Erdal Aksoy
In this paper we present a formal computational framework for modeling manipulation actions. The introduced formalism leads to a semantics of manipulation action and has applications to both observing and understanding human manipulation actions as well as executing them with a robotic mechanism (e.g., a humanoid robot). It is based on a Combinatory Categorial Grammar. The goals of the introduced framework are to: (1) represent manipulation actions with both syntactic and semantic parts, where the semantic part employs λ-calculus; (2) enable a probabilistic semantic parsing schema to learn the λ-calculus representation of manipulation action from an annotated action corpus of videos; (3) use (1) and (2) to develop a system that visually observes manipulation actions and understands their meaning while it can reason beyond observations using propositional logic and axiom schemata. The experiments conducted on a publicly available large manipulation action dataset validate the theoretical framework and our implementation.
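
As a generic illustration of the kind of lexical entry involved (ours, not drawn from the paper's grammar), a transitive manipulation verb can pair a CCG category with a λ-calculus term:

    cut := (S\backslash NP)/NP : \lambda x. \lambda y.\, \mathrm{cut}(y, x)

so that parsing a description like "the hand cuts the bread" composes the logical form cut(hand, bread), which the reasoning component can then extend with axiom schemata.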

Poster session P1.05

Time: 18:00–21:00 Location: Plenary Hall B


1-18 Knowledge Graph Embedding via Dynamic Mapping Matrix
Guoliang Ji, Shizhu He, Liheng Xu, Kang Liu, and Jun Zhao
Knowledge graphs are useful resources for many AI applications, but they often suffer from incompleteness.
Previous work such as TransE, TransH and TransR/CTransR regards a relation as a translation from head entity to tail entity, with CTransR achieving state-of-the-art performance. In this paper, we propose a more fine-grained model named TransD, which is an improvement of CTransR. In TransD, we use two vectors to represent each named symbol object (entity or relation): the first represents the meaning of the entity (relation), and the other is used to construct a mapping matrix dynamically. Compared with CTransR, TransD considers the diversity not only of relations but also of entities. TransD has fewer parameters and no matrix-vector multiplication. In experiments, we evaluate our model on two typical tasks, triplet classification and link prediction. Evaluation results show that our model outperforms the other embedding models, including TransE, TransH and TransR/CTransR.
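
Concretely, the scheme described above can be rendered as follows (our notation; h, r, t are the meaning vectors, h_p, r_p, t_p the projection vectors, and equal entity and relation dimensions are assumed):

    M_{rh} = \mathbf{r}_p \mathbf{h}_p^\top + I, \qquad \mathbf{h}_\perp = M_{rh} \mathbf{h} = \mathbf{h} + (\mathbf{h}_p^\top \mathbf{h}) \, \mathbf{r}_p
    f_r(h, t) = -\lVert \mathbf{h}_\perp + \mathbf{r} - \mathbf{t}_\perp \rVert_2^2

and analogously for \mathbf{t}_\perp; the right-hand identity is what eliminates the explicit matrix-vector multiplication.
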
1-19 How Far are We from Fully Automatic High Quality Grammatical Error Correction?
Christopher Bryant and Hwee Tou Ng
In this paper, we first explore the role of inter-annotator agreement statistics in grammatical error correction
and conclude that they are less informative in fields where there may be more than one correct answer. We
next created a dataset of 50 student essays, each corrected by 10 different annotators for all error types, and
investigated how both human and GEC system scores vary when different combinations of these annotations
are used as the gold standard. Upon learning that even humans are unable to score higher than 75%.

Poster session P1.06

Time: 18:00–21:00 Location: Plenary Hall B


1-20 Knowledge Portability with Semantic Expansion of Ontology Labels
Mihael Arcan, Marco Turchi, and Paul Buitelaar
Our research focuses on the multilingual enhancement of ontologies that, often represented only in English, need to be translated into different languages to enable knowledge access across languages. Ontology translation is a rather different task than classic document translation, because ontologies contain highly specific vocabulary and they lack contextual information. For these reasons, to improve automatic ontology translations, we first focus on identifying relevant unambiguous and domain-specific sentences from a large set of generic parallel corpora. Then, we leverage Linked Open Data resources, such as DBpedia, to isolate ontology-specific
parallel corpora. Then, we leverage Linked Open Data resources, such as DBPedia, to isolate ontology-specific
bilingual lexical knowledge. In both cases, we take advantage of the semantic information of the labels to
select relevant bilingual data with the aim of building an ontology-specific statistical machine translation sys-
tem. We evaluate our approach to the translation of a medical ontology, translating from English into German.
Our experiment shows significant improvement of around 3 BLEU points compared to a generic as well as a
domain-specific translation approach.

1-21 Automatic disambiguation of English puns


Tristan Miller and Iryna Gurevych
Traditional approaches to word sense disambiguation (WSD) rest on the assumption that there exists a single, unambiguous communicative intention underlying every word in a document. However, writers sometimes intend for a word to be interpreted as simultaneously carrying multiple distinct meanings. This deliberate use of lexical ambiguity (i.e., punning) is a particularly common source of humour. In this paper we describe
how traditional, language-agnostic WSD approaches can be adapted to disambiguate puns, or rather to identify
their double meanings. We evaluate several such approaches on a manually sense-annotated corpus of English
puns and observe performance exceeding that of some knowledge-based and supervised baselines.

1-22 Unsupervised Cross-Domain Word Representation Learning


Danushka Bollegala, Takanori Maehara, and Ken-Ichi Kawarabayashi
The meaning of a word varies from one domain to another. Despite this important domain dependence in word semantics, existing word representation learning methods are bound to a single domain. Given a pair of source-target domains, we propose an unsupervised method for learning domain-specific word representations that accurately capture the domain-specific aspects of word semantics. First, we select a subset of frequent words that occur in both domains as pivots. Next, we optimize an objective function that enforces two constraints: (a) for both source and target domain documents, pivots that appear in a document must accurately predict the co-occurring non-pivots, and (b) word representations learnt for pivots must be similar in the two domains.
Moreover, we propose a method to perform domain adaptation using the learnt word representations. Our pro-
posed method significantly outperforms competitive baselines including the state-of-the-art domain-insensitive
word representations, and reports best sentiment classification accuracies for all domain-pairs in a benchmark
dataset.


1-23 A Unified Multilingual Semantic Representation of Concepts


José Camacho-Collados, Mohammad Taher Pilehvar, and Roberto Navigli
Semantic representation lies at the core of several applications in Natural Language Processing. However,
most existing semantic representation techniques cannot be used effectively for the representation of individual
word senses. We put forward a novel multilingual concept representation, called Muffin, which not only en-
ables accurate representation of word senses in different languages, but also provides multiple advantages over
existing approaches. Muffin represents a given concept in a unified semantic space irrespective of the language
of interest, enabling cross-lingual comparison of different concepts. We evaluate our approach in two dif-
ferent evaluation benchmarks, semantic similarity and Word Sense Disambiguation, reporting state-of-the-art
performance on several standard datasets.

Poster session P1.07

Time: 18:00–21:00 Location: Plenary Hall B


1-24 Demographic Factors Improve Classification Performance
Dirk Hovy
Extra-linguistic factors influence language use, and are accounted for by speakers and listeners. Most natu-
ral language processing (NLP) tasks to date, however, treat language as uniform. This assumption can harm
performance. We investigate the effect of including demographic information on performance in a variety of
text-classification tasks. We find that by including age or gender information, we consistently and significantly
improve performance over demographic-agnostic models. These results hold across three text-classification
tasks in five languages.

1-25 Vector-space calculation of semantic surprisal for predicting word pronunciation duration
Asad Sayeed, Stefan Fischer, and Vera Demberg
In order to build psycholinguistic models of processing difficulty and evaluate these models against human
data, we need highly accurate language models. Here we specifically consider surprisal, a word's predictability
in context. Existing approaches have mostly used n-gram models or more sophisticated syntax-based parsing
models; this largely does not account for effects specific to semantics. We build on the work by Mitchell et al. (2010) and show that the semantic prediction model suggested there can successfully predict spoken word durations in naturalistic conversational data. An interesting finding is that the training data for the semantic model also plays a strong role: the model trained on in-domain data, even though a better language model for our data, is not able to predict word durations, while the out-of-domain trained language model does predict word durations. We argue that this initially counter-intuitive result is due to the out-of-domain model better matching the language models of the speakers in our data.
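
For orientation, the surprisal of a word is its negative log-probability in context:

    S(w_i) = -\log_2 P(w_i \mid w_1, \ldots, w_{i-1})

A semantic surprisal in the spirit of this work replaces the conditional with a prediction of w_i composed, in vector space, from the preceding context.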

Poster session P1.08

Time: 18:00–21:00 Location: Plenary Hall B


1-26 Efficient Methods for Inferring Large Sparse Topic Hierarchies
Doug Downey, Chandra Bhagavatula, and Yi Yang
Latent variable topic models such as Latent Dirichlet Allocation (LDA) can discover topics from text in an unsupervised fashion. However, scaling the models up to the many distinct topics exhibited in modern corpora is challenging. Flat topic models like LDA have difficulty modeling sparsely expressed topics, and richer hierarchical models become computationally intractable as the number of topics increases. In this paper, we introduce efficient methods for inferring large topic hierarchies. Our approach is built upon the Sparse Backoff Tree (SBT), a new prior for latent topic distributions that organizes the latent topics as leaves in a tree. We show
how a document model based on SBTs can effectively infer accurate topic spaces of over a million topics.
We introduce a collapsed sampler for the model that exploits sparsity and the tree structure in order to make
inference efficient. In experiments with multiple data sets, we show that scaling to large topic spaces results in
much more accurate models, and that SBT document models make use of large topic spaces more effectively than flat LDA.


1-27 Trans-dimensional Random Fields for Language Modeling
Bin Wang, Zhijian Ou, and Zhiqiang Tan
Language modeling (LM) involves determining the joint probability of words in a sentence. The conditional approach is dominant, representing the joint probability in terms of conditionals. Examples include n-gram LMs and neural network LMs. An alternative approach, called the random field (RF) approach, is used in whole-sentence maximum entropy (WSME) LMs. Although the RF approach has potential benefits, the empirical results of previous WSME models are not satisfactory. In this paper, we revisit the RF approach for language modeling, with a number of innovations. We propose a trans-dimensional RF (TDRF) model and develop a training algorithm using joint stochastic approximation and trans-dimensional mixture sampling. We
velop a training algorithm using joint stochastic approximation and trans-dimensional mixture sampling. We
perform speech recognition experiments on Wall Street Journal data, and find that our TDRF models lead to
performances as good as the recurrent neural network LMs but are computationally more efficient in computing
sentence probability.

1-28 Gaussian LDA for Topic Models with Word Embeddings


Rajarshi Das, Manzil Zaheer, and Chris Dyer
Continuous space word embeddings learned from large, unstructured corpora have been shown to be effective
at capturing semantic regularities in language. In this paper we replace LDA's parameterization of topics as categorical distributions over opaque word types with multivariate Gaussian distributions on the embedding space. This encourages the model to group words that are a priori known to be semantically related into topics. To perform inference, we introduce a fast collapsed Gibbs sampling algorithm based on Cholesky decompositions of covariance matrices of the posterior predictive distributions. We further derive a scalable algorithm that draws samples from stale posterior predictive distributions and corrects them with a Metropolis-Hastings step. Using vectors learned from a domain-general corpus (English Wikipedia), we report results on two document collections (20-newsgroups and NIPS). Qualitatively, Gaussian LDA infers different but still very sensible topics relative to standard LDA. Quantitatively, our technique outperforms existing models at dealing with OOV words in held-out documents.
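
Schematically, the generative story described above is (our sketch; NIW denotes the conjugate normal-inverse-Wishart prior):

    (\mu_k, \Sigma_k) \sim \mathrm{NIW}, \qquad z_{d,n} \sim \mathrm{Cat}(\theta_d), \qquad \mathbf{v}_{d,n} \mid z_{d,n} = k \sim \mathcal{N}(\mu_k, \Sigma_k)

Collapsing (\mu_k, \Sigma_k) makes the posterior predictive for each word vector a multivariate Student-t distribution, whose parameters the Cholesky factors update incrementally.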

Poster session P1.09

Time: 18:00–21:00 Location: Plenary Hall B


1-29 Pairwise Neural Machine Translation Evaluation
Francisco Guzmán, Shafiq Joty, Lluís Màrquez, and Preslav Nakov
We present a novel framework for machine translation evaluation using neural networks in a pairwise setting,
where the goal is to select the better translation from a pair of hypotheses, given the reference translation. In
this framework, lexical, syntactic and semantic information from the reference and the two hypotheses is com-
pacted into relatively small distributed vector representations, and fed into a multi-layer neural network that
models the interaction between each of the hypotheses and the reference, as well as between the two hypothe-
ses. These compact representations are in turn based on word and sentence embeddings, which are learned
using neural networks. The framework is flexible, allows for efficient learning and classification, and yields
correlation with humans that rivals the state of the art.
1-30 String-to-Tree Multi Bottom-up Tree Transducers
Nina Seemann, Fabienne Braune, and Andreas Maletti
We achieve significant improvements in several syntax-based machine translation experiments using a string-to-tree variant of multi bottom-up tree transducers. Our new parameterized rule extraction algorithm extracts
string-to-tree rules that can be discontiguous and non-minimal in contrast to existing algorithms for the tree-to-
tree setting. The obtained models significantly outperform the string-to-tree component of the Moses frame-
work in a large-scale empirical evaluation on several known translation tasks. Our linguistic analysis reveals
the remarkable benefits of discontiguous and non-minimal rules.

1-31 Non-linear Learning for Statistical Machine Translation


Shujian Huang, Huadong Chen, Xinyu Dai, and Jiajun Chen

Modern statistical machine translation (SMT) systems usually use a linear combination of features to model the quality of each translation hypothesis. The linear combination assumes that all the features are in a linear relationship and constrains each feature to interact with the remaining features in a linear manner, which might limit the expressive power of the model and lead to an under-fitted model on the current data. In this paper, we propose non-linear modeling of the quality of translation hypotheses based on neural networks, which allows more complex interaction between features. A learning framework is presented for training the non-linear models. We also discuss possible heuristics in designing the network structure which may improve the non-linear learning performance. Experimental results show that with the basic features of a hierarchical phrase-based machine translation system, our method produces translations that are significantly better than a linear model.
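
The contrast can be stated compactly (a generic sketch; \mathbf{h}(e, f) collects the feature values of hypothesis e for source sentence f). The standard linear model scores

    \mathrm{score}(e \mid f) = \boldsymbol{\lambda}^\top \mathbf{h}(e, f)

whereas a one-hidden-layer non-linear model of the kind proposed scores

    \mathrm{score}(e \mid f) = \mathbf{w}^\top \sigma\!\left( W \mathbf{h}(e, f) + \mathbf{b} \right)

so that features interact through the hidden units rather than only additively.
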
1-32 Unifying Bayesian Inference and Vector Space Models for Improved Decipherment
Qing Dou, Ashish Vaswani, Kevin Knight, and Chris Dyer
We introduce into Bayesian decipherment a base distribution derived from similarities of word embeddings.
We use Dirichlet multinomial regression (Mimno and McCallum, 2012) to learn a mapping between ciphertext and plaintext word embeddings from non-parallel data. Experimental results show that the base distribution is highly beneficial to decipherment, improving state-of-the-art decipherment accuracy from 45.8%.

1-33 Non-projective Dependency-based Pre-Reordering with Recurrent Neural Network for


Machine Translation
Antonio Valerio Miceli Barone and Giuseppe Attardi
The quality of statistical machine translation performed with phrase-based approaches can be increased by permuting the words in the source sentences in an order which resembles that of the target language. We propose a class of recurrent neural models which exploit source-side dependency syntax features to reorder the words into a target-like order. We evaluate these models on the German-to-English and Italian-to-English language pairs, showing significant improvements over a phrase-based Moses baseline. We also compare with state-of-the-art German-to-English pre-reordering rules, showing that our method obtains similar or better results.

Poster session P1.10

Time: 18:00–21:00 Location: Plenary Hall B


1-34 Detecting Deceptive Groups Using Conversations and Network Analysis
Dian Yu, Yulia Tyshchuk, Heng Ji, and William Wallace
Deception detection has been formulated as a supervised binary classification problem on single documents.
However, in daily life, millions of fraud cases involve detailed conversations between deceivers and victims.
Deceivers may dynamically adjust their deceptive statements according to the reactions of victims. In addition,
people may form groups and collaborate to deceive others. In this paper, we seek to identify deceptive groups
from their conversations. We propose a novel subgroup detection method that combines linguistic signals and
signed network analysis for dynamic clustering. A social-elimination game called Killer Game is introduced
as a case study. Experimental results demonstrate that our approach significantly outperforms human vot-
ing and state-of-the-art subgroup detection methods at dynamically differentiating the deceptive groups from
truth-tellers.
1-35 WikiKreator: Improving Wikipedia Stubs Automatically
Siddhartha Banerjee and Prasenjit Mitra
Stubs on Wikipedia often lack comprehensive information. The huge cost of editing Wikipedia and the pres-
ence of only a limited number of active contributors curb the consistent growth of Wikipedia. In this work, we
present WikiKreator, a system that is capable of generating content automatically to improve existing stubs on
Wikipedia. The system has two components. First, a text classifier built using topic distribution vectors is used
to assign content from the web to various sections on a Wikipedia article. Second, we propose a novel abstrac-
tive summarization technique based on an optimization framework that generates section-specific summaries
for Wikipedia stubs. Experiments show that WikiKreator is capable of generating well-formed informative
content. Further, automatically generated content from our system has been appended to Wikipedia stubs and the content has been retained successfully, proving the effectiveness of our approach.


1-36 Language to Code: Learning Semantic Parsers for If-This-Then-That Recipes


Chris Quirk, Raymond Mooney, and Michel Galley
Using natural language to write programs is a touchstone problem for computational linguistics. We present
an approach that learns to map natural-language descriptions of simple "if-then" rules to executable code. By training and testing on a large corpus of naturally-occurring programs (called "recipes") and their natural language descriptions, we demonstrate the ability to effectively map language to code. We compare a number
of semantic parsing approaches on the highly noisy training data collected from ordinary users, and find that
loosely synchronous systems perform best.

1-37 Deep Questions without Deep Understanding


Igor Labutov, Sumit Basu, and Lucy Vanderwende
We develop an approach for generating deep (i.e., high-level) comprehension questions from novel text that
bypasses the myriad challenges of creating a full semantic representation. We do this by decomposing the
task into an ontology-crowd-relevance workflow, consisting of first representing the original text in a low-
dimensional ontology, then crowdsourcing candidate question templates aligned with that space, and finally
ranking potentially relevant templates for a novel region of text. If ontological labels are not available, we infer
them from the text. We demonstrate the effectiveness of this method on a corpus of articles from Wikipedia
alongside human judgments, and find that we can generate relevant deep questions with a precision of over 85%.

1-38 The NL2KR Platform for building Natural Language Translation Systems
Nguyen Vo, Arindam Mitra, and Chitta Baral
This paper presents the NL2KR platform to build systems that can translate text to different formal languages. It
is freely available, customizable, and comes with an Interactive GUI support that is useful in the development of
a translation system. Our key contribution is a user friendly system based on an interactive multistage learning
algorithm. This effective algorithm employs Inverse-Lambda, Generalization and user provided dictionary to
learn new meanings of words from sentences and their representations. Using the learned meanings, and the
Generalization approach, it is able to translate new sentences. NL2KR is evaluated on two standard corpora, Jobs and GeoQuery, and it exhibits state-of-the-art performance on both of them.

Poster session P1.11

Time: 18:00–21:00 Location: Plenary Hall B


1-39 Tweet Normalization with Syllables
Ke Xu, Yunqing Xia, and Chin-Hui Lee
In this paper, we propose a syllable-based method for tweet normalization to study the cognitive process of
non-standard word creation in social media. Assuming that syllable plays a fundamental role in forming the
non-standard tweet words, we choose syllable as the basic unit and extend the conventional noisy channel
model by incorporating the syllables to represent the word-to-word transitions at both word and syllable lev-
els. The syllables are used in our method not only to suggest more candidates, but also to measure similarity
between words. Novelty of this work is three-fold: First, to the best of our knowledge, this is an early attempt
to explore syllables in tweet normalization. Second, our proposed normalization method relies on unlabeled
samples, making it much easier to adapt our method to handle non-standard words in any period of history.
And third, we conduct a series of experiments and prove that the proposed method is advantageous over the
state-of-the-art solutions for tweet normalization.
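
In noisy-channel terms (our sketch of the extension described above, with \sigma denoting aligned syllables), the normalization \hat{s} of an observed token t maximizes

    \hat{s} = \arg\max_s P(s) \, P(t \mid s), \qquad P(t \mid s) \approx \prod_j P(\sigma_j^t \mid \sigma_j^s)

so that both candidate generation and similarity scoring operate over syllable sequences rather than over whole words or characters alone.
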
1-40 Improving Named Entity Recognition in Tweets via Detecting Non-Standard Words
Chen Li and Yang Liu
Most previous work on text normalization of informal text made a strong assumption that the system already knows which tokens are non-standard words (NSW) and thus need normalization. However, this is not realistic. In this paper, we propose a method for NSW detection. In addition to information based on the dictionary, e.g., whether a word is out-of-vocabulary (OOV), we leverage novel information derived from the normalization results for OOV words to help make decisions. Second, this paper investigates two methods using NSW detection results for named entity recognition (NER) in social media data. One adopts a pipeline strategy, and the
other uses a joint decoding fashion. We also create a new data set with newly added normalization annotation
beyond the existing named entity labels. This is the first data set with such annotation and we release it for research purposes. Our experimental results demonstrate the effectiveness of our NSW detection method and the benefit of NSW detection for NER. Our proposed methods perform better than the state-of-the-art NER system.

Poster session P1.12

Time: 18:00–21:00 Location: Plenary Hall B


1-41 Multiple Many-to-Many Sequence Alignment for Combining String-Valued Variables: A
G2P Experiment
Steffen Eger
We investigate multiple many-to-many alignments as a primary step in integrating supplemental information
strings in string transduction. Besides outlining DP-based solutions to the multiple alignment problem, we
detail an approximation of the problem in terms of multiple sequence segmentations satisfying a coupling con-
straint. We apply our approach to boosting baseline G2P systems using homogeneous as well as heterogeneous
sources of supplemental information.

Poster session P1.13

Time: 18:00–21:00 Location: Plenary Hall B


1-42 A Unified Kernel Approach for Learning Typed Sentence Rewritings
Martin Gleize and Brigitte Grau
Many high-level natural language processing problems can be framed as determining whether two given sentences are rewritings of each other. In this paper, we propose a class of kernel functions, referred to as type-enriched string rewriting kernels, which, used in kernel-based machine learning algorithms, make it possible to learn sentence rewritings.
Unlike previous work, this method can be fed external lexical semantic relations to capture a wider class of
rewriting rules. It also does not assume preliminary syntactic parsing but is still able to provide a unified frame-
work to capture syntactic structure and alignments between the two sentences. We experiment on three different
natural sentence rewriting tasks (paraphrase identification, textual entailment and answer sentence selection) and obtain state-of-the-art results for all of them.

Poster session P1.14

Time: 18:00–21:00 Location: Plenary Hall B


1-43 [TACL] From Visual Attributes to Adjectives through Decompositional Distributional Se-
mantics
Angeliki Lazaridou, Georgiana Dinu, Adam Liska, and Marco Baroni
As automated image analysis progresses, there is increasing interest in richer linguistic annotation of pictures,
with attributes of objects (e.g., furry, brown...) attracting most attention. By building on the recent "zero-shot
learning" approach, and paying attention to the linguistic nature of attributes as noun modifiers, and specif-
ically adjectives, we show that it is possible to tag images with attribute-denoting adjectives even when no
training data containing the relevant annotation are available. Our approach relies on two key observations.
First, objects can be seen as bundles of attributes, typically expressed as adjectival modifiers (a dog is some-
thing furry, brown, etc.), and thus a function trained to map visual representations of objects to nominal labels
can implicitly learn to map attributes to adjectives. Second, objects and attributes come together in pictures
(the same thing is a dog and it is brown). We can thus achieve better attribute (and object) label retrieval by
treating images as "visual phrases", and decomposing their linguistic representation into an attribute-denoting
adjective and an object-denoting noun. Our approach performs comparably to a method exploiting manual at-
tribute annotation, it outperforms various competitive alternatives in both attribute and object annotation, and it
automatically constructs attribute-centric representations that significantly improve performance in supervised
object recognition.


1-44 Perceptually Grounded Selectional Preferences


Ekaterina Shutova, Niket Tandon, and Gerard De Melo
Selectional preferences (SPs) are widely used in NLP as a rich source of semantic information. While SPs have
been traditionally induced from textual data, human lexical acquisition is known to rely on both linguistic and
perceptual experience. We present the first SP learning method that simultaneously draws knowledge from
text, images and videos, producing a perceptually grounded SP model. Our results show that it outperforms
linguistic and visual models in isolation, as well as the existing SP induction approaches.

1-45 Joint Case Argument Identification for Japanese Predicate Argument Structure Analysis
Hiroki Ouchi, Hiroyuki Shindo, Kevin Duh, and Yuji Matsumoto
Existing methods for Japanese predicate argument structure (PAS) analysis identify case arguments of each predicate without considering interactions between the target PAS and others in a sentence. However, the argument structures of the predicates in a sentence are semantically related to each other. This paper proposes new methods for Japanese PAS analysis to jointly identify case arguments of all predicates in a sentence by (1) modeling multiple PAS interactions with a bipartite graph and (2) approximately searching optimal PAS combinations. Per-
forming experiments on the NAIST Text Corpus, we demonstrate that our joint analysis methods substantially
outperform a strong baseline and are comparable to previous work.

1-46 Jointly optimizing word representations for lexical and sentential tasks with the C-
PHRASE model
Nghia The Pham, Germán Kruszewski, Angeliki Lazaridou, and Marco Baroni
We introduce C-PHRASE, a distributional semantic model that learns word representations by optimizing
context prediction for phrases at all levels in a syntactic tree, from single words to full sentences. C-PHRASE
outperforms the state-of-the-art C-BOW model on a variety of lexical tasks. Moreover, since C-PHRASE word
vectors are induced through a compositional learning objective modeling the contexts of words combined into
phrases, when they are summed, they produce sentence representations that rival those generated by ad-hoc
compositional models.

1-47 Robust Subgraph Generation Improves Abstract Meaning Representation Parsing


Keenon Werling, Gabor Angeli, and Christopher D. Manning
Abstract Meaning Representation (AMR) is a representation for open-domain rich semantics, with potential
use in fields like semantic parsing and machine translation. Node generation, typically done using a simple
dictionary lookup, is currently an important limiting factor in AMR parsing. We propose a small set of actions
that derive AMR sub-graphs by transformations on spans of text, which allows for more robust learning of this
stage. Our set of construction actions generalize better than the previous approach, and can be learned with a
simple classifier. We improve on the previous state-of-the-art result for AMR parsing, boosting end-to-end F1
from 59 to 62 on the LDC2013E117 and LDC2014T12 datasets.
1-48 Environment-Driven Lexicon Induction for High-Level Instructions
Dipendra Kumar Misra, Kejia Tao, Percy Liang, and Ashutosh Saxena
We focus on the task of interpreting complex natural language instructions to a robot, in which we must ground
high-level commands such as "microwave the cup" to low-level actions such as grasping. Previous approaches
that learn a lexicon during training have inadequate coverage at test time, and pure search strategies cannot
handle the exponential search space. We propose a new hybrid approach that leverages the environment to
induce new lexical entries at test time, even for new verbs. Our semantic parsing model jointly reasons about
the text, logical forms, and environment over multi-stage instruction sequences. We introduce a new dataset
and show that our approach is able to successfully ground new verbs such as "distribute", "mix", and "arrange" to complex
logical forms, each containing up to four predicates.

1-49 Structural Representations for Learning Relations between Pairs of Texts


Simone Filice, Giovanni Da San Martino, and Alessandro Moschitti
This paper studies the use of structural representations for learning relations between pairs of short texts (e.g., sentences or paragraphs) of the kind: the second text answers to, or conveys exactly the same information of,
or is implied by, the first text. Engineering effective features that can capture syntactic and semantic relations
between the constituents composing the target text pairs is rather complex. Thus, we define syntactic and
semantic structures representing the text pairs and then apply graph and tree kernels to them for automatically
engineering features in Support Vector Machines. We carry out an extensive comparative analysis of state-of-
the-art models for this type of relational learning. Our findings allow for achieving the highest accuracy in two different and important related tasks, i.e., Paraphrasing Identification and Textual Entailment Recognition.

Poster session P1.15

Time: 18:00–21:00 Location: Plenary Hall B


1-50 [TACL] Joint Modeling of Opinion Expression Extraction and Attribute Classification
Bishan Yang and Claire Cardie
In this paper, we study the problems of opinion expression extraction and expression-level polarity and inten-
sity classification. Traditional fine-grained opinion analysis systems address these problems in isolation and
thus cannot capture interactions among the textual spans of opinion expressions and their opinion-related prop-
erties. We present two types of joint approaches that can account for such interactions during 1) both learning
and inference or 2) only during inference. Extensive experiments on a standard dataset demonstrate that our
approaches provide substantial improvements over previously published results. By analyzing the results, we
gain some insight into the advantages of different joint models.

1-51 Learning Semantic Representations of Users and Products for Document Level Sentiment
Classification
Duyu Tang, Bing Qin, and Ting Liu
Neural network methods have achieved promising results for sentiment classification of text. However, these
models only use semantics of texts, while ignoring users who express the sentiment and products which are
evaluated, both of which have great influences on interpreting the sentiment of text. In this paper, we address
this issue by incorporating user- and product-level information into a neural network approach for document
level sentiment classification. Users and products are modeled using vector space models, the representations
of which capture important global clues such as individual preferences of users or overall qualities of products.
Such global evidence in turn facilitates embedding learning procedure at document level, yielding better text
representations. By combining evidence at user-, product- and document- level in a unified neural framework,
the proposed model achieves state-of-the-art performances on Amazon and Yelp datasets.

1-52 Towards Debugging Sentiment Lexicons


Andrew Schneider and Eduard Dragut
Central to many sentiment analysis tasks are sentiment lexicons (SLs). SLs exhibit polarity inconsistencies. Previous work studied the problem of checking the consistency of an SL for the case when the entries have categorical labels (positive, negative or neutral) and showed that it is NP-hard. In this paper, we address the more general problem, in which polarity tags take the form of a continuous distribution in the interval [0, 1]. We show that this problem is polynomial. We develop a general framework for addressing the consistency problem using linear programming (LP) theory. LP tools allow us to uncover inconsistencies efficiently, paving the way to building SL debugging tools. We show that previous work corresponds to 0-1 integer programming,
the way to building SL debugging tools. We show that previous work corresponds to 0-1 integer programming,
a particular case of LP. Our experimental studies show a strong correlation between polarity consistency in SLs
and the accuracy of sentiment tagging in practice.

1-53 Sparse, Contextually Informed Models for Irony Detection: Exploiting User Communities,
Entities and Sentiment
Byron C. Wallace, Do Kook Choe, and Eugene Charniak
Automatically detecting verbal irony (roughly, sarcasm) in online content is important for many practical applications (e.g., sentiment detection), but it is difficult. Previous approaches have relied predominantly on signal gleaned from word counts and grammatical cues. But such approaches fail to exploit the context in which comments are embedded. We thus propose a novel strategy for verbal irony classification that exploits contextual features, specifically by combining noun phrases and sentiment extracted from comments with the forum type (e.g., conservative or liberal) to which they were posted. We show that this approach improves verbal irony classification performance. Furthermore, because this method generates a very large feature space and we expect predictive contextual features to be strong but few, we propose a mixed regularization strategy that places a sparsity-inducing l1 penalty on the contextual feature weights on top of the l2 penalty applied to all model coefficients. This increases model sparsity and reduces the variance of model performance.


1-54 Sentence-level Emotion Classification with Label and Context Dependence


Shoushan Li, Lei Huang, Rong Wang, and Guodong Zhou
Predicting emotion categories, such as anger, joy, and anxiety, expressed by a sentence is challenging due
to its inherent multi-label classification difficulty and data sparseness. In this paper, we address the above two
challenges by incorporating the label dependence among the emotion labels and the context dependence among
the contextual instances into a factor graph model. Specifically, we recast sentence-level emotion classification
as a factor graph inferring problem in which the label and context dependence are modeled as various factor
functions. Empirical evaluation demonstrates the great potential and effectiveness of our proposed approach to
sentence-level emotion classification.
1-55 Co-training for Semi-supervised Sentiment Classification Based on Dual-view Bags-of-
words Representation
Rui Xia, Cheng Wang, Xinyu Dai, and Tao Li
A review text is normally represented as a bag-of-words (BOW) in sentiment classification. Such a simplified
BOW model has fundamental deficiencies in modeling some complex linguistic phenomena such as negation.
In this work, we propose a dual-view co-training algorithm based on dual-view BOW representation for semi-
supervised sentiment classification. In dual-view BOW, we automatically construct antonymous reviews and
model a review text by a pair of bags-of-words with opposite views. We make use of the original and antony-
mous views in pairs, in the training, bootstrapping and testing process, all based on a joint observation of two
views. The experimental results demonstrate the advantages of our approach, in meeting the two co-training
requirements, addressing the negation problem, and enhancing the semi-supervised sentiment classification
efficiency.

1-56 Improving social relationships in face-to-face human-agent interactions: when the agent wants to know users' likes and dislikes
Caroline Langlet and Chloé Clavel
This paper tackles the issue of the detection of users' likes and dislikes in a human-agent interaction. We present a system handling the interaction issue by jointly processing the agent's and the user's utterances. It is designed as a rule-based and bottom-up process based on a symbolic representation of the structure of the sentence. This article also describes the annotation campaign carried out through Amazon Mechanical Turk for the creation of the evaluation data-set. Finally, we present all measures for rating agreement between our system and the human reference, and obtain agreement scores that correspond at least to substantial agreement.

1-57 Learning Word Representations from Scarce and Noisy Data with Embedding Subspaces
Ramón Astudillo, Silvio Amir, Wang Ling, Mario Silva, and Isabel Trancoso
We investigate a technique to adapt unsupervised word embeddings to specific applications, when only small
and noisy labeled datasets are available. Current methods use pre-trained embeddings to initialize model pa-
rameters, and then use the labeled data to tailor them for the intended task. However, this approach is prone
to overfitting when the training is performed with scarce and noisy data. To overcome this issue, we use the
supervised data to find an embedding subspace that fits the task complexity. All the word representations are
adapted through a projection into this task-specific subspace, even if they do not occur on the labeled dataset.
This approach was recently used in the SemEval 2015 Twitter sentiment analysis challenge, attaining state-of-
the-art results. Here we show results improving those of the challenge, as well as additional experiments in a
Twitter Part-Of-Speech tagging task.
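
A minimal sketch of the subspace idea (ours, with assumed shapes and a toy bag-of-words classifier; the authors' model differs in detail): the pre-trained embeddings stay fixed, while a low-dimensional projection learned from the small labeled set maps every word, including words never seen in the labeled data, into the task subspace.

    import numpy as np

    rng = np.random.default_rng(0)
    V, d, k, C = 10000, 300, 20, 3      # vocab size, embedding dim, subspace dim, classes (assumed)
    E = rng.normal(size=(V, d))         # pre-trained embeddings, kept fixed during adaptation
    S = 0.01 * rng.normal(size=(k, d))  # subspace projection, learned from the labeled data
    W = 0.01 * rng.normal(size=(C, k))  # classifier operating on the subspace only

    def predict(word_ids):
        """Project each word into the task subspace, average, and classify."""
        z = E[word_ids] @ S.T           # (n_words, k): task-adapted representations
        logits = z.mean(axis=0) @ W.T   # crude bag-of-words composition
        exp = np.exp(logits - logits.max())
        return exp / exp.sum()          # class probabilities

    print(predict(np.array([3, 1042, 77])))  # toy usage on three word ids

Training would update only S and W by gradient descent on the labeled examples, leaving E untouched.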

Poster session P1.16

Time: 18:00–21:00 Location: Plenary Hall B


1-58 Automatic Spontaneous Speech Grading: A Novel Feature Derivation Technique using the
Crowd
Vinay Shashidhar, Nishant Pandey, and Varun Aggarwal
In this paper, we address the problem of evaluating spontaneous speech using a combination of machine learn-
ing and crowdsourcing. Machine learning techniques inadequately solve the stated problem because automatic
speaker-independent speech transcription is inaccurate. The features derived from it are also inaccurate and so
is the machine learning model developed for speech evaluation. To address this, we post the task of speech transcription to a large community of online workers (the crowd). We also get spoken English grades from the crowd. We achieve 95%.
1-59 Driving ROVER with Segment-based ASR Quality Estimation
Shahab Jalalvand, Matteo Negri, Falavigna Daniele, and Marco Turchi
ROVER is a widely used method to combine the output of multiple automatic speech recognition (ASR) systems. Though effective, the basic approach and its variants suffer from potential drawbacks: (i) their results depend on the order in which the hypotheses are used to feed the combination process, (ii) when applied to combine long hypotheses, they disregard possible differences in transcription quality at the local level, (iii) they often rely on word confidence information. We address these issues by proposing a segment-based ROVER in which hypothesis
ranking is obtained from a confidence-independent ASR quality estimation method. Our results on English
data from the IWSLT2012 and IWSLT2013 evaluation campaigns significantly outperform standard ROVER
and approximate two strong oracles.

Poster session P1.17
Time: 18:00–21:00 Location: Plenary Hall B
1-60 A Hierarchical Neural Autoencoder for Paragraphs and Documents
Jiwei Li, Thang Luong, and Dan Jurafsky
Natural language generation of coherent long texts like paragraphs or longer documents is a challenging
problem for recurrent network models. In this paper, we explore an important step toward this generation task:
training an LSTM (Long Short-Term Memory) auto-encoder to preserve and reconstruct multi-sentence para-
graphs. We introduce an LSTM model that hierarchically builds an embedding for a paragraph from embeddings
for sentences and words, and then decodes this embedding to reconstruct the original paragraph. We evaluate
the reconstructed paragraph using standard metrics like ROUGE and Entity Grid, showing that LSTM models
are able to encode texts in a way that preserves syntactic, semantic, and discourse coherence. While only a
first step toward generating coherent text units from neural models, our work has the potential to significantly
impact natural language processing areas like generation and summarization. (Code for the three models
described in this paper can be found at www.stanford.edu/~jiweil/.)
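
As a rough sketch of the encoding half of such a hierarchical model (a minimal PyTorch rendering with made-up dimensions; the decoder and training loop are omitted, and the released code at the URL above is the authoritative implementation):

    import torch
    import torch.nn as nn

    class HierEncoder(nn.Module):
        def __init__(self, vocab, dim=128):
            super().__init__()
            self.emb = nn.Embedding(vocab, dim)
            self.word_lstm = nn.LSTM(dim, dim, batch_first=True)
            self.sent_lstm = nn.LSTM(dim, dim, batch_first=True)

        def forward(self, paragraph):            # paragraph: list of LongTensors
            sent_vecs = []
            for sent in paragraph:               # each sent: (1, n_words)
                _, (h, _) = self.word_lstm(self.emb(sent))
                sent_vecs.append(h[-1])          # final hidden state = sentence vector
            sents = torch.stack(sent_vecs, dim=1)   # (1, n_sents, dim)
            _, (h, _) = self.sent_lstm(sents)
            return h[-1]                         # (1, dim) paragraph embedding

    enc = HierEncoder(vocab=1000)
    para = [torch.randint(0, 1000, (1, 7)), torch.randint(0, 1000, (1, 5))]
    print(enc(para).shape)                       # torch.Size([1, 128])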

Poster session P1.18
Time: 18:00–21:00 Location: Plenary Hall B
1-61 [TACL] Domain Adaptation for Syntactic and Semantic Dependency Parsing Using Deep
Belief Networks
Haitong Yang, Tao Zhuang, and Chengqing Zong
In current systems for syntactic and semantic dependency parsing, people usually define very high-dimensional
features to achieve good performance. But these systems often suffer severe performance drops
on out-of-domain test data due to the diversity of features across domains. This paper focuses on how to
relieve this domain adaptation problem with the help of unlabeled target domain data. We propose a deep learning
method to adapt both syntactic and semantic parsers. With additional unlabeled target domain data, our method
can learn a latent feature representation (LFR) that is beneficial to both domains. Experiments on English data
in the CoNLL 2009 shared task show that our method largely reduces the performance drop on out-of-domain test
data. Moreover, we get a Macro F1 score that is 2.36 points higher than the best system in the CoNLL 2009
shared task in out-of-domain tests.
1-62 Joint Dependency Parsing and Multiword Expression Tokenization
Alexis Nasr, Carlos Ramisch, José Deulofeu, and André Valli
Complex conjunctions and determiners are often considered as pretokenized units in parsing. This is not
always realistic, since they can be ambiguous. We propose a model for joint dependency parsing and multi-
word expression identification, in which complex function words are represented as individual tokens linked
with morphological dependencies. Our graph-based parser includes standard second-order features and verbal
subcategorization features derived from a syntactic lexicon. We train it on a modified version of the French
Treebank enriched with morphological dependencies. It recognizes 81.79% of the complex function words.

1-63 End-to-end learning of semantic role labeling using recurrent neural networks
Jie Zhou and Wei Xu
Semantic role labeling (SRL) is one of the basic natural language processing (NLP) problems. To this date, the
most successful SRL systems have been built on top of some form of parsing results (Koomen et al., 2005; Palmer
et al., 2010), where pre-defined feature templates over the syntactic structure are used. Attempts to build
an end-to-end SRL learning system without using parsing have been less successful (Collobert et al., 2011). In this
work, we propose to use a deep bi-directional recurrent network as an end-to-end system for SRL. We take only
original text information as input features, without using any syntactic knowledge. The proposed algorithm
was evaluated on the CoNLL-2005 shared task on semantic role labeling and achieved an F1 score of 81.07. This
result outperforms the previous state-of-the-art system, which combines results from five different parse
trees produced by two different parsers. Our analysis shows that our model is better at handling longer sentences than
traditional models, and that the latent variables of our model implicitly capture the syntactic structure of a
sentence.
1-64 Feature Optimization for Constituent Parsing via Neural Networks
Zhiguo Wang, Haitao Mi, and Nianwen Xue
The performance of discriminative constituent parsing relies crucially on feature engineering, and effective
features usually have to be carefully selected through a painful manual process. In this paper, we propose a
method that automatically learns a set of optimal features. Specifically, we build a feedforward neural network
model, which takes as input a few primitive units (words, POS tags and certain contextual tokens) from the local
context, induces the feature representation in the hidden layer and makes parsing predictions in the output layer.
The network simultaneously learns the feature representation and the prediction model parameters using a back
propagation algorithm. By pre-training the model on a large amount of automatically parsed data, our model
achieves impressive improvements. Evaluated on the standard data sets, our final performance reaches 86.6% F1.

1-65 Identifying Cascading Errors using Constraints in Dependency Parsing
Dominick Ng and James R. Curran
Dependency parsers are usually evaluated on attachment accuracy. Whilst easily interpreted, the metric does
not illustrate the cascading impact of errors, where the parser chooses an incorrect arc, and is subsequently
forced to choose further incorrect arcs elsewhere in the parse. We apply arc-level constraints to MSTparser and
ZPar, enforcing the correct analysis of specific error classes, whilst otherwise continuing with decoding. We
investigate the direct and indirect impact of applying constraints to the parser. Erroneous NP and punctuation
attachments cause the most cascading errors, while incorrect PP and coordination attachments are frequent
but less influential. Punctuation is especially challenging, as it has long been ignored in parsing, and serves a
variety of disparate syntactic roles.

1-66 A Re-ranking Model for Dependency Parser with Recursive Convolutional Neural Net-
work
Chenxi Zhu, Xipeng Qiu, Xinchi Chen, and Xuanjing Huang
In this work, we address the problem of modeling all the nodes (words or phrases) in a dependency tree with
dense representations. We propose a recursive convolutional neural network (RCNN) architecture to capture
syntactic and compositional-semantic representations of phrases and words in a dependency tree. Different
from the original recursive neural network, we introduce convolution and pooling layers, which can model
a variety of compositions by the feature maps and choose the most informative compositions by the pooling
layers. Based on RCNN, we use a discriminative model to re-rank a k-best list of candidate dependency parsing
trees. The experiments show that RCNN is very effective to improve the state-of-the-art dependency parsing
on both English and Chinese datasets.

1-67 Transition-based Neural Constituent Parsing
Taro Watanabe and Eiichiro Sumita
Constituent parsing is typically modeled by a chart-based algorithm under probabilistic context-free grammars
or by a transition-based algorithm with rich features. Previous models rely heavily on richer syntactic informa-
tion through lexicalizing rules, splitting categories, or memorizing long histories. However, enriched models
incur numerous parameters and sparsity issues, and are insufficient for capturing various syntactic phenom-
ena. We propose a neural network structure that explicitly models the unbounded history of actions performed
on the stack and queue employed in transition-based parsing, in addition to the representations of partially
parsed tree structure. Our transition-based neural constituent parsing achieves performance comparable to the
state-of-the-art parsers, demonstrating an F1 score of 90.68%.

1-68 Feature Selection in Kernel Space: A Case Study on Dependency Parsing
Xian Qian and Yang Liu
Given a set of basic binary features, we propose a new L1 norm SVM based feature selection method that
explicitly selects the features in their polynomial or tree kernel spaces. The efficiency comes from the anti-
monotone property of the subgradients: the subgradient with respect to a combined feature can be bounded
by the subgradient with respect to each of its component features, and a feature can be pruned safely without
further consideration if its corresponding subgradient is not steep enough. We conduct experiments on the
English dependency parsing task. Benefiting from the rich features selected in the tree kernel space, our model
achieved the best reported unlabeled attachment score of 93.72 without using any additional resource.
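
To make the pruning rule concrete, here is a toy sketch of the anti-monotone filter described above (the subgradient values, the threshold, and the restriction to pairwise conjunctions are illustrative simplifications, not the paper's training loop):

    # Hedged sketch: a conjunction of features is explored only if every
    # component feature's subgradient magnitude clears the threshold, since
    # the conjunction's subgradient is bounded by those of its components.
    def candidate_conjunctions(base_feats, subgrad, threshold):
        keep = [f for f in base_feats if abs(subgrad[f]) >= threshold]
        # only pairs of surviving features can have a steep enough subgradient
        return [(a, b) for i, a in enumerate(keep) for b in keep[i + 1:]]

    subgrad = {"f1": 0.9, "f2": 0.05, "f3": 0.4}
    print(candidate_conjunctions(["f1", "f2", "f3"], subgrad, 0.3))
    # [('f1', 'f3')] -- every pair containing f2 is pruned safely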

1-69 Semantic Role Labeling Improves Incremental Parsing
Ioannis Konstas and Frank Keller
Incremental parsing is the task of assigning a syntactic structure to an input sentence as it unfolds word by
word. Incremental parsing is more difficult than full-sentence parsing, as incomplete input increases ambiguity.
Intuitively, an incremental parser that has access to semantic information should be able to reduce ambiguity by
ruling out semantically implausible analyses, even for incomplete input. In this paper, we test this hypothesis by
combining an incremental TAG parser with an incremental semantic role labeler in a discriminative framework.
We show a substantial improvement in parsing performance compared to the baseline parser, both in full-
sentence F-score and in incremental F-score.
1-70 Discontinuous incremental shift-reduce parsing
Wolfgang Maier
We present an extension to incremental shift-reduce parsing that handles discontinuous constituents, using a
linear classifier and beam search. We achieve very high parsing speeds (up to 640 sent./sec.) and accurate
results (up to 79.52 F1 on TiGer).

1-71 A Neural Probabilistic Structured-Prediction Model for Transition-Based Dependency Parsing
Hao Zhou, Yue Zhang, Shujian Huang, and Jiajun Chen
Neural probabilistic parsers are attractive for their capability of automatic feature combination and small data
sizes. A transition-based greedy neural parser has given better accuracies over its linear counterpart. We
propose a neural probabilistic structured-prediction model for transition-based dependency parsing, which in-
tegrates search and learning. Beam search is used for decoding, and contrastive learning is performed for
maximizing the sentence-level log-likelihood. In standard Penn Treebank experiments, the structured neural
parser achieves a 1.8% accuracy improvement upon a competitive greedy neural parser baseline, giving perfor-
mance comparable to the best linear parser.

1-72 Parsing Paraphrases with Joint Inference
Do Kook Choe and David McClosky
Treebanks are key resources for developing accurate statistical parsers. However, building treebanks is expen-
sive and time-consuming for humans. For domains requiring deep subject matter expertise such as law and
medicine, treebanking is even more difficult. To reduce annotation costs for these domains, we develop meth-
ods to improve cross-domain parsing inference using paraphrases. Paraphrases are easier to obtain than full
syntactic analyses as they do not require deep linguistic knowledge, only linguistic fluency. A sentence and
its paraphrase may have similar syntactic structures, allowing their parses to mutually inform each other. We
present several methods to incorporate paraphrase information by jointly parsing a sentence with its paraphrase.
These methods are applied to state-of-the-art constituency and dependency parsers and provide significant im-
provements across multiple domains.

1-73 Cross-lingual Dependency Parsing Based on Distributed Representations
Jiang Guo, Wanxiang Che, David Yarowsky, Haifeng Wang, and Ting Liu
This paper investigates the problem of cross-lingual dependency parsing, aiming at inducing dependency
parsers for low-resource languages while using only training data from a resource-rich language (e.g., English).
Existing approaches typically don't include lexical features, which are not transferable across languages.
In this paper, we bridge the lexical feature gap by using distributed feature representations and their compo-
sition. We provide two algorithms for inducing cross-lingual distributed representations of words, which map
vocabularies from two different languages into a common vector space. Consequently, both lexical features and
non-lexical features can be used in our model for cross-lingual transfer. Furthermore, our framework is able to
incorporate additional useful features such as cross-lingual word clusters. Our combined contributions achieve
an average relative error reduction of 10.9% in labeled attachment score as compared with the delexicalized
parser, trained on English universal treebank and transferred to three other languages. It also significantly
outperforms McDonald et al. (2013) augmented with projected cluster features on identical data.

System Demonstrations
Time: 18:00–21:00 Location: Plenary Hall B

1 A System Demonstration of a Framework for Computer Assisted Pronunciation Training
Renlong Ai and Feiyu Xu
2 IMI - A Multilingual Semantic Annotation Environment
Francis Bond, Luís Morgado da Costa, and Tuấn Anh Lê
3 In-tool Learning for Selective Manual Annotation in Large Corpora
Erik-Lân Do Dinh, Richard Eckart de Castilho, and Iryna Gurevych
4 KeLP: a Kernel-based Learning Platform for Natural Language Processing
Simone Filice, Giuseppe Castellucci, Danilo Croce, and Roberto Basili
5 Multi-modal Visualization and Search for Text and Prosody Annotations
Markus Gärtner, Katrin Schweitzer, Kerstin Eckart, and Jonas Kuhn
6 NEED4Tweet: A Twitterbot for Tweets Named Entity Extraction and Disambiguation
Mena Habib and Maurice van Keulen
7 Visual Error Analysis for Entity Linking
Benjamin Heinzerling and Michael Strube
8 A Web-based Collaborative Evaluation Tool for Automatically Learned Relation
Extraction Patterns
Leonhard Hennig, Hong Li, Sebastian Krause, Feiyu Xu, and Hans Uszkoreit
9 A Dual-Layer Semantic Role Labeling System
Lun-Wei Ku, Shafqat Mumtaz Virk, and Yann-Huei Lee
10 A System for Fine-grained Aspect-based Sentiment Analysis of Chinese
Janna Lipenkova
11 Plug Latent Structures and Play Coreference Resolution
Sebastian Martschat, Patrick Claus, and Michael Strube
12 SCHNÄPPER: A Web Toolkit for Exploratory Relation Extraction
Thilo Michael and Alan Akbik
13 OMWEdit - The Integrated Open Multilingual Wordnet Editing System
Luís Morgado da Costa and Francis Bond
14 SACRY: Syntax-based Automatic Crossword puzzle Resolution system
Alessandro Moschitti, Massimo Nicosia, and Gianni Barlacchi
15 LEXenstein: A Framework for Lexical Simplification
Gustavo Paetzold and Lucia Specia
16 Sharing annotations better: RESTful Open Annotation
Sampo Pyysalo, Jorge Campos, Juan Miguel Cejuela, Filip Ginter, Kai Hakala,
Chen Li, Pontus Stenetorp, and Lars Juhl Jensen
17 A Data Sharing and Annotation Service Infrastructure
Stelios Piperidis, Dimitrios Galanis, Juli Bakagianni, and Sokratis Sofianopoulos
18 JoBimViz: A Web-based Visualization for Graph-based Distributional Semantic
Models
Eugen Ruppert, Manuel Kaufmann, Martin Riedl, and Chris Biemann

19 End-to-end Argument Generation System in Debating
Misa Sato, Kohsuke Yanai, Toshinori Miyoshi, Toshihiko Yanase, Makoto Iwayama,
Qinghua Sun, and Yoshiki Niwa
20 Multi-level Translation Quality Prediction with QuEst++
Lucia Specia, Gustavo Paetzold, and Carolina Scarton
21 WA-Continuum: Visualising Word Alignments across Multiple Parallel Sentences
Simultaneously
David Steele and Lucia Specia
22 A Domain-independent Rule-based Framework for Event Extraction
Marco A. Valenzuela-Escárcega, Gus Hahn-Powell, Mihai Surdeanu, and
Thomas Hicks
23 Storybase: Towards Building a Knowledge Base for News Events
Zhaohui Wu, Chen Liang, and C. Lee Giles
24 WriteAhead: Mining Grammar Patterns in Corpora for Assisted Writing
Tzu-Hsi Yen, Jian-Cheng Wu, Jim Chang, Joanne Boisson, and Jason Chang
25 NiuParser: A Chinese Syntactic and Semantic Parsing Toolkit
Jingbo Zhu, Muhua Zhu, Qiang Wang, and Tong Xiao

4 Main Conference: Tuesday, July 28

Overview

7:30–18:00   Registration (1st and 3rd floors)
9:00–10:00   Keynote Address: "Can Natural Language Processing Become Natural Language Coaching?" - Marti A. Hearst (Plenary Hall B)
10:00–10:30  Coffee Break
10:30–12:00  Session 5
             Machine Translation (Plenary Hall B)
             Machine Learning and Topic Modeling (309A)
             Semantics, Linguistic and Psycholinguistic Aspects of CL (310)
             Parsing, Tagging (311A)
             Information Extraction (311B)
12:00–13:30  Lunch Break
13:30–14:45  Session 6
             Discourse, Pragmatics (309A)
             Machine Learning: Embeddings (309B)
             Semantics: Semantic Parsing (310)
             Sentiment Analysis: Learning (311A)
             Grammar Induction and Annotation (311B)
14:45–15:15  Coffee Break
15:15–16:30  Session 7
             Discourse, Coreference (309A)
             Topic Modeling (309B)
             Semantics: Semantic Parsing (310)
             Lexical Semantics (311A)
             Parsing (311B)
16:30–19:30  Poster and Dinner Session 2: Short Papers, Student Research Workshop Papers (Plenary Hall B)
16:30–17:55  Student Research Workshop: Oral Presentations (309A)
18:00–19:30  Student Research Workshop: Posters (Plenary Hall B)
19:45–22:00  Social Event (Plenary Hall B)


Keynote Address: Marti A. Hearst

Chair: Michael Strube

Can Natural Language Processing Become Natural Language Coaching?

Tuesday, July 28, 2015, 9:00–10:00am

Plenary Hall B

Abstract:
How we teach and learn is undergoing a revolution, due to changes in technology and connectiv-
ity. Education may be one of the best application areas for advanced NLP techniques, and NLP
researchers have much to contribute to this problem, especially in the areas of learning to write,
mastery learning, and peer learning. In this talk I consider what happens when we convert natural
language processors into natural language coaches.

Biography: Marti Hearst is a Professor at UC Berkeley in the School of Information and EECS. She
received her PhD in CS from UC Berkeley in 1994 and was a member of the research staff at Xerox
PARC from 1994-1997. Her research is in computational linguistics, search user interfaces, informa-
tion visualization, and improving learning at scale. Her NLP work includes automatic acquisition of
hypernym relations ("Hearst Patterns"), TextTiling discourse segmentation, abbreviation recognition,
and multiword semantic relations. She wrote the book "Search User Interfaces" (Cambridge) in 2009,
co-founded the ACM Conference on Learning at Scale in 2014, and was named an ACM Fellow in
2013. She has received four student-initiated Excellence in Teaching Awards, including in 2014 and
2015.


Session 5 Overview – Tuesday, July 28, 2015

Track A: Machine Translation (Plenary Hall B)
10:30 Lexicon Stratification for Translating Out-of-Vocabulary Words (Tsvetkov and Dyer)
10:45 Recurrent Neural Network based Rule Sequence Model for Statistical Machine Translation (Yu and Zhu)
11:00 Discriminative Preordering Meets Kendall's τ Maximization (Hoshino, Miyao, Sudoh, Hayashi, and Nagata)
11:15 Evaluating Machine Translation Systems with Second Language Proficiency Tests (Matsuzaki, Fujita, Todo, and Arai)
11:30 Representation Based Translation Evaluation Metrics (Chen and Guo)
11:45 Exploring the Planet of the APEs: a Comparative Study of State-of-the-art Methods for MT Automatic Post-Editing (Chatterjee, Weller, Negri, and Turchi)

Track B: Machine Learning and Topic Modeling (309A)
10:30 Efficient Learning for Undirected Topic Models (Gu and Li)
10:45 A Hassle-Free Unsupervised Domain Adaptation Method Using Instance Similarity Features (Yu and Jiang)
11:00 Dependency-based Convolutional Neural Networks for Sentence Embedding (Ma, Huang, Zhou, and Xiang)
11:15 Non-Linear Text Regression with a Deep Convolutional Neural Network (Bitvai and Cohn)
11:30 A Unified Learning Framework of Skip-Grams and Global Vectors (Suzuki and Nagata)
11:45 Pre-training of Hidden-Unit CRFs (Kim, Stratos, and Sarikaya)

Track C: Semantics, Linguistic and Psycholinguistic Aspects of CL (310)
10:30 Distributional Neural Networks for Automatic Resolution of Crossword Puzzles (Severyn, Nicosia, Barlacchi, and Moschitti)
10:45 Word Order Typology through Multilingual Word Alignment (Östling)
11:00 Measuring idiosyncratic interests in children with autism (Rouhizadeh, Prud'hommeaux, Van Santen, and Sproat)
11:15 Frame-Semantic Role Labeling with Heterogeneous Annotations (Kshirsagar, Thomson, Schneider, Carbonell, Smith, and Dyer)
11:30 Semantic Interpretation of Superlative Expressions via Structured Knowledge Bases (Zhang, Feng, Huang, Xu, Han, and Zhao)
11:45 Grounding Semantics in Olfactory Perception (Kiela, Bulat, and Clark)

Track D: Parsing, Tagging (311A)
10:30 Word-based Japanese typed dependency parsing with grammatical function analysis (Tanaka and Nagata)
10:45 KLcpos3 - a Language Similarity Measure for Delexicalized Parser Transfer (Rosa and Žabokrtský)
11:00 CCG Supertagging with a Recurrent Neural Network (Xu, Auli, and Clark)
11:15 An Efficient Dynamic Oracle for Unrestricted Non-Projective Parsing (Gómez-Rodríguez and Fernández-González)
11:30 Synthetic Word Parsing Improves Chinese Word Segmentation (Cheng, Duh, and Matsumoto)
11:45 If all you have is a bit of the Bible: Learning POS taggers for truly low-resource languages (Agić, Hovy, and Søgaard)

Track E: Information Extraction (311B)
10:30 Improving distant supervision using inference learning (Roller, Agirre, Soroa, and Stevenson)
10:45 A Lexicalized Tree Kernel for Open Information Extraction (Xu, Ringlstetter, Kim, Kondrak, Goebel, and Miyao)
11:00 A Dependency-Based Neural Network for Relation Classification (Liu, Wei, Li, Ji, Zhou, and Wang)
11:15 Embedding Methods for Fine Grained Entity Type Classification (Yogatama, Gillick, and Lazic)
11:30 Sieve-Based Entity Linking for the Biomedical Domain (D'Souza and Ng)
11:45 Open IE as an Intermediate Structure for Semantic Tasks (Stanovsky and Dagan)


Parallel Session 5

Session 5A: Machine Translation
Plenary Hall B (Chair: Min Zhang)
Lexicon Stratification for Translating Out-of-Vocabulary Words
Yulia Tsvetkov and Chris Dyer 10:30–10:45
A language lexicon can be divided into four main strata, depending on origin of words: core vocabulary words,
fully- and partially-assimilated foreign words, and unassimilated foreign words or transliterations. This paper
focuses on translation of fully- and partially-assimilated foreign words, called borrowed words. Borrowed
words, or loanwords, are content words found in all languages, occupying up to 70% of the vocabulary.

Recurrent Neural Network based Rule Sequence Model for Statistical Machine Translation
Heng Yu and Xuan Zhu 10:45–11:00
The inability to model long-distance dependencies has been handicapping SMT for years. Specifically, the context
independence assumption makes it hard to capture the dependency between translation rules. In this paper, we
introduce a novel recurrent neural network based rule sequence model to incorporate arbitrary long contextual
information during estimating probabilities of rule sequences. Moreover, our model frees the translation model
from keeping huge and redundant grammars, resulting in more efficient training and decoding. Experimental
results show that our method achieves a 0.9 point BLEU gain over the baseline, and a significant reduction in
rule table size for both phrase-based and hierarchical phrase-based systems.

Discriminative Preordering Meets Kendall's τ Maximization


Sho Hoshino, Yusuke Miyao, Katsuhito Sudoh, Katsuhiko Hayashi, and Masaaki Nagata 11:00–11:15
This paper explores a simple discriminative preordering model for statistical machine translation. Our model
traverses binary constituent trees, and classifies whether children of each node should be reordered. The model
itself is not extremely novel, but herein we introduce a new procedure to determine oracle labels so as to
maximize Kendall's τ. Experiments in Japanese-to-English translation revealed that our simple method is
comparable with, or superior to, state-of-the-art methods in translation accuracy.
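
For reference, the quantity the oracle labels are chosen to maximize, Kendall's τ over the n words of the reordered source sentence, can be written with C concordant and D discordant pairs as:

    \tau \;=\; \frac{C - D}{\binom{n}{2}} \;=\; 1 - \frac{2D}{\binom{n}{2}},

so maximizing τ is equivalent to minimizing the number of discordant word pairs in the reordering.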

Evaluating Machine Translation Systems with Second Language Proficiency Tests
Takuya Matsuzaki, Akira Fujita, Naoya Todo, and Noriko H. Arai 11:15–11:30
A lightweight, human-in-the-loop evaluation scheme for machine translation (MT) systems is proposed. It
extrinsically evaluates MT systems using human subjects' scores on second language ability test problems that
are machine-translated to the subjects' native language. A large-scale experiment involving 320 subjects revealed
that the context-unawareness of the current MT systems severely damages human performance when
solving the test problems, while one of the evaluated MT systems performed as well as a human translation
produced in a context-unaware condition. An analysis of the experimental results showed that the extrinsic
evaluation captured a different dimension of translation quality than that captured by manual and automatic
intrinsic evaluation.
Representation Based Translation Evaluation Metrics
Boxing Chen and Hongyu Guo 11:30–11:45
Precisely evaluating the quality of a translation against human references is a challenging task due to the flexible
word ordering of a sentence and the existence of a large number of synonyms for words. This paper proposes to
evaluate translations with distributed representations of words and sentences. We study several metrics based
on word and sentence representations and their combination. Experiments on the WMT metric task shows that
the metric based on the combined representations achieves the best performance, outperforming the state-of-
the-art translation metrics by a large margin. In particular, training the distributed representations only needs a
reasonable amount of monolingual, unlabeled data that is not necessarily drawn from the test domain.

Exploring the Planet of the APEs: a Comparative Study of State-of-the-art Methods for MT
Automatic Post-Editing
Rajen Chatterjee, Marion Weller, Matteo Negri, and Marco Turchi 11:45–12:00
Downstream processing of machine translation (MT) output promises to be a solution to improve translation
quality, especially when the MT system's internal decoding process is not accessible. Both rule-based and
statistical automatic post-editing (APE) methods have been proposed over the years, but with contrasting results.
A missing aspect in previous evaluations is the assessment of different methods: (i) under comparable conditions,
and (ii) on different language pairs featuring variable levels of MT quality. Focusing on statistical APE methods,
more portable across languages, we propose the first systematic analysis of two approaches. To understand
their potential, we compare them in the same conditions over six language pairs having English as source. Our
results evidence consistent improvements on all language pairs, a relation between the extent of the gain and
MT output quality, slight but statistically significant performance differences between the two methods, and
their possible complementarity.


Session 5B: Machine Learning and Topic Modeling
309A (Chair: Percy Liang)
Efficient Learning for Undirected Topic Models
Jiatao Gu and Victor O.K. Li 10:30–10:45
The Replicated Softmax model, a well-known undirected topic model, is powerful in extracting semantic
representations of documents. Traditional learning strategies such as Contrastive Divergence are very inefficient.
This paper provides a novel estimator to speed up the learning, based on Noise Contrastive Estimation and
extended for documents of variable lengths and weighted inputs. Experiments on two benchmarks show that the
new estimator achieves great learning efficiency and high accuracy on document retrieval and classification.
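
For reference, the generic noise-contrastive objective such an estimator builds on (Gutmann and Hyvärinen's formulation with k noise samples; the paper's extensions for variable document lengths and weighted inputs are not reproduced here):

    J(\theta) \;=\; \log \sigma\!\big(G_\theta(d)\big)
      \;+\; \sum_{i=1}^{k} \log\!\Big(1 - \sigma\!\big(G_\theta(\tilde d_i)\big)\Big),
    \qquad G_\theta(d) \;=\; s_\theta(d) - \log\!\big(k\,p_n(d)\big),

where d is a data document, the \tilde d_i are drawn from a noise distribution p_n, σ is the logistic sigmoid, and s_θ is the model's unnormalized score, so the intractable normalization term never has to be computed explicitly.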

A Hassle-Free Unsupervised Domain Adaptation Method Using Instance Similarity Features


Jianfei Yu and Jing Jiang 10:45–11:00
We present a simple yet effective unsupervised domain adaptation method that can be generally applied for
different NLP tasks. Our method uses unlabeled target domain instances to induce a set of instance similar-
ity features. These features are then combined with the original features to represent labeled source domain
instances. Using three NLP tasks, we show our method consistently outperforms a few baselines, including
SCL, an existing general unsupervised domain adaptation method widely used in NLP. More importantly, our
method is very easy to implement and incurs much less computational cost than SCL.
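
A minimal sketch of the feature-augmentation idea, assuming cosine similarity and randomly chosen target exemplars (both are stand-ins; the paper's similarity features may be computed differently):

    import numpy as np

    def cosine(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

    def augment(x, target_exemplars):
        """x: original feature vector of a labeled source instance;
        target_exemplars: rows drawn from the unlabeled target domain.
        Returns [original features; similarity features]."""
        sims = np.array([cosine(x, t) for t in target_exemplars])
        return np.concatenate([x, sims])

    rng = np.random.default_rng(1)
    exemplars = rng.normal(size=(5, 30))    # 5 unlabeled target instances
    x = rng.normal(size=30)
    print(augment(x, exemplars).shape)      # (35,)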

Dependency-based Convolutional Neural Networks for Sentence Embedding


Mingbo Ma, Liang Huang, Bowen Zhou, and Bing Xiang 11:00–11:15
In sentence modeling and classification, convolutional neural network approaches have recently achieved state-
of-the-art results, but all such efforts process word vectors sequentially and neglect long-distance dependencies.
To combine deep learning with linguistic structures, we propose a dependency-based convolution approach,
making use of tree-based n-grams rather than surface ones, thus utilizing nonlocal interactions between words.
Our model improves sequential baselines on all four sentiment and question classification tasks, and achieves
the highest published accuracy on TREC.

Non-Linear Text Regression with a Deep Convolutional Neural Network


Zsolt Bitvai and Trevor Cohn 11:15–11:30
Text regression has traditionally been tackled using linear models. Here we present a non-linear method based
on a deep convolutional neural network. We show that despite having millions of parameters, this model can
be trained on only a thousand documents, resulting in a 40% relative improvement over a linear baseline.

A Unified Learning Framework of Skip-Grams and Global Vectors


Jun Suzuki and Masaaki Nagata 11:30–11:45
Log-bilinear language models such as SkipGram and GloVe have been proven to capture high quality syntactic
and semantic relationships between words in a vector space. We revisit the relationship between SkipGram
and GloVe models from a machine learning viewpoint, and show that these two methods are easily merged
into a unified form. Then, by using the unified form, we extract the factors of the configurations that they use
differently. We also empirically investigate which factor is responsible for the performance difference often
observed in widely examined word similarity and analogy tasks.

Pre-training of Hidden-Unit CRFs


Young-Bum Kim, Karl Stratos, and Ruhi Sarikaya 11:45–12:00
In this paper, we apply the concept of pre-training to hidden-unit conditional random fields (HUCRFs) to enable
learning on unlabeled data. We present a simple yet effective pre-training technique that learns to associate
words with their clusters, which are obtained in an unsupervised manner. The learned parameters are then used
to initialize the supervised learning process. We also propose a word clustering technique based on canonical
correlation analysis (CCA) that is sensitive to multiple word senses, to further improve the accuracy within the
proposed framework. We report consistent gains over standard conditional random fields (CRFs) and HUCRFs
without pre-training in semantic tagging, named entity recognition (NER), and part-of-speech (POS) tagging tasks,
which could indicate the task-independent nature of the proposed technique.


Session 5C: Semantics, Linguistic and Psycholinguistic Aspects of CL
310 (Chair: Dipanjan Das)
Distributional Neural Networks for Automatic Resolution of Crossword Puzzles
Aliaksei Severyn, Massimo Nicosia, Gianni Barlacchi, and Alessandro Moschitti 10:30–10:45
Automatic resolution of Crossword Puzzles (CPs) heavily depends on the quality of the answer candidate lists
produced by a retrieval system for each clue of the puzzle grid. Previous work has shown that such lists can
be generated using Information Retrieval (IR) search algorithms applied to databases containing previously
solved CPs, and reranked with tree kernels (TKs) applied to a syntactic tree representation of the clues. In this
paper, we create a labelled dataset of 2 million clues on which we apply an innovative Distributional Neural
Network (DNN) for reranking clue pairs. Our DNN is computationally efficient and can thus take advantage of
such large datasets, showing a large improvement over the TK approach when the latter uses small training
data. In contrast, when data is scarce, TKs outperform DNNs.

Word Order Typology through Multilingual Word Alignment


Robert Östling 10:45–11:00
With massively parallel corpora of hundreds or thousands of translations of the same text, it is possible to auto-
matically perform typological studies of language structure using very large language samples. We investigate
the domain of word order using multilingual word alignment and high-precision annotation transfer in a corpus
with 1144 translations in 986 languages of the New Testament. Results are encouraging, with 86% accuracy.

Measuring idiosyncratic interests in children with autism


Masoud Rouhizadeh, Emily Prud'hommeaux, Jan Van Santen, and Richard Sproat 11:00–11:15
A defining symptom of autism spectrum disorder (ASD) is the presence of restricted and repetitive activities
and interests, which can surface in language as a perseverative focus on idiosyncratic topics. In this paper, we
use semantic similarity measures to identify such idiosyncratic topics in narratives produced by children with
and without ASD. We find that neurotypical children tend to use the same words and semantic concepts when
retelling the same narrative, while children with ASD, even when producing accurate retellings, use different
words and concepts relative not only to neurotypical children but also to other children with ASD. Our results
indicate that children with ASD not only stray from the target topic but do so in idiosyncratic ways according
to their own restricted interests.
Frame-Semantic Role Labeling with Heterogeneous Annotations
Meghana Kshirsagar, Sam Thomson, Nathan Schneider, Jaime Carbonell, Noah A. Smith, and Chris
Dyer 11:15–11:30
We consider the task of identifying and labeling the semantic arguments of a predicate that evokes a FrameNet
frame. This task is challenging because there are only a few thousand fully annotated sentences for supervised
training. Our approach augments an existing model with features derived from FrameNet and PropBank and
with partially annotated exemplars from FrameNet. We observe a 4% absolute increase in F1 versus the original model.

Semantic Interpretation of Superlative Expressions via Structured Knowledge Bases


Sheng Zhang, Yansong Feng, Songfang Huang, Kun Xu, Zhe Han, and Dongyan Zhao 11:30–11:45
This paper addresses a novel task of semantically analyzing the comparative constructions inherent in attribu-
tive superlative expressions against structured knowledge bases. We exploit Wikipedia and Freebase to collect
training data in an unsupervised manner, where a neural network model is learnt to select, from Freebase pred-
icates, the most appropriate comparison dimensions for a given superlative expression, and further determine
its ranking order heuristically. Experimental results show that it is possible to learn from coarsely obtained
training data to semantically characterize the comparative constructions involved in superlative expressions.

Grounding Semantics in Olfactory Perception


Douwe Kiela, Luana Bulat, and Stephen Clark 11:45–12:00
Multi-modal semantics has relied on feature norms or raw image data for perceptual input. In this paper we
examine grounding semantic representations in olfactory (smell) data, through the construction of a novel
bag-of-chemical-compounds model. We use standard evaluations for multi-modal semantics, including measuring
conceptual similarity and cross-modal zero-shot learning. To our knowledge, this is the first work to evaluate
semantic similarity on representations grounded in olfactory data.


Session 5D: Parsing, Tagging
311A (Chair: Jason Eisner)
Word-based Japanese typed dependency parsing with grammatical function analysis
Takaaki Tanaka and Masaaki Nagata 10:30–10:45
We present a novel scheme for word-based Japanese typed dependency parsing which integrates syntactic struc-
ture analysis and grammatical function analysis such as predicate-argument structure analysis. Compared to
bunsetsu-based dependency parsing, which is predominantly used in Japanese NLP, it provides a natural way
of extracting syntactic constituents, which is useful for downstream applications such as statistical machine
translation. It also makes it possible to jointly decide dependency and predicate-argument structure, which is
usually implemented as two separate steps. By using grammatical functions as dependency labels, we achieved
a better accuracy for assigning function labels than SynCha, while keeping the converted bunsetsu-based
dependency accuracy as high as that of CaboCha; SynCha and CaboCha are among the state-of-the-art
predicate-argument structure analyzers and dependency parsers, respectively.

KLcpos3 - a Language Similarity Measure for Delexicalized Parser Transfer


Rudolf Rosa and Zdeněk Žabokrtský 10:45–11:00
We present KLcpos3, a language similarity measure based on Kullback-Leibler divergence of coarse part-of-
speech tag trigram distributions in tagged corpora. It has been designed for multilingual delexicalized parsing,
both for source treebank selection in single-source parser transfer, and for source treebank weighting in multi-
source transfer. In the selection task, KLcpos3 identifies the best source treebank in 8 out of 18 cases. In the
weighting task, it brings a +4.5% improvement.
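
The measure as described lends itself to a few lines of code. A sketch, assuming add-epsilon smoothing for unseen trigrams (the paper's exact smoothing and normalization may differ):

    import math
    from collections import Counter

    def trigram_dist(tags):
        """Relative frequencies of coarse POS tag trigrams in a tag sequence."""
        grams = Counter(zip(tags, tags[1:], tags[2:]))
        total = sum(grams.values())
        return {g: c / total for g, c in grams.items()}

    def klcpos3(src_tags, tgt_tags, eps=1e-9):
        """KL divergence of the target trigram distribution from the source
        one; a lower value suggests a more suitable source treebank."""
        p_src = trigram_dist(src_tags)
        return sum(p * math.log(p / max(p_src.get(g, 0.0), eps))
                   for g, p in trigram_dist(tgt_tags).items())

    src = "DET NOUN VERB DET NOUN PUNCT".split()
    tgt = "DET NOUN VERB NOUN PUNCT".split()
    print(klcpos3(src, tgt))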

CCG Supertagging with a Recurrent Neural Network


Wenduan Xu, Michael Auli, and Stephen Clark 11:00–11:15
Recent work on supertagging using a feedforward neural network achieved significant improvements for CCG
supertagging and parsing (Lewis and Steedman, 2014). However, their architecture is limited to considering
local contexts and does not naturally model sequences of arbitrary length. In this paper, we show how directly
capturing sequence information using a recurrent neural network leads to further accuracy improvements for
both supertagging (up to 1.9%) and parsing (up to 1% F1).

An Efficient Dynamic Oracle for Unrestricted Non-Projective Parsing


Carlos Gómez-Rodríguez and Daniel Fernández-González 11:15–11:30
We define a dynamic oracle for the Covington non-projective dependency parser. This is not only the first
dynamic oracle that supports arbitrary non-projectivity, but also considerably more efficient (O(n)) than the only
existing oracle with restricted non-projectivity support. Experiments show that training with the dynamic oracle
significantly improves parsing accuracy over the static oracle baseline on a wide range of treebanks.

Synthetic Word Parsing Improves Chinese Word Segmentation


Fei Cheng, Kevin Duh, and Yuji Matsumoto 11:30–11:45
We present a novel solution to improve the performance of Chinese word segmentation (CWS) using a synthetic
word parser. The parser analyses the internal structure of words, and attempts to convert out-of-vocabulary
words (OOVs) into in-vocabulary fine-grained sub-words. We propose a pipeline CWS system that first predicts
this fine-grained segmentation, then chunks the output to reconstruct the original word segmentation standard.
We achieve competitive results on the PKU and MSR datasets, with substantial improvements in OOV recall.

If all you have is a bit of the Bible: Learning POS taggers for truly low-resource languages
Željko Agić, Dirk Hovy, and Anders Søgaard 11:45–12:00
We present a simple method for learning part-of-speech taggers for languages like Akawaio, Aukan, or Cakchiquel –
languages for which nothing but a translation of parts of the Bible exists. By aggregating over the tags from
a few annotated languages and spreading them via word-alignment on the verses, we learn POS taggers for
100 languages, using the languages to bootstrap each other. We evaluate our cross-lingual models on the 25
languages where test sets exist, as well as on another 10 for which we have tag dictionaries. Our approach
performs much better (by 20-30%).
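
A toy sketch of the projection-and-aggregation step (the voting scheme and data structures are assumptions for illustration, not the paper's exact procedure):

    from collections import Counter, defaultdict

    # Hedged sketch: tags predicted on several source languages are spread
    # to target tokens through word alignments, and each target token keeps
    # the majority vote.
    def project_tags(alignments):
        """alignments: list of (tgt_token_index, source_tag) pairs gathered
        from word-aligned verses across multiple tagged source languages."""
        votes = defaultdict(Counter)
        for tgt_i, tag in alignments:
            votes[tgt_i][tag] += 1
        return {i: c.most_common(1)[0][0] for i, c in votes.items()}

    aligned = [(0, "DET"), (0, "DET"), (1, "NOUN"), (1, "VERB"), (1, "NOUN")]
    print(project_tags(aligned))    # {0: 'DET', 1: 'NOUN'}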


Session 5E: Information Extraction
311B (Chair: Ming-Wei Chang)
Improving distant supervision using inference learning
Roland Roller, Eneko Agirre, Aitor Soroa, and Mark Stevenson 10:30–10:45
Distant supervision is a widely applied approach to automatic training of relation extraction systems and has
the advantage that it can generate large amounts of labelled data with minimal effort. However, this data
may contain errors and consequently systems trained using distant supervision tend not to perform as well
as those based on manually labelled data. This work proposes a novel method for detecting potential false
negative training examples using a knowledge inference method. Results show that our approach improves the
performance of relation extraction systems trained using distantly supervised data.

A Lexicalized Tree Kernel for Open Information Extraction


Ying Xu, Christoph Ringlstetter, Mi-Young Kim, Grzegorz Kondrak, Randy Goebel, and Yusuke Miyao 10:45–11:00
In contrast with traditional relation extraction, which only considers a fixed set of relations, Open Information
Extraction (Open IE) aims at extracting all types of relations from text. Because of data sparseness, Open IE
systems typically ignore lexical information, and instead employ parse trees and Part-of-Speech (POS) tags.
However, the same syntactic structure may correspond to different relations. In this paper, we propose to use
a lexicalized tree kernel based on the word embeddings created by a neural network model. We show that the
lexicalized tree kernel model surpasses the unlexicalized model. Experiments on three datasets indicate that
our Open IE system performs better on the task of relation extraction than the state-of-the-art Open IE systems
of Xu et al. (2013) and Mesquita et al. (2013).

A Dependency-Based Neural Network for Relation Classification


Yang Liu, Furu Wei, Sujian Li, Heng Ji, Ming Zhou, and Houfeng Wang 11:00–11:15
Previous research on relation classification has verified the effectiveness of using dependency shortest paths
or subtrees. In this paper, we further explore how to make full use of the combination of these two kinds of
dependency information. We first propose a new structure, termed augmented dependency path (ADP), which is
composed of the shortest dependency path between two entities and the subtrees attached to the shortest path.
To exploit the semantic representation behind the ADP structure, we develop dependency-based neural networks
(DepNN): a recursive neural network designed to model the subtrees, and a convolutional neural network to
capture the most important features on the shortest path. Experiments on the SemEval-2010 dataset show that
our proposed method achieves state-of-the-art results.
Embedding Methods for Fine Grained Entity Type Classification
Dani Yogatama, Daniel Gillick, and Nevena Lazic 11:15–11:30
We propose a new approach to the task of fine grained entity type classification based on label embeddings
that allows for information sharing among related labels. Specifically, we learn an embedding for each label
and each feature such that labels which frequently co-occur are close in the embedded space. We show that it
outperforms state-of-the-art methods on two fine grained entity-classification benchmarks and that the model
can exploit the finer-grained labels to improve classification of standard coarse types.
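
One simple way to realize a shared label/feature embedding space of this kind is a bilinear-style scorer; the parameterization below is an assumption for illustration, not necessarily the paper's exact model:

    import numpy as np

    # Hedged sketch: an entity mention is scored against each fine type by
    # a dot product between the sum of its feature embeddings and the label
    # embedding, so related labels end up close in the shared space.
    rng = np.random.default_rng(2)
    n_feats, n_types, dim = 500, 100, 50
    A = rng.normal(scale=0.1, size=(n_feats, dim))   # feature embeddings
    B = rng.normal(scale=0.1, size=(n_types, dim))   # label embeddings

    def type_scores(active_features):
        mention = A[active_features].sum(axis=0)     # embed the mention
        return B @ mention                           # one score per type

    scores = type_scores([3, 77, 410])
    print(scores.argsort()[-3:][::-1])               # top-3 predicted types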

Sieve-Based Entity Linking for the Biomedical Domain


Jennifer D'Souza and Vincent Ng 11:30–11:45
We examine a key task in biomedical text processing, normalization of disorder mentions. We present a multi-
pass sieve approach to this task, which has the advantage of simplicity and modularity. Our approach was eval-
uated on two datasets, one comprising clinical reports and the other comprising biomedical abstracts, achieving
state-of-the-art results.
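
The multi-pass sieve control flow is easy to make concrete. Below is a toy sketch; the individual sieves, the lexicon, and the example concept identifier are illustrative stand-ins, not the paper's actual components:

    # Hedged sketch: precise, high-confidence matchers run first, and a
    # mention falls through to the next pass only if the current one abstains.
    def exact_match(mention, lexicon):
        return lexicon.get(mention.lower())

    def abbreviation_match(mention, lexicon):
        # stand-in: expand a known abbreviation, then retry an exact match
        expansions = {"ra": "rheumatoid arthritis"}
        full = expansions.get(mention.lower())
        return lexicon.get(full) if full else None

    SIEVES = [exact_match, abbreviation_match]    # ordered: most precise first

    def link(mention, lexicon):
        for sieve in SIEVES:
            concept = sieve(mention, lexicon)
            if concept is not None:               # first confident answer wins
                return concept
        return None                               # unlinkable after all passes

    lex = {"rheumatoid arthritis": "C0003873"}    # toy lexicon entry
    print(link("RA", lex))                        # C0003873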
Open IE as an Intermediate Structure for Semantic Tasks
Gabriel Stanovsky and Ido Dagan 11:45–12:00
Semantic applications typically extract information from intermediate structures derived from sentences, such
as dependency parses or semantic role labeling. In this paper, we study Open Information Extraction's (Open IE)
output as an additional intermediate structure and find that for tasks such as text comprehension, word similarity
and word analogy it can be very effective. Specifically, for word analogy, Open IE-based embeddings surpass
the state of the art. We suggest that semantic applications will likely benefit from adding Open IE format to
their set of potential sentence-level structures.


Session 6 Overview – Tuesday, July 28, 2015

Track A: Discourse, Pragmatics (309A)
13:30 Machine Comprehension with Discourse Relations (Narasimhan and Barzilay)
13:55 Implicit Role Linking on Chinese Discourse: Exploiting Explicit Roles and Frame-to-Frame Relations (Li, Wu, Wang, and Chai)
14:20 Discourse-sensitive Automatic Identification of Generic Expressions (Friedrich and Pinkal)

Track B: Machine Learning: Embeddings (309B)
13:30 Model-based Word Embeddings from Decompositions of Count Matrices (Stratos, Collins, and Hsu)
13:55 Entity Hierarchy Embedding (Hu, Huang, Deng, Gao, and Xing)
14:20 Orthogonality of Syntax and Semantics within Distributional Spaces (Mitchell and Steedman)

Track C: Semantics: Semantic Parsing (310)
13:30 Scalable Semantic Parsing with Partial Ontologies (Choi, Kwiatkowski, and Zettlemoyer)
13:55 Semantic Parsing via Staged Query Graph Generation: Question Answering with Knowledge Base (Yih, Chang, He, and Gao)
14:20 Building a Semantic Parser Overnight (Wang, Berant, and Liang)

Track D: Sentiment Analysis: Learning (311A)
13:30 Predicting Polarities of Tweets by Composing Word Embeddings with Long Short-Term Memory (Wang, Liu, Sun, Wang, and Wang)
13:55 Topic Modeling based Sentiment Analysis on Social Media for Stock Market Prediction (Nguyen and Shirai)
14:20 Learning Tag Embeddings and Tag-specific Composition Functions in Recursive Neural Network (Qian, Tian, Huang, Liu, Zhu, and Zhu)

Track E: Grammar Induction and Annotation (311B)
13:30 A convex and feature-rich discriminative approach to dependency grammar induction (Grave and Elhadad)
13:55 Parse Imputation for Dependency Annotations (Mielens, Sun, and Baldridge)
14:20 Probing the Linguistic Strengths and Limitations of Unsupervised Grammar Induction (Bisk and Hockenmaier)


Parallel Session 6

Session 6A: Discourse, Pragmatics
309A (Chair: Anette Frank)
Machine Comprehension with Discourse Relations
Karthik Narasimhan and Regina Barzilay 13:30–13:55
This paper proposes a novel approach for incorporating discourse information into machine comprehension
applications. Traditionally, such information is computed using off-the-shelf discourse analyzers. This de-
sign provides limited opportunities for guiding the discourse parser based on the requirements of the target
task. In contrast, our model induces relations between sentences while optimizing a task-specific objective.
This approach enables the model to benefit from discourse information without relying on explicit annotations
of discourse structure during training. The model jointly identifies relevant sentences, establishes relations
between them and predicts an answer. We implement this idea in a discriminative framework with hidden vari-
ables that capture relevant sentences and relations unobserved during training. Our experiments demonstrate
that the discourse aware model outperforms state-of-the-art machine comprehension systems.

Implicit Role Linking on Chinese Discourse: Exploiting Explicit Roles and Frame-to-Frame
Relations
Ru Li, Juan Wu, Zhiqiang Wang, and Qinghua Chai 13:55–14:20
There is a growing interest in researching null instantiations, i.e., implicit semantic arguments. Many of these
implicit arguments can be linked to referents in context, and their discovery is of great benefit
to semantic processing. We address the issue of automatically identifying and resolving implicit arguments in
Chinese discourse. For their resolutions, we present an approach that combines the information about overtly
labeled arguments and frame-to-frame relations defined by FrameNet. Experimental results on our created
corpus demonstrate the effectiveness of our approach.

Discourse-sensitive Automatic Identification of Generic Expressions


Annemarie Friedrich and Manfred Pinkal 14:20–14:45
This paper describes a novel sequence labeling method for identifying generic expressions, which refer to
kinds or arbitrary members of a class, in discourse context. The automatic recognition of such expressions is
important for any natural language processing task that requires text understanding. Prior work has focused
on identifying generic noun phrases; we present a new corpus in which not only subjects but also clauses are
annotated for genericity according to an annotation scheme motivated by semantic theory. Our context-aware
approach for automatically identifying generic expressions uses conditional random fields and outperforms
previous work based on local decisions when evaluated on this corpus and on related data sets (ACE-2 and
ACE-2005).


Session 6B: Machine Learning: Embeddings
309B (Chair: Xiaodan Zhu)
Model-based Word Embeddings from Decompositions of Count Matrices
Karl Stratos, Michael Collins, and Daniel Hsu 13:30–13:55
This work develops a new statistical understanding of word embeddings induced from transformed count data.
Using the class of hidden Markov models (HMMs) underlying Brown clustering as a generative model, we
demonstrate how canonical correlation analysis (CCA) and certain count transformations permit efficient and
effective recovery of model parameters with lexical semantics. We further show in experiments that these
techniques empirically outperform existing spectral methods on word similarity and analogy tasks, and are also
competitive with other popular methods such as WORD2VEC and GLOVE.
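
A compact sketch of the transform-and-decompose recipe this line of work builds on (the exact transformation, scaling, and implementation details in the paper may differ; everything below is a toy stand-in):

    import numpy as np

    # Hedged sketch: apply a square-root transform to a word-context count
    # matrix, scale by marginal counts in CCA style, then take a truncated SVD.
    rng = np.random.default_rng(3)
    C = rng.poisson(1.0, size=(200, 200)).astype(float)   # toy count matrix

    Ct = np.sqrt(C)                                       # count transformation
    row = Ct.sum(axis=1, keepdims=True) + 1e-12
    col = Ct.sum(axis=0, keepdims=True) + 1e-12
    M = Ct / np.sqrt(row) / np.sqrt(col)                  # CCA-style scaling

    U, S, Vt = np.linalg.svd(M, full_matrices=False)
    embeddings = U[:, :50]                                # 50-dim word vectors
    print(embeddings.shape)                               # (200, 50)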

Entity Hierarchy Embedding


Zhiting Hu, Poyao Huang, Yuntian Deng, Yingkai Gao, and Eric Xing 13:55–14:20
Existing distributed representations are limited in utilizing structured knowledge to improve semantic related-
ness modeling. We propose a principled framework of embedding entities that integrates hierarchical infor-
mation from large-scale knowledge bases. The novel embedding model associates each category node of the
hierarchy with a distance metric. To capture structured semantics, the entity similarity in context prediction is
measured under the aggregated metrics of relevant categories along all inter-entity paths. We show that both
the entity vectors and category distance metrics encode meaningful semantics. Experiments in entity linking
and entity search show superiority of the proposed method.

Orthogonality of Syntax and Semantics within Distributional Spaces


Jeff Mitchell and Mark Steedman 14:20–14:45
A recent distributional approach to word-analogy problems (Mikolov et al., 2013) exploits interesting regularities
in the structure of the space of representations. Investigating further, we find that performance on this task can
be related to orthogonality within the space. Explicitly designing such structure into a neural network model
results in representations that decompose into orthogonal semantic and syntactic subspaces. We demonstrate
that learning from word-order and morphological structure within English Wikipedia text to enable this
decomposition can produce substantial improvements on semantic-similarity, POS-induction and word-analogy
tasks.


Session 6C: Semantics: Semantic Parsing
310 (Chair: Oscar Täckström)
Scalable Semantic Parsing with Partial Ontologies
Eunsol Choi, Tom Kwiatkowski, and Luke Zettlemoyer 13:30–13:55
We consider the problem of building scalable Freebase semantic parsers, and present a new approach for
learning to do partial analyses that ground as much of the input text as possible without requiring that all
content words be mapped to Freebase concepts. We study this problem on two newly introduced large-scale
noun phrase datasets, and introduce a new semantic parsing model and semi-supervised learning approach
for reasoning with partial ontological support. Experiments demonstrate strong performance on two tasks:
referring expression resolution and entity attribute extraction. In both cases, the partial analyses allow us
to improve precision over strong baselines, while parsing many phrases that would be ignored by existing
techniques.

Semantic Parsing via Staged Query Graph Generation: Question Answering with Knowledge
Base
Wen-tau Yih, Ming-Wei Chang, Xiaodong He, and Jianfeng Gao 13:55–14:20
We propose a novel semantic parsing framework for question answering using a knowledge base. We define
a query graph that resembles subgraphs of the knowledge base and can be directly mapped to a logical form.
Semantic parsing is reduced to query graph generation, formulated as a staged search problem. Unlike tradi-
tional approaches, our method leverages the knowledge base in an early stage to prune the search space and
thus simplifies the semantic matching problem. By applying an advanced entity linking system and a deep
convolutional neural network model that matches questions and predicate sequences, our system outperforms
previous methods substantially, and achieves an F1 measure of 52.5% on the WebQuestions dataset.

Building a Semantic Parser Overnight


Yushi Wang, Jonathan Berant, and Percy Liang 14:20–14:45
How do we build a semantic parser in a new domain starting with zero training examples? We introduce a new
methodology for this setting, which first uses a domain-general grammar and a domain-specific seed lexicon
to generate logical forms paired with canonical utterances that capture the meaning of the logical forms. By
construction, the grammar ensures complete coverage of the desired set of compositional operators. We then
use crowdsourcing to paraphrase these canonical utterances into natural utterances. The resulting data is used
to train the semantic parser. We further study the role of compositionality in the resulting paraphrases. Finally,
we test our methodology on seven domains and show that we can build an adequate semantic parser in just a
few hours.


Session 6D: Sentiment Analysis: Learning
311A (Chair: Lun-Wei Ku)
Predicting Polarities of Tweets by Composing Word Embeddings with Long Short-Term Mem-
ory
Xin Wang, Yuanchao Liu, Chengjie Sun, Baoxun Wang, and Xiaolong Wang 13:30–13:55
In this paper, we introduce the Long Short-Term Memory (LSTM) recurrent network for Twitter sentiment
prediction. With the help of gates and constant error carousels in the memory block structure, the model can handle
interactions between words through a flexible compositional function. Experiments on a public noisy labelled
data show that our model outperforms several feature-engineering approaches, with the result comparable to
the current best data-driven technique. According to the evaluation on a generated negation phrase test set,
the proposed architecture doubles the performance of non-neural model based on bag-of-word features. Fur-
thermore, words with special functions such as negation and transition are distinguished and the dissimilarities
of words with opposite sentiment are magnified. An insightful case study on negation expression processing
shows a promising potential of the architecture dealing with complex sentiment phrases.

Topic Modeling based Sentiment Analysis on Social Media for Stock Market Prediction
Thien Hai Nguyen and Kiyoaki Shirai 13:55–14:20
The goal of this research is to build a model to predict stock price movement using sentiments on social media.
A new feature which captures topics and their sentiments simultaneously is introduced in the prediction model.
In addition, a new topic model, TSLDA, is proposed to obtain this feature. Our method outperformed a model
using only historical prices by about 6.07% in accuracy.

Learning Tag Embeddings and Tag-specific Composition Functions in Recursive Neural Net-
work
Qiao Qian, Bo Tian, Minlie Huang, Yang Liu, Xuan Zhu, and Xiaoyan Zhu 14:2014:45
Recursive neural network is one of the most successful deep learning models for natural language processing
due to the compositional nature of text. The model recursively composes the vector of a parent phrase from
those of child words or phrases, with a key component named the composition function. Although a variety of composition functions have been proposed, syntactic information has not been fully encoded in the composition process. We propose two models: Tag Guided RNN (TG-RNN for short), which chooses a composition function according to the part-of-speech tag of a phrase, and Tag Embedded RNN/RNTN (TE-RNN/RNTN for short), which learns tag embeddings and then combines tag and word embeddings together. On fine-grained sentiment classification, experimental results show that the proposed models obtain remarkable improvements over baselines, that TE-RNTN obtains the second-best result among all the top-performing models, and that all the proposed models have far fewer parameters and lower complexity than their counterparts.
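
To make the tag-guided composition concrete, here is a hedged sketch (ours, with a hypothetical three-tag set) of the TG-RNN idea: the composition matrix is selected by the parent phrase's part-of-speech tag.

import numpy as np

DIM = 50
rng = np.random.default_rng(0)
# One composition matrix per tag; "NP", "VP", "ADJP" are illustrative only.
tag_matrices = {tag: rng.normal(scale=0.1, size=(DIM, 2 * DIM))
                for tag in ("NP", "VP", "ADJP")}

def compose(left, right, parent_tag):
    # Parent vector = tanh(W_tag [left; right]), as in a standard RNN,
    # except that W_tag depends on the parent phrase's tag.
    W = tag_matrices[parent_tag]
    return np.tanh(W @ np.concatenate([left, right]))

very, good = rng.normal(size=DIM), rng.normal(size=DIM)
phrase = compose(very, good, "ADJP")  # vector for the phrase "very good"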


Session 6E: Grammar Induction and Annotation


311B Chair: Fei Xia
A convex and feature-rich discriminative approach to dependency grammar induction
Edouard Grave and Noémie Elhadad 13:30–13:55
In this paper, we introduce a new method for the problem of unsupervised dependency parsing. Most cur-
rent approaches are based on generative models. Learning the parameters of such models relies on solving
a non-convex optimization problem, thus making them sensitive to initialization. We propose a new convex
formulation to the task of dependency grammar induction. Our approach is discriminative, allowing the use
of different kinds of features. We describe an efficient optimization algorithm to learn the parameters of our
model, based on the Frank-Wolfe algorithm. Our method can easily be generalized to other unsupervised learn-
ing problems. We evaluate our approach on ten languages belonging to four different families, showing that
our method is competitive with other state-of-the-art methods.

Parse Imputation for Dependency Annotations


Jason Mielens, Liang Sun, and Jason Baldridge 13:55–14:20
Syntactic annotation is a hard task, but it can be made easier by allowing annotators flexibility to leave many
aspects of a sentence underspecified. Unfortunately, partial annotations are not directly usable for training
parsers. We describe a method for imputing missing dependencies from sentences that have been partially
annotated using the Graph Fragment Language, such that a standard dependency parser can then be trained on
all annotations. We show that this strategy improves performance over not using partial annotations for English,
Chinese, Portuguese and Kinyarwanda, and that performance competitive with state-of-the-art unsupervised
and weakly-supervised parsers can be reached with just a few hours of annotation.

Probing the Linguistic Strengths and Limitations of Unsupervised Grammar Induction


Yonatan Bisk and Julia Hockenmaier 14:20–14:45
Work in grammar induction should help shed light on the amount of syntactic structure that is discoverable
from raw word or tag sequences. But since most current grammar induction algorithms produce unlabeled
dependencies, it is difficult to analyze what types of constructions these algorithms can or cannot capture,
and, therefore, to identify where additional supervision may be necessary. This paper provides an in-depth
analysis of the errors made by unsupervised CCG parsers by evaluating them against the labeled dependencies
in CCGbank, hinting at new research directions necessary for progress in grammar induction.


Session 7 Overview – Tuesday, July 28, 2015

Track A: Discourse, Coreference (309A)
  15:15  Entity-Centric Coreference Resolution with Model Stacking (Clark and Manning)
  15:40  Learning Anaphoricity and Antecedent Ranking Features for Coreference Resolution (Wiseman, Rush, Shieber, and Weston)
  16:05  Transferring Coreference Resolvers with Posterior Regularization (Martins)

Track B: Topic Modeling (309B)
  15:15  Tea Party in the House: A Hierarchical Ideal Point Topic Model and Its Application to Republican Legislators in the 112th Congress (Nguyen, Boyd-Graber, Resnik, and Miler)
  15:40  KB-LDA: Jointly Learning a Knowledge Base of Hierarchy, Relations, and Facts (Movshovitz-Attias and Cohen)
  16:05  A Computationally Efficient Algorithm for Learning Topical Collocation Models (Zhao, Du, Börschinger, Pate, Ciaramita, Steedman, and Johnson)

Track C: Semantics: Semantic Parsing (310)
  15:15  [TACL] Efficient Inference and Structured Learning for Semantic Role Labeling (Täckström, Ganchev, and Das)
  15:40  Compositional Semantic Parsing on Semi-Structured Tables (Pasupat and Liang)
  16:05  Graph parsing with s-graph grammars (Groschwitz, Koller, and Teichmann)

Track D: Lexical Semantics (311A)
  15:15  Sparse Overcomplete Word Vector Representations (Faruqui, Tsvetkov, Yogatama, Dyer, and Smith)
  15:40  Learning Semantic Word Embeddings based on Ordinal Knowledge Constraints (Liu, Jiang, Wei, Ling, and Hu)
  16:05  Adding Semantics to Data-Driven Paraphrasing (Pavlick, Bos, Nissim, Beller, Van Durme, and Callison-Burch)

Track E: Parsing (311B)
  15:15  Parsing as Reduction (Fernández-González and Martins)
  15:40  Optimal Shift-Reduce Constituent Parsing with Structured Perceptron (Thang, Noji, and Miyao)
  16:05  A Data-Driven, Factorization Parser for CCG Dependency Structures (Du, Sun, and Wan)

107
Session 7

Parallel Session 7

Session 7A: Discourse, Coreference


309A Chair: Greg Durrett
Entity-Centric Coreference Resolution with Model Stacking
Kevin Clark and Christopher D. Manning 15:15–15:40
Mention pair models that predict whether or not two mentions are coreferent have historically been very ef-
fective for coreference resolution, but do not make use of entity-level information. However, we show that the
scores produced by such models can be aggregated to define powerful entity-level features between clusters of
mentions. Using these features, we train an entity-centric coreference system that learns an effective policy for
building up coreference chains incrementally. The mention pair scores are also used to prune the search space
the system works in, allowing for efficient training with an exact loss function. We evaluate our system on the
English portion of the 2012 CoNLL Shared Task dataset and show that it improves over the current state of the
art.
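
One way to picture the aggregation step (our sketch; the feature set is assumed, not taken from the paper) is to reduce all mention-pair scores between two clusters to a few entity-level statistics:

from itertools import product

def cluster_pair_features(cluster_a, cluster_b, pair_score):
    # pair_score(m1, m2) -> model's probability that the two mentions corefer.
    scores = [pair_score(m1, m2) for m1, m2 in product(cluster_a, cluster_b)]
    return {
        "max": max(scores),                # best-linked mention pair
        "min": min(scores),                # worst-linked mention pair
        "avg": sum(scores) / len(scores),  # overall cluster compatibility
    }

A cluster-merging policy can then be trained on such features rather than on raw mention pairs.
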
Learning Anaphoricity and Antecedent Ranking Features for Coreference Resolution
Sam Wiseman, Alexander M. Rush, Stuart Shieber, and Jason Weston 15:40–16:05
We introduce a simple, non-linear mention-ranking model for coreference resolution that attempts to learn
distinct feature representations for anaphoricity detection and antecedent ranking, which we encourage by pre-
training on a pair of corresponding subtasks. Although we use only simple, unconjoined features, the model is
able to learn useful representations, and we report the best overall score on the CoNLL 2012 English test set to
date.
Transferring Coreference Resolvers with Posterior Regularization
André F. T. Martins 16:05–16:30
We propose a cross-lingual framework for learning coreference resolvers for resource-poor target languages,
given a resolver in a source language. Our method uses word-aligned bitext to project information from the
source to the target. To handle task-specific costs, we propose a softmax-margin variant of posterior regulariza-
tion, and we use it to achieve robustness to projection errors. We show empirically that this strategy outperforms
competitive cross-lingual methods, such as delexicalized transfer with bilingual word embeddings, bitext direct
projection, and vanilla posterior regularization.


Session 7B: Topic Modeling


309B Chair: Jun Xu
Tea Party in the House: A Hierarchical Ideal Point Topic Model and Its Application to Repub-
lican Legislators in the 112th Congress
Viet-An Nguyen, Jordan Boyd-Graber, Philip Resnik, and Kristina Miler 15:15–15:40
We introduce the Hierarchical Ideal Point Topic Model, which provides a rich picture of policy issues, fram-
ing, and voting behavior using a joint model of votes, bill text, and the language that legislators use when
debating bills. We use this model to look at the relationship between Tea Party Republicans and establishment
Republicans in the U.S. House of Representatives during the 112th Congress.

KB-LDA: Jointly Learning a Knowledge Base of Hierarchy, Relations, and Facts


Dana Movshovitz-Attias and William W. Cohen 15:40–16:05
Many existing knowledge bases (KBs), including Freebase, Yago, and NELL, rely on a fixed ontology, given
as an input to the system, which defines the data to be cataloged in the KB, i.e., a hierarchy of categories
and relations between them. The system then extracts facts that match the predefined ontology. We propose
an unsupervised model that jointly learns a latent ontological structure of an input corpus, and identifies facts
from the corpus that match the learned structure. Our approach combines mixed membership stochastic block
models and topic models to infer a structure by jointly modeling text, a latent concept hierarchy, and latent
semantic relationships among the entities mentioned in the text. As a case study, we apply the model to a
corpus of Web documents from the software domain, and evaluate the accuracy of the various components of
the learned ontology.

A Computationally Efficient Algorithm for Learning Topical Collocation Models


Zhendong Zhao, Lan Du, Benjamin Börschinger, John K Pate, Massimiliano Ciaramita, Mark Steedman, and Mark Johnson 16:05–16:30
Most existing topic models make the bag-of-words assumption that words are generated independently, and so
ignore potentially useful information about word order. Previous attempts to use collocations (short sequences of adjacent words) in topic models have either relied on a pipeline approach, restricted attention to bigrams, or resulted in models whose inference does not scale to large corpora. This paper studies how to simultaneously learn both collocations and their topic assignments. We present an efficient reformulation of the Adaptor Grammar-based topical collocation model (AG-colloc; Johnson, 2010), and develop a point-wise sampling algorithm for posterior inference in this new formulation. We further improve the efficiency of the sampling
algorithm by exploiting sparsity and parallelising inference. Experimental results derived in text classification,
information retrieval and human evaluation tasks across a range of datasets show that this reformulation scales
to hundreds of thousands of documents while maintaining the good performance of the AG-colloc model.


Session 7C: Semantics: Semantic Parsing


310 Chair: Luke Zettlemoyer
[TACL] Efficient Inference and Structured Learning for Semantic Role Labeling
Oscar Täckström, Kuzman Ganchev, and Dipanjan Das 15:15–15:40
We present a dynamic programming algorithm for efficient constrained inference in semantic role labeling. The
algorithm tractably captures a majority of the structural constraints examined by prior work in this area, which
has resorted to either approximate methods or off-the-shelf integer linear programming solvers. In addition,
it allows training a globally-normalized log-linear model with respect to constrained conditional likelihood.
We show that the dynamic program is several times faster than an off-the-shelf integer linear programming
solver, while reaching the same solution. Furthermore, we show that our structured model results in significant
improvements over its local counterpart, achieving state-of-the-art results on both PropBank- and FrameNet-
annotated corpora.

Compositional Semantic Parsing on Semi-Structured Tables


Panupong Pasupat and Percy Liang 15:40–16:05
Two important aspects of semantic parsing for question answering are the breadth of the knowledge source and
the depth of logical compositionality. While existing work trades off one aspect for another, this paper simul-
taneously makes progress on both fronts through a new task: answering complex questions on semi-structured
tables using question-answer pairs as supervision. The central challenge arises from two compounding fac-
tors: the broader domain results in an open-ended set of relations, and the deeper compositionality results in
a combinatorial explosion in the space of logical forms. We propose a logical-form driven parsing algorithm
guided by strong typing constraints and show that it obtains significant improvements over natural baselines.
For evaluation, we created a new dataset of 22,033 complex questions on Wikipedia tables, which is made
publicly available.

Graph parsing with s-graph grammars


Jonas Groschwitz, Alexander Koller, and Christoph Teichmann 16:05–16:30
A key problem in semantic parsing with graph-based semantic representations is graph parsing, i.e. computing
all possible analyses of a given graph according to a grammar. This problem arises in training synchronous
string-to-graph grammars, and when generating strings from them. We present two algorithms for graph parsing (bottom-up and top-down) with s-graph grammars. On the related problem of graph parsing with hyperedge
replacement grammars, our implementations outperform the best previous system by several orders of magni-
tude.


Session 7D: Lexical Semantics


311A Chair: German Rigau
Sparse Overcomplete Word Vector Representations
Manaal Faruqui, Yulia Tsvetkov, Dani Yogatama, Chris Dyer, and Noah A. Smith 15:15–15:40
Current distributed representations of words show little resemblance to theories of lexical semantics. The former are dense and uninterpretable; the latter are largely based on familiar, discrete classes (e.g., supersenses) and relations (e.g., synonymy and hypernymy). We propose methods that transform word vectors into sparse and
optionally binary vectors. The resulting representations are more similar to the interpretable features typically
used in NLP, though they are discovered automatically from raw corpora. Because the vectors are highly sparse,
they are computationally easy to work with. Most importantly, we find that they outperform the original vectors
on benchmark tasks.
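
A rough analogue of this transformation (not the authors' optimizer) can be obtained with off-the-shelf L1-regularized dictionary learning; component counts and penalties below are placeholders:

import numpy as np
from sklearn.decomposition import DictionaryLearning

dense = np.random.randn(300, 50)                # stand-in for 300 word vectors
learner = DictionaryLearning(n_components=200,  # overcomplete: 200 > 50 dims
                             alpha=1.0,         # L1 penalty encourages sparsity
                             transform_algorithm="lasso_lars")
sparse = learner.fit_transform(dense)           # (300, 200), mostly zeros
binary = (sparse != 0).astype(int)              # optional binarization step
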
Learning Semantic Word Embeddings based on Ordinal Knowledge Constraints
Quan Liu, Hui Jiang, Si Wei, Zhen-Hua Ling, and Yu Hu 15:40–16:05
In this paper, we propose a general framework to incorporate semantic knowledge into the popular data-driven learning process of word embeddings to improve their quality. Under this framework, we represent semantic knowledge as many ordinal ranking inequalities and formulate the learning of semantic word embeddings (SWE) as a constrained optimization problem, where the data-derived objective function is optimized subject to all ordinal knowledge inequality constraints extracted from available knowledge resources such as Thesaurus, WordNet, knowledge graphs, etc. We demonstrate that this constrained optimization problem can be efficiently solved by the stochastic gradient descent (SGD) algorithm, even for a large number of inequality constraints. Experimental results on four standard NLP tasks, including word similarity measurement, sentence completion, named entity recognition, and TOEFL synonym selection, all demonstrate that the quality of learned word vectors can be significantly improved after semantic knowledge is incorporated as inequality constraints during the learning process of word embeddings.
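
Schematically (our sketch, not the paper's exact objective), one SGD step on a single ordinal constraint sim(a, b) >= sim(c, d) + margin might look like this:

import numpy as np

def hinge_step(E, a, b, c, d, margin=0.1, lr=0.01):
    # E maps a word to its embedding; similarity here is the dot product.
    if margin + E[c] @ E[d] - E[a] @ E[b] > 0:  # constraint violated
        ga, gb = E[b].copy(), E[a].copy()
        gc, gd = E[d].copy(), E[c].copy()
        E[a] += lr * ga; E[b] += lr * gb        # pull the similar pair together
        E[c] -= lr * gc; E[d] -= lr * gd        # push the dissimilar pair apart

For instance, a thesaurus-derived constraint could be sim("car", "vehicle") >= sim("car", "tree") + margin.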

Adding Semantics to Data-Driven Paraphrasing


Ellie Pavlick, Johan Bos, Malvina Nissim, Charley Beller, Benjamin Van Durme, and Chris Callison-
Burch 16:05–16:30
We add an interpretable semantics to the paraphrase database (PPDB). To date, the relationship between the phrase pairs in the database has been weakly defined as approximately equivalent. We show that in fact these pairs represent a variety of relations, including directed entailment (little girl/girl) and exclusion (nobody/someone). We automatically assign semantic entailment relations to entries in PPDB using features derived from past work on discovering inference rules from text and semantic taxonomy induction. We demonstrate that our model assigns these entailment relations with high accuracy. In a downstream RTE task, our labels rival relations from WordNet and improve the coverage of a proof-based RTE system by 17%.


Session 7E: Parsing


311B Chair: Anders Søgaard
Parsing as Reduction
Daniel Fernández-González and André F. T. Martins 15:15–15:40
We reduce phrase-based parsing to dependency parsing. Our reduction is grounded on a new intermediate
representation, head-ordered dependency trees, shown to be isomorphic to constituent trees. By encoding order
information in the dependency labels, we show that any off-the-shelf, trainable dependency parser can be used
to produce constituents. When this parser is non-projective, we can perform discontinuous parsing in a very
natural manner. Despite the simplicity of our approach, experiments show that the resulting parsers are on
par with strong baselines, such as the Berkeley parser for English and the best non-reranking system in the
SPMRL-2014 shared task. Results are particularly striking for discontinuous parsing of German, where we
surpass the current state of the art by a wide margin.

Optimal Shift-Reduce Constituent Parsing with Structured Perceptron


Le Quang Thang, Hiroshi Noji, and Yusuke Miyao 15:40–16:05
We present a constituent shift-reduce parser with a structured perceptron that finds the optimal parse in a practi-
cal runtime. The key ideas are new feature templates that facilitate state merging of dynamic programming and
A* search. Our system achieves 91.1 F1 on a standard English experiment, a level which cannot be reached by
other beam-based systems even with large beam sizes.

A Data-Driven, Factorization Parser for CCG Dependency Structures


Yantao Du, Weiwei Sun, and Xiaojun Wan 16:05–16:30
This paper is concerned with building grounded, semantics-oriented deep dependency structures with a data-
driven, factorization model. Three types of factorization together with different higher-order features are de-
signed to capture different syntacto-semantic properties of functor-argument dependencies. Integrating hetero-
geneous factorizations results in intractability in decoding. We propose a principled method to obtain optimal
graphs based on dual decomposition. Our parser obtains an unlabeled f-score of 93.23 on the CCGBank data,
resulting in an error reduction of 6.5%.


Poster session P2

Poster session P2.01

Time: 16:30–19:30 Location: Plenary Hall B


2-1 Recovering dropped pronouns from Chinese text messages
Yaqin Yang, Yalin Liu, and Nianwen Xue
Pronouns are frequently dropped in Chinese sentences, especially in informal data such as text messages. In
this work we propose a solution to recover dropped pronouns in SMS data. We manually annotate dropped
pronouns in 684 SMS files and apply machine learning algorithms to recover these dropped pronouns, lever-
aging lexical, contextual and syntactic information as features. We believe this is the first work on recovering
dropped pronouns in Chinese text messages.

2-2 The Users Who Say Ni: Audience Identification in Chinese-language Restaurant Reviews
Rob Voigt and Dan Jurafsky
We give an algorithm for disambiguating generic versus referential uses of second-person pronouns in restau-
rant reviews in Chinese. Reviews in this domain use the you pronoun either generically or to refer to shop-
keepers, readers, or for self-reference in reported conversation. We first show that linguistic features of the local
context drawn from prior literature help in disambiguation. We then show that document-level features (n-grams and document-level embeddings), not previously used in the referentiality literature, actually give the largest gain in performance, and suggest this is because pronouns in this domain exhibit one-sense-per-discourse.
Our work highlights an important case of discourse effects on pronoun use, and may suggest practical implica-
tions for audience extraction and other sentiment tasks in online reviews.
2-3 Chinese Zero Pronoun Resolution: A Joint Unsupervised Discourse-Aware Model Rivaling
State-of-the-Art Resolvers
Chen Chen and Vincent Ng
We propose an unsupervised probabilistic model for zero pronoun resolution. To our knowledge, this is the
first such model that (1) is trained on zero pronouns in an unsupervised manner; (2) jointly identifies and resolves anaphoric zero pronouns; and (3) exploits discourse information provided by a salience model. Experiments
demonstrate that our unsupervised model significantly outperforms its state-of-the-art unsupervised counterpart
when resolving the Chinese zero pronouns in the OntoNotes corpus.

Poster session P2.02

Time: 16:30–19:30 Location: Plenary Hall B


2-4 Co-Simmate: Quick Retrieving All Pairwise Co-Simrank Scores
Weiren Yu and Julie McCann
Co-Simrank is a useful Simrank-like measure of similarity based on graph structure. The existing method iteratively computes each pair of Co-Simrank scores from a dot product of two Pagerank vectors, entailing O(log(1/ε) n^3) time to compute all pairs of Co-Simranks in a graph with n nodes, to attain a desired accuracy ε. In this study, we devise a model, Co-Simmate, to speed up the retrieval of all pairs of Co-Simranks to O(log2(log(1/ε)) n^3) time. Moreover, we show the optimality of Co-Simmate among other hop-(u_k) variations, and integrate it with a matrix decomposition based method on singular graphs to attain higher efficiency. Viable experiments verify the superiority of Co-Simmate over the others.
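
A hedged reading of the baseline scheme the abstract describes (ours; the propagation details are assumptions) accumulates damped dot products of two iterated PageRank-style vectors:

import numpy as np

def co_simrank(W, i, j, c=0.8, iters=10):
    # W: column-stochastic adjacency matrix of the graph; c: damping factor.
    n = W.shape[0]
    p_i, p_j = np.eye(n)[i], np.eye(n)[j]
    score, weight = 0.0, 1.0
    for _ in range(iters):
        score += weight * (p_i @ p_j)  # dot product of the two vectors
        p_i, p_j = W @ p_i, W @ p_j    # one propagation step each
        weight *= c
    return score

Repeating this for all n^2 node pairs is what yields the O(log(1/ε) n^3) cost quoted above; Co-Simmate's contribution is restructuring that computation.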

2-5 Retrieval of Research-level Mathematical Information Needs: A Test Collection and Tech-
nical Terminology Experiment
Yiannos Stathopoulos and Simone Teufel
In this paper, we present a test collection for mathematical information retrieval composed of real-life, research-
level mathematical information needs. Topics and relevance judgements have been procured from the on-line
collaboration website MathOverflow by delegating domain-specific decisions to experts on-line. With our
test collection, we construct a baseline using Lucene's vector-space model implementation and conduct an
experiment to investigate how prior extraction of technical terms from mathematical text can affect retrieval
efficiency. We show that by boosting the importance of technical terms, statistically significant improvements
in retrieval performance can be obtained over the baseline.
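
The boosting idea can be illustrated generically (our sketch with scikit-learn; the paper itself uses Lucene's vector-space model, and the boost factor here is arbitrary):

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = ["the jacobian of a smooth map", "recipes for apple pie"]
technical_terms = {"jacobian", "smooth", "map"}  # assumed extractor output

vec = TfidfVectorizer()
X = vec.fit_transform(docs)
boost = np.array([3.0 if t in technical_terms else 1.0
                  for t in vec.get_feature_names_out()])
X_boosted = X.multiply(boost)                    # up-weight technical terms

query = vec.transform(["jacobian of a map"]).multiply(boost)
print(cosine_similarity(query, X_boosted))       # retrieval scores per document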

2-6 Learning to Mine Query Subtopics from Query Log


Zhenzhong Zhang, Le Sun, and Xianpei Han
Many queries in web search are ambiguous or multifaceted. Identifying the major senses or facets of queries is very important for web search. In this paper, we represent the major senses or facets of queries as subtopics and refer to identifying the senses or facets of queries as query subtopic mining, where query subtopics are represented as a number of clusters of queries. The challenges of query subtopic mining are then how to measure the similarity between queries and how to group them semantically. This paper proposes an approach for mining subtopics from a query log, which jointly learns a similarity measure and groups queries by explicitly modeling the structure among them. Compared with previous approaches using manually defined similarity measures, our approach produces more desirable query subtopics by learning a similarity measure. Experimental results on real queries collected from a search engine log confirm the effectiveness of the proposed approach in mining query subtopics.

Poster session P2.03

Time: 16:30–19:30 Location: Plenary Hall B


2-7 Learning Topic Hierarchies for Wikipedia Categories
Linmei Hu, Xuzhong Wang, Mengdi Zhang, Juanzi Li, Xiaoli Li, Chao Shao, Jie Tang, and Yongbin
Liu
Existing studies have utilized Wikipedia for various knowledge acquisition tasks. However, no attempts have been made to explore the multi-level topic knowledge contained in Wikipedia articles' Contents tables. Articles with similar subjects are grouped together into Wikipedia categories. In this work, we propose novel
methods to automatically construct comprehensive topic hierarchies for given categories based on the struc-
tured Contents tables as well as corresponding unstructured text descriptions. Such a hierarchy is important for
information browsing, document organization and topic prediction. Experimental results show our proposed
approach, incorporating both the structural and textual information, achieves high quality topic hierarchies.

2-8 Semantic Clustering and Convolutional Neural Network for Short Text Categorization
Peng Wang, Jiaming Xu, Bo Xu, Chenglin Liu, Heng Zhang, Fangyuan Wang, and Hongwei Hao
Short texts usually encounter data sparsity and ambiguity problems in representations for their lack of con-
text. In this paper, we propose a novel method to model short texts based on word embedding clustering and
convolutional neural network. Particularly, we first discover semantic cliques in embedding spaces by fast
clustering. Then, multi-scale semantic units are detected under the supervision of semantic cliques, which
introduce useful external knowledge for short texts. These meaningful semantic units are combined and fed
into convolutional layer, followed by max-pooling operation. Experimental results on two open benchmarks
validate the effectiveness of the proposed method.

2-9 Document Level Time-anchoring for TimeLine Extraction


Egoitz Laparra, Itziar Aldabe, and German Rigau
This paper investigates the contribution of document level processing of time-anchors for TimeLine event
extraction. We developed and tested two different systems. The first one is a baseline system that captures
explicit time-anchors. The second one extends the baseline system by also capturing implicit time relations.
We have evaluated both approaches in SemEval 2015 Task 4 (TimeLine: Cross-Document Event Ordering).
We empirically demonstrate that the document-based approach obtains a much more complete time anchoring.
Moreover, this approach almost doubles the performance of the systems that participated in the task.

2-10 Event Detection and Domain Adaptation with Convolutional Neural Networks
Thien Huu Nguyen and Ralph Grishman
We study the event detection problem using convolutional neural networks (CNNs) that overcome the two fundamental limitations of the traditional feature-based approaches to this task: complicated feature engineering
for rich feature sets and error propagation from the preceding stages which generate these features. The exper-
imental results show that the CNNs outperform the best reported feature-based systems in the general setting
as well as the domain adaptation setting without resorting to extensive external resources.

2-11 Seed-Based Event Trigger Labeling: How far can event descriptions get us?
Ofer Bronstein, Ido Dagan, Qi Li, Heng Ji, and Anette Frank
The task of event trigger labeling is typically addressed in the standard supervised setting: triggers for each
target event type are annotated as training data, based on annotation guidelines. We propose an alternative
approach, which takes the example trigger terms mentioned in the guidelines as seeds, and then applies an
event-independent similarity-based classifier for trigger labeling. This way we can skip manual annotation for
new event types, while requiring only minimal annotated training data for a few example events at system setup.
Our method is evaluated on the ACE-2005 dataset, achieving a 5.7% improvement in F1.

2-12 An Empirical Study of Chinese Name Matching and Applications


Nanyun Peng, Mo Yu, and Mark Dredze
Methods for name matching, an important component to support downstream tasks such as entity linking and entity clustering, have focused on alphabetic languages, primarily English. In contrast, logogram languages such as Chinese remain untested. We evaluate methods for name matching in Chinese, including both string matching and learning approaches. Our approach, based on a new representation for Chinese, improves both name matching and a downstream entity clustering task.

2-13 Language Identification and Modeling in Specialized Hardware


Kenneth Heafield, Rohan Kshirsagar, and Santiago Barona
We repurpose network security hardware to perform language identification and language modeling tasks. The
hardware is a deterministic pushdown transducer since it executes regular expressions and has a stack. One core
is 2.4 times as fast at language identification and 1.8 to 6 times as fast at part-of-speech language modeling.

2-14 Cross-lingual Transfer of Named Entity Recognizers without Parallel Corpora


Ayah Zirikly and Masato Hagiwara
We propose an approach to cross-lingual named entity recognition model transfer without the use of parallel
corpora. In addition to global de-lexicalized features, we introduce multilingual gazetteers that are generated
using graph propagation, and cross-lingual word representation mappings without the use of parallel data. We
target the e-commerce domain, which is challenging due to its unstructured and noisy nature. The experiments
have shown that our approaches beat the strong MT baseline, where the English model is transferred to two
languages: Spanish and Chinese.

2-15 Robust Multi-Relational Clustering via ℓ1-Norm Symmetric Nonnegative Matrix Factorization
Kai Liu and Hua Wang
In this paper, we propose an ℓ1-norm Symmetric Nonnegative Matrix Tri-Factorization (ℓ1 S-NMTF) framework to cluster multi-type relational data by utilizing their interrelatedness. By introducing the ℓ1-norm distances
in our new objective function, the proposed approach is robust against noise and outliers, which are inherent in
multi-relational data. We also derive the solution algorithm and rigorously analyze its correctness and conver-
gence. The promising experimental results of the algorithm applied to text clustering on IMDB dataset validate
the proposed approach.

Poster session P2.05

Time: 16:30–19:30 Location: Plenary Hall B


2-16 Painless Labeling with Application to Text Mining
Sajib Dasgupta
Labeled data is not readily available for many natural language domains, and it typically requires expensive
human effort with considerable domain knowledge to produce a set of labeled data. In this paper, we propose
a simple unsupervised system that helps us create a categorical labeled resource for a document set using only
fifteen minutes of human input. We utilize the labeled resources to discover important insights about the data.
The entire process is domain independent, and demands no prior annotation samples or rules specific to an
annotation.
2-17 FrameNet+: Fast Paraphrastic Tripling of FrameNet
Ellie Pavlick, Travis Wolfe, Pushpendre Rastogi, Chris Callison-Burch, Mark Dredze, and Benjamin
Van Durme
We increase the lexical coverage of FrameNet through automatic paraphrasing. We use crowdsourcing to
manually filter out bad paraphrases in order to ensure a high-precision resource. Our expanded FrameNet
contains an additional 22K lexical units, a 3-fold increase over the current FrameNet, and achieves 40% better coverage.
2-18 IWNLP: Inverse Wiktionary for Natural Language Processing
Matthias Liebeck and Stefan Conrad
Nowadays, there are a lot of natural language processing pipelines that are based on training data created by a
few experts. This paper examines how the proliferation of the internet and its collaborative application possi-
bilities can be practically used for NLP. For that purpose, we examine how the German version of Wiktionary
can be used for a lemmatization task. We introduce IWNLP, an open-source parser for Wiktionary, that reim-
plements several MediaWiki markup language templates for conjugated verbs and declined adjectives. The
lemmatization task is evaluated on three German corpora on which we compare our results with existing soft-
ware for lemmatization. With Wiktionary as a resource, we obtain a high accuracy for the lemmatization of
nouns and can even improve on the results of existing software for the lemmatization of nouns.

2-19 TR9856: A Multi-word Term Relatedness Benchmark


Ran Levy, Liat Ein-Dor, Shay Hummel, Ruty Rinott, and Noam Slonim
Measuring word relatedness is an important ingredient of many NLP applications. Several datasets have been
developed in order to evaluate such measures. The main drawback of existing datasets is the focus on single
words, although natural language contains a large proportion of multi-word terms. We propose the new TR9856
dataset which focuses on multi-word terms and is significantly larger than existing datasets. The new dataset
includes many real world terms such as acronyms and named entities, and further handles term ambiguity by
providing topical context for all term pairs. We report baseline results for common relatedness methods over
the new data, and exploit its magnitude to demonstrate that a combination of these methods outperforms each
individual method.
2-20 PPDB 2.0: Better paraphrase ranking, fine-grained entailment relations, word embed-
dings, and style classification
Ellie Pavlick, Pushpendre Rastogi, Juri Ganitkevitch, Benjamin Van Durme, and Chris Callison-
Burch
We present a new release of the Paraphrase Database. PPDB 2.0 includes a discriminatively re-ranked set
of paraphrases that achieve a higher correlation with human judgments than PPDB 1.0's heuristic rankings.
Each paraphrase pair in the database now also includes fine-grained entailment relations, word embedding
similarities, and style annotations.

2-21 Automatic Discrimination between Cognates and Borrowings


Alina Maria Ciobanu and Liviu P. Dinu
Identifying the type of relationship between words provides a deeper insight into the history of a language and
allows a better characterization of language relatedness. In this paper, we propose a computational approach
for discriminating between cognates and borrowings. We show that orthographic features have discriminative
power and we analyze the underlying linguistic factors that prove relevant in the classification task. To our
knowledge, this is the first attempt of this kind.

2-22 The Media Frames Corpus: Annotations of Frames Across Issues


Dallas Card, Amber E. Boydstun, Justin H. Gross, Philip Resnik, and Noah A. Smith
We describe the first version of the Media Frames Corpus: several thousand news articles on three policy issues,
annotated in terms of the media framing. We motivate framing as a phenomenon of study for computational
linguistics and describe our annotation process.

2-23 deltaBLEU: A Discriminative Metric for Generation Tasks with Intrinsically Diverse Tar-
gets
Michel Galley, Chris Brockett, Alessandro Sordoni, Yangfeng Ji, Michael Auli, Chris Quirk, Margaret
Mitchell, Jianfeng Gao, and Bill Dolan
We introduce Discriminative BLEU (deltaBLEU), a novel metric for intrinsic evaluation of generated text in tasks that admit a diverse range of possible outputs. Reference strings are scored for quality by human raters on a scale of [-1, +1], and these scores are used to weight multi-reference BLEU. In tasks involving generation of conversational responses, deltaBLEU correlates reasonably with human judgments and outperforms sentence-level and IBM BLEU in terms of both Spearman's rho and Kendall's tau.

2-24 Tibetan Unknown Word Identification from News Corpora for Supporting Lexicon-based
Tibetan Word Segmentation
Minghua Nuo, Huidan Liu, Congjun Long, and Jian Wu
In Tibetan, as words are written consecutively without delimiters, finding unknown word boundary is difficult.
This paper presents a hybrid approach for Tibetan unknown word identification for offline corpus processing.
Firstly, Tibetan named entities are preprocessed based on natural annotation. Secondly, other Tibetan unknown
words are extracted from word segmentation fragments using MTC, the combination of a statistical metric
and a set of context sensitive rules. In addition, the preliminary experimental results on Tibetan News Corpus
are reported. Lexicon-based Tibetan word segmentation system SegT with proposed unknown word extension
mechanism is indeed helpful to promote the performance of Tibetan word segmentation. It increases the F-score
of Tibetan word segmentation by 4.15%.

Poster session P2.06

Time: 16:30–19:30 Location: Plenary Hall B


2-25 Learning Lexical Embeddings with Syntactic and Lexicographic Knowledge
Tong Wang, Abdelrahman Mohamed, and Graeme Hirst
We propose two improvements on lexical association used in embedding learning: factorizing individual de-
pendency relations and using lexicographic knowledge from monolingual dictionaries. Both proposals provide
low-entropy lexical co-occurrence information, and are empirically shown to improve embedding learning by
performing notably better than several popular embedding models in similarity tasks.

2-26 Non-distributional Word Vector Representations


Manaal Faruqui and Chris Dyer
Data-driven representation learning for words is a technique of central importance in NLP. While indisputably
useful as a source of features in downstream tasks, such vectors tend to consist of uninterpretable components
whose relationship to the categories of traditional lexical semantic theories is tenuous at best. We present
a method for constructing interpretable word vectors from hand-crafted linguistic resources like WordNet,
FrameNet, etc. These vectors are binary (i.e., they contain only 0 and 1) and are 99.9% sparse. We analyze their performance on state-of-the-art evaluation methods for distributional models of word vectors and find they are competitive with standard distributional approaches.
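
A loose sketch of the construction (ours; the paper draws on a larger set of resources and feature types) builds each word's binary dimensions from lexicon memberships:

from nltk.corpus import wordnet as wn  # assumes nltk's WordNet data is installed

def binary_features(word):
    # Each feature names a dimension on which the word's vector is 1.
    feats = set()
    for synset in wn.synsets(word):
        feats.add("synset:" + synset.name())
        for hyper in synset.hypernyms():
            feats.add("hypernym:" + hyper.name())
    return feats

print(sorted(binary_features("dog"))[:5])  # a few of the word's active dimensions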

2-27 Early and Late Combinations of Criteria for Reranking Distributional Thesauri
Olivier Ferret
In this article, we first propose to exploit a new criterion for improving distributional thesauri. Following a
bootstrapping perspective, we select relations between the terms of similar nominal compounds for building
in an unsupervised way the training set of a classifier performing the reranking of a thesaurus. Then, we
evaluate several ways to combine thesauri reranked according to different criteria and show that exploiting the
complementary information brought by these criteria leads to significant improvements.

Poster session P2.07

Time: 16:30–19:30 Location: Plenary Hall B


2-28 Dependency length minimisation effects in short spans: a large-scale analysis of adjective
placement in complex noun phrases
Kristina Gulordava, Paola Merlo, and Benoit Crabbé
It has been extensively observed that languages minimise the distance between two related words. Dependency
length minimisation effects are explained as a means to reduce memory load and for effective communication.
In this paper, we ask whether they hold in typically short spans, such as noun phrases, which could be thought
of being less subject to efficiency pressure. We demonstrate that minimisation does occur in short spans, but
also that it is a complex effect: it is not only the length of the dependency that is at stake, but also the effect of
the surrounding dependencies.

2-29 Tagging Performance Correlates with Author Age


Dirk Hovy and Anders Søgaard
Many NLP tools for English and German are based on manually annotated articles from the Wall Street Journal and Frankfurter Rundschau. The average readers of these two newspapers are middle-aged (55 and 47 years old, respectively), and the annotated articles are more than 20 years old by now. This leads us to speculate whether tools induced from these resources (such as part-of-speech taggers) put older language users at an advantage.
We show that this is actually the case in both languages, and that the cause goes beyond simple vocabulary
differences. In our experiments, we control for gender and region.

Poster session P2.08

Time: 16:30–19:30 Location: Plenary Hall B


2-30 User Based Aggregation for Biterm Topic Model
Weizheng Chen, Jinpeng Wang, Yan Zhang, Hongfei Yan, and Xiaoming Li
Biterm Topic Model (BTM) is designed to model the generative process of word co-occurrence patterns in short texts such as tweets. However, two aspects of BTM may restrict its performance: (1) users' personalities are ignored in order to obtain corpus-level word co-occurrence patterns; and (2) the strong assumption that two co-occurring words will be assigned the same topic label cannot distinguish background words from topical words. In this paper, we propose the Twitter-BTM model to address these issues by considering user-level personalization in BTM. Firstly, we use user-based biterm aggregation to learn user-specific topic distributions. Secondly, each user's preference between background words and topical words is estimated by incorporating a background topic. Experiments on a large-scale real-world dataset show that Twitter-BTM outperforms several state-of-the-art baselines.
2-31 The Fixed-Size Ordinally-Forgetting Encoding Method for Neural Network Language
Models
ShiLiang Zhang, Hui Jiang, MingBin Xu, Junfeng Hou, and LiRong Dai
In this paper, we propose the new fixed-size ordinally-forgetting encoding (FOFE) method, which can almost uniquely encode any variable-length sequence of words into a fixed-size representation. FOFE can model the word order in a sequence using a simple ordinally-forgetting mechanism according to the positions of words. In this work, we apply FOFE to feedforward neural network language models (FNN-LMs). Experimental results show that, without using any recurrent feedback, FOFE-based FNN-LMs can significantly outperform not only the standard fixed-input FNN-LMs but also the popular RNN-LMs.
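
The encoding itself fits in a few lines. A minimal sketch of the mechanism as the abstract describes it (the forgetting factor alpha below is a placeholder value): z_t = alpha * z_{t-1} + e(w_t), with e(w_t) the one-hot vector of the t-th word.

import numpy as np

def fofe(token_ids, vocab_size, alpha=0.7):
    z = np.zeros(vocab_size)
    for t in token_ids:      # scan the sequence left to right
        z *= alpha           # geometrically forget earlier words
        z[t] += 1.0          # add the current word's one-hot vector
    return z                 # fixed-size encoding of the whole sequence

print(fofe([2, 0, 2], vocab_size=4))  # -> [0.7, 0., 1.49, 0.]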

2-32 Unsupervised Decomposition of a Multi-Author Document Based on Naive-Bayesian Model
Khaled Aldebei, Xiangjian He, and Jie Yang
This paper proposes a new unsupervised method for decomposing a multi-author document into authorial com-
ponents. We assume that we do not know anything about the document and the authors, except the number of
the authors of that document. The key idea is to exploit the difference in the posterior probability of the Naive-
Bayesian model to increase the precision of the clustering assignment and the accuracy of the classification
process of our method. Experimental results show that the proposed method outperforms two state-of-the-art
methods.
2-33 Extended Topic Model for Word Dependency
Tong Wang, Vish Viswanath, and Ping Chen
Topic models such as Latent Dirichlet Allocation (LDA) make the assumption that the topic assignments of different words are conditionally independent. In this paper, we propose a new model, Extended Global Topic Random Field (EGTRF), to model non-linear dependencies between words. Specifically, we parse sentences into dependency trees and represent them as a graph, and assume the topic assignment of a word is influenced by its
adjacent words and distance-2 words. Word similarity information learned from large corpus is incorporated to
enhance word topic assignment. Parameters are estimated efficiently by variational inference and experimental
results on two datasets show EGTRF achieves lower perplexity and higher log predictive probability.

2-34 Dependency Recurrent Neural Language Models for Sentence Completion


Piotr Mirowski and Andreas Vlachos
Recent work on language modelling has shifted focus from count-based models to neural models. In these
works, the words in each sentence are always considered in a left-to-right order. In this paper we show how
we can improve the performance of the recurrent neural network (RNN) language model by incorporating the
syntactic dependencies of a sentence, which have the effect of bringing relevant contexts closer to the word
being predicted. We evaluate our approach on the Microsoft Research Sentence Completion Challenge and
show that the dependency RNN proposed improves over the RNN by about 10 points in accuracy. Furthermore,
we achieve results comparable with the state-of-the-art models on this task.

2-35 Point Process Modelling of Rumour Dynamics in Social Media


Michal Lukasik, Trevor Cohn, and Kalina Bontcheva
Rumours on social media exhibit complex temporal patterns. This paper develops a model of rumour preva-
lence using a point process, namely a log-Gaussian Cox process, to infer an underlying continuous temporal
probabilistic model of post frequencies. To generalize over different rumours, we present a multi-task learning
method parametrized by the text in posts which allows data statistics to be shared between groups of similar
rumours. Our experiments demonstrate that our model outperforms several strong baseline methods for rumour
frequency prediction evaluated on tweets from the 2014 Ferguson riots.

2-36 Learning Hidden Markov Models with Distributed State Representations for Domain
Adaptation
Min Xiao and Yuhong Guo
Recently, a variety of representation learning approaches have been developed in the literature to induce latent
generalizable features across two domains. In this paper, we extend the standard hidden Markov models (HMMs)
to learn distributed state representations to improve cross-domain prediction performance. We reformulate the
HMMs by mapping each discrete hidden state to a distributed representation vector and employ an expectation-
maximization algorithm to jointly learn distributed state representations and model parameters. We empirically
investigate the proposed model on cross-domain part-of-speech tagging and noun-phrase chunking tasks. The
experimental results demonstrate the effectiveness of the distributed HMMs on facilitating domain adaptation.

Poster session P2.09

Time: 16:30–19:30 Location: Plenary Hall B


2-37 MT Quality Estimation for Computer-assisted Translation: Does it Really Help?
Marco Turchi, Matteo Negri, and Marcello Federico
The usefulness of translation quality estimation (QE) for increasing productivity in a computer-assisted translation (CAT) framework is a widely held assumption (Specia, 2011; Huang et al., 2014). So far, however, the validity of this assumption has not yet been demonstrated through sound evaluations in realistic settings. To this aim, we report on an evaluation involving professional translators operating with a CAT tool in controlled but natural conditions. Contrastive experiments are carried out by measuring post-editing time differences when: (i) translation suggestions are presented together with binary quality estimates, and (ii) the same suggestions are presented without quality indicators. Translators' productivity in the two conditions is analysed in a principled way, accounting for the main factors (e.g., differences in translators' behaviour, quality of the suggestions) that directly impact on time measurements. While the general assumption about the usefulness of QE is verified, significance testing results reveal that real productivity gains can be observed only under specific conditions.

2-38 Context-Dependent Translation Selection Using Convolutional Neural Network


Baotian Hu, Zhaopeng Tu, Zhengdong Lu, Hang Li, and Qingcai Chen
We propose a novel method for translation selection in statistical machine translation, in which a convolutional
neural network is employed to judge the similarity between a phrase pair in two languages. The specifically
designed convolutional architecture encodes not only the semantic similarity of the translation pair, but also
the context containing the phrase in the source language. Therefore, our approach is able to capture context-
dependent semantic similarities of translation pairs. We adopt a curriculum learning strategy to train the model:
we classify the training examples into easy, medium, and difficult categories, and gradually build the ability
of representing phrases and sentence-level contexts by using training examples from easy to difficult. Ex-
perimental results show that our approach significantly outperforms the baseline system by up to 1.4 BLEU
points.
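
The curriculum schedule can be pictured as follows (our sketch; the staging and epoch counts are assumptions, and only the easy/medium/difficult split comes from the abstract):

def curriculum(examples, difficulty, train_one_epoch, epochs_per_stage=2):
    # difficulty(example) -> "easy" | "medium" | "difficult" (assumed labels)
    stages = [("easy",), ("easy", "medium"), ("easy", "medium", "difficult")]
    for allowed in stages:
        batch = [ex for ex in examples if difficulty(ex) in allowed]
        for _ in range(epochs_per_stage):
            train_one_epoch(batch)   # the model sees harder examples over time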

2-39 Learning Word Reorderings for Hierarchical Phrase-based Statistical Machine Transla-
tion
Jingyi Zhang, Masao Utiyama, Eiichiro Sumita, and Hai Zhao
Statistical models for reordering source words have been used to enhance the hierarchical phrase-based statisti-
cal machine translation system. Existing word reordering models learn the reordering for any two source words
in a sentence or only for two continuous words. This paper proposes a series of separate sub-models to learn
reorderings for word pairs with different distances. Our experiments demonstrate that reordering sub-models
for word pairs with distances less than a specific threshold are useful to improve translation quality. Com-
pared with previous work, our method may more effectively and efficiently exploit helpful word reordering
information.
2-40 UNRAVEL – A Decipherment Toolkit
Malte Nuhn, Julian Schamper, and Hermann Ney
In this paper we present the UNRAVEL toolkit. It implements many of the recently published works on decipherment, including decipherment of deterministic ciphers (e.g., the ZODIAC-408 cipher and part two of the BEALE ciphers), as well as decipherment of probabilistic ciphers and unsupervised training for machine
translation. It also includes data and example configuration files so that the previously published experiments
are easy to reproduce.

2-41 Multi-Pass Decoding With Complex Feature Guidance for Statistical Machine Translation
Benjamin Marie and Aurélien Max
In Statistical Machine Translation, some complex features are still difficult to integrate during decoding and
usually used through the reranking of the k-best hypotheses produced by the decoder. We propose a translation
table partitioning method that exploits the result of this reranking to iteratively guide the decoder in order to
produce a new k-best list more relevant to some complex features. We report experiments on two translation
domains and two translation directions which yield improvements of up to 1.4 BLEU over the reranking baseline using the same set of complex features. From a practical viewpoint, our approach allows SMT system
developers to easily integrate complex features into decoding rather than being limited to their use in one-time
k-best list reranking.

2-42 What's in a Domain? Analyzing Genre and Topic Differences in Statistical Machine Translation
Marlies Van der Wees, Arianna Bisazza, Wouter Weerkamp, and Christof Monz
Domain adaptation is an active field of research in statistical machine translation (SMT), but so far most work has
ignored the distinction between the topic and genre of documents. In this paper we quantify and disentangle
the impact of genre and topic differences on translation quality by introducing a new data set that has controlled
topic and genre distributions. In addition, we perform a detailed analysis showing that differences across topics
only explain to a limited degree translation performance differences across genres, and that genre-specific errors
are more attributable to model coverage than to suboptimal scoring of translation candidates.

2-43 Learning Cross-lingual Word Embeddings via Matrix Co-factorization


Tianze Shi, Zhiyuan Liu, Yang Liu, and Maosong Sun
A joint-space model for cross-lingual distributed representations generalizes language-invariant semantic fea-
tures. In this paper, we present a matrix co-factorization framework for learning cross-lingual word embed-
dings. We explicitly define monolingual training objectives in the form of matrix decomposition, and induce
cross-lingual constraints for simultaneously factorizing monolingual matrices. The cross-lingual constraints
can be derived from parallel corpora, with or without word alignments. Empirical results on a task of cross-
lingual document classification show that our method is effective in encoding cross-lingual knowledge as constraints for cross-lingual word embeddings.

Poster session P2.10

Time: 16:30–19:30 Location: Plenary Hall B


2-44 Improving Pivot Translation by Remembering the Pivot
Akiva Miura, Graham Neubig, Sakriani Sakti, Tomoki Toda, and Satoshi Nakamura
Pivot translation allows for translation of language pairs with little or no parallel data by introducing a third
language for which data exists. In particular, the triangulation method, which translates by combining source-
pivot and pivot-target translation models into a source-target model is known for its high translation accuracy.
However, with the conventional triangulation method, information about the pivot phrases is forgotten and not used
in the translation process. In this paper, we propose a novel approach to remember the pivot phrases in the
triangulation stage, and use a pivot language model as an additional information source at translation time.
Experimental results on the Europarl corpus showed gains of 0.4-1.2 BLEU points in all tested combinations
of languages.
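
For reference, conventional triangulation (the baseline whose forgetting of pivots the paper addresses) marginalizes over pivot phrases, p(t|s) = sum_p p(t|p) p(p|s); a sketch with toy dictionary-of-dictionaries phrase tables:

from collections import defaultdict

def triangulate(src_to_pivot, pivot_to_tgt):
    # Each table maps a phrase to {phrase: probability}.
    src_to_tgt = defaultdict(lambda: defaultdict(float))
    for s, pivots in src_to_pivot.items():
        for p, prob_ps in pivots.items():
            for t, prob_tp in pivot_to_tgt.get(p, {}).items():
                src_to_tgt[s][t] += prob_ps * prob_tp  # sum over pivots p
    return src_to_tgt

The proposed method additionally remembers which pivot phrase p produced each entry, so a pivot language model can score it at translation time.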

2-45 BrailleSUM: A News Summarization System for the Blind and Visually Impaired People
Xiaojun Wan and Yue Hu
In this article, we discuss the challenges of document summarization for the blind and visually impaired people
and then propose a new system called BrailleSUM to produce better summaries for the blind and visually
impaired people. Our system incorporates the braille length of each sentence in news articles into the ILP-based summarization method. Evaluation results on a DUC dataset show that BrailleSUM can produce shorter braille summaries than existing methods; meanwhile, it does not sacrifice the content quality of the
summaries.
2-46 Automatic Identification of Age-Appropriate Ratings of Song Lyrics
Anggi Maulidyani and Ruli Manurung
This paper presents a novel task, namely the automatic identification of age-appropriate ratings of a musical track, or album, based on its lyrics. Details are provided regarding the construction of a dataset of lyrics from 12,242 tracks across 1,798 albums with age-appropriate ratings obtained from various web resources, along with results from various text classification experiments. The best accuracy obtained was 71.02%.

2-47 Ground Truth for Grammaticality Correction Metrics


Courtney Napoles, Keisuke Sakaguchi, Matt Post, and Joel Tetreault
How do we know which grammatical error correction (GEC) system is best? A number of metrics have been
proposed over the years, each motivated by weaknesses of previous metrics; however, the metrics themselves
have not been compared to an empirical gold standard grounded in human judgments. We conducted the first
human evaluation of GEC system outputs, and show that the rankings produced by metrics such as MaxMatch
and I-measure do not correlate well with this ground truth. As a step towards better metrics, we also propose
GLEU, a simple variant of BLEU, modified to account for both the source and the reference, and show that it
hews much more closely to human judgments.

2-48 Radical Embedding: Delving Deeper to Chinese Radicals


Xinlei Shi, Junjie Zhai, Xudong Yang, Zehua Xie, and Chao Liu
Chinese and other agglutinating languages alike are mostly processed at word level. Inspired by the recent success of deep learning, we delve deeper, to the character and radical levels, for Chinese language processing. We propose a new deep learning technique, called radical embedding, with proper justifications based on Chinese linguistics, and validate its feasibility and utility through a set of three experiments: two in-house standard experiments on short-text categorization (STC) and Chinese word segmentation (CWS), and one in-field experiment on
search ranking. We show that radical embedding achieves comparable, and sometimes even better, results than
competing methods.

2-49 Automatic Detection of Sentence Fragments


Chak Yan Yeung and John Lee
We present and evaluate a method for automatically detecting sentence fragments in English texts written
by non-native speakers. Our method combines syntactic parse tree patterns and parts-of-speech information
produced by a tagger to detect this phenomenon. When evaluated on a corpus of authentic learner texts,
our best model achieved a precision of 0.84 and a recall of 0.62, a statistically significant improvement over
baselines using non-parse features, as well as a popular grammar checker.

Poster session P2.11

Time: 16:30–19:30 Location: Plenary Hall B


2-50 A Computational Approach to Automatic Prediction of Drunk-Texting
Aditya Joshi, Abhijit Mishra, Balamurali AR, Pushpak Bhattacharyya, and Mark James Carman
Alcohol abuse may lead to unsociable behavior such as crime, drunk driving, or privacy leaks. We introduce
automatic drunk-texting prediction as the task of identifying whether a text was written when under the influ-
ence of alcohol. We experiment with tweets labeled using hashtags as distant supervision. Our classifiers use
a set of N-gram and stylistic features to detect drunk tweets. Our observations present the first quantitative
evidence that text contains signals that can be exploited to detect drunk-texting.

2-51 Reducing infrequent-token perplexity via variational corpora


Yusheng Xie, Pranjal Daga, Yu Cheng, Kunpeng Zhang, Ankit Agrawal, and Alok Choudhary
A recurrent neural network (RNN) is recognized as a powerful language model (LM). We investigate its performance
profile more deeply: it performs well on frequent grammatical patterns but much less so on infrequent
terms. Such a profile is expected and desirable in applications like autocomplete, but is less useful in social
advertising, where many creative, unexpected usages occur (e.g., URL insertion). We adapt a generic RNN model
and show that, with variational training corpora and epoch unfolding, the model improves its URL insertion
suggestions.

2-52 A Hierarchical Knowledge Representation for Expert Finding on Social Media


Yanran Li, Wenjie Li, and Sujian Li
Expert finding on social media benefits both individuals and commercial services. In this paper, we exploit a
5-level tree representation to model posts on social media and cast the expert finding problem as a matching
problem between the learned user tree and a domain tree. We enhance the traditional approximate tree matching
algorithm and incorporate word embeddings to improve the matching result. The experimental results show
the effectiveness of our work.
2-53 Tackling Sparsity, the Achilles Heel of Social Networks: Language Model Smoothing via
Social Regularization
Rui Yan, Xiang Li, Mengwen Liu, and Xiaohua Hu
Online social networks now enjoy worldwide prosperity, having revolutionized the way people discover, share,
and diffuse information. Social networks are powerful, yet they still have an Achilles' heel:
extreme data sparsity. Individual posted documents (e.g., a microblog of fewer than 140 characters) seem too
sparse to make a difference under various scenarios, while in fact they are quite different. We propose to tackle
this specific weakness of social networks by smoothing the posting document language model based on social
regularization. We formulate an optimization framework with a social regularizer. Experimental results on the
Twitter dataset validate the effectiveness and efficiency of our proposed model.

2-54 Twitter User Geolocation Using a Unified Text and Network Prediction Model
Afshin Rahimi, Trevor Cohn, and Timothy Baldwin
We propose a label propagation approach to geolocation prediction based on Modified Adsorption, with two
enhancements: (1) the removal of celebrity nodes to increase location homophily and boost tractability; and
(2) the incorporation of text-based geolocation priors for test users. Experiments over three Twitter benchmark
datasets achieve state-of-the-art results, and demonstrate the effectiveness of the enhancements.
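
Modified Adsorption adds per-node injection, continuation, and abandonment weights; the stand-in below shows only the basic label propagation idea it builds on, with a toy mention graph of our own (a chain of four users, two of them with known locations):

    import numpy as np

    A = np.array([[0, 1, 0, 0],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [0, 0, 1, 0]], dtype=float)   # mention graph: 0-1-2-3
    seeds = np.array([[1, 0],                   # user 0: location A
                      [0, 0],                   # user 1: unknown
                      [0, 0],                   # user 2: unknown
                      [0, 1]])                  # user 3: location B
    P = A / A.sum(axis=1, keepdims=True)        # row-stochastic transitions
    F = seeds.astype(float)
    for _ in range(100):
        F = P @ F                               # spread neighbours' labels
        F[[0, 3]] = seeds[[0, 3]]               # clamp the seed users
    print(F.argmax(axis=1))                     # -> [0 0 1 1]

Text-based priors, as in the paper's second enhancement, would enter by initialising the unknown rows of F with a text classifier's location distribution instead of zeros.
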
2-55 Automatic Keyword Extraction on Twitter
Luis Marujo, Wang Ling, Isabel Trancoso, Chris Dyer, Alan W Black, Anatole Gershman, David
Martins De Matos, João Neto, and Jaime Carbonell

In this paper, we build a corpus of tweets from Twitter annotated with keywords using crowdsourcing methods.
We identify key differences between this domain and others previously studied, such as news, which cause
existing approaches for automatic keyword extraction to generalize poorly to Twitter datasets. These differences
include the small amount of content in each tweet, the frequent use of lexical variants, and the high variance
in the number of keywords per tweet. We propose methods for addressing these issues, which lead to solid
improvements on this dataset for this task.

2-56 Towards a Contextual Pragmatic Model to Detect Irony in Tweets


Jihen Karoui, Benamara Farah, Véronique Moriceau, Nathalie Aussenac-Gilles, and Lamia Hadrich-
Belguith
This paper proposes an approach to capture the pragmatic context needed to infer irony in tweets. We aim to test
the validity of two main hypotheses: (1) the presence of negations, as an internal property of an utterance, can
help to detect the disparity between the literal and the intended meaning of an utterance; (2) a tweet containing
an asserted fact of the form NotP1 is ironic if and only if one can assess the absurdity of P1. Our first results
are encouraging and show that deriving a pragmatic contextual model is feasible.

2-57 Annotation and Classification of an Email Importance Corpus


Fan Zhang and Kui Xu
This paper presents an email importance corpus annotated through Amazon Mechanical Turk (AMT). Annotators
annotate the email content type and email importance for three levels of hierarchy (senior manager, middle
manager, and employee). Each email is annotated by 5 turkers. An agreement study shows that the agreed AMT
annotations are close to the expert annotations. The annotated dataset demonstrates differences in the proportions
of content types between levels. An email importance prediction system trained on the dataset identifies
unimportant emails with a minimum precision of 0.55 using only text-based features.

2-58 Lexical Comparison Between Wikipedia and Twitter Corpora by Using Word Embeddings
Luchen Tan, Haotian Zhang, Charles Clarke, and Mark Smucker
Compared with carefully edited prose, the language of social media is informal in the extreme. The application
of NLP techniques in this context may require a better understanding of word usage within social media. In this
paper, we compute a word embedding for a corpus of tweets, comparing it to a word embedding for Wikipedia.
After learning a transformation of one vector space to the other, and adjusting similarity values according to
term frequency, we identify words whose usage differs greatly between the two corpora. For any given word,
the set of words closest to it in a particular embedding provides a characterization of that word's usage within
the corresponding corpus.
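
One standard way to learn such a transformation between two vector spaces is least squares over shared anchor words; the sketch below (random toy vectors of our own; the paper's term-frequency adjustment is omitted) flags a word as usage-shifted when its mapped tweet vector lies far from its Wikipedia vector:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 50))   # tweet-space vectors of anchor words
    Y = rng.normal(size=(1000, 50))   # Wikipedia vectors of the same words
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)   # linear map: tweet -> wiki

    def usage_shift(x_tweet, y_wiki):
        # Cosine distance between the mapped tweet vector and the Wikipedia
        # vector; large values suggest divergent usage across corpora.
        m = x_tweet @ W
        return 1 - (m @ y_wiki) / (np.linalg.norm(m) * np.linalg.norm(y_wiki))

    print(usage_shift(X[0], Y[0]))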

Poster session P2.12

Time: 16:30–19:30 Location: Plenary Hall B


2-59 The Discovery of Natural Typing Annotations: User-produced Potential Chinese Word
Delimiters
Dakui Zhang, Yu Mao, Yang Liu, Hanshi Wang, Chuyuan Wei, and Shiping Tang
A human-labeled corpus is indispensable for training supervised word segmenters. However, it is time-consuming
and labor-intensive to label corpora manually. When typing Chinese text in Pinyin, people usually
need to type the space bar or numeric keys to choose among words due to homophones, which can be
viewed as a cue for segmentation. We argue that such a process can be used to build a labeled corpus in a more
natural way. Thus, in this paper, we investigate Natural Typing Annotations (NTAs), which are potential word
delimiters produced by users while typing Chinese. A detailed analysis of over three hundred user-produced
texts containing NTAs reveals that high-quality NTAs mostly agree with the gold segmentation and, consequently,
can be used to improve the performance of supervised word segmentation models out of domain. Experiments
show that a classification model combined with a voting mechanism can reliably identify high-quality
NTA texts, which form a more readily available labeled corpus. Furthermore, NTAs might be particularly useful
for handling out-of-vocabulary (OOV) words such as proper names and neologisms.

2-60 One Tense per Scene: Predicting Tense in Chinese Conversations


Tao Ge, Heng Ji, Baobao Chang, and Zhifang Sui

We study the problem of predicting tense in Chinese conversations. The unique challenges include: (1) Chinese
verbs do not have explicit lexical or grammatical forms to indicate tense; (2) tense information is often implicitly
hidden outside of the target sentence. To tackle these challenges, we first propose a set of novel sentence-level
local features using rich linguistic resources and then propose a new hypothesis of "one tense per scene" to
incorporate scene-level global evidence to enhance the performance. Experimental results demonstrate the
power of this hybrid approach, which can serve as a new and promising benchmark.

2-61 A Language-Independent Feature Schema for Inflectional Morphology


John Sylak-Glassman, Christo Kirov, David Yarowsky, and Roger Que
This paper presents a universal morphological feature schema that represents the finest distinctions in meaning
that are expressed by overt, affixal inflectional morphology across languages. This schema is used to universal-
ize data extracted from Wiktionary via a robust multidimensional table parsing algorithm and feature mapping
algorithms, yielding 883,965 instantiated paradigms in 352 languages. These data are shown to be effective for
training morphological analyzers, yielding significant accuracy gains when applied to Durrett and DeNero's
(2013) paradigm learning framework.

Poster session P2.13

Time: 16:30–19:30 Location: Plenary Hall B


2-62 Rhetoric Map of an Answer to Compound Queries
Boris Galitsky, Dmitry Ilvovsky, and Sergey O. Kuznetsov
Given a discourse tree for a text as a candidate answer to a compound query, we propose a rule system for valid
and invalid occurrences of the query keywords in this tree. To be a valid answer to a query, its keywords need to
occur in a chain of elementary discourse units of this answer so that these units are fully ordered and connected
by nucleus-satellite relations. An answer might be invalid if the query's keywords occur in the answer's satellite
discourse units only. We build the rhetoric map of an answer to prevent it from being triggered by queries whose
keywords occur in non-adjacent areas of the answer map. We evaluate the improvement in search relevance from
filtering out search results that do not satisfy the proposed rule system, demonstrating a 4% improvement.

2-63 Thread-Level Information for Comment Classification in Community Question Answering
Alberto Barrón-Cedeño, Simone Filice, Giovanni Da San Martino, Shafiq Joty, Lluís Màrquez,
Preslav Nakov, and Alessandro Moschitti
Community Question Answering (cQA) is a new application of QA in a practical context, i.e., social forums.
It presents new interesting challenges and research directions, e.g., exploiting the dependencies between the
different comments of a thread to select the best answer for a given question. In this paper, we explored two
ways of modeling such dependencies: (i) by designing specific features looking globally at the thread; and (ii) by
applying structured prediction models. We trained and evaluated our models on data from SemEval-2015 Task
3 on Answer Selection in cQA. Our experiments show that: (i) the thread-level features consistently improve the
performance for a variety of learning algorithms and evaluation measures, yielding state-of-the-art results; and
(ii) using the sequential dependencies between the answer labels is not enough to improve the results, indicating
that more information is needed in the joint model.

2-64 Learning Hybrid Representations to Retrieve Semantically Equivalent Questions


Cicero dos Santos, Luciano Barbosa, Dasha Bogdanova, and Bianca Zadrozny
Retrieving similar questions in online QA community sites is a difficult task because different users may formu-
late the same question in a variety of ways, using different vocabulary and structure. In this work, we propose
a new neural network architecture to perform the task of semantically equivalent question retrieval. The proposed
architecture, which we call BOW-CNN, combines a bag-of-words (BOW) representation with a distributed
vector representation created by a convolutional neural network (CNN). We perform experiments using data
collected from two Stack Exchange communities. Our experimental results show that: (1) BOW-CNN is more
effective than BOW-based information retrieval methods such as TF-IDF; (2) BOW-CNN is more robust than the
pure CNN for long texts.


2-65 Machine Comprehension with Syntax, Frames, and Semantics


Hai Wang, Mohit Bansal, Kevin Gimpel, and David McAllester
We demonstrate significant improvement on the MCTest question answering task (Richardson et al., 2013) by
augmenting baseline features with features based on syntax, frame semantics, coreference, and word embed-
dings, and combining them in a max-margin learning framework. We achieve the best results we are aware of
on this dataset, outperforming concurrently-published results. These results demonstrate a significant perfor-
mance gradient for the use of linguistic structure in machine comprehension.

2-66 A Long Short-Term Memory Model for Answer Sentence Selection in Question Answering
Di Wang and Eric Nyberg
In this paper, we present an approach that addresses the answer sentence selection problem for question
answering. The proposed method uses a stacked bidirectional Long Short-Term Memory (BLSTM) network to
sequentially read words from question and answer sentences, and then outputs their relevance scores. Unlike
prior work, this approach does not require any syntactic parsing or external knowledge resources such as
WordNet, which may not be available in some domains or languages. The full system is based on a combination of the
stacked BLSTM relevance model and keywords matching. The results of our experiments on a public bench-
mark dataset from TREC show that our system outperforms previous work which requires syntactic features
and external knowledge resources.

2-67 Answer Sequence Learning with Neural Networks for Answer Selection in Community
Question Answering
Xiaoqiang Zhou, Baotian Hu, Qingcai Chen, Buzhou Tang, and Xiaolong Wang
In this paper, the answer selection problem in community question answering (CQA) is regarded as an answer
sequence labeling task, and a novel approach is proposed based on a recurrent architecture for this problem.
Our approach first applies convolutional neural networks (CNNs) to learn the joint representation of each
question-answer pair, and then uses the joint representations as input to a long short-term memory (LSTM)
network to learn the answer sequence of a question, labeling the matching quality of each answer. Experiments
conducted on the SemEval 2015 CQA dataset show the effectiveness of our approach.

Poster session P2.14

Time: 16:30–19:30 Location: Plenary Hall B


2-68 Bilingual Word Embeddings from Non-Parallel Document-Aligned Data Applied to Bilin-
gual Lexicon Induction
Ivan Vulić and Marie-Francine Moens
We propose a simple yet effective approach to learning bilingual word embeddings (BWEs) from non-parallel
document-aligned data based on the omnipresent skip-gram model, and its application to bilingual lexicon
induction (BLI). We demonstrate the utility of the induced BWEs in the BLI task by reporting on benchmarking
BLI datasets for three language pairs: (1) we show that our BWE-based BLI models significantly outperform
the MuPTM-based and context-counting models in this setting, and obtain the best reported BLI results for all
three tested language pairs; (2) we also show that our BWE-based BLI models outperform other BLI models
based on recently proposed BWEs that require parallel data for bilingual training.
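
One way to realise skip-gram training on document-aligned, non-parallel data (the concrete recipe below, with toy documents and language-tagged tokens, is our own simplification) is to merge each aligned pair into a single pseudo-bilingual document, shuffle it, and train an ordinary skip-gram model so both vocabularies share one space:

    import random
    from gensim.models import Word2Vec

    doc_pairs = [(["dog", "barks"], ["hond", "blaft"]),
                 (["cat", "sleeps"], ["kat", "slaapt"])]  # aligned documents
    random.seed(0)
    merged = []
    for src, tgt in doc_pairs:
        doc = [w + "_en" for w in src] + [w + "_nl" for w in tgt]
        random.shuffle(doc)                 # interleave the two languages
        merged.append(doc)
    model = Word2Vec(merged, vector_size=32, window=4, min_count=1,
                     sg=1, workers=1, seed=0)
    # Nearest cross-lingual neighbours then serve as BLI candidates:
    print(model.wv.most_similar("dog_en", topn=2))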

2-69 How Well Do Distributional Models Capture Different Types of Semantic Knowledge?
Dana Rubinstein, Effi Levi, Roy Schwartz, and Ari Rappoport
In recent years, distributional models (DMs) have shown great success in representing lexical semantics. In this
work we show that the extent to which DMs represent semantic knowledge is highly dependent on the type of
knowledge. We pose the task of predicting properties of concrete nouns in a supervised setting, and compare
learning taxonomic properties (e.g., animacy) with attributive properties (e.g., size, color). We employ
four state-of-the-art DMs as sources of feature representation for this task, and show that they all yield poor
results when tested on attributive properties, achieving no more than an average F-score of 0.37 in the binary
property prediction task, compared to 0.73 on taxonomic properties. Our results suggest that the distributional
hypothesis may not be equally applicable to all types of semantic information.

2-70 Low-Rank Tensors for Verbs in Compositional Distributional Semantics


Daniel Fried, Tamara Polajnar, and Stephen Clark

Several compositional distributional semantic methods use tensors to model multi-way interactions between
vectors. Unfortunately, the size of the tensors can make their use impractical in large-scale implementations. In
this paper, we investigate whether we can match the performance of full tensors with low-rank approximations
that use a fraction of the original number of parameters. We investigate the effect of low-rank tensors on the
transitive verb construction where the verb is a third-order tensor. The results show that, while the low-rank
tensors require about two orders of magnitude fewer parameters per verb, they achieve performance comparable
to, and occasionally surpassing, the unconstrained-rank tensors on sentence similarity and verb disambiguation
tasks.
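
The parameter saving is easy to see in code: a rank-R CP factorisation stores 3Rd numbers per verb instead of d^3, and the full tensor never needs to be materialised (dimensions and factor names below are toy values of our own):

    import numpy as np

    rng = np.random.default_rng(0)
    d, R = 50, 5                      # embedding size, tensor rank
    A, B, C = (rng.normal(size=(R, d)) for _ in range(3))  # CP factors

    def apply_verb(subj, obj):
        # Verb tensor T[i, j, k] = sum_r C[r, i] * A[r, j] * B[r, k];
        # contracting with subject and object costs O(R * d), and the
        # factors hold 3 * R * d = 750 parameters versus d**3 = 125,000.
        return ((A @ subj) * (B @ obj)) @ C   # vector for subj-verb-obj

    print(apply_verb(rng.normal(size=d), rng.normal(size=d)).shape)  # (50,)
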
2-71 Constrained Semantic Forests for Improved Discriminative Semantic Parsing
Wei Lu
In this paper, we present a model for improved discriminative semantic parsing. The model addresses an
important limitation associated with our previous state-of-the-art discriminative semantic parsing model (the
relaxed hybrid tree model) by introducing constrained semantic forests. We show that our model is able to
yield new state-of-the-art results on standard datasets even with simpler features. Our system is available for
download from http://statnlp.org/research/sp/.

2-72 Automatic Identification of Rhetorical Questions


Shohini Bhattasali, Jeremy Cytryn, Elana Feldman, and Joonsuk Park
A question may be asked not only to elicit information, but also to make a statement. Questions serving the
latter purpose, called rhetorical questions, are often lexically and syntactically indistinguishable from other
types of questions. Still, it is desirable to be able to identify rhetorical questions, as it is relevant for many
NLP tasks, including information extraction and text summarization. In this paper, we explore the largely
understudied problem of rhetorical question identification. Specifically, we present a simple n-gram based
language model to classify rhetorical questions in the Switchboard Dialogue Act Corpus. We find that a special
treatment of rhetorical questions which incorporates contextual information achieves the highest performance.

Poster session P2.15

Time: 16:30–19:30 Location: Plenary Hall B


2-73 Lifelong Learning for Sentiment Classification
Zhiyuan Chen, Nianzu Ma, and Bing Liu
This paper proposes a novel lifelong learning (LL) approach to sentiment classification. LL mimics the human
continuous learning process, i.e., retaining the knowledge learned from past tasks and using it to help future
learning. In this paper, we first discuss LL in general and then LL for sentiment classification in particular.
The proposed LL approach adopts a Bayesian optimization framework based on stochastic gradient descent.
Our experimental results show that the proposed method outperforms baseline methods significantly, which
demonstrates that lifelong learning is a promising research direction.

2-74 Harnessing Context Incongruity for Sarcasm Detection


Aditya Joshi, Vinita Sharma, and Pushpak Bhattacharyya
The relationship between context incongruity and sarcasm has been studied in linguistics. We present a
computational approach to harness context incongruity as a basis for sarcasm detection. Our statistical sarcasm
classifiers incorporate two kinds of incongruity features: explicit and implicit. We show the benefit of our
incongruity features for two text forms: tweets and discussion forum posts. Our approach also outperforms two
past works, with an F-score improvement of 10-20%. We also show how our features can capture inter-sentential
incongruity.

2-75 Emotion Detection in Code-switching Texts via Bilingual and Sentimental Information
Zhongqing Wang, Sophia Lee, Shoushan Li, and Guodong Zhou
Code-switching is commonly used in the free-form text environment, such as social media, and it is especially
favored in emotion expressions. Emotions in code-switching texts differ from monolingual texts in that they
can be expressed in either monolingual or bilingual forms. In this paper, we first utilize two kinds of knowledge,
i.e. bilingual and sentimental information to bridge the gap between different languages. Moreover, we use a
term-document bipartite graph to incorporate both bilingual and sentimental information, and propose a label
propagation based approach to learn and predict in the bipartite graph. Empirical studies demonstrate the
effectiveness of our proposed approach in detecting emotion in code-switching texts.

2-76 Model Adaptation for Personalized Opinion Analysis


Mohammad Al Boni, Keira Zhou, Hongning Wang, and Matthew S. Gerber
Humans are idiosyncratic and variable: towards the same topic, they might hold different opinions or express
the same opinion in various ways. It is hence important to model opinions at the level of individual users;
however, it is impractical to estimate independent sentiment classification models for each user with limited
data. In this paper, we adopt a model-based transfer learning solution, using linear transformations over
a generic model, for personalized opinion analysis. Experimental results on a large collection of Amazon
reviews confirm our method significantly outperforms a user-independent generic model as well as several
state-of-the-art transfer learning algorithms.

2-77 Linguistic Template Extraction for Recognizing Reader-Emotion and Emotional Resonance Writing Assistance
Yung-Chun Chang, Cen-Chieh Chen, Yu-lun Hsieh, Chien Chin Chen, and Wen-Lian Hsu
In this paper, we propose a flexible principle-based approach (PBA) for reader-emotion classification and writing
assistance. PBA is a highly automated process that learns emotion templates from raw texts to characterize an
emotion and is comprehensible for humans. These templates are adopted to predict reader-emotion, and may
further assist in emotional resonance writing. Experiment results demonstrate that PBA can effectively detect
reader-emotions by exploiting the syntactic structures and semantic associations in the context, thus outper-
forming well-known statistical text classification methods and the state-of-the-art reader-emotion classification
method. Moreover, writers are able to create more emotional resonance in articles under the assistance of the
generated emotion templates. These templates have been proven to be highly interpretable, which is an attribute
that is difficult to accomplish in traditional statistical methods.

2-78 Aspect-Level Cross-lingual Sentiment Classification with Constrained SMT


Patrik Lambert
Most cross-lingual sentiment classification (CLSC) research so far has been performed at sentence or document
level. Aspect-level CLSC, which is more appropriate for many applications, presents the additional difficulty
that we consider subsentential opinionated units which have to be mapped across languages. In this paper, we
extend the possible cross-lingual sentiment analysis settings to aspect-level specific use cases. We propose a
method, based on constrained SMT, to transfer opinionated units across languages by preserving their bound-
aries. We show that cross-language sentiment classifiers built with this method achieve comparable results to
monolingual ones, and we compare different cross-lingual settings.

2-79 Predicting Valence-Arousal Ratings of Words Using a Weighted Graph Method


Liang-Chih Yu, Jin Wang, K. Robert Lai, and Xue-jie Zhang
Compared to the categorical approach that represents affective states as several discrete classes (e.g., positive
and negative), the dimensional approach represents affective states as continuous numerical values on multiple
dimensions, such as the valence-arousal (VA) space, thus allowing for more fine-grained sentiment analysis.
In building dimensional sentiment applications, affective lexicons with valence-arousal ratings are useful
resources but are still very rare. Therefore, this study proposes a weighted graph model that considers both the
relations of multiple nodes and their similarities as weights to automatically determine the VA ratings of
affective words. Experiments on both English and Chinese affective lexicons show that the proposed method
yielded a smaller error rate on VA prediction than the linear regression, kernel method, and PageRank algorithm
used in previous studies.

Poster session P2.16

Time: 16:30–19:30 Location: Plenary Hall B


2-80 Multi-domain Dialog State Tracking using Recurrent Neural Networks
Nikola Mrkšić, Diarmuid Ó Séaghdha, Blaise Thomson, Milica Gašić, Pei-Hao Su, David Vandyke,
Tsung-Hsien Wen, and Steve Young

Dialog state tracking is a key component of many modern dialog systems, most of which are designed with a
single, well-defined domain in mind. This paper shows that dialog data drawn from different dialog domains
can be used to train a general belief tracking model which can operate across all of these domains, exhibiting
superior performance to each of the domain-specific models. We propose a training procedure which uses
out-of-domain data to initialise belief tracking models for entirely new domains. This procedure leads to im-
provements in belief tracking performance regardless of the amount of in-domain data available for training the
model.
2-81 Dialogue Management based on Sentence Clustering
Wendong Ge and Bo Xu
Dialogue Management (DM) is a key issue in Spoken Dialogue Systems (SDS). Most existing studies on DM
use Dialogue Acts (DAs) to represent the semantic information of a sentence, which might not always capture
nuanced meaning. In this paper, we model DM based on sentence clusters, which have more powerful semantic
representation ability than DAs. First, sentences are clustered based not only on internal information,
such as words and sentence structures, but also on external information, such as dialogue context,
via Recurrent Neural Networks. Additionally, the DM problem is modeled as a Partially Observable Markov
Decision Process (POMDP) over sentence clusters. Finally, experimental results illustrate that the proposed
DM scheme is superior to the existing one.

2-82 Compact Lexicon Selection with Spectral Methods


Young-Bum Kim, Karl Stratos, Xiaohu Liu, and Ruhi Sarikaya
In this paper, we introduce the task of selecting a compact lexicon from large, noisy gazetteers. This scenario
arises often in practice, in particular in spoken language understanding (SLU). We propose a simple and effective
solution based on matrix decomposition techniques: canonical correlation analysis (CCA) and rank-revealing
QR (RRQR) factorization. CCA is first used to derive low-dimensional gazetteer embeddings from domain-specific
search logs. Then RRQR is used to find a subset of these embeddings whose span approximates the
entire lexicon space. Experiments on slot tagging show that our method yields a small set of lexicon entities
with an average relative error reduction of 50%.
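
The selection step can be sketched with SciPy's column-pivoted QR (toy random embeddings of our own stand in for the CCA-derived ones; sizes are made up):

    import numpy as np
    from scipy.linalg import qr

    rng = np.random.default_rng(0)
    emb = rng.normal(size=(10000, 20))   # gazetteer-entry embeddings (rows)
    k = 100                              # target compact-lexicon size
    # Column-pivoted (rank-revealing) QR on the transpose orders entries by
    # how much new direction each contributes to the span.
    _, _, piv = qr(emb.T, pivoting=True, mode='economic')
    selected = piv[:k]                   # indices of the compact lexicon
    print(selected[:10])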

2-83 The Impact of Listener Gaze on Predicting Reference Resolution


Nikolina Koleva, Martin Villalba, Maria Staudte, and Alexander Koller
We investigate the impact of listener gaze on predicting reference resolution in situated interactions. We extend
an existing model that predicts to which entity in the environment listeners will resolve a referring expression
(RE). Our model makes use of features that capture which objects were looked at and for how long, reflecting
listeners' visual behavior. We improve a probabilistic model that considers a basic set of features for monitoring
listeners' movements in a virtual environment. Particularly, in complex referential scenes, where more objects
next to the target are possible referents, gaze turns out to be beneficial and helps decipher the listener's intention.
We evaluate performance at several prediction times before the listener performs an action, obtaining a highly
significant accuracy gain.

2-84 A Simultaneous Recognition Framework for the Spoken Language Understanding Module
of Intelligent Personal Assistant Software on Smart Phones
Changsu Lee, Youngjoong Ko, and Jungyun Seo
Intelligent personal assistant software such as Apple's Siri and Samsung's S-Voice has drawn much attention
these days. This paper introduces a novel Spoken Language Understanding (SLU) module to predict users'
intentions for determining the system actions of intelligent personal assistant software. An SLU module usually
consists of several connected recognition tasks in a pipeline framework, whereas the proposed SLU module
simultaneously handles four recognition tasks in a single framework using Conditional Random Fields
(CRF). The four tasks include named entity, speech-act, target, and operation recognition. In the experiments, the
new simultaneous recognition method achieves a performance improvement of 4% over the pipeline framework.

Poster session P2.17

Time: 16:30–19:30 Location: Plenary Hall B


2-85 A Deeper Exploration of the Standard PB-SMT Approach to Text Simplification and its
Evaluation
Sanja Štajner, Hannah Béchara, and Horacio Saggion
In the last few years, there has been a growing number of studies addressing the Text Simplification (TS) task as
a monolingual machine translation (MT) problem, which translates from "original" to "simple" language. Motivated
by those results, we investigate the influence of quality vs. quantity of the training data on the effectiveness of
such an MT approach to text simplification. We conduct 40 experiments on the aligned sentences from English
Wikipedia and Simple English Wikipedia, controlling for: (1) the similarity between the original and simplified
sentences in the training and development datasets, and (2) the sizes of those datasets. The results suggest that
in the standard PB-SMT approach to text simplification the quality of the datasets has a greater impact on
system performance. Additionally, we point out several important differences between cross-lingual MT and
monolingual MT as used in text simplification, and show that BLEU is not a good measure of system performance
in the text simplification task.

2-86 Learning Summary Prior Representation for Extractive Summarization


Ziqiang Cao, Furu Wei, Sujian Li, Wenjie Li, Ming Zhou, and Houfeng Wang
In this paper, we propose the concept of summary prior to define how appropriate a sentence is for
selection into a summary without consideration of its context. Different from previous work using manually
compiled document-independent features, we develop a novel summary system called PriorSum, which applies
enhanced convolutional neural networks to capture summary prior features derived from length-variable
phrases. Under a regression framework, the learned prior features are concatenated with document-dependent
features for sentence ranking. Experiments on the DUC generic summarization benchmarks show that
PriorSum can discover different aspects supporting the summary prior and outperform state-of-the-art baselines.

2-87 A Methodology for Evaluating Timeline Generation Algorithms based on Deep Semantic
Units
Sandro Bauer and Simone Teufel
Timeline generation is a summarisation task which transforms a narrative, roughly chronological input text into
a set of timestamped summary sentences, each expressing an atomic historical event. We present a methodology
for evaluating systems which create such timelines, based on a novel corpus consisting of 36 human-created
timelines. Our evaluation relies on deep semantic units which we call historical content units. An advantage of
our approach is that it does not require human annotation of new system summaries.

2-88 Unsupervised extractive summarization via coverage maximization with syntactic and semantic concepts
Natalie Schluter and Anders Søgaard
Coverage maximization with bigram concepts is a state-of-the-art approach to unsupervised extractive summa-
rization. It has been argued that such concepts are adequate and, in contrast to more linguistic concepts such
as named entities or syntactic dependencies, more robust, since they do not rely on automatic processing. In
this paper, we show that while this seems to be the case for a commonly used newswire dataset, syntactic and
semantic concepts lead to significant improvements in performance in other domains.
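
Coverage maximization is usually solved exactly with an ILP; the greedy stand-in below (toy sentences and weights of our own) conveys the objective: pick sentences that cover the most not-yet-covered concept weight, whatever the concept type (bigrams, named entities, or dependency arcs):

    def greedy_summary(sentences, weight, budget=2):
        pool, covered, summary = list(sentences), set(), []
        for _ in range(budget):
            best = max(pool, key=lambda s: sum(weight[c]
                                               for c in s["concepts"] - covered))
            pool.remove(best)
            summary.append(best["text"])
            covered |= best["concepts"]
        return summary

    sentences = [
        {"text": "The quake struck at dawn.", "concepts": {"quake", "dawn"}},
        {"text": "Rescue teams reached the city.", "concepts": {"rescue", "city"}},
        {"text": "The quake damaged the city.", "concepts": {"quake", "city"}},
    ]
    weight = {"quake": 3, "city": 2, "rescue": 2, "dawn": 1}
    print(greedy_summary(sentences, weight))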

Poster session P2.18

Time: 16:30–19:30 Location: Plenary Hall B


2-89 Low Resource Dependency Parsing: Cross-lingual Parameter Sharing in a Neural Network Parser
Long Duong, Trevor Cohn, Steven Bird, and Paul Cook
Training a high-accuracy dependency parser requires a large treebank. However, these are costly and time-
consuming to build. We propose a learning method that needs less data, based on the observation that there
are underlying shared structures across languages. We exploit cues from a different source language in order
to guide the learning process. Our model saves at least half of the annotation effort to reach the same accuracy
compared with using the purely supervised method.

2-90 Semantic Structure Analysis of Noun Phrases using Abstract Meaning Representation
Yuichiro Sawai, Hiroyuki Shindo, and Yuji Matsumoto

We propose a method for semantic structure analysis of noun phrases using Abstract Meaning Representation
(AMR). AMR is a graph representation for the meaning of a sentence, in which noun phrases (NPs) are manually
annotated with internal structure and semantic relations. We extract NPs from the AMR corpus and construct a
dataset of NP semantic structures. We also propose a transition-based algorithm which jointly identifies both
the nodes in a semantic structure tree and the semantic relations between them. Compared to the baseline, our
method improves the performance of NP semantic structure analysis by 2.7 points, while further incorporating an
external dictionary boosts the performance by 7.1 points.

2-91 Boosting Transition-based AMR Parsing with Refined Actions and Auxiliary Analyzers
Chuan Wang, Nianwen Xue, and Sameer Pradhan
We report improved AMR parsing results by adding a new action to a transition-based AMR parser to infer
abstract concepts and by incorporating richer features produced by auxiliary analyzers such as a semantic role
labeler and a coreference resolver. Our final AMR parsing results show an improvement of 7%.

2-92 Generative Incremental Dependency Parsing with Neural Networks


Jan Buys and Phil Blunsom
We propose a neural network model for scalable generative transition-based dependency parsing. A probability
distribution over both sentences and transition sequences is parameterised by a feed-forward neural network.
The model surpasses the accuracy and speed of previous generative dependency parsers, reaching 91.1% UAS.
Perplexity results show a strong improvement over n-gram language models, opening the way to the efficient
integration of syntax into neural models for language generation.

2-93 Labeled Grammar Induction with Minimal Supervision


Yonatan Bisk, Christos Christodoulopoulos, and Julia Hockenmaier
Nearly all work in unsupervised grammar induction aims to induce unlabeled dependency trees from gold part-
of-speech-tagged text. These clean linguistic classes provide a very important, though unrealistic, inductive
bias. Conversely, induced clusters are very noisy. We show here, for the first time, that only very limited human
supervision (three frequent words per cluster) may be required to induce labeled dependencies from automatically
induced word clusters.
2-94 On the Importance of Ezafe Construction in Persian Parsing
Alireza Nourian, Mohammad Sadegh Rasooli, Mohsen Imany, and Heshaam Faili
Ezafe construction is an idiosyncratic phenomenon in the Persian language. It is a good indicator for phrase
boundaries and dependency relations but mostly does not appear in the text. In this paper, we show that adding
information about the Ezafe construction can give a 4.6% improvement in parsing accuracy.

Student Research Workshop

Time: 16:30–19:30 Location: Plenary Hall B

16:30–17:55 Oral Presentations


16:30–16:45 Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue
Systems
Yun-Nung Chen
16:45–16:55 Leveraging Compounds to Improve Noun Phrase Translation from Chinese and
German
Xiao Pu, Laura Mascarell, Andrei Popescu-Belis, Mark Fishel, Ngoc-Quang Luong,
and Martin Volk
16:55–17:10 Learning Representations for Text-level Discourse Parsing
Gregor Weiss
17:10–17:20 Transition-based Dependency DAG Parsing Using Dynamic Oracles
Alper Tokgöz and Gülşen Eryiğit

17:20–17:30 Disease Event Detection based on Deep Modality Analysis
Yoshiaki Kitagawa, Mamoru Komachi, Eiji Aramaki, Naoaki Okazaki, and
Hiroshi Ishikawa
17:30–17:40 Evaluation Dataset and System for Japanese Lexical Simplification
Tomoyuki Kajiwara and Kazuhide Yamamoto
17:40–17:55 Learning to Map Dependency Parses to Abstract Meaning Representations
Wei-Te Chen
18:00–19:30 Dinner and Poster Session (alongside main conference short poster session)

Social Event

Tuesday, July 28, 2015, 19:45–22:00

Plenary Hall B

The ACL 2015 Social Event will be held immediately following the Tuesday Poster Session and
dinner in the China National Convention Center (CNCC). Here you will enjoy desserts, coffee, tea, and
wine. Bring your boots and hats and enjoy dancing. Enjoy networking with colleagues and have a
relaxing evening!
We hope to make your conference experience not only enlightening but also entertaining and enjoyable!

5 Main Conference: Wednesday, July 29
Overview

7:30–18:00 Registration (1st and 3rd floors)
9:00–10:00 Keynote Address: "Construction and Mining of Heterogeneous Information Networks from Text Data" - Jiawei Han (Plenary Hall B)
10:00–10:30 Coffee Break
10:30–11:45 Session 8: Machine Learning: Neural Networks (Plenary Hall B); Automatic Summarization (309A); Linguistic and Psycholinguistic Aspects of NLP (310); NLP for the Web: Social Media (311A); Text Categorization/Information Retrieval (311B)
11:45–13:00 Lunch Break
13:00–14:35 ACL Business Meeting (Plenary Hall B)
14:35–15:25 Session 9: Multilinguality (Plenary Hall B); Word Segmentation (309A); Morphology, Phonology (310); NLP for the Web: Twitter (311A); POS Tagging (311B)
15:25–15:55 Coffee Break
15:55–17:10 Session BP: Best Paper Session (Plenary Hall B)
17:10–18:30 Lifetime Achievement Award (Plenary Hall B)
18:30–19:00 Closing Session (Plenary Hall B)


Keynote Address: Jiawei Han

Chair: Chengqing Zong

Construction and Mining of Heterogeneous Information Networks from Text Data

Wednesday, July 29, 2015, 9:00–10:00am

Plenary Hall B

Abstract:
Real-world data are unstructured but interconnected. The majority of such data is in the form of
natural language text. One of the grand challenges is to turn such massive data into actionable knowl-
edge. In this talk, we present our vision on how to turn massive unstructured, text-rich, but intercon-
nected data into knowledge. We propose a data-to-network-to-knowledge (i.e., D2N2K) paradigm,
which is to first turn data into relatively structured heterogeneous information networks, and then
mine such text-rich and structure-rich heterogeneous networks to generate useful knowledge. We
show why such a paradigm represents a promising direction and present some recent progress on the
development of effective methods for construction and mining of structured heterogeneous informa-
tion networks from text data.

Biography: Jiawei Han is Abel Bliss Professor in the Department of Computer Science, University
of Illinois at Urbana-Champaign. He has been conducting research in data mining, information network
analysis, database systems, and data warehousing, with over 600 journal and conference publica-
tions. He has chaired or served on many program committees of international conferences, including
PC co-chair for KDD, SDM, and ICDM conferences, and Americas Coordinator for VLDB confer-
ences. He also served as the founding Editor-In-Chief of ACM Transactions on Knowledge Dis-
covery from Data and is serving as the Director of Information Network Academic Research Center
supported by the U.S. Army Research Lab, and Director of KnowEnG, an NIH-funded Center of Excel-
lence in Big Data Computing. He is a Fellow of ACM and Fellow of IEEE, and received 2004 ACM
SIGKDD Innovations Award, 2005 IEEE Computer Society Technical Achievement Award, 2009
IEEE Computer Society Wallace McDowell Award, and 2011 Daniel C. Drucker Eminent Faculty
Award at UIUC. His co-authored book "Data Mining: Concepts and Techniques" has been widely adopted
as a textbook worldwide.


Session 8 Overview (Wednesday, July 29, 2015)

Track A: Machine Learning: Neural Networks (Plenary Hall B)
10:30 Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks (Tai, Socher, and Manning)
10:55 genCNN: A Convolutional Architecture for Word Sequence Prediction (Wang, Lu, Li, Jiang, and Liu)
11:20 Neural Responding Machine for Short-Text Conversation (Shang, Lu, and Li)

Track B: Automatic Summarization (309A)
10:30 Abstractive Multi-Document Summarization via Phrase Selection and Merging (Bing, Li, Liao, Lam, Guo, and Passonneau)
10:55 Joint Graphical Models for Date Selection in Timeline Summarization (Tran, Herder, and Markert)
11:20 Predicting Salient Updates for Disaster Summarization (Kedzie, McKeown, and Diaz)

Track C: Linguistic and Psycholinguistic Aspects of NLP (310)
10:30 Unsupervised Prediction of Acceptability Judgements (Lau, Clark, and Lappin)
10:55 A Frame of Mind: Using Statistical Models for Detection of Framing and Agenda Setting Campaigns (Tsur, Calacci, and Lazer)
11:20 Why discourse affects speakers' choice of referring expressions (Orita, Vornov, Feldman, and Daumé III)

Track D: NLP for the Web: Social Media (311A)
10:30 Linguistic Harbingers of Betrayal: A Case Study on an Online Strategy Game (Niculae, Kumar, Boyd-Graber, and Danescu-Niculescu-Mizil)
10:55 Who caught a cold? - Identifying the subject of a symptom (Kanouchi, Komachi, Okazaki, Aramaki, and Ishikawa)
11:20 Weakly Supervised Role Identification in Teamwork Interactions (Yang, Wen, and Rosé)

Track E: Text Categorization/Information Retrieval (311B)
10:30 SOLAR: Scalable Online Learning Algorithms for Ranking (Wang, Wan, Zhang, and Hoi)
10:55 Deep Unordered Composition Rivals Syntactic Methods for Text Classification (Iyyer, Manjunatha, Boyd-Graber, and Daumé III)
11:20 Text Categorization as a Graph Classification Problem (Rousseau, Kiagias, and Vazirgiannis)


Parallel Session 8

Session 8A: Machine Learning: Neural Networks


Plenary Hall B Chair: Xavier Carreras
Improved Semantic Representations From Tree-Structured Long Short-Term Memory Net-
works
Kai Sheng Tai, Richard Socher, and Christopher D. Manning 10:30–10:55
Because of their superior ability to preserve sequence information over time, Long Short-Term Memory (LSTM)
networks, a type of recurrent neural network with a more complex computational unit, have obtained strong
results on a variety of sequence modeling tasks. The only underlying LSTM structure that has been explored
so far is a linear chain. However, natural language exhibits syntactic properties that would naturally combine
words to phrases. We introduce the Tree-LSTM, a generalization of LSTMs to tree-structured network topolo-
gies. Tree-LSTMs outperform all existing systems and strong LSTM baselines on two tasks: predicting the
semantic relatedness of two sentences (SemEval 2014, Task 1) and sentiment classification (Stanford Sentiment
Treebank).
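
For reference, the Child-Sum variant composes a node j from its children C(j) by summing the child hidden states and gating each child separately; the equations below follow the published formulation to the best of our understanding (sigma is the logistic sigmoid, \odot the elementwise product):

    \tilde{h}_j = \sum_{k \in C(j)} h_k
    i_j    = \sigma\left(W^{(i)} x_j + U^{(i)} \tilde{h}_j + b^{(i)}\right)
    f_{jk} = \sigma\left(W^{(f)} x_j + U^{(f)} h_k + b^{(f)}\right)
    o_j    = \sigma\left(W^{(o)} x_j + U^{(o)} \tilde{h}_j + b^{(o)}\right)
    u_j    = \tanh\left(W^{(u)} x_j + U^{(u)} \tilde{h}_j + b^{(u)}\right)
    c_j    = i_j \odot u_j + \sum_{k \in C(j)} f_{jk} \odot c_k
    h_j    = o_j \odot \tanh(c_j)

Note the per-child forget gate f_{jk}, which lets the cell discard individual subtrees; with a single child, this reduces to the standard chain LSTM.
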
genCNN: A Convolutional Architecture for Word Sequence Prediction
Mingxuan Wang, Zhengdong Lu, Hang Li, Wenbin Jiang, and Qun Liu 10:55–11:20
We propose a novel convolutional architecture, named genCNN, for word sequence prediction. Different from
previous work on neural network-based language modeling and generation (e.g., RNN or LSTM), we choose
not to greedily summarize the history of words as a fixed length vector. Instead, we use a convolutional neural
network to predict the next word with the history of words of variable length. Also different from the existing
feedforward networks for language modeling, our model can effectively fuse the local correlation and global
correlation in the word sequence, with a convolution-gating strategy specifically designed for the task. We
argue that our model can give adequate representation of the history, and therefore can naturally exploit both
the short and long range dependencies. Our model is fast, easy to train, and readily parallelized. Our extensive
experiments on text generation and n-best re-ranking in machine translation show that genCNN outperforms
the state of the art by large margins.

Neural Responding Machine for Short-Text Conversation


Lifeng Shang, Zhengdong Lu, and Hang Li 11:20–11:45
We propose the Neural Responding Machine (NRM), a neural network-based response generator for Short-Text
Conversation. NRM takes the general encoder-decoder framework: it formalizes the generation of a response as a
decoding process based on the latent representation of the input text, while both encoding and decoding are realized
with recurrent neural networks (RNN). The NRM is trained with a large amount of one-round conversation
data collected from a microblogging service. An empirical study shows that NRM can generate grammatically
correct and content-wise appropriate responses to over 75% of the input text.


Session 8B: Automatic Summarization


309A Chair: Min-Yen Kan
Abstractive Multi-Document Summarization via Phrase Selection and Merging
Lidong Bing, Piji Li, Yi Liao, Wai Lam, Weiwei Guo, and Rebecca Passonneau 10:30–10:55
We propose an abstraction-based multi-document summarization framework that can construct new sentences
by exploring more fine-grained syntactic units than sentences, namely, noun/verb phrases. Different from
existing abstraction-based approaches, our method first constructs a pool of concepts and facts represented by
phrases from the input documents. Then new sentences are generated by selecting and merging informative
phrases to maximize the salience of phrases and meanwhile satisfy the sentence construction constraints. We
employ integer linear optimization for conducting phrase selection and merging simultaneously in order to
achieve the global optimal solution for a summary. Experimental results on the benchmark data set TAC 2011
show that our framework outperforms state-of-the-art models under the automated pyramid evaluation metric,
and achieves reasonably good results on manual linguistic quality evaluation.

Joint Graphical Models for Date Selection in Timeline Summarization


Giang Tran, Eelco Herder, and Katja Markert 10:55–11:20
Automatic timeline summarization (TLS) generates precise, dated overviews over often prolonged events, such
as wars or economic crises. One subtask of TLS selects the most important dates for an event within a certain
time frame. Date selection has up to now been handled via supervised machine learning approaches that
estimate the importance of each date separately, using features such as the frequency of date mentions in news
corpora. This approach neglects interactions between different dates that occur due to connections between
subevents. We therefore suggest a joint graphical model for date selection. Even unsupervised versions of this
model perform as well as supervised state-of-the-art approaches. With parameter tuning on training data, it
outperforms prior supervised models by a considerable margin.

Predicting Salient Updates for Disaster Summarization


Chris Kedzie, Kathleen McKeown, and Fernando Diaz 11:20–11:45
During crises such as natural disasters or other human tragedies, the information needs of both civilians and
responders often require urgent, specialized treatment. Monitoring and summarizing a text stream during
such an event remains a difficult problem. We present a system for update summarization which predicts the
salience of sentences with respect to an event and then uses these predictions to directly bias a clustering
algorithm for sentence selection, increasing the quality of the updates. We use novel, disaster-specific features
for salience prediction, including geo-locations and language models representing the language of disaster.
Our evaluation on a standard set of retrospective events using ROUGE shows that salience prediction provides
a significant improvement over other approaches.


Session 8C: Linguistic and Psycholinguistic Aspects of NLP


310 Chair: Dirk Hovy
Unsupervised Prediction of Acceptability Judgements
Jey Han Lau, Alexander Clark, and Shalom Lappin 10:30–10:55
In this paper we present the task of unsupervised prediction of speakers' acceptability judgements. We use a
test set generated from the British National Corpus (BNC) containing both grammatical sentences and sentences
containing a variety of syntactic infelicities introduced by round-trip machine translation. This set was annotated
for acceptability judgements through crowdsourcing. We trained a variety of unsupervised language
models on the original BNC, and tested them to see the extent to which they could predict mean speakers'
judgements on the test set. To map probability to acceptability, we experimented with several normalisation
functions to neutralise the effects of sentence length and word frequencies. We found encouraging results, with
the unsupervised models predicting acceptability across two different datasets. Our methodology is highly
portable to other domains and languages, and the approach has potential implications for the representation
and the acquisition of linguistic knowledge.
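
One representative normalisation from this line of work is SLOR, which subtracts the sentence's unigram log-probability and divides by its length, so that sentences are not penalised merely for being long or containing rare words (toy numbers below are ours):

    def slor(sentence_logprob, token_unigram_logprobs):
        # (log P_LM(s) - sum of unigram log-probs) / sentence length
        n = len(token_unigram_logprobs)
        return (sentence_logprob - sum(token_unigram_logprobs)) / n

    # e.g., an LM log-probability of -18.2 for a 5-token sentence:
    print(slor(-18.2, [-4.0, -2.5, -6.0, -3.0, -4.5]))  # 0.36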

A Frame of Mind: Using Statistical Models for Detection of Framing and Agenda Setting Campaigns
Oren Tsur, Dan Calacci, and David Lazer 10:55–11:20
Framing is a sophisticated form of discourse in which the speaker tries to induce a cognitive bias through
consistent linkage between a topic and a specific context frame. We build on political science and communication
theory and use probabilistic topic models combined with time series regression analysis (autoregressive
distributed-lag models) to gain insights about the language dynamics in political processes. Processing four
years of public statements issued by members of the U.S. Congress, our results provide a glimpse into the
complex dynamic processes of framing, attention shifts, and agenda setting, commonly known as "spin". We
further provide new evidence for the divergence in party discipline in U.S. politics.

Why discourse affects speakers' choice of referring expressions


Naho Orita, Eliana Vornov, Naomi Feldman, and Hal Daumé III 11:20–11:45
We propose a language production model that uses dynamic discourse information to account for speakers'
choices of referring expressions. Our model extends previous rational speech act models (Frank and Goodman,
2012) to more naturally distributed linguistic data, instead of assuming a controlled experimental setting.
Simulations show a close match between speakers' utterances and model predictions, indicating that speakers'
behavior can be modeled in a principled way by considering the probabilities of referents in the discourse and
the information conveyed by each word.


Session 8D: NLP for the Web: Social Media


311A Chair: Alan Ritter
Linguistic Harbingers of Betrayal: A Case Study on an Online Strategy Game
Vlad Niculae, Srijan Kumar, Jordan Boyd-Graber, and Cristian Danescu-Niculescu-Mizil 10:30–10:55
Interpersonal relations are fickle, with close friendships often dissolving into enmity. In this work, we explore
linguistic cues that presage such transitions by studying dyadic interactions in an online strategy game where
players form alliances and break those alliances through betrayal. We characterize friendships that are unlikely
to last and examine temporal patterns that foretell betrayal. We reveal that subtle signs of imminent betrayal
are encoded in the conversational patterns of the dyad, even if the victim is not aware of the relationship's
fate. In particular, we find that lasting friendships exhibit a form of balance that manifests itself through
language. In contrast, sudden changes in the balance of certain conversational attributes, such as positive
sentiment, politeness, or focus on future planning, signal impending betrayal.

Who caught a cold? - Identifying the subject of a symptom


Shin Kanouchi, Mamoru Komachi, Naoaki Okazaki, Eiji Aramaki, and Hiroshi Ishikawa 10:55–11:20
The development and proliferation of social media services has led to the emergence of new approaches for
surveying the population and addressing social issues. One popular application of social media data is health
surveillance, e.g., predicting the outbreak of an epidemic by recognizing diseases and symptoms from text
messages posted on social media platforms. In this paper, we propose a novel task that is crucial and generic
from the viewpoint of health surveillance: estimating a subject carrier of a disease or symptom mentioned in a
Japanese tweet. By designing an annotation guideline for labeling the subject of a disease/symptom in a tweet,
we perform annotations on an existing corpus for public surveillance. In addition, we present a supervised
approach for predicting the subject of a disease/symptom. The results of our experiments demonstrate the
impact of subject identification on the effective detection of an episode of a disease/symptom. Moreover, the
results suggest that our task is independent of the type of disease/symptom.

Weakly Supervised Role Identification in Teamwork Interactions


Diyi Yang, Miaomiao Wen, and Carolyn Rosé 11:20–11:45
In this paper, we model conversational roles in terms of distributions of turn level behaviors, including con-
versation acts and stylistic markers, as they occur over the whole interaction. This work presents a lightly su-
pervised approach to inducing role definitions over sets of contributions within an extended interaction, where
the supervision comes in the form of an outcome measure from the interaction. The identified role definitions
enable a mapping from behavior profiles of each participant in an interaction to limited sized feature vectors
that can be used effectively to predict the teamwork outcome. An empirical evaluation applied to two Massive
Open Online Course (MOOC) datasets demonstrates that this approach yields superior performance in learning
representations for predicting the teamwork outcome over several baselines.


Session 8E: Text Categorization/Information Retrieval


311B Chair: Maarten De Rijke
SOLAR: Scalable Online Learning Algorithms for Ranking
Jialei Wang, Ji Wan, Yongdong Zhang, and Steven Hoi 10:30–10:55
Traditional learning to rank methods learn ranking models from training data in a batch and offline learning
mode, which suffers from some critical limitations, e.g., poor scalability as the model has to be re-trained from
scratch whenever new training data arrives. This is clearly non-scalable for many real applications in practice
where training data often arrives sequentially and frequently. To overcome these limitations, this paper presents
SOLAR, a new framework of Scalable Online Learning Algorithms for Ranking, to tackle the challenge
of scalable learning to rank. Specifically, we propose two novel SOLAR algorithms and analyze their IR
measure bounds theoretically. We conduct extensive empirical studies by comparing our SOLAR algorithms
with conventional learning to rank algorithms on benchmark testbeds, in which promising results validate the
efficacy and scalability of the proposed novel SOLAR algorithms.

Deep Unordered Composition Rivals Syntactic Methods for Text Classification


Mohit Iyyer, Varun Manjunatha, Jordan Boyd-Graber, and Hal Daumé III 10:55–11:20
Many existing deep learning models for natural language processing tasks focus on learning the composition-
ality of their inputs, which requires many expensive computations. We present a simple deep neural network
that competes with and, in some cases, outperforms such models on sentiment analysis and factoid question
answering tasks while taking only a fraction of the training time. While our model is syntactically-ignorant,
we show significant improvements over previous bag-of-words models by deepening our network and applying
a novel variant of dropout. Moreover, our model performs better than syntactic models on datasets with high
syntactic variance. We show that our model makes similar errors to syntactically-aware models, indicating
that for the tasks we consider, nonlinearly transforming the input is more important than tailoring a network to
incorporate word order and syntax.
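
The composition function at the heart of such a model is easy to sketch. The toy forward pass below (random weights, invented sizes; not the authors' code) averages word embeddings, which ignores order, feeds the average through deep layers, and applies word-level dropout, dropping whole tokens before averaging.

    import numpy as np

    rng = np.random.default_rng(0)
    vocab, dim, classes = 1000, 50, 2
    E = rng.normal(scale=0.1, size=(vocab, dim))      # word embeddings
    W1 = rng.normal(scale=0.1, size=(dim, dim))       # hidden layer
    W2 = rng.normal(scale=0.1, size=(dim, classes))   # output layer

    def dan_forward(word_ids, p_drop=0.3, train=True):
        """Deep averaging: drop whole words, average, then go deep."""
        ids = np.asarray(word_ids)
        if train:                          # word dropout: drop entire tokens
            keep = rng.random(len(ids)) > p_drop
            if keep.any():
                ids = ids[keep]
        h = E[ids].mean(axis=0)            # order-insensitive composition
        h = np.tanh(h @ W1)                # "deepening" the network
        logits = h @ W2
        return np.exp(logits) / np.exp(logits).sum()

    print(dan_forward([5, 42, 7, 300]))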

Text Categorization as a Graph Classification Problem


François Rousseau, Emmanouil Kiagias, and Michalis Vazirgiannis 11:20–11:45
In this paper, we consider the task of text categorization as a graph classification problem. By representing
textual documents as graph-of-words instead of historical n-gram bag-of-words, we extract more discriminative
features that correspond to long-distance n-grams through frequent subgraph mining. Moreover, by capitalizing
on the concept of k-core, we reduce the graph representation to its densest part, its main core, speeding up
the feature extraction step at little to no cost in prediction performance. Experiments on four standard text
classification datasets show statistically significant higher accuracy and macro-averaged F1-score compared to
baseline approaches.
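
The graph-of-words representation and its main core can be reproduced in a few lines with networkx; the window size and toy text below are arbitrary choices for illustration, not the paper's settings.

    import networkx as nx

    def graph_of_words(tokens, window=4):
        """Link each word to every word co-occurring within a sliding window."""
        g = nx.Graph()
        for i, w in enumerate(tokens):
            for u in tokens[i + 1 : i + window]:
                if u != w:
                    g.add_edge(w, u)
        return g

    tokens = ("the quick brown fox jumps over the lazy dog "
              "the quick fox is quick").split()
    g = graph_of_words(tokens)
    main_core = nx.k_core(g)      # densest part of the graph: the main core
    print(sorted(main_core.nodes()))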

ACL Business Meeting

Date: Wednesday, July 29, 2015


Time: 13:00–14:30
Venue: Plenary Hall B
All attendees are encouraged to participate in the business meeting.


Session 9 Overview – Wednesday, July 29, 2015

Track A: Multilinguality (Plenary Hall B)
Track B: Word Segmentation (309A)
Track C: Morphology, Phonology (310)
Track D: NLP for the Web: Twitter (311A)
Track E: POS Tagging (311B)

14:35
Track A: Inverted indexing for cross-lingual NLP (Søgaard, Agić, Alonso, Plank, Bohnet, and Johannsen)
Track B: Accurate Linear-Time Chinese Word Segmentation via Embedding Matching (Ma and Hinrichs)
Track C: [TACL] An Unsupervised Method for Uncovering Morphological Chains (Narasimhan, Barzilay, and Jaakkola)
Track D: An analysis of the user occupational class through Twitter content (Preotiuc-Pietro, Lampos, and Aletras)
Track E: Inducing Word and Part-of-Speech with Pitman-Yor Hidden Semi-Markov Models (Uchiumi, Tsukahara, and Mochihashi)

15:00
Track A: Multi-Task Learning for Multiple Language Translation (Dong, Wu, He, Yu, and Wang)
Track B: Gated Recursive Neural Network for Chinese Word Segmentation (Chen, Qiu, Zhu, and Huang)
Track C: [TACL] Modeling Word Forms Using Latent Underlying Morphs and Phonology (Cotterell, Peng, and Eisner)
Track D: Tracking unbounded Topic Streams (Wurzer, Lavrenko, and Osborne)
Track E: Coupled Sequence Labeling on Heterogeneous Annotations: POS Tagging as a Case Study (Li, Chao, Zhang, and Chen)


Parallel Session 9

Session 9A: Multilinguality


Plenary Hall B Chair: Keh-Yih Su
Inverted indexing for cross-lingual NLP
Anders Søgaard, Željko Agić, Héctor Martínez Alonso, Barbara Plank, Bernd Bohnet, and Anders
Johannsen 14:35–15:00
We present a novel, count-based approach to obtaining inter-lingual word representations based on inverted
indexing of Wikipedia. We present experiments applying these representations to 17 datasets in document
classification, POS tagging, dependency parsing, and word alignment. Our approach has the advantage that it
is simple, computationally efficient and almost parameter-free, and, more importantly, it enables multi-source
cross-lingual learning. In 14/17 cases, we improve over using state-of-the-art bilingual embeddings.
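
The representation is simple enough to sketch with toy data (invented here, not the authors' pipeline): each word is indexed by the language-independent IDs of the Wikipedia articles containing it, so words that are translations of each other, occurring in aligned articles, receive similar representations.

    # Toy "Wikipedia": the same article IDs exist in both languages.
    en_articles = {0: "the dog barks", 1: "the cat sleeps"}
    da_articles = {0: "hunden gør", 1: "katten sover"}

    def inverted_index(articles):
        index = {}
        for art_id, text in articles.items():
            for word in set(text.split()):
                index.setdefault(word, set()).add(art_id)
        return index

    en, da = inverted_index(en_articles), inverted_index(da_articles)

    def jaccard(a, b):
        return len(a & b) / len(a | b)

    print(jaccard(en["dog"], da["hunden"]))   # 1.0: same article set
    print(jaccard(en["dog"], da["katten"]))   # 0.0: disjoint article sets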

Multi-Task Learning for Multiple Language Translation


Daxiang Dong, Hua Wu, Wei He, Dianhai Yu, and Haifeng Wang 15:00–15:25
In this paper, we investigate the problem of learning a machine translation model that can simultaneously
translate sentences from one source language to multiple target languages. Our solution is inspired by the re-
cently proposed neural machine translation model which generalizes machine translation as a sequence learning
problem. We extend the neural machine translation to a multi-task learning framework which shares source lan-
guage representation and separates the modeling of different target language translation. Our framework can
be applied to situations where either large amounts of parallel data or only limited parallel data are available.
Experiments show that our multi-task learning model is able to achieve significantly higher translation quality
than individually learned models in both situations on publicly available datasets.


Session 9B: Word Segmentation


309A Chair: Qun Liu
Accurate Linear-Time Chinese Word Segmentation via Embedding Matching
Jianqiang Ma and Erhard Hinrichs 14:35–15:00
This paper proposes an embedding matching approach to Chinese word segmentation, which generalizes the
traditional sequence labeling framework and takes advantage of distributed representations. The training and
prediction algorithms of the model have linear-time complexity. Based on the proposed model, a greedy
segmenter is developed and evaluated on benchmark corpora. Experiments show that our greedy segmenter
achieves improved results over previous embedding-based word segmenters, and its performance is compet-
itive with state-of-the-art methods, despite its simple feature set and the absence of external resources for
training.

Gated Recursive Neural Network for Chinese Word Segmentation


Xinchi Chen, Xipeng Qiu, Chenxi Zhu, and Xuanjing Huang 15:00–15:25
Recently, neural network models for natural language processing tasks have attracted increasing attention for
their ability to alleviate the burden of manual feature engineering. However, previous neural models cannot
extract the complicated feature compositions that traditional methods capture with discrete features. In this
paper, we propose a gated recursive neural network (GRNN) for Chinese word segmentation, which contains
reset and update gates to incorporate the complicated combinations of the context characters. Since the GRNN
is relatively deep, we also use a supervised layer-wise training method to avoid the problem of gradient diffusion.
Experiments on the benchmark datasets show that our model outperforms the previous neural network models
as well as the state-of-the-art methods.


Session 9C: Morphology, Phonology


310 Chair: Grzegorz Kondrak
[TACL] An Unsupervised Method for Uncovering Morphological Chains
Karthik Narasimhan, Regina Barzilay, and Tommi Jaakkola 14:35–15:00
Most state-of-the-art systems today produce morphological analysis based only on orthographic patterns. In
contrast, we propose a model for unsupervised morphological analysis that integrates orthographic and se-
mantic views of words. We model word formation in terms of morphological chains, from base words to the
observed words, breaking the chains into parent-child relations. We use log-linear models with morpheme and
word-level features to predict possible parents, including their modifications, for each word. The limited set
of candidate parents for each word renders contrastive estimation feasible. Our model consistently matches or
outperforms five state-of-the-art systems on Arabic, English and Turkish.
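
A one-function sketch shows why the candidate-parent set stays small enough for contrastive estimation (toy enumeration only; the actual model also scores candidates with orthographic and semantic features):

    def candidate_parents(word, min_stem=3):
        """Enumerate candidate parents of a word by undoing suffixation
        or prefixation; each word yields only O(len(word)) candidates."""
        parents = []
        for i in range(min_stem, len(word)):
            parents.append((word[:i], "suffix:" + word[i:]))   # playful + ly
        for i in range(1, len(word) - min_stem + 1):
            parents.append((word[i:], "prefix:" + word[:i]))   # un + kind
        return parents

    print(candidate_parents("playfully")[:4])
    # [('pla', 'suffix:yfully'), ('play', 'suffix:fully'), ...]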

[TACL] Modeling Word Forms Using Latent Underlying Morphs and Phonology
Ryan Cotterell, Nanyun Peng, and Jason Eisner 15:00–15:25
The observed pronunciations or spellings of words are often explained as arising from the "underlying forms"
of their morphemes. These forms are latent strings that linguists try to reconstruct by hand. We propose to
reconstruct them automatically at scale, enabling generalization to new words. Given some surface word types
of a concatenative language along with the abstract morpheme sequences that they express, we show how to
recover consistent underlying forms for these morphemes, together with the (stochastic) phonology that maps
each concatenation of underlying forms to a surface form. Our technique involves loopy belief propagation in a
natural directed graphical model whose variables are unknown strings and whose conditional distributions are
encoded as finite-state machines with trainable weights. We define training and evaluation paradigms for the
task of surface word prediction, and report results on subsets of 6 languages.


Session 9D: NLP for the Web: Twitter


311A Chair: Oren Tsur
An analysis of the user occupational class through Twitter content
Daniel Preotiuc-Pietro, Vasileios Lampos, and Nikolaos Aletras 14:35–15:00
Social media content can be used as a complementary source to the traditional methods for extracting and
studying collective social attributes. This study focuses on the prediction of the occupational class for a public
user profile. Our analysis is conducted on a new annotated corpus of Twitter users, their respective job titles,
posted textual content and platform-related attributes. We frame our task as classification using latent feature
representations such as word clusters and embeddings. The employed linear and, especially, non-linear methods
can predict a user's occupational class with strong accuracy for the coarsest level of a standard occupation
taxonomy, which includes nine classes. Combined with a qualitative assessment, the derived results confirm the
feasibility of our approach in inferring a new user attribute that can be embedded in a multitude of downstream
applications.

Tracking unbounded Topic Streams


Dominik Wurzer, Victor Lavrenko, and Miles Osborne 15:00–15:25
Tracking topics on social media streams is non-trivial as the number of topics mentioned grows without bound.
This complexity is compounded when we want to track such topics against other fast moving streams. We
go beyond traditional small scale topic tracking and consider a stream of topics against another document
stream. We introduce two tracking approaches which are fully applicable to true streaming environments.
When tracking 4.4 million topics against 52 million documents in constant time and space, we demonstrate
that counter to expectations, simple single-pass clustering can outperform locality sensitive hashing for nearest
neighbour search on streams.
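
The single-pass clustering the authors build on is easy to sketch (toy version below with Jaccard overlap and an invented threshold; unlike the paper's constant-space variant, this one grows with the stream):

    def single_pass_cluster(stream, threshold=0.3):
        """Greedy one-pass clustering: join the most similar existing
        cluster, or open a new one if nothing is similar enough."""
        clusters, labels = [], []
        for doc in stream:
            words = set(doc.split())
            best, best_sim = None, 0.0
            for i, c in enumerate(clusters):
                sim = len(words & c) / len(words | c)
                if sim > best_sim:
                    best, best_sim = i, sim
            if best is not None and best_sim >= threshold:
                clusters[best] |= words
                labels.append(best)
            else:
                clusters.append(words)
                labels.append(len(clusters) - 1)
        return labels

    stream = ["euro crisis greece", "greece euro bailout",
              "world cup final", "cup final goal"]
    print(single_pass_cluster(stream))   # [0, 0, 1, 1]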


Session 9E: POS Tagging


311B Chair: Yue Zhang
Inducing Word and Part-of-Speech with Pitman-Yor Hidden Semi-Markov Models
Kei Uchiumi, Hiroshi Tsukahara, and Daichi Mochihashi 14:35–15:00
We propose a nonparametric Bayesian model for joint unsupervised word segmentation and part-of-speech
tagging. Extending a previous model for word segmentation, our model, called a Pitman-Yor Hidden Semi-
Markov Model (PYHSMM), can be considered a method to build a class n-gram language model directly on
raw strings, while integrating character- and word-level information. Experimental results on standard datasets
in Japanese, Chinese and Thai reveal that it outperforms previous methods, yielding state-of-the-art accuracies.
This model will also serve to analyze the structure of languages whose words are not identified a priori.
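
The Pitman-Yor prior at the core of such models can be simulated via its Chinese restaurant process view; the snippet below (the standard construction, not the authors' PYHSMM) shows how the discount d produces the power-law cluster sizes typical of word frequencies:

    import random

    def pitman_yor_crp(n, d=0.5, theta=1.0, seed=0):
        """Seat n customers: existing table k is chosen with probability
        proportional to (count_k - d), a new table with probability
        proportional to (theta + d * num_tables)."""
        random.seed(seed)
        tables = []
        for _ in range(n):
            weights = [c - d for c in tables] + [theta + d * len(tables)]
            k = random.choices(range(len(weights)), weights=weights)[0]
            if k == len(tables):
                tables.append(1)
            else:
                tables[k] += 1
        return sorted(tables, reverse=True)

    print(pitman_yor_crp(1000)[:10])   # a few big tables, then a long tail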

Coupled Sequence Labeling on Heterogeneous Annotations: POS Tagging as a Case Study


Zhenghua Li, Jiayuan Chao, Min Zhang, and Wenliang Chen 15:00–15:25
In order to effectively utilize multiple datasets with heterogeneous annotations, this paper proposes a coupled
sequence labeling model that can directly learn and infer two heterogeneous annotations simultaneously, and
to facilitate discussion we use Chinese part-of-speech (POS) tagging as our case study. The key idea is to bundle
two sets of POS tags together, e.g., [NN, n], and build a conditional random field (CRF) based tagging model
in the enlarged space of bundled tags with the help of ambiguous labelings. To train our model on two non-
overlapping datasets, each of which has only one-side tags, we transform a one-side tag into a set of bundled tags
by considering all possible mappings at the missing side and derive an objective function based on ambiguous
labelings. The key advantage of our coupled model is that it provides the flexibility of (1) incorporating
joint features on the bundled tags to implicitly learn the loose mapping between heterogeneous annotations,
and (2) exploring separate features on one-side tags to overcome the data sparseness problem of using only
bundled tags. Experiments on benchmark datasets show that our coupled model significantly outperforms the
state-of-the-art baselines on both one-side POS tagging and annotation conversion tasks. The code and newly
annotated data are released for non-commercial usage.
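
The bundling trick is easy to picture with a toy tag inventory (an invented subset below): a token annotated on only one side is expanded into every compatible bundled tag, producing the ambiguous labeling the coupled CRF is trained on.

    CTB_TAGS = ["NN", "VV"]        # annotation scheme A (toy subset)
    PD_TAGS = ["n", "v", "vn"]     # annotation scheme B (toy subset)

    def expand_one_side(tag, side):
        """Map a one-side tag to its set of bundled tags, leaving the
        missing side fully ambiguous."""
        if side == "A":
            return {(tag, t) for t in PD_TAGS}
        return {(t, tag) for t in CTB_TAGS}

    # A token labeled only "NN" on side A becomes an ambiguous labeling:
    print(sorted(expand_one_side("NN", side="A")))
    # [('NN', 'n'), ('NN', 'v'), ('NN', 'vn')]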


Best Paper Session

Wednesday, July 29, 2015


Session BP: Best Paper Session
309A Chairs: Chengqing Zong and Michael Strube
AutoExtend: Extending Word Embeddings to Embeddings for Synsets and Lexemes
Sascha Rothe and Hinrich Schütze 15:55–16:20
We present AutoExtend, a system to learn embeddings for synsets and lexemes. It is flexible in that it can take
any word embeddings as input and does not need an additional training corpus. The synset/lexeme embeddings
obtained live in the same vector space as the word embeddings. A sparse tensor formalization guarantees
efficiency and parallelizability. We use WordNet as a lexical resource, but AutoExtend can be easily applied to
other resources like Freebase. AutoExtend achieves state-of-the-art performance on word similarity and word
sense disambiguation tasks.

Improving Evaluation of Machine Translation Quality Estimation


Yvette Graham 16:20–16:45
Quality estimation evaluation commonly takes the form of measurement of the error that exists between pre-
dictions and gold standard labels for a particular test set of translations. Issues can arise during comparison of
quality estimation prediction score distributions and gold label distributions, however. In this paper, we provide
an analysis of methods of comparison and identify areas of concern with respect to widely used measures, such
as the ability to gain by prediction of aggregate statistics specific to gold label distributions or by optimally
conservative variance in prediction score distributions. As an alternative, we propose the use of the unit-free
Pearson correlation, in addition to providing an appropriate method of significance testing improvements over
a baseline. Components of WMT-13 and WMT-14 quality estimation shared tasks are replicated to reveal
substantially increased conclusivity in system rankings, including identification of outright winners of tasks.
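
The paper's central concern is easy to demonstrate numerically (toy data below, using scipy; scales and sizes invented): a system that merely predicts an aggregate statistic of the gold labels can beat a genuinely informative system on absolute error, while the unit-free Pearson correlation exposes it.

    import numpy as np
    from scipy.stats import pearsonr

    rng = np.random.default_rng(0)
    gold = rng.normal(size=200)                                # gold QE labels
    noisy = gold + rng.normal(scale=1.5, size=200)             # informative but noisy
    constant = gold.mean() + rng.normal(scale=0.01, size=200)  # near-constant guess

    mae = lambda pred: np.abs(pred - gold).mean()
    print(mae(noisy), pearsonr(noisy, gold)[0])        # high error, r around 0.55
    print(mae(constant), pearsonr(constant, gold)[0])  # lower error, r around 0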

6 CoNLL

Conference Program

Thursday, July 30, 2015

08:45-09:00 Opening Remarks

09:00-10:10 Session 1.a: Embedding input and output representations

Multichannel Variable-Size Convolution for Sentence Classification


Wenpeng Yin and Hinrich Schütze

Task-Oriented Learning of Word Embeddings for Semantic Relation Classification


Kazuma Hashimoto, Pontus Stenetorp, Makoto Miwa and Yoshimasa Tsuruoka

Symmetric Pattern Based Word Embeddings for Improved Word Similarity Prediction


Roy Schwartz, Roi Reichart and Ari Rappoport

A Coactive Learning View of Online Structured Prediction in Statistical Machine Translation


Artem Sokolov, Stefan Riezler and Shay B. Cohen

10:10-10:30 Session 1.b: Entity Linking (spotlight presentations)

A Joint Framework for Coreference Resolution and Mention Head Detection


Haoruo Peng, Kai-Wei Chang and Dan Roth

Entity Linking Korean Text: An Unsupervised Learning Approach using Semantic Relations


Youngsik Kim and Key-Sun Choi

Linking Entities Across Images and Text


Rebecka Weegar, Kalle Åström and Pierre Nugues


Recovering Traceability Links in Requirements Documents


Zeheng Li, Mingrui Chen, LiGuo Huang and Vincent Ng

10:30-11:00 Coffee Break

11:00-12:00 Session 2.a: Keynote Speech

On Spectral Graphical Models, and a New Look at Latent Variable Modeling in Natural Language Processing


Eric Xing, Carnegie Mellon University

12:00-12:30 Session 2.b: Short Paper Spotlights

Deep Neural Language Models for Machine Translation


Thang Luong, Michael Kayser and Christopher D. Manning

Reading behavior predicts syntactic categories


Maria Barrett and Anders Søgaard

One Million Sense-Tagged Instances for Word Sense Disambiguation and Induction
Kaveh Taghipour and Hwee Tou Ng

Model Selection for Type-Supervised Learning with Application to POS Tagging


Kristina Toutanova, Waleed Ammar, Pallavi Choudhury and Hoifung Poon

Feature Selection for Short Text Classification using Wavelet Packet Transform
Anuj Mahajan, Sharmistha Jat and Shourya Roy

Do dependency parsing metrics correlate with human judgments?


Barbara Plank, Héctor Martínez Alonso, Željko Agić, Danijela Merkler and Anders
Søgaard

Learning Adjective Meanings with a Tensor-Based Skip-Gram Model


Jean Maillard and Stephen Clark

Accurate Cross-lingual Projection between Count-based Word Vectors by Exploiting Translatable Context Pairs


Shonosuke Ishiwatari, Nobuhiro Kaji, Naoki Yoshinaga, Masashi Toyoda and
Masaru Kitsuregawa

Finding Opinion Manipulation Trolls in News Community Forums


Todor Mihaylov, Georgi Georgiev and Preslav Nakov

12:30-14:00 Lunch Break

14:00-15:30 Session 3: CoNLL Shared Task

The CoNLL-2015 Shared Task on Shallow Discourse Parsing


Nianwen Xue, Hwee Tou Ng, Sameer Pradhan, Rashmi Prasad, Christopher Bryant,
and Attapol Rutherford


A Refined End-to-End Discourse Parser


Jianxiang Wang and Man Lan

The UniTN Discourse Parser in CoNLL 2015 Shared Task: Token-level Sequence
Labeling with Argument-specific Models
Evgeny Stepanov, Giuseppe Riccardi and Ali Orkan Bayer

The SoNLP-DP System in the CoNLL-2015 shared Task


Fang Kong, Sheng Li and Guodong Zhou

15:30-16:00 Coffee Break

16:00-17:10 Session 4.a: Syntactic Parsing


A Supertag-Context Model for Weakly-Supervised CCG Parser Learning
Dan Garrette, Chris Dyer, Jason Baldridge and Noah A. Smith

Transition-based Spinal Parsing


Miguel Ballesteros and Xavier Carreras

Cross-lingual Transfer for Unsupervised Dependency Parsing Without Parallel Data


Long Duong, Trevor Cohn, Steven Bird and Paul Cook

Incremental Recurrent Neural Network Dependency Parser with Search-based


Discriminative Training
Majid Yazdani and James Henderson

17:10-17:30 Session 4.b: Cross-language studies (spotlight presentations)

Structural and lexical factors in adjective placement in complex noun phrases across Romance languages


Kristina Gulordava and Paola Merlo

Instance Selection Improves Cross-Lingual Model Training for Fine-Grained Sentiment Analysis


Roman Klinger and Philipp Cimiano

Annotation Projection-based Representation Learning for Cross-lingual Dependency Parsing


Min Xiao and Yuhong Guo

Friday, July 31, 2015

09:00-10:10 Session 5.a: Semantics

An Iterative Similarity based Adaptation Technique for Cross-domain Text Classification


Himanshu Sharad Bhatt, Deepali Semwal and Shourya Roy

Detecting Semantically Equivalent Questions in Online User Forums


Dasha Bogdanova, Cicero dos Santos, Luciano Barbosa and Bianca Zadrozny


Making the Most of Crowdsourced Document Annotations: Confused Supervised LDA


Paul Felt, Eric Ringger, Jordan Boyd-Graber and Kevin Seppi

Big Data Small Data, In Domain Out-of Domain, Known Word Unknown Word: The
Impact of Word Representations on Sequence Labelling Tasks
Lizhen Qu, Gabriela Ferraro, Liyuan Zhou, Weiwei Hou, Nathan Schneider and
Timothy Baldwin

10:10-10:30 Session 5.b: Extraction and Labeling (spotlight presentations)

Labeled Morphological Segmentation with Semi-Markov Models


Ryan Cotterell, Thomas Müller, Alexander Fraser and Hinrich Schütze

Opinion Holder and Target Extraction based on the Induction of Verbal Categories
Michael Wiegand and Josef Ruppenhofer

Temporal Information Extraction from Korean Texts


Young-Seob Jeong, Zae Myung Kim, Hyun-Woo Do, Chae-Gyun Lim and Ho-Jin
Choi

10:30-11:00 Coffee Break

11:00-12:00 Session 6.a: Keynote Speech

Does the success of deep neural network language processing mean – finally! – the
end of theoretical linguistics?
Paul Smolensky, Johns Hopkins University

12:00-12:30 Session 6.b: Business Meeting

12:30-14:00 Lunch Break

14:00-15:10 Session 7.a: Language Structure

Cross-lingual syntactic variation over age and gender


Anders Johannsen, Dirk Hovy and Anders Søgaard

A Synchronous Hyperedge Replacement Grammar based approach for AMR parsing


Xiaochang Peng, Linfeng Song and Daniel Gildea

Contrastive Analysis with Predictive Power: Typology Driven Estimation of Grammatical Error Distributions in ESL


Yevgeni Berzak, Roi Reichart and Boris Katz

AIDA2: A Hybrid Approach for Token and Sentence Level Dialect Identification in
Arabic
Mohamed Al-Badrashiny, Heba Elfardy and Mona Diab

154
ThursdayFriday, July 3031, 2015

15:10-15:30 Session 7.b: CoNLL Mix (spotlight presentations)

Analyzing Optimization for Statistical Machine Translation: MERT Learns Verbosity, PRO Learns Length


Francisco Guzmán, Preslav Nakov and Stephan Vogel

Learning to Exploit Structured Resources for Lexical Inference


Vered Shwartz, Omer Levy, Ido Dagan and Jacob Goldberger

Quantity, Contrast, and Convention in Cross-Situated Language Comprehension


Ian Perera and James Allen

15:30-16:00 Coffee Break


16:00-17:30 Session 8.a: Joint Poster Presentation (long, short and shared task papers)

Long Papers
A Joint Framework for Coreference Resolution and Mention Head Detection
Haoruo Peng, Kai-Wei Chang and Dan Roth

Entity Linking Korean Text: An Unsupervised Learning Approach using Semantic Relations


Youngsik Kim and Key-Sun Choi

Linking Entities Across Images and Text


Rebecka Weegar, Kalle Åström and Pierre Nugues

Recovering Traceability Links in Requirements Documents


Zeheng Li, Mingrui Chen, LiGuo Huang and Vincent Ng
Structural and lexical factors in adjective placement in complex noun phrases
across Romance languages
Kristina Gulordava and Paola Merlo

Instance Selection Improves Cross-Lingual Model Training for Fine-Grained Sentiment Analysis


Roman Klinger and Philipp Cimiano

Annotation Projection-based Representation Learning for Cross-lingual Dependency Parsing


Min Xiao and Yuhong Guo

Labeled Morphological Segmentation with Semi-Markov Models


Ryan Cotterell, Thomas Müller, Alexander Fraser and Hinrich Schütze

Opinion Holder and Target Extraction based on the Induction of Verbal Categories
Michael Wiegand and Josef Ruppenhofer

Temporal Information Extraction from Korean Texts


Young-Seob Jeong, Zae Myung Kim, Hyun-Woo Do, Chae-Gyun Lim and Ho-Jin
Choi

Analyzing Optimization for Statistical Machine Translation: MERT Learns Verbosity, PRO Learns Length


Francisco Guzmán, Preslav Nakov and Stephan Vogel

Learning to Exploit Structured Resources for Lexical Inference


Vered Shwartz, Omer Levy, Ido Dagan and Jacob Goldberger

Quantity, Contrast, and Convention in Cross-Situated Language Comprehension


Ian Perera and James Allen

Short Papers
Deep Neural Language Models for Machine Translation
Thang Luong, Michael Kayser and Christopher D. Manning

Reading behavior predicts syntactic categories


Maria Barrett and Anders Søgaard

One Million Sense-Tagged Instances for Word Sense Disambiguation and Induction
Kaveh Taghipour and Hwee Tou Ng

Model Selection for Type-Supervised Learning with Application to POS Tagging


Kristina Toutanova, Waleed Ammar, Pallavi Choudhury and Hoifung Poon

Feature Selection for Short Text Classification using Wavelet Packet Transform
Anuj Mahajan, Sharmistha Jat and Shourya Roy

Do dependency parsing metrics correlate with human judgments?


Barbara Plank, Héctor Martínez Alonso, Željko Agić, Danijela Merkler and Anders
Søgaard

Learning Adjective Meanings with a Tensor-Based Skip-Gram Model


Jean Maillard and Stephen Clark

Accurate Cross-lingual Projection between Count-based Word Vectors by Exploiting Translatable Context Pairs


Shonosuke Ishiwatari, Nobuhiro Kaji, Naoki Yoshinaga, Masashi Toyoda and
Masaru Kitsuregawa

Finding Opinion Manipulation Trolls in News Community Forums


Todor Mihaylov, Georgi Georgiev and Preslav Nakov

Shared Task Papers


A Hybrid Discourse Relation Parser in CoNLL 2015
Sobha Lalitha Devi, Sindhuja Gopalan, Lakshmi S, Pattabhi RK Rao, Vijay Sundar
Ram and Malarkodi C.S.

A Minimalist Approach to Shallow Discourse Parsing and Implicit Relation Recognition


Christian Chiarcos and Niko Schenk

A Shallow Discourse Parsing System Based On Maximum Entropy Model


Jia Sun, Peijia Li, Weiqun Xu and Yonghong Yan

Hybrid Approach to PDTB-styled Discourse Parsing for CoNLL-2015


Yasuhisa Yoshida, Katsuhiko Hayashi, Tsutomu Hirao and Masaaki Nagata

Improving a Pipeline Architecture for Shallow Discourse Parsing


Yangqiu Song, Haoruo Peng, Parisa Kordjamshidi, Mark Sammons and Dan Roth

JAIST: A two-phase machine learning approach for identifying discourse relations in newswire texts


Son Nguyen, Quoc Ho and Minh Nguyen

Shallow Discourse Parsing Using Constituent Parsing Tree


Changge Chen, Peilu Wang and Hai Zhao

Shallow Discourse Parsing with Syntactic and (a Few) Semantic Features


Shubham Mukherjee, Abhishek Tiwari, Mohit Gupta and Anil Kumar Singh


The CLaC Discourse Parser at CoNLL-2015


Majid Laali, Elnaz Davoodi and Leila Kosseim

The DCU Discourse Parser for Connective, Argument Identification and Explicit
Sense Classification
Longyue Wang, Chris Hokamp, Tsuyoshi Okita, Xiaojun Zhang and Qun Liu

The DCU Discourse Parser: A Sense Classification Task


Tsuyoshi Okita, Longyue Wang and Qun Liu

17:30-17:45 Session 8.b: Best Paper Award and Closing

7 Workshops

Thursday–Friday
401 Eighth SIGHAN Workshop on Chinese Language Processing p.160

Thursday
301A Arabic Natural Language Processing p.163
301B Grammar Engineering Across Frameworks (GEAF 2015) p.165
302A Eighth Workshop on Building and Using Comparable Corpora p.166
302B Semantics-Driven Statistical Machine Translation: Theory and p.168
Practice
303A Novel Computational Approaches to Keyphrase Extraction p.169
303B SIGHUM Workshop on Language Technology for Cultural p.170
Heritage, Social Sciences, and Humanities
305 BioNLP 2015 p.172

Friday
301A The Fifth Named Entities Workshop p.174
301B Third Workshop on Continuous Vector Space Models and their p.176
Compositionality
302A Fourth Workshop on Hybrid Approaches to Translation p.177
302B Fourth Workshop on Linked Data in Linguistics: Resources and p.179
Applications
303A Workshop on Noisy User-generated Text p.180
303B The 2nd Workshop on Natural Language Processing Techniques p.183
for Educational Applications
305 The First Workshop on Computing News Storylines p.185

Two-day Workshops

Workshop 1: Eighth SIGHAN Workshop on Chinese Language


Processing

Organizers: Liang-Chih Yu, Zhifang Sui, Yue Zhang, and Vincent Ng


Venue: 401

Thursday, July 30, 2015


09:00–09:10 Opening Session
09:10–10:30 Invited Talk
09:10–10:30 Discourse and Machine Translation
10:30–10:50 Coffee Break
10:50–12:30 Workshop Session
10:50–11:10 Sequential Annotation and Chunking of Chinese Discourse Structure
Frances Yung, Kevin Duh, and Yuji Matsumoto
11:10–11:30 Create a Manual Chinese Word Segmentation Dataset Using Crowdsourcing
Method
Shichang Wang, Chu-Ren Huang, Yao Yao, and Angel Chan
11:30–11:50 Chinese Named Entity Recognition with Graph-based Semi-supervised
Learning Model
Aaron Li-Feng Han, Xiaodong Zeng, Derek F. Wong, and Lidia S. Chao
11:50–12:10 Sentence selection for automatic scoring of Mandarin proficiency
Jiahong Yuan, Xiaoying Xu, Wei Lai, Weiping Ye, Xinru Zhao, and
Mark Liberman
12:10–12:30 ACBiMA: Advanced Chinese Bi-Character Word Morphological Analyzer
Ting-Hao Huang, Yun-Nung Chen, and Lingpeng Kong
12:30–14:30 Lunch
14:30–15:30 Invited Talk
14:30–15:30 From Lexical to Compositional Chinese Sentiment Analysis
15:30–16:00 Coffee Break
16:00–17:20 Bake-off Task 1: Chinese Spelling Check
16:00–16:20 Introduction to SIGHAN 2015 Bake-off for Chinese Spelling Check
Yuen-Hsien Tseng, Lung-Hao Lee, Li-Ping Chang, and Hsin-Hsi Chen
16:20–16:40 HANSpeller++: A Unified Framework for Chinese Spelling Correction
Shuiyuan Zhang, Jinhua Xiong, Jianpeng Hou, Qiao Zhang, and Xueqi Cheng
16:40–17:00 Word Vector/Conditional Random Field-based Chinese Spelling Error
Detection for SIGHAN 2015 Bake-off Evaluation
Yih-Ru Wang and Yuan-Fu Liao
17:00–17:20 Introduction to a Proofreading Tool for Chinese Spelling Check Task of
SIGHAN-8
Tao-Hsing Chang, Hsueh-Chih Chen, and Cheng-Han Yang


Friday, July 31, 2015


09:00–10:30 Invited Talk
09:00–10:30 Intelligent Q&A System and NLP Open Platform
10:30–11:00 Coffee Break
11:00–12:20 Bake-off Task 2: Topic-Based Chinese Message Polarity Classification
11:00–11:20 Overview of Topic-based Chinese Message Polarity Classification in SIGHAN
2015
Xiangwen Liao, Binyang Li, and Liheng Xu
11:20–11:40 A Joint Model for Chinese Microblog Sentiment Analysis
Yuhui Cao, Zhao Chen, Ruifeng Xu, Tao Chen, and Lin Gui
11:40–12:00 Learning Salient Samples and Distributed Representations for Topic-Based
Chinese Message Polarity Classification
Xin Kang, Yunong Wu, and Zhifei Zhang
12:00–12:20 An combined sentiment classification system for SIGHAN-8
Qiuchi Li, Qiyu Zhi, and Miao Li
12:20–14:00 Lunch
14:00–15:20 Poster Session
14:00–14:05 Linguistic Knowledge-driven Approach to Chinese Comparative Elements
Extraction
MinJun Park and Yulin Yuan
14:05–14:10 A CRF Method of Identifying Prepositional Phrases in Chinese Patent Texts
Hongzheng Li and Yaohong Jin
14:10–14:15 Emotion in Code-switching Texts: Corpus Construction and Analysis
Sophia Lee and Zhongqing Wang
14:15–14:20 Chinese in the Grammatical Framework: Grammar, Translation, and Other
Applications
Aarne Ranta, Tian Yan, and Haiyan Qiao
14:20–14:25 KWB: An Automated Quick News System for Chinese Readers
Yiqi Bai, Wenjing Yang, Hao Zhang, Jingwen Wang, Ming Jia, Roland Tong,
and Jie Wang
14:25–14:30 Chinese Semantic Role Labeling using High-quality Syntactic Knowledge
Gongye Jin, Daisuke Kawahara, and Sadao Kurohashi
14:35–14:40 Chinese Spelling Check System Based on N-gram Model
Weijian Xie, Peijie Huang, Xinrui Zhang, Kaiduo Hong, Qiang Huang,
Bingzhou Chen, and Lei Huang
14:40–14:45 NTOU Chinese Spelling Check System in Sighan-8 Bake-off
Wei-Cheng Chu and Chuan-Jie Lin
14:45–14:50 Topic-Based Chinese Message Sentiment Analysis: A Multilayered Analysis
System
Hongjie Li, Zhongqian Sun, and Wei Yang
14:50–14:55 Rule-Based Weibo Messages Sentiment Polarity Classification towards Given
Topics
Hongzhao Zhou, Yonglin Teng, Min Hou, Wei He, Hongtao Zhu, Xiaolin Zhu,
and Yanfei Mu
14:55–15:00 Topic-Based Chinese Message Polarity Classification System at
SIGHAN8-Task2
Chun Liao, Chong Feng, Sen Yang, and Heyan Huang
15:00–15:05 CT-SPA: Text sentiment polarity prediction model using semi-automatically
expanded sentiment lexicon
Tao-Hsing Chang, Ming-Jhih Lin, Chun-Hsien Chen, and Shao-Yu Wang


15:05–15:10 Chinese Microblogs Sentiment Classification using Maximum Entropy
Dashu Ye, Peijie Huang, Kaiduo Hong, Zhuoying Tang, Weijian Xie, and
Guilong Zhou
15:10–15:15 NDMSCS: A Topic-Based Chinese Microblog Polarity Classification System
Yang Wang, Yaqi Wang, Shi Feng, Daling Wang, and Yifei Zhang
15:15–15:20 NEUDM: A System for Topic-Based Message Polarity Classification
Yaqi Wang, Shi Feng, Daling Wang, and Yifei Zhang
15:20–15:30 Closing Session


Workshop 2: Arabic Natural Language Processing

Organizers: Nizar Habash, Stephan Vogel, and Kareem Darwish


Venue: 301A

Thursday, July 30, 2015


09:00–10:00 Main Workshop Papers - Oral Presentations - Session 1
09:00–09:20 Classifying Arab Names Geographically
Hamdy Mubarak and Kareem Darwish
09:20–09:40 Deep Learning Models for Sentiment Analysis in Arabic
Ahmad Al Sallab, Hazem Hajj, Gilbert Badaro, Ramy Baly, Wassim El Hajj,
and Khaled Bashir Shaban
09:40–10:00 A Light Lexicon-based Mobile Application for Sentiment Mining of Arabic
Tweets
Gilbert Badaro, Ramy Baly, Rana Akel, Linda Fayad, Jeffrey Khairallah,
Hazem Hajj, Khaled Bashir Shaban, and Wassim El-Hajj
10:00–10:30 Shared Task Talk
10:00–10:30 The Second QALB Shared Task on Automatic Text Correction for Arabic
Alla Rozovskaya, Houda Bouamor, Nizar Habash, Wajdi Zaghouani,
Ossama Obeid, and Behrang Mohit
10:30–11:00 Break
11:00–12:00 Main Workshop Papers - Oral Presentations - Session 2
11:00–11:20 Natural Language Processing for Dialectical Arabic: A Survey
Abdulhadi Shoufan and Sumaya Alameri
11:20–11:40 DIWAN: A Dialectal Word Annotation Tool for Arabic
Faisal Al-Shargi and Owen Rambow
11:40–12:00 POS-tagging of Tunisian Dialect Using Standard Arabic Resources and Tools
Ahmed Hamdi, Alexis Nasr, Nizar Habash, and Nuria Gala
12:00–14:00 Lunch + Poster Setup
14:00–15:30 Poster Session
14:00–15:30 Posters: Main Workshop Papers
A Conventional Orthography for Algerian Arabic
Houda Saadane and Nizar Habash
A Pilot Study on Arabic Multi-Genre Corpus Diacritization
Houda Bouamor, Wajdi Zaghouani, Mona Diab, Ossama Obeid,
Kemal Oflazer, Mahmoud Ghoneim, and Abdelati Hawwari
Annotating Targets of Opinions in Arabic using Crowdsourcing
Noura Farra, Kathy McKeown, and Nizar Habash
Best Practices for Crowdsourcing Dialectal Arabic Speech Transcription
Samantha Wray, Hamdy Mubarak, and Ahmed Ali
Joint Arabic Segmentation and Part-Of-Speech Tagging
Shabib AlGahtani and John McNaught
Multi-Reference Evaluation for Dialectal Speech Recognition System: A Study
for Egyptian ASR
Ahmed Ali, Walid Magdy, and Steve Renals


14:00–15:30 Posters: Shared Task Papers


Arib@QALB-2015 Shared Task: A Hybrid Cascade Model for Arabic Spelling
Error Detection and Correction
Nouf AlShenaifi, Rehab AlNefie, Maha Al-Yahya, and Hend Al-Khalifa
CUFE@QALB-2015 Shared Task: Arabic Error Correction System
Michael Nawar
GWU-HASP-2015@QALB2015 Shared Task: Priming Spelling Candidates
with Probability
Mohammed Attia, Mohamed Al-Badrashiny, and Mona Diab
QCMUQ@QALB-2015 Shared Task: Combining Character level MT and
Error-tolerant Finite-State Recognition for Arabic Spelling Correction
Houda Bouamor, Hassan Sajjad, Nadir Durrani, and Kemal Oflazer
QCRI@QALB-2015 Shared Task: Correction of Arabic Text for Native and
Non-Native Speakers Errors
Hamdy Mubarak, Kareem Darwish, and Ahmed Abdelali
SAHSOH@QALB-2015 Shared Task: A Rule-Based Correction Method of
Common Arabic Native and Non-Native Speakers Errors
Wajdi Zaghouani, Taha Zerrouki, and Amar Balla
TECHLIMED@QALB-Shared Task 2015: a hybrid Arabic Error Correction
System
Djamel Mostefa, Jaber Abualasal, Omar Asbayou, Mahmoud Gzawi, and
Ramzi Abbes
UMMU@QALB-2015 Shared Task: Character and Word level SMT pipeline
for Automatic Error Correction of Arabic Text
Fethi Bougares and Houda Bouamor
15:30–16:00 Break
16:00–17:00 Main Workshop Papers - Oral Presentations - Session 3
16:00–16:20 Robust Part-of-speech Tagging of Arabic Text
Hanan Aldarmaki and Mona Diab
16:20–16:40 Answer Selection in Arabic Community Question Answering: A Feature-Rich
Approach
Yonatan Belinkov, Alberto Barrón-Cedeño, and Hamdy Mubarak
16:40–17:00 EDRAK: Entity-Centric Data Resource for Arabic Knowledge
Mohamed H. Gad-elrab, Mohamed Amir Yosef, and Gerhard Weikum
17:00–18:00 Workshop Group Discussion


Workshop 3: Grammar Engineering Across Frameworks (GEAF


2015)

Organizers: Emily M. Bender, Lori Levin, Stefan Müller, Yannick Parmentier, and Aarne Ranta
Venue: 301B

Thursday, July 30, 2015


9:15–9:30 Opening session
9:30–10:30 Session 1
09:30–10:00 Grammar Engineering for a Customer: a Case Study with Five Languages
Aarne Ranta, Christina Unger, and Daniel Vidal Hussey
10:00–10:30 Building an HPSG-based Indonesian Resource Grammar (INDRA)
David Moeljadi, Francis Bond, and Sanghoun Song
10:30–11:00 Coffee break
11:00–12:30 Session 2
11:00–11:30 An HPSG-based Shared-Grammar for the Chinese Languages: ZHONG
Zhenzhen Fan, Sanghoun Song, and Francis Bond
11:30–12:00 Parsing Chinese with a Generalized Categorial Grammar
Manjuan Duan and William Schuler
12:00–12:30 Orthography Engineering in Grammatical Framework
Krasimir Angelov
12:30–14:00 Lunch break
14:00–15:30 Session 3
14:00–14:30 A Cloud-Based Editor for Multilingual Grammars
Thomas Hallgren, Ramona Enache, and Aarne Ranta
14:30–15:00 Formalising the Swedish Constructicon in Grammatical Framework
Normunds Gruzitis, Dana Dannélls, Benjamin Lyngfelt, and Aarne Ranta
15:00–15:30 Representing Honorifics via Individual Constraints
Sanghoun Song
15:30–16:00 Coffee break
16:00–18:00 Session 4
16:00–16:30 Resumption and Extraction in an Implemented HPSG of Hausa
Berthold Crysmann
16:45–18:00 Panel discussion: the future of grammar engineering


Workshop 4: Eighth Workshop on Building and Using Comparable


Corpora

Organizers: Pierre Zweigenbaum, Serge Sharoff, and Reinhard Rapp


Venue: 302A

Thursday, July 30, 2015


09:00–09:05 Introduction to the BUCC Workshop
09:05–10:05 Invited talk:
09:05–10:05 Augmented Comparative Corpora and Monitoring Corpus in Chinese: LIVAC
and Sketch Search Engine Compared
Benjamin K. Tsou
10:05–10:30 A Factory of Comparable Corpora from Wikipedia
Alberto Barrón-Cedeño, Cristina España-Bonet, Josu Boldoba, and
Lluís Màrquez
10:30–11:00 Coffee break
11:00–11:25 Knowledge-lean projection of coreference chains across languages
Yulia Grishina and Manfred Stede
11:25–11:50 Projective methods for mining missing translations in DBpedia
Laurent Jakubina and Philippe Langlais
11:50–12:05 Attempting to Bypass Alignment from Comparable Corpora via Pivot
Language
Alexis Linard, Béatrice Daille, and Emmanuel Morin
12:05–12:20 Application of a Corpus to Identify Gaps between English Learners and Native
Speakers
Katsunori Kotani and Takehiko Yoshimi
12:30–14:00 Lunch break
14:00–14:25 A Generative Model for Extracting Parallel Fragments from Comparable
Documents
Somayeh Bakhshaei, Shahram Khadivi, and Reza Safabakhsh
14:25–14:50 Evaluating Features for Identifying Japanese-Chinese Bilingual Synonymous
Technical Terms from Patent Families
Zi Long, Takehito Utsuro, Tomoharu Mitsuhashi, and Mikio Yamamoto
14:50–15:05 Extracting Bilingual Lexica from Comparable Corpora Using Self-Organizing
Maps
Hyeong-Won Seo, Minah Cheon, and Jae-Hoon Kim
15:05–15:20 Obtaining SMT dictionaries for related languages
Miguel Rios and Serge Sharoff
15:30–16:00 Coffee break
16:00–16:15 BUCC Shared Task: Cross-Language Document Similarity
Serge Sharoff, Pierre Zweigenbaum, and Reinhard Rapp
16:15–16:30 AUT Document Alignment Framework for BUCC Workshop Shared Task
Atefeh Zafarian, Amir Pouya Agha Sadeghi, Fatemeh Azadi, Sonia Ghiasifard,
Zeinab Ali Panahloo, Somayeh Bakhshaei, and Seyed Mohammad
Mohammadzadeh Ziabary
16:30–16:45 LINA: Identifying Comparable Documents from Wikipedia
Emmanuel Morin, Amir Hazem, Florian Boudin, and
Elizaveta Loginova-Clouet


16:45–17:00 Shared Task: General Discussion


Workshop 5: Semantics-Driven Statistical Machine Translation:


Theory and Practice

Organizers: Deyi Xiong, Kevin Duh, Christian Hardmeier, and Roberto Navigli
Venue: 302B

Thursday, July 30, 2015


8:45–9:00 Opening Remarks
9:00–10:30 Session 1
9:00–10:00 Keynote Speech: Semantic Parsing as, via, and for Machine Translation
Percy Liang
10:00–10:30 Round trips with meaning stopovers
Alastair Butler
10:30–11:00 Coffee Break
11:00–12:30 Session 2
11:00–12:00 Keynote Speech: Learning Multilingual Semantics from Big Data on the Web
Gerard de Melo
12:00–12:30 Conceptual Annotations Preserve Structure Across Translations: A
French-English Case Study
Elior Sulem, Omri Abend, and Ari Rappoport
12:30–13:30 Lunch
13:30–15:30 Session 3
13:30–14:30 Keynote Speech: Sequence to Sequence Learning for Language Understanding
Quoc V. Le
14:30–15:00 Integrating Case Frame into Japanese to Chinese Hierarchical Phrase-based
Translation Model
Jinan Xu, Jiangming Liu, Yufeng Chen, Yujie Zhang, Fang Ming, and
Shaotong Li
15:00–15:30 A Discriminative Model for Semantics-to-String Translation
Aleš Tamchyna, Chris Quirk, and Michel Galley
15:30–16:00 Coffee Break
16:00–17:45 Session 4
16:00–17:00 Keynote Speech: Machine Translation and Deep Language Engineering
Approaches
António Branco
17:00–17:45 Panel: Semantics and Statistical Machine Translation: Gaps and Challenges
Chris Quirk, Eduard Hovy, Percy Liang, António Branco, and Quoc V. Le
17:45 Closing


Workshop 6: Novel Computational Approaches to Keyphrase


Extraction

Organizers: Sujatha Das Gollapalli, Cornelia Caragea, C. Lee Giles, and Xiaoli Li
Venue: 303A

Thursday, July 30, 2015


9:10–9:30 Welcome and Opening Remarks
9:30–10:30 Invited talk:
Keywords, phrases, clauses and sentences: topicality, indicativeness and
informativeness at scales
Min-Yen Kan
10:30–11:00 Coffee Break
11:00–11:30 Technical Term Extraction Using Measures of Neology
Christopher Norman and Akiko Aizawa
11:30–12:00 Counting What Counts: Decompounding for Keyphrase Extraction
Nicolai Erbs, Pedro Bispo Santos, Torsten Zesch, and Iryna Gurevych
12:00–14:00 Lunch Break
14:00–15:00 Invited talk:
The Web as an Implicit Training Set: Application to Noun Compounds Syntax
and Semantics
Preslav Nakov
15:00–15:30 Reducing Over-generation Errors for Automatic Keyphrase Extraction using
Integer Linear Programming
Florian Boudin
15:30–16:00 Coffee Break
16:00–16:30 TwittDict: Extracting Social Oriented Keyphrase Semantics from Twitter
Suppawong Tuarob, Wanghuan Chu, Dong Chen, and Conrad Tucker
16:30–17:00 Identification and Classification of Emotional Key Phrases from Psychological
Texts
Apurba Paul and Dipankar Das


Workshop 7: SIGHUM Workshop on Language Technology for


Cultural Heritage, Social Sciences, and Humanities

Organizers: Kalliopi A. Zervanou, Marieke van Erp, and Beatrice Alex


Venue: 303B

Thursday, July 30, 2015


9:00–10:30 Session I
9:00–9:10 Welcome
Nils Reiter
9:10–9:40 Catching the Red Priest: Using Historical Editions of Encyclopaedia Britannica
to Track the Evolution of Reputations
Yen-Fu Luo, Anna Rumshisky, and Mikhail Gronas
9:40–10:00 Five Centuries of Monarchy in Korea: Mining the Text of the Annals of the
Joseon Dynasty
JinYeong Bak and Alice Oh
10:00–10:30 Analyzing Sentiment in Classical Chinese Poetry
Yufang Hou and Anette Frank
10:30–11:00 Coffee Break
11:00–12:35 Session II
11:00–11:30 Measuring the Structural and Conceptual Similarity of Folktales using Plot
Graphs
Victoria Anugrah Lestari and Ruli Manurung
11:30–11:50 Towards Annotating Narrative Segments
Nils Reiter
11:50–12:20 Ranking Relevant Verb Phrases Extracted from Historical Text
Eva Pettersson, Beáta Megyesi, and Joakim Nivre
12:20–12:40 Ranking election issues through the lens of social media
Stephen Wan and Cécile Paris
12:40–14:00 Lunch
14:00–14:40 Session III: SIGHUM Annual Meeting
Chair: Nils Reiter
14:40–15:10 Session IV: Poster Boosters
14:40–14:45 Word Embeddings Pointing the Way for Late Antiquity
Johannes Bjerva and Raf Praet
14:45–14:50 Enriching Interlinear Text using Automatically Constructed Annotators
Ryan Georgi, Fei Xia, and William Lewis
14:50–14:55 Automatic interlinear glossing as two-level sequence classification
Tanja Samardžić, Robert Schikowski, and Sabine Stoll
14:55–15:00 Enriching Digitized Medieval Manuscripts: Linking Image and Text and
Lexical Knowledge
Aitor Arronte Alvarez
15:00–15:05 A preliminary study on similarity-preserving digital book identifiers
Klemo Vladimir, Marin Silic, Nenad Romic, Goran Delac, and Sinisa Srbljic
15:05–15:10 When Translation Requires Interpretation: Collaborative Computer-Assisted
Translation of Ancient Texts
Andrea Bellandi, Davide Albanesi, Giulia Benotto, Emiliano Giovannetti, and
Gianfranco Di Segni
15:10–16:00 Poster Session & Coffee
16:00–17:30 Session V
16:00–16:20 Integrating Query Performance Prediction in Term Scoring for Diachronic
Thesaurus
Chaya Liebeskind and Ido Dagan
16:20–16:50 Minoan linguistic resources: The Linear A Digital Corpus
Tommaso Petrolito, Ruggero Petrolito, Francesco Perono Cacciafoco, and
Grégoire Winterstein
16:50–17:20 Lexicon-assisted tagging and lemmatization in Latin: A comparison of six
taggers and two lemmatization models
Tim vor der Brück, Steffen Eger, and Alexander Mehler
17:20–17:30 Closing
Nils Reiter


Workshop 8: BioNLP 2015

Organizers: Kevin Bretonnel Cohen, Dina Demner-Fushman, Sophia Ananiadou, and Junichi Tsujii
Venue: 305

Thursday, July 30, 2015


08:00–08:20 Welcome to BioNLP 15
08:20–10:20 Reading biomedical literature
08:20–08:40 Complex Event Extraction using DRUM
James Allen, Will De Beaumont, Lucian Galescu, and Choh Man Teng
08:40–09:00 Making the most of limited training data using distant supervision
Roland Roller and Mark Stevenson
09:00–09:20 An extended dependency graph for relation extraction in biomedical texts
Yifan Peng, Samir Gupta, Cathy Wu, and Vijay Shanker
09:20–09:40 Event Extraction in pieces: Tackling the partial event identification problem on
unseen corpora
Chrysoula Zerva and Sophia Ananiadou
09:40–10:00 Extracting Biological Pathway Models From NLP Event Representations
Michael Spranger, Sucheendra Palaniappan, and Samik Ghosh
10:00–10:20 Shallow Training is cheap but is it good enough? Experiments with Medical
Fact Coding
Ramesh Nallapati and Radu Florian
10:30–11:00 Coffee Break
11:00–11:45 Keynote: "Machine Reading: Attempting to model and understand
biological processes" - Christopher Manning
11:45–12:30 Keynote: "The DARPA Big Mechanism Program" - Kevin Knight
12:30–14:00 Lunch Break
14:00–15:00 Poster Session
15:00–15:30 Invited Talk: "Overview of BioCreative V Challenge Tasks" - Zhiyong Lu
15:30–16:00 Coffee Break
16:00–18:00 Clinical text processing
16:00–16:20 Stacked Generalization for Medical Concept Extraction from Clinical Notes
Youngjun Kim and Ellen Riloff
16:20–16:40 Extracting Disease-Symptom Relationships by Learning Syntactic Patterns
from Dependency Graphs
Mohsen Hassan, Olfa Makkaoui, Adrien Coulet, and Yannick Toussaint
16:40–17:00 Extracting Time Expressions from Clinical Text
Timothy Miller, Steven Bethard, Dmitriy Dligach, Chen Lin, and
Guergana Savova
17:00–17:20 Exploiting Task-Oriented Resources to Learn Word Embeddings for Clinical
Abbreviation Expansion
Yue Liu, Tao Ge, Kusum Mathews, Heng Ji, and Deborah McGuinness
17:20–17:40 Semantic Type Classification of Common Words in Biomedical Noun Phrases
Amy Siu and Gerhard Weikum
17:40–18:00 CoMAGD: Annotation of Gene-Depression Relations
Rize Jin, Jinseon You, Jin-Woo Chung, Hee-Jin Lee, Maria Wolters, and
Jong Park


18:00 Closing remarks


Posters
Lexical Characteristics Analysis of Chinese Clinical Documents
Meizhi Ju, Haomin Li, and Huilong Duan
Using word embedding for bio-event extraction
Chen Li, Runqing Song, Maria Liakata, Andreas Vlachos, Stephanie Seneff,
and Xiangrong Zhang
Measuring the readability of medical research journal abstracts
Samuel J. Severance and Kevin Bretonnel Cohen
Translating Electronic Health Record Notes from English to Spanish: A
Preliminary Study
Weisong Liu and Shu Cai
Automatic Detection of Answers to Research Questions from Medline
Abstracts
Abdulaziz Alamri and Mark Stevenson
A preliminary study on automatic identification of patient smoking status in
unstructured electronic health records
Jitendra Jonnagaddala, Hong-Jie Dai, Pradeep Ray, and Siaw-Teng Liaw
Restoring the intended structure of Hungarian ophthalmology documents
Borbála Siklósi and Attila Novák
Evaluating distributed word representations for capturing semantics of
biomedical concepts
Muneeb Th, Sunil Sahu, and Ashish Anand
Investigating Public Health Surveillance using Twitter
Antonio Jimeno Yepes, Andrew MacKinlay, and Bo Han
Clinical Abbreviation Disambiguation Using Neural Word Embeddings
Yonghui Wu, Jun Xu, Yaoyun Zhang, and Hua Xu
Representing Clinical Diagnostic Criteria in Quality Data Model Using Natural
Language Processing
Na Hong, Dingcheng Li, Yue Yu, Hongfang Liu, Christopher G. Chute, and
Guoqian Jiang


Workshop 9: The Fifth Named Entities Workshop

Organizers: Min Zhang, Haizhou Li, Rafael E. Banchs, and A. Kumaran


Venue: 301A

Friday, July 31, 2015


9:05–9:15 Opening Remarks
Whitepaper of NEWS 2015 Shared Task on Machine Transliteration
Min Zhang, Haizhou Li, Rafael E. Banchs, and A. Kumaran
Report of NEWS 2015 Machine Transliteration Shared Task
Rafael E. Banchs, Min Zhang, Xiangyu Duan, Haizhou Li, and A. Kumaran
9:15–10:05 Keynote Speech
How do you spell that? A journey through word representations
Grzegorz Kondrak
10:05–12:15 Research Papers
10:05–10:30 Boosting Named Entity Recognition with Neural Character Embeddings
Cicero dos Santos and Victor Guimaraes
10:30–11:00 Coffee Break
11:00–11:25 Regularity and Flexibility in English-Chinese Name Transliteration
Oi Yee Kwong
11:25–11:50 HAREM and Klue: how to put two tagsets for named entities annotation
together
Livy Real and Alexandre Rademaker
11:50–12:15 Semi-supervised Learning for Vietnamese Named Entity Recognition using
Online Conditional Random Fields
Quang Hong Pham, Minh-Le Nguyen, Thanh Binh Nguyen, and
Nguyen Viet Cuong
12:15–13:50 Lunch Break
13:50–16:50 System Papers
13:50–14:15 Boosting English-Chinese Machine Transliteration via High Quality
Alignment and Multilingual Resources
Yan Shao, Jörg Tiedemann, and Joakim Nivre
14:15–14:40 Neural Network Transduction Models in Transliteration Generation
Andrew Finch, Lemao Liu, Xiaolin Wang, and Eiichiro Sumita
14:40–15:05 A Hybrid Transliteration Model for Chinese/English Named Entities
– BJTU-NLP Report for the 5th Named Entities Workshop
Dandan Wang, Xiaohui Yang, Jinan Xu, Yufeng Chen, Nan Wang, Bojia Liu,
Jian Yang, and Yujie Zhang
15:05–15:30 Multiple System Combination for Transliteration
Garrett Nicolai, Bradley Hauer, Mohammad Salameh, Adam St Arnaud,
Ying Xu, Lei Yao, and Grzegorz Kondrak
15:30–16:00 Coffee Break
16:00–16:25 Data representation methods and use of mined corpora for Indian language
transliteration
Anoop Kunchukuttan and Pushpak Bhattacharyya
16:25–16:50 NCU IISR English-Korean and English-Chinese Named Entity Transliteration
Using Different Grapheme Segmentation Approaches
Yu-Chun Wang, Chun-Kai Wu, and Richard Tzong-Han Tsai


16:50–17:00 Closing


Workshop 10: Third Workshop on Continuous Vector Space Models


and their Compositionality

Organizers: Alexandre Allauzen, Edward Grefenstette, Karl Moritz Hermann, Hugo Larochelle, and
Scott Wen-tau Yih
Venue: 301B

Friday, July 31, 2015


9:00–9:05 Opening Remarks
9:05–9:50 INVITED TALK - Kyunghyun Cho
Observed versus latent features for knowledge base and text inference
Kristina Toutanova and Danqi Chen
Learning Embeddings for Transitive Verb Disambiguation by Implicit Tensor
Factorization
Kazuma Hashimoto and Yoshimasa Tsuruoka
10:30–11:00 Coffee Break
11:00–11:45 INVITED TALK - Yoav Goldberg
11:45–12:30 INVITED TALK - Percy Liang
12:30–14:00 Lunch
14:00–14:45 INVITED TALK - Stephen Clark
14:45–15:30 INVITED TALK - Ray Mooney
15:30–16:00 Coffee Break
16:00–17:00 Poster Session
Learning Embeddings for Transitive Verb Disambiguation by Implicit Tensor
Factorization
Kazuma Hashimoto and Yoshimasa Tsuruoka
Recursive Neural Networks Can Learn Logical Semantics
Samuel R. Bowman, Christopher Potts, and Christopher D. Manning
Concept Extensions as the Basis for Vector-Space Semantics: Combining
Distributional and Ontological Information about Entities
Jackie Chi Kit Cheung
Joint Semantic Relevance Learning with Text Data and Graph Knowledge
Dongxu Zhang, Bin Yuan, Dong Wang, and Rong Liu
Exploring the effect of semantic similarity for Phrase-based Machine
Translation
Kunal Sachdeva and Dipti Sharma
Incremental Adaptation Strategies for Neural Network Language Models
Alex Ter-Sarkisov, Holger Schwenk, Fethi Bougares, and Loïc Barrault
Observed versus latent features for knowledge base and text inference
Kristina Toutanova and Danqi Chen
17:00–17:30 Panel


Workshop 11: Fourth Workshop on Hybrid Approaches to


Translation

Organizers: Bogdan Babych, Kurt Eberle, Marta R. Costa-jussà, Rafael E. Banchs, Patrik Lambert,
and Reinhard Rapp
Venue: 302A

Friday, July 31, 2015


9:15–10:30 Introduction and Keynote Speech I
9:15–9:30 Welcome and introduction
9:30–10:30 Invited talk: Hinrich Schütze
10:30–11:00 Coffee break
11:00–12:30 Hybrid MT system design
11:00–11:15 Bootstrapping a hybrid deep MT system
João Silva, João Rodrigues, Luís Gomes, and António Branco
11:15–11:30 Multi-system machine translation using online APIs for English-Latvian
Matīss Rikters
11:30–11:55 What a Transfer-Based System Brings to the Combination with PBMT
Aleš Tamchyna and Ondřej Bojar
11:55–12:20 Establishing sentential structure via realignments from small parallel corpora
George Tambouratzis and Vassiliki Pouli
12:20–14:00 Lunch break
14:00–15:30 Resources and evaluation of hybrid MT systems
14:00–14:15 Passive and Pervasive Use of Bilingual Dictionary in Statistical Machine
Translation
Liling Tan
14:15–14:30 Automated Simultaneous Interpretation: Hints of a Cognitive Framework for
Machine Translation
Rafael E. Banchs
14:30–14:45 A fuzzier approach to machine translation evaluation: A pilot study on
post-editing productivity and automated metrics in commercial settings
Carla Parra Escartín and Manuel Arcedillo
14:45–15:10 A Methodology for Bilingual Lexicon Extraction from Comparable Corpora
Reinhard Rapp
15:10–15:25 Ongoing Study for Enhancing Chinese-Spanish Translation with Morphology
Strategies
Marta R. Costa-jussà
15:30–16:00 Coffee break
16:00–17:00 Keynote Speech II
16:00–17:00 Invited talk: Gerard de Melo
17:00–18:15 Industry applications and Hybrid MT
17:00–17:30 Baidu Translate: Research and Products
Zhongjun He
17:30–18:00 On Improving the Human Translation Process by Using MT Technologies
under a Cognitive Framework
Geng Xinhui
18:00–18:15 Towards a shared task for shallow semantics-based translation (in an industrial
setting)
Kurt Eberle
18:15–18:20 Conclusions


Workshop 12: Fourth Workshop on Linked Data in Linguistics:


Resources and Applications

Organizers: Christian Chiarcos, John Philip McCrae, Petya Osenova, Philipp Cimiano, and
Nancy Ide
Venue: 302B

Friday, July 31, 2015


09:00–09:30 Introduction
09:30–10:30 Invited Talk: "DBpedia and Multilingualism"
Key-Sun Choi
10:30–11:00 Coffee Break
11:00–11:30 From DBpedia and WordNet hierarchies to LinkedIn and Twitter
Aonghus McGovern, Alexander O'Connor, and Vincent Wade
11:30–12:00 A Linked Data Model for Multimodal Sentiment and Emotion Analysis
J. Fernando Sánchez-Rada, Carlos A. Iglesias, and Ronald Gil
12:00–12:30 Seeing is Correcting: curating lexical resources using social interfaces
Livy Real, Fabricio Chalub, Valeria de Paiva, Claudia Freitas, and
Alexandre Rademaker
12:30–14:00 Lunch
14:00–14:30 Sar-graphs: A Linked Linguistic Knowledge Resource Connecting Facts with
Language
Sebastian Krause, Leonhard Hennig, Aleksandra Gabryszak, Feiyu Xu, and
Hans Uszkoreit
14:30–15:00 Reconciling Heterogeneous Descriptions of Language Resources
John Philip McCrae, Philipp Cimiano, Víctor Rodríguez-Doncel,
Daniel Vila Suero, Jorge Gracia, Luca Matteis, Roberto Navigli,
Andrejs Abele, Gabriela Vulcu, and Paul Buitelaar
15:00–15:30 RDF Representation of Licenses for Language Resources
Víctor Rodríguez-Doncel and Penny Labropoulou
15:30–16:00 Coffee Break
16:00–16:20 Linking Four Heterogeneous Language Resources as Linked Data
Benjamin Siemoneit, John Philip McCrae, and Philipp Cimiano
16:20–16:40 EVALution 1.0: an Evolving Semantic Dataset for Training and Evaluation of
Distributional Semantic Models
Enrico Santus, Frances Yung, Alessandro Lenci, and Chu-Ren Huang
16:40–17:00 Linguistic Linked Data in Chinese: The Case of Chinese Wordnet
Chih-Yao Lee and Shu-Kai Hsieh
17:00–17:30 Closing Remarks and Discussion


Workshop 13: Workshop on Noisy User-generated Text

Organizers: Wei Xu, Bo Han, and Alan Ritter


Venue: 303A

Friday, July 31, 2015


9:00–10:30 Invited Talks
9:00–9:45 Text Mining of Social Media: Going beyond the Text and Only the Text
Timothy Baldwin
9:45–10:30 Where is Language?
Anders Søgaard
10:30–11:00 Coffee Break
11:00–12:30 Long Papers and Abstracts
11:00–11:15 Learning finite state word representations for unsupervised Twitter adaptation
of POS taggers
Julie Wulff and Anders Søgaard
11:15–11:30 Towards POS Tagging for Arabic Tweets
Fahad Albogamy and Allan Ramsay
11:30–11:45 Minority Language Twitter: Part-of-Speech Tagging and Analysis of Irish
Tweets
Teresa Lynn, Kevin Scannell, and Eimear Maguire
11:45–12:00 Challenges of studying and processing dialects in social media
Anna Jørgensen, Dirk Hovy, and Anders Søgaard
12:00–12:15 Toward Tweets Normalization Using Maximum Entropy
Mohammad Arshi Saloot, Norisma Idris, Liyana Shuib, Ram Gopal Raj, and
AiTi Aw
12:15–12:30 Five Shades of Noise: Analyzing Machine Translation Errors in
User-Generated Text
Marlies van der Wees, Arianna Bisazza, and Christof Monz
12:30–14:00 Poster Session and Lunch
Learning finite state word representations for unsupervised Twitter adaptation
of POS taggers
Julie Wulff and Anders Søgaard
Towards POS Tagging for Arabic Tweets
Fahad Albogamy and Allan Ramsay
Minority Language Twitter: Part-of-Speech Tagging and Analysis of Irish
Tweets
Teresa Lynn, Kevin Scannell, and Eimear Maguire
Challenges of studying and processing dialects in social media
Anna Jørgensen, Dirk Hovy, and Anders Søgaard
Toward Tweets Normalization Using Maximum Entropy
Mohammad Arshi Saloot, Norisma Idris, Liyana Shuib, Ram Gopal Raj, and
AiTi Aw
Five Shades of Noise: Analyzing Machine Translation Errors in
User-Generated Text
Marlies van der Wees, Arianna Bisazza, and Christof Monz
A Normalizer for UGC in Brazilian Portuguese
Magali Sanches Duran, Maria das Graças Volpe Nunes, and Lucas Avanço

USFD: Twitter NER with Drift Compensation and Linked Data
Leon Derczynski, Isabelle Augenstein, and Kalina Bontcheva
Enhancing Named Entity Recognition in Twitter Messages Using Entity
Linking
Ikuya Yamada, Hideaki Takeda, and Yoshiyasu Takefuji
Improving Twitter Named Entity Recognition using Word Representations
Zhiqiang Toh, Bin Chen, and Jian Su
NRC: Infused Phrase Vectors for Named Entity Recognition in Twitter
Colin Cherry, Hongyu Guo, and Chengbi Dai
IITP: Multiobjective Differential Evolution based Twitter Named Entity
Recognition
Md Shad Akhtar, Utpal Kumar Sikdar, and Asif Ekbal
Multimedia Lab @ ACL WNUT NER Shared Task: Named Entity Recognition
for Twitter Microposts using Distributed Word Representations
Frederic Godin, Baptist Vandersmissen, Wesley De Neve, and Rik Van de Walle
Data Adaptation for Named Entity Recognition on Tweets with Features-Rich
CRF
Tian Tian, Marco Dinarelli, and Isabelle Tellier
Hallym: Named Entity Recognition on Twitter with Word Representation
Eun-Suk Yang and Yu-Seop Kim
IHS RD: Lexical Normalization for English Tweets
Dmitry Supranovich and Viachaslau Patsepnia
Bekli: A Simple Approach to Twitter Text Normalization.
Russell Beckley
NCSU-SAS-Ning: Candidate Generation and Feature Engineering for
Supervised Lexical Normalization
Ning Jin
DCU-ADAPT: Learning Edit Operations for Microblog Normalisation with the
Generalised Perceptron
Joachim Wagner and Jennifer Foster
LYSGROUP: Adapting a Spanish microtext normalization system to English.
Yerai Doval Mosquera, Jesus Vilares, and Carlos Gomez-Rodriguez
IITP: Hybrid Approach for Text Normalization in Twitter
Md Shad Akhtar, Utpal Kumar Sikdar, and Asif Ekbal
NCSU SAS WOOKHEE: A Deep Contextual Long-Short Term Memory
Model for Text Normalization
Wookhee Min and Bradford Mott
NCSU SAS SAM: Deep Encoding and Reconstruction for Normalization of
Noisy Text
Samuel Leeman-Munk, James Lester, and James Cox
USZEGED: Correction Type-sensitive Normalization of English Tweets Using
Efficiently Indexed n-gram Statistics
Gabor Berend and Ervin Tasnadi
14:00–15:30 Shared Task Session
14:00–14:30 Shared Tasks of the 2015 Workshop on Noisy User-generated Text: Twitter
Lexical Normalization and Named Entity Recognition
Timothy Baldwin, Marie-Catherine De Marneffe, Bo Han, Young-Bum Kim,
Alan Ritter, and Wei Xu
14:30–14:45 Enhancing Named Entity Recognition in Twitter Messages Using Entity
Linking
Ikuya Yamada, Hideaki Takeda, and Yoshiyasu Takefuji
14:45–15:00 Improving Twitter Named Entity Recognition using Word Representations
Zhiqiang Toh, Bin Chen, and Jian Su


15:00–15:15 Multimedia Lab @ ACL WNUT NER Shared Task: Named Entity Recognition
for Twitter Microposts using Distributed Word Representations
Frederic Godin, Baptist Vandersmissen, Wesley De Neve, and Rik Van de Walle
15:15–15:30 NCSU SAS SAM: Deep Encoding and Reconstruction for Normalization of
Noisy Text
Samuel Leeman-Munk, James Lester, and James Cox
15:30–16:00 Coffee Break
16:00–17:30 Invited Talks
16:00–16:45 Automated Grammatical Error Correction for Language Learners: Where are
we, and where do we go from there?
Joel Tetreault
16:45–17:30 Are Minority Dialects "Noisy Text"?: Implications of Social and Linguistic
Diversity for Social Media NLP
Brendan O'Connor


Workshop 14: The 2nd Workshop on Natural Language Processing
Techniques for Educational Applications

Organizers: Hsin-Hsi Chen, Yuen-Hsien Tseng, Yuji Matsumoto, and Lung Hsiang Wong
Venue: 303B

Friday, July 31, 2015


09:30–09:40 Opening Ceremony
09:40–10:30 Invited Speech
09:40–10:30 Big Data-Based Automatic Essay Scoring Service
10:30–11:00 Coffee Break
11:00–12:00 Shared Task Session
11:00–11:20 Overview of the NLP-TEA 2015 Shared Task for Chinese Grammatical Error
Diagnosis
Lung-Hao Lee, Liang-Chih Yu, and Li-Ping Chang
11:20–11:40 Chinese Grammatical Error Diagnosis by Conditional Random Fields
Shih-Hung Wu, Po-Lin Chen, Liang-Pu Chen, Ping-Che Yang, and
Ren-Dar Yang
11:40–12:00 NTOU Chinese Grammar Checker for CGED Shared Task
Chuan-Jie Lin and Shao-Heng Chen
12:00–14:00 Lunch
14:30–15:30 Regular Paper Session
14:30–14:50 Collocation Assistant for Learners of Japanese as a Second Language
Lis Pereira and Yuji Matsumoto
14:50–15:10 Semi-automatic Generation of Multiple-Choice Tests from Mentions of
Semantic Relations
Renlong Ai, Sebastian Krause, Walter Kasper, Feiyu Xu, and Hans Uszkoreit
15:10–15:30 Interactive Second Language Learning from News Websites
Tao Chen, Naijia Zheng, Yue Zhao, Muthu Kumar Chandrasekaran, and
Min-Yen Kan
15:30–16:00 Coffee Break
16:00–17:00 Poster Session
16:00–16:05 Bilingual Keyword Extraction and its Educational Application
Chung-Chi Huang, Mei-Hua Chen, and Ping-Che Yang
16:05–16:10 Annotating Entailment Relations for Shortanswer Questions
Simon Ostermann, Andrea Horbach, and Manfred Pinkal
16:10–16:15 An Automated Scoring Tool for Korean Supply-type Items Based on
Semi-Supervised Learning
Minah Cheon, Hyeong-Won Seo, Jae-Hoon Kim, Eun-Hee Noh,
Kyung-Hee Sung, and EunYong Lim
16:15–16:20 A System for Generating Multiple Choice Questions: With a Novel Approach
for Sentence Selection
Mukta Majumder and Sujan Kumar Saha
16:20–16:25 The "News Web Easy" news service as a resource for teaching and learning
Japanese: An assessment of the comprehension difficulty of Japanese
sentence-end expressions
Hideki Tanaka, Tadashi Kumano, and Isao Goto

16:25–16:30 Grammatical Error Correction Considering Multi-word Expressions
Tomoya Mizumoto, Masato Mita, and Yuji Matsumoto
16:30–16:35 Salinlahi III: An Intelligent Tutoring System for Filipino Heritage Language
Learners
Ralph Vincent Regalado, Michael Louie Bonon, Nadine Chua,
Rene Rose Pinera, and Shannen Rose Dela Cruz
16:35–16:40 Using Finite State Transducers for Helping Foreign Language Learning
Hasan Kaya and Gulsen Eryigit
16:40–16:45 Chinese Grammatical Error Diagnosis Using Ensemble Learning
Yang Xiang, Xiaolong Wang, Wenying Han, and Qinghua Hong
16:45–16:50 Condition Random Fields-based Grammatical Error Detection for Chinese as
Second Language
Jui-Feng Yeh, Chan Kun Yeh, Kai-Hsiang Yu, Ya-Ting Li, and Wan-Ling Tsai
16:50–16:55 Improving Chinese Grammatical Error Correction with Corpus Augmentation
and Hierarchical Phrase-based Statistical Machine Translation
Yinchen Zhao, Mamoru Komachi, and Hiroshi Ishikawa
16:55–17:00 Chinese Grammatical Error Diagnosis System Based on Hybrid Model
Xiupeng Wu, Peijie Huang, Jundong Wang, Qingwen Guo, Yuhong Xu, and
Chuping Chen
17:00–17:10 Closing Remarks


Workshop 15: The First Workshop on Computing News Storylines

Organizers: Tommaso Caselli, Marieke van Erp, Anne-Lyse Minard, Mark Finlayson, Ben Miller,
Jordi Atserias, Alexandra Balahur, and Piek Vossen
Venue: 305

Friday, July 31, 2015


09:15–09:25 Opening Remarks
09:25–09:50 Interactions between Narrative Schemas and Document Categories
Dan Simonson and Anthony Davis
09:50–10:10 Improving Event Detection with Abstract Meaning Representation
Xiang Li, Thien Huu Nguyen, Kai Cao, and Ralph Grishman
10:10–10:30 News clustering approach based on discourse text structure
Tatyana Makhalova, Dmitry Ilvovsky, and Boris Galitsky
10:30–11:00 Coffee Break
11:00–11:25 To Do or Not to Do: the Role of Agendas for Action in Analyzing News
Coverage of Violent Conflict
Katsiaryna Stalpouskaya and Christian Baden
11:25–11:45 MediaMeter: A Global Monitor for Online News Coverage
Tadashi Nomoto
11:45–12:05 Expanding the horizons: adding a new language to the news personalization
system
Andrey Fedorovsky, Maxim Ionov, Varvara Litvinova, Tatyana Olenina, and
Darya Trofimova
12:05–12:30 Storylines for structuring massive streams of news
Piek Vossen, Tommaso Caselli, and Yiota Kontzopoulou
12:30–14:00 Lunch Break
14:00–14:20 From TimeLines to StoryLines: A preliminary proposal for evaluating
narratives
Egoitz Laparra, Itziar Aldabe, and German Rigau
14:20–14:40 Cross-Document Non-Fiction Narrative Alignment
Ben Miller, Jennifer Olive, Shakthidhar Gopavaram, and Ayush Shrestha
14:40–15:30 Discussion and Conclusions

8 Anti-harassment policy
ACL has always prided itself on being a venue that allows for the open exchange of ideas and the
freedom of thought and expression. In keeping with these beliefs, to codify them and to ensure that
ACL becomes immediately aware of any deviation from these principles, ACL has instituted an
anti-harassment policy in coordination with NAACL.
Harassment and hostile behavior are unwelcome at any ACL conference; this includes speech or
behavior that intimidates, creates discomfort, or interferes with a person's participation or
opportunity for participation in the conference. We aim for ACL conferences to be environments
where harassment in any form does not happen, including but not limited to harassment based on
race, gender, religion, age, color, national origin, ancestry, disability, sexual orientation, or gender
identity. Harassment includes degrading verbal comments, deliberate intimidation, stalking,
harassing photography or recording, inappropriate physical contact, and unwelcome sexual
attention.
If you are a victim of harassment or hostile behavior at an ACL conference, or otherwise observe
such behavior toward someone else, please contact any of the following people:

• Any current member of the ACL board (http://aclweb.org/about)
• Hal Daumé III
• Julia Hirschberg
• Su Jian
• Priscilla Rasmussen

Please be assured that if you approach us, your concerns will be kept in strict confidence, and we
will consult with you on the actions taken by the Board.

Index

Abbes, Ramzi, 164 AlShenaifi, Nouf, 164


Abdelali, Ahmed, 164 Altamirano, Romina, 69
Abele, Andrejs, 179 Alvarez, Aitor Arronte, 170
Abend, Omri, 168 Amir, Silvio, 84
Abualasal, Jaber, 164 Anand, Ashish, 173
Adar, Eytan, 74 Ananiadou, Sophia, 172
Aggarwal, Varun, 84 Angeli, Gabor, 59, 82
Agić, Željko, 99, 144 Angelov, Krasimir, 165
Agirre, Eneko, 100 AR, Balamurali, 122
Agrawal, Ankit, 122 Arai, Noriko H., 95
Ai, Renlong, 88, 183 Aramaki, Eiji, 131, 140
Aizawa, Akiko, 169 Arcan, Mihael, 76
Akbik, Alan, 61, 88 Arcedillo, Manuel, 177
Akel, Rana, 163 Arnaud, Adam St, 174
Akhtar, Md Shad, 181 Asbayou, Omar, 164
Aksoy, Eren Erdal, 75 Asher, Nicholas, 57
Al-Badrashiny, Mohamed, 164 Astudillo, Ramón, 84
Al-Khalifa, Hend, 164 Atserias, Jordi, 185
Al-Shargi, Faisal, 163 Attardi, Giuseppe, 79
Al-Yahya, Maha, 164 Attia, Mohammed, 164
Alameri, Sumaya, 163 Augenstein, Isabelle, 181
Alamri, Abdulaziz, 173 Auli, Michael, 99, 116
Albanesi, Davide, 171 Aussenac-Gilles, Nathalie, 123
Alberti, Chris, 58 Avanco, Lucas, 180
Albogamy, Fahad, 180 Aw, AiTi, 180
Aldabe, Itziar, 114, 185 Azadi, Fatemeh, 166
Aldarmaki, Hanan, 164
Aldebei, Khaled, 118 Börschinger, Benjamin, 109
Aletras, Nikolaos, 147 Babych, Bogdan, 177
Alex, Beatrice, 170 Badaro, Gilbert, 163
AlGahtani, Shabib, 163 Baden, Christian, 185
Ali, Ahmed, 163 Bai, Yiqi, 161
Alishahi, Afra, 71 Bairi, Ramakrishna, 72
Allauzen, Alexandre, 176 Bak, JinYeong, 170
Allen, James, 172 Bakagianni, Juli, 88
Almeida, Mariana S. C., 62 Bakhshaei, Somayeh, 166
AlNefie, Rehab, 164 Balahur, Alexandra, 185
Aloimonos, Yiannis, 75 Baldridge, Jason, 106
Alonso, Héctor Martínez, 144 Baldwin, Timothy, 122, 180, 181

Balla, Amar, 164 Bonon, Michael Louie, 184
Ballesteros, Miguel, 58 Bontcheva, Kalina, 119, 181
Baly, Ramy, 163 Bos, Johan, 111
Banchs, Rafael E., 174, 177 Bouamor, Houda, 163, 164
Banerjee, Siddhartha, 79 Bouchard, Guillaume, 27, 36
Bansal, Mohit, 125 Boudin, Florian, 166, 169
Bao, Forrest, 68 Bougares, Fethi, 164, 176
Baral, Chitta, 80 Bowman, Samuel R., 176
Barbosa, Luciano, 124 Boyd-Graber, Jordan, 109, 140, 141
Barlacchi, Gianni, 88, 98 Boydstun, Amber E., 116
Barona, Santiago, 115 Branco, Antonio, 168, 177
Barone, Antonio Valerio Miceli, 79 Braune, Fabienne, 78
Baroni, Marco, 57, 67, 81, 82 Bride, Antoine, 57
Barrón-Cedeño, Alberto, 124, 164, 166 Brockett, Chris, 116
Barrault, Loic, 176 Bronstein, Ofer, 115
Barzilay, Regina, 102, 146 Bryant, Christopher, 76
Basili, Roberto, 88 Buitelaar, Paul, 76, 179
Basu, Sumit, 80 Bulat, Luana, 98
Bauer, Sandro, 129 Butler, Alastair, 168
Bechara, Hannah, 129 Buys, Jan, 130
Beckley, Russell, 181
Belinkov, Yonatan, 164 Cacciafoco, Francesco Perono, 171
Bellandi, Andrea, 171 Cai, Shu, 173
Beller, Charley, 111 Cakici, Ruket, 71
Bender, Emily M., 165 Calacci, Dan, 139
Bengio, Yoshua, 47 Callison-Burch, Chris, 69, 111, 116
Benotti, Luciana, 69 Camacho-Collados, José, 67, 77
Benotto, Giulia, 171 Campos, Jorge, 88
Bentor, Yinon, 52 Cao, Kai, 185
Benz, Anton, 72 Cao, Yuhui, 161
Berant, Jonathan, 67, 104 Cao, Zhu, 74
Berend, Gabor, 181 Cao, Ziqiang, 129
Besançon, Romaric, 52 Caragea, Cornelia, 169
Bethard, Steven, 172 Carbonell, Jaime, 98, 122
Bhagavatula, Chandra, 77 Card, Dallas, 116
Bhattacharyya, Pushpak, 122, 126, 174 Cardie, Claire, 83
Bhattasali, Shohini, 126 Carman, Mark James, 122
Bhowmick, Sourav, 74 Carreras, Xavier, 51, 137
Biemann, Chris, 88 Caselli, Tommaso, 185
Bilmes, Jeff, 72 Castellucci, Giuseppe, 88
Bing, Lidong, 138 Cejuela, Juan Miguel, 88
Bird, Steven, 129 Chai, Qinghua, 102
Bisazza, Arianna, 120, 180 Chalub, Fabricio, 179
Bisk, Yonatan, 106, 130 Chan, Angel, 160
Bitvai, Zsolt, 97 Chan, Tsz Ping, 69
Bjerva, Johannes, 170 Chandrasekaran, Muthu Kumar, 183
Black, Alan W, 122 Chang, Angel, 49
Blunsom, Phil, 130 Chang, Baobao, 58, 73, 123
Bogdanova, Dasha, 124 Chang, Jason, 89
Bohnet, Bernd, 144 Chang, Jim, 89
Boisson, Joanne, 89 Chang, Jing-Shin, 57
Bojar, Ondrej, 177 Chang, Li-Ping, 160, 183
Boldoba, Josu, 166 Chang, Ming-Wei, 65, 100, 104
Bollegala, Danushka, 76 Chang, Tao-Hsing, 160, 161
Bond, Francis, 88, 165 Chang, Yung-Chun, 127
Boni, Mohammad Al, 127 Chao, Jiayuan, 148

Chao, Lidia S., 160 Chrupała, Grzegorz, 71
Charniak, Eugene, 83 Chu, Wanghuan, 169
Chatterjee, Rajen, 95 Chu, Wei-Cheng, 161
Che, Wanxiang, 87 Chua, Nadine, 184
Chen, Bin, 181 Chung, Jin-Woo, 172
Chen, Bingzhou, 161 Chute, Christopher G., 173
Chen, Boxing, 95 Ciaramita, Massimiliano, 109
Chen, Cen-Chieh, 127 Cimiano, Philipp, 179
Chen, Chen, 113 Ciobanu, Alina Maria, 116
Chen, Chien Chin, 127 Clark, Alexander, 139
Chen, Chun-Hsien, 161 Clark, Kevin, 108
Chen, Chuping, 184 Clark, Peter, 56
Chen, Danqi, 176 Clark, Stephen, 61, 71, 98, 99, 125
Chen, Dong, 169 Clarke, Charles, 123
Chen, Hsin-Hsi, 160, 183 Claus, Patrick, 88
Chen, Hsueh-Chih, 160 Clavel, Chloé, 84
Chen, Huadong, 78 Cohen, Kevin Bretonnel, 172, 173
Chen, Jiajun, 78, 87 Cohen, William W., 59, 75, 109
Chen, Liang-Pu, 183 Cohn, Trevor, 97, 119, 122, 129
Chen, Long, 62 Coke, Reed, 63
Chen, Mei-Hua, 183 Collins, Michael, 58, 103
Chen, Ping, 118 Conrad, Stefan, 116
Chen, Po-Lin, 183 Cook, Paul, 129
Chen, Qian, 50 Costa, Luís Morgado da, 88
Chen, Qiang, 62 Costa-jussa, Marta R., 177
Chen, Qingcai, 119, 125 Cotterell, Ryan, 146
Chen, Shao-Heng, 183 Coulet, Adrien, 172
Chen, Tao, 67, 161, 183 Cox, James, 181, 182
Chen, Wei-Te, 131 Crabbé, Benoit, 117
Chen, Weizheng, 118 Croce, Danilo, 88
Chen, Wenliang, 148 Cruys, Tim Van de, 57
Chen, Xiaoping, 50 Cruz, Shannen Rose Dela, 184
Chen, Xinchi, 86, 145 Crysmann, Berthold, 165
Chen, Yubo, 52 Cuong, Nguyen Viet, 174
Chen, Yufeng, 168, 174 Curran, James R., 86
Chen, Yun-Nung, 64, 130, 160 Cytryn, Jeremy, 126
Chen, Zhao, 161
Chen, Zhigang, 50 D'Souza, Jennifer, 100
Chen, Zhiyuan, 126 Daga, Pranjal, 122
Cheng, Fei, 99 Dagan, Ido, 50, 100, 115, 171
Cheng, Hao, 71 Dai, Chengbi, 181
Cheng, Xueqi, 51, 160 Dai, Hong-Jie, 173
Cheng, Yu, 122 Dai, LiRong, 118
Cheon, Minah, 166, 183 Dai, Xinyu, 78, 84
Cherry, Colin, 181 Daille, Beatrice, 166
Cheung, Jackie Chi Kit, 63, 176 Danescu-Niculescu-Mizil, Cristian, 140
Chiarcos, Christian, 179 Daniele, Falavigna, 85
Chiticariu, Laura, 61 Danilevsky, Marina, 61
Cho, Kyunghyun, 47 Dannells, Dana, 165
Choe, Do Kook, 83, 87 Darwish, Kareem, 163, 164
Choi, Eunsol, 104 Das, Dipanjan, 98, 110
Choi, Jinho D., 61 Das, Dipankar, 169
Choi, Key-Sun, 61, 179 Das, Rajarshi, 78
Choudhary, Alok, 122 Dasgupta, Sajib, 115
Chris, Quirk, 168 Datta, Srayan, 74
Christodoulopoulos, Christos, 130 Daumé III, Hal, 139, 141

Davis, Anthony, 185 Elhadad, Noémie, 106
De Rijke, Maarten, 141 Elliott, Desmond, 49, 71
Delac, Goran, 170 El Hajj, Wassim, 163
Demberg, Vera, 77 Enache, Ramona, 165
Demner-Fushman, Dina, 172 Erbs, Nicolai, 169
Deng, Li, 71 Erdem, Aykut, 71
Deng, Yuntian, 103 Erdem, Erkut, 71
Derczynski, Leon, 181 Erp, Marieke van, 170, 185
Deulofeu, José, 85 Enache, Ramona, 165
Devlin, Jacob, 48, 71 Escartin, Carla Parra, 177
De Beaumont, Will, 172 Espana-Bonet, Cristina, 166
De Castilho, Richard Eckart, 88
De Marneffe, Marie-Catherine, 70, 181 Faili, Heshaam, 130
De Matos, David Martins, 122 Fan, Zhenzhen, 165
De Melo, Gerard, 74, 82, 168 Fang, Hao, 71
De Neve, Wesley, 181, 182 Farah, Benamara, 123
De Paiva, Valeria, 179 Farra, Noura, 163
De Rijke, Maarten, 73 Faruqui, Manaal, 111, 117
De Souza, José G. C., 55 Espana-Bonet, Cristina, 166
De Vries, Arjen, 49 Federico, Marcello, 119
Diab, Mona, 163, 164 Fedorovsky, Andrey, 185
Diaz, Fernando, 138 Feldman, Elana, 126
Dinarelli, Marco, 181 Feldman, Naomi, 139
Dinh, Erik-Ln Do, 88 Feng, Chong, 161
Dinu, Georgiana, 57, 81 Feng, Shi, 162
Dinu, Liviu P., 116 Feng, Yansong, 98
Di Segni, Gianfranco, 171 Fermuller, Cornelia, 75
Dligach, Dmitriy, 172 Fernndez-Gonzlez, Daniel, 99, 112
Dolan, Bill, 116 Ferreira, Thiago, 69
Dong, Daxiang, 144 Ferret, Olivier, 52, 117
Dong, Li, 56 Figueira, Helena, 62
Dou, Qing, 79 Filice, Simone, 82, 88, 124
Downey, Doug, 59, 77 Finch, Andrew, 174
Dragut, Eduard, 83 Finegan-Dollak, Catherine, 63
Dredze, Mark, 115, 116 Finlayson, Mark, 185
Du, Lan, 109 Fischer, Stefan, 77
Du, Yantao, 112 Fishel, Mark, 130
Dušek, Ondrej, 63 Fedorovsky, Andrey, 185
Duan, Huilong, 173 Florian, Radu, 172
Duan, Manjuan, 165 Foster, Jennifer, 181
Duan, Xiangyu, 174 Foulds, James, 49, 51
Dubey, Kumar, 56 Frank, Anette, 102, 115, 170
Duh, Kevin, 82, 99, 160, 168 Freitas, Claudia, 179
Duong, Long, 129 Fried, Daniel, 56, 125
Duran, Magali Sanches, 180 Friedrich, Annemarie, 102
Durrani, Nadir, 164 Fujita, Akira, 95
Durrett, Greg, 58, 108
Dyer, Chris, 58, 78, 79, 95, 98, 111, 117, 122 Gärtner, Markus, 88
Gómez-Rodríguez, Carlos, 99
Eberle, Kurt, 177, 178 Gabryszak, Aleksandra, 179
Eckart, Kerstin, 88 Gad-elrab, Mohamed H., 164
Eger, Steffen, 81, 171 Gala, Nuria, 163
Ein-Dor, Liat, 116 Galanis, Dimitrios, 88
Eisner, Jason, 27, 31, 99, 146 Galescu, Lucian, 172
Ekbal, Asif, 181 Galindo, Michelle, 70
El-Hajj, Wassim, 163 Galitsky, Boris, 124, 185

Galley, Michel, 80, 116, 168 Guo, Weiwei, 138
Ganchev, Kuzman, 110 Guo, Yuhong, 119
Ganitkevitch, Juri, 69, 116 Gupta, Samir, 172
Gao, Jianfeng, 104, 116 Gupta, Saurabh, 71
Gao, Wei, 69 Gurevych, Iryna, 76, 88, 169
Gao, Yingkai, 103 Guzmán, Francisco, 78
Gasic, Milica, 127 Gzawi, Mahmoud, 164
Ge, Tao, 58, 73, 123, 172
Ge, Wendong, 128 Habash, Nizar, 163
Georgi, Ryan, 170 Habib, Mena, 88
Gerber, Matthew S., 127 Hadrich-Belguith, Lamia, 123
Gershman, Anatole, 64, 122 Hagiwara, Masato, 115
Getoor, Lise, 49, 51 Hahn-Powell, Gustave, 56, 89
Ghiasifard, Sonia, 166 Hajj, Hazem, 163
Ghoneim, Mahmoud, 163 Hakala, Kai, 88
Ghosh, Samik, 172 Hallgren, Thomas, 165
Gil, Ronald, 179 Hamdi, Ahmed, 163
Giles, C. Lee, 89, 169 Han, Aaron Li-Feng, 160
Gillick, Daniel, 100 Han, Bo, 173, 180, 181
Gimpel, Kevin, 125 Han, Jiawei, 27, 29, 73, 135
Ginter, Filip, 88 Han, Wenying, 184
Giovannetti, Emiliano, 171 Han, Xianpei, 55, 114
Glava, Goran, 69 Han, Zhe, 98
Gleize, Martin, 81 Hanks, Patrick, 27, 34
Godin, Frederic, 181, 182 Hao, Hongwei, 114
Goebel, Randy, 100 Hardmeier, Christian, 168
Gokcen, Ajda, 70 Hashimoto, Kazuma, 176
Goldberg, Yoav, 50 Hassan, Mohsen, 172
Gollapalli, Sujatha Das, 169 Hauer, Bradley, 174
Gomes, Luis, 177 Hawwari, Abdelati, 163
Gomez-Rodriguez, Carlos, 181 Hayashi, Katsuhiko, 95
Gopavaram, Shakthidhar, 185 Hazem, Amir, 166
Gormley, Matthew R., 27, 31 He, Shizhu, 75
Goto, Isao, 183 He, Tingting, 56
Gracia, Jorge, 179 He, Wei, 144, 161
Graham, Yvette, 149 He, Xiangjian, 118
Grau, Brigitte, 81 He, Xiaodong, 71, 104
Grave, Edouard, 106 He, Yanxiang, 62
Gravier, Christophe, 67 He, Yulan, 67
Grefenstette, Edward, 50, 176 He, Zhongjun, 177
Grishina, Yulia, 166 Heafield, Kenneth, 115
Grishman, Ralph, 74, 114, 185 Hearst, Marti A. , 93
Gronas, Mikhail, 170 Heinzerling, Benjamin, 88
Groschwitz, Jonas, 110 Hennig, Leonhard, 74, 88, 179
Gross, Justin H., 116 Herder, Eelco, 138
Gruzitis, Normunds, 165 Hermann, Karl Moritz, 176
Gu, Jiatao, 97 Hicks, Thomas, 89
Gui, Lin, 161 Hinrichs, Erhard, 145
Guimaraes, Victor, 174 Hirst, Graeme, 72, 117
Gulordava, Kristina, 117 Hockenmaier, Julia, 106, 130
Guo, Hongyu, 95, 181 Hoi, Steven, 141
Guo, Jiafeng, 51 Hong, Kaiduo, 161, 162
Guo, Jiang, 87 Hong, Na, 173
Guo, Li, 50 Hong, Qinghua, 184
Guo, Qingwen, 184 Horbach, Andrea, 183
Guo, Shu, 50 Hoshino, Sho, 95

Hou, Jianpeng, 160 Jensen, Lars Juhl, 88
Hou, Junfeng, 118 Jeong, Minwoo, 64
Hou, Min, 161 Jezek, Elisabetta, 27, 34
Hou, Yufang, 170 Jha, Rahul, 63
Hovy, Dirk, 77, 99, 118, 139, 180 Ji, Donghong, 63
Hovy, Eduard, 168 Ji, Guoliang, 75
Hsieh, Shu-Kai, 179 Ji, Heng, 27, 29, 52, 73, 79, 100, 115, 123, 172
Hsieh, Yu-lun, 127 Ji, Yangfeng, 116
Hsu, Daniel, 103 Jia, Ming, 161
Hsu, Wen-Lian, 127 Jiang, Guoqian, 173
Hu, Baotian, 119, 125 Jiang, Hui, 50, 111, 118
Hu, Linmei, 114 Jiang, Jing, 62, 97
Hu, Po, 56 Jiang, Wenbin, 47, 137
Hu, Xiaohua, 122 Jin, Gongye, 161
Hu, Yu, 111 Jin, Ning, 181
Hu, Yue, 121 Jin, Rize, 172
Hu, Zhiting, 103 Jin, Yaohong, 161
Huang, Bert, 51 Johannsen, Anders, 144
Huang, Chu-Ren, 27, 41, 160, 179 Johnson, Mark, 109
Huang, Chung-Chi, 183 Jonnagaddala, Jitendra, 173
Huang, Degen, 62 Jorgensen, Anna, 180
Huang, Heyan, 161 Joshi, Aditya, 122, 126
Huang, Hongzhao, 73 Joty, Shafiq, 78, 124
Huang, Lei, 68, 84, 161 Ju, Meizhi, 173
Huang, Liang, 27, 38, 97 Jurafsky, Dan, 85, 113
Huang, Minlie, 105 Jurcicek, Filip, 63
Huang, Peijie, 161, 162, 184
Huang, Poyao, 103 Kádár, Ákos, 71
Huang, Qiang, 161 Kajiwara, Tomoyuki, 131
Huang, Shujian, 78, 87 Kan, Min-Yen, 138, 169, 183
Huang, Songfang, 98 Kang, Xin, 161
Huang, Ting-Hao, 160 Kanouchi, Shin, 140
Huang, Xuanjing, 86, 145 Karoui, Jihen, 123
Huang, Zhongqiang, 48 Kasper, Walter, 183
Hummel, Shay, 116 Kaufmann, Manuel, 88
Hussey, Daniel Vidal, 165 Kawahara, Daisuke, 27, 34, 161
Kawarabayashi, Ken-Ichi, 76
Iacobacci, Ignacio, 50 Kaya, Hasan, 184
Iacovelli, Douglas, 70 Kedzie, Chris, 138
Ide, Nancy, 179 Keller, Frank, 87
Idris, Norisma, 180 Kennington, Casey, 57
Iglesias, Carlos A., 179 Keulen, Maurice van, 88
Ilvovsky, Dmitry, 124, 185 Khadivi, Shahram, 166
Imany, Mohsen, 130 Khairallah, Jeffrey, 163
Ionov, Maxim, 185 Kiagias, Emmanouil, 141
Ishikawa, Hiroshi, 131, 140, 184 Kiela, Douwe, 71, 98
Iwayama, Makoto, 89 Kim, Jae-Hoon, 166, 183
Iyer, Rishabh, 72 Kim, Mi-Young, 100
Iyyer, Mohit, 141 Kim, Young-Bum, 64, 97, 128, 181
Kim, Youngjun, 172
Jaakkola, Tommi, 146 Kim, Yu-Seop, 181
Jakubina, Laurent, 166 King, Ben, 63
Jalalvand, Shahab, 85 Kirov, Christo, 124
Jansen, Peter, 56 Kitagawa, Yoshiaki, 131
Jatowt, Adam, 74 Klabunde, Ralf, 72
Jean, Sébastien, 47 Klein, Dan, 58

Knight, Kevin, 73, 79 Lee, Lung-Hao, 160, 183
Ko, Youngjoong, 128 Lee, Sophia, 126, 161
Koleva, Nikolina, 128 Lee, Yann-Huei, 88
Koller, Alexander, 110, 128 Leeman-Munk, Samuel, 181, 182
Komachi, Mamoru, 131, 140, 184 Lei, Yu, 62
Kondrak, Grzegorz, 100, 146, 174 Lenci, Alessandro, 179
Kong, Lingpeng, 160 Lepage, Yves, 55
Konstas, Ioannis, 87 Lestari, Victoria Anugrah, 170
Kontzopoulou, Yiota, 185 Lester, James, 181, 182
Kotani, Katsunori, 166 Levi, Effi, 125
Krause, Sebastian, 88, 179, 183 Levin, Lori, 165
Krishnamurthy, Jayant, 57 Levy, Omer, 50
Kruszewski, Germán, 82 Levy, Ran, 116
Kshirsagar, Meghana, 98 Lewis, William, 170
Kshirsagar, Rohan, 115 Li, Binyang, 161
Ku, Lun-Wei, 88, 105 Li, Chen, 69, 80, 88, 173
Kuhn, Jonas, 88 Li, Dingcheng, 173
Kumano, Tadashi, 183 Li, Haizhou, 174
Kumar, Shachi H., 49 Li, Hang, 47, 119, 137
Kumar, Srijan, 140 Li, Haomin, 173
Kumaran, A., 174 Li, Hong, 88
Kunchukuttan, Anoop, 174 Li, Hongjie, 161
Kurohashi, Sadao, 161 Li, Hongzheng, 161
Kuznetsov, Sergey O., 124 Li, Jiwei, 85
Kwiatkowski, Tom, 104 Li, Juanzi, 114
Kwong, Oi Yee, 174 Li, Miao, 161
Li, Piji, 138
Lê, Tuấn Anh, 88 Li, Qi, 115
Labropoulou, Penny, 179 Li, Qiuchi, 161
Labutov, Igor, 80 Li, Ru, 102
Laforest, Frédérique, 67 Li, Shaotong, 168
Lai, K. Robert, 127 Li, Shoushan, 68, 84, 126
Lai, Wei, 160 Li, Sujian, 73, 100, 122, 129
Lam, Wai, 65, 138 Li, Tao, 84
Lamar, Thomas, 48 Li, Victor O.K., 97
Lambert, Patrik, 127, 177 Li, Wenjie, 62, 122, 129
Lampos, Vasileios, 147 Li, Xiang, 122, 185
Lan, Yanyan, 51 Li, Xiaoli, 114, 169
Langlais, Phillippe, 166 Li, Xiaoming, 118
Langlet, Caroline, 84 Li, Ya-Ting, 184
Lao, Ni, 75 Li, Yanran, 122
Laparra, Egoitz, 114, 185 Li, Yunyao, 61
Lappin, Shalom, 139 Li, Zhenghua, 148
Larochelle, Hugo, 176 Liakata, Maria, 173
Lau, Jey Han, 139 Liang, Chen, 89
Lauw, Hady, 59 Liang, Percy, 82, 97, 104, 110, 168
Lavrenko, Victor, 147 Liao, Chun, 161
Lazaridou, Angeliki, 57, 67, 81, 82 Liao, Xiangwen, 161
Lazer, David, 139 Liao, Yi, 65, 138
Lazic, Nevena, 100 Liao, Yuan-Fu, 160
Le, Quoc V., 47, 168 Liaw, Siaw-Teng, 173
Lee, Changsu, 128 Liberman, Mark, 160
Lee, Chih-Yao, 179 Liebeck, Matthias, 116
Lee, Chin-Hui, 80 Liebeskind, Chaya, 171
Lee, Hee-Jin, 172 Lim, EunYong, 183
Lee, John, 121 Lin, Chen, 172

Lin, Chin-Yew, 73 Maehara, Takanori, 76
Lin, Chuan-Jie, 161, 183 Magdy, Walid, 163
Lin, Hongyu, 55 Maguire, Eimear, 180
Lin, Ming-Jhih, 161 Maier, Wolfgang, 87
Lin, Wei, 50 Majumder, Mukta, 183
Linard, Alexis, 166 Makhalova, Tatyana, 185
Ling, Wang, 58, 84, 122 Makhoul, John, 48
Ling, Xiao, 65 Makkaoui, Olfa, 172
Ling, Zhen-Hua, 111 Maletti, Andreas, 78
Lipenkova, Janna, 88 Manjunatha, Varun, 141
Liska, Adam, 81 Manning, Christopher D., 49, 59, 82, 108, 137,
Litvinova, Varvara, 185 176
Liu, Bing, 126 Manurung, Ruli, 121, 170
Liu, Bojia, 174 Mao, Yu, 123
Liu, Chao, 121 Marie, Benjamin, 120
Liu, Chenglin, 114 Markert, Katja, 138
Liu, Fei, 68 Marquez, Lluis, 166
Liu, Hongfang, 173 Martino, Giovanni Da San, 82, 124
Liu, Huidan, 117 Martins, Andr F. T., 62, 108, 112
Liu, Jiangming, 168 Martschat, Sebastian, 88
Liu, Kai, 115 Marujo, Luis, 122
Liu, Kang, 52, 74, 75 Mascarell, Laura, 130
Liu, Lemao, 174 Mathews, Kusum, 172
Liu, Mengwen, 122 Matsumoto, Yuji, 82, 99, 129, 160, 183, 184
Liu, Quan, 111 Matsuzaki, Takuya, 95
Liu, Qun, 47, 137, 145 Matteis, Luca, 179
Liu, Rong, 176 Matthews, Austin, 58
Liu, Ting, 72, 83, 87 Maulidyani, Anggi, 121
Liu, Weisong, 173 Max, Aurlien, 120
Liu, Xiaohu, 128 McAllester, David, 125
Liu, Xule, 62 McCallum, Andrew, 51, 52
Liu, Yalin, 113 McCann, Julie, 113
Liu, Yang, 55, 69, 80, 87, 100, 105, 120, 123 McClosky, David, 87
Liu, Yongbin, 114 McCrae, John Philip, 179
Liu, Yuanchao, 105 McGovern, Aonghus, 179
Liu, Yue, 172 McGuinness, Deborah, 172
Liu, Zhiyuan, 120 McKeown, Kathleen, 138
Loginova-Clouet, Elizaveta, 166 McKeown, Kathy, 163
Long, Congjun, 117 McNaught, John, 163
Long, Zi, 166 Megyesi, Beata, 170
Lu, Chunliang, 65 Mehler, Alexander, 171
Lu, Wei, 126 Meij, Edgar, 73
Lu, Zhengdong, 47, 119, 137 Memisevic, Roland, 47
Lukasik, Michal, 119 Mendes, Pedro, 62
Luo, Yen-Fu, 170 Meng, Fandong, 47
Luong, Ngoc-Quang, 130 Merlo, Paola, 117
Luong, Thang, 47, 85 Mi, Haitao, 86
Lyngfelt, Benjamin, 165 Michael, Thilo, 88
Lynn, Teresa, 180 Mielens, Jason, 106
Miler, Kristina, 109
Müller, Stefan, 165 Miller, Ben, 185
Màrquez, Lluís, 78, 124 Miller, Timothy, 172
Ma, Jianqiang, 145 Miller, Tristan, 76
Ma, Mingbo, 97 Min, Wookhee, 181
Ma, Nianzu, 126 Minard, Anne-Lyse, 185
MacKinlay, Andrew, 173 Ming, Fang, 168

Minkov, Einat, 75 Nguyen, Kiem-Hieu, 52
Mirowski, Piotr, 119 Nguyen, Minh-Le, 174
Mishra, Abhijit, 122 Nguyen, Thanh Binh, 174
Misra, Dipendra Kumar, 82 Nguyen, Thien Hai, 105
Mita, Masato, 184 Nguyen, Thien Huu, 74, 114, 185
Mitchell, Jeff, 103 Nguyen, Viet-An, 109
Mitchell, Margaret, 71, 116 Nicolai, Garrett, 174
Mitchell, Tom M., 57, 59 Nicosia, Massimo, 88, 98
Mitra, Arindam, 80 Niculae, Vlad, 140
Mitra, Prasenjit, 79 Nissim, Malvina, 111
Mitsuhashi, Tomoharu, 166 Nivre, Joakim, 170, 174
Miura, Akiva, 121 Niwa, Yoshiki, 89
Miyao, Yusuke, 58, 95, 100, 112 Noh, Eun-Hee, 183
Miyoshi, Toshinori, 89 Noji, Hiroshi, 112
Mizumoto, Tomoya, 184 Nomoto, Tadashi, 185
Mochihashi, Daichi, 148 Norman, Christopher, 169
Moeljadi, David, 165 Nourian, Alireza, 130
Moens, Marie-Francine, 125 Novak, Attila, 173
Mohamed, Abdelrahman, 117 Nuhn, Malte, 120
Mohammad, Seyed, 166 Nunes, Maria das Gracas Volpe, 180
Mohit, Behrang, 163 Nuo, Minghua, 117
Monroe, Will, 49 Nyberg, Eric, 125
Monz, Christof, 120, 180
Mooney, Raymond, 52, 80 O'Connor, Alexander, 179
Moriceau, Véronique, 123 O'Connor, Brendan, 182
Morin, Emmanuel, 166 Ó Séaghdha, Diarmuid, 127
Moschitti, Alessandro, 56, 82, 88, 98, 124 Obeid, Ossama, 163
Mosquera, Yerai Doval, 181 Oda, Yusuke, 55
Mostefa, Djamel, 164 Oflazer, Kemal, 163, 164
Mott, Bradford, 181 Oh, Alice, 170
Movshovitz-Attias, Dana, 109 Okazaki, Naoaki, 131, 140
Mrkšić, Nikola, 127 Olenina, Tatyana, 185
Mu, Yanfei, 161 Olive, Jennifer, 185
Mubarak, Hamdy, 163, 164 Orita, Naho, 139
Mukherjee, Arjun, 27, 39 Osborne, Miles, 147
Osenova, Petya, 179
Nagata, Masaaki, 95, 97, 99 Ostermann, Simon, 183
Nakagawa, Tetsuji, 55 Ostling, Robert, 98
Nakamura, Satoshi, 55, 121 Ou, Zhijian, 78
Nakashole, Ndapandula, 59 Ouchi, Hiroki, 82
Nakov, Preslav, 78, 124, 169
Nallapati, Ramesh, 172 Paetzold, Gustavo, 88, 89
Napoles, Courtney, 121 Palaniappan, Sucheendra, 172
Naradowsky, Jason, 27, 36 Pan, Xiaoman, 73
Narasimhan, Karthik, 102, 146 Panahloo, Zeinab Ali, 166
Nasr, Alexis, 85, 163 Pandey, Nishant, 84
Navigli, Roberto, 50, 67, 77, 168, 179 Paraboni, Ivandr, 69, 70
Nawar, Michael, 164 Paris, Cecile, 170
Neelakantan, Arvind, 52 Park, Jong, 172
Negri, Matteo, 55, 85, 95, 119 Park, Joonsuk, 126
Neto, João, 122 Park, MinJun, 161
Neubig, Graham, 55, 121 Parmentier, Yannick, 165
Ney, Hermann, 120 Passonneau, Rebecca, 138
Ng, Dominick, 86 Pasupat, Panupong, 110
Ng, Hwee Tou, 76 Pate, John K, 109
Ng, Vincent, 70, 72, 100, 113, 160 Patsepnia, Viachaslau, 181

Paul, Apurba, 169 Rappoport, Ari, 125, 168
Pavlick, Ellie, 69, 111, 116 Rasooli, Mohammad Sadegh, 130
Pei, Wenzhe, 58, 73 Rastogi, Pushpendre, 116
Peng, Nanyun, 115, 146 Ray, Pradeep, 173
Peng, Yifan, 172 Real, Livy, 174, 179
Pereira, Lis, 183 Regalado, Ralph Vincent, 184
Persing, Isaac, 72 Reiter, Nils, 170, 171
Petrolito, Ruggero, 171 Renals, Steve, 163
Petrolito, Tommaso, 171 Resnik, Philip, 109, 116
Petrov, Slav, 58 Reuße, Sebastian, 72
Pettersson, Eva, 170 Ricci, Elisa, 55
Pham, Nghia The, 67, 82 Richardson, Matthew, 56
Pham, Quang Hong, 174 Riedel, Sebastian, 27, 36
Pilehvar, Mohammad Taher, 50, 67, 77 Riedl, Martin, 88
Pinera, Rene Rose, 184 Riester, Arndt, 70
Pinkal, Manfred, 102, 183 Rigau, German, 111, 114, 185
Pinto, Claudia, 62 Rikters, Matīss, 177
Piperidis, Stelios, 88 Riloff, Ellen, 172
Plank, Barbara, 74, 144 Rimell, Laura, 71
Polajnar, Tamara, 125 Ringlstetter, Christoph, 100
Popescu, Octavian, 27, 34 Rinott, Ruty, 116
Popescu-Belis, Andrei, 130 Rios, Miguel, 166
Post, Matt, 121 Ritter, Alan, 140, 180, 181
Potts, Christopher, 49, 176 Rocktaschel, Tim, 27, 36
Pouli, Vassiliki, 177 Rodrigues, Joao, 177
Pradhan, Sameer, 130 Rodriguez-Doncel, Victor, 179
Praet, Raf, 170 Roesiger, Ina, 70
Premkumar, Melvin Jose Johnson, 59 Roller, Roland, 100, 172
Preotiuc-Pietro, Daniel, 147 Romic, Nenad, 170
Primadhanty, Audi, 51 Rosa, Rudolf, 99
Prud'hommeaux, Emily, 98 Rose, Carolyn, 140
Pu, Xiao, 130 Roth, Benjamin, 52
Pyysalo, Sampo, 88 Rothe, Sascha, 149
Rouhizadeh, Masoud, 98
Qian, Qiao, 105 Rousseau, Francois, 141
Qian, Xian, 87 Rozovskaya, Alla, 163
Qiao, Haiyan, 161 Rubinstein, Dana, 125
Qin, Bing, 72, 83 Rudnicky, Alexander, 64
Qiu, Minghui, 68 Rumshisky, Anna, 170
Qiu, Xipeng, 86, 145 Ruppert, Eugen, 88
Quattoni, Ariadna, 51 Rush, Alexander M., 108
Que, Roger, 124
Quirk, Chris, 80, 116, 168 Søgaard, Anders, 99, 112, 118, 129, 144
Saadane, Houda, 163
Rademaker, Alexandre, 174, 179 Sachan, Mrinmaya, 56
Radev, Dragomir, 63 Sachdeva, Kunal, 176
Rahimi, Afshin, 122 Sadeghi, Amir Pouya Agha, 166
Raj, Ram Gopal, 180 Safabakhsh, Reza, 166
Rajani, Nazneen Fatema, 52 Saggion, Horacio, 129
Ramakrishnan, Ganesh, 72 Saha, Sujan Kumar, 183
Ramasy, Allan, 180 Sahu, Sunil, 173
Rambow, Owen, 27, 32, 163 Sajjad, Hassan, 164
Ramesh, Arti, 49 Sakaguchi, Keisuke, 121
Ramisch, Carlos, 85 Sakti, Sakriani, 55, 121
Ranta, Aarne, 161, 165 Salameh, Mohammad, 174
Rapp, Reinhard, 166, 177 Sallab, Ahmad Al, 163

Saloot, Mohammad Arshi, 180 Silic, Marin, 170
Samardzic, Tanja, 170 Silva, Joao, 177
Sanchez-Rada, J. Fernando, 179 Silva, Mario, 84
Santos, Cicero dos, 74, 124, 174 Silverstein, Kate, 51
Santos, Pedro Bispo, 169 Simmons, Reid, 70
Santus, Enrico, 179 Simonson, Dan, 185
Sarikaya, Ruhi, 64, 97, 128 Singh, Sameer, 65
Sato, Misa, 89 Siu, Amy, 172
Savova, Guergana, 172 Slonim, Noam, 116
Savva, Manolis, 49 Smith, Noah A., 58, 98, 111, 116
Sawai, Yuichiro, 129 Smucker, Mark, 123
Saxena, Ashutosh, 82 Socher, Richard, 137
Sayeed, Asad, 77 Soderland, Stephen, 73
Scannell, Kevin, 180 Sofianopoulos, Sokratis, 88
Scarton, Carolina, 89 Sogaard, Anders, 180
Schütze, Hinrich, 49, 149 Song, Runqing, 173
Schamper, Julian, 120 Song, Sanghoun, 165
Schikowski, Robert, 170 Sordoni, Alessandro, 116
Schlangen, David, 57, 64 Soroa, Aitor, 100
Schluter, Natalie, 129 Specia, Lucia, 88, 89
Schneider, Andrew, 83 Spranger, Michael, 172
Schneider, Nathan, 98 Sproat, Richard, 98
Schuler, William, 165 Srbljic, Sinisa, 170
Schwartz, Richard, 48 Sridhar, Dhanya, 51
Schwartz, Roy, 125 Stajner, Sanja, 69, 129
Schweitzer, Katrin, 88 Stalpouskaya, Katsiaryna, 185
Schwenk, Holger, 176 Stanovsky, Gabriel, 100
Seemann, Nina, 78 Stathopoulos, Yiannos, 113
Seneff, Stephanie, 173 Staudte, Maria, 128
Seo, Hyeong-Won, 166, 183 Stede, Manfred, 166
Seo, Jungyun, 128 Steedman, Mark, 103, 109
Setiawan, Hendra, 48 Steele, David, 89
Severance, Samuel J., 173 Stenetorp, Pontus, 88
Severyn, Aliaksei, 98 Stent, Amanda, 61
Shaban, Khaled Bashir, 163 Stevens, Jon, 72
Shang, Lifeng, 137 Stevenson, Mark, 100, 172, 173
Shanker, Vijay, 172 Stoll, Sabine, 170
Shao, Chao, 114 Stratos, Karl, 64, 97, 103, 128
Shao, Yan, 174 Strube, Michael, 88, 93
Sharma, Dipti, 176 Strubell, Emma, 51
Sharma, Vinita, 126 Su, Jian, 181
Sharoff, Serge, 166 Su, Jinsong, 55
Shashidhar, Vinay, 84 Su, Keh-Yih, 144
Shi, Fulin, 62 Su, Pei-Hao, 127
Shi, Tianze, 120 Subercaze, Julien, 67
Shi, Xinlei, 121 Sudoh, Katsuhito, 95
Shieber, Stuart, 108 Suero, Daniel Vila, 179
Shindo, Hiroyuki, 82, 129 Sui, Zhifang, 73, 123, 160
Shirai, Kiyoaki, 105 Sulem, Elior, 168
Shoufan, Abdulhadi, 163 Sumita, Eiichiro, 86, 120, 174
Shrestha, Ayush, 185 Sun, Chengjie, 105
Shuib, Liyana, 180 Sun, Fei, 51
Shutova, Ekaterina, 82 Sun, Le, 114
Siemoneit, Benjamin, 179 Sun, Liang, 106
Sikdar, Utpal Kumar, 181 Sun, Maosong, 120
Siklosi, Borbala, 173 Sun, Qinghua, 89

Sun, Rui, 63 Trancoso, Isabel, 84, 122
Sun, Weiwei, 112 Trofimova, Darya, 185
Sun, Yizhou, 27, 29, 73 Tsagkias, Manos, 73
Sun, Zhongqian, 161 Tsai, Richard Tzong-Han, 174
Sung, Kyung-Hee, 183 Tsai, Wan-Ling, 184
Supranovich, Dmitry, 181 Tseng, Yuen-Hsien, 160, 183
Surdeanu, Mihai, 56, 89 Tsou, Benjamin K., 166
Sutskever, Ilya, 47 Tsujii, Junichi, 172
Suzuki, Jun, 97 Tsukahara, Hiroshi, 148
Sylak-Glassman, John, 124 Tsur, Oren, 139, 147
Tsuruoka, Yoshimasa, 176
Täckström, Oscar, 104, 110 Tsvetkov, Yulia, 95, 111
Taddy, Matt, 68 Tu, Zhaopeng, 119
Tai, Kai Sheng, 137 Tuarob, Suppawong, 169
Takeda, Hideaki, 181 Tucker, Conrad, 169
Takefuji, Yoshiyasu, 181 Turchi, Marco, 55, 76, 85, 95, 119
Tambouratzis, George, 177 Tyshchuk, Yulia, 79
Tamchyna, Ales, 168, 177
Tan, Liling, 177 Uchiumi, Kei, 148
Tan, Luchen, 123 Unger, Christina, 165
Tan, Zhiqiang, 78 Uszkoreit, Hans, 74, 88, 179, 183
Tanaka, Hideki, 183 Utiyama, Masao, 120
Tanaka, Katsumi, 74 Utsuro, Takehito, 166
Tanaka, Takaaki, 99 Vaithyanathan, Shivakumar, 61
Tandon, Niket, 82 Valenzuela-Escárcega, Marco A., 89
Tang, Buzhou, 125 Valli, Andr, 85
Tang, Duyu, 83 Van der Wees, Marlies, 120, 180
Tang, Jie, 114 Vandersmissen, Baptist, 181, 182
Tang, Shiping, 123 Vanderwende, Lucy, 80
Tang, Zhuoying, 162 Vandyke, David, 127
Tannier, Xavier, 52 Van Durme, Benjamin, 69, 111, 116
Tao, Kejia, 82 Van Santen, Jan, 98
Tasnadi, Ervin, 181 Vaswani, Ashish, 79
Teichmann, Christoph, 110 Vazirgiannis, Michalis, 141
Tellier, Isabelle, 181 Vilares, Jesus, 181
Teng, Choh Man, 172 Valli, André, 85
Teng, Yonglin, 161 Vilnis, Luke, 51
Ter-Sarkisov, Alex, 176 Vinyals, Oriol, 47
Tetreault, Joel, 49, 61, 121, 182 Virk, Shafqat Mumtaz, 88
Teufel, Simone, 69, 113, 129 Viswanath, Vish, 118
Th, Muneeb, 173 Viswanathan, Vidhoon, 52
Thang, Le Quang, 112 Vlachos, Andreas, 27, 36, 61, 119, 173
Thomson, Blaise, 127 Vladimir, Klemo, 170
Thomson, Sam, 98 Vo, Nguyen, 80
Tian, Bo, 105 Vogel, Stephan, 163
Tian, Tian, 181 Voigt, Rob, 113
Tiedemann, Jorg, 174 Volk, Martin, 130
Tkachenko, Maksim, 59 Vor der Bruck, Tim, 171
Toda, Tomoki, 55, 121 Vornov, Eliana, 139
Todo, Naoya, 95 Voskarides, Nikos, 73
Toh, Zhiqiang, 181 Vossen, Piek, 185
Tokgoz, Alper, 130 Vulcu, Gabriela, 179
Tong, Roland, 161 Vulic, Ivan, 71, 125
Toussain, Yannick, 172
Toutanova, Kristina, 176 Wade, Vincent, 179
Tran, Giang, 138 Wagner, Joachim, 181

Walker, Marilyn, 51 Wei, Chuyuan, 123
Wallace, Byron C., 83 Wei, Furu, 56, 100, 129
Wallace, William, 79 Wei, Si, 50, 111
Walle, Rik Van de, 181, 182 Wei, Zhongyu, 69
Wan, Ji, 141 Weikum, Gerhard, 164, 172
Wan, Stephen, 170 Weiren, Yu, 113
Wan, Xiaojun, 112, 121 Weiss, David, 58
Wang, Baoxun, 105 Weiss, Gregor, 130
Wang, Bin, 50, 78 Weissenborn, Dirk, 74
Wang, Cheng, 84 Weld, Daniel S., 65, 73
Wang, Chuan, 130 Weller, Marion, 95
Wang, Daling, 162 Wen, Miaomiao, 140
Wang, Dandan, 174 Wen, Tsung-Hsien, 127
Wang, Di, 125 Wen, Zhen, 73
Wang, Dong, 176 Werling, Keenon, 82
Wang, Fangyuan, 114 Weston, Jason, 108
Wang, Hai, 125 Wiebe, Janyce, 27, 32
Wang, Haifeng, 87, 144 Winterstein, Gregoire, 171
Wang, Hanshi, 123 Wiseman, Sam, 108
Wang, Hongning, 127 Wolfe, Travis, 116
Wang, Houfeng, 100, 129 Wolters, Maria, 172
Wang, Hua, 115 Wong, Derek F., 160
Wang, Jialei, 141 Wong, Lung Hsiang, 183
Wang, Jie, 161 Wray, Samantha, 163
Wang, Jin, 127 Wu, Cathy, 172
Wang, Jingjing, 68 Wu, Chun-Kai, 174
Wang, Jingwen, 161 Wu, Hua, 144
Wang, Jinpeng, 118 Wu, Jian, 117
Wang, Jundong, 184 Wu, Jian-Cheng, 89
Wang, Lihong, 50 Wu, Juan, 102
Wang, Linlin, 74 Wu, Shih-Hung, 183
Wang, Mingxuan, 47, 137 Wu, Shuangzhi, 64
Wang, Nan, 174 Wu, Xiupeng, 184
Wang, Peng, 114 Wu, Yonghui, 173
Wang, Qiang, 89 Wu, Yunong, 161
Wang, Quan, 50 Wu, Zhaohui, 89
Wang, Rong, 84 Wulff, Julie, 180
Wang, Shao-Yu, 161 Wurzer, Dominik, 147
Wang, Shichang, 160
Wang, Tong, 117, 118 Xia, Fei, 106, 170
Wang, William Yang, 59, 64 Xia, Rui, 51, 84
Wang, Xiaolin, 174 Xia, Yunqing, 80
Wang, Xiaolong, 105, 125, 184 Xiang, Bing, 74, 97
Wang, Xin, 105 Xiang, Yang, 184
Wang, Xuan, 67 Xiao, Min, 119
Wang, Xuzhong, 114 Xiao, Tong, 89
Wang, Yang, 162 Xie, Weijian, 161, 162
Wang, Yaqi, 162 Xie, Yusheng, 122
Wang, Yih-Ru, 160 Xie, Zehua, 121
Wang, Yu-Chun, 174 Xing, Eric, 56, 103
Wang, Yushi, 104 Xinhui, Geng, 177
Wang, Zhiguo, 86 Xiong, Deyi, 55, 168
Wang, Zhiqiang, 102 Xiong, Jinhua, 160
Wang, Zhongqing, 126, 161 Xu, Bo, 114, 128
Watanabe, Taro, 47, 86 Xu, Feiyu, 74, 88, 179, 183
Weerkamp, Wouter, 73, 120 Xu, Hua, 173

Xu, Jiaming, 114 Yepes, Antonio Jimeno, 173
Xu, Jinan, 168, 174 Yeung, Chak Yan, 121
Xu, Jun, 51, 109, 173 Yih, Scott Wen-tau, 176
Xu, Ke, 56, 80 Yih, Wen-tau, 104
Xu, Kui, 123 Yin, Wenpeng, 49
Xu, Kun, 98 Yogatama, Dani, 100, 111
Xu, Liheng, 52, 75, 161 Yosef, Mohamed Amir, 164
Xu, MingBin, 118 Yoshimi, Takehiko, 166
Xu, Ruifeng, 67, 161 You, Jinseon, 172
Xu, Wei, 86, 180, 181 Young, Steve, 127
Xu, Wenduan, 99 Yu, Dian, 79
Xu, Xiaoying, 160 Yu, Dianhai, 144
Xu, Ying, 100, 174 Yu, Heng, 95
Xu, Yuhong, 184 Yu, Jianfei, 97
Xue, Nianwen, 86, 113, 130 Yu, Kai-Hsiang, 184
Yu, Liang-Chih, 127, 160, 183
Yagcioglu, Semih, 71 Yu, Mo, 115
Yamada, Ikuya, 181 Yu, Yue, 173
Yamamoto, Kazuhide, 131 Yuan, Bin, 176
Yamamoto, Mikio, 166 Yuan, Jiahong, 160
Yan, Hongfei, 118 Yuan, Yulin, 161
Yan, Rui, 122 Yung, Frances, 160, 179
Yan, Tian, 161
Yan, Yaowei, 68 Zabokrtsky, Zdenek, 99
Yanai, Kohsuke, 89 Zadrozny, Bianca, 124
Yanase, Toshihiko, 89 Zafarian, Atefeh, 166
Yang, Bishan, 83 Zaghouani, Wajdi, 163, 164
Yang, Cheng-Han, 160 Zaheer, Manzil, 78
Yang, Diyi, 140 Zaremba, Wojciech, 47
Yang, Eun-Suk, 181 Zbib, Rabih, 48
Yang, Haitong, 85 Zeng, Daojian, 52
Yang, Jian, 174 Zeng, Xiaodong, 160
Yang, Jie, 118 Zerrouki, Taha, 164
Yang, Min, 68 Zerva, Chrysoula, 172
Yang, Ping-Che, 183 Zervanou, Kalliopi A., 170
Yang, Ren-Dar, 183 Zesch, Torsten, 169
Yang, Sen, 161 Zettlemoyer, Luke, 104, 110
Yang, Wei, 161 Zhai, Junjie, 121
Yang, Wenjing, 161 Zhang, Boliang, 73
Yang, Xiaohui, 174 Zhang, Congle, 73
Yang, Xudong, 121 Zhang, Dakui, 123
Yang, Yaqin, 113 Zhang, Dongdong, 64
Yang, Yezhou, 75 Zhang, Dongxu, 176
Yang, Yi, 65, 77 Zhang, Fan, 123
Yang, Yinfei, 68 Zhang, Hao, 161
Yao, Junfeng, 55 Zhang, Haotian, 123
Yao, Lei, 174 Zhang, Heng, 114
Yao, Xuchen, 69 Zhang, Jingyi, 120
Yao, Yao, 160 Zhang, Kunpeng, 122
Yarowsky, David, 87, 124 Zhang, Meishan, 63
Ye, Dashu, 162 Zhang, Mengdi, 114
Ye, Weiping, 160 Zhang, Min, 55, 95, 148, 174
Yeh, Chan Kun, 184 Zhang, Muyu, 72
Yeh, Jui-Feng, 184 Zhang, Qiao, 160
Yen, Tzu-Hsi, 89 Zhang, Sheng, 98
Yener, Bulent, 73 Zhang, ShiLiang, 118

Zhang, Shuiyuan, 160
Zhang, Xiangrong, 173
Zhang, Xinrui, 161
Zhang, Xue-jie, 127
Zhang, Yan, 118
Zhang, Yaoyun, 173
Zhang, Yating, 74
Zhang, Yifei, 162
Zhang, Yongdong, 141
Zhang, Yue, 63, 87, 148, 160
Zhang, Yujie, 168, 174
Zhang, Zhenzhong, 114
Zhang, Zhifei, 161
Zhao, Dongyan, 98
Zhao, Hai, 120
Zhao, Jun, 52, 56, 65, 74, 75
Zhao, Kai, 27, 38
Zhao, Tiejun, 64
Zhao, Xinru, 160
Zhao, Yinchen, 184
Zhao, Yue, 183
Zhao, Zhendong, 109
Zheng, Mao, 72
Zheng, Naijia, 183
Zhi, Qiyu, 161
Zhou, Bowen, 74, 97
Zhou, Guangyou, 56
Zhou, Guilong, 162
Zhou, Guodong, 68, 75, 84, 126
Zhou, Hao, 87
Zhou, Hongzhao, 161
Zhou, Huiwei, 62
Zhou, Jie, 86
Zhou, Keira, 127
Zhou, Ming, 56, 64, 100, 129
Zhou, Xiaoqiang, 125
Zhu, Chenxi, 86, 145
Zhu, Hongtao, 161
Zhu, Huaiyu, 61
Zhu, Jingbo, 89
Zhu, Muhua, 89
Zhu, Qiaoming, 75
Zhu, Xiaodan, 50, 103
Zhu, Xiaolin, 161
Zhu, Xiaoyan, 105
Zhu, Xuan, 95, 105
Zhuang, Tao, 85
Ziabary, Mohammadzadeh, 166
Zirikly, Ayah, 115
Zong, Chengqing, 85, 135
Zou, Bowei, 75
Zweig, Geoffrey, 71
Zweigenbaum, Pierre, 166

9 Local Guide
ACL-IJCNLP 2015, the Joint Conference of the Association for Computational Linguistics and the
Asian Federation of Natural Language Processing, will be held in Beijing, China this year. Here are
some things that I think might be useful or enjoyable for visiting computational linguists, natural
language processing people, and the like.

Currency The official name for the currency of China is Renminbi (RMB). It is
denominated in Yuan (元), colloquially called Kuai (块). Foreign currency can be exchanged for RMB
at airports, banks and hotels. Major credit cards are honored at most hotels. Banks usually
open at 9:00 in the morning and close at 17:00 in the afternoon on all working days.

Electricity Electricity is supplied at 220V, 50Hz AC throughout China. Major hotels
usually provide a 115V outlet for razors.

Smoking Smoking is not allowed in the conference venues, nor in any public indoor
establishments such as restaurants and bars.

Weather Summer clothes such as shorts and dresses are enough for this time of year in Beijing.
The sunlight may be very strong in the afternoon, so do prepare some sun-tan oil, lotion, or
cream if you are going to go outdoors. You may also pack a raincoat or umbrella for any
sudden rain during travel.
Beijing Average Climate by Month
             Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec
High (°C)    1.1   3.9   11.1  19.4  26.1  30.0  30.6  29.4  25.6  18.9  10.0  2.8
Low (°C)     -9.4  -7.2  -1.1  7.2   12.8  17.8  21.1  20.0  13.9  7.2   -0.6  -7.2
Precip (in)  0.1   0.2   0.4   1     1.1   2.8   6.9   7.2   1.9   0.7   0.2   0.1
High (°F)    34    39    52    67    79    86    87    85    78    66    50    37
Low (°F)     15    19    30    45    55    64    70    68    57    45    31    19

Transportation
1. Airport to CNCC
Beijing Subway (Airport Express Line) You can take the Airport Express Line, which
runs from Terminal 3/Terminal 2 to Sanyuanqiao (三元桥) station; then take Subway Line
10 to Beitucheng (北土城) station, and transfer to Subway Line 8 to the Olympic Sports
Center (奥体中心) station.

Airport Taxi The legitimate taxis form a long queue outside the Arrival Hall, but taxis
move quickly so you won't wait long. At the head of the line a dispatcher will give you
your taxi's number, which is useful in case of complaints. The charge will be at least
100 CNY, but pay according to the meter, which includes an expressway toll of 15 CNY.
After 23:00, you will pay more. You can show the information below to the taxi driver:


Please take me to China National Convention Center (CNCC)
Address: No.7 Tianchen East Road, Olympic District, Beijing
Airport Shuttle Bus The airport shuttle runs every 30 minutes from 5:30 to
20:00 and covers different routes, including the Asian Games Village (Anhui Bridge, which
is close to CNCC). It costs 25 yuan (about $4).

2. Airport to Hotels

CNCC Grand Hotel Take the Airport Express, then transfer to Subway Line 8, get
off at Olympic Park, and walk west about 300 meters. You can also get to CNCC Grand
Hotel by taxi; you can show the information below to the taxi driver:

Please take me to China National Convention Center Grand Hotel
Address: No.8 Beichen West Road, Chaoyang District, Beijing

InterContinental Beijing Beichen Hotel Take the Airport Express, then transfer to
Subway Line 8, get off at Olympic Park, and walk west about 300 meters. You can
also get to InterContinental Beijing Beichen Hotel by taxi; you can show the
information below to the taxi driver:

Please take me to InterContinental Beijing Beichen Hotel
Address: No.8 Beichen West Road, Chaoyang District, Beijing

Best Western OL Stadium Hotel Take Airport Bus Line 4 (Beijing Capital
International Airport – Gongzhufen) and get off at Madian Bridge station. Then transfer
to Bus 315 and get off at Beishatan Bridge station. You can also get to Best Western OL
Stadium Hotel by taxi; you can show the information below to the taxi driver:

Please take me to Best Western OL Stadium Hotel
Address: No.1 Datun Road, Beishatan, Chaoyang District, Beijing

3. Hotels to CNCC
CNCC Grand Hotel The hotel is about a 10-minute walk from the conference venue.

InterContinental Beijing Beichen Hotel The hotel is about a 10-minute walk from
the conference venue.
Best Western OL Stadium Hotel The hotel is about a 20-minute walk from the
conference venue. The local organizer will also offer a shuttle bus service between
Best Western OL Stadium Hotel and CNCC from July 27 to July 29. Below is the
timetable of the shuttle bus:

July 27   08:20–08:35   Best Western OL Stadium Hotel → CNCC
July 27   20:50–21:10   CNCC → Best Western OL Stadium Hotel
July 28   08:30–08:45   Best Western OL Stadium Hotel → CNCC
July 28   21:10–21:30   CNCC → Best Western OL Stadium Hotel
July 29   08:30–08:45   Best Western OL Stadium Hotel → CNCC

4. City Transport
Subway In the build-up to the 2008 Olympics, Beijing's subway was extensively
developed, growing from 2 lines to 14. Please see the Subway Map for details.

City Public Buses They run from 5:30 till 23:00 daily. Taking buses in Beijing is
cheap, but less comfortable than a taxi or the subway. The flat rate for a tram or
ordinary public bus is 2 yuan. Buses equipped with air-conditioning or on special lines
charge according to the distance. Having your destination written in Chinese characters
will help. When squeezing onto a crowded bus, take care of your wallet, etc. Minibuses,
running from 7:00 to 19:00, charge a flat rate of 2 yuan and guarantee a seat. They are
faster and more comfortable.

Taxis Though Beijing does suffer from congestion, its taxi drivers will find the fastest
way to your destination. Bring the name of your destination in Chinese characters if
your spoken Chinese is not good. A pedi-cab is a good choice for sightseeing, especially
for visiting the narrow Hutongs. You will find pedi-cabs on the street. You should agree
on a price with the driver before starting the journey. Legally registered pedi-cabs can
be identified by a certificate attached to the cab; the driver has a card hanging
around his neck.

Bicycle China used to be called 'the sea of bicycles', and in Beijing today the bike is
still a convenient vehicle for most people. Renting a bike may be a better way for you to
see this city at your own pace. Bikes can be hired from many hotels for 20–30 yuan/day.
A deposit will be required. You can also rent bikes at some bicycle shops that repair
bikes and inflate tires; their charge is lower, as the bikes are
not as new. When needed, you can park your bike in a bike park, which can be easily
identified by the large number of bikes on the roadside. The charge is about 1 yuan.
Culture and Tour
Over the 5,000-year history of China, Beijing has been the capital for many dynasties:
Jin, Yuan, Ming, Qing, etc. It has many historical attractions. As the capital of China,
Beijing is also a modern city with a population of 20 million. There are abundant places to
visit in Beijing.

Wangfujing (王府井) Wangfujing in Beijing is a well-known golden commercial district.
Shopping in Wangfujing Street has been listed in many travelers' schedules.

Olympic Green (奥林匹克公园) China National Convention Center is in the Olympic
Village. You can walk to the Bird's Nest (National Stadium) and the Water Cube.

Temple of Heaven (天坛) First constructed in 1420, it was the place where the emperors
of the Ming and Qing dynasties would worship the god of heaven and pray for good
harvests.
From CNCC to the Forbidden City/Tiananmen Square: Subway Line 8 (Beitucheng station)
→ Line 10 (Guomao station) → Line 1 (get off at Tiananmen East station).
From CNCC to the Summer Palace: Subway Line 8 (Beitucheng station) → Line 10
(Haidian Huangzhuang station) → Line 4 (get off at Beigongmen station; Beigongmen
means the North Palace Gate). Public Bus: No. 594 to Beigongmen station.
From CNCC to the Temple of Heaven: Subway Line 8 (Beitucheng station) → Line 10
(Huixinxijie Nankou station) → Line 5 (get off at Tiantan Dongmen station).
From CNCC to the Great Wall: Bus: you can get to Deshengmen Bus Station by taxi;
Bus No. 877, 919 and 880 are available from Deshengmen Bus Station to the Great Wall.
Train: Subway Line 8 (Beitucheng station) → Line 10 (Zhichunlu station) → Line 13
(get off at Xizhimen station), then take an S2 series train from Beijing North Railway
Station (near Xizhimen station) to Badaling Railway Station.
From CNCC to Wangfujing: Subway Line 8 (Beitucheng station) → Line 10 (Guomao
station) → Line 1 (get off at Tiananmen East station).

The Great Wall of China (长城) The Great Wall was built from the Qin Dynasty onward
(~2,200 years ago) and is more than 8,000 km long. 'He who does not reach the Great
Wall is not a true man (不到长城非好汉)', as this famous Chinese saying goes; the Great
Wall of China, a great engineering marvel, always attracts throngs of adventurous
tourists from all over the world.

Forbidden City (故宫) The former imperial palace, which was home to twenty-four
Chinese emperors over the 491 years between 1420 and 1911. The Forbidden City is
now known as the Palace Museum and is open to visitors. It has 9,999 rooms,
a room being the space between four pillars. The well-guarded palace is surrounded
by a moat 3,800 metres long and 52 metres wide.

Summer Palace (颐和园) It was a royal palace in the late Qing Dynasty, second only to
the Forbidden City. The Summer Palace is actually not just a royal palace where
Empress Dowager Cixi and the emperor once lived and handled court affairs, accepted
laudations and received foreign diplomats. It also epitomizes classical Chinese
architecture, in terms of both garden design and construction. It is the largest royal
garden in Beijing.
Dining Routes Reference
There are three food courts near CNCC.

[Map showing the three food courts around CNCC: Sihai Hyatt Food Palace, Golden Spring
Food Palace, and Xin Ao Shopping Center.]

1. Xin Ao Shopping Center

Route: Walk 200 meters straight east from the front door of the CNCC, and then go
downstairs.
Restaurant | Address | Characteristic | Per-capita consumption | Telephone
NEW YORKER | H1-61 | American cafeteria | 80-300RMB | 010-84378485
Food court | H1-62 | Chinese fast food | 20-50RMB | 010-84374628
YOSHINOYA | H1-63-01 | Japanese food | 30-50RMB | 010-84371107
HuaTian YANJI Restaurant | H1-59-01 | Korean food | 20-50RMB | 010-84378265
Qing-feng Steamed Dumpling Shop | H1-59-02 | Steamed stuffed buns | 15-50RMB | 010-84377200
ORIENT KING OF DUMPLINGS | H1-45-01 | Dumplings | 30-100RMB | 010-84377368
SUBWAY | H1-49 | American fast food | 20-50RMB | 010-84377920
Element H.K | H1-50-01 | Hong Kong style tea restaurant | 50-100RMB | 010-84374018
xiabu xiabu | H1-52-01 | Hot pot | 50-100RMB | 010-84370951
YI WAN JU | H1-53 | Peking noodles | 30-100RMB | 010-84377373
Hollywood | H1-54 | Fast food in various styles | 20-50RMB | 010-84377393
YOU WO FLAVOR FOOD | H1-55 | Sichuan cuisine | 50-100RMB | 010-84374286
HU GUO SI XIAO CHI | H1-58-02 | Beijing local snacks | 20-100RMB | 010-84377426
PAO PAO | H1-44 | Hot pot | 30-50RMB | 13522404633
McDonald's | H1-42 | Fast food like hamburgers and fries | 20-50RMB | 010-84374541
The PIZZA Company | H1-43-01 | Pizza | 40-100RMB | 010-84377498
BELLA VISTA Restaurant | H1-41 | Italian food like spaghetti and pizza | 40-100RMB | 010-84377305
French Teppanyaki | H1-45-02 | French-themed restaurant | 50-150RMB | 010-84378469
NA DU | H1-60 | Hot-spicy pot | 50-100RMB | 010-84377432
CLASSICS FOODS | H1-50-02 | Classical Chinese food | 20-40RMB | 010-84378408
Leisure Restaurant | H1-43-02 | Japanese sushi and food | 50-100RMB | 010-84377321
LONG QI | H1-47 | Japanese food | 80-150RMB | 010-84370198
XIAO DIAO LI TANG | H1-93-03 | Beijing food | 80-150RMB | 010-84437260
HaoShangHao | H1-94-01 | Beefsteak | 70-150RMB | -
BIAN JING | H1-93-02 | Southeast Asian delicacies | 80-150RMB | 010-84437159
XI LE DOU | H1-04 | Bakery | 20-50RMB | 010-84378210
SEVENANA | H1-63-02 | Curry-themed restaurant | 30-50RMB | 010-84377962
SHUN KOU LIU | H1-52-02 | Traditional Chinese food | 20-50RMB | 010-84377442/7443
XIANGFEI FASTFOOD | H1-58-01 | Chinese fast food, famous for roast chicken | 20-50RMB | 010-84378735
2. Golden Spring Food Palace

Restaurant | Address | Characteristic | Per-capita consumption | Telephone
HUI MAN XIANG | 1F, No.3 | Anhui cuisine | 70-100RMB | 010-84874648
BEIJING BIANYIFANG ROAST DUCK | 3F, No.3 (enter from No.2 elevator) | Beijing roast duck | 100RMB | 010-84855865
CAFE KU | 1F/2F (enter from No.3 elevator) | Coffee, Italian food, beef, salad | 50-100RMB | 010-84855983
SORABOL | 1F, No.8 | Korean cuisine | 50-100RMB | 010-84855532
HaiDiLao HotPot | 4F, No.8 (enter from No.1 elevator) | Famous hot pot | 100RMB | 010-84855758
FENG SHENG MAO | 2F, No.8 (enter from No.4 elevator) | Korean skewer | 50-100RMB | 010-84855687
Luo Luo Fish Hot Pot | 1F, No.11 | Fish hot pot | 50-100RMB | 010-84855007
Dong Lai Shun Restaurant | 2F, No.12 (enter from No.5 elevator) | Hot pot | 150RMB | -
YI PIN XIAO LONG | 1F, No.13 | Small steamed buns | 30-50RMB | 010-84873799
CHUAN CHENG YUAN | 1F, No.16 | Spicy hot pot | 50-100RMB | 010-84874840
DA QING HUA | 3F, No.16 (enter from No.6 elevator) | Chinese dishes | 50-100RMB | 010-84874545
JIANG TAI WU ER | 1F, No.17-18 | Creative restaurant, seafood | 140RMB | 010-84855900
MIAN GU XIANG | 1F, No.19 | Noodles | 30-50RMB | 010-84873244
RUN SHI SHANG | 1F, No.21 | Chinese dishes | 50-100RMB | 010-84855168
3. Sihai Hyatt Food Palace

Restaurant | Address | Characteristic | Per-capita consumption | Telephone
HE SHU XIANG | 2F | Hot pot | 50-80RMB | 010-64830600
HOU QI SHI DAI | 2F | Skewer | 50-80RMB | 010-64837178
ZHAI NIU CHANG | 2F | Hot pot | 50-80RMB | 010-64861391
BING CHENG CHUAN BA | 2F | Skewer | 50-80RMB | 010-64861392
SI CHUAN REN HUO GUO | 2F | Sichuan hot pot | 50-80RMB | 010-64101390
Liu Po Hot Pot | 1F | Hot pot | 50-80RMB | 010-64861396
ADAAMA ramen | 1F | Ramen | 15-30RMB | -
YUAN ZHI Chinese food | 1F | Chinese fast food | 15-30RMB | -
YANG YI REN | 1F | Spicy hotchpotch | 15-30RMB | 010-59432931
HUA'S DIMSUM | 1F | Hong Kong snacks | 50-100RMB | 010-58460008
JING WEI JIAO ZI | 1F | Dumplings | 15-50RMB | -
ZAI TAI YU | B1 | Fish | 50-100RMB | 010-64837176
HAN SHI KAO ROU | B1 | Korean barbecue | 30-80RMB | 010-64837177
CHUAN XI YIN XIANG | 1F | Sichuan cuisine | 30-80RMB | 010-64861398
[Floor plans: CNCC Floor 3 and CNCC Floor 4]
ACL-IJCNLP 2015 gratefully acknowledges the following sponsors for their support:

Platinum Sponsors

Gold Sponsors

Silver Sponsors

Bronze Sponsor

Best Paper Sponsor

Sponsor of Overseas Student Fellowship
