Park, So-Young; Song, Young-In; Rim, Hae-Chang
In this paper, we propose a segment-based annotation tool providing appropriate interactivity between a human annotator and an automatic parser. The proposed annotation tool provides a preview of the complete sentence structure suggested by the parser, and updates the preview whenever the annotator cancels or selects a segmentation point. Thus, the annotator can select the proper sentence segments, maximizing parsing accuracy and minimizing human intervention. Experimental results show that the proposed tool allows the annotator to reduce human intervention by approximately 39% compared with manual annotation. The Sejong Korean treebank, one of the large-scale treebanks, was constructed with the proposed annotation tool.
Bond, Francis; Fujita, Sanae; Tanaka, Takaaki
In this paper we describe the current state of a new Japanese lexical resource: the Hinoki treebank. The treebank is built from dictionary definitions, examples and news text, and uses an HPSG-based Japanese grammar to encode both syntactic and semantic information. It is combined with an ontology based on the definition sentences to give a detailed sense-level description of the most familiar 28,000 words of Japanese.
Theune, Mariët; Hielkema, Feikje; Hendriks, Petra
4 Citations
This article describes the generation of aggregated and elliptic sentences, using Dependency Trees connected by rhetorical relations as input. The system we have developed can generate both hypotactic and paratactic constructions with appropriate cue words, and various forms of ellipsis such as Gapping and Conjunction Reduction. We contend that Dependency Trees connected by rhetorical relations are excellent input for a generation system that has to generate ellipsis, and we propose a taxonomy of the most common Dutch cue words, grouped according to the kind of discourse relations they signal. Finally, we argue that syntactic aggregation should be performed in the Surface Realizer of a language generation system, because it requires access to language-specific syntactic information.
Xue, Nianwen
1 Citations
We describe a Chinese lexical semantic resource that consists of 11,765 predicates (mostly verbs and their nominalizations) analyzed with coarse-grained senses and semantic roles. We show that distinguishing senses at a coarse-grained level is a necessary part of specifying the semantic roles and describe our strategies for sense determination for purposes of predicate-argument structure specification. The semantic roles are postulated to account for syntactic variations, the different ways in which the semantic roles of a predicate are realized. The immediate purpose for this lexical semantic resource is to support the annotation of the Chinese PropBank, but we believe it can also serve as a stepping stone for higher-level semantic generalizations.
Ji, Donghong; He, Yanxiang; Xiao, Guozheng
1 Citations
In this paper, we propose a word sense learning algorithm which is capable of unsupervised feature selection and cluster number identification. Feature selection for word sense learning is built on an entropy-based filter and formalized as a constraint optimization problem, the output of which is a set of important features. Cluster number identification is built on a Gaussian mixture model with an MDL-based criterion, and the optimal model order is inferred by minimizing the criterion. To evaluate closeness between the learned sense clusters and the ground-truth classes, we introduce a kind of weighted F-measure to model the effort needed to reconstruct the classes from the clusters. Experiments show that the algorithm can retrieve important features, roughly estimate the class numbers automatically, and outperform other algorithms in terms of the weighted F-measure. In addition, we also try to apply the algorithm to a specific task of adding new words into a Chinese thesaurus.
Nguyễn, Thị Minh Huyền; Romary, Laurent; Rossignol, Mathias; Vũ, Xuân Lương
3 Citations
Only very recently have Vietnamese researchers begun to be involved in the domain of Natural Language Processing (NLP). As there does not exist any published work in formal linguistics nor any recognizable standard for Vietnamese word definition and word categories, the fundamental tasks for automatic Vietnamese language processing, such as part-of-speech tagging, parsing, etc., are very difficult tasks for computer scientists. The fact that all necessary linguistic resources have to be built from scratch by each research team is a real obstacle to the development of Vietnamese language processing. The aim of our projects is thus to build a common linguistic database that is freely and easily exploitable for the automatic processing of Vietnamese. In this paper, we present our work on creating a Vietnamese lexicon for NLP applications. We emphasize the standardization aspect of the lexicon representation. We especially propose an extensible set of Vietnamese syntactic descriptions that can be used for tagset definition and morphosyntactic analysis. These descriptors are established in such a way as to be a reference set proposal for Vietnamese in the context of ISO subcommittee TC 37/SC 4 (Language Resource Management).
Bao, Jun Peng; Lyon, Caroline; Lane, Peter C. R.
7 Citations
The Ferret copy detector has been used since 2001 to find plagiarism in large collections of students’ coursework in English. This article reports on extending its application to Chinese, with experiments on corpora of coursework collected from two Chinese universities. Our experiments show that Ferret can find both artificially constructed plagiarism and actually occurring, previously undetected plagiarism. We discuss issues of representation, focus on the effectiveness of a subsymbolic approach, and show that Ferret does not need to find word boundaries first.
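Comparing texts without first finding word boundaries can be illustrated with a small sketch. It assumes that each document is reduced to a set of overlapping character trigrams and that two documents are compared with a Jaccard-style resemblance score. The abstract does not spell out these details, so the trigram length, the function names, and the scoring formula are illustrative assumptions, not a description of Ferret's actual implementation.

```python
def char_ngrams(text, n=3):
    """Return the set of overlapping character n-grams in text, ignoring whitespace.

    Assumption: raw character n-grams stand in for whatever units the real
    detector uses; no word segmentation is performed.
    """
    chars = "".join(text.split())
    return {chars[i:i + n] for i in range(len(chars) - n + 1)}


def resemblance(doc_a, doc_b, n=3):
    """Jaccard-style resemblance: shared n-grams divided by all distinct n-grams."""
    a, b = char_ngrams(doc_a, n), char_ngrams(doc_b, n)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)
```

A pair of submissions whose score is well above the typical score for unrelated pairs in the same collection would then be flagged for manual inspection; the threshold itself would have to be chosen empirically.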
Ohno, Tomohiro; Matsubara, Shigeki; Kashioka, Hideki; Maruyama, Takehiko; Tanaka, Hideki; Inagaki, Yasuyoshi
2 Citations
Spoken monologues feature greater sentence length and structural complexity than spoken dialogues. To achieve high parsing performance for spoken monologues, simplifying the structure by dividing a sentence into suitable language units could prove effective. This paper proposes a method for dependency parsing of Japanese spoken monologues based on sentence segmentation. In this method, dependency parsing is executed in two stages: at the clause level and the sentence level. First, dependencies within a clause are identified by dividing a sentence into clauses and executing stochastic dependency parsing for each clause. Next, dependencies across clause boundaries are identified stochastically, and the dependency structure of the entire sentence is thus completed. An experiment using a spoken monologue corpus shows the effectiveness of this method for efficient dependency parsing of Japanese monologue sentences.
Hashimoto, Chikara; Sato, Satoshi; Utsuro, Takehito
1 Citations
Detecting idioms in a sentence is important to sentence understanding. This paper discusses the linguistic knowledge for idiom detection. The challenges are that idioms can be ambiguous between literal and idiomatic meanings, and that they can be “transformed” when expressed in a sentence. However, there has been little research on Japanese idiom detection with its ambiguity and transformations taken into account. We propose a set of linguistic knowledge for idiom detection that is implemented in an idiom dictionary. We evaluated the linguistic knowledge by measuring the performance of an idiom detector that exploits the dictionary. As a result, more than 90% of the idioms are detected with 90% accuracy.
Dasgupta, Sajib; Ng, Vincent
11 Citations
Unsupervised morphological analysis is the task of segmenting words into prefixes, suffixes and stems without prior knowledge of language-specific morphotactics and morphophonological rules. This paper introduces a simple, yet highly effective algorithm for unsupervised morphological learning for Bengali, an Indo-Aryan language that is highly inflectional in nature. When evaluated on a set of 4,110 human-segmented Bengali words, our algorithm achieves an F-score of 83%, substantially outperforming Linguistica, one of the most widely used unsupervised morphological parsers, by about 23%.
Rudeanu, Sergiu; Simovici, Dan A.
We study ranges of algebraic functions in lattices and in algebras, such as Łukasiewicz-Moisil algebras, which are obtained by extending standard lattice signatures with unary operations. We characterize algebraic functions in such lattices having intervals as their ranges and we show that in Artinian or Noetherian lattices the requirement that every algebraic function has an interval as its range implies the distributivity of the lattice.
Bhattacharyya, Pushpak; Chakrabarti, Debasri; Sarma, Vaijayanthi M.
3 Citations
Wordnets, which are repositories of lexical semantic knowledge containing semantically linked synsets and lexically linked words, are indispensable for work on computational linguistics and natural language processing. While building wordnets for Hindi and Marathi, two major Indo-European languages, we observed that the verb hierarchy in the Princeton Wordnet was rather shallow. We set out to construct a verb knowledge base for Hindi, which arranges the Hindi verbs in a hierarchy of is-a (hypernymy) relations. We realized that there are unique Indian language phenomena that bear upon the choice between lexicalization and syntactic derivation. One such example is the occurrence of conjunct and compound verbs (called Complex Predicates), which are found in all Indian languages. This paper presents our experience in the construction of lexical knowledge bases for Indian languages with special attention to Hindi. The question of storing versus deriving complex predicates has been dealt with linguistically and computationally. We have constructed empirical tests to decide if a combination of two words, the second of which is a verb, is a complex predicate or not. Such tests provide a principled way of deciding the status of complex predicates in Indian language wordnets.
Repp, Sophie
4 Citations
The paper shows that in gapping sentences where a negative marker in the first conjunct takes wide scope over the whole coordination, the negation obligatorily operates on the level of the speech act rather than on the level of the proposition. In assertions, this is denial negation, and in questions, outer negation. The negation operating on the level of the speech act is argued to be an instantiation of the degrees of strength that are associated with the sincerity conditions of a speech act, which is a feature that it shares with VERUM focus and certain epistemic adverbs. Syntactically, this negation is situated higher than propositional negation, viz. in the CP of the clause. This suggests that gapping with wide scope negation is fundamentally different from ‘ordinary’ gapping which always involves propositional negation.
Hoeksema, Jack
4 Citations
Pseudogapping is often treated as a combination of movement and ellipsis in current generative work. After reviewing a number of arguments against this type of analysis, I argue for an interpretive approach to pseudogapping, following Miller (1990). On the basis of corpus data and informant judgments, I then proceed to outline two factors, syntactic context and remnant type, which affect the acceptability of pseudogapping. The effects of the two factors are gradient and cumulative, in the sense of Keller (2000).
Spenader, Jennifer; Hendriks, Petra
What knowledge sources are necessary in the interpretation and generation of ellipsis? After a short background on earlier approaches we compare and discuss each of the four papers selected for this special issue, examining how they approach ellipsis generation or interpretation. We highlight areas where more research needs to be done: outlining how pragmatics affects ellipsis, empirical studies, and theoretical work on what the effect of ellipsis is in context.
Murata, Masaki; Ma, Qing; Uchimoto, Kiyotaka; Kanamaru, Toshiyuki; Isahara, Hitoshi
1 Citations
This paper describes experiments carried out utilizing a variety of machine-learning methods (the k-nearest neighborhood, decision list, maximum entropy, and support vector machine), and using six machine-translation (MT) systems available on the market for translating tense, aspect, and modality. We found that all of these methods, including the simple string-matching-based k-nearest neighborhood used in a previous study, obtained higher accuracy rates than the MT systems currently available on the market. We also found that the support vector machine obtained the best accuracy rates (98.8%) of these methods. Finally, we analyzed errors against the machine-learning methods and commercially available MT systems and obtained error patterns that should be useful for making future improvements.
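A string-matching-based k-nearest-neighbour classifier of the kind mentioned above can be sketched as follows. The sketch assumes that similarity is the length of the common suffix between two sentences (on the reasoning that tense, aspect, and modality tend to be marked towards the end of a clause) and that the label is chosen by majority vote. Both choices, the function names, and the English toy data are assumptions made for illustration; the abstract does not state the similarity measure actually used.

```python
from collections import Counter


def common_suffix_len(a, b):
    """Number of characters shared at the end of strings a and b."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n


def knn_label(sentence, training, k=3):
    """Label a sentence by majority vote over its k most similar training examples.

    training is a list of (sentence, label) pairs; similarity is the
    assumed suffix-match length, so no feature extraction is needed.
    """
    ranked = sorted(
        training,
        key=lambda example: common_suffix_len(sentence, example[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy training data (English stand-ins for source sentences).
TRAINING = [
    ("he walked", "past"),
    ("she talked", "past"),
    ("he walks", "present"),
    ("she talks", "present"),
    ("they jumped", "past"),
]
```

With this toy data, `knn_label("we climbed", TRAINING)` votes among the three examples ending in "ed".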
Benthem, J. van; Bezhanishvili, G.; Cate, B. ten; Sarenac, D.
14 Citations
We introduce the horizontal and vertical topologies on the product of topological spaces, and study their relationship with the standard product topology. We show that the modal logic of products of topological spaces with horizontal and vertical topologies is the fusion S4 ⊕ S4. We axiomatize the modal logic of products of spaces with horizontal, vertical, and standard product topologies. We prove that both of these logics are complete for the product of rational numbers ℚ × ℚ with the appropriate topologies.
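The abstract does not state the definitions, so the following is a sketch of how horizontally and vertically open sets are usually defined for a product of two spaces. The symbols $(X,\eta)$, $(Y,\theta)$, and $A$ are notation assumed here for illustration.

```latex
% Assumed notation: (X, \eta) and (Y, \theta) are topological spaces,
% and A is a subset of the product X \times Y.
\[
  A \text{ is horizontally open} \iff
  \forall (x,y) \in A \;\; \exists U \in \eta \;
  \bigl[\, x \in U \ \wedge\ U \times \{y\} \subseteq A \,\bigr]
\]
\[
  A \text{ is vertically open} \iff
  \forall (x,y) \in A \;\; \exists V \in \theta \;
  \bigl[\, y \in V \ \wedge\ \{x\} \times V \subseteq A \,\bigr]
\]
```

Under these definitions, every set that is open in the standard product topology is both horizontally and vertically open, since a basic open box $U \times V \subseteq A$ around $(x,y)$ contains both $U \times \{y\}$ and $\{x\} \times V$.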
Aczel, Peter; Crosilla, Laura; Ishihara, Hajime; Palmgren, Erik; Schuster, Peter
5 Citations
Working in the weakening of constructive Zermelo-Fraenkel set theory in which the subset collection scheme is omitted, we show that the binary refinement principle implies all the instances of the exponentiation axiom in which the basis is a discrete set. In particular, binary refinement implies that the class of detachable subsets of a set forms a set. Binary refinement was originally extracted from the fullness axiom, an equivalent of subset collection, as a principle that was sufficient to prove that the Dedekind reals form a set. Here we show that the Cauchy reals also form a set. More generally, binary refinement ensures that one remains in the realm of sets when one starts from discrete sets and one applies the operations of exponentiation and binary product a finite number of times.
Foulis, David J.
1 Citations
A Heyting effect algebra (HEA) is a lattice-ordered effect algebra that is at the same time a Heyting algebra and for which the Heyting center coincides with the effect-algebra center. Every HEA is both an MV-algebra and a Stone-Heyting algebra and is realized as the unit interval in its own universal group. We show that a necessary and sufficient condition that an effect algebra is an HEA is that its universal group has the central comparability and central Rickart properties.
George, Benjamin R.
1 Citations
The notions of finite and infinite second-order characterizability of cardinal and ordinal numbers are developed. Several known results for the case of finite characterizability are extended to infinite characterizability, and investigations of the second-order theory of ordinals lead to some observations about the Fraenkel-Carnap question for well-orders and about the relationship between ordinal characterizability and ordinal arithmetic. The broader significance of cardinal characterizability and the relationships between different notions of characterizability are also discussed.
Cignoli, Roberto; Monteiro, Luiz
1 Citations
For each integer n ≥ 2, MV_{n} denotes the variety of MV-algebras generated by the MV-chain with n elements. Algebras in MV_{n} are represented as continuous functions from a Boolean space into an n-element chain equipped with the discrete topology. Using these representations, maximal subalgebras of algebras in MV_{n} are characterized, and it is shown that proper subalgebras are intersections of maximal subalgebras. When A ∈ MV_{3}, the mentioned characterization of maximal subalgebras of A can be given in terms of prime filters of the underlying lattice of A, in the form that was conjectured by A. Monteiro.
Zimmermann, Thomas Ede
9 Citations
The paper is about the interpretation of opaque verbs like “seek”, “owe”, and “resemble” which allow for unspecific readings of their (indefinite) objects. It is shown that the following two observations create a problem for semantic analysis:
(a) The opaque position is upward monotone: “John seeks a unicorn” implies “John seeks an animal”, given that “unicorn” is more specific than “animal”.
(b) Indefinite objects of opaque verbs allow for higher-order, or “underspecific”, readings: “Jones is looking for something Smith is looking for” can express that there is something unspecific that both Jones and Smith are looking for.
Given (a) and (b), it would seem that the following inference is hard to escape, if the premisses are construed unspecifically and the conclusion is taken on its underspecific reading:
Jones is looking for a sweater.
Smith is looking for a pen.
Smith is looking for something Jones is looking for.
It is shown that this monotonicity problem can be solved by analyzing unspecific readings as existential quantifications over the subproperties of the property expressed by their object.
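The proposed analysis of unspecific readings can be written schematically as below. The relation symbol $\mathbf{seek}^{*}$ (an underlying relation between individuals and properties) and the subproperty ordering $\sqsubseteq$ are notation assumed here for illustration; the abstract gives only the informal statement.

```latex
% Assumed notation: \sqsubseteq is the subproperty relation;
% \mathbf{seek}^{*} is a hypothetical underlying relation between
% an individual and a property.
\[
  \text{``$x$ seeks a $Q$'' (unspecific)} \quad\leadsto\quad
  \exists P \,\bigl[\, P \sqsubseteq Q \ \wedge\ \mathbf{seek}^{*}(x, P) \,\bigr]
\]
% Upward monotonicity, observation (a), then follows from the
% transitivity of \sqsubseteq:
\[
  Q \sqsubseteq Q' \;\Longrightarrow\;
  \Bigl( \exists P \,[\, P \sqsubseteq Q \wedge \mathbf{seek}^{*}(x, P) \,]
  \;\rightarrow\;
  \exists P \,[\, P \sqsubseteq Q' \wedge \mathbf{seek}^{*}(x, P) \,] \Bigr)
\]
```

On this rendering, any witness $P$ for the more specific object property $Q$ is also a witness for the less specific $Q'$, which is why “John seeks a unicorn” yields “John seeks an animal”.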
Ericsson, Stina
1 Citations
Elliptical utterances in dialogue are here investigated using Optimality Theory (OT). The focus is on generation, and the analysis as well as the examples used are based on a study of elliptical utterances in corpora of recorded dialogues. The OT analysis makes use of the information structural notions of focus and ground. Two important optimisations of elliptical utterances are investigated. One concerns the optimisation of the part of the context that the elliptical utterance is connected to, and the other concerns the determination of whether an elliptical or a non-elliptical utterance is to be produced.
Jones, Rosie; Bartz, Kevin; Subasic, Pero; Rey, Benjamin
1 Citations
Web searchers reformulate their queries, as they adapt to search engine behavior, learn more about a topic, or simply correct typing errors. Automatic query rewriting can help user web search, by augmenting a user’s query, or replacing the query with one likely to retrieve better results. One example of query rewriting is spell-correction. We may also be interested in changing words to synonyms or other related terms. For Japanese, the opportunities for improving results are greater than for languages with a single character set, since documents may be written in multiple character sets, and a user may express the same meaning using different character sets. We give a description of the characteristics of Japanese search query logs and manual query reformulations carried out by Japanese web searchers. We use characteristics of Japanese query reformulations to extend previous work on automatic query rewriting in English, taking into account the Japanese writing system. We introduce several new features for building models resulting from this difference and discuss their impact on automatic query rewriting. We also examine enhancements in the form of rules which block conversion between some character sets, to address Japanese homophones. The precision/recall curves show significant improvement with the new feature set and blocking rules, and are often better than the English counterpart.
Chang, Jing-Shin; Teng, Wei-Lun
2 Citations
An HMM-based single character recovery (SCR) model is proposed in this paper to extract a large set of atomic abbreviations and their full forms from a text corpus. An “atomic abbreviation” is an abbreviated word consisting of a single Chinese character. This task is important since Chinese abbreviations cannot be enumerated exhaustively, but the abbreviation process for compound words seems to be compositional. One can often decode an abbreviated word character by character to its full form. With a large atomic abbreviation dictionary, one may be able to handle multiple-character abbreviation problems more easily based on the compositional property of abbreviations.
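Decoding an abbreviation character by character can be sketched as Viterbi search over a hidden Markov model in which each observed character is emitted by a hidden full-form word. The model structure below (start, transition, and emission tables keyed by words and characters), the smoothing constant, and all probabilities are toy assumptions for illustration; the abstract does not give the actual parameterisation of the SCR model.

```python
import math


def viterbi(chars, candidates, emit, trans, start):
    """Return the most probable sequence of full-form words for the given characters.

    chars:      list of single-character abbreviations (the observations)
    candidates: dict mapping a character to its possible full-form words
    emit:       dict mapping (character, word) to P(character | word)
    trans:      dict mapping (previous word, word) to P(word | previous word)
    start:      dict mapping word to its probability of starting the sequence
    """
    # best[i][w] = (log-probability of the best path ending in w at i, backpointer)
    best = [{}]
    for w in candidates[chars[0]]:
        best[0][w] = (math.log(start[w]) + math.log(emit[(chars[0], w)]), None)
    for i in range(1, len(chars)):
        layer = {}
        for w in candidates[chars[i]]:
            # Unseen transitions get a small assumed floor probability.
            score, prev = max(
                (best[i - 1][p][0] + math.log(trans.get((p, w), 1e-6)), p)
                for p in best[i - 1]
            )
            layer[w] = (score + math.log(emit[(chars[i], w)]), prev)
        best.append(layer)
    # Follow backpointers from the best final word.
    last = max(best[-1], key=lambda w: best[-1][w][0])
    path = [last]
    for i in range(len(chars) - 1, 0, -1):
        last = best[i][last][1]
        path.append(last)
    return path[::-1]


# Toy, hand-made parameters (not estimated from any corpus).
CANDIDATES = {"台": ["台灣", "台北"], "大": ["大學", "大陸"]}
START = {"台灣": 0.6, "台北": 0.4}
EMIT = {("台", "台灣"): 0.5, ("台", "台北"): 0.5,
        ("大", "大學"): 0.7, ("大", "大陸"): 0.3}
TRANS = {("台灣", "大學"): 0.8, ("台灣", "大陸"): 0.2,
         ("台北", "大學"): 0.5, ("台北", "大陸"): 0.5}
```

Calling `viterbi(["台", "大"], CANDIDATES, EMIT, TRANS, START)` recovers a two-word full form for the two-character abbreviation under these toy probabilities.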
Matthewson, Lisa
36 Citations
This paper contributes to the debate about ‘tenseless languages’ by defending a tensed analysis of a superficially tenseless language. The language investigated is St’át’imcets (Lillooet Salish). I argue that although St’át’imcets lacks overt tense morphology, every finite clause in the language possesses a phonologically covert tense morpheme; this tense morpheme restricts the reference time to being non-future. Future interpretations, as well as ‘past future’ would-readings, are obtained by the combination of covert tense with an operator analogous to Abusch’s (1985) WOLL. I offer St’át’imcets-internal evidence (of a kind not previously adduced) that the WOLL-like operator is modal in nature. It follows from the analysis presented here that there are only two (probably related) differences between St’át’imcets and English in the area of tense. The first is that St’át’imcets lacks tense morphemes which are pronounced. The second is that the St’át’imcets tense morpheme is semantically underspecified compared to English ones. In each of these respects, the St’át’imcets tense morpheme displays similar properties to pronouns, which may be covert and which may fail to distinguish person, number or gender. Along the way, I point out several striking and subtle similarities in the interpretive possibilities of St’át’imcets and English. I suggest that these similarities may reveal non-accidental properties of tense systems in natural language. I conclude with discussion of the implications of the analysis for cross-linguistic variation, learnability and the possible existence of tenseless languages.
Gehrke, Mai
28 Citations
Algebraic work [9] shows that the deep theory of possible world semantics is available in the more general setting of substructural logics, at least in an algebraic guise. The question is whether it is also available in a relational form. This article seeks to set the stage for answering this question. Guided by the algebraic theory, but purely relationally, we introduce a new type of frames. These structures generalize Kripke structures but are two-sorted, containing both worlds and co-worlds. These latter points may be viewed as modelling irreducible increases in information where worlds model irreducible decreases in information. Based on these structures, a purely model-theoretic and uniform account of completeness for the implication-fusion fragment of various substructural logics is given. Completeness is obtained via a generalization of the standard canonical model construction in combination with correspondence results.
Blackburn, P.; Cate, B. ten
21 Citations
In this paper we argue that hybrid logic is the deductive setting most natural for Kripke semantics. We do so by investigating hybrid axiomatics for a variety of systems, ranging from the basic hybrid language (a decidable system with the same complexity as orthodox propositional modal logic) to the strong Priorean language (which offers full first-order expressivity).
We show that hybrid logic offers a genuinely first-order perspective on Kripke semantics: it is possible to define base logics which extend automatically to a wide variety of frame classes and to prove completeness using the Henkin method. In the weaker languages, this requires the use of non-orthodox rules. We discuss these rules in detail and prove non-eliminability and eliminability results. We also show how another type of rule, which reflects the structure of the strong Priorean language, can be employed to give an even wider coverage of frame classes. We show that this deductive apparatus gets progressively simpler as we work our way up the expressivity hierarchy, and conclude the paper by showing that the approach transfers to first-order hybrid logic.
Arló-Costa, Horacio; Pacuit, Eric
10 Citations
The paper focuses on extending to the first order case the semantical program for modalities first introduced by Dana Scott and Richard Montague. We focus on the study of neighborhood frames with constant domains and we offer in the first part of the paper a series of new completeness results for salient classical systems of first order modal logic. Among other results we show that it is possible to prove strong completeness results for normal systems without the Barcan Formula (like FOL + K) in terms of neighborhood frames with constant domains. The first order models we present permit the study of many epistemic modalities recently proposed in computer science as well as the development of adequate models for monadic operators of high probability. Models of this type are either difficult or impossible to build in terms of relational Kripkean semantics [40].
We conclude by introducing general first order neighborhood frames with constant domains and we offer a general completeness result for the entire family of classical first order modal systems in terms of them, circumventing some well-known problems of propositional and first order neighborhood semantics (mainly the fact that many classical modal logics are incomplete with respect to an unmodified version of either neighborhood or relational frames). We argue that the semantical program that thus arises offers the first complete semantic unification of the family of classical first order modal logics.
Brandenburger, Adam; Keisler, H. Jerome
20 Citations
A paradox of self-reference in beliefs in games is identified, which yields a game-theoretic impossibility theorem akin to Russell’s Paradox. An informal version of the paradox is that the following configuration of beliefs is impossible:
Ann believes that Bob assumes that
Ann believes that Bob’s assumption is wrong
This is formalized to show that any belief model of a certain kind must have a ‘hole.’ An interpretation of the result is that if the analyst’s tools are available to the players in a game, then there are statements that the players can think about but cannot assume. Connections are made to some questions in the foundations of game theory.
Sowa, John F.
6 Citations
Since the pioneering work by Kripke and Montague, the term possible world has appeared in most theories of formal semantics for modal logics, natural languages, and knowledge-based systems. Yet that term obscures many questions about the relationships between the real world, various models of the world, and descriptions of those models in either formal languages or natural languages. Each step in that progression is an abstraction from the overwhelming complexity of the world. At the end, nothing is left but a colorful metaphor for an undefined element of a set W called worlds, which are related by an undefined and undefinable primitive relation R called accessibility. For some purposes, the resulting abstraction has proved to be useful, but as a general theory of meaning, the abstraction omits too many significant features. So much information has been lost at each step that many philosophers, linguists, and psychologists have dismissed model-theoretic semantics as irrelevant to the study of meaning. This article examines the steps in the process of extracting the pair (W,R) from the world and the way people talk about the world. It shows that the Kripke worlds can be reinterpreted as part of a Peircean semiotic theory, which can also include contributions from many other studies in cognitive science. Among them are Dunn’s semantics based on laws and facts, the lexical semantics preferred by many linguists, psychological models of how the world is perceived, and philosophies of science that relate theories to the world. A full integration of all those sources is far beyond the scope of this article, but an outline of the approach suggests that Peirce’s vision is capable of relating and reconciling the competing sources.
Vakarelov, Dimiter
5 Citations
The paper is devoted to the contributions of Helena Rasiowa to the theory of non-classical negation. The main results of Rasiowa in this area concern
– constructive logic with strong (Nelson) negation,
– intuitionistic negation and some of its generalizations: minimal negation of Johansson and semi-negation.
We also discuss the impact of Rasiowa’s works on the theory of non-classical negation.
Poesio, M.; Patel, A.; Eugenio, B. Di
7 Citations
The recent development of reliable guidelines for discourse structure annotation, and the resulting availability of corpora annotated for discourse structure, have made it possible to subject to rigorous empirical testing the claims of seminal theories about the impact of discourse structure on anaphora. We carried out an empirical investigation of the claims made in two models of the Global Focus (Grosz and Sidner’s stack model and Walker’s cache model), using a corpus of tutorial dialogues annotated according to Relational Discourse Analysis. We studied how these two models affect both the accessibility of the antecedents and the ambiguity of both pronouns and definite descriptions, examining a variety of stack and cache update strategies and of cache sizes, and paying special attention to the problem of antecedents contained in embedded segments. The best results for the stack model were obtained when DSPs were only associated with intentional relations (i.e., excluding informational relations) and embedded segments were allowed to remain on the stack as long as the superordinate segment was open. With the cache model, we found that cache size matters (if the size is less than 10, the model is too restrictive) and that the cache replacement strategy matters even more.
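The interaction between cache size and replacement strategy can be illustrated with a minimal sketch of a bounded cache of discourse entities. It assumes a least-recently-used replacement policy, which is only one of several strategies such a study might compare; the class name, method names, and default size are hypothetical and are not taken from the paper.

```python
from collections import OrderedDict


class EntityCache:
    """Bounded store of discourse entities with least-recently-used eviction.

    Assumption: an antecedent counts as accessible exactly when its
    entity is currently held in the cache.
    """

    def __init__(self, size=10):
        self.size = size
        self._items = OrderedDict()  # oldest entry first

    def mention(self, entity):
        """Record a mention, refreshing the entity or evicting the oldest one."""
        if entity in self._items:
            self._items.move_to_end(entity)
        else:
            self._items[entity] = True
            if len(self._items) > self.size:
                self._items.popitem(last=False)

    def accessible(self, entity):
        """Return True if the entity can serve as an antecedent."""
        return entity in self._items
```

With a small cache, an entity that has not been mentioned recently is evicted and its later pronominal mentions cannot be resolved, which is one way to see why a size below some threshold could make the model too restrictive.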
Branigan, Holly
5 Citations
Most research on dialogue has concentrated on dialogues involving two interlocutors. In this paper we consider the nature of multi-party dialogues. We discuss whether some of the important characteristics that have been identified in two-party dialogues and the theoretical accounts that have been proposed for them are also applicable to multi-party dialogues. We argue that the way in which common ground is accumulated in multi-party dialogues differs from the way in which it is accumulated in the two-party dialogues that have commonly been studied. However, we argue that these differences are related to particular characteristics which tend to be associated with either two-party or multi-party dialogues but are not inherent to them. We show that these characteristics can account for differences between different types of two-party and multi-party dialogues, including effects of group size, and we propose that the same fundamental principles underlie behaviour in both two-party and multi-party dialogues.
Purver, Matthew
7 Citations
This paper sets out an approach to clarification requests (CRs) general enough to cover all the major forms found in corpus data and specific enough to analyse the questions they ask about individual words and phrases. Its main features are a view of utterances as contextual abstracts with a radically abstracted semantic representation, and a view of CRs as standard utterances asking standard questions, but showing a particular kind of contextual dependence. It shows how the approach can be implemented computationally within a prototype text-based dialogue system, CLARIE, allowing it not only to generate CRs to clarify unknown reference and learn new words, but also to interpret and respond to user CRs, with both capabilities integrated within the standard dialogue processes and governed by empirical evidence.
Benz, Anton
1 Citations
We explain partial blocking as the result of diachronic processes based on what we will call associative learning. In particular, we argue that the task posed by partial blocking phenomena is to explain their emergence from unambiguous and fully expressive languages. This contrasts with approaches that presuppose underspecified semantic meanings or ineffability, like Bidirectional Optimality Theory (Bi-OT) and some game-theoretic explanations. We introduce a formal framework based on learning, speaker’s preferences and pure semantics for describing diachronic strengthening of meaning. Moreover, we show how the diachronic development of systems of semantically co-extensive forms can be described in terms of a complete system of diachronic laws.
Purver, Matthew; Cann, Ronnie; Kempson, Ruth
17 Citations
Standard grammar formalisms are defined without reflection of the incremental, serial and context-dependent nature of language processing; any incrementality must therefore be reflected by independently defined parsing and/or generation techniques, and context-dependence by separate pragmatic modules. This leads to a poor setup for modelling dialogue, with its rich speaker-hearer interaction and high proportion of context-dependent and apparently grammatically ill-formed utterances. Instead, this paper takes an inherently incremental grammar formalism, Dynamic Syntax (DS) (Kempson et al., 2001), proposes a context-based extension and defines corresponding context-dependent parsing and generation models together with a resulting natural definition of context-dependent well-formedness. These are shown to allow a straightforward model of otherwise problematic dialogue phenomena such as shared utterances, ellipsis and alignment. We conclude that language competence is a capacity for dialogue.
Fox, Danny; Hackl, Martin
61 Citations
The notion of measurement plays a central role in human cognition. We measure people’s height, the weight of physical objects, the length of stretches of time, or the size of various collections of individuals. Measurements of height, weight, and the like are commonly thought of as mappings between objects and dense scales, while measurements of collections of individuals, as implemented for instance in counting, are assumed to involve discrete scales. It is also commonly assumed that natural language makes use of both types of scales and subsequently distinguishes between two types of measurements. This paper argues against the latter assumption. It argues that natural language semantics treats all measurements uniformly as mappings from objects (individuals or collections of individuals) to dense scales, hence the Universal Density of Measurement (UDM). If the arguments are successful, there are a variety of consequences for semantics and pragmatics, and more generally for the place of the linguistic system within an overall architecture of cognition.
Kibble, Rodger
6 Citations
This paper rehearses some arguments in favour of a normative, commitment based semantics for dialogue acts, as opposed to more familiar mentalistic accounts based on notions of belief and intention. The main focus of the paper is on identifying appropriate notions of propositional commitment and entitlement that can be applied to argumentation and dialogue modelling. A case is made for adopting elements of Brandom’s framework of normative pragmatics, modelling dialogue states as deontic scoreboards which keep track of commitments and entitlements that speakers acknowledge and hearers attribute to other interlocutors. The paper concludes by outlining protocols and update rules for selected dialogue acts.
Pickering, Martin J.; Garrod, Simon
70 Citations
Pickering and Garrod (2004) argued that alignment is the basis of successful communication in dialogue. In other words, successful communication goes hand-in-hand with the development of similar representations in the interlocutors. But what exactly does this mean? In this paper, we attempt to define alignment, contrasting alignment of situation models with alignment of linguistic representations. We then speculate on how these notions are related and why they lead to conversational success.
Gluer, Kathrin; Pagin, Peter
7 Citations
Saul Kripke’s thesis that ordinary proper names are rigid designators is supported by widely shared intuitions about the occurrence of names in ordinary modal contexts. By those intuitions names are scopeless with respect to the modal expressions. That is, sentences in a pair like (a) Aristotle might have been fond of dogs, (b) Concerning Aristotle, it is true that he might have been fond of dogs will have the same truth value. The same does not in general hold for definite descriptions. If one, like Kripke, accounts for this difference by means of the intensions of the names and the descriptions, the conclusion is that names do not in general have the same intension as any normal, identifying description. However, this difference can be accounted for alternatively by appeal to the semantics of the modal expressions. On the account we suggest, dubbed ‘relational modality’, simple singular terms, like proper names, contribute to modal contexts simply by their actual world reference, not by their descriptive content. That account turns out to be fully equivalent with the rigidity account when it comes to truth of modal and nonmodal sentences (with respect to the actual world), and hence supports the same basic intuitions. Here we present the relational modality account and compare it with others, in particular Kripke’s own.
Nguyen, Thai Phuong; Shimazu, Akira
1 Citations
We present a phrase-based statistical machine translation approach which uses linguistic analysis in the preprocessing phase. The linguistic analysis includes morphological transformation and syntactic transformation. Since the word-order problem is solved using syntactic transformation, there is no reordering in the decoding phase. For morphological transformation, we use handcrafted transformational rules. For syntactic transformation, we propose a transformational model based on a probabilistic context-free grammar. This model is trained using a bilingual corpus and a broad-coverage parser of the source language. This approach is applicable to language pairs in which the target language is poor in resources. We considered translation from English to Vietnamese and from English to French. Our experiments showed significant BLEU-score improvements in comparison with Pharaoh, a state-of-the-art phrase-based SMT system.
Crego, Josep Maria; Mariño, José B.
13 Citations
In this paper we describe an elegant and efficient approach to coupling reordering and decoding in statistical machine translation, where the n-gram translation model is also employed as distortion model. The reordering search problem is tackled through a set of linguistically motivated rewrite rules, which are used to extend a monotonic search graph with reordering hypotheses. The extended graph is traversed in the global search when a fully informed decision can be taken. Further experiments show that the n-gram translation model can be successfully used as reordering model when estimated with reordered source words. Experiments are reported on the Europarl task (Spanish–English and English–Spanish). Results are presented regarding translation accuracy and computational efficiency, showing significant improvements in translation quality with respect to monotonic search for both translation directions at a very low computational cost.
Mandreoli, Federica; Martoglia, Riccardo; Tiberio, Paolo
1 Citations
In this paper we present EXTRA (EXample-based TRanslation Assistant), a translation memory (TM) system. EXTRA is able to propose effective translation suggestions by relying on syntactic analysis of the text and on a rigorous, language-independent measure; the search is performed efficiently in large amounts of bilingual texts thanks to its advanced retrieval techniques. EXTRA does not use external knowledge requiring the intervention of users and is completely customizable and portable as it has been implemented on top of a standard DataBase Management System. The paper provides a thorough evaluation of both the effectiveness and the efficiency of our system. In particular, in order to quantify the benefits offered by EXTRA assisted translation over manual translation, we introduce a simulator implementing specifically devised statistical, process-oriented, discrete-event models. As far as we know, this is the first time statistical simulation experiments have been used to face the non-trivial problem of evaluating TM systems, particularly for comparing the time that could be saved by performing assisted translation versus “manual” translation and for optimally tuning the system behaviour with respect to differently skilled users. In our experiments, we considered three scenarios, manual translation with one or two translators and assisted translation with one translator. The time needed for one translator to do an assisted translation is significantly closer to that of a team of two translators than to that of the single translator. The mean sentence translation time is by far the lowest for this scenario, corresponding to the highest per translator productivity. We also estimate the total translation time when the number of query sentences, the maximum number of suggestions to be read, and the probability of look up are varied: the best trade-off is given by reading (and presenting) four or five suggestions at the most.
Fitting, Melvin
2 Citations
In an earlier paper, [5], I gave semantics and tableau rules for a simple first-order intensional logic called FOIL, in which both objects and intensions are explicitly present and can be quantified over. Intensions, being non-rigid, are represented in FOIL as (partial) functions from states to objects. Scoping machinery, predicate abstraction, is present to disambiguate sentences like that asserting the necessary identity of the morning and the evening star, which is true in one sense and not true in another.
In this paper I address the problem of axiomatizing FOIL. I begin with an interesting sublogic with predicate abstraction and equality but no quantifiers. In [2] this sublogic was shown to be undecidable if the underlying modal logic was at least K4, though it is decidable in other cases. The axiomatization given is shown to be complete for standard logics without a symmetry condition. The general situation is not known. After this an axiomatization for the full FOIL is given, which is straightforward after one makes a change in the point of view.
Gottwald, Siegfried
14 Citations
For classical sets one has with the cumulative hierarchy of sets, with axiomatizations like the system ZF, and with the category SET of all sets and mappings standard approaches toward global universes of all sets.
We discuss here the corresponding situation for fuzzy set theory. Our emphasis will be on various approaches toward (more or less naively formed) universes of fuzzy sets as well as on axiomatizations, and on categories of fuzzy sets.
What we give is a (critical) survey of quite a lot of such approaches which have been offered in the last approximately 35 years.
Part I was devoted to model based and to axiomatic approaches; the present Part II is devoted to category theoretic approaches.
Jansana, Ramon
17 Citations
A logic is selfextensional if its interderivability (or mutual consequence) relation is a congruence relation on the algebra of formulas. In the paper we characterize the selfextensional logics with a conjunction as the logics that can be defined using the semilattice order induced by the interpretation of the conjunction in the algebras of their algebraic counterpart. Using the characterization we provide simpler proofs of several results on selfextensional logics with a conjunction obtained in [13] using Gentzen systems. We also obtain some results on Fregean logics with conjunction.
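As a hedged illustration of the "semilattice order induced by the interpretation of the conjunction" mentioned in this abstract: the symbols $h$, $\mathbf{A}$ and $\mathsf{K}$ below are notation assumed here, not taken from the abstract, and the second line is only a sketch of how a logic can be defined from that order for finite, non-empty premise sets.

```latex
% Order recovered from the meet in a semilattice:
a \le b \iff a \wedge b = a

% Sketch of a consequence relation defined from that order
% (K: a class of algebras with a semilattice operation, assumed notation):
\varphi_1, \dots, \varphi_n \vdash \psi
  \iff
  h(\varphi_1) \wedge \dots \wedge h(\varphi_n) \le h(\psi)
  \quad \text{for every } \mathbf{A} \in \mathsf{K}
  \text{ and every homomorphism } h \text{ into } \mathbf{A}
```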
Konev, B.; Kontchakov, R.; Wolter, F.; Zakharyaschev, M.
5 Citations
We investigate computational properties of propositional logics for dynamical systems. First, we consider logics for dynamic topological systems (W, f), where W is a topological space and f a homeomorphism on W. The logics come with ‘modal’ operators interpreted by the topological closure and interior, and temporal operators interpreted along the orbits {w, f(w), f^{2}(w), ...} of points w ∈ W. We show that for various classes of topological spaces the resulting logics are not recursively enumerable (and so not recursively axiomatisable). This gives a ‘negative’ solution to a conjecture of Kremer and Mints. Second, we consider logics for dynamical systems (W, f), where W is a metric space and f an isometric function. The operators for topological interior/closure are replaced by distance operators of the form ‘everywhere/somewhere in the ball of radius a’, for a ∈ Q^{+}. In contrast to the topological case, the resulting logic turns out to be decidable, but not in time bounded by any elementary function.
Francez, Nissim; Steedman, Mark
5 Citations
The paper proposes a semantics for contextual (i.e., Temporal and Locative) Prepositional Phrases (CPPs) like during every meeting, in the garden, when Harry met Sally and where I’m calling from. The semantics is embodied in a multimodal extension of Combinatory Categorial Grammar (CCG). The grammar allows the strictly monotonic compositional derivation of multiple correct interpretations for “stacked” or multiple CPPs, including interpretations whose scope relations are not what would be expected on standard assumptions about surface-syntactic command and monotonic derivation. A type-hierarchy of functional modalities plays a crucial role in the specification of the fragment.
López, M. Dolores Jiménez
2 Citations
Taking as our starting point significant similarities between a formal language model (Grammar Systems) and a grammatical theory (Autolexical Syntax), in this paper we suggest the application of the former to the topic of the latter. To show the applicability of Grammar Systems Theory to grammatical description, we introduce a formal-language-theoretic framework for the architecture of natural language grammar: Linguistic Grammar Systems. We prove the adequacy of this model by highlighting its features (modularity, parallelism, interaction) and by showing the similarity between this framework and accepted and well-known grammatical models (e.g. Autolexical Syntax).
Mayol, L.; Boleda, G.; Badia, T.
2 Citations
This paper describes a methodology aimed at grouping Catalan verbs according to their syntactic behavior. Our goal is to acquire a small number of basic classes with a high level of accuracy, using minimal resources. Information on syntactic class, expensive and slow to compile by hand, is useful for any NLP task requiring specific lexical information. We show that it is possible to acquire this kind of information using only a POS-tagged corpus. We perform two clustering experiments. The first one aims at classifying verbs into transitive, intransitive and verbs alternating with a se-construction. Our system achieves an average 0.84 F-score, for a task with a 0.33 baseline. The second experiment aims at further distinguishing among pure intransitives and verbs bearing a prepositional object. The baseline for the task is 0.51 and the upper bound 0.98. The system achieves an average 0.88 F-score.
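For readers unfamiliar with the metric reported above, the following is a minimal sketch of how an F-score is commonly computed from precision and recall. The abstract does not say which variant the authors used; the balanced form (beta = 1) and the function name `f_score` are assumptions made here for illustration.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall.

    With beta == 1 this is the balanced F1 score. Whether the paper
    uses this exact variant is an assumption, not stated in the abstract.
    """
    if precision == 0.0 and recall == 0.0:
        # Avoid division by zero when nothing was retrieved correctly.
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Hypothetical precision and recall values, chosen only for illustration.
    print(round(f_score(0.8, 0.9), 3))
```

Because the harmonic mean penalises imbalance, a system with high precision but low recall (or the reverse) scores well below the arithmetic mean of the two.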
White, Michael
22 Citations
We describe a chart realization algorithm for Combinatory Categorial Grammar (CCG), and show how it can be used to efficiently realize a wide range of coordination phenomena, including argument cluster coordination and gapping. The algorithm incorporates three novel methods for improving the efficiency of chart realization: (i) using rules to chunk the input logical form into subproblems to be solved independently prior to further combination; (ii) pruning edges from the chart based on the n-gram score of the edge’s string, in comparison to other edges with equivalent categories; and (iii) formulating the search as a best-first anytime algorithm, using n-gram scores to sort the edges on the agenda. The algorithm has been implemented as an extension to the OpenCCG open source CCG parser, and initial performance tests indicate that the realizer is fast enough for practical use in natural language dialogue systems.
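The idea in (iii), a best-first anytime search whose agenda is ordered by n-gram scores, can be sketched generically as follows. This is not the OpenCCG implementation: the function `best_first_anytime`, the toy bigram table and the word-ordering task are all hypothetical stand-ins, and no chart or CCG categories are modelled.

```python
import heapq

def best_first_anytime(start_edges, expand, score, is_complete, max_pops):
    """Pop the highest-scoring edge first; return the best complete edge
    found within a budget of max_pops agenda pops (None if none found)."""
    agenda = []
    tie = 0  # tie-breaker so heapq never has to compare edges directly
    for edge in start_edges:
        heapq.heappush(agenda, (-score(edge), tie, edge))
        tie += 1
    best, best_score = None, float("-inf")
    pops = 0
    while agenda and pops < max_pops:
        neg_score, _, edge = heapq.heappop(agenda)
        pops += 1
        if is_complete(edge):
            if -neg_score > best_score:
                best, best_score = edge, -neg_score
            continue
        for new_edge in expand(edge):
            heapq.heappush(agenda, (-score(new_edge), tie, new_edge))
            tie += 1
    return best

# Toy setting (hypothetical): order three words using made-up bigram scores.
WORDS = ("the", "dog", "barks")
BIGRAMS = {("the", "dog"): 2.0, ("dog", "barks"): 2.0}

def toy_score(edge):
    # Sum of bigram scores over adjacent word pairs; unseen pairs score 0.
    return sum(BIGRAMS.get(pair, 0.0) for pair in zip(edge, edge[1:]))

def toy_expand(edge):
    # Extend the partial string with each word not used yet.
    return [edge + (w,) for w in WORDS if w not in edge]

def toy_complete(edge):
    return len(edge) == len(WORDS)

if __name__ == "__main__":
    starts = [(w,) for w in WORDS]
    print(best_first_anytime(starts, toy_expand, toy_score, toy_complete, 50))
```

The "anytime" property comes from `max_pops`: the search can be cut off at any budget and still return the best complete result seen so far, which is the trade-off a dialogue system with a response-time limit would want.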
Siddharthan, Advaith
23 Citations
Syntactic simplification is the process of reducing the grammatical complexity of a text, while retaining its information content and meaning. The aim of syntactic simplification is to make text easier to comprehend for human readers, or process by programs. In this paper, we formalise the interactions that take place between syntax and discourse during the simplification process. This is important because the usefulness of syntactic simplification in making a text accessible to a wider audience can be undermined if the rewritten text lacks cohesion. We describe how various generation issues like sentence ordering, cue-word selection, referring-expression generation, determiner choice and pronominal use can be resolved so as to preserve conjunctive and anaphoric cohesive relations during syntactic simplification and present the results of an evaluation of our syntactic simplification system.
Deemter, Kees; Reiter, Ehud; Horacek, Helmut
1 Citations
Natural Language Generation (NLG) can be studied both empirically and formally. In recent times empirical research has tended to dominate the NLG research community, especially with the emergence of corpus and statistical techniques. However, formal research, based on proofs, formal models, and formal linguistics, can also contribute to the NLG research agenda. In this paper, we discuss what types of formal research are relevant for NLG, and we introduce the four papers in the Special Issue on Formal Issues in Natural Language Generation of the journal Research on Language and Computation in the light of this discussion.
Mel’čuk, Igor; Wanner, Leo
6 Citations
This paper addresses one of the central problems arising at the transfer stage in machine translation: syntactic mismatches, that is, mismatches between a source-language sentence structure and its equivalent target-language sentence structure. The level at which we assume the transfer to be carried out is the Deep-Syntactic Structure (DSyntS) as proposed in the Meaning-Text Theory (MTT). DSyntS is abstract enough to avoid all types of divergences that result either from restricted lexical co-occurrence or from surface-syntactic discrepancies between languages. As for the remaining types of syntactic divergences, all of them occur not only interlinguistically, but also intralinguistically; this means that establishing correspondences between semantically equivalent expressions of the source and target languages that diverge with respect to their syntactic structure is nothing else than paraphrasing. This allows us to adapt the powerful intralinguistic paraphrasing mechanism developed in MTT for purposes of interlinguistic transfer.
Minock, Michael
2 Citations
This article proposes a novel technique to generate natural language descriptions for a wide class of relational database queries. The approach to describing queries is phrasal and is restricted to a class of queries that return only whole schema tuples as answers. Query containment and equivalence are decidable for this class and this property is exploited in the maintenance and use of a phrasal lexicon. The query description mechanism is implemented within the Schema Tuple Query Processor (STEP) system (http://www.cs.umu.se/~mjm/step). Because the said query class is also closed over elementary set operations, it may be reasoned with in a relatively unrestricted manner. This enables a modular separation between a reasoning component and a ‘tactical’ realization component. To demonstrate this modularity, this fragment is shown to be adequate for several cooperative reasoning techniques. Thus the cooperative information system serves as the ‘strategic’ component, deciding what to say, while the generation system acts as the ‘tactical’ component, deciding how to say it. Naturally, expressions within the said query language are the interchange language between these two components.
Horacek, Helmut
Handling the dependencies among alternatives in composing expressions in an efficient and qualitatively accurate manner is a fundamental problem of NLG. To pursue this goal effectively, simplifications are put forward in practical approaches, but also ambitious control regimes are tried out occasionally. However, neither of these is able to operate adequately on larger and involved structures. Approaching this issue in a methodological way, we present a case study from the area of mathematical proofs that illustrates the rhetorically motivated reorganization of machine-generated case analyses. Ingredients of this investigation are the design of optimization operations, orderings on groups of operations that take their dependencies into account, and tentative applications of local operations to test the effects of crucial dependencies. Our approach conceives NLG as a standard pipeline architecture putting emphasis on orderings, with local revisions as a minor extension. This is particularly effective when text planning is organized as an optimization rather than as a construction process, such as for the presentation of mathematical proofs.
Kowalski, Tomasz; Kracht, Marcus
In this paper we show that a variety of modal algebras of finite type is semisimple iff it is discriminator iff it is both weakly transitive and cyclic. This fact has been claimed already in [4] (based on joint work by the two authors) but the proof was fatally flawed.
Blok, W. J.; Jónsson, Bjarni
21 Citations
This paper is based on Lectures 1, 2 and 4 in the series of ten lectures titled “Algebraic Structures for Logic” that Professor Blok and I presented at the Twenty Third Holiday Mathematics Symposium held at New Mexico State University in Las Cruces, New Mexico, January 8-12, 1999. These three lectures presented a new approach to the algebraization of deductive systems, and after the symposium we made plans to publish a joint paper, to be written by Blok, further developing these ideas. That project was still incomplete when Blok died. In fact, there is no indication that he had prepared a draft of the paper, and we do not know what new material he intended to include. I am therefore not in a position to complete the project as he had envisioned it. So, I have settled for the more limited objective of presenting the material from the three lectures, leaving to others the task of adapting the techniques used there to more general situations.
Kearnes, Keith A.
Let FΛ be a finite dimensional path algebra of a quiver Λ over a field F. Let L and R denote the varieties of all left and right FΛ-modules respectively. We prove the equivalence of the following statements.
The subvariety lattice of L is a sublattice of the subquasivariety lattice of L.
The subquasivariety lattice of R is distributive.
Λ is an ordered forest.
Dillon, Sarah; Fraser, Janet
7 Citations
There has been little research on the role of translation memory (TM) in practitioners’ working practices, apart from reviews and a survey into ownership and rates issues. The present study provides a comprehensive snapshot of the perceptions of UK-based professional translators with regard to TM as a tool in their working environment. Moore and Benbasat’s instrument for measuring perceptions with regard to the adoption of an information technology innovation was adapted and used to investigate three hypotheses: that translators who are relatively new to the translation industry have a more positive general perception of TM than experienced translators; that translators who use TM have a more positive general perception of it than translators who do not; and, finally, that translators’ perception of the value of TM is not linked with their perceived IT proficiency. The study found that younger translators took a positive general view of TM irrespective of actual use, in particular attributing esteem to more experienced translators using (or perceived to be using) TM. Non-users at all experience levels, however, had a negative general view of TM irrespective of actual use. Both findings point to the significance of adequate knowledge in adoption decisions. Perceived IT proficiency, finally, was found to play a key role in translators’ perceptions of the benefits of TM. These findings are discussed in the light of recent trends in the translation industry, including Continuing Professional Development, quality assurance and regulation.
Maksimova, Larisa
5 Citations
All extensions of the modal Grzegorczyk logic Grz possessing projective Beth's property PB2 are described. It is proved that there are exactly 13 logics over Grz with PB2. All of them are finitely axiomatizable and have the finite model property. It is shown that PB2 is strongly decidable over Grz, i.e. there is an algorithm which, for any finite system Rul of additional axiom schemes and rules of inference, decides if the calculus Grz+Rul has the projective Beth property.
Olson, Jeffrey S.
1 Citations
CRS(k) denotes the variety of commutative residuated semilattice-ordered monoids that satisfy (x ∧ e)^{k} ≤ (x ∧ e)^{k+1}. A structural characterization of the subdirectly irreducible members of CRS(k) is proved, and is then used to provide a constructive approach to the axiomatization of varieties generated by positive universal subclasses of CRS(k).
Tsinakis, Constantine; Wille, Annika M.
17 Citations
We establish the existence of uncountably many atoms in the subvariety lattice of the variety of involutive residuated lattices. The proof utilizes a construction used in the proof of the corresponding result for residuated lattices and is based on the fact that every residuated lattice with greatest element can be associated in a canonical way with an involutive residuated lattice.
van Alten, C. J.
9 Citations
A biresiduation algebra is a 〈/,\,1〉-subreduct of an integral residuated lattice. These algebras arise as algebraic models of the implicational fragment of the Full Lambek Calculus with weakening. We axiomatize the quasivariety B of biresiduation algebras using a construction for integral residuated lattices. We define a filter of a biresiduation algebra and show that the lattice of filters is isomorphic to the lattice of B-congruences and that these lattices are distributive. We give a finite basis of terms for generating filters and use this to characterize the subvarieties of B with EDPC and also the discriminator varieties. A variety generated by a finite biresiduation algebra is shown to be a subvariety of B. The lattice of subvarieties of B is investigated; we show that there are precisely three finitely generated covers of the atom.
Jansana, Ramon
Willem Blok was one of the founders of the field Abstract Algebraic Logic. The paper describes his research in this field.
Blok, W. J.; Hoogland, Eva
6 Citations
The present paper is a study in abstract algebraic logic. We investigate the correspondence between the metalogical Beth property and the algebraic property of surjectivity of epimorphisms. It will be shown that this correspondence holds for the large class of equivalential logics. We apply our characterization theorem to relevance logics and many-valued logics.
Adaricheva, K.; McKenzie, R.; Zenk, E. R.; Maróti, M.; Nation, J. B.
4 Citations
The least element 0 of a finite meet semidistributive lattice is a meet of meet-prime elements. We investigate conditions under which the least element of an algebraic, meet semidistributive lattice is a (complete) meet of meet-prime elements. For example, this is true if the lattice has only countably many compact elements, or if |L| < 2^{ℵ_0}, or if L is in the variety generated by a finite meet semidistributive lattice. We give an example of an algebraic, meet semidistributive lattice that has no meet-prime element or join-prime element. This lattice L has |L| = |L_{c}| = 2^{ℵ_0} where L_{c} is the set of compact elements of L.
Benthem, Johan Van
25 Citations
Taking Löb's Axiom in modal provability logic as a running thread, we discuss some general methods for extending modal frame correspondences, mainly by adding fixed-point operators to modal languages as well as their correspondence languages. Our suggestions are backed up by some new results – while we also refer to relevant work by earlier authors. But our main aim is advertizing the perspective, showing how modal languages with fixed-point operators are a natural medium to work with.
Cignoli, Roberto; Torrell, Antoni Torrens
32 Citations
The aim of this paper is to give a description of the free algebras in some varieties of Glivenko MTL-algebras having the Boolean retraction property. This description is given (generalizing the results of [9]) in terms of weak Boolean products over Cantor spaces. We prove that in some cases the stalks can be obtained in a constructive way from free kernel DL-algebras, which are the maximal radical of directly indecomposable Glivenko MTL-algebras satisfying the equation in the title. We include examples to show how we can apply the results to describe free algebras in some well known varieties of involutive MTL-algebras and of pseudocomplemented MTL-algebras.
Czelakowski, Janusz
1 Citations
The purpose of this paper is to present in a uniform way the commutator theory for k-deductive systems of arbitrary positive dimension k. We are interested in the logical perspective of the research: an emphasis is put on an analysis of the interconnections holding between the commutator and logic. This research thus qualifies as belonging to abstract algebraic logic, an area of universal algebra that explores to a large extent the methods provided by the general theory of deductive systems. In the paper the new term ‘commutator formula’ is introduced. The paper is concerned with the meanings of the above term in the models provided by the commutator theory and clarifies contexts in which these meanings occur. The work is presented in an abstracted form: main ideas are outlined but proofs are deferred to the second part of the paper.
Font, Josep Maria; Jansana, Ramon; Pigozzi, Don
5 Citations
In this paper we consider the structure of the class FGModS of full generalized models of a deductive system S from a universal-algebraic point of view, and the structure of the set of all the full generalized models of S on a fixed algebra A from the lattice-theoretical point of view; this set is represented by the lattice FACS_{s}A of all algebraic closed-set systems C on A such that (A, C) ∈ FGModS. We relate some properties of these structures with typically logical properties of the sentential logic S. The main algebraic properties we consider are the closure of FGModS under substructures and under reduced products, and the property that for any A the lattice FACS_{s}A is a complete sublattice of the lattice of all algebraic closed-set systems over A. The logical properties are the existence of a fully adequate Gentzen system for S, the Local Deduction Theorem and the Deduction Theorem for S. Some of the results are established for arbitrary deductive systems, while some are found to hold only for deductive systems in more restricted classes like the protoalgebraic or the weakly algebraizable ones. The paper ends with a section on examples and counterexamples.
Galatos, Nikolaos; Ono, Hiroakira
31 Citations
Substructural logics have received a lot of attention in recent years from the communities of both logic and algebra. We discuss the algebraization of substructural logics over the full Lambek calculus and their connections to residuated lattices, and establish a weak form of the deduction theorem that is known as parametrized local deduction theorem. Finally, we study certain interpolation properties and explain how they imply the amalgamation property for certain varieties of residuated lattices.
Goldblatt, Robert
2 Citations
The category-theoretic nature of general frames for modal logic is explored. A new notion of "modal map" between frames is defined, generalizing the usual notion of bounded morphism/p-morphism. The category Fm of all frames and modal maps has reflective subcategories CHFm of compact Hausdorff frames, DFm of descriptive frames, and UEFm of ultrafilter enlargements of frames. All three subcategories are equivalent, and are dual to the category of modal algebras and their homomorphisms.
An important example of a modal map that is typically not a bounded morphism is the natural insertion of a frame A into its ultrafilter enlargement EA. This map is used to show that EA is the free compact Hausdorff frame generated by A relative to Fm. The monad E of the resulting adjunction is examined and its Eilenberg-Moore category is shown to be isomorphic to CHFm. A categorical equivalence between the Kleisli category of E and UEFm is defined from a construction that assigns to each frame A a frame A* that is "image-closed" in the sense that every point-image {b : aRb} in A is topologically closed. A* is the unique image-closed frame having the same ultrafilter enlargement as A.
These ideas are connected to a category 2U shown by S. K. Thomason to be dual to the category of complete and atomic modal algebras and their homomorphisms. 2U is the full subcategory of the Kleisli category of E based on the Kripke frames.
Beigman Klebanov, Beata; Shamir, Eli
7 Citations
Lexical cohesion refers to the reader-perceived unity of text achieved by the author’s usage of words with related meanings (Halliday and Hasan, 1976). This article reports on an experiment with 22 readers aimed at finding lexical cohesive patterns in 10 texts. Although there was much diversity in people’s answers, we identified a common core of the phenomenon, using statistical analysis of agreement patterns and a validation experiment. The core data may now be used as a minimal test set for models of lexical cohesion; we present an example suggesting that models based on mutually exclusive lexical chains will not suffice. In addition, we believe that procedures for revealing and analyzing subgroup patterns of agreement described here may be applied to data collected in other studies of comparable size.
Stevenson, Mark
5 Citations
Several recent Information Extraction (IE) systems have been restricted to the identification of facts which are described within a single sentence. It is not clear what effect this has on the difficulty of the extraction task or how the performance of systems which consider only single sentences should be compared with those which consider multiple sentences. This paper compares three IE evaluation corpora, from the Message Understanding Conferences, and finds that a significant proportion of the facts mentioned therein are not described within a single sentence. Therefore, systems which are evaluated only on facts described within single sentences are being tested against a limited portion of the relevant information in the text, and it is difficult to compare their performance with other systems. Further analysis demonstrates that anaphora resolution and world knowledge are required to combine information described across multiple sentences. This result has implications for the development and evaluation of IE systems.
Kilgarriff, Adam; Rundell, Michael; Uí Dhonnchadha, Elaine
3 Citations
In a 12-month project we have developed a new, register-diverse, 55-million-word bilingual corpus, the New Corpus for Ireland (NCI), to support the creation of a new English-to-Irish dictionary. The paper describes the strategies we employed, and the solutions to problems encountered. We believe we have a good model for corpus creation for lexicography, and others may find it useful as a blueprint. The corpus has two parts, one Irish, the other Hiberno-English (English as spoken in Ireland). We describe its design, collection and encoding.
Loftsson, Hrafn
2 Citations
We use integrations and combinations of taggers to improve the tagging accuracy of Icelandic text. The accuracy of the best performing integrated tagger, which consists of our linguistic rule-based tagger for initial disambiguation and a trigram tagger for full disambiguation, is 91.80%. Combining five different taggers, using simple voting, results in 93.34% accuracy. By adding two linguistically motivated rules to the combined tagger, we obtain an accuracy of 93.48%. This method reduces the error rate by 20.5%, with respect to the best performing tagger in the combination pool.
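The error-rate figure above can be checked with a little arithmetic. The sketch below assumes that the "best performing tagger in the combination pool" is the integrated tagger with 91.80% accuracy; the abstract does not say so explicitly, although the numbers are consistent with that reading. The function names and the example tags are hypothetical, and the voting function only illustrates plain majority voting, not the authors' actual implementation.

```python
from collections import Counter


def relative_error_reduction(baseline_acc, improved_acc):
    """Percentage drop in error rate, given two accuracies in percent."""
    baseline_err = 100.0 - baseline_acc
    improved_err = 100.0 - improved_acc
    return 100.0 * (baseline_err - improved_err) / baseline_err


def simple_vote(tags):
    """Return the tag proposed by the largest number of taggers."""
    return Counter(tags).most_common(1)[0][0]


# Assumed baseline: 91.80% (integrated tagger); final system: 93.48%.
print(round(relative_error_reduction(91.80, 93.48), 1))  # 20.5

# Five hypothetical taggers voting on one token (made-up tag names).
print(simple_vote(["noun", "noun", "verb", "noun", "adj"]))  # noun
```

Under the stated assumption, the error rate falls from 8.20% to 6.52%, which is the 20.5% relative reduction reported.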
Boas, Hans Christian
3 Citations
New methods of documenting languages with digital technologies have led to a multitude of different formats that are difficult to reuse over time. To overcome the problems surrounding the portability of digital language documentation, linguists are in the process of formulating best-practice recommendations to increase the likelihood of their work's long-term survival. This paper describes the implementation of a comprehensive set of current best-practice recommendations pertaining to content, format, discovery, access, citation, preservation, and rights in the context of the language documentation efforts of the Texas German Dialect Project. This project is different from others in that it is not primarily concerned with digitizing and archiving existing recordings. Instead, the archive it is creating is the end result of a research project whose workflow begins with data collection in the field and ends with depositing digitized and annotated language materials in a web-accessible digital archive of Texas German. This paper shows how a number of conflicting best-practice recommendations can be resolved, thereby satisfying the diverse needs of academic research, teaching, and outreach to the community. As such, the results reported here are an important contribution to the search for strategies guaranteeing the long-term survival of digital language documentation resources.
