By
Biemann, Chris
7 Citations
Syntactic preprocessing is a step that is widely used in NLP applications. Traditionally, rule-based or statistical Part-of-Speech (POS) taggers are employed that need either considerable rule development time or a sufficient amount of manually labeled data. To alleviate this acquisition bottleneck and to enable preprocessing for minority languages and specialized domains, a method is presented that constructs a statistical syntactic tagger model from a large amount of unlabeled text data. The method presented here is called unsupervised POS-tagging, as its application results in corpus annotation in a way comparable to what POS-taggers provide. Nevertheless, its application results in slightly different categories from those assumed by a linguistically motivated POS-tagger. These differences hamper evaluation procedures that compare the output of the unsupervised POS-tagger to a tagging with a supervised tagger. To measure the extent to which unsupervised POS-tagging can contribute in application-based settings, the system is evaluated in supervised POS-tagging, word sense disambiguation, named entity recognition and chunking. Unsupervised POS-tagging has been explored since the beginning of the 1990s. Unlike in previous approaches, the kind and number of different tags are here generated by the method itself. Another difference from other methods is that not all words above a certain frequency rank are assigned a tag; the method is allowed to exclude words from the clustering if their distribution does not match closely enough with other words. The lexicon size is considerably larger than in previous approaches, resulting in a lower out-of-vocabulary (OOV) rate and in a more consistent tagging. The system presented here is available for download as open-source software along with tagger models for several languages, so the contributions of this work can be easily incorporated into other applications.
By
Guillaume, Bruno; Perrier, Guy
2 Citations
Interaction Grammars are a grammatical formalism based on the notion of polarity. Polarities express the resource sensitivity of natural languages by modelling the distinction between saturated and unsaturated syntactic structures. Syntactic composition is represented as a chemical reaction guided by the saturation of polarities. It is expressed in a model-theoretic framework where grammars are constraint systems using the notion of tree description, and parsing appears as a process of building tree description models satisfying criteria of saturation and minimality.
By
Devereux, Barry; Pilkington, Nicholas; Poibeau, Thierry; Korhonen, Anna
9 Citations
In recent years a number of methods have been proposed for the automatic acquisition of feature-based conceptual representations from text corpora. Such methods could offer valuable support for theoretical research on conceptual representation. However, existing methods do not target the full range of concept-relation-feature triples occurring in human-generated norms (e.g. flute produce sound) but rather focus on concept-feature pairs (e.g. flute – sound) or triples involving specific relations only (e.g. is-a or part-of relations). In this article we investigate the challenges that need to be met in both methodology and evaluation when moving towards the acquisition of more comprehensive conceptual representations from corpora. In particular, we investigate the usefulness of three types of knowledge in guiding the extraction process: encyclopedic, syntactic and semantic. We first present a semantic analysis of existing, human-generated feature production norms, which reveals information about co-occurring concept and feature classes. We then introduce a novel method for large-scale feature extraction which uses the class-based information to guide the acquisition process. The method involves extracting candidate triples consisting of concepts, relations and features (e.g. deer have antlers, flute produce sound) from corpus data parsed for grammatical dependencies, and re-weighting the triples on the basis of conditional probabilities calculated from our semantic analysis. We apply this method to an automatically parsed Wikipedia corpus which includes encyclopedic information and evaluate its accuracy using a number of different methods: direct evaluation against the McRae norms in terms of feature types and frequencies, human evaluation, and novel evaluation in terms of conceptual structure variables. Our investigation highlights a number of issues which require addressing in both methodology and evaluation when aiming to improve the accuracy of unconstrained feature extraction further.
By
Rus, Vasile; McCarthy, Philip M.; Graesser, Arthur C.; McNamara, Danielle S.
5 Citations
We show in this article how an approach developed for the task of recognizing textual entailment relations can be extended to identify paraphrase and elaboration relations. Entailment is a unidirectional relation between two sentences in which one sentence logically implies the other. There seems to be a close relation between entailment and two other sentence-to-sentence relations: elaboration and paraphrase. This close relation is discussed to theoretically justify the newly derived approaches. The proposed approaches use lexical and syntactic information together with shallow negation handling, and offer significantly better results than several baselines. When compared to other paraphrase and elaboration approaches they produce similar or better results. We report results on several data sets: the Microsoft Research Paraphrase corpus, a benchmark for evaluating approaches to paraphrase identification, and a data set collected from high-school students’ interactions with the intelligent tutoring system iSTART, which includes both paraphrase and elaboration utterances.
By
Gabbay, Dov M.; Marcelino, Sérgio
6 Citations
A reactive graph generalizes the concept of a graph by making it dynamic, in the sense that the arrows coming out from a point depend on how we got there.
This idea was first applied to Kripke semantics of modal logic in [2]. In this paper we strengthen that unimodal language by adding a second operator. One operator corresponds to the dynamics relation and the other one relates paths with the same endpoint. We explore the expressivity of this interpretation by axiomatizing some natural subclasses of reactive frames.
The main objective of this paper is to present a methodology for studying reactive logics using existing classical techniques.
By
Gabbay, Dov M.; Garcez, Artur S. d’Avila
21 Citations
This paper studies methodologically robust options for giving logical contents to nodes in abstract argumentation networks. It defines a variety of notions of attack in terms of the logical contents of the nodes in a network. General properties of logics are refined both at the object level and at the meta-level to suit the needs of the application. The network-based system improves upon some of the attempts in the literature to define attacks in terms of defeasible proofs, the so-called rule-based systems. We also provide a number of examples and consider a rigorous case study, which indicate that our system does not suffer from anomalies. We define consequence relations based on a notion of defeat, consider rationality postulates, and prove that one such consequence relation is consistent.
By
Gabbay, Dov M.
33 Citations
This paper is part of a research program centered around argumentation networks and offering several research directions for argumentation networks, with a view of using such networks for integrating logics and network reasoning.
In Section 1 we introduce our program manifesto. In Section 2 we motivate and show how to substitute one argumentation network as a node in another argumentation network.
Substitution is a purely logical operation and doing it for networks, besides developing their theory further, also helps us see how to bring logic and networks closer together.
Section 3 develops the formal properties of the new kind of network and Section 4 offers general discussion and comparison with the literature.
By
Gabbay, Dov M.
12 Citations
Given an argumentation network we associate with it a modal formula representing the ‘logical content’ of the network. We show a one-to-one correspondence between all possible complete Caminada labellings of the network and all possible models of the formula.
By
Caminada, Martin W. A.; Gabbay, Dov M.
92 Citations
In the current paper, we reexamine how abstract argumentation can be formulated in terms of labellings, and how the resulting theory can be applied in the field of modal logic. In particular, we are able to express the (complete) extensions of an argumentation framework as models of a set of modal logic formulas that represents the argumentation framework. Using this approach, it becomes possible to define the grounded extension in terms of modal logic entailment.
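As a rough illustration of the labelling view of abstract argumentation mentioned in this abstract, the grounded labelling of a small framework can be computed by a simple fixpoint iteration. This is a minimal sketch of the standard textbook procedure, not the modal-logic construction of the paper; the function name `grounded_labelling` and the example framework are assumptions made for illustration only.

```python
def grounded_labelling(arguments, attacks):
    """Return a dict mapping each argument to 'in', 'out' or 'undec'.

    arguments: iterable of hashable argument names (hypothetical example data).
    attacks:   iterable of (attacker, target) pairs over those arguments.
    """
    arguments = list(arguments)
    attackers = {a: set() for a in arguments}
    for src, dst in attacks:
        attackers[dst].add(src)

    # Start with everything undecided and propagate until nothing changes.
    label = {a: "undec" for a in arguments}
    changed = True
    while changed:
        changed = False
        for a in arguments:
            if label[a] != "undec":
                continue
            if all(label[b] == "out" for b in attackers[a]):
                # Every attacker is defeated (or there are none): accept.
                label[a] = "in"
                changed = True
            elif any(label[b] == "in" for b in attackers[a]):
                # Some accepted argument attacks it: reject.
                label[a] = "out"
                changed = True
    return label


if __name__ == "__main__":
    # a attacks b, b attacks c, and c and d attack each other.
    args = ["a", "b", "c", "d"]
    atts = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "c")]
    print(grounded_labelling(args, atts))
```

The arguments labelled 'in' form the grounded extension; arguments caught in an unresolved cycle, such as c and d in the example, stay 'undec'.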
By
Gabbay, Dov M.; Szałas, Andrzej
5 Citations
In the current paper we consider theories with vocabulary containing a number of binary and unary relation symbols. Binary relation symbols represent labeled edges of a graph and unary relations represent unique annotations of the graph’s nodes. Such theories, which we call annotation theories, can be used in many applications, including the formalization of argumentation, approximate reasoning, semantics of logic programs, graph coloring, etc. We address a number of problems related to annotation theories over finite models, including satisfiability, querying problem, specification of preferred models and model checking problem.
We show that most of the considered problems are NPTime- or co-NPTime-complete. In order to reduce the complexity for particular theories, we use second-order quantifier elimination. To the best of our knowledge, none of the existing methods works in the case of annotation theories. We then provide a new second-order quantifier elimination method for stratified theories, which is successful in the considered cases. The new result subsumes many other results, including those of [2, 28, 21].
By
Wu, Yining; Caminada, Martin; Gabbay, Dov M.
30 Citations
In this paper, we prove the correspondence between complete extensions in abstract argumentation and 3-valued stable models in logic programming. This result is in line with earlier work of [6] that identified the correspondence between the grounded extension in abstract argumentation and the well-founded model in logic programming, as well as between the stable extensions in abstract argumentation and the stable models in logic programming.
By
Gabbay, Dov M.
17 Citations
In 2005 the author introduced networks which allow attacks on attacks of any level. So if a → b reads a attacks b, then this attack can itself be attacked by another node c. This attack itself can attack another node d. This situation can be iterated to any level with attacks and nodes attacking other attacks and other nodes.
In this paper we provide semantics (of extensions) to such networks. We offer three different approaches to obtaining semantics.
1. The translation approach. This uses the methodology of ‘Logic by translation’. We faithfully translate the new networks into ordinary Dung networks with more nodes and extract the semantics from the translation.
2. The labelling approach. This method regards the arrows as additional entities that can be attacked and can mount attacks, and applies a variation of the usual machinery of Caminada-like labelling to the network. The new concept we need to employ here is that of ‘joint attacks’.
3. The logic programming approach. We translate the higher-level network into a logic program and obtain semantics for it through known semantics for logic programs.
We then compare our methods with those of S. Modgil and P. M. Dung et al.
By
Boella, Guido; Gabbay, Dov M.; Torre, Leendert; Villata, Serena
26 Citations
In this paper, we introduce the methodology and techniques of meta-argumentation to model argumentation. The methodology of meta-argumentation instantiates Dung’s abstract argumentation theory with an extended argumentation theory, and is thus based on a combination of the methodology of instantiating abstract arguments, and the methodology of extending Dung’s basic argumentation frameworks with other relations among abstract arguments. The technique of meta-argumentation applies Dung’s theory of abstract argumentation to itself, by instantiating Dung’s abstract arguments with meta-arguments using a technique called flattening. We characterize the domain of instantiation using a representation technique based on soundness and completeness. Finally, we distinguish among various instantiations using the technique of specification languages.
By
Ji, Donghong; Zhao, Shiju; Xiao, Guozheng
1 Citation
In this paper, we address the problem of document reranking in information retrieval, which is usually conducted after initial retrieval to improve the rankings of relevant documents. To deal with this problem, we propose a method which automatically constructs a term resource specific to the document collection and then applies the resource to document reranking. The term resource includes a list of terms extracted from the documents as well as their weighting and correlations computed after initial retrieval. The term weighting, based on local and global distribution, makes the reranking insensitive to different choices of pseudo relevance, while the term correlation helps avoid bias toward specific concepts embedded in queries. Experiments with NTCIR-3 data show that the approach can not only improve the performance of initial retrieval, but also make a significant contribution to standard query expansion.
By
Baldwin, Timothy
1 Citation
This research looks at the effects of segment order and segmentation on translation retrieval performance for an experimental Japanese–English translation memory system. We implement a number of both bag-of-words and segment-order-sensitive string comparison methods, and test each over character-based and word-based indexing using n-grams of various orders. To evaluate accuracy, we propose an automatic method which identifies the target-language string(s) which would lead to the optimal translation for a given input, based on analysis of the held-out translation and the current contents of the translation memory. Our results indicate that character-based indexing is superior to word-based indexing, and also that bag-of-words methods are equivalent to segment-order-sensitive methods in terms of accuracy but vastly superior in terms of retrieval speed, suggesting that word segmentation and segment-order sensitivity are unnecessary luxuries for translation retrieval.
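To make the idea of bag-of-words retrieval over character-based n-gram indexing more concrete, the following is a minimal sketch, assuming a Dice coefficient over character bigrams as the similarity. The abstract does not name the comparison methods used, so the Dice measure, the function names and the toy translation memory are all assumptions chosen for illustration.

```python
from collections import Counter


def char_ngrams(text, n=2):
    """Bag (multiset) of character n-grams; no word segmentation is needed."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def dice(a, b, n=2):
    """Dice coefficient between the character n-gram bags of two strings."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    overlap = sum((ga & gb).values())          # multiset intersection size
    total = sum(ga.values()) + sum(gb.values())
    return 2 * overlap / total if total else 0.0


def retrieve(query, memory, n=2):
    """Return the (source, target) pair whose source is most similar to query."""
    return max(memory, key=lambda pair: dice(query, pair[0], n))


if __name__ == "__main__":
    # Hypothetical translation memory of (source, target) pairs.
    memory = [("abcd", "target-1"), ("wxyz", "target-2")]
    print(retrieve("abce", memory))
```

Because the n-grams are collected into an unordered bag, the comparison ignores segment order, which is the property the abstract contrasts with segment-order-sensitive methods.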
By
Koehn, Philipp
14 Citations
We investigate novel types of assistance for human translators, based on statistical machine translation methods. We developed the computer-aided tool Caitra that makes suggestions for sentence completion, shows word and phrase translation options, and allows post-editing of machine translation output. We carried out a study of the translation process that involved non-professional translators who were native speakers of either French or English, and recorded their interaction with the tool. Users translated 192 sentences from French news stories into English. Most translators were faster and better when assisted by our tool. A detailed examination of the logs also provides insight into the human translation process, such as time spent on different activities and length of pauses.
By
Kuo, Chenli; Ramsay, Allan
Despite the importance of intonation in spoken languages, deeper linguistic information encoded in prosody is rarely taken into account in speech-to-speech machine translation systems. This paper concerns the translation of spoken English into Mandarin Chinese, paying particular attention to the emphatic/contrastive focus in questions which is realised by means of phonological stress in spoken English but by lexical and syntactic devices in Mandarin. There are two main reasons to translate phonologically marked emphatic/contrastive focus with other linguistic devices: firstly, different languages tend to use different devices to express emphatic/contrastive focus; secondly, the production of prosody in text-to-speech systems is far from perfect. In this paper, a translation framework which is capable of treating emphatic/contrastive focus is outlined and focus rules are developed. The framework has been tested on a corpus of 207 utterances in the domain of asthma, although the focus rules are not domain-specific.
By
Hashimoto, Chikara; Kawahara, Daisuke
4 Citations
Some phrases can be interpreted in their context either idiomatically (figuratively) or literally. The precise identification of idioms is essential in order to achieve full-fledged natural language processing. Because of this, the authors of this paper have created an idiom corpus for Japanese. This paper reports on the corpus itself and the results of an idiom identification experiment conducted using the corpus. The corpus targets 146 ambiguous idioms, and consists of 102,856 examples, each of which is annotated with a literal/idiomatic label. All sentences were collected from the World Wide Web. For idiom identification, 90 out of the 146 idioms were targeted and a word sense disambiguation (WSD) method was adopted using both common WSD features and idiom-specific features. The corpus and the experiment are both, as far as can be determined, the largest of their kinds. It was discovered that a standard supervised WSD method works well for idiom identification, achieving accuracy levels of 89.25% and 88.86% with and without idiom-specific features, respectively. It was also found that the most effective idiom-specific feature is the one that involves the adjacency of idiom constituents.
By
Abdel Monem, Azza; Shaalan, Khaled; Rafea, Ahmed; Baraka, Hoda
10 Citations
The interlingual approach to machine translation (MT) is used successfully in multilingual translation. It aims to achieve the translation task in two independent steps. First, meanings of the source-language sentences are represented in an intermediate language-independent (Interlingua) representation. Then, sentences of the target language are generated from those meaning representations. Arabic natural language processing in general is still underdeveloped and Arabic natural language generation (NLG) is even less developed. In particular, Arabic NLG from Interlinguas has only been investigated using template-based approaches. Moreover, tools used for other languages are not easily adaptable to Arabic due to the complexity of the language at both the morphological and syntactic levels. In this paper, we describe a rule-based generation approach for task-oriented Interlingua-based spoken dialogue that transforms a relatively shallow semantic interlingual representation, called interchange format (IF), into Arabic text that corresponds to the intentions underlying the speaker’s utterances. This approach addresses the problems of Arabic syntactic structure determination, and Arabic morphological and syntactic generation within the Interlingual MT approach. The generation approach is developed primarily within the framework of the NESPOLE! (NEgotiating through SPOken Language in E-commerce) multilingual speech-to-speech MT project. The IF-to-Arabic generator is implemented in SICStus Prolog. We conducted evaluation experiments using the input and output from the English analyzer that was developed by the NESPOLE! team at Carnegie Mellon University. The results of these experiments were promising and confirmed the ability of the rule-based approach to generate Arabic translations from the Interlingua in the travel and tourism domain.
By
Hansson, Sven Ove
6 Citations
Two types of measures of probabilistic uncertainty are introduced and investigated. Dispersion measures report how diffused the agent’s second-order probability distribution is over the range of first-order probabilities. Robustness measures reflect the extent to which the agent’s assessment of the prior (objective) probability of an event is perturbed by information about whether or not the event actually took place. The properties of both types of measures are investigated. The most obvious type of robustness measure is shown to coincide with one of the major candidates for a dispersion measure, the mean square deviation measure.
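For readers unfamiliar with dispersion measures, the sketch below computes a mean square deviation for a discrete second-order distribution, read here as the weighted variance of the first-order probabilities. The abstract does not give the formal definition, so this reading, the function name and the input representation (a list of probability/weight pairs) are assumptions for illustration.

```python
def mean_square_deviation(second_order):
    """Weighted mean square deviation of first-order probabilities.

    second_order: list of (first_order_probability, weight) pairs, where the
    weights form a (possibly unnormalized) second-order distribution.
    """
    total = sum(w for _, w in second_order)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    mean = sum(p * w for p, w in second_order) / total
    return sum(w * (p - mean) ** 2 for p, w in second_order) / total


if __name__ == "__main__":
    # All second-order weight on a single first-order value: no dispersion.
    print(mean_square_deviation([(0.5, 1.0)]))
    # Weight split evenly between the two extreme first-order values.
    print(mean_square_deviation([(0.0, 0.5), (1.0, 0.5)]))
```

Under this reading, an agent who is certain of the first-order probability has dispersion zero, and dispersion grows as the second-order weight spreads toward the extremes.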
By
Bonneau-Maynard, Hélène; Quignard, Matthieu; Denis, Alexandre
5 Citations
The aim of the French Media project was to define a protocol for the evaluation of speech understanding modules for dialog systems. Accordingly, a corpus of 1,257 real spoken dialogs related to hotel reservation and tourist information was recorded, transcribed and semantically annotated, and a semantic attribute-value representation was defined in which each conceptual relationship was represented by the names of the attributes. Two semantic annotation levels are distinguished in this approach. At the first level, each utterance is considered separately and the annotation represents the meaning of the statement without taking into account the dialog context. The second level of annotation then corresponds to the interpretation of the meaning of the statement by taking into account the dialog context; in this way a semantic representation of the dialog context is defined. This paper discusses the data collection, the detailed definition of both annotation levels, and the annotation scheme. Then the paper comments on both evaluation campaigns which were carried out during the project and discusses some results.
By
Benthem, Johan; Gerbrandy, Jelle; Kooi, Barteld
30 Citations
Current dynamic-epistemic logics model different types of information change in multi-agent scenarios. We generalize these logics to a probabilistic setting, obtaining a calculus for multi-agent update with three natural slots: prior probability on states, occurrence probabilities in the relevant process taking place, and observation probabilities of events. To match this update mechanism, we present a complete dynamic logic of information change with a probabilistic character. The completeness proof follows a compositional methodology that applies to a much larger class of dynamic-probabilistic logics as well. Finally, we discuss how our basic update rule can be parameterized for different update policies, or learning methods.
By
Bulińska, Maria
2 Citations
Nonassociative Lambek Calculus (NL) is a syntactic calculus of types introduced by Lambek [8]. The polynomial time decidability of NL was established by de Groote and Lamarche [4]. Buszkowski [3] showed that systems of NL with finitely many assumptions are decidable in polynomial time and generate context-free languages; actually the PTIME complexity is established for the consequence relation of NL. Adapting the method of Buszkowski [3] we prove an analogous result for Nonassociative Lambek Calculus with unit (NL1). Moreover, we show that any Lambek grammar based on NL1 (with assumptions) can be transformed into an equivalent context-free grammar in polynomial time.
By
Cowen, Robert
The expressive power of 2-cnfs, conjunctive normal forms with two literals per clause, is shown to be severely limited compared to 3-cnfs.
By
Staruch, Bogdan
1 Citation
This paper presents the first purely algebraic characterization of classes of partial algebras definable by a set of strong equations. This result was possible due to new tools such as invariant congruences, i.e. a generalization of the notion of a fully invariant congruence, and extension of algebras, specific for strong equations.
By
Przybocki, Mark; Peterson, Kay; Bronsart, Sébastien; Sanders, Gregory
3 Citations
This paper discusses the evaluation of automated metrics developed for the purpose of evaluating machine translation (MT) technology. A general discussion of the usefulness of automated metrics is offered. The NIST MetricsMATR evaluation of MT metrology is described, including its objectives, protocols, participants, and test data. The methodology employed to evaluate the submitted metrics is reviewed. A summary is provided for the general classes of evaluated metrics. Overall results of this evaluation are presented, primarily by means of correlation statistics, showing the degree of agreement between the automated metric scores and the scores of human judgments. Metrics are analyzed at the sentence, document, and system level with results conditioned by various properties of the test data. This paper concludes with some perspective on the improvements that should be incorporated into future evaluations of metrics for MT evaluation.
By
Snover, Matthew G.; Madnani, Nitin; Dorr, Bonnie; Schwartz, Richard
26 Citations
This paper describes a new evaluation metric, TER-Plus (TERp), for automatic evaluation of machine translation (MT). TERp is an extension of Translation Edit Rate (TER). It builds on the success of TER as an evaluation metric and alignment tool and addresses several of its weaknesses through the use of paraphrases, stemming, synonyms, as well as edit costs that can be automatically optimized to correlate better with various types of human judgments. We present a correlation study comparing TERp to BLEU, METEOR and TER, and illustrate that TERp can better evaluate translation adequacy.
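The core idea behind edit-rate metrics can be sketched as a word-level edit distance normalized by the reference length. The version below is a deliberately simplified stand-in: it counts only insertions, deletions and substitutions with unit cost, and omits the block shifts of TER as well as the paraphrase, stemming, synonym and tuned-cost extensions that the abstract attributes to TERp. The function names are assumptions for illustration.

```python
def edit_distance(hyp, ref):
    """Word-level Levenshtein distance between two token lists (unit costs)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete a hypothesis word
                            curr[j - 1] + 1,      # insert a reference word
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def edit_rate(hypothesis, reference):
    """Edits needed to turn the hypothesis into the reference, per reference word."""
    hyp, ref = hypothesis.split(), reference.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(hyp, ref) / len(ref)


if __name__ == "__main__":
    print(edit_rate("the cat", "the cat sat"))    # one missing word
    print(edit_rate("a dog sat", "the cat sat"))  # two substitutions
```

Lower values indicate that fewer edits are needed; a hypothesis identical to the reference scores zero.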
By
Lavie, Alon; Denkowski, Michael J.
24 Citations
The Meteor Automatic Metric for Machine Translation evaluation, originally developed and released in 2004, was designed with the explicit goal of producing sentence-level scores which correlate well with human judgments of translation quality. Several key design decisions were incorporated into Meteor in support of this goal. In contrast with IBM’s Bleu, which uses only precision-based features, Meteor uses and emphasizes recall in addition to precision, a property that has been confirmed by several metrics as being critical for high correlation with human judgments. Meteor also addresses the problem of reference translation variability by utilizing flexible word matching, allowing for morphological variants and synonyms to be taken into account as legitimate correspondences. Furthermore, the feature ingredients within Meteor are parameterized, allowing for the tuning of the metric’s free parameters in search of values that result in optimal correlation with human judgments. Optimal parameters can be separately tuned for different types of human judgments and for different languages. We discuss the initial design of the Meteor metric, subsequent improvements, and performance in several independent evaluations in recent years.
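How a tunable parameter can shift the balance between precision and recall may be illustrated with a parameterized harmonic mean of unigram precision and recall. This is only a sketch of that one ingredient: it uses exact word matching and leaves out the stem and synonym matching and any word-order penalty of the full metric. The function names and the default `alpha=0.9` are assumptions for illustration, not values taken from the abstract.

```python
from collections import Counter


def unigram_precision_recall(hypothesis, reference):
    """Exact-match unigram precision and recall with clipped counts."""
    hyp, ref = Counter(hypothesis.split()), Counter(reference.split())
    matches = sum((hyp & ref).values())
    precision = matches / sum(hyp.values()) if hyp else 0.0
    recall = matches / sum(ref.values()) if ref else 0.0
    return precision, recall


def f_mean(precision, recall, alpha=0.9):
    """Parameterized harmonic mean; alpha near 1 puts most weight on recall."""
    denom = alpha * precision + (1 - alpha) * recall
    return precision * recall / denom if denom else 0.0


if __name__ == "__main__":
    p, r = unigram_precision_recall("the cat", "the cat sat")
    print(p, r)
    print(f_mean(p, r, alpha=0.5))  # balanced harmonic mean
    print(f_mean(p, r, alpha=0.9))  # recall-weighted
```

With `alpha=0.5` the function reduces to the ordinary balanced F-measure; moving `alpha` toward 1 penalizes the low-recall hypothesis in the example more heavily.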
By
Wong, Billy; Kit, Chunyu
3 Citations
We propose a novel metric, ATEC, for automatic MT evaluation based on explicit assessment of word choice and word order in an MT output in comparison to its reference translation(s), the two most fundamental factors in the construction of meaning for a sentence. The former is assessed by matching word forms at various linguistic levels, including surface form, stem, sound and sense, and further by weighing the informativeness of each word. The latter is quantified in terms of the discordance of word position and word sequence between a translation candidate and its reference. In the evaluations using the MetricsMATR08 data set and the LDC MTC2 and MTC4 corpora, ATEC demonstrates an impressive positive correlation to human judgments at the segment level, highly comparable to the few state-of-the-art evaluation metrics.
By
Baroni, Marco; Bernardini, Silvia; Ferraresi, Adriano; Zanchetta, Eros
184 Citations
This article introduces ukWaC, deWaC and itWaC, three very large corpora of English, German, and Italian built by web crawling, and describes the methodology and tools used in their construction. The corpora contain more than a billion words each, and are thus among the largest resources for the respective languages. The paper also provides an evaluation of their suitability for linguistic research, focusing on ukWaC and itWaC. A comparison in terms of lexical coverage with existing resources for the languages of interest produces encouraging results. Qualitative evaluation of ukWaC versus the British National Corpus was also conducted, so as to highlight differences in corpus composition (text types and subject matters). The article concludes with practical information about format and availability of corpora and tools.
By
Leusch, Gregor; Ney, Hermann
2 Citations
We present two evaluation measures for Machine Translation (MT), which are defined as error rates extended by block moves. In contrast to Ter, these measures are constrained in a way that allows for an exact calculation in polynomial time. We then investigate three methods to estimate the standard error of error rates, and compare them to bootstrap estimates. We assess the correlation of our proposed measures with human judgment using data from the National Institute of Standards and Technology (NIST) 2008 MetricsMATR workshop.
By
Pedersen, Bolette Sandford; Nimb, Sanni; Asmussen, Jørg; Sørensen, Nicolai Hartvig; Trap-Jensen, Lars; Lorentzen, Henrik
6 Citations
This paper is a contribution to the discussion on compiling computational lexical resources from conventional dictionaries. It describes the theoretical as well as practical problems that are encountered when reusing a conventional dictionary for compiling a lexical-semantic resource in terms of a wordnet. More specifically, it describes the methodological issues of compiling a wordnet for Danish, DanNet, from a monolingual basis, and not (as is often seen) by applying the translational expansion method with Princeton WordNet as the English source. Thus, we apply as our basis a large, corpus-based printed dictionary of modern Danish. Using this approach, we discuss the issues of readjusting inconsistent and/or underspecified hyponymy hierarchies taken from the conventional dictionary, sense distinctions as opposed to the synonym sets of wordnets, generating semantic wordnet relations on the basis of sense definitions, and finally, supplementing missing or implicit information.
By
Kahn, Jeremy G.; Snover, Matthew; Ostendorf, Mari
3 Citations
Recent efforts to develop new machine translation evaluation methods have tried to account for allowable wording differences either in terms of syntactic structure or synonyms/paraphrases. This paper primarily considers syntactic structure, combining scores from partial syntactic dependency matches with standard local n-gram matches using a statistical parser, and taking advantage of N-best parse probabilities. The new scoring metric, expected dependency pair match (EDPM), is shown to outperform BLEU and TER in terms of correlation to human judgments and as a predictor of HTER. Further, we combine the syntactic features of EDPM with the alternative wording features of TERp, showing a benefit to accounting for syntactic structure on top of semantic equivalency features.
By
Chan, Yee Seng; Ng, Hwee Tou
1 Citation
This paper evaluates the performance of our recently proposed automatic machine translation evaluation metric MaxSim and examines the impact of translation fluency on the metric. MaxSim calculates a similarity score between a pair of English system-reference sentences by comparing information items such as n-grams across the sentence pair. Unlike most metrics which perform binary matching, MaxSim also computes similarity scores between items and models them as nodes in a bipartite graph to select a maximum weight matching. Our experiments show that MaxSim is competitive with state-of-the-art metrics on benchmark datasets.
By
Padó, Sebastian; Cer, Daniel; Galley, Michel; Jurafsky, Dan; Manning, Christopher D.
5 Citations
Current evaluation metrics for machine translation have increasing difficulty in distinguishing good from merely fair translations. We believe the main problem to be their inability to properly capture meaning: A good translation candidate means the same thing as the reference translation, regardless of formulation. We propose a metric that assesses the quality of MT output through its semantic equivalence to the reference translation, based on a rich set of match and mismatch features motivated by textual entailment. We first evaluate this metric in an evaluation setting against a combination metric of four state-of-the-art scores. Our metric predicts human judgments better than the combination metric. Combining the entailment and traditional features yields further improvements. Then, we demonstrate that the entailment metric can also be used as a learning criterion in minimum error rate training (MERT) to improve parameter estimation in MT system training. A manual evaluation of the resulting translations indicates that the new model obtains a significant improvement in translation quality.
By
Dębowski, Łukasz
3 Citations
This paper discusses two new procedures for extracting verb valences from raw texts, with an application to the Polish language. The first novel technique, the EM selection algorithm, performs unsupervised disambiguation of valence frame forests, obtained by applying a non-probabilistic deep grammar parser and some post-processing to the text. The second new idea concerns filtering of incorrect frames detected in the parsed text and is motivated by an observation that verbs which take similar arguments tend to have similar frames. This phenomenon is described in terms of newly introduced co-occurrence matrices. Using co-occurrence matrices, we split filtering into two steps. The list of valid arguments is first determined for each verb, whereas the pattern according to which the arguments are combined into frames is computed in the following stage. Our best extracted dictionary reaches an F-score of 45%, compared to an F-score of 39% for the standard frame-based BHT filtering.
By
Abraham, M.; Gabbay, Dov M.; Schild, U.
13 Citations
We motivate and introduce a new method of abduction, Matrix Abduction, and apply it to modelling the use of non-deductive inferences in the Talmud such as Analogy and the rule of Argumentum A Fortiori. Given a matrix $\mathbb{A}$ with entries in $\{0, 1\}$, we allow for one or more blank squares in the matrix, say $a_{i,j} = ?$. The method allows us to decide whether to declare $a_{i,j} = 0$ or $a_{i,j} = 1$ or $a_{i,j} = ?$ undecided. This algorithmic method is then applied to modelling several legal and practical reasoning situations including the Talmudic rule of Kal-Vachomer. We add an Appendix showing that this new rule of Matrix Abduction, arising from the Talmud, can also be applied to the analysis of paradoxes in voting and judgement aggregation. In fact we have here a general method for executing non-deductive inferences.
By
Gabbay, Dov M.; Schlechta, Karl
We investigate different aspects of independence here, in the context of theory revision, generalizing slightly work by Chopra, Parikh, and Rodrigues, and in the context of preferential reasoning.
By
Boella, Guido; Gabbay, Dov M.; Genovese, Valerio; Torre, Leendert
10 Citations
We study access control policies based on the says operator by introducing a logical framework called Fibred Security Language (FSL), which is able to deal with features like joint responsibility between sets of principals and to identify them by means of first-order formulas. FSL is based on a multimodal logic methodology. We first discuss the main contributions from the expressiveness point of view and give semantics for the language (for both the classical and the intuitionistic fragment). We then prove that in order to express well-known properties like ‘speaks-for’ or ‘hand-off’, defined in terms of says, we do not need second-order logic (unlike previous approaches); a decidable fragment of first-order logic suffices. We propose a model-driven study of the says axiomatization by constraining the Kripke models in order to respect desirable security properties, we study how existing access control logics can be translated into FSL, and we give completeness for the logic.
By
Barker, Steve; Boella, Guido; Gabbay, Dov M.; Genovese, Valerio
5 Citations
The issue of representing access control requirements continues to demand significant attention. The focus of researchers has traditionally been on developing particular access control models and policy specification languages for particular applications. However, this approach has resulted in an unnecessary surfeit of models and languages. In contrast, we describe a general access control model and a logic-based specification language from which both existing and novel access control models may be derived as particular cases and from which several approaches can be developed for domain-specific applications. We will argue that our general framework has a number of specific attractions and an implication of our work is to encourage a methodological shift from a study of the particulars of access control to its generalities.
By
Gabbay, Dov M.; Szałas, Andrzej
Mathematical theory of voting and social choice has attracted much attention. In the general setting one can view social choice as a method of aggregating individual, often conflicting preferences and making a choice that is the best compromise. How preferences are expressed and what is the “best compromise” varies and heavily depends on a particular situation.
The method we propose in this paper depends on expressing individual preferences of voters and specifying properties of the resulting ranking by means of first-order formulas. Then, as a technical tool, we use methods of second-order quantifier elimination to analyze and compute results of voting. We show how to specify voting, how to compute resulting rankings and how to verify voting protocols.
By
Riggle, Jason
2 Citations
This paper provides a brief algebraic characterization of constraint violations in Optimality Theory (OT). I show that if violations are taken to be multisets over a fixed basis set Con then the merge operator on multisets and a ‘min’ operation expressed in terms of harmonic inequality provide a semiring over violation profiles. This semiring allows standard optimization algorithms to be used for OT grammars with weighted finite-state constraints in which the weights are violation-multisets. Most usefully, because multisets are unordered, the merge operation is commutative and thus it is possible to give a single graph representation of the entire class of grammars (i.e. rankings) for a given constraint set. This allows a neat factorization of the optimization problem that isolates the main source of complexity into a single constant γ denoting the size of the graph representation of the whole constraint set. I show that the computational cost of optimization is linear in the length of the underlying form with the multiplicative constant γ. This perspective thus makes it straightforward to evaluate the complexity of optimization for different constraint sets.
By
Aguzzoli, Stefano; Bianchi, Matteo; Marra, Vincenzo
3 Citations
In the context of truth-functional propositional many-valued logics, Hájek’s Basic Fuzzy Logic BL [14] plays a major rôle. The completeness theorem proved in [7] shows that BL is the logic of all continuous t-norms and their residua. This result, however, does not directly yield any meaningful interpretation of the truth values in BL per se. In an attempt to address this issue, in this paper we introduce a complete temporal semantics for BL. Specifically, we show that BL formulas can be interpreted as modal formulas over a flow of time, where the logic of each instant is Łukasiewicz, with a finite or infinite number of truth values. As a main result, we obtain validity with respect to all flows of time that are non-branching to the future, and completeness with respect to all finite linear flows of time, or to an appropriate single infinite linear flow of time. It may be argued that this reduces the problem of establishing a meaningful interpretation of the truth values in BL logic to the analogous problem for Łukasiewicz logic.
By
Malinowski, Grzegorz
The actual introduction of a non-reflexive and non-idempotent q-consequence gave birth to the concept of logical three-valuedness based on the idea of non-complementary categories of rejection and acceptance. A q-consequence may not have a bivalent description, the property claimed by Suszko’s Thesis on logical two-valuedness, (ST), of structural logics, i.e. structural consequence operations. Recall that (ST) shifts logical values over the set of matrix values and it refers to the division of the matrix universe into two subsets of designated and undesignated elements using their characteristic functions as logical valuations, cf. [4]. The extension of the idea operates with a three-valued function, with the third value ascribed to those elements of the matrix which are neither rejected nor accepted. Accordingly, the logical three-valuedness departs naturally from the division of the matrix universe into three subsets and the (ST) counterpart says that any inference based on a structural q-consequence may have a bivalent or a three-valued description.
After a short presentation of the three-valued inferential framework, we discuss a solution for further exploration of the idea leading to logical n-valuedness for n > 3. Apparently, the first step in that direction is easy and it consists of a division of the matrix universe into more than three subsets. The next move, i.e. a definition of a matrix consequence-like relation being neither a consequence nor a q-consequence, seems extremely difficult. Therefore, here we consider only finite linear matrices with one-argument functions “labelling” respective matrix subsets. By means of these functions it is possible to represent a q-consequence as a “partial” Tarski’s consequence and, ultimately, to define a logically more-valued consequence-like relation. We believe that the present partial proposal deserves attention in itself but also that it may lead to a general approach to logically many-valued inference.
By
Marcos, João
12 Citations
What is the fundamental insight behind truth-functionality? When is a logic interpretable by way of a truth-functional semantics? To address such questions in a satisfactory way, a formal definition of truth-functionality from the point of view of abstract logics is clearly called for. As a matter of fact, such a definition has been available at least since the 70s, though to this day it still remains not very widely known.
A clear distinction can be drawn between logics characterizable through: (1) genuinely finite-valued truth-tabular semantics; (2) no finite-valued but only an infinite-valued truth-tabular semantics; (3) no truth-tabular semantics at all. Any of those logics, however, can in principle be characterized through non-truth-functional valuation semantics, at least as soon as their associated consequence relations respect the usual Tarskian postulates. So, paradoxical as that might seem at first, it turns out that truth-functional logics may be adequately characterized by non-truth-functional semantics. Now, what feature of a given logic would guarantee it to dwell in class (1) or in class (2), irrespective of its circumstantial semantic characterization?
The present contribution will recall and examine the basic definitions, presuppositions and results concerning truth-functionality of logics, and exhibit examples of logics indigenous to each of the aforementioned classes. Some problems pertaining to those definitions and to some of their conceivable generalizations will also be touched upon.
By
Restall, Greg
14 Citations
I present an account of truth values for classical logic, intuitionistic logic, and the modal logic S5, in which truth values are not a fundamental category from which the logic is defined, but rather, an idealisation of more fundamental logical features in the proof theory for each system. The result is not a new set of semantic structures, but a new understanding of how the existing semantic structures may be understood in terms of a more fundamental notion of logical consequence.
By
Zaitsev, Dmitry
9 Citations
In their useful logic for a computer network Shramko and Wansing generalize initial values of Belnap’s 4-valued logic to the set 16, the powerset of Belnap’s 4. This generalization results in a very specific algebraic structure, the trilattice SIXTEEN_{3}, with three orderings: information, truth and falsity. In this paper, a slightly different way of generalization is presented. As a base for further generalization a set 3 is chosen, where the initial values are a (incoming data is asserted), d (incoming data is denied), and u (incoming data is neither asserted nor denied, which corresponds to the answer “don’t know”). In so doing, the powerset of 3, that is, the set 8, is considered. It turns out that there are not three but four orderings naturally defined on the set 8 that form the tetralattice EIGHT_{4}. Besides the three ordering relations mentioned above, there is an extra uncertainty ordering. Quite predictably, the logics generated by a–order (truth order) and d–order (falsity order) coincide with first-degree entailment. Finally, a logic with two kinds of operations (a–connectives and d–connectives) and a consequence relation defined via a–ordering is considered. An adequate axiomatization for this logic is proposed.
By
Cook, Roy T.
8 Citations
Truth values are, properly understood, merely proxies for the various relations that can hold between language and the world. Once truth values are understood in this way, consideration of the Liar paradox and the revenge problem shows that our language is indefinitely extensible, as is the class of truth values that statements of our language can take – in short, there is a proper class of such truth values. As a result, important and unexpected connections emerge between the semantic paradoxes and the set-theoretic paradoxes.
By
Avron, Arnon
4 Citations
According to Suszko’s Thesis, any multi-valued semantics for a logical system can be replaced by an equivalent bivalent one. Moreover, bivalent semantics for families of logics can frequently be developed in a modular way. On the other hand, bivalent semantics usually lacks the crucial property of analycity, a property which is guaranteed for the semantics of multi-valued matrices. We show that one can get both modularity and analycity by using the semantic framework of multi-valued non-deterministic matrices. We further show that for using this framework in a constructive way it is best to view “truth-values” as information carriers, or “information-values”.
By
Verhagen, Marc; Gaizauskas, Robert; Schilder, Frank; Hepple, Mark; Moszkowicz, Jessica; Pustejovsky, James
28 Citations
TempEval is a framework for evaluating systems that automatically annotate texts with temporal relations. It was created in the context of the SemEval 2007 workshop and uses the TimeML annotation language. The evaluation consists of three subtasks of temporal annotation: anchoring an event to a time expression in the same sentence, anchoring an event to the document creation time, and ordering main events in consecutive sentences. In this paper we describe the TempEval task and the systems that participated in the evaluation. In addition, we describe how further task decomposition can bring even more structure to the evaluation of temporal relations.
By
Markert, Katja; Nissim, Malvina
6 Citations
We describe the first shared task for figurative language resolution, which was organised within SemEval-2007 and focused on metonymy. The paper motivates the linguistic principles of data sampling and annotation and shows the task’s feasibility via human agreement. The five participating systems mainly used supervised approaches exploiting a variety of features, of which grammatical relations proved to be the most useful. We compare the systems’ performance to automatic baselines as well as to a manually simulated approach based on selectional restriction violations, showing some limitations of this more traditional approach to metonymy recognition. The main problem supervised systems encountered is data sparseness, since metonymies in general tend to occur more rarely than literal uses. Also, within metonymies, the reading distribution is skewed towards a few frequent metonymy types. Future task developments should focus on addressing this issue.
By
McCarthy, Diana; Navigli, Roberto
26 Citations
Since the inception of the Senseval series there has been a great deal of debate in the word sense disambiguation (WSD) community on what the right sense distinctions are for evaluation, with the consensus of opinion being that the distinctions should be relevant to the intended application. A solution to the above issue is lexical substitution, i.e. the replacement of a target word in context with a suitable alternative substitute. In this paper, we describe the English lexical substitution task and report an exhaustive evaluation of the systems participating in the task organized at SemEval-2007. The aim of this task is to provide an evaluation where the sense inventory is not predefined and where performance on the task would bode well for applications. The task not only reflects WSD capabilities, but also can be used to compare lexical resources, whether man-made or automatically created, and has the potential to benefit several natural-language applications.
By
Chen, Jinying; Palmer, Martha S.
4 Citations
This paper presents a high-performance, broad-coverage supervised word sense disambiguation (WSD) system for English verbs that uses linguistically motivated features and a smoothed maximum entropy machine learning model. We describe three specific enhancements to our system’s treatment of linguistically motivated features which resulted in the best published results on SENSEVAL-2 verbs. We then present the results of training our system on OntoNotes data, both the SemEval-2007 task and additional data. OntoNotes data is designed to provide clear sense distinctions, based on using explicit syntactic and semantic criteria to group WordNet senses, with sufficient examples to constitute high quality, broad coverage training data. Using similar syntactic and semantic features for WSD, we achieve performance comparable to that of human taggers, and competitive with the top results for the SemEval-2007 task. Empirical analysis of our results suggests that clarifying sense boundaries and/or increasing the number of training instances for certain verbs could further improve system performance.
By
Girju, Roxana; Nakov, Preslav; Nastase, Vivi; Szpakowicz, Stan; Turney, Peter; Yuret, Deniz
24 Citations
The NLP community has shown a renewed interest in deeper semantic analyses, among them automatic recognition of semantic relations in text. We present the development and evaluation of a semantic analysis task: automatic recognition of relations between pairs of nominals in a sentence. The task was part of SemEval-2007, the fourth edition of the semantic evaluation event previously known as SensEval. Apart from the observations we have made, the long-lasting effect of this task may be a framework for comparing approaches to the task. We introduce the problem of recognizing relations between nominals, and in particular the process of drafting and refining the definitions of the semantic relations. We show how we created the training and test data, list and briefly describe the 15 participating systems, discuss the results, and conclude with the lessons learned in the course of this exercise.
By
Ferreira, Fernando; Ferreira, Gilda
9 Citations
Commuting conversions were introduced in the natural deduction calculus as ad hoc devices for the purpose of guaranteeing the subformula property in normal proofs. In a well-known book, Jean-Yves Girard commented harshly on these conversions, saying that ‘one tends to think that natural deduction should be modified to correct such atrocities.’ We present an embedding of the intuitionistic predicate calculus into a second-order predicative system for which there is no need for commuting conversions. Furthermore, we show that the redex and the conversum of a commuting conversion of the original calculus translate into equivalent derivations by means of a series of bidirectional applications of standard conversions.
By
Ågotnes, Thomas; Hoek, Wiebe; Rodríguez-Aguilar, Juan A.; Sierra, Carles; Wooldridge, Michael
2 Citations
We define a multimodal version of Computation Tree Logic (ctl) by extending the language with path quantifiers E^{δ} and A^{δ} where δ denotes one of finitely many dimensions, interpreted over Kripke structures with one total relation for each dimension. As expected, the logic is axiomatised by taking a copy of a ctl axiomatisation for each dimension. Completeness is proved by employing the completeness result for ctl to obtain a model along each dimension in turn. We also show that the logic is decidable and that its satisfiability problem is no harder than the corresponding problem for ctl. We then demonstrate how Normative Systems can be conceived as a natural interpretation of such a multidimensional ctl logic.
By
Fermüller, Christian G.; Metcalfe, George
14 Citations
In the 1970s, Robin Giles introduced a game combining Lorenzen-style dialogue rules with a simple scheme for betting on the truth of atomic statements, and showed that the existence of winning strategies for the game corresponds to the validity of formulas in Łukasiewicz logic. In this paper, it is shown that ‘disjunctive strategies’ for Giles’s game, combining ordinary strategies for all instances of the game played on the same formula, may be interpreted as derivations in a corresponding proof system. In particular, such strategies mirror derivations in a hypersequent calculus developed in recent work on the proof theory of Łukasiewicz logic.
By
Jarmużek, Tomasz; Pietruszczak, Andrzej
In this paper we examine Prior’s reconstruction of the Master Argument [4] in some modal-tense logic. This logic consists of a purely tense part and Diodorean definitions of modal alethic operators. Next we study this tense logic in the pure tense language. It is the logic K_{t}4 plus a new axiom (P): ‘p ∧ Gp ⊃ P Gp’. This formula was used by Prior in his original analysis of the Master Argument. (P) is usually added as an extra axiom to an axiomatization of the logic of linear time. In that case the set of moments is a total order and must be left-discrete without the least moment. However, the logic of the Master Argument does not require linear time. We show what properties of the set of moments are exactly forced by (P) in the reconstruction of Prior. We also make some philosophical remarks on the analyzed reconstruction.
By
Nurakunov, A. M.; Stronkowski, M. M.
3 Citations
For quasivarieties of algebras, we consider the property of having definable relative principal subcongruences, a generalization of the concepts of definable relative principal congruences and definable principal subcongruences. We prove that a quasivariety of algebras with definable relative principal subcongruences has a finite quasi-equational basis if and only if the class of its relative (finitely) subdirectly irreducible algebras is strictly elementary. Since a finitely generated relatively congruence-distributive quasivariety has definable relative principal subcongruences, we get a new proof of the result due to D. Pigozzi: a finitely generated relatively congruence-distributive quasivariety has a finite quasi-equational basis.
By
Ferenczi, Miklós
If the language is extended by new individual variables, in classical first order logic, then the deduction system obtained is a conservative extension of the original one. This fails to be true for the logics with infinitary predicates. But it is shown that restricting the commutativity of quantifiers and the equality axioms in the extended system and supposing the merry-go-round property in the original system, the foregoing extension is already conservative. It is shown that these restrictions are crucial for an extension to be conservative. The origin of the results is algebraic logic.
By
Saurí, Roser; Pustejovsky, James
48 Citations
Recent work in computational linguistics points out the need for systems to be sensitive to the veracity or factuality of events as mentioned in text; that is, to recognize whether events are presented as corresponding to actual situations in the world, situations that have not happened, or situations of uncertain interpretation. Event factuality is an important aspect of the representation of events in discourse, but the annotation of such information poses a representational challenge, largely because factuality is expressed through the interaction of numerous linguistic markers and constructions. Many of these markers are already encoded in existing corpora, albeit in a somewhat fragmented way. In this article, we present FactBank, a corpus annotated with information concerning the factuality of events. Its annotation has been carried out from a descriptive framework of factuality grounded on both theoretical findings and data analysis. FactBank is built on top of TimeBank, adding to it an additional level of semantic information.
By
Odintsov, Sergei P.
24 Citations
This work treats the problem of axiomatizing the truth and falsity consequence relations, ⊨_{t} and ⊨_{f}, determined via truth and falsity orderings on the trilattice SIXTEEN_{3} (Shramko and Wansing, 2005). The approach is based on a representation of SIXTEEN_{3} as a twist-structure over the two-element Boolean algebra.
By
Shramko, Yaroslav; Wansing, Heinrich
3 Citations
The famous “slingshot argument” developed by Church, Gödel, Quine and Davidson is often considered to be a formally strict proof of the Fregean conception that all true sentences, as well as all false ones, have one and the same denotation, namely their corresponding truth value: the true or the false. In this paper we examine the analysis of the slingshot argument by means of a non-Fregean logic undertaken recently by A. Wójtowicz and put to the test her claim that the slingshot argument is in fact circular and presupposes what it intends to prove. We show that this claim is untenable. Nevertheless, the language of non-Fregean logic can serve as a useful tool for representing the slingshot argument, and several versions of the slingshot argument in non-Fregean logics are presented. In particular, a new version of the slingshot argument is presented, which can be circumvented neither by an appeal to a Russellian theory of definite descriptions nor by resorting to an analogous “Russellian” theory of λ–terms.
By
Belnap, Nuel
2 Citations
The first section (§1) of this essay defends reliance on truth values against those who, on nominalistic grounds, would uniformly substitute a truth predicate. I rehearse some practical, Carnapian advantages of working with truth values in logic. In the second section (§2), after introducing the key idea of auxiliary parameters (§2.1), I look at several cases in which logics involve, as part of their semantics, an extra auxiliary parameter to which truth is relativized, a parameter that caters to special kinds of sentences. In many cases, this facility is said to produce truth values for sentences that on the face of it seem neither true nor false. Often enough, in this situation appeal is made to the method of supervaluations, which operate by “quantifying out” auxiliary parameters, and thereby produce something like a truth value. Logics of this kind exhibit striking differences. I first consider the role that Tarski gives to supervaluation in first order logic (§2.2), and then, after an interlude that asks whether neither-true-nor-false is itself a truth value (§2.3), I consider sentences with non-denoting terms (§2.4), vague sentences (§2.5), ambiguous sentences (§2.6), paradoxical sentences (§2.7), and future-tensed sentences in indeterministic tense logic (§2.8). I conclude my survey with a look at alethic modal logic considered as a cousin (§2.9), and finish with a few sentences of “advice to supervaluationists” (§2.10), advice that is largely negative. The case for supervaluations as a road to truth is strong only when the auxiliary parameter that is “quantified out” is in fact irrelevant to the sentences of interest, as in Tarski’s definition of truth for classical logic. In all other cases, the best policy when reporting the results of supervaluation is to use only explicit phrases such as “settled true” or “determinately true,” never dropping the qualification.
By
Fitting, Melvin
2 Citations
This is a largely expository paper in which the following simple idea is pursued. Take the truth value of a formula to be the set of agents that accept the formula as true. This means we work with an arbitrary (finite) Boolean algebra as the truth value space. When this is properly formalized, complete modal tableau systems exist, and there are natural versions of bisimulations that behave well from an algebraic point of view. There remain significant problems concerning the proper formalization, in this context, of natural language statements, particularly those involving negative knowledge and common knowledge. A case study is presented which brings these problems to the fore. None of the basic material presented here is new to this paper—all has appeared in several papers over many years, by the present author and by others. Much of the development in the literature is more general than here—we have confined things to the Boolean case for simplicity and clarity. Most proofs are omitted, but several of the examples are new. The main virtue of the present paper is its coherent presentation of a systematic point of view—identify the truth value of a formula with the set of those who say the formula is true.
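The central idea of this abstract, taking the truth value of a formula to be the set of agents who accept it, can be illustrated with a minimal propositional sketch. The agent names and function names below are hypothetical, and the modal and tableau machinery of the paper is not modelled; the sketch only shows the powerset Boolean algebra acting as the truth value space.

```python
# Hypothetical finite set of agents; its powerset is the truth value space.
AGENTS = frozenset({"alice", "bob", "carol"})


def conj(a, b):
    """Agents accepting both formulas (meet in the Boolean algebra)."""
    return a & b


def disj(a, b):
    """Agents accepting at least one formula (join)."""
    return a | b


def neg(a):
    """Agents who do not accept the formula (complement)."""
    return AGENTS - a


def implies(a, b):
    """Classical implication computed pointwise, agent by agent."""
    return neg(a) | b


# Truth values of two atomic formulas: who says each one is true.
p = frozenset({"alice", "bob"})
q = frozenset({"bob", "carol"})

both = conj(p, q)        # only bob accepts p and q together
excluded = disj(p, neg(p))  # every agent accepts "p or not p"
```

A formula counts as fully true when its value is the whole set AGENTS and fully false when its value is empty; every other subset is an intermediate value.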
By
Hájek, Petr
10 Citations
Some aspects of vagueness as presented in Shapiro’s book Vagueness in Context [23] are analyzed from the point of view of fuzzy logic. Some generalizations of Shapiro’s formal apparatus are presented.
By
Font, Josep Maria
13 Citations
This is a contribution to the discussion on the role of truth degrees in many-valued logics from the perspective of abstract algebraic logic. It starts with some thoughts on the so-called Suszko’s Thesis (that every logic is two-valued) and on the conception of semantics that underlies it, which includes the truth-preserving notion of consequence. The alternative usage of truth values in order to define logics that preserve degrees of truth is presented and discussed. Some recent works studying these in the particular cases of Łukasiewicz’s many-valued logics and of logics associated with varieties of residuated lattices are also presented. Finally the extension of this paradigm to other, more general situations is discussed, highlighting the need for philosophical or applied motivations in the selection of the truth degrees, due both to the interpretation of the idea of truth degree and to some mathematical difficulties.
By
Stevenson, Mark; Greenwood, Mark A.
9 Citations
Several techniques for the automatic acquisition of Information Extraction (IE) systems have used dependency trees to form the basis of an extraction pattern representation. These approaches have used a variety of pattern models (schemes for representing IE patterns based on particular parts of the dependency analysis). An appropriate pattern model should be expressive enough to represent the information which is to be extracted from text without being overly complex. Previous investigations into the appropriateness of the currently proposed models have been limited. This paper compares a variety of pattern models, including ones which have been previously reported and variations of them. Each model is evaluated using existing data consisting of IE scenarios from two very different domains (newswire stories and biomedical journal articles). The models are analysed in terms of their ability to represent relevant information, number of patterns generated and performance on an IE scenario. The best performance was obtained by two models which use the majority of relevant portions of the dependency tree without including irrelevant sections.
By
Seretan, Violeta; Wehrli, Eric
7 Citations
An impressive amount of work has been devoted over the past few decades to collocation extraction. The state of the art shows that there is a sustained interest in the morphosyntactic preprocessing of texts in order to better identify candidate expressions; however, the treatment performed is, in most cases, limited (lemmatization, POS-tagging, or shallow parsing). This article presents a collocation extraction system based on the full parsing of source corpora, which supports four languages: English, French, Spanish, and Italian. The performance of the system is compared against that of the standard mobile-window method. The evaluation experiment investigates several levels of the significance lists, uses a fine-grained annotation schema, and covers all the languages supported. Consistent results were obtained for these languages: parsing, even if imperfect, leads to a significant improvement in the quality of results, in terms of collocational precision (between 16.4 and 29.7%, depending on the language; 20.1% overall), MWE precision (between 19.9 and 35.8%; 26.1% overall), and grammatical precision (between 47.3 and 67.4%; 55.6% overall). This positive result is highly important, especially in view of the subsequent integration of extraction results in other NLP applications.
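As a rough illustration of the window-based baseline mentioned in this abstract, the sketch below counts word pairs that co-occur within a fixed distance of each other. It is a minimal sketch under stated assumptions, not the system evaluated in the article: the function name `window_candidates`, the `span` parameter, and the choice to count ordered pairs are all hypothetical, and a real extractor would go on to rank the pairs with an association measure.

```python
from collections import Counter


def window_candidates(tokens, span=2):
    """Count ordered word pairs whose positions differ by at most `span`.

    Hypothetical helper: a window-based extractor proposes every such pair
    as a collocation candidate, with no syntactic filtering at all.
    """
    counts = Counter()
    for i, left in enumerate(tokens):
        # Pair the current word with each of the next `span` words.
        for right in tokens[i + 1 : i + 1 + span]:
            counts[(left, right)] += 1
    return counts


if __name__ == "__main__":
    sample = "heavy rain fell heavy rain".split()
    for pair, freq in window_candidates(sample, span=2).most_common():
        print(pair, freq)
```

Because the window ignores syntax, it also proposes pairs such as ("fell", "rain") that are not grammatically related, which is the kind of noise a parser-based extractor aims to avoid.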
By
Kallmeyer, Laura
3 Citations
Multicomponent Tree Adjoining Grammars (MCTAGs) are a formalism that has been shown to be useful for many natural language applications. The definition of nonlocal MCTAG, however, is problematic since it refers to the process of the derivation itself: a simultaneity constraint must be respected concerning the way the members of the elementary tree sets are added. Looking only at the result of a derivation (i.e., the derived tree and the derivation tree), this simultaneity is no longer visible and therefore cannot be checked. That is, this way of characterizing MCTAG does not allow one to abstract away from the concrete order of derivation. In this paper, we propose an alternative definition of MCTAG that characterizes the trees in the tree language of an MCTAG via the properties of the derivation trees (in the underlying TAG) the MCTAG licenses. We provide similar characterizations for various types of MCTAG. These characterizations give a better understanding of the formalisms, they allow a more systematic comparison of different types of MCTAG, and, furthermore, they can be exploited for parsing.
By
Witt, Andreas; Heid, Ulrich; Sasaki, Felix; Sérasset, Gilles
7 Citations
This article introduces the topic of “Multilingual language resources and interoperability”. We start with a taxonomy and parameters for classifying language resources. Later we provide examples and issues of interoperability, and resource architectures to solve such issues. Finally we discuss aspects of linguistic formalisms and interoperability.
By
Soria, Claudia; Monachini, Monica; Bertagna, Francesca; Calzolari, Nicoletta; Huang, ChuRen; Hsieh, ShuKai; Marchetti, Andrea; Tesconi, Maurizio
3 Citations
In this paper we present an application fostering the integration and interoperability of computational lexicons, focusing on the particular case of mutual linking and cross-lingual enrichment of two wordnets, the ItalWordNet and Sinica BOW lexicons. This is intended as a case study investigating the needs and requirements of semi-automatic integration and interoperability of lexical resources, with a view to developing a prototype web application to support the GlobalWordNet Grid initiative.
By
Tiede, HansJörg; Kepser, Stephan
1 Citation
Model-theoretic syntax is concerned with studying the descriptive complexity of grammar formalisms for natural languages by defining their derivation trees in suitable logical formalisms. The central tool for model-theoretic syntax has been monadic second-order logic (MSO). Much of the recent research in this area has been concerned with finding more expressive logics to capture the derivation trees of grammar formalisms that generate non-context-free languages. The motivation behind this search for more expressive logics is to describe formally certain mildly context-sensitive phenomena of natural languages. Several extensions to MSO have been proposed, most of which no longer define the derivation trees of grammar formalisms directly, while others introduce logically odd restrictions. We therefore propose to consider first-order transitive closure logic. In this logic, derivation trees can be defined in a direct way. Our main result is that transitive closure logic, even deterministic transitive closure logic, is more expressive in defining classes of tree languages than MSO. (Deterministic) transitive closure logics are capable of defining non-regular tree languages that are of interest to linguistics.
By
Sharoff, Serge; Babych, Bogdan; Hartley, Anthony
2 Citations
In this paper we present a tool that uses comparable corpora to find appropriate translation equivalents for expressions that are considered by translators as difficult. For a phrase in the source language the tool identifies a range of possible expressions used in similar contexts in target language corpora and presents them to the translator as a list of suggestions. In the paper we discuss the method and present results of human evaluation of the performance of the tool, which highlight its usefulness when dictionary solutions are lacking.
By
Kozak, Michał
9 Citations
We prove the Finite Model Property (FMP) for Distributive Full Lambek Calculus (DFL) whose algebraic semantics is the class of distributive residuated lattices (DRL). The problem was left open in [8, 5]. We use the method of nuclei and quasi-embedding in the style of [10, 1].
By
Almeida, Agostinho
4 Citations
This work is part of a wider investigation into lattice-structured algebras and associated dual representations obtained via the methodology of canonical extensions. To this end, here we study lattices, not necessarily distributive, with negation operations.
We consider equational classes of lattices equipped with a negation operation ¬ which is dually self-adjoint (the pair (¬,¬) is a Galois connection) and other axioms are added so as to give classes of lattices in which the negation is De Morgan, orthonegation, antilogism, pseudocomplementation or weak pseudocomplementation. These classes are shown to be canonical and dual relational structures are given in a generalized Kripke-style. The fact that the negation is dually self-adjoint plays an important role here, as it implies that it sends arbitrary joins to meets and that will allow us to define the dual structures in a uniform way.
Among these classes, all but one (that of lattices with a negation which is an antilogism) were previously studied by W. Dzik, E. Orłowska and C. van Alten using Urquhart duality.
In some cases in which a given axiom does not imply that negation is dually self-adjoint, canonicity is proven with the weaker assumption of antitonicity of the negation.
By
Kamide, Norihiro
5 Citations
New propositional and first-order paraconsistent logics (called L_{ω} and FL_{ω}, respectively) are introduced as Gentzen-type sequent calculi with classical and paraconsistent negations. The embedding theorems of L_{ω} and FL_{ω} into propositional (first-order, respectively) classical logic are shown, and the completeness theorems with respect to simple semantics for L_{ω} and FL_{ω} are proved. The cut-elimination theorems for L_{ω} and FL_{ω} are shown both syntactically, via the embedding theorems, and semantically, via the completeness theorems.
By
Hsiung, Ming
4 Citations
A relativized version of Tarski’s T-scheme is introduced as a new principle of the truth predicate. Under the relativized T-scheme, the paradoxical objects, such as the Liar sentence and Jourdain’s card sequence, are found to have certain relative contradictoriness. That is, they are contradictory only in some frames in the sense that any valuation admissible for them in these frames will lead to a contradiction. It is proved that for any positive integer n, the n-jump liar sentence is contradictory in and only in those frames containing at least one n-jump odd cycle. In particular, the Liar sentence is contradictory in and only in those frames containing at least one odd cycle. The Liar sentence is also proved to be less contradictory than Jourdain’s card sequence: the latter must be contradictory in those frames where the former is so, but not vice versa. Generally, the relative contradictoriness is the common characteristic of the paradoxical objects, but different paradoxical objects may have different relative contradictoriness.
By
Avron, Arnon; Konikowska, Beata
8 Citations
In the paper we examine the use of non-classical truth values for dealing with computation errors in program specification and validation. In that context, 3-valued McCarthy logic is suitable for handling lazy sequential computation, while 3-valued Kleene logic can be used for reasoning about parallel computation. If we want to be able to deal with both strategies without distinguishing between them, we combine Kleene and McCarthy logics into a logic based on a non-deterministic, 3-valued matrix, incorporating both options as a non-deterministic choice. If the two strategies are to be distinguished, Kleene and McCarthy logics are combined into a logic based on a 4-valued deterministic matrix featuring two kinds of computation errors which correspond to the two computation strategies described above. For the resulting logics, we provide sound and complete calculi of ordinary, two-valued sequents.
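The difference between the two evaluation strategies named in this abstract can be made concrete with conjunction over the values true, false and error. The following is a minimal sketch under stated assumptions: the names `mccarthy_and`, `kleene_and` and `nd_and` are hypothetical, the tables are the standard McCarthy (left-to-right, lazy) and strong Kleene ones, and modelling the non-deterministic combination as the set of both outcomes is an illustrative reading of the abstract rather than the authors' formal definition.

```python
# Three truth values: true, false, and "error" (a failed computation).
T, F, E = "T", "F", "E"


def mccarthy_and(a, b):
    """Lazy sequential conjunction: the left operand is evaluated first."""
    if a == E:
        return E  # the error on the left is hit before b is looked at
    if a == F:
        return F  # short-circuit: b is never evaluated
    return b


def kleene_and(a, b):
    """Parallel (strong Kleene) conjunction: any false operand decides."""
    if F in (a, b):
        return F
    if E in (a, b):
        return E
    return T


def nd_and(a, b):
    """Assumed combination: the set of results either strategy may give."""
    return {mccarthy_and(a, b), kleene_and(a, b)}


if __name__ == "__main__":
    for a in (T, F, E):
        for b in (T, F, E):
            print(a, b, mccarthy_and(a, b), kleene_and(a, b), sorted(nd_and(a, b)))
```

The two tables differ only when the left operand is an error and the right one is false: sequential evaluation reports the error, while parallel evaluation can still return false.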
By
Francopoulo, Gil; Bel, Nuria; George, Monte; Calzolari, Nicoletta; Monachini, Monica; Pet, Mandy; Soria, Claudia
15 Citations
Optimizing the production, maintenance and extension of lexical resources is one of the crucial aspects impacting natural language processing (NLP). A second aspect involves optimizing the process leading to their integration in applications. In this respect, we believe that a consensual specification on monolingual, bilingual and multilingual lexicons can be a useful aid for the various NLP actors. Within ISO, one purpose of Lexical Markup Framework (LMF, ISO-24613) is to define a standard for lexicons that covers multilingual lexical data.
By
Alberucci, Luca; Facchini, Alessandro
12 Citations
We show that the modal μ-calculus over GL collapses to the modal fragment by showing that the fixpoint formula is reached after two iterations, thereby answering a question posed by van Benthem in [4]. Further, we introduce the modal μ^{~}-calculus by allowing fixpoint constructors for any formula where the fixpoint variable appears guarded but not necessarily positive and show that this calculus over GL collapses to the modal fragment, too. The latter result yields a new proof of the de Jongh-Sambin Theorem and provides a simple algorithm to construct the fixpoint formula.
By
Lin, Jimmy; Murray, G. Craig; Dorr, Bonnie J.; Hajič, Jan; Pecina, Pavel
Thesauri and controlled vocabularies facilitate access to digital collections by explicitly representing the underlying principles of organization. Translation of such resources into multiple languages is an important component for providing multilingual access. However, the specificity of vocabulary terms in most thesauri precludes fully automatic translation using general-domain lexical resources. In this paper, we present an efficient process for leveraging human translations to construct domain-specific lexical resources. This process is illustrated on a thesaurus of 56,000 concepts used to catalog a large archive of oral histories. We elicited human translations on a small subset of concepts, induced a probabilistic phrase dictionary from these translations, and used the resulting resource to automatically translate the rest of the thesaurus. Two separate evaluations demonstrate the acceptability of the automatic translations and the cost-effectiveness of our approach.
By
Polguère, Alain
4 Citations
We introduce a new type of lexical structure called lexical system, an interoperable model that can feed both monolingual and multilingual language resources. We begin with a formal characterization of lexical systems as simple directed graphs, solely made up of nodes corresponding to lexical entities and links. To illustrate our approach, we present data borrowed from a lexical system that has been generated from the French DiCo database. We later explain how the compilation of the original dictionary-like database into a net-like one has been made possible. Finally, we discuss the potential of the proposed lexical structure for designing multilingual lexical resources.
By
Habash, Nizar; Dorr, Bonnie; Monz, Christof
9 Citations
The last few years have witnessed an increasing interest in hybridizing surface-based statistical approaches and rule-based symbolic approaches to machine translation (MT). Much of that work is focused on extending statistical MT systems with symbolic knowledge and components. In the brand of hybridization discussed here, we go in the opposite direction: adding statistical bilingual components to a symbolic system. Our base system is Generation-heavy machine translation (GHMT), a primarily symbolic asymmetrical approach that addresses the issue of Interlingual MT resource poverty in source-poor/target-rich language pairs by exploiting symbolic and statistical target-language resources. GHMT’s statistical components are limited to target-language models, which arguably makes it a simple form of a hybrid system. We extend the hybrid nature of GHMT by adding statistical bilingual components. We also describe the details of retargeting it to Arabic–English MT. The morphological richness of Arabic brings several challenges to the hybridization task. We conduct an extensive evaluation of multiple system variants. Our evaluation shows that this new variant of GHMT, a primarily symbolic system extended with monolingual and bilingual statistical components, has a higher degree of grammaticality than a phrase-based statistical MT system, where grammaticality is measured in terms of correct verb-argument realization and long-distance dependency translation.
By
Tinsley, John; Way, Andy
3 Citations
Given much recent discussion and the shift in focus of the field, it is becoming apparent that the incorporation of syntax is the way forward for improvements to the current state of the art in machine translation (MT). Parallel treebanks are a relatively recent innovation and appear to be ideal candidates for MT training material. However, until recently there has been no other means to build them than by hand. In this paper, we describe how we make use of new tools to automatically build a large parallel treebank and extract a set of linguistically motivated phrase pairs from it. We show that adding these phrase pairs to the translation model of a baseline phrase-based statistical MT (PBSMT) system leads to significant improvements in translation quality. Following this, we describe experiments in which we exploit the information encoded in the parallel treebank in other areas of the PBSMT framework, while investigating the conditions under which the incorporation of parallel treebank data performs optimally. Finally, we discuss the possibility of exploiting automatically generated parallel treebanks further in syntax-aware paradigms of MT.
By
Németi, I.; Simon, A.
3 Citations
We show that the variety of n-dimensional weakly higher order cylindric algebras, introduced in Németi [9], [8], is finitely axiomatizable when n > 2. Our result implies that in certain non-well-founded set theories the finitization problem of algebraic logic admits a positive solution; and it shows that this variety is a good candidate for being the cylindric-algebra-theoretic counterpart of Tarski’s quasi-projective relation algebras.
