Carter, Simon; Monz, Christof
This article describes a method that successfully exploits syntactic features for n-best translation candidate reranking using perceptrons. We motivate the utility of syntax by demonstrating the superior performance of parsers over n-gram language models in differentiating between Statistical Machine Translation output and human translations. Our approach uses discriminative language modelling to rerank the n-best translations generated by a statistical machine translation system. The performance is evaluated for Arabic-to-English translation using NIST’s MTEval benchmarks. While deep features extracted from parse trees do not consistently help, we show how features extracted from a shallow Part-of-Speech annotation layer outperform a competitive baseline and a state-of-the-art comparative reranking approach, leading to significant BLEU improvements on three different test sets.
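As a rough illustration of perceptron-based n-best reranking with shallow Part-of-Speech features, the sketch below trains a plain perceptron over POS n-gram counts. The feature set, the function names, and the use of a single oracle candidate per list are assumptions made for illustration; the abstract does not specify the authors' actual features or training regime.

```python
from collections import Counter

def pos_ngram_features(tags, n=2):
    """Count POS n-grams in a tag sequence (a hypothetical feature set)."""
    return Counter(tuple(tags[i:i + n]) for i in range(len(tags) - n + 1))

def score(weights, feats):
    """Linear score: dot product of the weight vector and the feature counts."""
    return sum(weights.get(f, 0.0) * v for f, v in feats.items())

def rerank(weights, nbest):
    """Return the index of the highest-scoring candidate (first one on ties)."""
    return max(range(len(nbest)), key=lambda i: score(weights, nbest[i]))

def perceptron_train(data, epochs=5):
    """data: list of (nbest_feature_list, oracle_index) pairs (assumed format)."""
    weights = {}
    for _ in range(epochs):
        for nbest, oracle in data:
            pred = rerank(weights, nbest)
            if pred != oracle:
                # Promote the oracle candidate's features, demote the prediction's.
                for f, v in nbest[oracle].items():
                    weights[f] = weights.get(f, 0.0) + v
                for f, v in nbest[pred].items():
                    weights[f] = weights.get(f, 0.0) - v
    return weights
```

In practice the oracle index would be chosen by a sentence-level quality metric against the reference, and the learned score would typically be combined with the decoder's own model score.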
Abdul Rauf, Sadaf; Schwenk, Holger
A parallel corpus is an essential resource for statistical machine translation (SMT) but is often not available in the required amounts for all domains and languages. An approach is presented here which aims at producing parallel corpora from available comparable corpora. An SMT system is used to translate the source-language part of a comparable corpus and the translations are used as queries to conduct information retrieval from the target-language side of the comparable corpus. Simple filters are then used to score the SMT output and the IR-returned sentence with the filter score defining the degree of similarity between the two. Using SMT system output gives the benefit of trying to correct one of the common errors by sentence tail removal. The approach was applied to Arabic–English and French–English systems using comparable news corpora and considerable improvements were achieved in the BLEU score. We show that our approach is independent of the quality of the SMT system used to make the queries, strengthening the claim of applicability of the approach for languages and domains with limited parallel corpora available to start with. We compare our approach with one of the earlier approaches and show that our approach is easier to implement and gives equally good improvements.
Søgaard, Anders
Range concatenation grammars are viewed as a hierarchy of synchronous grammars. It is shown how inversion transduction grammars (ITGs) and extensions thereof, including synchronous tree-adjoining grammars, are captured by the hierarchy, and the expressivity and linguistic relevance of subclasses of the hierarchy are discussed. An $\mathcal{O}(Gn^6)$ time extension of ITGs is proposed. The extension translates cross-serial dependencies into nested ones and handles complex kinds of discontinuous translation units and so-called inside-out alignments. In fact, our $\mathcal{O}(Gn^6)$ time extension generates all possible alignments. It is shown that this additional expressivity comes at the cost of probabilistic parsing.
Bos, Johan; Spenader, Jennifer
Verb Phrase Ellipsis (VPE) has been studied in great depth in theoretical linguistics, but empirical studies of VPE are rare. We extend the few previous corpus studies with an annotated corpus of VPE in all 25 sections of the Wall Street Journal corpus (WSJ) distributed with the Penn Treebank. We annotated the raw files using a standoff annotation scheme that codes the auxiliary verb triggering the elided verb phrase, the start and end of the antecedent, the syntactic type of antecedent (VP, TV, NP, PP or AP), and the type of syntactic pattern between the source and target clauses of the VPE and its antecedent. We found 487 instances of VPE (including predicative ellipsis, antecedent-contained deletion, comparative constructions, and pseudogapping) plus 67 cases of related phenomena such as do so anaphora. Inter-annotator agreement was high, with a 0.97 average F-score for three annotators for one section of the WSJ. Our annotation is theory-neutral, and has better coverage than earlier efforts that relied on automatic methods; for example, simply searching the parsed version of the Penn Treebank for empty VPs achieves high precision (0.95) but low recall (0.58) when compared with our manual annotation. The distribution of VPE source–target patterns deviates highly from the standard examples found in the theoretical linguistics literature on VPE, once more underlining the value of corpus studies. The resulting corpus will be useful for studying VPE phenomena as well as for evaluating natural language processing systems equipped with ellipsis resolution algorithms, and we propose evaluation measures for VPE detection and VPE antecedent selection. The standoff annotation is freely available for research purposes.
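To make the precision/recall trade-off of the automatic baseline concrete, the following sketch combines the two reported figures (0.95 and 0.58) into a single F-score. The balanced F1 variant and the function name are assumptions; the abstract does not state which F-measure the authors use. Under that assumption the combined score comes out at roughly 0.72.

```python
def f_score(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (beta=1 gives balanced F1)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Figures reported for the treebank-search baseline against the manual annotation.
print(round(f_score(0.95, 0.58), 3))
```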
Bahrani, Mohammad; Sameti, Hossein; Hafezi Manshadi, Mehdi
In this paper, we present our attempts to design and implement a large-coverage computational grammar for the Persian language based on the Generalized Phrase Structure Grammar (GPSG) model. This grammatical model was developed for continuous speech recognition (CSR) applications, but is suitable for other applications that need the syntactic analysis of Persian. In this work, we investigate various syntactic structures relevant to the modern Persian language, and then describe these structures according to a phrase structure model. Noun (N), Verb (V), Adjective (ADJ), Adverb (ADV), and Preposition (P) are considered basic syntactic categories, and X-bar theory is used to define Noun phrases, Verb phrases, Adjective phrases, Adverbial phrases, and Prepositional phrases. However, we have to extend Noun phrase levels in X-bar theory to four levels due to certain complexities in the structure of Noun phrases in the Persian language. A set of 120 grammatical rules for describing different phrase structures of Persian is extracted, and a few instances of the rules are presented in this paper. These rules cover the major syntactic structures of the modern Persian language. For evaluation, the obtained grammatical model is utilized in a bottom-up chart parser for parsing 100 Persian sentences. Our grammatical model accounts for 89 of these sentences. Incorporating this grammar in a Persian CSR system leads to a 31% reduction in word error rate.
Saquete, Estela; Pustejovsky, James
Until recently, most systems performing temporal extraction and reasoning from text have focused on recognizing and normalizing temporal expressions alone, for which the TIDES annotation scheme has been adopted. Temporal awareness of a text, however, involves not only identifying the temporal expressions, but the events which these expressions anchor, as well as other events which must be ordered relative to them. Because of these broader concerns, TimeML has been developed as an annotation specification that encompasses not only temporal expressions, but all temporally relevant aspects of a text. The annotation schemes, however, are not interchangeable, resulting in incompatible corpora and accompanying extraction algorithms for each standard. In this paper, we describe an automatic migration process from the TIMEX2 tags of TIDES to the TIMEX3 tags of TimeML. This transformation procedure has been implemented and evaluated with two different corpora, obtaining 93.3% and 89.2% overall F-measure, respectively.
Kolář, Jáchym
Structural metadata extraction (MDE) research aims to develop techniques for automatic conversion of raw speech recognition output to forms that are more useful to humans and downstream automatic processes. The MDE annotation includes inserting boundaries of sentence-like units to the flow of speech, labeling non-content words like filled pauses and discourse markers for optional removal, and identifying sections of disfluent speech. This paper describes design, creation, and analysis of data resources for structural MDE from spoken Czech. The annotation is based on the LDC’s MDE annotation standard for English, with changes applied to accommodate specific phenomena of Czech. In addition to the necessary language-dependent modifications, we further proposed and applied several language-independent modifications slightly refining the original annotation scheme. We created two Czech MDE speech corpora, one in the domain of broadcast news and the other in the domain of broadcast conversations. Both corpora have already been published at LDC. The analysis section of this paper presents a variety of statistics about fillers, edit disfluencies, and sentence-like units. The two Czech corpora are not only compared with each other, but also with statistics relating to the available English MDE corpora. We also report the statistics indicating that edit disfluencies have a different part-of-speech (POS) distribution in comparison with the overall POS distribution. The findings from the corpus analysis should help guide strategies for developing automatic MDE systems.
Daille, Béatrice; Dubreil, Estelle; Monceaux, Laura; Vernier, Matthieu
The blog phenomenon is universal. Blogs are characterized by their evaluative use, in that they enable Internet users to express their opinion on a given subject. From this point of view, they are an ideal resource for the constitution of an annotated sentiment analysis corpus, crossing the subject and the opinion expressed on this subject. This paper presents the Blogoscopy corpus for the French language which was built up with personal thematic blogs. The annotation was governed by three principles: theoretical, as opinion is grounded in a linguistic theory of evaluation; practical, as every opinion is linked to an object; and methodological, as annotation rules and successive phases are defined to ensure quality and thoroughness.
Walker, Vern R.; Carie, Nathaniel; DeWitt, Courtney C.; Lesh, Eric
This article describes the Vaccine/Injury Project Corpus, a collection of legal decisions awarding or denying compensation for health injuries allegedly due to vaccinations, together with models of the logical structure of the reasoning of the factfinders in those cases. This unique corpus provides useful data for formal and informal logic theory, for natural-language research in linguistics, and for artificial intelligence research. More importantly, the article discusses lessons learned from developing protocols for manually extracting the logical structure and generating the logic models. It identifies subtasks in the extraction process, discusses challenges to automation, and provides insights into possible solutions for automation. In particular, the framework and strategies developed here, together with the corpus data, should allow “top–down” and contextual approaches to automation, which can supplement “bottom-up” linguistic approaches. Illustrations throughout the article use examples drawn from the Corpus.
Boulet, Romain; Mazzega, Pierre; Bourcier, Danièle
We explore one aspect of the structure of a codified legal system at the national level using a new type of representation to understand the strong or weak dependencies between the various fields of law. In Part I of this study, we analyze the graph associated with the network in which each French legal code is a vertex and an edge is produced between two vertices when a code cites another code at least one time. We show that this network is distinguished from many other real networks by its high density, giving it a particular structure that we call concentrated world and that differentiates a national legal system (as considered with a resolution at the code level) from small-world graphs identified in many social networks. Our analysis then shows that a few communities (groups of highly wired vertices) of codes covering large domains of regulation are structuring the whole system. Indeed we mainly find a central group of influential codes, a group of codes related to social issues and a group of codes dealing with territories and natural resources. The study of this codified legal system is also of interest in the field of the analysis of real networks. In particular we examine the impact of the high density on the structural characteristics of the graph and on the ways communities are searched for. Finally we provide an original visualization of this graph on a hemicycle-like plot, this representation being based on a statistical reduction of dissimilarity measures between vertices. In Part II (a following paper) we show how the consideration of the weights attributed to each edge in the network in proportion to the number of citations between two vertices (codes) allows deepening the analysis of the French legal system.
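The graph construction described above (one vertex per code, an edge whenever one code cites another at least once) and the notion of density can be sketched as follows. Treating the graph as undirected, the standard density formula 2m / (n(n-1)), and the code names in the usage note are assumptions made for illustration, not details taken from the study.

```python
def citation_graph(citations):
    """Build an undirected graph from (citing_code, cited_code) pairs.

    An edge exists between two codes as soon as one cites the other at least once;
    repeated citations and self-citations add nothing.
    """
    vertices = set()
    edges = set()
    for src, dst in citations:
        vertices.update((src, dst))
        if src != dst:
            edges.add(frozenset((src, dst)))
    return vertices, edges

def density(vertices, edges):
    """Fraction of possible vertex pairs that are actually connected."""
    n = len(vertices)
    return 0.0 if n < 2 else 2 * len(edges) / (n * (n - 1))
```

For example, with hypothetical citations `[("civil", "penal"), ("penal", "civil"), ("civil", "labour")]` the graph has three vertices and two edges, so two of the three possible pairs are connected.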
Kacsuk, Zsófia
In patent law most of the crucial legal questions such as patentability and infringement are linked to the patent claims. The European Patent Office regards patent claims as a set of independent features which are examined separately in a more or less formal way. The author has found that this approach allows for developing a simple mathematical model which treats patent claim features as logical statements and patent claims as compound statements wherein the individual statements are connected by logical connectives. The proposed mathematical model provides a uniform system for examining various legal questions that are dealt with separately under current case law, moreover, it allows for developing an expert system for resolving complex legal situations and for automating the evaluation of a large number of patent claim variants that is currently not possible.
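A minimal sketch of the idea of treating claim features as logical statements is given below, assuming the simplest case in which a claim is the conjunction of its features. The function names, the feature labels, and the restriction to conjunction are illustrative assumptions; the abstract does not spell out which connectives or evaluation rules the author's model uses.

```python
def claim_holds(claim_features, true_features):
    """A claim modelled as a conjunction: it holds only if every feature holds."""
    return all(feature in true_features for feature in claim_features)

def evaluate_variants(variants, true_features):
    """Evaluate many claim variants against the same set of true features."""
    return {name: claim_holds(features, true_features)
            for name, features in variants.items()}

# Hypothetical claim variants and a hypothetical set of features found in a document.
variants = {
    "claim_1": {"A", "B"},
    "claim_2": {"A", "B", "C"},
}
results = evaluate_variants(variants, true_features={"A", "B", "D"})
```

Here `claim_1` evaluates to true because both of its features are present, while `claim_2` evaluates to false because feature `C` is missing.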
Garcia, Ignacio
Translation memory tools now offer the translator the option to insert post-edited machine translation segments for which no match is found in the databases. The Google Translator Toolkit does this by default, advising in its Settings window: “Most users should not modify this”. Post-editing of no matches appears to work on engines trained with specific bilingual data on a source written under controlled language constraints. Would this, however, work for any type of task as Google’s advice implies? We have tested this by carrying out experiments with English–Chinese trainees, using the Toolkit to translate from the source text (the control group) and by post-editing (the experimental group). Results show that post-editing gains in productivity are marginal. With regard to quality, however, post-editing produces significantly better statistical results compared to translating manually. These gains in quality are observed independently of language direction, text difficulty or translator’s level of performance. In light of these findings, we discuss whether translators should consider post-editing as a viable alternative to conventional translation.
Haque, Rejwanul; Naskar, Sudip Kumar; Bosch, Antal; Way, Andy
The translation features typically used in Phrase-Based Statistical Machine Translation (PB-SMT) model dependencies between the source and target phrases, but not among the phrases in the source language themselves. A swathe of research has demonstrated that integrating source context modelling directly into log-linear PB-SMT can positively influence the weighting and selection of target phrases, and thus improve translation quality. In this contribution we present a revised, extended account of our previous work on using a range of contextual features, including lexical features of neighbouring words, supertags, and dependency information. We add a number of novel aspects, including the use of semantic roles as new contextual features in PB-SMT, adding new language pairs, and examining the scalability of our research to larger amounts of training data. While our results are mixed across feature selections, classifier hyperparameters, language pairs, and learning curves, we observe that including contextual features of the source sentence in general produces improvements. The most significant improvements involve the integration of long-distance contextual features, such as dependency relations in combination with part-of-speech tags in Dutch-to-English subtitle translation, the combination of dependency parse and semantic role information in English-to-Dutch parliamentary debate translation, or supertag features in English-to-Chinese translation.
O’Brien, Sharon
Machine translation (MT) quality is generally measured via automatic metrics, producing scores that have no meaning for translators who are required to post-edit MT output or for project managers who have to plan and budget for translation projects. This paper investigates correlations between two such automatic metrics (general text matcher and translation edit rate) and post-editing productivity. For the purposes of this paper, productivity is measured via processing speed and cognitive measures of effort using eye tracking as a tool. Processing speed, average fixation time and count are found to correlate well with the scores for groups of segments. Segments with high GTM and TER scores require substantially less time and cognitive effort than medium or low-scoring segments. Future research involving score thresholds and confidence estimation is suggested.
Aucher, Guillaume; Boella, Guido; Torre, Leendert
Knowledge-based privacy policies are more declarative than traditional action-based ones, because they specify only what is permitted or forbidden to know, and leave the derivation of the permitted actions to a security monitor. This inference problem is already non-trivial with a static privacy policy, and becomes challenging when privacy policies can change over time. We therefore introduce a dynamic modal logic that makes it possible not only to reason about permitted and forbidden knowledge to derive the permitted actions, but also to represent explicitly the declarative privacy policies together with their dynamics. The logic can be used to check both regulatory and behavioral compliance, respectively by checking that the permissions and obligations set up by the security monitor of an organization are not in conflict with the privacy policies, and by checking that these obligations are indeed enforced.
Kallmeyer, Laura
This paper is concerned with the relation between mild context-sensitivity and the class of languages generated by Linear Context-Free Rewriting Systems (LCFRSs). We show that there are languages that are polynomial and of constant growth but that are not LCFRLs. Starting from this observation, we define an extension of LCFRS that, roughly, allows for a limited amount of copying (or intersection, if considered in a bottom-up perspective). The proposed LCFRS extension, Literal Movement Grammars of constant non-linearity (CNL-LMG), is such that the parts that are copied into different places during a derivation cannot increase when iterating the non-linear parts of a derivation. We show that this condition guarantees that the string languages are still of constant growth. As a consequence, CNL-LMG is mildly context-sensitive while properly extending LCFRS. This result suggests that a limited and controlled amount of copying and intersection can remain tractable since it does not lead outside mild context-sensitivity. Furthermore, we show that there are natural language phenomena (in particular gapping and scrambling) where such a limited possibility of intersection gives the necessary expressive power beyond LCFRS to model these phenomena.
Bond, Francis; Oepen, Stephan; Nichols, Eric; Flickinger, Dan; Velldal, Erik; Haugereid, Petter
This paper summarizes ongoing efforts to provide software infrastructure (and methodology) for open-source machine translation that combines a deep semantic transfer approach with advanced stochastic models. The resulting infrastructure combines precise grammars for parsing and generation, a semantic-transfer-based translation engine and stochastic controllers. We provide both a qualitative and quantitative experience report from instantiating our general architecture for Japanese–English MT using only open-source components, including HPSG-based grammars of English and Japanese.
Dung, Phan Minh; Sartor, Giovanni
We provide a logical analysis of private international law, a rather esoteric, but increasingly important, domain of the law. Private international law addresses overlaps and conflicts between legal systems by distributing cases between the authorities of such systems (jurisdiction) and establishing what rules these authorities have to apply to each case (choice of law). A formal model of the resulting interactions between legal systems is proposed based on modular argumentation. It is argued that this model may also be useful for governing the interactions between heterogeneous agents, belonging to different and differently regulated virtual societies, without recourse to a central regulatory agency. The model also provides for multiple interpretations concerning rules of private international law as well as substantive rules of the different legal systems.
Phillips, Aaron B.
The Cunei machine translation platform is an open-source system for data-driven machine translation. Our platform is a synthesis of the traditional example-based MT (EBMT) and statistical MT (SMT) paradigms. What makes Cunei unique is that it measures the relevance of each translation instance with a distance function. This distance function, represented as a log-linear model, operates over one translation instance at a time and enables us to score the translation instance relative to the specified input and/or the current target hypothesis. We describe how our system, Cunei, scores features individually for each translation instance and how it efficiently performs parameter tuning over the entire feature space. We also compare Cunei with three other open-source MT systems (Moses, CMU-EBMT, and Marclator). In our experiments involving Korean–English and Czech–English translation, Cunei clearly outperforms the traditional EBMT and SMT systems.
Burgemeestre, Brigitte; Hulstijn, Joris; Tan, Yao-Hua
Compliance is often achieved ‘by design’ through a coherent system of controls consisting of information systems and procedures. This system-based control requires a new approach to auditing in which companies must demonstrate to the regulator that they are ‘in control’. They must determine the relevance of a regulation for their business, justify which set of control measures they have taken to comply with it, and demonstrate that the control measures are operationally effective. In this paper we show how value-based argumentation theory can be applied to the compliance domain. Corporate values motivate the selection of control measures (actions) which aim to fulfil control objectives, i.e. adopted norms (goals). In particular, we show how to formalize the audit dialogue in which companies justify their compliance decisions to regulators using value-based argumentation. The approach is illustrated by a case study of the safety and security measures adopted in the context of EU customs regulation.
MacKinlay, Andrew; Dridan, Rebecca; Flickinger, Dan; Baldwin, Timothy
We examine the impact of domain on parse selection accuracy, in the context of precision HPSG parsing using the English Resource Grammar, using two training corpora and four test corpora and evaluating using exact tree matches as well as dependency F-scores. In addition to determining the relative impact of in- vs. cross-domain parse selection training on parser performance, we propose strategies to avoid cross-domain performance penalty when limited in-domain data is available. Our work supports previous research showing that in-domain training data significantly improves parse selection accuracy, and that it provides greater parser accuracy than an out-of-domain training corpus of the same size, but we verify experimentally that this holds for a hand-crafted grammar, observing a 10–16% improvement in exact match and 5–6% improvement in dependency F-score by using a domain-matched training corpus. We also find it is possible to considerably improve parse selection accuracy through construction of even small-scale in-domain treebanks, and learning of parse selection models over in-domain and out-of-domain data. Naively adding an 11,000-token in-domain training corpus boosts dependency F-score by 2–3% over using solely out-of-domain data. We investigate more sophisticated strategies for combining data from these sources to train models: weighted linear interpolation between the single-domain models, and training a model from the combined data, optionally duplicating the smaller corpus to give it a higher weighting. The most successful strategy is training a monolithic model after duplicating the smaller corpus, which gives an improvement over a range of weightings, but we also show that the optimal value for these parameters can be estimated on a case-by-case basis using a cross-validation strategy. This domain-tuning strategy provides a further performance improvement of up to 2.3% for exact match and 0.9% for dependency F-score compared to the naive combination strategy using the same data.
Steinberger, Ralf; Ombuya, Sylvia; Kabadjov, Mijail; Pouliquen, Bruno; Rocca, Leo; Belyaeva, Jenya; Paola, Monica; Ignat, Camelia; Goot, Erik
The Europe Media Monitor (EMM) family of applications is a set of multilingual tools that gather, cluster and classify news in currently fifty languages and that extract named entities and quotations (reported speech) from twenty languages. In this paper, we describe the recent effort of adding the African Bantu language Swahili to EMM. EMM is designed in an entirely modular way, allowing plugging in a new language by providing the language-specific resources for that language. We thus describe the type of language-specific resources needed, the effort involved, and ways of bootstrapping the generation of these resources in order to keep the effort of adding a new language to a minimum. The text analysis applications pursued in our efforts include clustering, classification, recognition and disambiguation of named entities (persons, organisations and locations), recognition and normalisation of date expressions, as well as the identification of reported speech quotations by and about people.
Grover, Aditi Sharma; Huyssteen, Gerhard B.; Pretorius, Marthinus W.
Human language technology (HLT) has been identified as a priority area by the South African government. However, despite efforts by government and the research and development (R&D) community, South Africa has not yet been able to maximise the opportunities of HLT and create a thriving HLT industry. One of the key challenges is the fact that there is insufficient codified knowledge about the current South African HLT components, their attributes and existing relationships. Hence a technology audit was conducted for the South African HLT landscape, to create a systematic and detailed inventory of the status of the HLT components across the eleven official languages. Based on the Basic Language Resource Kit (BLaRK) framework (Krauwer, ELRA Newslett 3(2), 1998), we used various data collection methods (such as focus groups, questionnaires and personal consultations with HLT experts) to gather detailed information. The South African HLT landscape is analysed using a number of complementary approaches and, based on the interpretations of the results, recommendations are made on how to accelerate HLT development in South Africa, as well as on how to conduct similar audits in other countries and contexts.
Badenhorst, Jaco; Heerden, Charl; Davel, Marelie; Barnard, Etienne
We describe the Lwazi corpus for automatic speech recognition (ASR), a new telephone speech corpus which contains data from the eleven official languages of South Africa. Because of practical constraints, the amount of speech per language is relatively small compared to major corpora in world languages, and we report on our investigation of the stability of the ASR models derived from the corpus. We also report on phoneme distance measures across languages, and describe initial phone recognisers that were developed using this data. We find that a surprisingly small number of speakers (fewer than 50) and around 10 to 20 h of speech per language are sufficient for the purposes of acceptable phone-based recognition.
Chiarcos, Christian; Fiedler, Ines; Grubic, Mira; Hartmann, Katharina; Ritz, Julia; Schwarz, Anne; Zeldes, Amir; Zimmermann, Malte
In this paper, we describe tools and resources for the study of African languages developed at the Collaborative Research Centre 632 “Information Structure”. These include deeply annotated data collections of 25 sub-Saharan languages that are described together with their annotation scheme, as well as the corpus tool ANNIS, which provides unified access to a broad variety of annotations created with a range of different tools. With the application of ANNIS to several African data collections, we illustrate its suitability for the purpose of language documentation, distributed access, and the creation of data archives.
Mundici, Daniele
Building on Wójcicki’s work on infinite-valued Łukasiewicz logic Ł_{∞}, we give a self-contained proof of the deductive interpolation theorem for Ł_{∞}. This paper aims at introducing the reader to the geometry of Łukasiewicz logic.
Raftery, J. G.
Logics that do not have a deduction-detachment theorem (briefly, a DDT) may still possess a contextual DDT, a syntactic notion introduced here for arbitrary deductive systems, along with a local variant. Substructural logics without sentential constants are natural witnesses to these phenomena. In the presence of a contextual DDT, we can still upgrade many weak completeness results to strong ones, e.g., the finite model property implies the strong finite model property. It turns out that a finitary system has a contextual DDT iff it is protoalgebraic and gives rise to a dually Brouwerian semilattice of compact deductive filters in every finitely generated algebra of the corresponding type. Any such system is filter distributive, although it may lack the filter extension property. More generally, filter distributivity and modularity are characterized for all finitary systems with a local contextual DDT, and several examples are discussed. For algebraizable logics, the well-known correspondence between the DDT and the equational definability of principal congruences is adapted to the contextual case.
Urquhart, Alasdair
This paper investigates the depth of resolution proofs, that is to say, the length of the longest path in the proof from an input clause to the conclusion. An abstract characterization of the measure is given, as well as a discussion of its relation to other measures of space complexity for resolution proofs.
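The definition of depth as the length of the longest path from an input clause to the conclusion can be illustrated on a small proof stored as a directed acyclic graph. The dictionary representation, the function name, and the string encoding of clauses in the usage note are assumptions made for this sketch; the abstract's own abstract characterization of the measure is not reproduced here.

```python
from functools import lru_cache

def proof_depth(premises, conclusion):
    """Length of the longest path from any input clause to the conclusion.

    premises maps each derived clause to the clauses it was resolved from;
    clauses that do not appear as keys are treated as input clauses (depth 0).
    """
    @lru_cache(maxsize=None)
    def depth(clause):
        if clause not in premises:
            return 0
        return 1 + max(depth(parent) for parent in premises[clause])

    return depth(conclusion)
```

For instance, with `premises = {"q": ("p", "-p v q"), "[]": ("q", "-q")}` the empty clause `"[]"` is derived in two resolution steps along its longest path, so the depth is 2.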
Wansing, Heinrich; Kamide, Norihiro
A new combined temporal logic called synchronized linear-time temporal logic (SLTL) is introduced as a Gentzen-type sequent calculus. SLTL can represent the n-Cartesian product of the set of natural numbers. The cut-elimination and completeness theorems for SLTL are proved. Moreover, a display sequent calculus δSLTL is defined.
Font, Josep Maria; Jansana, Ramon
A pair of deductive systems (S,S’) is Leibniz-linked when S’ is an extension of S and on every algebra there is a map sending each filter of S to a filter of S’ with the same Leibniz congruence. We study this generalization to arbitrary deductive systems of the notion of the strong version of a protoalgebraic deductive system, studied in earlier papers, and of some results recently found for particular non-protoalgebraic deductive systems. The necessary examples and counterexamples found in the literature are described.
Goldblatt, Robert
Grishin algebras are a generalisation of Boolean algebras that provide algebraic models for classical bilinear logic with two mutually cancelling negation connectives. We show how to build complete Grishin algebras as algebras of certain subsets (“propositions”) of cover systems that use an orthogonality relation to interpret the negations.
The variety of Grishin algebras is shown to be closed under MacNeille completion, and this is applied to embed an arbitrary Grishin algebra into the algebra of all propositions of some cover system, by a map that preserves all existing joins and meets.
This representation is then used to give a cover system semantics for a version of classical bilinear logic that has first-order quantifiers and infinitary conjunctions and disjunctions.
Düntsch, Ivo; Orłowska, Ewa
We present two discrete dualities for double Stone algebras. Each of these dualities involves a different class of frames and a different definition of a complex algebra. We discuss relationships between these classes of frames and show that one of them is a weakening of the other. We propose a logic based on double Stone algebras.
Segerberg, Krister
The purpose of this paper is to suggest a formal modelling of metaphors as a linguistic tool capable of conveying meanings from one conceptual space to another. This modelling is done within DDL (dynamic doxastic logic).
González, Jorge; Casacuberta, Francisco
In this article, the first public release of GREAT as an open-source, statistical machine translation (SMT) software toolkit is described. GREAT is based on a bilingual language modelling approach for SMT, which is so far implemented for n-gram models based on the framework of stochastic finite-state transducers. The use of finite-state models is motivated by their simplicity, their versatility, and the fact that they present a lower computational cost, if compared with other more expressive models. Moreover, if translation is assumed to be a subsequential process, finite-state models are enough for modelling the existing relations between a source and a target language. GREAT includes some characteristics usually present in state-of-the-art SMT, such as phrase-based translation models or a log-linear framework for local features. Experimental results on a well-known corpus such as Europarl are reported in order to validate this software. A competitive translation quality is achieved, yet using both a lower number of model parameters and a lower response time than the widely used, state-of-the-art SMT system Moses.
Fitting, Melvin
A mixture of propositional dynamic logic and epistemic logic that we call PDL + E is used to give a formalization of Artemov’s knowledge-based reasoning (KBR) approach to game theory [4, 5]. Epistemic states of players are represented explicitly and reasoned about formally. We give a detailed analysis of the Centipede game using both proof-theoretic and semantic machinery. This helps make the case that PDL + E can be a useful basis for the logical investigation of game theory.
Maksimova, Larisa
In a previous paper [21] all extensions of Johansson’s minimal logic J with the weak interpolation property WIP were described. It was proved that WIP is decidable over J. It turned out that the weak interpolation problem in extensions of J is reducible to the same problem over a logic Gl, which arises from J by adding tertium non datur.
In this paper we consider extensions of the logic Gl. We prove that only finitely many logics over Gl have the Craig interpolation property CIP, the restricted interpolation property IPR or the projective Beth property PBP. The full list of Gl-logics with the mentioned properties is found, and their description is given. We note that IPR and PBP are equivalent over Gl. It is proved that CIP, IPR and PBP are decidable over the logic Gl.
Rybakov, Vladimir V.
This paper offers a brief analysis of the unification problem in modal transitive logics related to the logic S4: S4 itself, K4, Grz and Gödel-Löb provability logic GL. As a result, new, but not the first, algorithms for the construction of ‘best’ unifiers in these logics are proposed. The proposed algorithms are based on our earlier approach to solving, in an algorithmic way, the admissibility problem of inference rules for S4 and Grz. The first algorithms for the construction of ‘best’ unifiers in the above-mentioned logics were given by S. Ghilardi in [16]. Neither the algorithms in [16] nor ours are very computationally efficient. They have, however, an obvious significant theoretical value, a portion of which seems to be the fact that they stem from two different methodological approaches.
Malinowski, Grzegorz; Woleński, Jan
For decades Ryszard Wójcicki has been a highly influential scholar in the community of logicians and philosophers. Our aim is to outline and comment on some essential issues on logic, methodology of science and semantics as seen from the perspective of distinguished contributions of Wójcicki to these areas of philosophical investigations.
Arieli, O.; Avron, A.; Zamansky, A.
We define in precise terms the basic properties that an ‘ideal propositional paraconsistent logic’ is expected to have, and investigate the relations between them. This leads to a precise characterization of ideal propositional paraconsistent logics. We show that every three-valued paraconsistent logic which is contained in classical logic, and has a proper implication connective, is ideal. Then we show that for every n > 2 there exists an extensive family of ideal n-valued logics, each one of which is not equivalent to any k-valued logic with k < n.
Benthem, J.; Pacuit, E.
This paper adds evidence structure to standard models of belief, in the form of families of sets of worlds. We show how these more fine-grained models support natural actions of “evidence management”, ranging from update with external new information to internal rearrangement. We show how this perspective leads to new richer languages for existing neighborhood semantics for modal logic. Our main results are relative completeness theorems for the resulting dynamic logic of evidence.
Bezhanishvili, Guram; Bezhanishvili, Nick
We introduce relativized modal algebra homomorphisms and show that the category of modal algebras and relativized modal algebra homomorphisms is dually equivalent to the category of modal spaces and partial continuous p-morphisms, thus extending the standard duality between the category of modal algebras and modal algebra homomorphisms and the category of modal spaces and continuous p-morphisms. In the transitive case, this yields an algebraic characterization of Zakharyaschev’s subreductions, cofinal subreductions, dense subreductions, and the closed domain condition. As a consequence, we give an algebraic description of canonical, subframe, and cofinal subframe formulas, and provide a new algebraic proof of Zakharyaschev’s theorem that each logic over K4 is axiomatizable by canonical formulas.
Indrzejczak, Andrzej
The paper is a brief survey of the most important semantic constructions founded on the concept of possible world. It is impossible to capture in one short paper the whole variety of the problems connected with manifold applications of possible worlds. Hence, after a brief explanation of some philosophical matters I take a look at possible worlds from rather technical standpoint of logic and focus on the applications in formal semantics. In particular, I would like to focus on the fruitful marriage of possible world semantics and algebra and its evolution leading to very general construction of Wójcicki called referential semantics and some of its refinements. The presentation is informal and sketchy; the main purpose is to put in one place a short, and readable I hope, description of the most important constructions and to point out the main sources of these solutions.
Vasyukov, Vladimir L.
Category-theoretic semantics for relevance logic is proposed, based on the construction of the topos of functors from a relevant algebra (considered as a preorder category endowed with special endofunctors) into the category of sets Set. The completeness of the relevant system R of entailment is proved with respect to the semantics considered.
Ertola Biraben, Rodolfo Cristian; San Martín, Hernán Javier
We study some operations that may be defined using the minimum operator in the context of a Heyting algebra. Our motivation comes from the fact that 1) already known compatible operations, such as the successor by Kuznetsov, the minimum dense by Smetanich and the operation G by Gabbay may be defined in this way, though almost never explicitly noted in the literature; 2) defining operations in this way is equivalent, from a logical point of view, to two clauses, one corresponding to an introduction rule and the other to an elimination rule, thus providing a manageable way to deal with these operations. Our main result is negative: all operations that arise turn out to be Heyting terms or the mentioned already known operations or operations interdefinable with them. However, it should be noted that some of the operations that arise may exist even if the known operations do not. We also study the extension of Priestley duality to Heyting algebras enriched with the new operations.
Fernández Duque, David
We show that given a finite, transitive and reflexive Kripke model 〈 W, ≼, ⟦ ⋅ ⟧ 〉 and $w \in W$, the property of being simulated by w (i.e., lying in the image of a literal-preserving relation satisfying the ‘forth’ condition of bisimulation) is modally undefinable within the class of S4 Kripke models. Note the contrast to the fact that lying in the image of w under a bisimulation is definable in the standard modal language even over the class of K4 models, a fairly standard result for which we also provide a proof.
We then propose a minor extension of the language adding a sequent operator $\natural$ (‘tangle’) which can be interpreted over Kripke models as well as over topological spaces. Over finite Kripke models it indicates the existence of clusters satisfying a specified set of formulas, very similar to an operator introduced by Dawar and Otto. In the extended language $\mathsf{L}^+ = \mathsf{L}^{\square\natural}$, being simulated by a point on a finite transitive Kripke model becomes definable, both over the class of (arbitrary) Kripke models and over the class of topological S4 models.
As a consequence of this we obtain the result that any class of finite, transitive models over finitely many propositional variables which is closed under simulability is also definable in $\mathsf{L}^+$, as well as Boolean combinations of these classes. From this it follows that the μ-calculus interpreted over any such class of models is decidable.
Rump, Wolfgang; Yang, Yichuan
In 2002, Dvurečenskij extended Mundici’s equivalence between unital abelian l-groups and MV-algebras to the non-commutative case. We analyse the relationship to Bosbach’s cone algebras and clarify the rôle of the corresponding pair of L-algebras. As a consequence, it follows that one of the two L-algebra axioms can be dropped.
Nguyen, Linh Anh; Szałas, Andrzej
Grammar logics were introduced by Fariñas del Cerro and Penttonen in 1988 and have been widely studied. In this paper we consider regular grammar logics with converse (REG^{c} logics) and present sound and complete tableau calculi for the general satisfiability problem of REG^{c} logics and the problem of checking consistency of an ABox w.r.t. a TBox in a REG^{c} logic. Using our calculi we develop ExpTime (optimal) tableau decision procedures for the mentioned problems, to which various optimization techniques can be applied. We also prove a new result that the data complexity of the instance checking problem in REG^{c} logics is coNP-complete.
Wintein, Stefan
In this paper, we present a framework in which we analyze three riddles about truth that are all (originally) due to Smullyan. We start with the riddle of the yes-no brothers and then the somewhat more complicated riddle of the da-ja brothers is studied. Finally, we study the Hardest Logic Puzzle Ever (HLPE). We present the respective riddles as sets of sentences of quotational languages, which are interpreted by sentence-structures. Using a revision-process the consistency of these sets is established. In our formal framework we observe some interesting dissimilarities between HLPE’s available solutions that were hidden due to their previous formulation in natural language. Finally, we discuss more recent solutions to HLPE which, by means of self-referential questions, reduce the number of questions that have to be asked in order to solve HLPE. Although the essence of the paper is to introduce a framework that allows us to formalize riddles about truth that do not involve self-reference, we will also shed some formal light on the self-referential solutions to HLPE.
Parent, Xavier
The aim of this paper is to strengthen the point made by Horty about the relationship between reason holism and moral particularism. In the literature prima facie obligations have been considered as the only source of reason holism. I strengthen Horty’s point in two ways. First, I show that contrary-to-duties provide another independent support for reason holism. Next I outline a formal theory that is able to capture these two sources of holism. While in simple settings the proposed account coincides with Horty’s one, this is not true in more complicated or “realistic” settings in which more than two norms collide. My chosen formalism is so-called input/output logic. A bottom-line example is introduced. It raises the issue of whether the conventional wisdom is right in assuming that normative reasons run parallel to epistemic ones.
Brown, Ralf D.
This paper presents an in-depth description of the features of the open-source CMU-EBMT example-based machine translation system. CMU-EBMT is a complete end-to-end system including lexicon induction, word and phrase alignment, corpus indexing and lookup, language model, decoder, and parameter tuning components. While it does not require them, it can take advantage of external alignment information and other annotations provided by GIZA++ and other systems. To illustrate a recent addition to CMU-EBMT, experiments are presented which show an improvement of 0.16 BLEU points (0.9% relative) on a cross-validated small-data English–Haitian translation task when using a new set of fine-grained log-linear feature values representing language model match lengths in addition to language model probabilities.
Moran, Steven
This paper presents the design and implementation of the Ontology for Accessing Transcription Systems (OATS), a knowledge base that supports interoperation over disparate transcription systems and practical orthographies. OATS uses RDF, SPARQL and Unicode to facilitate resource discovery and intelligent search over linguistic data. The knowledge base includes an ontological description of writing systems and relations for mapping transcription system segments to an interlingua pivot, the IPA. It includes orthographic and phonemic inventories from 203 African languages, which were mined from the Web. OATS is motivated by four use cases: querying data in the knowledge base via IPA, querying it in native orthography, error checking of digitized data, and conversion between transcription systems. The model in this paper implements each of these use cases.
Raybaud, Sylvain; Langlois, David; Smaïli, Kamel
Machine translation systems are not reliable enough to be used “as is”: except for the most simple tasks, they can only be used to grasp the general meaning of a text or assist human translators. The purpose of confidence measures is to detect erroneous words or sentences produced by a machine translation system. In this article, after reviewing the mathematical foundations of confidence estimation, we propose a comparison of several state-of-the-art confidence measures, predictive parameters and classifiers. We also propose two original confidence measures based on Mutual Information and a method for automatically generating data for training and testing classifiers. We applied these techniques to data from the WMT campaign 2008 and found that the best confidence measures yielded an Equal Error Rate of 36.3% at word level and 34.2% at sentence level, but combining different measures reduced these rates to 35.0% and 29.0%, respectively. We also present the results of an experiment aimed at determining how helpful confidence measures are in a post-editing task. Preliminary results suggest that our system is not yet ready to efficiently help post-editors, but we now have both software and a protocol that we can apply to further experiments, and user feedback has indicated aspects which must be improved in order to increase the level of helpfulness of confidence measures.
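For readers unfamiliar with the Equal Error Rate quoted in this abstract, the following sketch shows one common way to approximate it: sweep a threshold over the confidence scores and report the point where the rate of correct items wrongly flagged as errors is closest to the rate of erroneous items accepted as correct. The function name, the toy scores, and the exact definition are assumptions for illustration; the article's own procedure may differ.

```python
def equal_error_rate(scores, labels):
    """Approximate the Equal Error Rate of a confidence measure.

    scores: confidence values (higher means more likely correct)
    labels: True for correct items, False for erroneous ones
    An item is flagged as an error when its score is below the threshold.
    """
    n_correct = sum(labels)
    n_wrong = len(labels) - n_correct
    best = None
    for threshold in sorted(set(scores)) + [float("inf")]:
        # Correct items wrongly flagged as errors.
        false_reject = sum(
            1 for s, ok in zip(scores, labels) if ok and s < threshold
        ) / n_correct
        # Erroneous items that pass as correct.
        false_accept = sum(
            1 for s, ok in zip(scores, labels) if not ok and s >= threshold
        ) / n_wrong
        gap = abs(false_reject - false_accept)
        if best is None or gap < best[0]:
            best = (gap, (false_reject + false_accept) / 2)
    return best[1]

# Toy data, invented for illustration only.
toy_scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
toy_labels = [True, True, True, False, False, False]
print(equal_error_rate(toy_scores, toy_labels))
```

On finite data the two error rates rarely coincide exactly, so the sketch returns their average at the threshold where they are closest.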
Pauw, Guy; Wagacha, Peter Waiganjo; Schryver, Gilles-Maurice
Research in machine translation and corpus annotation has greatly benefited from the increasing availability of word-aligned parallel corpora. This paper presents ongoing research on the development and application of the sawa corpus, a two-million-word English–Swahili parallel corpus. We describe the data collection phase and zero in on the difficulties of finding appropriate and easily accessible data for this language pair. In the data annotation phase, the corpus was semi-automatically sentence- and word-aligned and morphosyntactic information was added to both the English and Swahili portions of the corpus. The annotated parallel corpus allows us to investigate two possible uses. We describe experiments with the projection of part-of-speech tagging annotation from English onto Swahili, as well as the development of a basic statistical machine translation system for this language pair, using the parallel corpus and a consolidated database of existing English–Swahili translation dictionaries. We particularly focus on the difficulties of translating English into the morphologically more complex Bantu language of Swahili.
Karamanis, Nikiforos; Luz, Saturnino; Doherty, Gavin
This paper reports the results of a qualitative study which investigated localisation activities performed by translators working in two Language Service Providers. It argues that maintaining the appropriate quality level in this setting is a collaborative task which involves several translators. This perspective entails taking a broader view of the translation process than usually found in the Machine Translation (MT) literature and detailing the various knowledge sources which are deployed in this collaborative effort. The impact of collaboration on trust is examined, and a comparison is made between the relatively seamless flow of work between translators and the more strained relationships with remote contributors. In support of this view, the paper contrasts the flexibility of the analysed work practices with the rigid ways which tend to be followed when introducing MT into this setting. We identify the need to support collaboration and communication more actively as a broader issue in translation settings. While current strategies for introducing MT tend to further isolate translators from remote contributors, we propose that MT can serve as the catalyst for establishing a more dynamic and collaborative relationship between them.
Abraham, M.; Gabbay, D. M.; Schild, U.
This paper examines the deontic logic of the Talmud. We shall find, by looking at examples, that at first approximation we need deontic logic with several connectives:
$O_T A$: Talmudic obligation
$F_T A$: Talmudic prohibition
$F_D A$: Standard deontic prohibition
$O_D A$: Standard deontic obligation.
In classical logic one would have expected that deontic obligation $O_D$ is definable by $O_D A \equiv F_D \neg A$ and that $O_T$ and $F_T$ are connected by $O_T A \equiv F_T \neg A$. This is not the case in the Talmud for the T (Talmudic) operators, though it does hold for the D operators. We must change our underlying logic. We have to regard $\{O_T, F_T\}$ and $\{O_D, F_D\}$ as two sets of operators, where $O_T$ and $F_T$ are independent of one another and where we have some connections between the two sets. We shall list the types of obligation patterns appearing in the Talmud and develop an intuitionistic deontic logic to accommodate them. We shall compare Talmudic deontic logic with modern deontic logic.
Demolombe, Robert
The paper presents a logical framework for the representation of interactions between institutional agents, human agents and software agents. A case study is used to analyze how obligations on institutional agents are “propagated” to human and software agents, and how actions performed by these agents count as actions that satisfy the obligations imposed on institutional agents. It is shown that the relationship between the different kinds of obligations and actions can be represented in terms of the concept of “count as” proposed by Searle, of role and of causality. The logical framework focuses on those three concepts.
Mayor, Aingeru; Alegria, Iñaki; Díaz de Ilarraza, Arantza; Labaka, Gorka; Lersundi, Mikel; Sarasola, Kepa
We present the first publicly available machine translation (MT) system for Basque. The fact that Basque is both a morphologically rich and less-resourced language makes the use of statistical approaches difficult, and raises the need to develop a rule-based architecture which can be combined in the future with statistical techniques. The MT architecture proposed reuses several open-source tools and is based on a unique XML format to facilitate the flow between the different modules, which eases the interaction among different developers of tools and resources. The result is the rule-based Matxin MT system, an open-source toolkit, whose first implementation translates from Spanish to Basque. We have performed innovative work on the following tasks: construction of a dependency analyser for Spanish, use of rich linguistic information to translate prepositions and syntactic functions (such as subject and object markers), construction of an efficient module for verbal chunk transfer, and design and implementation of modules for ordering words and phrases, independently of the source language.
Dickinson, Markus
We describe a framework for performing morphological analysis to account for learner language, focusing on Russian as an example of an inflecting language. Because a set of linguistic analyses is needed to provide feedback on potentially noisy data, there is a large amount of ambiguity for even well-formed words. Using a segmented POS lexicon as a test case, we show how to analyze subparts of words, in order to analyze variations. After describing and implementing this framework for Russian, we focus on removing undesirable analyses to keep the task feasible. This is essentially an investigation of how much overgeneration of analyses is a problem and under what assumptions it can be reduced.
HauserBordalo, Gabriela
We recall some notions introduced and developed by António Aniceto Monteiro, and show how these notions have been used and generalised, thus establishing a direct and indirect influence of Monteiro’s work that extends to this day.
Cornejo, Juan Manuel
The purpose of this paper is to define a new logic $\mathcal{SI}$ called semi-intuitionistic logic such that the semi-Heyting algebras introduced in [4] by Sankappanavar are the semantics for $\mathcal{SI}$. Besides, the intuitionistic logic will be an axiomatic extension of $\mathcal{SI}$.
Castaño, Valeria; Muñoz Santis, Marcela
In this paper we obtain characterizations of subalgebras of Heyting algebras and De Morgan Heyting algebras. In both cases we obtain these characterizations by defining certain equivalence relations on the Priestley-type topological representations of the corresponding algebras. As a particular case we derive the characterization of maximal subalgebras of Heyting algebras given by M. Adams for the finite case.
Blyth, T. S.; Fang, J.
A pO-algebra $(L; f, {}^{\star})$ is an algebra in which $(L; f)$ is an Ockham algebra, $(L; {}^{\star})$ is a p-algebra, and the unary operations $f$ and ${}^{\star}$ commute. Here we consider the endomorphism monoid of such an algebra. If $(L; f, {}^{\star})$ is a subdirectly irreducible $pK_{1,1}$-algebra then every endomorphism $\vartheta$ is a monomorphism or $\vartheta^3 = \vartheta$. When L is finite the endomorphism monoid of L is regular, and we determine precisely when it is a Clifford monoid.
Cimadamore, C.; Díaz Varela, J. P.
In this paper we extend Mundici’s functor Γ to the category of monadic MV-algebras. More precisely, we define monadic ℓ-groups and we establish a natural equivalence between the category of monadic MV-algebras and the category of monadic ℓ-groups with strong unit. Some applications thereof are given.
Campercholi, M.; Vaggione, D.
Let A be an algebra. We say that the functions $f_1, \ldots, f_m : A^n \to A$ are algebraic on A provided there is a finite system of term-equalities $\bigwedge t_k(\overline{x}, \overline{z}) = s_k(\overline{x}, \overline{z})$ satisfying that for each $\overline{a} \in A^n$, the m-tuple $(f_1(\overline{a}), \ldots, f_m(\overline{a}))$ is the unique solution in $A^m$ to the system $\bigwedge t_k(\overline{a}, \overline{z}) = s_k(\overline{a}, \overline{z})$. In this work we present a collection of general tools for the study of algebraic functions, and apply them to obtain characterizations for algebraic functions on distributive lattices, Stone algebras, finite abelian groups and vector spaces, among other well-known algebraic structures.
Sankappanavar, H. P.
This paper is a contribution toward developing a theory of expansions of semi-Heyting algebras. It grew out of an attempt to settle a conjecture we had made in 1987. Firstly, we unify and extend strikingly similar results of [48] and [50] to the (new) equational class DHMSH of dually hemimorphic semi-Heyting algebras, or to its subvariety BDQDSH of blended dual quasi-De Morgan semi-Heyting algebras, thus settling the conjecture. Secondly, we give a criterion for a unary expansion of semi-Heyting algebras to be a discriminator variety and give an algorithm to produce discriminator varieties. We then apply the criterion to exhibit an increasing sequence of discriminator subvarieties of BDQDSH. We also use it to prove that the variety DQSSH of dually quasi-Stone semi-Heyting algebras is a discriminator variety. Thirdly, we investigate a binary expansion of semi-Heyting algebras, namely the variety DblSH of double semi-Heyting algebras, by characterizing its simples, and use the characterization to present an increasing sequence of discriminator subvarieties of DblSH. Finally, we apply these results to give bases for “small” subvarieties of BDQDSH, DQSSH, and DblSH.
Bezhanishvili, Guram; Jansana, Ramon
We generalize Priestley duality for distributive lattices to a duality for distributive meet-semilattices. On the one hand, our generalized Priestley spaces are easier to work with than Celani’s DS-spaces, and are similar to Hansoul’s Priestley structures. On the other hand, our generalized Priestley morphisms are similar to Celani’s meet-relations and are more general than Hansoul’s morphisms. As a result, our duality extends Hansoul’s duality and is an improvement of Celani’s duality.
Cignoli, Roberto
Let Γ be Mundici’s functor from the category $\mathcal{LG}$ whose objects are the lattice-ordered abelian groups (ℓ-groups for short) with a distinguished strong order unit and the morphisms are the unital homomorphisms, onto the category $\mathcal{MV}$ of MV-algebras and homomorphisms. It is shown that for each strong order unit u of an ℓ-group G, the Boolean skeleton of the MV-algebra Γ(G, u) is isomorphic to the Boolean algebra of factor congruences of G.
Castaño, D.; Díaz Varela, J. P.; Torrens, A.
In this paper we prove that the free pseudocomplemented residuated lattices are decomposable if and only if they are Stone, i.e., if and only if they satisfy the identity ¬x ∨ ¬¬x = 1. Some applications are given.
Celani, Sergio A.
In this note we introduce the variety $\mathcal{CDM}_\square$ of classical modal De Morgan algebras as a generalization of the variety $\mathcal{TMA}$ of Tetravalent Modal algebras studied in [11]. We show that the variety $\mathcal{V}_0$ defined by H. P. Sankappanavar in [13], and the variety S of Involutive Stone algebras introduced by R. Cignoli and M. S. de Gallego in [5], are examples of classical modal De Morgan algebras. We give a representation theory, and we study the regular filters, i.e., lattice filters closed under an implication operation. Finally we prove that the variety $\mathcal{TMA}$ has the Amalgamation Property and the Superamalgamation Property.
Campercholi, M.; Castaño, D.; Díaz Varela, J. P.
In this paper we study some questions concerning Łukasiewicz implication algebras. In particular, we show that every subquasivariety of Łukasiewicz implication algebras is, in fact, a variety. We also derive some characterizations of congruence permutable algebras. The starting point for these results is a representation of finite Łukasiewicz implication algebras as upwardly-closed subsets in direct products of MV-chains.
Díaz Varela, J. P.; López Martinolich, B. F.
There is a constructive method to define a structure of simple k-cyclic Post algebra of order p, $L_{p,k}$, on a given finite field $F(p^k)$, and conversely. There exists an interpretation $\Phi_1$ of the variety $\mathcal{V}(L_{p,k})$ generated by $L_{p,k}$ into the variety $\mathcal{V}(F(p^k))$ generated by $F(p^k)$ and an interpretation $\Phi_2$ of $\mathcal{V}(F(p^k))$ into $\mathcal{V}(L_{p,k})$ such that $\Phi_2\Phi_1(B) = B$ for every $B \in \mathcal{V}(L_{p,k})$ and $\Phi_1\Phi_2(R) = R$ for every $R \in \mathcal{V}(F(p^k))$.
In this paper we show how we can solve an algebraic system of equations over an arbitrary cyclic Post algebra of order p, p prime, using the above interpretation, Gröbner bases and algorithms programmed in Maple.
Ledda, A.; Kowalski, T.; Paoli, F.
Quasi-MV algebras are generalisations of MV algebras arising in quantum computational logic. Although a reasonably complete description of the lattice of subvarieties of quasi-MV algebras has already been provided, the problem of extending this description to the setting of quasivarieties has so far remained open. Given its apparent logical repercussions, we tackle the issue in the present paper. We especially focus on quasivarieties whose generators either are subalgebras of the standard square quasi-MV algebra S, or can be obtained therefrom through the addition of some fixpoints for the inverse.
Castiglioni, J. L.; San Martín, H. J.
This work extends to residuated lattices the results of [7]. It also provides a possible generalization to this context of frontal operators in the sense of [9].
Let L be a residuated lattice, and f : L^{k} → L a function. We give a necessary and sufficient condition for f to be compatible with respect to every congruence on L. We use this characterization of compatible functions in order to prove that the variety of residuated lattices is locally affine complete.
We study some compatible functions on residuated lattices which are a generalization of frontal operators. We also give conditions for two operations P(x, y) and Q(x, y) on a residuated lattice L which imply that the function $x \mapsto \min\{y \in L : P(x, y) \leq Q(x, y)\}$, when defined, is equational and compatible. Finally we discuss the affine completeness of residuated lattices equipped with some additional operators.
Scannell, Kevin P.
Many languages in Africa are written using Latin-based scripts, but often with extra diacritics (e.g. dots below in Igbo: ị, ọ, ụ) or modifications to the letters themselves (e.g. open vowels “e” and “o” in Lingala: ɛ, ɔ). While it is possible to render these characters accurately in Unicode, oftentimes keyboard input methods are not easily accessible or are cumbersome to use, and so the vast majority of electronic texts in many African languages are written in plain ASCII. We call the process of converting an ASCII text to its proper Unicode form unicodification. This paper describes an open-source package which performs automatic unicodification, implementing a variant of an algorithm described in previous work of De Pauw, Wagacha, and de Schryver. We have trained models for more than 100 languages using web data, and have evaluated each language using a range of feature sets.
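To make the notion of unicodification concrete, here is a deliberately simple word-lookup sketch. It is not the algorithm of the package described in this abstract: it merely learns, from a list of correctly written words, which Unicode spelling most often corresponds to each ASCII form. The function names and the tiny word list are illustrative assumptions, and the sample tokens are used only as strings.

```python
import unicodedata

# A few modified letters that have no combining-mark decomposition.
SPECIAL = {"ɛ": "e", "ɔ": "o", "Ɛ": "E", "Ɔ": "O"}

def asciify(word):
    """Strip combining marks and map the modified letters above to ASCII."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(
        SPECIAL.get(ch, ch) for ch in decomposed
        if not unicodedata.combining(ch)
    )

def build_lexicon(unicode_words):
    """Map each ASCII form to its most frequent Unicode spelling."""
    counts = {}
    for word in unicode_words:
        word = unicodedata.normalize("NFC", word)
        forms = counts.setdefault(asciify(word), {})
        forms[word] = forms.get(word, 0) + 1
    return {key: max(forms, key=forms.get) for key, forms in counts.items()}

def unicodify(text, lexicon):
    """Replace each known ASCII token; unknown tokens are left unchanged."""
    return " ".join(lexicon.get(tok, tok) for tok in text.split())

# Illustrative training tokens only.
lexicon = build_lexicon(["ụlọ", "mɔkɔ", "ụlọ"])
print(unicodify("ulo moko xyz", lexicon))
```

A lookup of this kind cannot handle unseen words or context-dependent ambiguity, which is where trained models and richer feature sets, as mentioned in the abstract, come in.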
Barreiro, Anabela; Scott, Bernard; Kasper, Walter; Kiefer, Bernd
This paper reviews the OpenLogos rule-based machine translation system, and describes its model architecture as an incremental pipeline process. The paper also describes OpenLogos resources and their customization to specific application domains. One of the key aspects of rule-based machine translation systems’ intelligence is the symbology employed by these systems in representing natural language internally. The paper offers details about the OpenLogos semantico-syntactic abstract representation language known as SAL. The paper also shows how OpenLogos has addressed classic problems of rule-based machine translation, such as the cognitive complexity and ambiguity encountered in natural language processing, illustrating how SAL helps overcome them in ways distinct from other existing rule-based machine translation systems. The paper illustrates how the intelligence inherent in SAL contributes to translation quality, presenting examples of OpenLogos output of a kind that non-linguistic systems would likely have difficulty emulating. The paper shows the unique manner in which OpenLogos applies the rule base to the input stream and the kind of results produced that are characteristic of the OpenLogos output. Finally, the paper deals with an important advantage of rule-based machine translation systems, namely, the customization and adaptation to application-specific needs with respect to their special terminology and transfer requirements. OpenLogos offers users a set of comfortable customization tools that do not require special knowledge of the system internals. An overview of the possibilities that these tools provide is presented.
Forcada, Mikel L.; Ginestí-Rosell, Mireia; Nordfalk, Jacob; O’Regan, Jim; Ortiz-Rojas, Sergio; Pérez-Ortiz, Juan Antonio; Sánchez-Martínez, Felipe; Ramírez-Sánchez, Gema; Tyers, Francis M.
Apertium is a free/open-source platform for rule-based machine translation. It is being widely used to build machine translation systems for a variety of language pairs, especially in those cases (mainly with related-language pairs) where shallow transfer suffices to produce good quality translations, although it has also proven useful in assimilation scenarios with more distant pairs involved. This article summarises the Apertium platform: the translation engine, the encoding of linguistic data, and the tools developed around the platform. The present limitations of the platform and the challenges posed for the coming years are also discussed. Finally, evaluation results for some of the most active language pairs are presented. An appendix describes Apertium as a free/open-source project.
Farrús, Mireia; Costa-jussà, Marta R.; Mariño, José B.; Poch, Marc; Hernández, Adolfo; Henríquez, Carlos; Fonollosa, José A. R.
This work aims to improve an N-gram-based statistical machine translation system between the Catalan and Spanish languages, trained with an aligned Spanish–Catalan parallel corpus consisting of 1.7 million sentences taken from El Periódico newspaper. Starting from a linguistic error analysis of this baseline system, orthographic, morphological, lexical, semantic and syntactic problems are approached using a set of techniques. The proposed solutions include the development and application of additional statistical techniques, text pre- and post-processing tasks, and rules based on the use of grammatical categories, as well as lexical categorization. The performance of the improved system is clearly higher, as shown in both human and automatic evaluations of the system, with a gain of about 1.1 BLEU points observed in the Spanish-to-Catalan direction of translation, and a gain of about 0.5 points in the reverse direction. The final system is freely available online as a linguistic resource.
Wong, Wilson; Liu, Wei; Bennamoun, Mohammed
The role of the Web for text corpus construction is becoming increasingly significant. However, the contribution of the Web is largely confined to building a general virtual corpus or low-quality specialised corpora. In this paper, we introduce a new technique called SPARTAN for constructing specialised corpora from the Web by systematically analysing website contents. Our evaluations show that the corpora constructed using our technique are independent of the search engines employed. In particular, SPARTAN-derived corpora outperform all corpora based on existing techniques for the task of term recognition.
Sak, Haşim; Güngör, Tunga; Saraçlar, Murat
We present a set of language resources and tools (a morphological parser, a morphological disambiguator, and a text corpus) for exploiting Turkish morphology in natural language processing applications. The morphological parser is a state-of-the-art finite-state transducer-based implementation of Turkish morphology. The disambiguator is based on the averaged perceptron algorithm and has the best accuracy reported for Turkish in the literature. The text corpus has been compiled from the web and contains about 500 million tokens. This is the largest Turkish web corpus published.
Costa-jussà, Marta R.; Fonollosa, José A. R.; Monte, Enric
Statistical machine translation (SMT) is based on alignment models which learn from bilingual corpora the word correspondences between source and target language. These models are assumed to be capable of learning reorderings. However, the difference in word order between two languages is one of the most important sources of errors in SMT. In this paper, we show that SMT can take advantage of inductive learning in order to solve reordering problems. Given a word alignment, we identify those pairs of consecutive source blocks (sequences of words) whose translation is swapped, i.e. those blocks which, if swapped, generate a correct monotonic translation. Afterwards, we classify these pairs into groups, recursively following a co-occurrence block criterion, in order to infer reorderings. Inside the same group, we allow new internal combinations in order to generalize the reordering to unseen pairs of blocks. Then, we identify the pairs of blocks in the source corpora (both training and test) which belong to the same group. We swap them and use the modified source training corpora to realign and to build the final translation system. We have evaluated our reordering approach in terms of both alignment and translation quality. In addition, we have used two state-of-the-art SMT systems: a phrase-based and an N-gram-based one. Experiments are reported on the EuroParl task, showing improvements of almost 1 point in the standard MT evaluation metrics (mWER and BLEU).
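The first step described above, finding consecutive source blocks whose translation is swapped, can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the authors' procedure: the alignment is taken to be one-to-one and given as a list `align` where `align[i]` is the target position of source word `i`, blocks are maximal runs of source words with consecutive target positions, and the grouping and generalization steps are not shown. All function names are hypothetical.

```python
def monotone_blocks(align):
    """Split source positions into maximal runs whose target positions rise by 1."""
    blocks = []
    start = 0
    for i in range(1, len(align) + 1):
        if i == len(align) or align[i] != align[i - 1] + 1:
            blocks.append((start, i - 1))  # inclusive source span
            start = i
    return blocks


def swapped_pairs(align):
    """Return adjacent source blocks (A, B) where B's translation directly precedes A's."""
    blocks = monotone_blocks(align)
    pairs = []
    for (a_start, a_end), (b_start, b_end) in zip(blocks, blocks[1:]):
        if align[b_end] + 1 == align[a_start]:
            pairs.append(((a_start, a_end), (b_start, b_end)))
    return pairs


def swap(tokens, pair):
    """Reorder the source tokens so that block B comes before block A."""
    (a_start, a_end), (b_start, b_end) = pair
    return (tokens[:a_start] + tokens[b_start:b_end + 1]
            + tokens[a_start:a_end + 1] + tokens[b_end + 1:])


# Toy usage: source word 3 is translated before source words 1 and 2.
source = ["s0", "s1", "s2", "s3"]
align = [0, 2, 3, 1]
for pair in swapped_pairs(align):
    print(pair, swap(source, pair))
```

Swapping the detected pair puts the source words in the order of their translations, which is the sense in which the swap yields a monotonic translation.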
Gamallo, Pablo; Bordag, Stefan
In this paper, we analyze the behaviour of Singular Value Decomposition in a number of word similarity extraction tasks, namely acquisition of translation equivalents from comparable corpora. Special attention is paid to two different aspects: computational efficiency and extraction quality. The main objective of the paper is to describe several experiments comparing methods based on Singular Value Decomposition (SVD) to other strategies. The results lead us to conclude that SVD makes the extraction less computationally efficient and much less precise than other more basic models for the task of extracting translation equivalents from comparable corpora.
Pociello, Elisabete; Agirre, Eneko; Aldezabal, Izaskun
Semantic interpretation of language requires extensive and rich lexical knowledge bases (LKB). The Basque WordNet is an LKB based on WordNet and its multilingual counterparts EuroWordNet and the Multilingual Central Repository. This paper reviews the theoretical and practical aspects of the Basque WordNet lexical knowledge base, as well as the steps and methodology followed in its construction. Our methodology is based on the joint development of wordnets and annotated corpora. The Basque WordNet contains 32,456 synsets and 26,565 lemmas, and is complemented by a hand-tagged corpus comprising 59,968 annotations.
Bijankhan, Mahmood; Sheykhzadegan, Javad; Bahrani, Mohammad; Ghayoomi, Masood
This paper addresses some of the issues encountered while building a written language resource, called ‘Peykare’, for contemporary Persian. After defining five linguistic varieties and 24 different registers based on these linguistic varieties, we collected texts for Peykare to carry out a linguistic analysis, including cross-register differences. For tokenization of Persian, we propose a descriptive generalization to normalize the orthographic variations existing in texts. To annotate Peykare, we use the EAGLES guidelines, which result in a hierarchy of part-of-speech tags. To this aim, we apply a semi-automatic approach as the annotation methodology. In the paper, we also pay special attention to the Ezafe construction and homographs, which are important in Persian text analyses.
Boccuni, Francesca
PG (Plural Grundgesetze) is a predicative monadic second-order system which aims to derive second-order Peano arithmetic. It exploits the notion of plural quantification and a few Fregean devices, among which the infamous Basic Law V. In this paper, a model-theoretical consistency proof for the system PG is provided.
Pambuccian, Victor
Using the axiom system provided by Carsten Augat in [1], it is shown that the only 6-variable statement among the axioms of the axiom system for plane hyperbolic geometry (in Tarski’s language L_{B≡}) that we provided in [3] is superfluous. The resulting axiom system is the simplest possible one, in the sense that each axiom is a statement in prenex form about at most 5 points, and there is no axiom system consisting entirely of at most 4-variable statements.
Seki, Takahiro
γ-admissibility is one of the most important problems in the realm of relevant logics. To prove γ-admissibility, either the method of normal models or the method using metavaluations may be employed. The γ-admissibility of a wide class of relevant modal logics has been discussed in Part I based on the former method, but γ-admissibility based on metavaluations has not hitherto been fully considered. Sahlqvist axioms are well known as a means of expressing generalized forms of formulas with modal operators. This paper shows that γ is admissible for relevant modal logics with restricted Sahlqvist axioms in terms of the method using metavaluations.
Shtakser, G.; Leonenko, L.
It is known that the Restricted Predicate Calculus (RPC) can be embedded in an elementary theory, the signature of which consists of exactly two equivalences. Some special models for this theory were constructed to prove this fact. Besides the formal adequacy of these models, a question may be posed concerning their conceptual simplicity, that is, the “transparency” of the interpretations they assign to the two stated equivalences. In works known to us these interpretations are rather complex, and can be called “technical”, serving only the purpose of embedding. We propose a conversion method, which transforms an arbitrary model of RPC into some model of the elementary theory TR, which includes three equivalences. RPC is embeddable in TR, and it appears possible to assign some “natural” interpretations to the three equivalences using the “Track of Relation” concept (abbreviated to TR).
Bourdaillet, Julien; Huet, Stéphane; Langlais, Philippe; Lapalme, Guy
As basic as bilingual concordancers may appear, they are some of the most widely used computer-assisted translation tools among professional translators. Nevertheless, they still do not benefit from recent breakthroughs in machine translation. This paper describes the improvement of the commercial bilingual concordancer TransSearch in order to embed a word alignment feature. The use of statistical word alignment methods allows the system to spot user query translations, and thus the tool is transformed into a translation search engine. We describe several translation identification and post-processing algorithms that enhance the application. The excellent results obtained using a large translation memory consisting of 8.3 million sentence pairs are confirmed via human evaluation.
Clough, Paul; Stevenson, Mark
Plagiarism is widely acknowledged to be a significant and increasing problem for higher education institutions (McCabe 2005; Judge 2008). A wide range of solutions, including several commercial systems, have been proposed to assist the educator in the task of identifying plagiarised work, or even to detect it automatically. Direct comparison of these systems is made difficult by the problems in obtaining genuine examples of plagiarised student work. We describe our initial experiences with constructing a corpus consisting of answers to short questions in which plagiarism has been simulated. This corpus is designed to represent types of plagiarism that are not included in existing corpora and will be a useful addition to the set of resources available for the evaluation of plagiarism detection systems.
