Biçici, Ergun; Way, Andy
Referential translation machines (RTMs) are a computational model effective at judging monolingual and bilingual similarity while identifying translation acts between any two data sets with respect to interpretants, data close to the task instances. RTMs pioneer a language-independent approach to all similarity tasks and remove the need to access any task- or domain-specific information or resource. We use RTMs for predicting the semantic similarity of text and present state-of-the-art results showing that RTMs can achieve better results on the test set than on the training set. Interpretants are used to derive features measuring the closeness of the test sentences to the training data, the difficulty of translating them, and the presence of the acts of translation, which may ubiquitously be observed in communication. RTMs can achieve top performance at SemEval in various semantic similarity prediction tasks as well as similarity prediction tasks in bilingual settings. We obtain rankings of various prediction tasks using the performance of RTM and relative evaluation metrics, which can help identify which tasks and subtasks require more work by design.
Li, Liangyou; Parra Escartín, Carla; Way, Andy; Liu, Qun
The combination of translation memories (TMs) and statistical machine translation (SMT) has been demonstrated to be beneficial. In this paper, we present a combination approach which integrates TMs into SMT by using sparse features extracted at runtime during decoding. These features can be used on both phrase-based SMT and syntax-based SMT. We conducted experiments on a publicly available English–French data set and an English–Spanish industrial data set. Our experimental results show that these features significantly improve our phrase-based and syntax-based SMT baselines on both language pairs.
Zubiaga, Arkaitz; San Vicente, Iñaki; Gamallo, Pablo; Pichel, José Ramom; Alegria, Iñaki; Aranberri, Nora; Ezeiza, Aitzol; Fresno, Víctor
Language identification, the task of determining the language a given text is written in, has progressed substantially in recent decades. However, three main issues remain unresolved: (1) distinguishing similar languages, (2) detecting multilingualism in a single document, and (3) identifying the language of short texts. In this paper, we describe our work on the development of a benchmark to encourage further research in these three directions, set forth an evaluation framework suitable for the task, and make a dataset of annotated tweets publicly available for research purposes. We also describe the shared task we organized to validate and assess the evaluation framework and dataset with systems submitted by seven different participants, and analyze the performance of these systems. The evaluation of the results submitted by the participants of the shared task helped us shed some light on the shortcomings of state-of-the-art language identification systems, and gives insight into the extent to which the brevity, multilingualism, and language similarity found in texts degrade the performance of language identifiers. Our dataset of nearly 35,000 tweets and the evaluation framework provide researchers and practitioners with suitable resources to further study the aforementioned issues in language identification within a common setting in which results can be compared with one another.
Barbu, Eduard; Parra Escartín, Carla; Bentivogli, Luisa; Negri, Matteo; Turchi, Marco; Orasan, Constantin; Federico, Marcello
This paper reports on the organization and results of the first Automatic Translation Memory Cleaning Shared Task. This shared task is aimed at finding automatic ways of cleaning translation memories (TMs) that have not been properly curated and thus include incorrect translations. As a follow-up to the shared task, we also conducted two surveys, one targeting the teams participating in the shared task, and the other targeting professional translators. While the researcher-oriented survey aimed at gathering the participants' opinions on the shared task, the translator-oriented survey aimed to better understand what constitutes a good TM unit and to inform decisions that will be taken in future editions of the task. In this paper, we report on the process of data preparation and the evaluation of the automatic systems submitted, as well as on the results of the collected surveys.
Silfverberg, Miikka; Ruokolainen, Teemu; Lindén, Krister; Kurimo, Mikko
This paper describes FinnPos, an open-source morphological tagging and lemmatization toolkit for Finnish. The morphological tagging model is based on the averaged structured perceptron classifier. Given training data, new taggers are estimated in a computationally efficient manner using a combination of beam search and model cascade. The lemmatization is performed employing a combination of a rule-based morphological analyzer, OMorFi, and a data-driven lemmatization model. The toolkit is readily applicable for tagging and lemmatization of running text with models learned from the recently published Finnish Turku Dependency Treebank and FinnTreeBank. Empirical evaluation on these corpora shows that FinnPos performs favorably compared to reference systems in terms of tagging and lemmatization accuracy. In addition, we demonstrate that our system is highly competitive with regard to computational efficiency of learning new models and assigning analyses to novel sentences.
Wolff, Friedel
We present a system to identify erroneous entries in a translation memory. It is a machine learning system that learns to classify entries according to either a strict or a permissive view on correctness. It is trained on features relating to segment length, translation quality checks, spelling and grammar errors, and additionally uses external data for detecting problems with fluency and lexical choice.
Wolff, Friedel; Pretorius, Laurette; Dugast, Loïc; Buitelaar, Paul
A translation memory system attempts to retrieve useful suggestions from previous translations to assist a translator in a new translation task. While assisting the translator with a specific segment, some similarity metric is usually employed to select the best matches from previously translated segments to present to a translator. Automated methods for evaluating a translation memory system usually use reference translations and some similarity metric. Such evaluation methods might be expected to assist in choosing between competing systems. No single evaluation method has gained widespread use; additionally the similarity metric used in each of these methods is not standardised either. This paper investigates the consequences of substituting the similarity metric in such an evaluation method, and finds that the similarity metrics exhibit a strong bias for the system using the same metric for retrieval. Consequently the choice of similarity metric in the evaluation of translation memory systems should be carefully reconsidered.
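As an illustration of the kind of similarity metric discussed in this abstract, the following is a minimal sketch of a word-level, edit-distance-based fuzzy match score. The abstract does not say which metrics were compared, so the choice of Levenshtein distance and the names `levenshtein`, `fuzzy_match` and `best_match` are assumptions made purely for illustration.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def fuzzy_match(query, candidate):
    """Word-level similarity in [0, 1]; 1.0 means identical segments."""
    q, c = query.split(), candidate.split()
    longest = max(len(q), len(c))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(q, c) / longest


def best_match(query, memory):
    """Return the (source, target) pair whose source is closest to the query.

    `memory` is a hypothetical list of (source, target) segment pairs.
    """
    return max(memory, key=lambda pair: fuzzy_match(query, pair[0]))
```

If the same function is used both to retrieve suggestions and to score them against reference translations, the evaluation can be expected to favour that retrieval method, which is the bias the abstract reports.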
Elvira-García, Wendy; Roseano, Paolo; Fernández-Planas, Ana María; Martínez-Celdrán, Eugenio
This article presents Eti_ToBI, a tool that automatically labels intonational events in Spanish and Catalan utterances according to the current Sp_ToBI and Cat_ToBI conventions. The system consists of a Praat script that assigns ToBI labels to pitch movements, basing the assignments on lexical data entered by the researcher and on the acoustic data that it extracts from sound files. The first part of the article explains the methodological approach that made the automation possible and describes the algorithms used by the script to perform the analysis. The second part presents the reliability results for both the Catalan and Spanish corpora, showing a level of agreement equal to that reported in the literature among human transcribers.
Mollá, Diego; Santiago-Martínez, María Elena; Sarker, Abeed; Paris, Cécile
Evidence-based medicine (EBM) urges the medical doctor to incorporate the latest available clinical evidence at the point of care. A major stumbling block in the practice of EBM is the difficulty of keeping up to date with clinical advances. In this paper we describe a corpus designed for the development and testing of text processing tools for EBM, in particular for tasks related to the extraction and summarisation of answers and corresponding evidence related to a clinical query. The corpus is based on material from the Clinical Inquiries section of The Journal of Family Practice. It was gathered and annotated by a combination of automated information extraction, crowdsourcing tasks, and manual annotation. It has been used for the original summarisation task for which it was designed, as well as for other related tasks such as the appraisal of clinical evidence and the clustering of the results. The corpus is available at SourceForge (http://sourceforge.net/projects/ebmsumcorpus/).
Castaño, Diego; Cornejo, Juan Manuel
The variety $${\mathcal{SH}}$$ of semi-Heyting algebras was introduced by H. P. Sankappanavar (in: Proceedings of the 9th “Dr. Antonio A. R. Monteiro” Congress, Universidad Nacional del Sur, Bahía Blanca, 2008) [13] as an abstraction of the variety of Heyting algebras. Semi-Heyting algebras are the algebraic models for a logic HsH, known as semi-intuitionistic logic, which is equivalent to the one defined by a Hilbert-style calculus in Cornejo (Studia Logica 98(1–2):9–25, 2011) [6]. In this article we introduce a Gentzen-style sequent calculus GsH for the semi-intuitionistic logic whose associated logic GsH is the same as HsH. The advantage of this presentation of the logic is that we can prove a cut-elimination theorem for GsH that allows us to prove the decidability of the logic. As a direct consequence, we also obtain the decidability of the equational theory of semi-Heyting algebras.
Onishi, Takuro
A star-free relational semantics for relevant logic is presented together with a sound and complete sequent proof theory (display calculus). It is an extension of the dualist approach to negation regarded as modality, according to which de Morgan negation in relevant logic is better understood as the confusion of two negative modalities. The present work shows a way to define them in terms of implication and a new connective, co-implication, which are modeled by respective ternary relations. The defined negations are confused by a special constraint on the ternary relations, called the generalized star postulate, which implies definability of the Routley star in the frame. The resultant logic is shown to be equivalent to the well-known relevant logic R. Thus it can be seen as a reconstruction of R in the dualist framework.
Bezhanishvili, Guram; Bezhanishvili, Nick; Ilin, Julia
We generalize the $${(\wedge, \vee)}$$-canonical formulas to $${(\wedge, \vee)}$$-canonical rules, and prove that each intuitionistic multi-conclusion consequence relation is axiomatizable by $${(\wedge, \vee)}$$-canonical rules. This yields a convenient characterization of stable superintuitionistic logics. The $${(\wedge, \vee)}$$-canonical formulas are analogues of the $${(\wedge,\to)}$$-canonical formulas, which are the algebraic counterpart of Zakharyaschev’s canonical formulas for superintuitionistic logics (si-logics for short). Consequently, stable si-logics are analogues of subframe si-logics. We introduce cofinal stable intuitionistic multi-conclusion consequence relations and cofinal stable si-logics, thus answering the question of what the analogues of cofinal subframe logics should be. This is done by utilizing the $${(\wedge,\vee,\neg)}$$-reduct of Heyting algebras. We prove that every cofinal stable si-logic has the finite model property, and that there are continuum many cofinal stable si-logics that are not stable. We conclude with several examples showing the similarities and differences between the classes of stable, cofinal stable, subframe, and cofinal subframe si-logics.
Bazhenov, Nikolay
We investigate effective categoricity for polymodal algebras (i.e., Boolean algebras with distinguished modalities). We prove that the class of polymodal algebras is complete with respect to degree spectra of nontrivial structures, effective dimensions, expansion by constants, and degree spectra of relations. In particular, this implies that every categoricity spectrum is the categoricity spectrum of a polymodal algebra.
Eva, Benjamin
Topos quantum theory (TQT) represents a whole new approach to the formalization of nonrelativistic quantum theory. It is well known that TQT replaces the orthomodular quantum logic of the traditional Hilbert space formalism with a new intuitionistic logic that arises naturally from the topos theoretic structure of the theory. However, it is less well known that TQT also has a dual logical structure that is paraconsistent. In this paper, we investigate the relationship between these two logical structures and study the implications of this relationship for the definition of modal operators in TQT.
Marti, J.; Pinosio, R.
In this paper we introduce a game semantics for System P, one of the most studied axiomatic systems for non-monotonic reasoning, conditional logic and belief revision. We prove soundness and completeness of the game semantics with respect to the rules of System P, and show that an inference is valid with respect to the game semantics if and only if it is valid with respect to the standard order semantics of System P. Combining these two results leads to a new completeness proof for System P with respect to its order semantics. Our approach allows us to construct for every inference either a concrete proof of the inference from the rules in System P or a countermodel in the order semantics. Our results rely on the notion of a witnessing set for an inference, whose existence is a concise, necessary and sufficient condition for validity of an inference in System P. We also introduce an infinitary variant of System P and use the game semantics to show its completeness for the restricted class of well-founded orders.
Giuntini, Roberto; Ledda, Antonio; Paoli, Francesco
We investigate certain Brouwer-Zadeh lattices that serve as abstract counterparts of lattices of effects in Hilbert spaces under the spectral ordering. These algebras, called PBZ*-lattices, can also be seen as generalisations of orthomodular lattices and are remarkable for the collapse of three notions of “sharpness” that are distinct in general Brouwer-Zadeh lattices. We investigate the structure theory of PBZ*-lattices and their reducts; in particular, we prove some embedding results for PBZ*-lattices and provide an initial description of the lattice of PBZ*-varieties.
Přenosil, Adam
The proofs of some results of abstract algebraic logic, in particular of the transfer principle of Czelakowski, assume the existence of so-called natural extensions of a logic by a set of new variables. Various constructions of natural extensions, claimed to be equivalent, may be found in the literature. In particular, these include a syntactic construction due to Shoesmith and Smiley and a related construction due to Łoś and Suszko. However, it was recently observed by Cintula and Noguera that both of these constructions fail in the sense that they do not necessarily yield a logic. Here we show that whenever the Łoś–Suszko construction yields a logic, so does the Shoesmith–Smiley construction, but not vice versa. We also describe the smallest and the largest conservative extension of a logic by a set of new variables and show that, contrary to some previous claims in the literature, a logic of cardinality $${\kappa}$$ may have more than one conservative extension of cardinality $${\kappa}$$ by a set of new variables. In this connection we then correct a mistake in the formulation of a theorem of Dellunde and Jansana.
Goudsmit, Jeroen P.
Many intermediate logics, even extremely well-behaved ones such as IPC, lack the finite model property for admissible rules. We give conditions under which this failure holds. We show that frames which validate all admissible rules necessarily satisfy a certain closure condition, and we prove that this condition, in the finite case, ensures that the frame is of width 2. Finally, we indicate how this result is related to some classical results on finite, free Heyting algebras.
Nakazawa, Koji; Fujita, Kenetsu
This paper gives new confluence proofs for several lambda calculi with permutation-like reduction, including lambda calculi corresponding to intuitionistic and classical natural deduction with disjunction and permutative conversions, and a lambda calculus with explicit substitutions. For lambda calculi with permutative conversion, the naïve parallel reduction technique does not work, and (if we consider untyped terms, and hence do not use strong normalization) the traditional notion of residuals is required, as Ando pointed out. This paper shows that the difficulties can be avoided by extending the technique proposed by Dehornoy and van Oostrom, called the Z theorem: the existence of a mapping on terms with the Z property implies confluence. Since it is still hard to directly define a mapping with the Z property for the lambda calculi with permutative conversions, this paper extends the Z theorem to compositional functions, called compositional Z, and shows that it can be applied to these calculi.
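For reference, the Z property invoked in this abstract is usually stated as follows. This is a paraphrase of the standard formulation, not text from the paper; $${f}$$ denotes the mapping on terms, $${\to}$$ one-step reduction, and $${\to^{*}}$$ its reflexive-transitive closure:

$${a \to b \;\Longrightarrow\; b \to^{*} f(a) \ \text{ and } \ f(a) \to^{*} f(b)}$$

The name comes from the shape of the diagram formed by the step $${a \to b}$$, the diagonal $${b \to^{*} f(a)}$$, and the bottom edge $${f(a) \to^{*} f(b)}$$.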
Joosten, Joost J.
Turing progressions have often been used to measure the proof-theoretic strength of mathematical theories: iterate adding consistency of some weak base theory until you “hit” the target theory. Turing progressions based on n-consistency give rise to a $${\Pi_{n+1}}$$ proof-theoretic ordinal $${U_{\Pi^0_{n+1}}}$$, also denoted $${U_n}$$. As such, to each theory U we can assign the sequence of corresponding $${\Pi_{n+1}}$$ ordinals $${\langle U_n\rangle_{n > 0}}$$. We call this sequence a Turing-Taylor expansion or spectrum of a theory. In this paper, we relate Turing-Taylor expansions of subtheories of Peano Arithmetic to Ignatiev’s universal model for the closed fragment of the polymodal provability logic $${\mathsf{GLP}_\omega}$$. In particular, we observe that each point in the Ignatiev model can be seen as Turing-Taylor expansions of formal mathematical theories. Moreover, each subtheory of Peano Arithmetic that allows for a Turing-Taylor expansion will define a unique point in Ignatiev’s model.
Hammo, Bassam; Yagi, Sane; Ismail, Omaima; Abu-Shariah, Mohammad
This paper presents a historical Arabic corpus named HAC. At this early stage of the project, we report on the design, the architecture and some of the experiments which we have conducted on HAC. The corpus, and accordingly the search results, will be represented using a primary XML exchange format. This will serve as an intermediate exchange tool within the project and will allow the user to process the results offline using external tools. HAC is made up of Classical Arabic texts that cover 1600 years of language use, the Quranic text, Modern Standard Arabic texts, as well as a variety of monolingual Arabic dictionaries. The development of this historical corpus helps linguists and Arabic language learners to effectively explore, understand, and discover interesting knowledge hidden in millions of instances of language use. We used techniques from the field of natural language processing to process the data and a graph-based representation for the corpus. We provided researchers with an export facility to make further linguistic analysis possible.
Bawden, Rachel; Clavel, Chloé; Landragin, Frédéric
We present a corpus-based prosodic analysis with the aim of uncovering the relationship between dialogue acts, personality and prosody, with a view to providing guidelines for the ECA Greta’s text-to-speech system. The corpus used is the SEMAINE corpus, featuring four different personalities, further annotated for dialogue acts and prosodic features. In order to show the importance of the choice of dialogue act taxonomy, two different taxonomies were used, the first corresponding to Searle’s taxonomy of speech acts and the second, inspired by Bunt’s DIT++, including a division of directive acts into finer categories. Our results show that finer-grained distinctions are important when choosing a taxonomy. We also show with some preliminary results that the prosodic correlates of dialogue acts are not always as cited in the literature and prove more complex and variable. By studying the realisation of different directive acts, we also observe differences in the communicative strategies of the ECA depending on personality, with a view to providing input to a speech system.
Takano, Mitio
Sequent calculi for trilattice logics, including those that are determined by the truth entailment, the falsity entailment and their intersection, are given. This partly answers the problems in Shramko-Wansing (J Philos Logic 34:121–153, 2005).
Badia, Guillermo
Bi-intuitionistic logic is the result of adding the dual of intuitionistic implication to intuitionistic logic. In this note, we characterize the expressive power of this logic by showing that the first-order formulas equivalent to translations of bi-intuitionistic propositional formulas are exactly those preserved under bi-intuitionistic directed bisimulations. The proof technique is originally due to Lindström and, in contrast to the most common proofs of this kind of result, it uses the machinery of neither saturated models nor elementary chains.
Torrens, Antoni
In any variety of bounded integral residuated lattice-ordered commutative monoids (bounded residuated lattices for short) the class of its semisimple members is closed under isomorphic images, subalgebras and products, but it is not closed under homomorphic images, and so it is not a variety. In this paper we study varieties of bounded residuated lattices whose semisimple members form a variety, and we give an equational presentation for them. We also study locally representable varieties whose semisimple members form a variety. Finally, we analyze the relationship with the property “to have radical term”, especially for k-radical varieties, and for the hierarchy of varieties (WL_{k})_{k>0} defined in Cignoli and Torrens (Studia Logica 100:1107–1136, 2012 [7]).
Straßer, Christian; Beirlaen, Mathieu; Van De Putte, Frederik
We translate unconstrained and constrained input/output logics as introduced by Makinson and van der Torre to modal logics, using adaptive logics for the constrained case. The resulting reformulation has some additional benefits. First, we obtain a proof-theoretic (dynamic) characterization of input/output logics. Second, we demonstrate that our framework naturally gives rise to useful variants and allows one to express important notions that go beyond the expressive means of input/output logics, such as violations and sanctions.
Urzyczyn, Paweł
We investigate a simple game paradigm for intuitionistic logic, inspired by Wajsberg’s implicit inhabitation algorithm and Beth tableaux. The principal idea is that one player, ∃ros, is trying to construct a proof in normal form (positions in the game represent his progress in proof construction) while his opponent, ∀phrodite, attempts to build a countermodel (positions or plays can be seen as states in a Kripke model). The determinacy of the game (a proof-construction and a model-construction game in one) therefore implies both completeness and semantic cut-elimination.
Poggiolesi, Francesca
In this paper we present labelled sequent calculi and labelled natural deduction calculi for the counterfactual logics CK + {ID, MP}. As for the sequent calculi we prove, in a semantic manner, that the cut rule is admissible. As for the natural deduction calculi we prove, in a purely syntactic way, the normalization theorem. Finally, we demonstrate that both calculi are sound and complete with respect to Nute semantics [12] and that the natural deduction calculi can be effectively transformed into the sequent calculi.
Figallo Orellano, Aldo
In this paper we investigate the class of MV-algebras equipped with two quantifiers which commute as a natural generalization of diagonal-free two-dimensional cylindric algebras (see Henkin et al., in Cylindric algebras, 1985). In the 1940s, Tarski first introduced cylindric algebras in order to provide an algebraic apparatus for the study of classical predicate calculus. The diagonal-free two-dimensional cylindric algebras are special cylindric algebras. The treatment here of MV-algebras is done in terms of implication and negation. This allows us to simplify some results due to Di Nola and Grigolia (Ann Pure Appl Logic 128(1–3):125–139, 2004) related to the characterization of a quantifier in terms of some special subalgebra associated to it. On the other hand, we present a topological duality for this class of algebras and we apply it to characterize the congruences of one algebra via certain closed sets. Finally, we study the subvariety of this class generated by a chain of length n + 1 (n < ω). We prove that the subvariety is semisimple and we characterize its simple algebras. Using a special functional algebra, we determine all the simple finite algebras of this subvariety.
Asheghi, Noushin Rezapour; Sharoff, Serge; Markert, Katja
Recently, genre collection and automatic genre identification for the web have attracted much attention. However, currently there is no genre-annotated corpus of web pages where inter-annotator reliability has been established, i.e. the corpora are either not tested for inter-annotator reliability or exhibit low inter-coder agreement. Annotation has also mostly been carried out by a small number of experts, leading to concerns with regard to scalability of these annotation efforts and transferability of the schemes to annotators outside these small expert groups. In this paper, we tackle these problems by using crowdsourcing for genre annotation, leading to the Leeds Web Genre Corpus, the first web corpus which is demonstrably reliably annotated for genre and which can be easily and cost-effectively expanded using naive annotators. We also show that the corpus is source and topic diverse.
Taulé, Mariona; Peris, Aina; Rodríguez, Horacio
This article presents the Spanish Iarg-AnCora corpus (400 kwords, 13,883 sentences) annotated with the implicit arguments of deverbal nominalizations (18,397 occurrences). We describe the methodology used to create it, focusing on the annotation scheme and criteria adopted. The corpus was manually annotated and an inter-annotator agreement test was conducted (81 % observed agreement) in order to ensure the reliability of the final resource. The annotation of implicit arguments results in an important gain in argument and thematic role coverage (128 % on average). It is the first corpus annotated with implicit arguments for the Spanish language with a wide coverage that is freely available. This corpus can subsequently be used by machine learning-based semantic role labeling systems, and for the linguistic analysis of implicit arguments grounded on real data. Semantic analyzers are essential components of current language technology applications, which need to obtain a deeper understanding of the text in order to make inferences at the highest level to obtain qualitative improvements in the results.
Bai, Xiaopeng; Xue, Nianwen
The Chinese Proposition Bank (CPB) is a corpus annotated with semantic roles for the arguments of verbal and nominalized predicates. The semantic roles for the core arguments are defined in a predicate-specific manner. That is, a set of semantic roles, numerically identified, are defined for each sense of a predicate lemma and recorded in a valency lexicon called frame files. The predicate-specific manner in which the semantic roles are defined reduces the cognitive burden on the annotators since they only need to internalize a few roles at a time, and this has contributed to the consistency in annotation. It was also a sensible approach given the contentious issue of how many semantic roles are needed if one were to adopt a set of global semantic roles that apply to all predicates. A downside of this approach, however, is that the predicate-specific roles may not be consistent across predicates, and this inconsistency has a negative impact on training automatic systems. Given the progress that has been made in defining semantic roles in the last decade or so, the time is ripe for adopting a set of general semantic roles. In this article, we describe our effort to “re-annotate” the CPB with a set of “global” semantic roles that are predicate-independent and investigate their impact on automatic semantic role labeling systems. When defining these global semantic roles, we strive to make them compatible with a recently published ISO standard on the annotation of semantic roles (ISO 24617-4:2014 SemAF-SR) while taking the linguistic characteristics of the Chinese language into account. We show that in spite of the much larger number of global semantic roles, the accuracy of an off-the-shelf semantic role labeling system retrained on the data re-annotated with global semantic roles is comparable to that trained on the data set with the original predicate-specific semantic roles. We also argue that the re-annotated data set, together with the original data, provides the user with more flexibility when using the corpus.
Dehkharghani, Rahim; Saygin, Yucel; Yanikoglu, Berrin; Oflazer, Kemal
Sentiment analysis aims to extract the sentiment polarity of a given segment of text. Polarity resources that indicate the sentiment polarity of words are commonly used in different approaches. While English is the richest language in regard to having such resources, the majority of other languages, including Turkish, lack polarity resources. In this work we present the first comprehensive Turkish polarity resource, SentiTurkNet, where three polarity scores are assigned to each synset in the Turkish WordNet, indicating its positivity, negativity, and objectivity (neutrality) levels. Our method is general and applicable to other languages. Evaluation results for Turkish show that the polarity scores obtained through this method are more accurate compared to those obtained through direct translation (mapping) from SentiWordNet.
Savkov, Aleksandar; Carroll, John; Koeling, Rob; Cassell, Jackie
The free text notes typed by physicians during patient consultations contain valuable information for the study of disease and treatment. These notes are difficult to process by existing natural language analysis tools since they are highly telegraphic (omitting many words), and contain many spelling mistakes, inconsistencies in punctuation, and nonstandard word order. To support information extraction and classification tasks over such text, we describe a deidentified corpus of free text notes, a shallow syntactic and named entity annotation scheme for this kind of text, and an approach to training domain specialists with no linguistic background to annotate the text. Finally, we present a statistical chunking system for such clinical text with a stable learning rate and good accuracy, indicating that the manual annotation is consistent and that the annotation scheme is tractable for machine learning.
Reimerink, Arianne; León-Araúz, Pilar; Faber, Pamela
Images play an important role in the representation and acquisition of specialized knowledge. Not surprisingly, terminological knowledge bases (TKBs) often include images as a way to enhance the information in concept entries. However, the selection of these images should not be random, but rather based on specific guidelines that take into account the type and nature of the concept being described. This paper presents a proposal on how to combine the features of images with the conceptual propositions in EcoLexicon, a multilingual TKB on the environment. This proposal is based on the following: (1) the combinatory possibilities of concept types; (2) image types, such as photographs, drawings and flow charts; (3) morphological features or visual knowledge patterns (VKPs), such as labels, colours, arrows, and their effect on the functional nature of each image type. Currently, images are stored in association with concept entries according to the semantic content of their definitions, but they are not described or annotated according to the parameters that guided their selection, which would undoubtedly contribute to the systematization and automatization of the process. First, the images included in EcoLexicon were analyzed in terms of their adequateness, the semantic relations expressed, the concept types and their VKPs. Then, with these data, guidelines for image selection and annotation were created. The final aim is twofold: (1) to systematize the selection of images and (2) to start annotating old and new images so that the system can automatically allocate them in different concept entries based on shared conceptual propositions.
Bimba, Andrew; Idris, Norisma; Khamis, Norazlina; Noor, Nurul Fazmidar Mohd
Stemming is the process of reducing a derived or inflected word to its root or stem by stripping its affixes. It has been used as a preprocessing step in applications such as information retrieval, machine translation, and text summarization to increase efficiency. A few stemming algorithms have been developed for languages such as English, Arabic, Turkish, Malay and Amharic. Unfortunately, no algorithm has been used to stem text in Hausa, a Chadic language spoken in West Africa. To address this need, we propose stemming Hausa text using affix-stripping rules and reference lookup. We stemmed Hausa text using 78 affix-stripping rules applied in 4 steps and a reference lookup consisting of 1500 Hausa root words. The overstemming index, understemming index, stemmer weight, word stemmed factor, correctly stemmed words factor and average words conflation factor were calculated to determine the effect of reference lookup on the strength and accuracy of the stemmer. It was observed that reference lookup helped reduce both overstemming and understemming errors, increased accuracy, and tended to reduce the strength of an affix-stripping stemmer. The rationale behind the approach is discussed and directions for future research are identified.
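The combination of affix stripping and reference lookup described in this abstract can be illustrated with a minimal sketch. The affix lists below are hypothetical English-like placeholders, not the paper's 78 Hausa rules, and the single prefix step followed by a single suffix step is an assumed simplification of the 4-step procedure.

```python
# Hypothetical placeholder affixes for illustration only (not Hausa rules).
PREFIXES = ("un", "re")
SUFFIXES = ("ing", "ed", "s")


def stem(word, roots, prefixes=PREFIXES, suffixes=SUFFIXES, min_len=3):
    """Strip at most one prefix and one suffix, consulting a root lexicon.

    `roots` is the reference lookup: a word found there is returned as-is,
    which guards against stripping letters that belong to the root.
    """
    word = word.lower()
    if word in roots:
        return word
    # Step 1: remove the first matching prefix, keeping at least min_len letters.
    for p in prefixes:
        if word.startswith(p) and len(word) - len(p) >= min_len:
            word = word[len(p):]
            break
    if word in roots:
        return word
    # Step 2: remove the first matching suffix under the same length guard.
    for s in suffixes:
        if word.endswith(s) and len(word) - len(s) >= min_len:
            word = word[:-len(s)]
            break
    return word
```

Without the lookup, `stem("string", set())` strips "ing" and overstems the word; with "string" in the root lexicon the word is left intact, which is the kind of effect the abstract attributes to reference lookup.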
Żelasko, Piotr; Ziółko, Bartosz; Jadczyk, Tomasz; Skurzok, Dawid
A corpus of Polish speech, collected for automatic speech recognition (ASR) and text-to-speech (TTS) applications, is presented. The corpus consists of several groups of recordings: read sentences, spoken commands, a phonetically balanced TTS training corpus, telephonic speech and others. The total duration of the recordings exceeds 25 h, and the number of unique speakers is 166. The majority of the speakers are in the 20–35 age group, and one third of them are female.
The frequency of unique word occurrences was analysed in relation to larger text resources, and the most commonly appearing words were identified and are presented. The corpus was used as training data for the ASR system. Cross-validation training and testing of the SARMATA ASR system on our corpus yielded a phrase recognition rate of 91.9 %. The corpus was additionally evaluated in a comparative test against the CORPORA corpus, which showed a major increase in phrase recognition rate in favour of our corpus.
Su, Hang
This study, drawing on insights from the Appraisal framework, the parameter-based approach to evaluation and corpus linguistics, investigates the evaluative language used in customer review texts. The primary goal of this investigation is to develop a framework of evaluation that can be used to account adequately for evaluative expressions in customer review texts, and the ultimate goal is to support the argument that the modelling and theorising of evaluation is context-specific. Based on the investigation into a corpus compiled of review texts retrieved from www.amazon.co.uk, this study proposes a data-driven, parameter-based and appraisal-informed framework of evaluation which comprises four parameters: Quality, Satisfactoriness, Recommendability and Worthiness. Since these parameters are not thought up, but are generalised from real data, it is arguable that the proposed framework of evaluation is certainly valid and thus can be used to describe and analyse evaluative language used in this particular context. This in turn indicates that the description and theorising of evaluation is indeed highly dependent on the discourse type that is under examination.
Metallinou, Angeliki; Yang, Zhaojun; Lee, Chichun; Busso, Carlos; Carnicke, Sharon; Narayanan, Shrikanth
Improvised acting is a viable technique to study expressive human communication and to shed light on actors’ creativity. The USC CreativeIT database provides a novel, freely available multimodal resource for the study of theatrical improvisation and rich expressive human behavior (speech and body language) in dyadic interactions. The theoretical design of the database is based on the well-established improvisation technique of Active Analysis in order to provide naturally induced affective and expressive, goal-driven interactions. This database contains dyadic theatrical improvisations performed by 16 actors, providing detailed full body motion capture data and audio data of each participant in an interaction. The carefully engineered data collection, the improvisation design to elicit natural emotions and expressive speech and body language, as well as the well-developed annotation processes provide a gateway to study and model various aspects of theatrical performance, expressive behaviors and human communication and interaction.
Maurer, Harald
Based on the mathematics of nonlinear Dynamical System Theory, neurocognition can be analyzed by convergent fluid and transient neurodynamics in abstract n-dimensional system phase spaces in the form of nonlinear vector fields, vector streams or vector flows (the so-called “vectorial form”). This processual or dynamical perspective on cognition, including the dynamical binding mechanisms in cognitive neuroarchitectures, has the advantage of modeling the transient cognitive processes more accurately. Thus, neurocognition can be considered as being organized by integrative synchronization mechanisms which best explain the liquid flow of neurocognitive information orchestrated in a network of positive and/or negative feedback loops in the subcortical and cortical areas. The human neurocognitive system can be regarded as a nonlinear, dynamical and open nonequilibrium system. This new fluid or liquid perspective in cognitive science and cognitive neuroscience can be regarded as a contribution towards bridging the gap between the discrete, abstract symbolic description of propositions in the mind, and their continuous, numerical implementation in self-organizing neural networks modelling the neural information processing in the human brain.
Verma, Gautam; Sharma, Shubham; Goyal, Rinkaj
Background
Communication and sharing of opinions play a crucial role in shaping the views of a person in a society. Interactions with other people enable a person to interpret their views and expound his opinion. Ordinarily, people tend to change their opinions in compliance with those having significantly higher expertise, thereby leading to a bipartite society of two intellectual groups, i.e. mavens (highly intellectual and confident people) and laypeople (diffident people with little or no experience and knowledge). However, the sharing of information in a group is influenced by the weight of advice with which people consider the opinions of others and by several control factors such as the interaction procedure adopted, the possibility of mutual exchange of information, and the time at which information is updated. Moreover, the effects of these factors are observable in both physical and digital societies during opinion formation. This study is built upon the prior work of Moussad et al. (PLoS ONE 8:78433, 2013).
Findings
In this study, we use agent-based modeling to analyze five types of interaction (including ideal cases) using an integrated selection process to empirically investigate the influence of the above-mentioned control factors in such a society. Through the simulations, we identify the minimum number of iterations required to reach an agreement in such a group of people and the critical proportion of the respective group to become observable in the opinion formation under different scenarios.
Conclusions
We observe that increasing the weight of advice has a positive effect on the quality of consensus reached as well as the speed of convergence of crowd towards an opinion. Furthermore, the interaction procedure adopted plays a dominant role in demarcating the critical proportions of the groups to dominate the consensus.
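To make the notions of weight of advice and iterations to agreement concrete, here is a minimal agent-based sketch. It is not the authors' model: the linear update rule, the random choice of one adviser per round, the tolerance-based notion of consensus, and all function names are assumptions made for illustration.

```python
import random


def update(own, advice, w):
    """Move `own` toward `advice` by weight of advice w (0 = ignore, 1 = adopt)."""
    return (1 - w) * own + w * advice


def run_round(opinions, w, rng):
    """Each agent consults one randomly chosen other agent (needs >= 2 agents)."""
    n = len(opinions)
    revised = []
    for i, own in enumerate(opinions):
        j = rng.randrange(n - 1)
        if j >= i:  # skip the agent itself
            j += 1
        revised.append(update(own, opinions[j], w))
    return revised


def rounds_to_consensus(opinions, w, tol=1e-3, max_rounds=10_000, seed=0):
    """Return the number of rounds until the opinion spread is within tol,
    or None if that does not happen within max_rounds."""
    rng = random.Random(seed)
    current = list(opinions)
    for r in range(max_rounds + 1):
        if max(current) - min(current) <= tol:
            return r
        current = run_round(current, w, rng)
    return None
```

Comparing `rounds_to_consensus` across values of `w` gives a rough feel for how the weight of advice affects the speed of convergence in this toy setting; it does not distinguish mavens from laypeople or model the five interaction types studied in the paper.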
Ågotnes, T.; Ditmarsch, H.; French, T.
This paper demonstrates the undecidability of a number of logics with quantification over public announcements: arbitrary public announcement logic (APAL), group announcement logic (GAL), and coalition announcement logic (CAL). In APAL we consider the informative consequences of any announcement, in GAL we consider the informative consequences of a group of agents (this group may be a proper subset of the set of all agents) all of which are simultaneously (and publicly) making known announcements. So this is more restrictive than APAL. Finally, CAL is as GAL except that we now quantify over anything the agents not in that group may announce simultaneously as well. The logic CAL therefore has some features of game logic and of ATL. We show that when there are multiple agents in the language, the satisfiability problem is undecidable for APAL, GAL, and CAL. In the single agent case, the satisfiability problem is decidable for all three logics.
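For orientation, the first two quantifiers described above are usually given truth conditions along the following lines. The notation is an assumption taken from common presentations of these logics rather than from this abstract: $$\mathcal{L}_{el}$$ is the quantifier-free epistemic language, $$K_i$$ expresses knowledge of agent $$i$$, and $$[\psi]$$ is the public announcement of $$\psi$$.

$$M,s \models \Box\varphi \iff \text{for all } \psi \in \mathcal{L}_{el}:\; M,s \models [\psi]\varphi \qquad \text{(APAL)}$$

$$M,s \models [G]\varphi \iff \text{for all } \{\psi_i\}_{i \in G} \subseteq \mathcal{L}_{el}:\; M,s \models \Big[\bigwedge_{i \in G} K_i\psi_i\Big]\varphi \qquad \text{(GAL)}$$

In GAL each agent in $$G$$ announces something it knows, which is why the quantification is more restrictive than in APAL; CAL additionally quantifies over what the agents outside $$G$$ may announce at the same time.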
Lorini, Emiliano; Sartor, Giovanni
In this paper we propose a method for modeling social influence within the STIT approach to action. Our proposal consists in extending the STIT language with special operators that allow us to represent the consequences of an agent’s choices over the rational choices of another agent.
Woźna-Szcześniak, Bożena; Zbrzezny, Andrzej
We investigate a SAT-based bounded model checking (BMC) method for EMTLK (the existential fragment of the metric temporal logic with knowledge) that is interpreted over timed models generated by timed interpreted systems. In particular, we translate the existential model checking problem for EMTLK to the existential model checking problem for a variant of linear temporal logic (called HLTLK), and we provide a SAT-based BMC technique for HLTLK. We evaluated the performance of our BMC by means of a variant of a timed generic pipeline paradigm scenario and a timed train controller system.
Chen, Taolue; Primiero, Giuseppe; Raimondi, Franco; Rungta, Neha
Modelling, reasoning and verifying complex situations involving a system of agents is crucial in all phases of the development of a number of safetycritical systems. In particular, it is of fundamental importance to have tools and techniques to reason about the doxastic and epistemic states of agents, to make sure that the agents behave as intended. In this paper we introduce a computationally grounded logic called COGWED and we present two types of semantics that support a range of practical situations. We provide model checking algorithms, complexity characterisations and a prototype implementation. We validate our proposal against a case study from the avionic domain: we assess and verify the situational awareness of pilots flying an aircraft with several automated components in offnominal conditions.
Belle, Vaishak; Levesque, Hector J.
A central problem in applying logical knowledge representation formalisms to traditional robotics is that the treatment of belief change is categorical in the former, while probabilistic in the latter. A typical example is the fundamental capability of localization where a robot uses its noisy sensors to situate itself in a dynamic world. Domain designers are then left with the rather unfortunate task of abstracting probabilistic sensors in terms of categorical ones, or more drastically, completely abandoning the inner workings of sensors to black-box probabilistic tools and then interpreting their outputs in an abstract way. Building on a first-principles approach by Bacchus, Halpern and Levesque, and a recent continuous extension to it by Belle and Levesque, we provide an axiomatization that shows how localization can be realized with respect to a basic action theory, thereby demonstrating how such capabilities can be enabled in a single logical framework. We then show how the framework can also enable localization for multiple agents, where an agent can appeal to the sensing already performed by another agent and the knowledge of their relative positions to localize itself.
De Giacomo, Giuseppe; Lespérance, Yves; Patrizi, Fabio; Vassos, Stavros
We investigate agents that have incomplete information and make decisions based on their beliefs expressed as situation calculus bounded action theories. Such theories have an infinite object domain, but the number of objects that belong to fluents at each time point is bounded by a given constant. Recently, it has been shown that verifying temporal properties over such theories is decidable. We take a firstperson view and use the theory to capture what the agent believes about the domain of interest and the actions affecting it. In this paper, we study verification of temporal properties over online executions. These are executions resulting from agents performing only actions that are feasible according to their beliefs. To do so, we first examine progression, which captures belief state update resulting from actions in the situation calculus. We show that, for bounded action theories, progression, and hence belief states, can always be represented as a bounded firstorder logic theory. Then, based on this result, we prove decidability of temporal verification over online executions for bounded action theories.
Harrenstein, Paul; Turrini, Paolo; Wooldridge, Michael
A fundamental problem in game theory is the possibility of reaching equilibrium outcomes with undesirable properties, e.g., inefficiency. The economics literature abounds with models that attempt to modify games in order to avoid such undesirable properties, for example through the use of subsidies and taxation, or by allowing players to undergo a bargaining phase before their decision. In this paper, we consider the effect of such transformations in Boolean games with costs, where players control propositional variables that they can set to true or false, and are primarily motivated to seek the satisfaction of some goal formula, while secondarily motivated to minimise the costs of their actions. We adopt (pure) preparation sets (prep sets) as our basic solution concept. A preparation set is a set of outcomes that contains for every player at least one best response to every outcome in the set. Prep sets are wellsuited to the analysis of Boolean games, because we can naturally represent prep sets as propositional formulas, which in turn allows us to refer to prep formulas. The preference structure of Boolean games with costs makes it possible to distinguish between hard and soft prep sets. The hard prep sets of a game are sets of valuations that would be prep sets in that game no matter what the cost function of the game was. The properties defined by hard prep sets typically relate to goalseeking behaviour, and as such these properties cannot be eliminated from games by, for example, taxation or subsidies. In contrast, soft prep sets can be eliminated by an appropriate system of incentives. Besides considering what can happen in a game by unrestricted manipulation of players’ cost function, we also investigate several mechanisms that allow groups of players to form coalitions and eliminate undesirable outcomes from the game, even when taxes or subsidies are not a possibility.
Stevens, Catherine J.; Pinchbeck, Bronwyn; Lewis, Trent; Luerssen, Martin; Pfitzner, Darius; Powers, David M. W.; Abrahamyan, Arman; Leung, Yvonne; Gibert, Guillaume
Background
Two experiments investigated the effect of features of human behaviour on the quality of interaction with an Embodied Conversational Agent (ECA).
Methods
In Experiment 1, visual prominence cues (head nod, eyebrow raise) of the ECA were manipulated to explore the hypothesis that likeability of an ECA increases as a function of interpersonal mimicry. In the context of an error detection task, the ECA either mimicked or did not mimic a head nod or brow raise that humans produced to give emphasis to a word when correcting the ECA’s vocabulary. In Experiment 2, presence versus absence of facial expressions on comprehension accuracy of two computerdriven ECA monologues was investigated.
Results
In Experiment 1, evidence for a positive relationship between ECA mimicry and lifelikeness was obtained. However, a mimicking agent did not elicit more human gestures. In Experiment 2, expressiveness was associated with greater comprehension and higher ratings of humour and engagement.
Conclusion
Influences from mimicry can be explained by visual and motor simulation, and bidirectional links between similarity and liking. Cue redundancy and minimizing cognitive load are potential explanations for expressiveness aiding comprehension.
Pécheux, Nicolas; Allauzen, Alexandre; Niehues, Jan; Yvon, François
In Statistical Machine Translation (SMT), the constraints on word reorderings have a great impact on the set of potential translations that is explored during search. Notwithstanding computational issues, the reordering space of an SMT system needs to be designed with great care: while a larger search space is likely to yield better translations, it may also lead to more decoding errors, because of the added ambiguity and the interaction with the pruning strategy. In this paper, we study the reordering search space, using a state-of-the-art translation system, where all reorderings are represented in a permutation lattice prior to decoding. This allows us to directly explore and compare different reordering schemes and oracle settings. We also study in detail a rule-based preordering system, varying the length and number of rules and the tagset used, and contrasting it with purely combinatorial subsets of permutations. We carry out experiments on three language pairs in both directions: English-French, a close language pair; English-German and English-Czech, two much more challenging pairs. We show that even though it might be desirable to design better reordering spaces, model and search errors seem to be the most important issues. Therefore, improvements of the reordering space should come along with improvements of the associated models to be really effective.
Zhang, Jingyi; Utiyama, Masao; Sumita, Eiichro; Zhao, Hai; Neubig, Graham; Nakamura, Satoshi
Statistical models for reordering source words have been used to enhance hierarchical phrase-based statistical machine translation. There are existing word-reordering models that learn reorderings for any two source words in a sentence or only for two contiguous words. This paper proposes a series of separate submodels to learn reorderings for word pairs with different distances. Our experiments demonstrate that reordering submodels for word pairs with distances less than a specific threshold are useful to improve translation quality. Compared with previous work, our method more effectively and efficiently exploits helpful word-reordering information; it improves a basic hierarchical phrase-based system by 2.4-3.1 BLEU points and keeps the average time of translating one sentence under 10 s.
Mariani, Joseph; Paroubek, Patrick; Francopoulo, Gil; Hamon, Olivier
This paper analyzes the content of the proceedings of the Language Resources and Evaluation Conference (LREC) over the past 17 years (1998–2014), with the goal of gaining a picture of the LREC community and the topics that are most relevant to the field. We follow the methodology used in similar studies, including the survey of the IEEE ICASSP conference proceedings from 1976 to 1990, the survey of the Association of Computational Linguistics conference proceedings over 50 years, and the survey of the proceedings of the conferences contained in the ISCA Archive over 25 years (1987–2012). We expand on results originally presented at LREC 2014, but include the proceedings of LREC 2014 itself in the study together with an analysis of various citation graphs. We show the evolution over time of the number of papers and authors, including their distribution by gender and affiliation, as well as collaborations and citation patterns among authors and papers, funding sources for reported research, and plagiarism and reuse in LREC papers; results for LREC are compared with similar results for major conferences in related fields. We also consider the evolution of research topics over time and identify the authors who introduced key terms. Finally, we propose and apply a measure of a researcher’s notability and provide the results for LREC authors. The study uses NLP methods that have been published in the corpus considered in the study. In addition to providing a revealing characterization of the LRE community, the study also demonstrates the need for establishing a system for unique identification of authors, papers and other sources to facilitate this type of analysis.
Xia, Fei; Lewis, William D.; Goodman, Michael Wayne; Slayden, Glenn; Georgi, Ryan; Crowgey, Joshua; Bender, Emily M.
The majority of the world’s languages have little to no NLP resources or tools. This is due to a lack of training data (“resources”) over which tools, such as taggers or parsers, can be trained. In recent years, there have been increasing efforts to apply NLP methods to a much broader swath of the world’s languages. In many cases this involves bootstrapping the learning process with enriched or partially enriched resources. We propose that Interlinear Glossed Text (IGT), a very common form of annotated data used in the field of linguistics, has great potential for bootstrapping NLP tools for resourcepoor languages. Although IGT is generally very richly annotated, and can be enriched even further (e.g., through structural projection), much of the content is not easily consumable by machines since it remains “trapped” in linguistic scholarly documents and in human readable form. In this paper, we describe the expansion of the ODIN resource—a database containing many thousands of instances of IGT for over a thousand languages. We enrich the original IGT data by adding word alignment and syntactic structure. To make the data in ODIN more readily consumable by tool developers and NLP researchers, we adopt and extend a new XML format for IGT, called Xigt. We also develop two packages for manipulating IGT data: one, INTENT, enriches raw IGT automatically, and the other, XigtEdit, is a graphical IGT editor.
Lopez de Lacalle, Maddalen; Laparra, Egoitz; Aldabe, Itziar; Rigau, German
This paper presents a novel approach to improve the interoperability between four semantic resources that incorporate predicate information. Our proposal defines a set of automatic methods for mapping the semantic knowledge included in WordNet, VerbNet, PropBank and FrameNet. We use advanced graphbased word sense disambiguation algorithms and corpus alignment methods to automatically establish the appropriate mappings among their lexical entries and roles. We study different settings for each method using SemLink as a goldstandard for evaluation. The results show that the new approach provides productive and reliable mappings. In fact, the mappings obtained automatically outnumber the set of original mappings in SemLink. Finally, we also present a new version of the Predicate Matrix, a lexicalsemantic resource resulting from the integration of the mappings obtained by our automatic methods and SemLink.
Mori, Shinsuke; Neubig, Graham
In this paper, we investigate the relative effect of two strategies for language resource addition for Japanese morphological analysis, a joint task of word segmentation and part-of-speech tagging. The first strategy is adding entries to the dictionary and the second is adding annotated sentences to the training corpus. The experimental results showed that addition of annotated sentences to the training corpus is better than the addition of entries to the dictionary. In particular, adding annotated sentences is especially efficient when we add new words with contexts of several real occurrences as partially annotated sentences, i.e. sentences in which only some words are annotated with word boundary information. According to this knowledge, we performed real annotation experiments on invention disclosure texts and observed word segmentation accuracy. Finally we investigated various language resource addition cases and introduced the notions of nonmaleficence, asymmetricity, and additivity of language resources for a task. In the word segmentation (WS) case, we found that language resource addition is nonmaleficent (adding new resources causes no harm in other domains) and sometimes additive (adding new resources helps other domains). We conclude that it is reasonable for us, NLP tool providers, to distribute only one general-domain model trained from all the language resources we have.
Rosén, Victoria; Thunes, Martha; Haugereid, Petter; Losnegaard, Gyri Smørdal; Dyvik, Helge; Meurer, Paul; Lyse, Gunn Inger; Smedt, Koenraad
Automatic syntactic analysis of a corpus requires detailed lexical and morphological information that cannot always be harvested from traditional dictionaries. Therefore the development of a treebank presents an opportunity to simultaneously enrich the lexicon. In building NorGramBank, we use an incremental parsebanking approach, in which a corpus is parsed and disambiguated, and after improvements to the grammar and the lexicon, reparsed. In this context we have implemented a text preprocessing interface where annotators can enter unknown words or missing lexical information either before parsing or during disambiguation. The information added to the lexicon in this way may be of great interest both to lexicographers and to other language technology efforts.
Malisz, Zofia; Włodarczak, Marcin; Buschmeier, Hendrik; Skubisz, Joanna; Kopp, Stefan; Wagner, Petra
The Active Listening Corpus (ALICO) is a multimodal data set of spontaneous dyadic conversations in German with diverse speech and gestural annotations of both dialogue partners. The annotations consist of short feedback expression transcriptions with corresponding communicative function interpretations as well as segmentations of interpausal units, words, rhythmic prominence intervals and voweltovowel intervals. Additionally, ALICO contains head gesture annotations of both interlocutors. The corpus contributes to research on spontaneous human–human interaction, on functional relations between modalities, and timing variability in dialogue. It also provides data that differentiates between distracted and attentive listeners. We describe the main characteristics of the corpus and briefly present the most important results obtained from analyses in recent years.
Nishio, Naoto; Sutcliffe, Richard F. E.
How should you select a person to carry out a translation? One approach is to request a sample translation and to evaluate it by hand. Quality Estimation addresses the problem of evaluation at least for Machine Translation output as a prediction task. This approach facilitates low-cost evaluation of MT outputs without expensive reference translations. However, the prediction of human translation in this way is difficult due to its subtlety of expression. We aimed to find out whether the qualifications, hobbies or personality traits of a person could predict their proficiency at translation. First, we gathered information about 82 participants; for each one we established the values of 146 attributes via a questionnaire. Second, we asked them to carry out some Japanese-to-English translations which we scored by hand. Third, we used the attributes as input and the translation scores as output to train the J48 decision-tree algorithm in order to predict the score of a translator from their attributes. This was then evaluated using ten-fold cross validation. When limiting to professional translators in Experiment 6, the best F-score was with Wrapper selection (0.775). The result was statistically significant ($$p < 0.05$$). This classifier also showed the second highest Precision on Good (0.813). The second best F-score (0.737) has the highest Precision on Good (0.909), using Manual feature selection. Once again this was significant ($$p < 0.05$$). The results suggest that certain attributes affect the prediction; in addition to experience and qualifications in translating into the target language, interest in going to the Opera, playing Scrabble or Contract Bridge, or enjoyment of cryptic crossword puzzles are important factors as well.
Gupta, Rohit; Orăsan, Constantin; Zampieri, Marcos; Vela, Mihaela; Genabith, Josef; Mitkov, Ruslan
Most current translation memory (TM) systems work on the string level (character or word level) and lack semantic knowledge while matching. They use simple edit-distance (ED) calculated on the surface form or some variation on it (stem, lemma), which does not take into consideration any semantic aspects in matching. This paper presents a novel and efficient approach to incorporating semantic information in the form of paraphrasing (PP) in the ED metric. The approach computes ED while efficiently considering paraphrases using dynamic programming and greedy approximation. In addition to using automatic evaluation metrics like BLEU and METEOR, we have carried out an extensive human evaluation in which we measured post-editing time, keystrokes, HTER, HMETEOR, and carried out three rounds of subjective evaluations. Our results show that PP substantially improves TM matching and retrieval, resulting in translation performance increases when translators use paraphrase-enhanced TMs.
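The idea of letting paraphrases count as matches inside edit distance can be sketched with a plain dynamic program. This is an illustrative simplification, not the authors' algorithm: it omits their greedy approximation, treats every listed paraphrase pair as a zero-cost match, and the function name and the representation of paraphrases as pairs of token tuples are assumptions.

```python
def edit_distance_with_paraphrases(src, tgt, paraphrases=()):
    """Word-level edit distance where a known paraphrase pair costs nothing.

    src, tgt: lists of tokens.
    paraphrases: iterable of (src_phrase, tgt_phrase) pairs of non-empty
    token tuples that are treated as interchangeable.
    """
    n, m = len(src), len(tgt)
    inf = float("inf")
    # d[i][j] = cheapest way to turn src[:i] into tgt[:j]
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0
    for i in range(n + 1):
        for j in range(m + 1):
            cur = d[i][j]
            if cur == inf:
                continue
            if i < n:  # delete src[i]
                d[i + 1][j] = min(d[i + 1][j], cur + 1)
            if j < m:  # insert tgt[j]
                d[i][j + 1] = min(d[i][j + 1], cur + 1)
            if i < n and j < m:  # match or substitute one token
                cost = 0 if src[i] == tgt[j] else 1
                d[i + 1][j + 1] = min(d[i + 1][j + 1], cur + cost)
            for a, b in paraphrases:  # jump over a whole paraphrase pair
                if tuple(src[i:i + len(a)]) == a and tuple(tgt[j:j + len(b)]) == b:
                    d[i + len(a)][j + len(b)] = min(d[i + len(a)][j + len(b)], cur)
    return d[n][m]
```

With the hypothetical pair `(("laid", "down"), ("referred", "to"))`, the segments "the period laid down in article 4" and "the period referred to in article 4" are at distance 0 instead of 2, so a TM lookup based on this score would treat them as an exact match.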
Rehm, Georg; Uszkoreit, Hans; Ananiadou, Sophia; Bel, Núria; Bielevičienė, Audronė; Borin, Lars; Branco, António; Budin, Gerhard; Calzolari, Nicoletta; Daelemans, Walter; Garabík, Radovan; Grobelnik, Marko; GarcíaMateo, Carmen; Genabith, Josef; Hajič, Jan; Hernáez, Inma; Judge, John; Koeva, Svetla; Krek, Simon; Krstev, Cvetana; Lindén, Krister; Magnini, Bernardo; Mariani, Joseph; McNaught, John; Melero, Maite; Monachini, Monica; Moreno, Asunción; Odijk, Jan; Ogrodniczuk, Maciej; Pęzik, Piotr; Piperidis, Stelios; Przepiórkowski, Adam; Rögnvaldsson, Eiríkur; Rosner, Mike; Pedersen, Bolette Sandford; Skadiņa, Inguna; Smedt, Koenraad; Tadić, Marko; Thompson, Paul; Tufiş, Dan; Váradi, Tamás; Vasiļjevs, Andrejs; Vider, Kadri; Zabarskaitė, Jolanta
This article provides an overview of the dissemination work carried out in META-NET from 2010 until 2015; we describe its impact on the regional, national and international level, mainly with regard to politics and the funding situation for LT topics. The article documents the initiative’s work throughout Europe in order to boost progress and innovation in our field.
Jansana, Ramon
We consider the equationally orderable quasivarieties and associate with them deductive systems defined using the order. The method of definition of these deductive systems encompasses the definition of logics preserving degrees of truth we find in the research areas of substructural logics and mathematical fuzzy logic. We prove several general results, for example that the deductive systems so defined are finitary and that the ones associated with equationally orderable varieties are congruential.
Cabrer, L. M.; Priestley, H. A.
This paper focuses on natural dualities for varieties of bilattice-based algebras. Such varieties have been widely studied as semantic models in situations where information is incomplete or inconsistent. The most popular tool for studying bilattice-based algebras is product representation. The authors recently set up a widely applicable algebraic framework which enabled product representations over a base variety to be derived in a uniform and categorical manner. By combining this methodology with that of natural duality theory, we demonstrate how to build a natural duality for any bilattice-based variety which has a suitable product representation over a dualisable base variety. This procedure allows us systematically to present economical natural dualities for many bilattice-based varieties, for most of which no dual representation has previously been given. Among our results we highlight that for bilattices with a generalised conflation operation (not assumed to be an involution or commute with negation). Here both the associated product representation and the duality are new. Finally we outline analogous procedures for pre-bilattice-based algebras (so negation is absent).
Raftery, J. G.; Świrydowicz, K.
It is proved that the relevance logic $\mathbf{R}$ (without sentential constants) has no structurally complete consistent axiomatic extension, except for classical propositional logic. In fact, no other such extension is even passively structurally complete.
Přenosil, Adam
We introduce a novel expansion of the four-valued Belnap–Dunn logic by a unary operator representing reductio ad contradictionem and study its algebraic semantics. This expansion thus contains both the direct, non-inferential negation of the Belnap–Dunn logic and an inferential negation akin to the negation of Johansson’s minimal logic. We formulate a sequent calculus for this logic and introduce the variety of reductio algebras as an algebraic semantics for this calculus. We then investigate some basic algebraic properties of this variety; in particular, we show that it is locally finite and has EDPC. We identify the subdirectly irreducible algebras in this variety and describe the lattice of varieties of reductio algebras. In particular, we prove that this lattice contains an interval isomorphic to the lattice of classes of finite non-empty graphs with loops closed under surjective graph homomorphisms.
Cornejo, Juan M.; Sankappanavar, Hanamantagouda P.
The variety $\mathbf{I}$ of implication zroupoids (using a binary operation $\to$ and a constant 0) was defined and investigated by Sankappanavar (Scientia Mathematica Japonica 75(1):21–50, 2012), as a generalization of De Morgan algebras. Also, in Sankappanavar (Scientia Mathematica Japonica 75(1):21–50, 2012), several subvarieties of $\mathbf{I}$ were introduced, including the subvariety $\mathbf{I_{2,0}}$, defined by the identity $x^{\prime \prime} \approx x$, which plays a crucial role in this paper. Some more new subvarieties of $\mathbf{I}$ are studied in Cornejo and Sankappanavar (Algebra Univ, 2015), including the subvariety $\mathbf{SL}$ of semilattices with a least element 0. An explicit description of semisimple subvarieties of $\mathbf{I}$ is given in Cornejo and Sankappanavar (Soft Computing, 2015). It is a well-known fact that there is a partial order (denote it by $\sqsubseteq$) induced by the operation $\wedge$, both in the variety $\mathbf{SL}$ of semilattices with a least element and in the variety $\mathbf{DM}$ of De Morgan algebras. As both $\mathbf{SL}$ and $\mathbf{DM}$ are subvarieties of $\mathbf{I}$ and the definition of partial order can be expressed in terms of the implication and the constant, it is but natural to ask whether the relation $\sqsubseteq$ on $\mathbf{I}$ is actually a partial order in some (larger) subvariety of $\mathbf{I}$ that includes both $\mathbf{SL}$ and $\mathbf{DM}$. The purpose of the present paper is twofold. Firstly, a complete answer is given to the above-mentioned problem. Indeed, our first main theorem shows that the variety $\mathbf{I_{2,0}}$ is a maximal subvariety of $\mathbf{I}$ with respect to the property that the relation $\sqsubseteq$ is a partial order on its members. In view of this result, one is then naturally led to consider the problem of determining the number of non-isomorphic algebras in $\mathbf{I_{2,0}}$ that can be defined on an $n$-element chain (herein called $\mathbf{I_{2,0}}$-chains), $n$ being a natural number. Secondly, we answer this problem in our second main theorem, which says that, for each $n \in \mathbb{N}$, there are exactly $n$ non-isomorphic $\mathbf{I_{2,0}}$-chains of size $n$.
Hampson, C.; Kikot, S.; Kurucz, A.
In the propositional modal (and algebraic) treatment of two-variable first-order logic, equality is modelled by a ‘diagonal’ constant, interpreted in square products of universal frames as the identity (also known as the ‘diagonal’) relation. Here we study the decision problem of products of two arbitrary modal logics equipped with such a diagonal. As the presence or absence of equality in two-variable first-order logic does not influence the complexity of its satisfiability problem, one might expect that adding a diagonal to product logics in general is similarly harmless. We show that this is far from being the case, and there can be quite a big jump in complexity, even from decidable to the highly undecidable. Our undecidable logics can also be viewed as new fragments of first-order logic where adding equality changes a decidable fragment to undecidable. We prove our results by a novel application of counter machine problems. While our formalism apparently cannot force reliable counter machine computations directly, the presence of a unique diagonal in the models makes it possible to encode both lossy and insertion-error computations, for the same sequence of instructions. We show that, given such a pair of faulty computations, it is then possible to reconstruct a reliable run from them.
Kremer, Philip
The simplest combination of unimodal logics $\mathrm{L}_1$ and $\mathrm{L}_2$ into a bimodal logic is their fusion, $\mathrm{L}_1 \otimes \mathrm{L}_2$, axiomatized by the theorems of $\mathrm{L}_1$ for $\square_1$ and of $\mathrm{L}_2$ for $\square_2$. Shehtman introduced combinations that are not only bimodal, but two-dimensional: he defined 2d Cartesian products of 1d Kripke frames, using these Cartesian products to define the frame product $\mathrm{L}_1 \times \mathrm{L}_2$ of $\mathrm{L}_1$ and $\mathrm{L}_2$. Van Benthem, Bezhanishvili, ten Cate and Sarenac generalized Shehtman’s idea and introduced the topological product $\mathrm{L}_1 \times_t \mathrm{L}_2$, using Cartesian products of topological spaces rather than of Kripke frames. Frame products have been extensively studied, but much less is known about topological products. The goal of the current paper is to give necessary and sufficient conditions for the topological product to match the frame product, for Kripke complete extensions of $\mathrm{S}4$: $\mathrm{L}_1 \times_t \mathrm{L}_2 = \mathrm{L}_1 \times \mathrm{L}_2$ iff $\mathrm{L}_1 \supsetneq \mathrm{S}5$ or $\mathrm{L}_2 \supsetneq \mathrm{S}5$ or $\mathrm{L}_1, \mathrm{L}_2 = \mathrm{S}5$.
Wintein, Stefan
By using the notions of exact truth (‘true and not false’) and exact falsity (‘false and not true’), one can give 16 distinct definitions of classical consequence. This paper studies the class of relations that results from these definitions in settings that are paracomplete, paraconsistent or both and that are governed by the (extended) Strong Kleene schema. Besides familiar logics such as Strong Kleene logic (K3), the Logic of Paradox (LP) and First Degree Entailment (FDE), the resulting class of all Strong Kleene generalizations of classical logic also contains a host of unfamiliar logics. We first study the members of our class semantically, after which we present a uniform sequent calculus (the SK calculus) that is sound and complete with respect to all of them. Two further sequent calculi (the $\mathbf{SK}^{\mathcal{P}}$ and $\mathbf{SK}^{\mathcal{N}}$ calculus) will be considered, which serve the same purpose and which are obtained by applying general methods (due to Baaz et al.) to construct sequent calculi for many-valued logics. Rules and proofs in the SK calculus are much simpler and shorter than those of the $\mathbf{SK}^{\mathcal{P}}$ and the $\mathbf{SK}^{\mathcal{N}}$ calculus, which is one of the reasons to prefer the SK calculus over the latter two. Besides favourably comparing the SK calculus to both the $\mathbf{SK}^{\mathcal{P}}$ and the $\mathbf{SK}^{\mathcal{N}}$ calculus, we also hint at its philosophical significance.
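As an informal illustration of the semantic setting described in this abstract (not the paper's own calculus), the following Python sketch encodes the four FDE values as pairs (told true, told false), applies the Strong Kleene clauses for negation, conjunction and disjunction, and compares a few consequence relations. All helper names (`entails`, `exactly_true`, and so on) are hypothetical, and the relations tested are only a small sample chosen for illustration, not the 16 definitions studied in the paper.

```python
from itertools import product

# A value is a pair (told_true, told_false); these are the four FDE values.
T, F, B, N = (True, False), (False, True), (True, True), (False, False)

def neg(a):
    # Negation swaps the truth and falsity components.
    return (a[1], a[0])

def conj(a, b):
    # True iff both are true; false iff at least one is false.
    return (a[0] and b[0], a[1] or b[1])

def disj(a, b):
    # True iff at least one is true; false iff both are false.
    return (a[0] or b[0], a[1] and b[1])

def true(a):
    return a[0]

def exactly_true(a):
    # 'True and not false'.
    return a == T

def entails(premises, conclusion, values, designated=true, atoms=("p", "q")):
    """Check preservation of `designated` over all assignments of `values`."""
    for assignment in product(values, repeat=len(atoms)):
        v = dict(zip(atoms, assignment))
        if all(designated(f(v)) for f in premises) and not designated(conclusion(v)):
            return False
    return True

K3 = [T, F, N]        # paracomplete: no value is both true and false
LP = [T, F, B]        # paraconsistent: no value is neither
FDE = [T, F, B, N]    # both gaps and gluts

contradiction = lambda v: conj(v["p"], neg(v["p"]))
excluded_middle = lambda v: disj(v["p"], neg(v["p"]))
q = lambda v: v["q"]

explosion_k3 = entails([contradiction], q, K3)
explosion_lp = entails([contradiction], q, LP)
lem_k3 = entails([], excluded_middle, K3)
lem_lp = entails([], excluded_middle, LP)
explosion_fde_truth = entails([contradiction], q, FDE)
explosion_fde_exact = entails([contradiction], q, FDE, designated=exactly_true)
```

In this sketch, explosion (p ∧ ¬p ⊨ q) holds over the K3 values but fails over the LP values, while excluded middle behaves the other way round. Over all four values, switching the preserved notion from truth to exact truth changes whether explosion holds, which is the kind of variation the abstract refers to.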
Rodríguez-Fuentes, Luis Javier; Penagarikano, Mikel; Varona, Amparo; Diez, Mireia; Bordel, Germán
KALAKA-3 is a speech database specifically designed for the development and evaluation of Spoken Language Recognition (SLR) systems. The database provides TV broadcast speech for training, and audio data extracted from YouTube videos for tuning and testing. The database was created to support the Albayzin 2012 Language Recognition Evaluation (LRE), which featured two language recognition tasks, both dealing with European languages. The first one involved six target languages (Basque, Catalan, English, Galician, Portuguese and Spanish) for which there was plenty of training data, whereas the second one involved four target languages (French, German, Greek and Italian) for which no training data was provided. This second task tried to simulate the use case of low-resource languages. Two separate sets of YouTube audio files were provided to test the performance of language recognition systems on both tasks. To allow open-set tests, these datasets included speech in 11 additional (Out-Of-Set) European languages. In this paper, we first discuss the design issues considered when creating the database and describe the data collection procedure. Then, we present the results attained in the Albayzin 2012 LRE, along with the performance of state-of-the-art systems on the four evaluation tracks defined on the database. Both series of results demonstrate the usefulness of KALAKA-3 as a challenging benchmark for the advancement of SLR technology. As far as we know, this is the first database specifically designed to benchmark SLR technology on YouTube audio.
Vieira, Lucas Nunes
There has been growing interest of late in the cognitive effort required by post-editing of machine translation. Compared to the number of editing operations, cognitive (or mental) effort is frequently considered a more decisive indicator of the overall effort expended by post-editors. Estimating cognitive effort is not straightforward, however. Previous studies often triangulate different measures to obtain a consensus, but little post-editing research to date has attempted to show how measures of cognitive effort relate to each other in a multivariate analysis. This paper addresses this by presenting an exploratory comparison of cognitive measures based on eye tracking, pauses, editing time, and subjective ratings collected in a post-editing task carried out by professional and non-professional participants. All measures correlated with each other, but a principal components analysis showed that the measures cluster together in different ways. In particular, measures that increase with task time alone behaved differently from the others, with higher mutual associations and higher reliability. Regarding differences between professional and non-professional participants, it was observed that subjective ratings were overall more strongly associated with objective measures in the case of professionals. Surprising findings from previous research based on pause ratio are discussed. The paper argues that a pause typology will benefit the study of pause lengths and cognitive effort in post-editing.
Golińska-Pilarek, Joanna
The paper concerns Grzegorczyk’s non-Fregean logics that are intended to be a formal representation of the equimeaning relation defined on descriptions. We argue that the main Grzegorczyk logics discussed in the literature are too strong and we propose a new logical system, $\mathsf{MGL}$, which satisfies Grzegorczyk’s fundamental requirements. We present a sound and complete semantics for $\mathsf{MGL}$ and we prove that it is decidable. Finally, we show that many non-classical logics are extensions of $\mathsf{MGL}$, which makes it a generic non-Fregean logic.
Soncodi, Adrian
In this paper we analyze the propositional extensions of the minimal classical modal logic system E, which form a lattice denoted as CExtE. Our method of analysis uses algebraic calculations with canonical forms, which are a generalization of the normal forms applicable to normal modal logics. As an application, we identify a group of automorphisms of CExtE that is isomorphic to the symmetric group $S_4$.
Hyndman, Jennifer; Nation, J. B.; Nishida, Joy
The duality between congruence lattices of semilattices, and algebraic subsets of an algebraic lattice, is extended to include semilattices with operators. For a set G of operators on a semilattice S, we have $\mathrm{Con}(S,+,0,G) \cong^{d} \mathrm{S}_{p}(L,H)$, where L is the ideal lattice of S, and H is a corresponding set of adjoint maps on L. This duality is used to find some representations of lattices as congruence lattices of semilattices with operators. It is also shown that these congruence lattices satisfy the Jónsson–Kiefer property.
Teheux, Bruno
We study two notions of definability for classes of relational structures based on modal extensions of Łukasiewicz finitely-valued logics. The main results of the paper are the equivalent of the Goldblatt–Thomason theorem for these notions of definability.
Sagastume, Marta S.; San Martín, Hernán J.
An equivalence between the category of MV-algebras and the category $\mathrm{MV}^{\bullet}$ is given in Castiglioni et al. (Studia Logica 102(1):67–92, 2014). An integral residuated lattice with bottom is an MV-algebra if and only if it satisfies the equations $a = \neg \neg a$, $(a \rightarrow b) \vee (b \rightarrow a) = 1$ and $a \odot (a \rightarrow b) = a \wedge b$. An object of $\mathrm{MV}^{\bullet}$ is a residuated lattice which in particular satisfies some equations which correspond to the previous equations. In this paper we extend the equivalence to the category whose objects are pairs (A, I), where A is an MV-algebra and I is an ideal of A.
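The three equations quoted in this abstract can be checked mechanically on a concrete example. The sketch below is not taken from the paper: it assumes the standard finite Łukasiewicz chains as test structures, with the usual operations ¬a = 1 − a, a → b = min(1, 1 − a + b) and a ⊙ b = max(0, a + b − 1). The function names and the labels "involution", "prelinearity" and "divisibility" are conventional names introduced here for readability.

```python
from fractions import Fraction
from itertools import product

def lukasiewicz_chain(n):
    """Elements 0, 1/n, ..., 1 of the (n + 1)-element Łukasiewicz chain."""
    return [Fraction(k, n) for k in range(n + 1)]

def neg(a):
    return 1 - a

def imp(a, b):
    # Łukasiewicz implication (the residuum of odot).
    return min(Fraction(1), 1 - a + b)

def odot(a, b):
    # Łukasiewicz strong conjunction.
    return max(Fraction(0), a + b - 1)

def satisfies_mv_equations(n):
    """Check the three equations from the abstract on the chain of size n + 1."""
    chain = lukasiewicz_chain(n)
    for a, b in product(chain, repeat=2):
        involution = neg(neg(a)) == a                      # a = ¬¬a
        prelinearity = max(imp(a, b), imp(b, a)) == 1      # (a → b) ∨ (b → a) = 1
        divisibility = odot(a, imp(a, b)) == min(a, b)     # a ⊙ (a → b) = a ∧ b
        if not (involution and prelinearity and divisibility):
            return False
    return True

checked = [satisfies_mv_equations(n) for n in range(1, 8)]
```

Exact rational arithmetic via `Fraction` is used so that the equality tests are not affected by floating-point rounding.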
Taylor, R. Gregory
An idea attributable to Russell serves to extend Zermelo’s theory of systems of infinitely long propositions to infinitary relations. Specifically, relations over a given domain $\mathfrak{D}$ of individuals will now be identified with propositions over an auxiliary domain $\mathfrak{D}^{\ast}$ subsuming $\mathfrak{D}$. Three applications of the resulting theory of infinitary relations are presented. First, it is used to reconstruct Zermelo’s original theory of urelements and sets in a manner that achieves most, if not all, of his early aims. Second, the new account of infinitary relations makes possible a concise characterization of parametric definability with respect to a purely relational structure. Finally, based on his foundational philosophy of the primacy of the infinite, Zermelo rejected Gödel’s First Incompleteness Theorem; it is shown that the new theory of infinitary relations can be brought to bear, positively, in that connection as well.
Połacik, Tomasz
The aim of this paper is to describe from a semantic perspective the problem of conservativity of classical first-order theories over their intuitionistic counterparts. In particular, we describe a class of formulae for which such conservativity results can be proven in the case of any intuitionistic theory T which is complete with respect to a class of T-normal Kripke models. We also prove conservativity results for intuitionistic theories which are closed under the Friedman translation and complete with respect to a class of conversely well-founded Kripke models. The results can be applied to a wide class of intuitionistic theories and can be viewed as a generalization of the results obtained by syntactic methods.
Dzikovska, Myroslava O.; Nielsen, Rodney D.; Leacock, Claudia
We present the results of the joint student response analysis (SRA) and 8th recognizing textual entailment challenge. The goal of this challenge was to bring together researchers from the educational natural language processing and computational semantics communities. The goal of the SRA task is to assess student responses to questions in the science domain, focusing on correctness and completeness of the response content. Nine teams took part in the challenge, submitting a total of 18 runs using methods and features adapted from previous research on automated short answer grading, recognizing textual entailment and semantic textual similarity. We provide an extended analysis of the results focusing on the impact of evaluation metrics, application scenarios and the methods and features used by the participants. We conclude that additional research is required to be able to leverage syntactic dependency features and external semantic resources for this task, possibly due to limited coverage of scientific domains in existing resources. However, each of three approaches to using features and models adjusted to application scenarios achieved better system performance, meriting further investigation by the research community.
Jurgens, David; Pilehvar, Mohammad Taher; Navigli, Roberto
Semantic similarity has typically been measured across items of approximately similar sizes. As a result, similarity measures have largely ignored the fact that different types of linguistic item can potentially have similar or even identical meanings, and therefore are designed to compare only one type of linguistic item. Furthermore, nearly all current similarity benchmarks within NLP contain pairs of approximately the same size, such as word or sentence pairs, preventing the evaluation of methods that are capable of comparing differently sized items. To address this, we introduce a new semantic evaluation called cross-level semantic similarity (CLSS), which measures the degree to which the meaning of a larger linguistic item, such as a paragraph, is captured by a smaller item, such as a sentence. Our pilot CLSS task was presented as part of SemEval-2014, which attracted 19 teams who submitted 38 systems. CLSS data contains a rich mixture of pairs, spanning from paragraphs to word senses, to fully evaluate similarity measures that are capable of comparing items of any type. Furthermore, data sources were drawn from diverse corpora beyond just newswire, including domain-specific texts and social media. We describe the annotation process and its challenges, including a comparison with crowdsourcing, and identify the factors that make the dataset a rigorous assessment of a method’s quality. Furthermore, we examine in detail the systems participating in the SemEval task to identify the common factors associated with high performance and which aspects proved difficult for all systems. Our findings demonstrate that CLSS poses a significant challenge for similarity methods and provides clear directions for future work on universal similarity methods that can compare any pair of items.
Nakov, Preslav; Rosenthal, Sara; Kiritchenko, Svetlana; Mohammad, Saif M.; Kozareva, Zornitsa; Ritter, Alan; Stoyanov, Veselin; Zhu, Xiaodan
We present the development and evaluation of a semantic analysis task that lies at the intersection of two very trendy lines of research in contemporary computational linguistics: (1) sentiment analysis, and (2) natural language processing of social media text. The task was part of SemEval, the International Workshop on Semantic Evaluation, a semantic evaluation forum previously known as SensEval. The task ran in 2013 and 2014, attracting the highest number of participating teams at SemEval in both years, and there is an ongoing edition in 2015. The task included the creation of a large contextual and message-level polarity corpus consisting of tweets, SMS messages, LiveJournal messages, and a special test set of sarcastic tweets. The evaluation attracted 44 teams in 2013 and 46 in 2014, who used a variety of approaches. The best teams were able to outperform several baselines by sizable margins with improvement across the 2 years the task has been run. We hope that the long-lasting role of this task and the accompanying datasets will be to serve as a test bed for comparing different approaches, thus facilitating research.
Kashyap, Abhay; Han, Lushan; Yus, Roberto; Sleeman, Jennifer; Satyapanich, Taneeya; Gandhi, Sunil; Finin, Tim
Semantic textual similarity is a measure of the degree of semantic equivalence between two pieces of text. We describe the SemSim system and its performance in the *SEM 2013 and SemEval-2014 tasks on semantic textual similarity. At the core of our system lies a robust distributional word similarity component that combines latent semantic analysis and machine learning augmented with data from several linguistic resources. We used a simple term alignment algorithm to handle longer pieces of text. Additional wrappers and resources were used to handle task-specific challenges that include processing Spanish text, comparing text sequences of different lengths, handling informal words and phrases, and matching words with sense definitions. In the *SEM 2013 task on Semantic Textual Similarity, our best performing system ranked first among the 89 submitted runs. In the SemEval-2014 task on Multilingual Semantic Textual Similarity, we ranked a close second in both the English and Spanish subtasks. In the SemEval-2014 task on Cross-Level Semantic Similarity, we ranked first in Sentence–Phrase, Phrase–Word, and Word–Sense subtasks and second in the Paragraph–Sentence subtask.
Bentivogli, Luisa; Bernardi, Raffaella; Marelli, Marco; Menini, Stefano; Baroni, Marco; Zamparelli, Roberto
This paper is an extended description of SemEval-2014 Task 1, the task on the evaluation of Compositional Distributional Semantics Models on full sentences. Systems participating in the task were presented with pairs of sentences and were evaluated on their ability to predict human judgments on (1) semantic relatedness and (2) entailment.
Training and testing data were subsets of the SICK (Sentences Involving Compositional Knowledge) data set. SICK was developed with the aim of providing a proper benchmark to evaluate compositional semantic systems, though task participation was open to systems based on any approach. Taking advantage of the SemEval experience, in this paper we analyze the SICK data set, in order to evaluate the extent to which it meets its design goal and to shed light on the linguistic phenomena that are still challenging for state-of-the-art computational semantic systems.
Qualitative and quantitative error analyses show that many systems are quite sensitive to changes in the proportion of sentence pair types, and degrade in the presence of additional lexico-syntactic complexities which do not affect human judgements. More compositional systems seem to perform better when the task proportions are changed, but the effect needs further confirmation.
Maffezioli, Paolo
We present a sequent calculus for extensional mereology. It extends the classical first-order sequent calculus with identity by rules of inference corresponding to well-known mereological axioms. Structural rules, including cut, are admissible.
Grigolia, Revaz; Kiseliova, Tatiana; Odisharia, Vladimer
Gödel logic (alias Dummett logic) is the extension of intuitionistic logic by the linearity axiom. Symmetric Gödel logic is a logical system, the language of which is an enrichment of the language of Gödel logic with the dual logical connectives. Symmetric Gödel logic is the extension of symmetric intuitionistic logic (L. Esakia, C. Rauszer). The proof-intuitionistic calculus, the language of which is an enrichment of the language of intuitionistic logic by a modal operator, was investigated by Kuznetsov and Muravitsky. Bimodal symmetric Gödel logic is a logical system, the language of which is an enrichment of the language of Gödel logic with the dual logical connectives and two modal operators. Bimodal symmetric Gödel logic is embedded into an extension of (bimodal) Gödel–Löb logic (provability logic), the language of which contains disjunction, conjunction, negation and two (conjugate) modal operators. The variety of bimodal symmetric Gödel algebras, which represent the algebraic counterparts of bimodal symmetric Gödel logic, is investigated. A description of free algebras and a characterization of projective algebras in the variety of bimodal symmetric Gödel algebras are given. All finitely generated projective bimodal symmetric Gödel algebras are infinite, while finitely generated projective symmetric Gödel algebras are finite.
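To make the role of the linearity axiom (p → q) ∨ (q → p) concrete, the following Python sketch evaluates it on finite Gödel chains, using the usual chain semantics in which a → b is 1 when a ≤ b and b otherwise. This is only an illustration of plain Gödel logic under that assumed semantics; it does not model the dual connectives or the modal operators discussed in the abstract, and all function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def chain(n):
    """The (n + 1)-element chain 0, 1/n, ..., 1."""
    return [Fraction(k, n) for k in range(n + 1)]

def godel_imp(a, b):
    # Goedel implication on a chain: 1 if a <= b, otherwise b.
    return Fraction(1) if a <= b else b

def godel_neg(a):
    # Negation is defined as a -> 0.
    return godel_imp(a, Fraction(0))

def linearity_holds(n):
    """(p -> q) v (q -> p) takes value 1 under every assignment."""
    return all(
        max(godel_imp(a, b), godel_imp(b, a)) == 1
        for a, b in product(chain(n), repeat=2)
    )

def excluded_middle_holds(n):
    """p v ~p takes value 1 under every assignment."""
    return all(max(a, godel_neg(a)) == 1 for a in chain(n))

linearity_results = [linearity_holds(n) for n in range(1, 6)]
lem_two_valued = excluded_middle_holds(1)
lem_three_valued = excluded_middle_holds(2)
```

On every chain the linearity axiom evaluates to 1, whereas excluded middle already fails on the three-element chain (take the middle value), which shows that the chain semantics validates linearity without collapsing to classical logic.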
Ferguson, Thomas Macaulay
Despite a renewed interest in Richard Angell’s logic of analytic containment ($\mathsf{AC}$), the first semantics for $\mathsf{AC}$ introduced by Fabrice Correia has remained largely unexamined. This paper describes a reasonable approach to Correia semantics by means of a correspondence with a nine-valued semantics for $\mathsf{AC}$. The present inquiry employs this correspondence to provide characterizations of a number of propositional logics intermediate between $\mathsf{AC}$ and classical logic. In particular, we examine Correia’s purported characterization of classical logic with respect to his semantics, showing the condition Correia cites in fact characterizes the “logic of paradox” $\mathsf{LP}$ and provide a correct characterization. Finally, we consider some remarks on related matters, such as the applicability of the present correspondence to the analysis of the system $\mathsf{AC}^{\ast}$ and an intriguing relationship between Correia’s models and articular models for first degree entailment.
Verdée, Peter; Batens, Diderik
It is shown that a set of semi-recursive logics, including many fragments of CL (Classical Logic), can be embedded within CL in an interesting way. A logic belongs to the set iff it has a certain type of semantics, called nice semantics. The set includes many logics presented in the literature. The embedding reveals structural properties of the embedded logic. The embedding turns finite premise sets into finite premise sets. The partial decision methods for CL that are goal directed with respect to CL are turned into partial decision methods that are goal directed with respect to the embedded logics.
Beklemishev, Lev; Flaminio, Tommaso
Franco Montagna, a prominent logician and one of the leaders of the Italian school of Mathematical Logic, passed away on February 18, 2015. We survey some of his results and ideas in the two disciplines to which he greatly contributed throughout his career: provability logic and many-valued logic.
