Way, Andy
2 Citations
A very useful service to the example-based machine translation (EBMT) community was provided by Harold Somers in his summary article, which appeared in 1999 and was extended in our 2003 book Recent advances in example-based machine translation. As well as providing a comprehensive review of the paradigm, Somers gives a categorisation of the different instantiations of the basic model. In this paper, we provide a complementary view to that of Somers. Today’s EBMT systems learn by analogy. Perhaps even more so than statistical models of translation, one might view these systems as being incapable of forgetting. We researchers and system developers, on the other hand, often forget or are ignorant of techniques and models presented in prior research. The primary aim of this paper is to try to ensure that golden nuggets from past (now quite distantly so) EBMT research papers are gathered together and presented here for a new generation of researchers keen to operate in the paradigm, especially given the spate of recent open-source releases of EBMT systems. We revisit the findings of the previous main research papers, relate them to some of the major research efforts which have taken place since then, and examine especially the prophecies given in the older pieces of work to see the extent to which they have been borne out in the newer research. Given the strong convergence between the leading corpus-based approaches to MT, especially since the introduction of phrase-based statistical MT, a further hope is that these findings may also prove useful to researchers and developers in other areas of MT.
Giménez, Jesús; Màrquez, Lluís
4 Citations
Assessing the quality of candidate translations involves diverse linguistic facets. However, most automatic evaluation methods in use today rely on limited quality assumptions, such as lexical similarity. This introduces a bias in the development cycle which in some cases has been reported to carry very negative consequences. In order to tackle this methodological problem, we explore a novel path towards heterogeneous automatic Machine Translation evaluation. We have compiled a rich set of specialized similarity measures operating at different linguistic dimensions and analyzed their individual and collective behaviour over a wide range of evaluation scenarios. Results show that measures based on syntactic and semantic information are able to provide more reliable system rankings than lexical measures, especially when the systems under evaluation are based on different paradigms. At the sentence level, while some linguistic measures perform better than most lexical measures, some others perform substantially worse, mainly due to parsing problems. Their scores are, however, suitable for combination, yielding a substantially improved evaluation quality.
Recasens, Marta; Martí, M. Antònia
17 Citations
This article describes the enrichment of the AnCora corpora of Spanish and Catalan (400 k each) with coreference links between pronouns (including elliptical subjects and clitics), full noun phrases (including proper nouns), and discourse segments. The coding scheme distinguishes between identity links, predicative relations, and discourse deixis. Inter-annotator agreement on the link types is 85–89% above chance, and we provide an analysis of the sources of disagreement. The resulting corpora make it possible to train and test learning-based algorithms for automatic coreference resolution, as well as to carry out bottom-up linguistic descriptions of coreference relations as they occur in real data.
Specia, Lucia; Stevenson, Mark; das Graças Volpe Nunes, Maria
Corpus-based techniques have proved to be very beneficial in the development of efficient and accurate approaches to word sense disambiguation (WSD) despite the fact that they generally represent relatively shallow knowledge. It has always been thought, however, that WSD could also benefit from deeper knowledge sources. We describe a novel approach to WSD using inductive logic programming to learn theories from first-order logic representations that allows corpus-based evidence to be combined with any kind of background knowledge. This approach has been shown to be effective over several disambiguation tasks using a combination of deep and shallow knowledge sources. It is important to understand the contribution of the various knowledge sources used in such a system. This paper investigates the contribution of nine knowledge sources to the performance of the disambiguation models produced for the SemEval-2007 English lexical sample task. The outcome of this analysis will assist future work on WSD in concentrating on the most useful knowledge sources.
Calhoun, Sasha; Carletta, Jean; Brenier, Jason M.; Mayo, Neil; Jurafsky, Dan; Steedman, Mark; Beaver, David
14 Citations
This paper describes a recently completed common resource for the study of spoken discourse, the NXT-format Switchboard Corpus. Switchboard is a longstanding corpus of telephone conversations (Godfrey et al. in SWITCHBOARD: Telephone speech corpus for research and development. In Proceedings of ICASSP-92, pp. 517–520, 1992). We have brought together transcriptions with existing annotations for syntax, disfluency, speech acts, animacy, information status, coreference, and prosody, along with substantial new annotations of focus/contrast, more prosody, syllables and phones. The combined corpus uses the format of the NITE XML Toolkit, which allows these annotations to be browsed and searched as a coherent set (Carletta et al. in Lang Resour Eval J 39(4):313–334, 2005). The resulting corpus is a rich resource for the investigation of the linguistic features of dialogue and how they interact. As well as describing the corpus itself, we discuss our approach to overcoming issues involved in such a data integration project, relevant to both users of the corpus and others in the language resource community undertaking similar projects.
Saygın, Ayşe Pınar
We provide a corpus-based computational approach to analyzing acquisition data on Turkish, a richly inflected language. We describe the process by which transcripts from the CHILDES database for 16 children aged 2;0 to 3;0 were morphologically tagged and parsed. We computed a number of imitation, overlap, and repetition measures on the transcripts using CLAN and CHIP programs. These measures tended to decrease as a function of mean length of utterance, which was broadly consistent with previously published work on English-speaking children. The data also revealed additional usage patterns, where the adult utterances provided children with rich morphosyntax in the input, while at the same time helping them to maintain discourse. Children, on the other hand, tended to omit optional constituents and repeat morphemes from the previous utterance. The Turkish data and previously published English data showed cross-linguistic differences in repetition patterns that were congruent with the typological differences between the two languages. More generally, the data were consistent with a usage-based model for the acquisition of Turkish as a first language. The corpora and methods provided here can be extended to future applications.
Zuber, R.
5 Citations
Denotations of anaphoric determiners are treated as functions from sets and relations to sets. Constraints on such denotations are studied, and shown to generalize anaphor conditions known from the study of simpler cases of nominal anaphors. In addition, a generalisation of the notion of conservativity as applied to anaphoric functions is presented. Two classes of anaphoric determiners are discussed as examples: complex anaphoric determiners in English (e.g. no…except herself) and simple anaphoric determiners in Slavic languages (e.g. Polish swój “his own”).
Fang, Jie; Wang, LeiBo
1 Citations
In this note we shall describe the lattice of the congruences on a balanced Ockham algebra with the pseudocomplementation whose quotient algebras are boolean. This is an extension of the result obtained by Rodrigues and Silva who gave a description of the lattice of congruences on an Ockham algebra whose quotient algebras are boolean.
Ma, Yanjun
This is a review of a recent book “Introduction to Chinese Natural Language Processing” by Kam-Fai Wong, Wenjie Li, Ruifeng Xu and Zhengsheng Zhang. The structure and contents of the book are briefly described, and some comments and observations are presented.
Barrio, Eduardo Alejandro
7 Citations
The aim of this paper is to show that it’s not a good idea to have a theory of truth that is consistent but ω-inconsistent. In order to bring out this point, it is useful to consider a particular case: Yablo’s Paradox. In theories of truth without standard models, the introduction of the truth-predicate to a first order theory does not maintain the standard ontology. Firstly, I exhibit some conceptual problems that follow from so introducing it. Secondly, I show that in second order theories with standard semantics the same procedure yields a theory that doesn’t have models. So, while having an ω-inconsistent theory is a bad thing, having an unsatisfiable theory of truth is actually worse. This casts doubts on whether the predicate in question is, after all, a truth-predicate for that language. Finally, I present some alternatives to prove an inconsistency adding plausible principles to certain theories of truth.
Ciuni, Roberto; Zanardo, Alberto
4 Citations
In this paper we present BTC, which is a complete logic for branching-time whose modal operator quantifies over histories and whose temporal operators involve a restricted quantification over histories in a given possible choice. This is a technical novelty, since the operators of the usual logics for branching-time such as CTL express an unrestricted quantification over histories and moments. The value of the apparatus we introduce is connected to those logics of agency that are interpreted on branching-time, as for instance Stit Logics.
Gratzl, Norbert
3 Citations
This article presents a sequent calculus for a negative free logic with identity, called N. The main theorem (in part 1) is the admissibility of the Cut-rule. The second part of this essay is devoted to proofs of soundness, compactness and completeness of N relative to a standard semantics for negative free logic.
Schwartz, Yehuda; Tourlakis, George
We introduce a Gentzen-style modal predicate logic and prove the cut-elimination theorem for it. This sequent calculus of cut-free proofs is chosen as a proxy to develop the proof-theory of the logics introduced in [14, 15, 4]. We present syntactic proofs for all the metatheoretical results that were proved model-theoretically in loc. cit. and moreover prove that the form of weak reflection proved in these papers is as strong as possible.
Pineda, Luis A.; Castellanos, Hayde; Cuétara, Javier; Galescu, Lucian; Juárez, Janet; Llisterri, Joaquim; Pérez, Patricia; Villaseñor, Luis
11 Citations
In this paper the transcription and evaluation of the corpus DIMEx100 for Mexican Spanish is presented. First we describe the corpus and explain the linguistic and computational motivation for its design and collection process; then, the phonetic antecedents and the alphabet adopted for the transcription task are presented; the corpus has been transcribed at three different granularity levels, which are also specified in detail. The corpus statistics for each transcription level are also presented. A set of phonetic rules describing phonetic context observed empirically in spontaneous conversation is also validated with the transcription. The corpus has been used for the construction of acoustic models and a phonetic dictionary for the construction of a speech recognition system. Initial performance results suggest that the data can be used to train good quality acoustic models.
Rojc, Matej; Höge, Harald; Kačič, Zdravko
The ECESS consortium (European Center of Excellence in Speech Synthesis) aims to speed up progress in speech synthesis technology by providing an appropriate evaluation framework. The key element of the evaluation framework is the partition of a text-to-speech (TTS) synthesis system into distributed TTS modules. A text processing module, a prosody generation module, and an acoustic synthesis module have been specified so far. A split into modules has the advantage that the developers at an institution active in ECESS can concentrate their efforts on a single module and test its performance in a complete system, using the missing modules from developers at other institutions. In this way, complete TTS systems can be built using high-performance modules from different institutions. In order to evaluate the modules and to connect them efficiently, a remote evaluation platform, the Remote Evaluation System (RES), based on the existing internet infrastructure, has been developed within ECESS. The RES is based on a client–server architecture. It consists of RES module servers, which encapsulate the modules of the developers; a RES client, which sends data to and receives data from the RES module servers; and a RES server, which connects the RES module servers and organizes the flow of information. Developers can use the RES to select, over the internet, a RES module server that contains a missing TTS module needed to test and improve the performance of their own modules. Finally, the RES allows for the evaluation of TTS modules running at different institutions worldwide. When using the RES client, the institution performing the evaluation is able to set up and perform various evaluation tasks by sending test data via the RES client and receiving results from the RES module servers. Currently ELDA (www.elda.org) is setting up an evaluation using the RES client, which will then be extended to an evaluation client specializing in the envisaged evaluation tasks.
Baker, Alan
In a 2005 paper, John Burgess and Gideon Rosen offer a new argument against nominalism in the philosophy of mathematics. The argument proceeds from the thesis that mathematics is part of science, and that core existence theorems in mathematics are both accepted by mathematicians and acceptable by mathematical standards. David Liggins (2007) criticizes the argument on the grounds that no adequate interpretation of “acceptable by mathematical standards” can be given which preserves the soundness of the overall argument. In this discussion I offer a defense of the Burgess-Rosen argument against Liggins’s objection. I show how plausible versions of the argument can be constructed based on either of two interpretations of mathematical acceptability, and I locate the argument in the space of contemporary anti-nominalist views.
Kasa, Ivan
2 Citations
Hartry Field’s formulation of an epistemological argument against platonism requires knowledge to be causally constrained. Contrary to recent claims (e.g. in [6], [7]), it thus fails the very same criterion usually taken to discredit Benacerraf’s earlier version.
Gabbay, Michael
1 Citations
In this paper I present a formalist philosophy of mathematics and apply it directly to Arithmetic. I propose that formalists concentrate on presenting compositional truth theories for mathematical languages that ultimately depend on formal methods. I argue that this proposal occupies a lush middle ground between traditional formalism, fictionalism, logicism and realism.
Urbaniak, Rafal
4 Citations
The goal is to sketch a nominalist approach to mathematics which, just like neo-logicism, employs abstraction principles, but unlike neo-logicism is not committed to the idea that mathematical objects exist and does not insist that abstraction principles establish the reference of abstract terms. It is well known that neo-logicism runs into certain philosophical problems and faces the technical difficulty of finding appropriate acceptability criteria for abstraction principles. I will argue that a modal and iterative nominalist approach to abstraction principles circumvents those difficulties while still being able to put abstraction principles to a foundational use.
Antonutti Marfori, Marianna
9 Citations
The aim of this paper is to provide epistemic reasons for investigating the notions of informal rigour and informal provability. I argue that the standard view of mathematical proof and rigour yields an implausible account of mathematical knowledge, and falls short of explaining the success of mathematical practice. I conclude that careful consideration of mathematical practice urges us to pursue a theory of informal provability.
Boccuni, Francesca
5 Citations
PG (Plural Grundgesetze) is a predicative monadic second-order system which exploits the notion of plural quantification and a few Fregean devices, among which a formulation of the infamous Basic Law V. It is shown that second-order Peano arithmetic can be derived in PG. I also investigate the philosophical issue of predicativism connected to PG. In particular, as predicativism about concepts seems rather un-Fregean, I analyse whether there is a way to make predicativism compatible with Frege’s logicism.
Carrara, Massimiliano; Martino, Enrico
7 Citations
The aim of the paper is to revise Boolos’ reinterpretation of second-order monadic logic in terms of plural quantification ([4], [5]) and expand it to full second-order logic. Introducing the idealization of plural acts of choice, performed by a suitable team of agents, we will develop a notion of plural reference. Plural quantification will then be explained in terms of plural reference. As an application, we will sketch a structuralist reconstruction of second-order arithmetic based on the axiom of infinity à la Dedekind, as the unique non-logical axiom. We will also sketch a virtual interpretation of the classical continuum involving no other infinite than a countable plurality of individuals.
Goethe, Norma B.; Friend, Michèle
7 Citations
In this paper, we discuss the prevailing view amongst philosophers and many mathematicians concerning mathematical proof. Following Cellucci, we call the prevailing view the “axiomatic conception” of proof. The conception includes the ideas that: a proof is finite, it proceeds from axioms and it is the final word on the matter of the conclusion. This received view can be traced back to Frege, Hilbert and Gentzen, amongst others, and is prevalent in both mathematical text books and logic text books.
Along with Cellucci, Rav, Grattan-Guinness and Grosholz, we deplore this view of mathematical proof, and favour instead the “analytic conception” of mathematical proof, where the axiomatic proof, when it exists at all, is only the core of a proof. An analytic proof solves a problem by making hypotheses and using a mixture of deductive moves and induction (loosely construed to include diagrams, etc.) to present a solution to the problem. This implies that proofs are not always finite, that they might involve much more than axioms and straight logical inferences from these deductions, and that a proof can always be questioned. Moreover, this is where a lot of the interesting conceptual work of mathematics takes place. We view proofs as communicative acts made within the mathematical community, which ensures correctness through application, context and standards of rigor.
Irvine, Andrew D.
1 Citations
In the Grundlagen, Frege offers eight main arguments, together with a series of more minor supporting arguments, against Mill’s view that numbers are “properties of external things”. This paper reviews all eight of these arguments, arguing that none are conclusive.
Pleitz, Martin
I propose an account of the metaphysics of the expressions of a mathematical language which brings together the structuralist construal of a mathematical object as a place in a structure, the semantic notion of indexicality and Kit Fine’s ontological theory of qua objects. By contrasting this indexical qua objects account with several other accounts of the metaphysics of mathematical expressions, I show that it does justice both to the abstractness that mathematical expressions have because they are mathematical objects and to the element of concreteness that they have because they are also used as signs. In a concluding section, I comment on the pragmatic element that has entered ontology by way of the notion of indexicality and use it to give an answer to a question Stewart Shapiro has recently posed about the status of metamathematics in the structuralist philosophy of mathematics.
Rizza, Davide
2 Citations
In this paper I introduce a novel strategy to deal with the indiscernibility problem for ante rem structuralism. The ante rem structuralist takes the ontology of mathematics to consist of abstract systems of pure relata. Many of such systems are totally symmetrical, in the sense that all of their elements are relationally indiscernible, so the ante rem structuralist seems committed to positing indiscernible yet distinct relata. If she decides to identify them, she falls into mathematical inconsistency while, accepting their distinctness, she finds herself unable to account for it. I show that the ante rem structuralist has in fact the resources to account for the distinctness of indiscernibles and that these resources come from the very symmetry properties of the mathematical objects that seem to pose problems for her.
Goldblatt, Robert
2 Citations
The variety MBA of monadic bounded algebras consists of Boolean algebras with a distinguished element E, thought of as an existence predicate, and an operator $\exists$ reflecting the properties of the existential quantifier in free logic. This variety is generated by a certain class FMBA of algebras isomorphic to ones whose elements are propositional functions.
We show that FMBA is characterised by the disjunction of the equations $\exists E = 1$ and $\exists E = 0$. We also define a weaker notion of “relatively functional” algebra, and show that every member of MBA is isomorphic to a relatively functional one.
Isles, David
If the collection of models for the axioms $\mathfrak{A}$ of elementary number theory (Peano arithmetic) is enlarged to include not just the “natural numbers” or their non-standard infinitistic extensions but also what are here called “primitive recursive notations”, questions arise about the reliability of first-order derivations from $\mathfrak{A}$. In this enlarged set of “models” some derivations usually accepted as “reliable” may be problematic. This paper criticizes two of these derivations which claim, respectively, to establish the totality of exponentiation and to prove Euclid’s theorem about the infinity of primes.
Akishev, Galym; Goldblatt, Robert
3 Citations
We introduce the equational notion of a monadic bounded algebra (MBA), intended to capture algebraic properties of bounded quantification. The variety of all MBA’s is shown to be generated by certain algebras of two-valued propositional functions that correspond to models of monadic free logic with an existence predicate. Every MBA is a subdirect product of such functional algebras, a fact that can be seen as an algebraic counterpart to semantic completeness for monadic free logic. The analysis involves the representation of MBA’s as powerset algebras of certain directed graphs with a set of “marked” points.
It is shown that there are only countably many varieties of MBA’s, all are generated by their finite members, and all have finite equational axiomatisations classifying them into fourteen kinds of variety. The universal theory of each variety is decidable.
Finitely generated MBA’s are shown to be finite, with the free algebra on r generators having exactly $2^{3 \cdot 2^r \cdot 2^{2^r - 1}}$ elements. An explicit procedure is given for constructing this freely generated algebra as the powerset algebra of a certain marked graph determined by the number r.
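To give a feel for how quickly this count grows, the sketch below evaluates it for small r. It assumes the exponent is read as 3 · 2^r · 2^(2^r − 1), with the minus sign restored in the innermost exponent; that reading, and the function names, are assumptions of this illustration and not part of the abstract.

```python
def free_mba_exponent(r: int) -> int:
    """Exponent 3 * 2^r * 2^(2^r - 1) under the assumed reading of the formula.

    Since the abstract describes the free algebra as a powerset algebra,
    this exponent is the number of points of the underlying set.
    """
    if r < 0:
        raise ValueError("the number of generators must be non-negative")
    return 3 * 2**r * 2 ** (2**r - 1)


def free_mba_size(r: int) -> int:
    """Number of elements of the free algebra on r generators (assumed reading)."""
    return 2 ** free_mba_exponent(r)


if __name__ == "__main__":
    for r in range(3):
        # Report the exponent and the number of decimal digits of the size.
        size = free_mba_size(r)
        print(f"r={r}: exponent={free_mba_exponent(r)}, digits={len(str(size))}")
```

Under this reading the algebra on no generators has 2^3 = 8 elements and the algebra on one generator has 2^12 = 4096, after which the size is better tracked through its exponent (96 for r = 2).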
Odintsov, Sergei P.
4 Citations
The variety of ${\bf N4}^\perp$-lattices provides an algebraic semantics for the logic ${\bf N4}^\perp$, a version of Nelson’s logic combining paraconsistent strong negation and explosive intuitionistic negation. In this paper we construct the Priestley duality for the category of ${\bf N4}^\perp$-lattices and their homomorphisms. The obtained duality naturally extends the Priestley duality for Nelson algebras constructed by R. Cignoli and A. Sendlewski.
Pereira, Luiz C.; Haeusler, Edward H.; Costa, Vaston G.; Sanz, Wagner
2 Citations
The introduction and elimination rules for material implication in natural deduction are not complete with respect to the implicational fragment of classical logic. A natural way to complete the system is through the addition of a new natural deduction rule corresponding to Peirce’s formula (((A → B) → A) → A). E. Zimmermann [6] has shown how to extend Prawitz’ normalization strategy to Peirce’s rule: applications of Peirce’s rule can be restricted to atomic conclusions. The aim of the present paper is to extend Seldin’s normalization strategy to Peirce’s rule by showing that every derivation Π in the implicational fragment can be transformed into a derivation Π′ such that no application of Peirce’s rule in Π′ occurs above applications of →-introduction and →-elimination. As a corollary of Seldin’s normalization strategy we obtain a form of Glivenko’s theorem for the classical {→}-fragment.
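The incompleteness claim in the first sentence can be illustrated semantically. The sketch below is not taken from the paper: it checks that Peirce’s formula is a two-valued tautology, and that it fails in the three-element Heyting chain, a standard structure in which everything derivable from →-introduction and →-elimination alone takes the top value. The encoding of truth values as integers and all function names are assumptions of this illustration.

```python
from itertools import product


def implies_classical(p: bool, q: bool) -> bool:
    """Material implication on the two classical truth values."""
    return (not p) or q


def peirce_classical(a: bool, b: bool) -> bool:
    """Value of ((A -> B) -> A) -> A under a classical valuation."""
    return implies_classical(implies_classical(implies_classical(a, b), a), a)


TOP = 2  # illustrative encoding of the chain 0 < 1 < 2, with 2 as "true"


def implies_heyting(p: int, q: int) -> int:
    """Implication in a linearly ordered Heyting algebra: top if p <= q, else q."""
    return TOP if p <= q else q


def peirce_heyting(a: int, b: int) -> int:
    """Value of ((A -> B) -> A) -> A in the three-element Heyting chain."""
    return implies_heyting(implies_heyting(implies_heyting(a, b), a), a)


# Peirce's formula holds under every classical valuation.
classical_tautology = all(
    peirce_classical(a, b) for a, b in product([False, True], repeat=2)
)

# Valuations in the chain where the formula does not take the top value.
heyting_counterexamples = [
    (a, b) for a, b in product(range(3), repeat=2) if peirce_heyting(a, b) != TOP
]

if __name__ == "__main__":
    print("classical tautology:", classical_tautology)
    print("Heyting counterexamples (A, B):", heyting_counterexamples)
```

The single failing valuation assigns A the middle value and B the bottom value, which is why a rule such as Peirce’s has to be added before the implicational fragment of classical logic is captured.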
Chan, Erwin; Lignos, Constantine
1 Citations
We develop an unsupervised algorithm for morphological acquisition to investigate the relationship between linguistic representation, data statistics, and learning algorithms. We model the phenomenon that children acquire the morphological inflections of a language monotonically by introducing an algorithm that uses a bootstrapped, frequency-driven learning procedure to acquire rules monotonically. The algorithm learns a morphological grammar in terms of a Base and Transforms representation, a simple rule-based model of morphology. When tested on corpora of child-directed speech in English from CHILDES (MacWhinney in The CHILDES Project: Tools for analyzing talk. Erlbaum, Hillsdale, 2000), the algorithm learns the most salient rules of English morphology and the order of acquisition is similar to that of children as observed by Brown (A first language: the early stages. Harvard University Press, Cambridge, 1973). Investigations of statistical distributions in corpora reveal that the algorithm is able to acquire morphological grammars due to its exploitation of Zipfian distributions in morphology through type-frequency statistics. These investigations suggest that the computation and frequency-driven selection of discrete morphological rules may be important factors in children’s acquisition of basic inflectional morphological systems.
Georgila, Kallirroi; Wolters, Maria; Moore, Johanna D.; Logie, Robert H.
4 Citations
We present the MATCH corpus, a unique data set of 447 dialogues in which 26 older and 24 younger adults interact with nine different spoken dialogue systems. The systems varied in the number of options presented and the confirmation strategy used. The corpus also contains information about the users’ cognitive abilities and detailed usability assessments of each dialogue system. The corpus, which was collected using a Wizard-of-Oz methodology, has been fully transcribed and annotated with dialogue acts and “Information State Update” (ISU) representations of dialogue context. Dialogue act and ISU annotations were performed semi-automatically. In addition to describing the corpus collection and annotation, we present a quantitative analysis of the interaction behaviour of older and younger users and discuss further applications of the corpus. We expect that the corpus will provide a key resource for modelling older people’s interaction with spoken dialogue systems.
Mitchener, William Garrett; Becker, Misha
4 Citations
We consider the task of learning three verb classes: raising (e.g., seem), control (e.g., try) and ambiguous verbs that can be used either way (e.g., begin). These verbs occur in sentences with similar surface forms, but have distinct syntactic and semantic properties. They present a conundrum because it would seem that their meaning must be known to infer their syntax, and that their syntax must be known to infer their meaning. Previous research with human speakers pointed to the usefulness of two cues found in sentences containing these verbs: animacy of the sentence subject and eventivity of the predicate embedded under the main verb. We apply a variety of algorithms to this classification problem to determine whether the primary linguistic data is sufficiently rich in this kind of information to enable children to resolve the conundrum, and whether this information can be extracted in a way that reflects distinctive features of child language acquisition. The input consists of counts of how often various verbs occur with animate subjects and eventive predicates in two corpora of naturalistic speech, one adult-directed and the other child-directed. Proportions of the semantic frames are insufficient. A Bayesian attachment model designed for a related language learning task does not work well at all. A hierarchical Bayesian model (HBM) gives significantly better results. We also develop and test a saturating accumulator that can successfully distinguish the three classes of verbs. Since the HBM and saturating accumulator are successful at the classification task using biologically realistic calculations, we conclude that there is sufficient information given subject animacy and predicate eventivity to bootstrap the process of learning the syntax and semantics of these verbs.
Pearl, Lisa; Goldwater, Sharon; Steyvers, Mark
9 Citations
In recent years, Bayesian models have become increasingly popular as a way of understanding human cognition. Ideal learner Bayesian models assume that cognition can be usefully understood as optimal behavior under uncertainty, a hypothesis that has been supported by a number of modeling studies across various domains (e.g., Griffiths and Tenenbaum, Cognitive Psychology, 51, 354–384, 2005; Xu and Tenenbaum, Psychological Review, 114, 245–272, 2007). The models in these studies aim to explain why humans behave as they do given the task and data they encounter, but typically avoid some questions addressed by more traditional psychological models, such as how the observed behavior is produced given constraints on memory and processing. Here, we use the task of word segmentation as a case study for investigating these questions within a Bayesian framework. We consider some limitations of the infant learner, and develop several online learning algorithms that take these limitations into account. Each algorithm can be viewed as a different method of approximating the same ideal learner. When tested on corpora of English child-directed speech, we find that the constrained learner’s behavior depends non-trivially on how the learner’s limitations are implemented. Interestingly, sometimes biases that are helpful to an ideal learner hinder a constrained learner, and in a few cases, constrained learners perform equivalently to or better than the ideal learner. This suggests that the transition from a computational-level solution for acquisition to an algorithmic-level one is not straightforward.
Duran, Daniel; Schütze, Hinrich; Möbius, Bernd; Walsh, Michael
2 Citations
In this paper, we develop a new conceptual framework for an important problem in language acquisition, the correspondence problem: the fact that a given utterance has different manifestations in the speech and articulation of different speakers and that the correspondence of these manifestations is difficult to learn. We put forward the Correspondence-by-Segmentation Hypothesis, which states that correspondence is primarily learned by first segmenting speech in an unsupervised manner and then mapping the acoustics of different speakers onto each other. We show that a rudimentary segmentation of speech can be learned in an unsupervised fashion. We then demonstrate that, using the previously learned segmentation, different instances of a word can be mapped onto each other with high accuracy when trained on utterance-label pairs for a small set of words.
Boguraev, Branimir; Neff, Mary
2 Citations
Pattern matching, or querying, over annotations is a general-purpose paradigm for inspecting, navigating, mining, and transforming annotation repositories, the common representation basis for modern pipelined text processing architectures. The open-ended nature of these architectures and the expressiveness of feature-structure-based annotation schemes account for the natural tendency of such annotation repositories to become very dense, as multiple levels of analysis get encoded as layered annotations. This particular characteristic presents challenges for the design of a pattern matching framework capable of interpreting ‘flat’ patterns over arbitrarily dense annotation lattices. We present an approach where a finite state device applies (compiled) pattern grammars over what is, in effect, a linearized ‘projection’ of a particular route through the lattice. The route is derived by a mix of static grammar analysis and runtime interpretation of navigational directives within an extended grammar formalism; it selects just the annotation sequence appropriate for the patterns at hand. For expressive and efficient pattern matching in dense annotation stores, our implemented approach achieves a mix of lattice traversal and finite state scanning by exposing a language which, to its user, provides constructs for specifying sequential, structural, and configurational constraints among annotations.
Brutti, Alessio; Cristoforetti, Luca; Kellermann, Walter; Marquardt, Lutz; Omologo, Maurizio
4 Citations
This paper describes a multichannel acoustic data collection recorded under the European DICIT project, during Wizard of Oz (WOZ) experiments carried out at the FAU and FBK-irst laboratories. The application of interest in DICIT is a distant-talking interface for control of interactive TV working in a typical living room, with many interfering devices. The objective of the experiments was to collect a database supporting efficient development and tuning of acoustic processing algorithms for signal enhancement. In DICIT, techniques for sound source localization, multichannel acoustic echo cancellation, blind source separation, speech activity detection, speaker identification and verification as well as beamforming are combined to achieve the maximum possible reduction of the user speech impairments typical of distant-talking interfaces. The collected database made it possible to simulate a realistic scenario at a preliminary stage and to tailor the algorithms involved to the observed user behaviors. In order to match the project requirements, the WOZ experiments were recorded in three languages: English, German and Italian. Besides the user inputs, the database also contains non-speech-related acoustic events, room impulse response measurements and video data, the latter used to compute the three-dimensional positions of each subject. Sessions were manually transcribed and segmented at word level, with specific labels also introduced for acoustic events.
Kilgarriff, Adam
1 Citations
The verb google is intriguing for the study of morphology, loanwords, assimilation, language contrast and neologisms. We present data for it for nineteen languages from nine language families.
14 Citations
SpatialML is an annotation scheme for marking up references to places in natural language. It covers both named and nominal references to places, grounding them where possible with geo-coordinates, and characterizes relationships among places in terms of a region calculus. A freely available annotation editor has been developed for SpatialML, along with several annotated corpora. Inter-annotator agreement on SpatialML extents is 91.3 F-measure on a corpus of SpatialML-annotated ACE documents released by the Linguistic Data Consortium. Disambiguation agreement on geo-coordinates on ACE is 87.93 F-measure. An automatic tagger for SpatialML extents scores 86.9 F on ACE, while a disambiguator scores 93.0 F on it. Results are also presented for two other corpora. In adapting the extent tagger to new domains, merging the training data from the ACE corpus with annotated data in the new domain provides the best performance.
Kolany, Adam
In the following we show that the general property S, considered by Cowen [1], by Cowen and Kolany in [3], and earlier by Cowen in [2] and Kolany in [4] as hypergraph satisfiability, can be constructively reduced to (3, 2) · SAT, that is, to satisfiability of (at most) triples with two-element forbidden sets. This is an analogue of the “classical” result on the reduction of SAT to 3 · SAT.
The estimated cost of the reduction is $\mathcal{O}(\kappa \times {\bf wd}(\mathcal{E}) + \lambda \times {\bf wd}(\mathcal{F}))$, where κ is the number of elements of $\mathcal{E}$ with more than three elements, λ is the number of elements of $\mathcal{F}$ with more than two elements, and ${\bf wd}(\mathcal{A})$ is the maximal cardinality of an element of $\mathcal{A}$. It is consistent with the classical case of k · SAT reducibility, since then λ = 0, $\kappa \in \mathcal{O}(\# \mathcal{E})$, where $\# \mathcal{E}$ is the cardinality of $\mathcal{E}$, and ${\bf wd}(\mathcal{E}) \in \mathcal{O}(k)$, and thus we obtain $\mathcal{O}(\kappa \times {\bf wd}(\mathcal{E}) + \lambda \times {\bf wd}(\mathcal{F})) = \mathcal{O}(\# \mathcal{E} \cdot k)$, which is the case.
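As a point of comparison, the classical SAT to 3 · SAT reduction mentioned above can be sketched as follows. This is a minimal illustration of the textbook clause-splitting construction, not the hypergraph reduction of the paper; the function names `to_3sat` and `satisfiable` and the integer encoding of literals are assumptions made for this example. Each clause of width w is replaced by w − 2 clauses, so the work per wide clause grows linearly with its width, in line with the cost bound stated above.

```python
from itertools import product


def to_3sat(clauses, num_vars):
    """Split every clause with more than three literals using fresh variables.

    Literals are nonzero integers: v stands for variable v, -v for its negation.
    Returns (new_clauses, new_num_vars).
    """
    out = []
    next_var = num_vars
    for clause in clauses:
        lits = list(clause)
        while len(lits) > 3:
            next_var += 1  # fresh linking variable (hypothetical name: y)
            # (l1 or l2 or rest) becomes (l1 or l2 or y) and (not y or rest)
            out.append([lits[0], lits[1], next_var])
            lits = [-next_var] + lits[2:]
        out.append(lits)
    return out, next_var


def satisfiable(clauses, num_vars):
    """Brute-force satisfiability check, intended only for tiny inputs."""
    for bits in product([False, True], repeat=num_vars):
        if all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses):
            return True
    return False
```

The brute-force checker is there only to confirm on small inputs that the transformed instance is satisfiable exactly when the original one is.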
Vel, MLJ
A first-order theory $\mathcal{T}$ has the Independence Property provided $\mathcal{T} \vdash (Q)(\Phi \Rightarrow \Phi_1 \vee \ldots \vee \Phi_n)$ implies $\mathcal{T} \vdash (Q)(\Phi \Rightarrow \Phi_i)$ for some i whenever $\Phi, \Phi_1, \ldots, \Phi_n$ are formulae of a suitable type and (Q) is any quantifier sequence. Variants of this property have been noticed for some time in logic programming and in linear programming.
We show that a first-order theory has the independence property for the class of basic formulae provided it can be axiomatised with Horn sentences. This condition, called crispness, is to some extent also necessary, but the properties are not equivalent.
The existence of so-called free models is a useful intermediate result. The independence property is also a tool to decide that a sentence cannot be deduced. We illustrate this with the case of the classical Carathéodory theorem for Pasch-Peano geometries.
Bimbó, Katalin
3 Citations
We briefly overview some of the historical landmarks on the path leading to the reduction of the number of logical connectives in classical logic. Relying on the duality inherent in Boolean algebras, we introduce a new operator (Nallor) that is the dual of Schönfinkel’s operator. We outline the proof that this operator by itself is sufficient to define all the connectives and operators of classical first-order logic (Fol). Having scrutinized the proof, we pinpoint the theorems of Fol that are needed in the proof. Using the insights gained from the proof, we show that there are four binary operators that each can serve as the only undefined logical constant for Fol. Finally, we show that from every n-ary connective that yields a functionally complete singleton set of connectives two Schönfinkel-type operators are definable, and all the latter are so definable.
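The idea of a single connective sufficing on its own can be illustrated at the propositional level. The sketch below is only an analogy and an assumption of this example: it uses plain joint denial (NOR) over truth values, not the quantifier-binding operator introduced in the paper, and the helper names are hypothetical. It checks by truth table that negation, disjunction and conjunction are all definable from NOR alone.

```python
from itertools import product


def nor(a, b):
    """Joint denial: true exactly when both arguments are false."""
    return not (a or b)


def neg(a):
    # not a  ==  a NOR a
    return nor(a, a)


def disj(a, b):
    # a or b  ==  not (a NOR b)
    return neg(nor(a, b))


def conj(a, b):
    # a and b  ==  (not a) NOR (not b)
    return nor(neg(a), neg(b))


def defines_all():
    """Compare the NOR-based definitions with Python's own connectives."""
    return all(
        neg(a) == (not a)
        and disj(a, b) == (a or b)
        and conj(a, b) == (a and b)
        for a, b in product([False, True], repeat=2)
    )
```

Since negation and conjunction together are functionally complete for propositional logic, the check covers every truth function indirectly.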
Ferenczi, Miklós
1 Citations
Internal sets and the Boolean algebras of the collection of the internal sets are of central importance in nonstandard analysis. Boolean algebras are the algebraization of propositional logic while the logic applied in nonstandard analysis (in nonstandard stochastics) is the first order or the higher order logic (type theory). We present here a first order logic algebraization for the collection of internal sets rather than the Boolean one. Further, we define an unusual probability on this algebraization.
Climent Vidal, J.; Soliveres Tur, J.
After defining, for each many-sorted signature Σ = (S, Σ), the category Ter(Σ) of generalized terms for Σ (which is the dual of the Kleisli category for $\mathbb{T}_{\bf \Sigma}$, the monad in Set^{S} determined by the adjunction ${\bf T}_{\bf \Sigma} \dashv {\rm G}_{\bf \Sigma}$ from Set^{S} to Alg(Σ), the category of Σ-algebras), we assign, to a signature morphism d from Σ to Λ, the functor ${\bf d}_\diamond$ from Ter(Σ) to Ter(Λ). Having defined the mappings that assign, respectively, to a many-sorted signature the corresponding category of generalized terms and to a signature morphism the functor between the associated categories of generalized terms, we state that both mappings are actually the components of a pseudo-functor Ter from Sig to the 2-category Cat. Next we prove that there is a functor Tr^{Σ}, of realization of generalized terms as term operations, from Alg(Σ) × Ter(Σ) to Set, that simultaneously formalizes the procedure of realization of generalized terms and its naturalness (by taking into account the variation of the algebras through the homomorphisms between them). We remark that from this fact we will get the invariance of the relation of satisfaction under signature change. Moreover, we prove that, for each signature morphism d from Σ to Λ, there exists a natural isomorphism θ^{d} from the functor ${\rm Tr}^{\bf \Lambda} \circ ({\rm Id} \times {\bf d}_\diamond)$ to the functor ${\rm Tr}^{\bf \Sigma} \circ ({\bf d}^* \times {\rm Id})$, both from the category Alg(Λ) × Ter(Σ) to the category Set, where d* is the value at d of the arrow mapping of a contravariant functor Alg from Sig to Cat, that shows the invariant character of the procedure of realization of generalized terms under signature change. Finally, we construct the many-sorted term institution by combining adequately the above components (and, in a derived way, the many-sorted specification institution), but for a strict generalization of the standard notion of institution.
Palmigiano, A.; Re, R.
1 Citations
We associate a canonical unital involutive quantale to a topological groupoid. When the groupoid is also étale, this association is compatible with but independent from the theory of localic étale groupoids and their quantales [9] of P. Resende. As a motivating example, we describe the connection between the quantale and the C*-algebra that both classify Penrose tilings, which was left as an open problem in [5].
Kowalski, T.; Paoli, F.; Giuntini, R.; Ledda, A.
2 Citations
In the present paper we continue the investigation of the lattice of subvarieties of the variety of $\sqrt{\prime}$ quasi-MV algebras, already started in [6]. Besides some general results on the structure of such a lattice, the main contribution of this work is the solution of a long-standing open problem concerning these algebras: namely, we show that the variety generated by the standard disk algebra D_{r} is not finitely based, and we provide an infinite equational basis for the same variety.
Ashcroft, Michael
In this article I argue that there is a sense in which logic is empirical, and hence open to influence from science. One of the roles of logic is the modelling and extending of natural language reasoning. It does so by providing a formal system which succeeds in modelling the structure of a paradigmatic set of our natural language inferences and which then permits us to extend this structure to novel cases with relative ease. In choosing the best system of those that succeed in this, we seek certain virtues of such structures such as simplicity and naturalness (which will be explained).
Science can influence logic by bringing us, as in the case of quantum mechanics, to make natural language inferences about new kinds of systems and thereby extend the set of paradigmatic cases that our formal logic ought to model as simply and naturally as possible. This can alter which structures ought to be used to provide semantics for such models. I show why such a revolution could have led us to reject one logic for another through explaining why complex claims about quantum mechanical systems failed to lead us to reject classical logic for quantum logic.
Corbett, J. V.; Durt, T.
1 Citations
The double slit experiment for a massive scalar particle is described using intuitionistic logic with quantum real numbers as the numerical values of the particle’s position and momentum. The model assigns physical reality to single quantum particles. Its truth values are given by open subsets of state space interpreted as the ontological conditions of a particle. Each condition determines quantum real number values for all the particle’s attributes. Questions, unanswerable in the standard theories, concerning the behaviour of single particles in the experiment are answered.
Durt, Thomas
According to the so-called Quantum Darwinist approach, the emergence of “classical islands” from a quantum background is assumed to obey a (selection) principle of maximal information. We illustrate this idea by considering the coupling of two oscillators (modes). As our approach suggests that the classical limit could have emerged throughout a long and progressive Evolution mechanism, it is likely that primitive living organisms behave in a “more quantum”, “less classical” way than more evolved ones. This brings us to seriously consider the possibility of measuring departures from classicality exhibited by biological systems. We describe an experimental proposal aimed at revealing the presence of entanglement in the biophotonic radiation emitted by biological sources.
French, Steven; Krause, Décio
11 Citations
Quasi-set theory has been proposed as a means of handling collections of indiscernible objects. Although the most direct application of the theory is quantum physics, it can be seen per se as a non-classical logic (a non-reflexive logic). In this paper we revise and correct some aspects of quasi-set theory as presented in [12], so as to avoid some misunderstandings and possible misinterpretations about the results achieved by the theory. Some further ideas with regard to quantum field theory are also advanced in this paper.
Freytes, Hector
In this paper we investigate a categorical equivalence between square root qMV algebras (a variety of algebras arising from quantum computation) and a category of preordered semigroups.
Pykacz, Jarosław
6 Citations
In the paper it is shown that every physically sound Birkhoff – von Neumann quantum logic, i.e., an orthomodular partially ordered set with an ordering set of probability measures, can be treated as partial infinite-valued Łukasiewicz logic, which unifies two competing approaches: the many-valued, and the two-valued but non-distributive, which have coexisted in the quantum logic theory since its very beginning.
Székely, Gergely
4 Citations
The aim of this paper is to provide a logic-based conceptual analysis of the twin paradox (TwP) theorem within a first-order logic framework. A geometrical characterization of TwP and its variants is given. It is shown that TwP is not logically equivalent to the assumption of the slowing down of moving clocks, and the lack of TwP is not logically equivalent to the Newtonian assumption of absolute time. The logical connection between TwP and a symmetry axiom of special relativity is also studied.
Beggs, Edwin J.; Costa, José Félix; Tucker, John V.
10 Citations
Earlier, we have studied computations possible by physical systems and by algorithms combined with physical systems. In particular, we have analysed the idea of using an experiment as an oracle to an abstract computational device, such as the Turing machine. The theory of composite machines of this kind can be used to understand (a) a Turing machine receiving extra computational power from a physical process, or (b) an experimenter modelled as a Turing machine performing a test of a known physical theory T.
Our earlier work was based upon experiments in Newtonian mechanics. Here we extend the scope of the theory of experimental oracles beyond Newtonian mechanics to electrical theory. First, we specify an experiment that measures resistance using a Wheatstone bridge and start to classify the computational power of this experimental oracle using nonuniform complexity classes. Secondly, we show that modelling an experimenter and experimental procedure algorithmically imposes a limit on our ability to measure resistance by the Wheatstone bridge.
The connection between the algorithm and physical test is mediated by a protocol controlling each query, especially the physical time taken by the experimenter. In our studies we find that physical experiments have an exponential time protocol; this we formulate as a general conjecture. Our theory proposes that measurability in Physics is subject to laws which are collateral effects of the limits of computability and computational complexity.
MruczekNasieniewska, Krystyna
In the present paper we give a syntactical and semantical characterization of the class of algebras defined by P-compatible identities of modular ortholattices. We also describe the lattice of some subvarieties of the variety MOL_{Ex} defined by so-called externally compatible identities of modular ortholattices.
Tennant, Neil
1 Citations
We present a logically detailed case study of explanation and prediction in Newtonian mechanics. The case in question is that of a planet’s elliptical orbit in the Sun’s gravitational field. Care is taken to distinguish the respective contributions of the mathematics that is being applied, and of the empirical hypotheses that receive a mathematical formulation. This enables one to appreciate how in this case the overall logical structure of scientific explanation and prediction is exactly in accordance with the hypothetico-deductive model.
Trypuz, Robert
In this paper the class of minimal models C^{ZI} for Kiczuk’s system of physical change ZI is provided and soundness and completeness proofs of ZI with respect to these models are given. ZI logic consists of propositional logic, von Wright’s And Then, and six specific axioms characterizing the meaning of the unary propositional operator “Zm”, read “there is a change in the fact that”. ZI is intended to be a logic which provides a formal account for describing two kinds of process change: the change from one state of the process to its other state (e.g., transmitting or absorbing energy with greater or less than the usual intensity) and the perishing of the process (e.g., cessation of the energetic activity of the sun).
Weingartner, Paul
3 Citations
The purpose of the paper is to show that by cleaning Classical Logic (CL) of redundancies (irrelevances) and uninformative complexities in the consequence class and of too strong assumptions (of CL) one can avoid most of the paradoxes coming up when CL is applied to empirical sciences including physics. This kind of cleaning of CL has been done successfully by distinguishing two types of theorems of CL by two criteria. One criterion (RC) forbids such theorems in which parts of the consequent (conclusion) can be replaced by arbitrary parts salva validitate of the theorem. The other (RD) reduces the consequences to simplest conjunctive consequence elements. Since the application of RC and RD to CL leads to a logic without the usual closure conditions, an approximation to RC and RD has been constructed by a basic logic with the help of finite (6-valued) matrices. This basic logic called RMQ (relevance, matrix, Quantum Physics) is consistent and decidable. It distinguishes two types of validity: strict validity, and classical or material validity. All theorems of CL (here: classical propositional calculus CPC) are classically or materially valid in RMQ. But those theorems of CPC which obey RC and RD and avoid the difficulties in the application to empirical sciences and to Quantum Physics are separated as strictly valid in RMQ. In the application to empirical sciences in general the proposed logic avoids the well-known paradoxes in the area of explanation, confirmation, verisimilitude and Deontic Logic. Concerning the application to physics it avoids also the difficulties with distributivity, commensurability and with Bell’s inequalities.
Arun, Abhishek; Haddow, Barry; Koehn, Philipp; Lopez, Adam; Dyer, Chris; Blunsom, Phil
2 Citations
Recent advances in statistical machine translation have used approximate beam search for NP-complete inference within probabilistic translation models. We present an alternative approach of sampling from the posterior distribution defined by a translation model. We define a novel Gibbs sampler for sampling translations given a source sentence and show that it effectively explores this posterior distribution. In doing so we overcome the limitations of heuristic beam search and obtain theoretically sound solutions to inference problems such as finding the maximum probability translation and minimum risk training and decoding.
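The general mechanism of Gibbs sampling can be sketched on a toy distribution. The example below is an illustration under stated assumptions and not the sampler defined in the paper, which operates over translations of a source sentence: here the state is just a pair of binary variables, the unnormalised weight table is invented for the example, and the function name `gibbs` is hypothetical. Each step resamples one variable from its conditional distribution given the other, and the visit frequencies approximate the joint distribution.

```python
import random


def gibbs(weights, steps, seed=0):
    """Estimate a joint distribution over (x, y) in {0, 1}^2 by Gibbs sampling.

    `weights` maps each pair (x, y) to an unnormalised positive weight.
    Returns the empirical frequency of each pair after `steps` sweeps.
    """
    rng = random.Random(seed)
    x, y = 0, 0
    counts = {pair: 0 for pair in weights}
    for _ in range(steps):
        # Resample x from p(x | y), keeping y fixed.
        w0, w1 = weights[(0, y)], weights[(1, y)]
        x = 1 if rng.random() < w1 / (w0 + w1) else 0
        # Resample y from p(y | x), keeping x fixed.
        w0, w1 = weights[(x, 0)], weights[(x, 1)]
        y = 1 if rng.random() < w1 / (w0 + w1) else 0
        counts[(x, y)] += 1
    return {pair: c / steps for pair, c in counts.items()}
```

With weights such as `{(0, 0): 4, (0, 1): 1, (1, 0): 1, (1, 1): 4}`, the true probability of `(0, 0)` is 0.4, and the estimate should approach that value as the number of sweeps grows.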
Crego, Josep M.; Yvon, François
2 Citations
In this work, we present an extension of n-gram-based translation models based on factored language models (FLMs). Translation units employed in the n-gram-based approach to statistical machine translation (SMT) are based on mappings of sequences of raw words, while translation model probabilities are estimated through standard language modeling of such bilingual units. Therefore, similar to other translation model approaches (phrase-based or hierarchical), the sparseness problem of the units being modeled leads to unreliable probability estimates, even under conditions where large bilingual corpora are available. In order to tackle this problem, we extend the n-gram-based approach to SMT by tightly integrating more general word representations, such as lemmas and morphological classes, and we use the flexible framework of FLMs to apply a number of different back-off techniques. In this work, we show that FLMs can also be successfully applied to translation modeling, yielding more robust probability estimates that integrate larger bilingual contexts during the translation process.
Ni, Yizhao; Saunders, Craig; Szedmak, Sandor; Niranjan, Mahesan
1 Citations
We propose a structured learning approach, max-margin structure (MMS), which is targeted at natural language processing (NLP) tasks. The architecture of our approach is shown to capture structural aspects of the problem domains, leading to demonstrable performance improvements on two NLP tasks: part-of-speech tagging and statistical machine translation (SMT). We present a perceptron-based online learning algorithm to train the model and demonstrate desirable computational scaling behavior over traditional optimisation methods.
Wang, Zhuoran; ShaweTaylor, John
1 Citations
This paper presents a novel regression framework to model both the translational equivalence problem and the parameter estimation problem in statistical machine translation (SMT). The proposed method kernelizes the training process by formulating the translation problem as a linear mapping between source and target word chunks (word n-grams of various lengths), which yields a regression problem with vector outputs. A kernel ridge regression model and a one-class classifier called maximum margin regression are explored for comparison, of which the former is shown to perform better in this task. The experimental results conceptually demonstrate its advantage of handling very high-dimensional features implicitly and flexibly. However, it shares the common drawback of kernel methods, i.e. the lack of scalability. For real-world application, a more practical solution based on locally linear regression hyperplane approximation is proposed, using online subsetting of relevant training examples. In addition, we also introduce a novel way to integrate language models into this particular machine translation framework, which utilizes the language model as a penalty term in the objective function of the regression model, since its n-gram representation exactly matches the definition of our feature space.
Wu, Xianchao; Matsuzaki, Takuya; Tsujii, Jun’ichi
1 Citations
This paper introduces deep syntactic structures to syntax-based Statistical Machine Translation (SMT). We use a Head-driven Phrase Structure Grammar (HPSG) parser to obtain the deep syntactic structures of a sentence, which include not only a fine-grained syntactic property description but also a semantic representation. Considering the abundant information included in the deep syntactic structures, it is interesting to investigate whether or not they improve the traditional syntax-based translation models based on PCFG parsers. In order to use deep syntactic structures for SMT, this paper focuses on extracting tree-to-string translation rules from aligned HPSG tree–string pairs. The major challenge is to properly localize the non-local relations among nodes in an HPSG tree. To localize the semantic dependencies among words and phrases, which can be inherently non-local, a minimum covering tree is defined by taking a predicate word and its lexical/phrasal arguments as the frontier nodes. Starting from this definition, a linear-time algorithm is proposed to extract translation rules through a one-time traversal of the leaf nodes in an HPSG tree. Extensive experiments on a tree-to-string translation system confirmed the effectiveness of our proposal.
Hunter, Tim; Resnik, Philip
3 Citations
Phrase-based decoding is conceptually simple and straightforward to implement, at the cost of drastically oversimplified reordering models. Syntactically aware models make it possible to capture linguistically relevant relationships in order to improve word order, but they can be more complex to implement and optimise. In this paper, we explore a new middle ground between phrase-based and syntactically informed statistical MT, in the form of a model that supplements conventional, non-hierarchical phrase-based techniques with linguistically informed reordering based on syntactic dependency trees. The key idea is to exploit linguistically informed hierarchical structures only for those dependencies that cannot be captured within a single flat phrase. For very local dependencies we leverage the success of conventional phrase-based approaches, which provide a sequence of target-language words appropriately ordered and ready-made with any agreement morphology. Working with dependency trees rather than constituency trees allows us to take advantage of the flexibility of phrase-based systems to treat non-constituent fragments as phrases. We do impose a requirement on what can be translated as a phrase (that the fragment be a novel sort of “dependency constituent”), but this is much weaker than the requirement that phrases be traditional linguistic constituents, which has often proven too restrictive in MT systems.
Morin, Emmanuel; Daille, Béatrice
3 Citations
The automatic compilation of bilingual lists of terms from specialized comparable corpora using lexical alignment has been successful for single-word terms (SWTs), but remains disappointing for multi-word terms (MWTs). The low frequency and the variability of the syntactic structures of MWTs in the source and the target languages are the main reported problems. This paper defines a general framework dedicated to the lexical alignment of MWTs from comparable corpora that includes a compositional translation process and the standard lexical context analysis. Because the compositional method, which is based on the translation of lexical items, is restrictive, we introduce an extended compositional method that bridges the gap between MWTs of different syntactic structures through morphological links. We experimented with the two compositional methods for the French–Japanese alignment task. The results show a significant improvement for the translation of MWTs and advocate further morphological analysis in lexical alignment.
Kim, Su Nam; Baldwin, Timothy
2 Citations
We propose a method for automatically identifying individual instances of English verb-particle constructions (VPCs) in raw text. Our method employs the RASP parser and analysis of the sentential context of each VPC candidate to differentiate VPCs from simple combinations of a verb and prepositional phrase. We show that our proposed method has an F-score of 0.974 at VPC identification over the Brown Corpus and Wall Street Journal.
Caseli, Helena Medeiros; Ramisch, Carlos; Graças Volpe Nunes, Maria; Villavicencio, Aline
5 Citations
Due to idiosyncrasies in their syntax, semantics or frequency, Multiword Expressions (MWEs) have received special attention from the NLP community, as the methods and techniques developed for the treatment of simplex words are not necessarily suitable for them. This is certainly the case for the automatic acquisition of MWEs from corpora. A lot of effort has been directed to the task of automatically identifying them, with considerable success. In this paper, we propose an approach for the identification of MWEs in a multilingual context, as a byproduct of a word alignment process, that not only deals with the identification of possible MWE candidates, but also associates some multiword expressions with semantics. The results obtained indicate the feasibility and low costs in terms of tools and resources demanded by this approach, which could, for example, facilitate and speed up lexicographic work.
Pecina, Pavel
42 Citations
We present an extensive empirical evaluation of collocation extraction methods based on lexical association measures and their combination. The experiments are performed on three sets of collocation candidates extracted from the Prague Dependency Treebank with manual morphosyntactic annotation and from the Czech National Corpus with automatically assigned lemmas and part-of-speech tags. The collocation candidates were manually labeled as collocational or non-collocational. The evaluation is based on measuring the quality of ranking the candidates according to their chance to form collocations. Performance of the methods is compared by precision-recall curves and mean average precision scores. The work is focused on two-word (bigram) collocations only. We experiment with bigrams extracted from sentence dependency structure as well as from surface word order. Further, we study the effect of corpus size on the performance of the individual methods and their combination.
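To make the notion of a lexical association measure concrete, the sketch below ranks surface-order bigrams by pointwise mutual information (PMI). PMI is chosen here only as one widely known measure; the abstract does not say which measures were evaluated, and the function names and the toy token list are assumptions of this example.

```python
import math
from collections import Counter


def pmi_scores(tokens):
    """Score each adjacent word pair by log2( p(x, y) / (p(x) * p(y)) )."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = len(tokens) - 1
    scores = {}
    for (w1, w2), count in bigrams.items():
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return scores


def rank_candidates(tokens):
    """Return bigram candidates ordered from strongest to weakest association."""
    scores = pmi_scores(tokens)
    return sorted(scores, key=scores.get, reverse=True)
```

On a tiny input such as `"new york is big and new york is old and the city is big".split()`, the pair `("new", "york")` scores higher than `("is", "big")`, while the rare pair `("the", "city")` ranks first. This reflects the known tendency of PMI to favour low-frequency pairs, one reason why comparing and combining several measures is of interest.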
Bejček, Eduard; Straňák, Pavel
2 Citations
We describe annotation of multiword expressions (MWEs) in the Prague dependency treebank, using several automatic pre-annotation steps. We use subtrees of the tectogrammatical tree structures of the Prague dependency treebank to store representations of the MWEs in the dictionary and pre-annotate subsequent occurrences automatically. We also show a way to measure reliability of this type of annotation.
Muischnek, Kadri; Kaalep, HeikiJaan
This article focuses on the variability of one of the subtypes of multiword expressions, namely those consisting of a verb and a particle or a verb and its complement(s). We build on evidence from Estonian, an agglutinative language with free word order, analysing the behaviour of verbal multiword expressions (opaque and transparent idioms, support verb constructions and particle verbs). Using this data we analyse such phenomena as the order of the components of a multiword expression, lexical substitution and morphosyntactic flexibility.
Strik, Helmer; Hulsbosch, Micha; Cucchiarini, Catia
3 Citations
The present paper investigates multiword expressions (MWEs) in spoken language and possible ways of identifying MWEs automatically in speech corpora. Two MWEs that emerged from previous studies and that occur frequently in Dutch are analyzed to study their pronunciation characteristics and compare them to those of other utterances in a large speech corpus. The analyses reveal that these MWEs display extreme pronunciation variation and reduction, i.e., many phonemes and even syllables are deleted. Several measures of pronunciation reduction are calculated for these two MWEs and for all other utterances in the corpus. Five of these measures are more than twice as high for the MWEs, thus indicating considerable reduction. One overall measure of pronunciation deviation is then calculated and used to automatically identify MWEs in a large speech corpus. The results show that neither this overall measure nor frequency of co-occurrence alone is suitable for identifying MWEs. The best results are obtained by using a metric that combines overall pronunciation reduction with weighted frequency. In this way, recurring “islands of pronunciation reduction” that contain (potential) MWEs can be identified in a large speech corpus.
Doucet, Antoine; AhonenMyka, Helena
In this paper, we address the problem of the exploitation of text phrases in a multilingual context. We propose a technique to benefit from multiword units in adhoc document retrieval, whatever the language of the document collection. We present principles to optimize the performance improvement obtained through this approach. The work is validated through retrieval experiments conducted on Chinese, Japanese, Korean and English.
Steinsvold, Christopher
Interpreting the diamond of modal logic as the derivative, we present a topological canonical model for extensions of K4 and show completeness for various logics. We also show that if a logic is topologically canonical, then it is relationally canonical.
Grégoire, Nicole
6 Citations
This article describes the design and implementation of a Dutch Electronic Lexicon of Multiword Expressions (DuELME). DuELME describes the core properties of over 5,000 Dutch multiword expressions. This article gives an overview of the decisions made in order to come to a standard lexical representation and discusses the description fields this representation comprises. We discuss the approach taken, which is innovative since it is based on the Equivalence Class Method (ECM). It is shown that introducing parameters to the ECM optimizes the method. The selection of the lexical entries and their properties is corpus-based. We describe the extraction of candidate expressions from corpora and discuss the selection criteria of the lexical entries. Moreover, we present the results of an evaluation of the standard representation in Alpino, a Dutch dependency parser.
Gabbay, Dov M.; Schlechta, Karl
1 Citation
Booth and his co-authors have shown in [2] that many new approaches to theory revision (with fixed K) can be represented by two relations, $<$ and $\vartriangleleft$, where $<$ is the usual ranked relation and $\vartriangleleft$ is a subrelation of $<$. They have, however, left open a characterization of the infinite case, which we treat here.
Åqvist, Lennart
The paper presents an infinite hierarchy PRm [m = 1, 2, ...] of sound and complete axiomatic systems for modal logic with graded probabilistic modalities, which are to reflect what I have elsewhere called the Bolding-Ekelöf degrees of evidential strength as applied to the establishment of matters of fact in law-courts. Our present approach is seen to differ from earlier work by the author in that it treats the logic of these graded modalities not only from a semantical or model-theoretic viewpoint but from a proof-theoretical and axiomatic stance as well. A paramount feature of the approach is its use of so-called systematic frame constants as labels of diverse grades of probability. Apart from this novel feature our approach can be seen to go back to pioneering work by Lou Goble in 1970.
Atkinson, David; Peijnenburg, Jeanne
3 Citations
We have earlier shown by construction that a proposition can have a well-defined nonzero probability, even if it is justified by an infinite probabilistic regress. We thought this to be an adequate rebuttal of foundationalist claims that probabilistic regresses must lead either to an indeterminate, or to a determinate but zero probability. In a comment, Frederik Herzberg has argued that our counterexamples are of a special kind, being what he calls ‘solvable’. In the present reaction we investigate what Herzberg means by solvability. We discuss the advantages and disadvantages of making solvability a sine qua non, and we ventilate our misgivings about Herzberg’s suggestion that the notion of solvability might help the foundationalist.
We further show that the canonical series arising from an infinite chain of conditional probabilities always converges, and also that the sum is equal to the required unconditional probability if a certain infinite product of conditional probabilities vanishes.
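As a hedged illustration of the statement above (the notation is an assumption of this sketch, not taken from the abstract): write $A_0$ for the target proposition, $A_{n+1}$ for the proposition that probabilistically supports $A_n$, and set $\alpha_n = P(A_n \mid A_{n+1})$, $\beta_n = P(A_n \mid \neg A_{n+1})$, $\gamma_n = \alpha_n - \beta_n$. The law of total probability gives one step of the regress:
$$P(A_n) = \alpha_n P(A_{n+1}) + \beta_n \bigl(1 - P(A_{n+1})\bigr) = \beta_n + \gamma_n P(A_{n+1}).$$
Iterating this step from $n = 0$ to $n = N$ yields a partial sum plus a remainder:
$$P(A_0) = \sum_{n=0}^{N} \beta_n \prod_{k=0}^{n-1} \gamma_k \; + \; \Bigl(\prod_{k=0}^{N} \gamma_k\Bigr) P(A_{N+1}).$$
Under these assumptions, the sum is a partial sum of the series in question, and since $P(A_{N+1}) \le 1$ the remainder tends to zero whenever the product $\prod_{k=0}^{N} \gamma_k$ vanishes as $N \to \infty$, in which case the series sums to $P(A_0)$.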
Francez, Nissim; Dyckhoff, Roy; BenAvi, Gilad
22 Citations
The paper briefly surveys the sentential proof-theoretic semantics for a fragment of English. Then, appealing to a version of Frege’s context principle (specified to fit type-logical grammar), a method is presented for deriving proof-theoretic meanings for sub-sentential phrases, down to lexical units (words). The sentential meaning is decomposed according to the function-argument structure as determined by the type-logical grammar. In doing so, the paper presents a novel proof-theoretic interpretation of simple types, replacing Montague’s model-theoretic type interpretation (in arbitrary Henkin models). The domains of derivations are collections of derivations in the associated “dedicated” natural-deduction proof system, and functions therein (with no appeal to models, truth-values and elements of a domain). The compositionality of the semantics is analyzed.
Herzberg, Frederik
7 Citations
In a recent paper, Jeanne Peijnenburg and David Atkinson [Studia Logica, 89(3):333–341 (2008)] have challenged the foundationalist rejection of infinitism by giving an example of an infinite, yet explicitly solvable regress of probabilistic justification. So far, however, there has been no criterion for the consistency of infinite probabilistic regresses, and in particular, foundationalists might still question the consistency of the solvable regress proposed by Peijnenburg and Atkinson.
In this paper, we employ Robinsonian nonstandard analysis to prove that a probabilistic regress is already consistent if it is admissible in the sense that its forward-iteration solution does not lead to obvious contradictions; naturally, the converse also holds true. As a consequence, it turns out that there is a rich class of probabilistic regresses, which generically will fail to be solvable.
We therefore propose a weaker version of the Probabilistic Regress Problem which concedes the existence of solvable regresses, but denies their genericity.
Heylen, Jan
2 Citations
Carnap’s theory of descriptions was restricted in two ways. First, the descriptive conditions had to be nonmodal. Second, only primitive predicates or the identity predicate could be used to predicate something of the descriptum. The motivating reasons for these two restrictions that can be found in the literature will be critically discussed. Both restrictions can be relaxed, but Carnap’s theory can still be blamed for not dealing adequately with improper descriptions.
Bender, Emily M.; Drellishak, Scott; Fokkens, Antske; Poulson, Laurie; Saleem, Safiyyah
6 Citations
This paper presents the LinGO Grammar Matrix grammar customization system, a web-based service which elicits typological descriptions of languages and outputs customized grammar fragments which are ready for sustained development into broad-coverage grammars. We describe the infrastructure we have developed to support grammar customization as well as the current set of linguistic phenomena addressed, reflect on what we have learned about a methodology for this style of multilingual grammar engineering, and evaluate the typological breadth of the system by using it to create grammars for seven languages from seven different language families.
