Nakaiwa, Hiromi
This paper proposes a method for extracting rules for the anaphora resolution of Japanese zero pronouns in Japanese–English MT from aligned sentence pairs. After aligned sentence pairs that are unsuitable for rule extraction because of analysis errors or free translations are automatically rejected, zero pronouns in the Japanese sentences and the English translation equivalents of their antecedents are extracted from the remaining Japanese–English aligned sentence pairs using ten hand-developed alignment rules. The method identifies as unalignable all Japanese zero pronouns whose translation equivalents are not explicitly expressed in the English sentence. Then, resolution rules for the remaining zero pronouns are automatically extracted using the aligned pairs, equivalent word/phrase pairs extracted from the aligned sentence pairs, and the syntactic and semantic structures of the Japanese sentences. This method was implemented in ALTJ/E, a Japanese–English MT system. In a window test, 98.4% of all pairs were automatically aligned correctly, and 94.0% in a blind test. Furthermore, the extracted rules for zero pronouns with deictic references, created automatically from sentence pairs, correctly resolved 99.0% of the zero pronouns in a window test and 85.0% in a blind test.
Chan, Samuel W. K.; T'sou, Benjamin K.
Anaphora is a discourse-level linguistic phenomenon. There is consensus that anaphora resolution should rely on prior sentences within the context of the discourse. We propose to cast anaphora resolution as a semantic inference process in which a combination of multiple strategies, each exploiting different aspects of linguistic knowledge, is employed to provide a coherent resolution of anaphora. A framework which encompasses several salient linguistic parameters, such as grammatical role, proximity, repetition, sentence recency and semantic cues, is demonstrated. This work also shows how an anaphora-resolution algorithm can be embedded within a framework which captures all the above salient parameters and which remedies some of the inadequacies found in any monolithic resolution system. A language-neutral semantic representation characterized by semantic cues is presented in order to capture the distilled information after resolution. The effectiveness of the language-neutral representation, both for machine translation and anaphora resolution, is demonstrated through a set of simulations and evaluations.
Mitkov, Ruslan
6 Citations
This paper presents a multilingual, robust, knowledge-poor approach to resolving pronouns in technical manuals. This approach is a modification of the practical approach (Mitkov 1998a) and operates on texts preprocessed by a part-of-speech tagger. Input is checked against agreement and a number of antecedent indicators. Candidates are assigned scores by each indicator, and the candidate with the highest aggregate score is returned as the antecedent. We propose this approach as a platform for multilingual pronoun resolution. The robust approach was initially developed and tested for English, but we have also adapted and tested it for Polish and Arabic. For both languages, we found that adaptation required minimal modification and that, even if used unmodified, the approach delivers acceptable success rates. Preliminary evaluation reports success rates of over 90%.
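The filter-then-score procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not Mitkov's implementation: the gender/number agreement check, the indicator names and every score value below are hypothetical, and ties are simply broken in favour of the earlier candidate.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Candidate:
    text: str
    gender: str
    number: str
    # Hypothetical per-indicator scores (indicator name -> score).
    indicator_scores: dict = field(default_factory=dict)

    def total(self) -> int:
        return sum(self.indicator_scores.values())


def resolve(gender: str, number: str, candidates: list) -> Optional[Candidate]:
    """Keep candidates that agree with the pronoun, then pick the highest aggregate score."""
    compatible = [c for c in candidates if c.gender == gender and c.number == number]
    if not compatible:
        return None
    # max() returns the first maximal element, so ties go to the earlier candidate.
    return max(compatible, key=Candidate.total)


# Illustrative input for a singular neuter pronoun such as "it".
candidates = [
    Candidate("the printer", "neut", "sg", {"first_np": 1, "term_preference": 1, "referential_distance": 1}),
    Candidate("the cable", "neut", "sg", {"referential_distance": 2}),
    Candidate("the manuals", "neut", "pl", {"first_np": 1, "term_preference": 1, "lexical_reiteration": 2}),
]
print(resolve("neut", "sg", candidates).text)  # prints: the printer
```

Here "the manuals" has the highest raw score (4) but is discarded by the number check before the scores are compared.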
Ferrández, Antonio; Palomar, Manuel; Moreno, Lidia
9 Citations
This paper documents the development of an empirically based system, implemented in Prolog, that automatically resolves several kinds of anaphora in Spanish texts: pronominal references, surface-count anaphora, one-anaphora and elliptical zero-subject constructions (i.e., sentences that omit their pronominal subject). The resolution is based on representations resulting from either partial or full parsing. The system can also work on the output of a POS tagger or with different dictionaries, without changing the grammar. This grammar represents the syntactic information of each language by means of the Slot Unification Grammar formalism. The different kinds of information used for anaphora resolution in full and partial parsing are shown, as well as evaluation results. The system has been adapted to English texts, obtaining encouraging results which show that it can be applied, with only a few refinements, to languages other than Spanish and English. In addition, the differences between English and Spanish anaphora are noted.
Mori, Tatsunori; Matsuo, Mamoru; Nakagawa, Hiroshi
This paper proposes a method for the anaphora resolution of zero subjects in Japanese instruction manuals, based on both the linguistic nature of expressions and the general ontology of the text type. In instruction manuals written in Japanese, zero subjects are one of the main sources of sentence ambiguity. In order to resolve them, we examined the properties of several types of expressions, including some forms of verbal phrases and some conjunctive clauses. As a result, we obtained a set of constraints and defaults for zero-subject resolution. We verified the precision and recall of the constraints and defaults on real examples and found that they give quite good estimates, with 97% precision and 80% recall.
Geldbach, Stefanie
1 Citations
Anaphora resolution in machine translation involves two aspects: (1) the identification of the antecedent, i.e., the determination of coreference relations between anaphor and antecedent; and (2) the translation of the anaphor, i.e., the selection of the appropriate target-language equivalent. The identification of the antecedent is essentially a monolingual, language-pair-independent problem which is usually solved during analysis. The selection of the target-language equivalent, on the other hand, can be regarded as a language-pair-dependent task which has to be tackled during transfer and generation. In this paper, the problems of anaphora translation are discussed for the language pair Russian–German. Although in most cases source-language anaphoric pronouns correspond to target-language anaphoric pronouns, in some cases this straightforward equation does not hold. Two cases of such translation discrepancies are treated here: zero anaphora and pronominal PPs. The differences in the distribution of zero anaphora and pronominal PPs in Russian and German are described, and solutions to these translation problems based on the Russian–German MT system T1 are presented.
Mráz, František; Plátek, Martin; Procházka, Martin
3 Citations
We study transformations of automata from some (sub)classes of restarting automata (RRWW-automata) into two types of special forms. We stress particularly the transformations into the linguistically motivated weak cyclic form. Special forms of the second type express a certain degree of determinism of such automata.
Wartena, Christian
1 Citations
Abstract storage types can be concatenated in two ways. It is shown that the languages accepted by automata using concatenations of one-turn pushdowns can be characterized in terms of linear grammars with controlled derivations. These representations show that, for one-turn pushdowns, both ways of concatenating are equivalent. Furthermore, a relation to the hierarchy of Khabbaz is shown.
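As a concrete picture of the storage type involved, a single one-turn pushdown is a stack that may switch from pushing to popping only once. The Python sketch below is an illustration assumed for exposition, not a construction from the paper: it uses one such stack to recognize {a^{n}b^{n} : n ≥ 0}, a standard linear language generated by S → aSb | ε, and it does not model the concatenation of storage types studied here.

```python
def one_turn_accepts(word: str) -> bool:
    """Accept exactly the words a^n b^n (n >= 0) using a stack that turns at most once."""
    stack = []
    popping = False  # set at the single turn from pushing to popping; never reset
    for symbol in word:
        if symbol == "a":
            if popping:
                return False  # pushing again would require a second turn
            stack.append("A")
        elif symbol == "b":
            popping = True
            if not stack:
                return False  # more b's than a's
            stack.pop()
        else:
            return False  # symbol outside the alphabet {a, b}
    return not stack  # accept only if every pushed symbol was popped


for w in ["", "aabb", "aab", "abab"]:
    print(repr(w), one_turn_accepts(w))
```

A word such as "abab" is rejected because the second "a" would need a second push phase, i.e. a second turn.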
Wyatt, Roger B.
Paradigmatic change is driving aesthetic change in cinema. This restructuring is transforming not only what is on the screen but also the means of production that put it there. The interplay of technology and culture, along with their dynamics, has made this so. Marshall McLuhan's Laws of Media provide an effective lens through which to examine this change. Some preliminary observations on the shape of digital cinema are offered. However, to paraphrase Al Jolson, you ain't seen nothin' yet.
Augst, Bertrand; O'Connor, Brian C.
5 Citations
Representation of film texts for scholars and students has been fraught with difficulties imposed by the very nature of the text. The time-varying image track has presented hurdles to close analysis and significant challenges to the formulation of units of meaning and analysis. The digital environment offers opportunities for addressing these problems. We offer here a model of the film document as a bundle of time-varying signals. We demonstrate the use of this model to construct a system for close analysis of film texts, including precise measurement of attributes. Finally, we consider some consequences for the pedagogical environment.
Gramatovici, Radu
In this paper we introduce a new class of formal languages called shuffle-based multilanguages, which generalize both context-free and Marcus contextual languages. We study some properties of shuffle-based multilanguages related to the requirements of mildly context-sensitive formalisms.
Burningham, B.
A study commissioned by the Canadian Institute for Historical Microreproductions produced some interesting secondary findings about the attitudes of the Canadian research community towards digitized facsimile collections. In written responses to a questionnaire designed primarily to elicit advice about the subject content and focus of future projects, and in structured follow-up interviews, many respondents demonstrated a marked ambivalence towards the concept of digitized collections. Furthermore, if faced with a choice between fully searchable text and digitized facsimile images with traditional points of access (subject, author, title, etc.), there appears to be a preference for the latter means of access.
Staiger, Ludwig
1 Citations
We investigate the relationship between the classes of ω-languages accepted by Turing machines according to two types of acceptance: 1) Machines of the first type are allowed to read only a finite part of the infinite input. 2) Machines of the second type have the additional possibility to reject by not reading the whole infinite input. It is shown that machines of the second kind are more powerful than those of the first kind.
Van Jacob, Scott J.
In 1994, the Andrew W. Mellon Foundation funded a joint project undertaken by the Center for Research Libraries (CRL) and the Latin American Microfilm Project (LAMP) to scan and index over three hundred thousand pages of microfilmed Brazilian Government Documents for the Internet. Due to the collection's size, format and language and the poor physical condition of the text, entering this overwhelmingly textual collection as full text was prohibitively expensive. Instead, the documents were scanned as images, thereby maintaining the intellectual content of the collection but losing the dynamic searching capabilities inherent in full-text databases. A combination of indexing approaches was used to provide access to these documents. Indexes found in the documents themselves (tables of contents, pagination and subject indexes) were recreated to give users access to the documents. A controlled vocabulary was established to index a portion of the database. Costs, user feedback and available technologies all influenced the choice of the five indexes ultimately used. This paper describes and comments on the strengths and weaknesses of the various indexing approaches taken to provide access to the images within this database.
Auffret, Gwendal; Prié, Yannick
1 Citations
The digitization of library documents and archives increasingly extends to audiovisual (AV) document repositories. As a consequence, new computer-aided techniques are being devised, providing opportunities for new uses of AV documents. As scholars work mainly by reading, annotating, reusing, and producing documents, they are directly affected by these changes. The first part of this article describes AV document use in the humanities, as well as the current and future influence computers might have on evolving practices. After establishing that “full-indexing” (indexing of the content for random access to any segment of an AV document) is a necessary condition if scholars are to develop new practices in using AV material, we focus on the specific problems raised by AV indexing as opposed to text indexing, followed by a discussion of related AV indexing projects and standardization issues. The third part proposes a representation model for the description of AV material (AI-Strata) and an exchange format for AV annotations (AEDI), based on a free segmentation approach. An example of annotation is also provided. The last part is devoted to a discussion of the potential long-term influences of digital AV indexing techniques on scholarly uses of AV documents.
Žemlička, Michal; Král, Jaroslav
1 Citations
When reading a text or listening to speech, humans process words on-line, in the order in which they arrive. Humans mainly use this kind of parsing even when they process deterministic text (programs). Intuitively, some mental actions take place just after the morphological analysis of each newly recognized word. These mental actions help to understand the given word (or to position the word within the frame of the still incomplete sentence). In the parsing of formal languages, the concept closest to this idea is top-down parsing, which is usually used only together with various classes of LL grammars. The advantage of top-down parsing of programming languages is that it can be implemented by a recursive descent parser, i.e., by a system of procedures that may recursively call each other. Such a system may be ‘tuned’ by hand-made changes. The use of LL grammars is not always possible, because the grammars of programming languages may have left-recursive symbols. Programming-language grammars are intuitively ‘close’ to LL grammars. A good model for such grammars is provided by the kind grammars studied in this paper. Kind grammars preserve all the important features of LL grammars that are advantageous for parsing.
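For readers unfamiliar with the technique, the following Python sketch shows a recursive descent parser (here it also evaluates): one procedure per nonterminal, with procedures calling each other recursively. It is a generic LL(1) example, not an implementation of kind grammars. The grammar (expr → term ('+' term)*, term → NUMBER | '(' expr ')') is an assumption chosen for brevity; it replaces the left-recursive rule expr → expr '+' term, which would make expr() call itself forever without consuming input, by a loop.

```python
import re


def tokenize(text: str) -> list:
    # Numbers and the symbols + ( ) ; any other characters (e.g. spaces) are skipped.
    return re.findall(r"\d+|[+()]", text)


class Parser:
    """Recursive descent for: expr -> term ('+' term)* ; term -> NUMBER | '(' expr ')'."""

    def __init__(self, text: str):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, token: str):
        if self.peek() != token:
            raise SyntaxError(f"expected {token!r}, got {self.peek()!r}")
        self.pos += 1

    def expr(self) -> int:
        # Loop instead of the left-recursive rule expr -> expr '+' term.
        value = self.term()
        while self.peek() == "+":
            self.pos += 1
            value += self.term()
        return value

    def term(self) -> int:
        token = self.peek()
        if token == "(":
            self.pos += 1
            value = self.expr()  # term and expr call each other recursively
            self.expect(")")
            return value
        if token is not None and token.isdigit():
            self.pos += 1
            return int(token)
        raise SyntaxError(f"unexpected token {token!r}")


def parse(text: str) -> int:
    parser = Parser(text)
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"trailing input {parser.peek()!r}")
    return value


print(parse("1 + (2 + 3) + 4"))  # prints 10
```

Because each procedure mirrors one grammar rule, such a parser is easy to ‘tune’ by hand, which is the property the paragraph above attributes to top-down parsing.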
Ceterchi, Rodica
The paper introduces the concept of a cut-and-paste polynomial function, which models operations on words inspired by contextual languages and DNA computing. Fixed-point equations attached to such functions become generative devices for new classes of languages. The class of first-degree 2CP-languages is shown to be incomparable with the Chomsky hierarchy. We give some examples which motivate further study of higher-degree CP-languages.
Zakharov, Vladimir A.
2 Citations
We introduce a new class OrtSP of first-order sequential programs. This class of programs is characterized by means of orthogonal substitutions θ = x_{1}/t_{1}, ..., x_{n}/t_{n} such that none of the terms t_{i} occurs in the other terms t_{j}, j ≠ i. We show that the equivalence problem for programs in OrtSP is decidable. We also select a subclass OrtSP_{out} of orthogonal programs and demonstrate that the equivalence problem for programs in OrtSP_{out} is decidable in polynomial time when the alphabet of relational symbols is finite and fixed.
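The orthogonality condition on θ can be checked mechanically. The Python sketch below assumes that "occurs in" means "is a subterm of" (including the term itself) and represents terms as nested tuples (function_symbol, arg_1, ..., arg_k), with variables and constants as strings; both choices are illustrative assumptions, and the paper's formal definition may differ in detail.

```python
from typing import Union

# A term is a variable/constant name (str) or a tuple (function_symbol, arg_1, ..., arg_k).
Term = Union[str, tuple]


def subterms(t: Term):
    """Yield t itself and every subterm of t (the function symbol is not a subterm)."""
    yield t
    if isinstance(t, tuple):
        for arg in t[1:]:
            yield from subterms(arg)


def occurs_in(s: Term, t: Term) -> bool:
    return any(s == u for u in subterms(t))


def is_orthogonal(theta: dict) -> bool:
    """True iff no right-hand side t_i occurs in another right-hand side t_j (j != i)."""
    terms = list(theta.values())
    return all(
        not occurs_in(terms[i], terms[j])
        for i in range(len(terms))
        for j in range(len(terms))
        if i != j
    )


print(is_orthogonal({"x": ("f", "y"), "z": ("g", "y")}))  # True
print(is_orthogonal({"x": "y", "z": ("f", "y")}))  # False: y occurs in f(y)
```

Under this reading, two right-hand sides may share variables (f(y) and g(y) above); only a whole term t_{i} reappearing inside another t_{j} breaks orthogonality.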
Freund, Rudolf
2 Citations
P-systems were recently introduced by Gheorghe Păun as a new model for computations based on membrane structures. Using the membranes as a kind of filter for specific objects when transferring them into an inner compartment turned out to be a very powerful mechanism in combination with suitable rules to be applied within the membranes in the model of generalized P-systems, GP-systems for short. In general, GP-systems allow for the simulation of graph-controlled grammars of arbitrary type based on productions working on single objects. In this paper we consider GP-systems as computing devices using splicing or cutting and recombination of strings. Various variants of such systems are proved to have universal computational power; e.g., we show how test tube systems based on splicing or cutting and recombination of strings can be simulated by the corresponding GP-systems.
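The splicing operation that such systems build on can be sketched as follows, using the common formulation of a splicing rule u_{1}#u_{2}$u_{3}#u_{4} from the splicing-system literature: from x = x_{1}u_{1}u_{2}x_{2} and y = y_{1}u_{3}u_{4}y_{2} one obtains x_{1}u_{1}u_{4}y_{2}. This Python sketch covers only the single-step string operation and is an assumed illustration; it does not model membranes, filters or GP-systems.

```python
def splice(x: str, y: str, rule: tuple) -> set:
    """All words x1+u1+u4+y2 obtained by cutting x between u1|u2 and y between u3|u4."""
    u1, u2, u3, u4 = rule
    site_x, site_y = u1 + u2, u3 + u4
    results = set()
    for i in range(len(x) - len(site_x) + 1):
        if x.startswith(site_x, i):
            left = x[: i + len(u1)]  # x1 u1
            for j in range(len(y) - len(site_y) + 1):
                if y.startswith(site_y, j):
                    right = y[j + len(u3):]  # u4 y2
                    results.add(left + right)
    return results


print(sorted(splice("abab", "cdcd", ("a", "b", "c", "d"))))
# prints ['abad', 'abadcd', 'ad', 'adcd']
```

Every occurrence of a splicing site is tried, so a word with several sites can yield several results, as in the example.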
Lin, JoWang
4 Citations
This paper shows that the semantics of shenme ‘what’ in Chinese bare conditionals may exhibit a phenomenon of double quantification. I argue that such double quantification can be nicely accounted for if one adopts Carlson's (1977a, b) semantics of bare plurals and verb meanings as well as the following two assumptions: (i) shenme ‘what’ can be a proform of bare NPs and hence has the same kind of denotation as bare NPs, and (ii) Chinese bare NPs are names of kinds of things. This analysis of Chinese bare conditionals lends support to Carlson's approach to bare plurals despite Wilkinson's (1991) criticisms. I also show that an extension of Heim's (1987) analysis of what as ‘something of kind x’ to Chinese shenme ‘what’ encounters problems when shenme ‘what’ is a shared constituent of a predicate which applies to kinds and another predicate which applies to objects.
Sosík, Petr
1 Citations
As shown in related papers, conditional tabled eco-grammar systems constitute a rich multi-agent formal framework that allows, among other things, the characterization of some important classes of formal languages. Some new results about the generative power of extended conditional tabled eco-grammar (ECTEG) systems are shown. Extending previous research, we focus mainly on the so far less studied systems with scattered-mode context conditions. It is proven that several hierarchies of these language classes collapse and that the class ECTEG_{1}(1,2;s) characterizes the context-sensitive languages.
Klempien-Hinrichs, Renate
2 Citations
Context-free hypergraph grammars allow sets of hypergraphs to be defined recursively. In the literature, three main approaches can be found: hyperedge rewriting (HR), separated handle rewriting (SHH), and confluent node rewriting (ChNCE). With respect to their graph-generating power, SHH grammars and so-called remote-free ChNCE grammars characterize confluent node rewriting in graphs, which in turn is more powerful than hyperedge rewriting. With respect to their hypergraph-generating power, HR and SHH grammars have been shown to be incomparable.
In this paper, we show that the hypergraph-generating power of (remote-free) ChNCE grammars properly includes that of HR and SHH grammars together. This indicates that confluent node rewriting plays as important a role in generating sets of hypergraphs as it does in generating sets of graphs.
Jörgensen, Corinne
9 Citations
Rapid expansion in the digitization of images and image collections has vastly increased the number of images available to scholars and researchers through electronic means. This research review familiarizes the reader with current research applicable to the development of image retrieval systems and provides additional material for exploring the topic further, both in print and online. The discussion covers several broad areas, among them classification and indexing systems used for describing image collections, and research initiatives into image access focusing on image attributes, users, queries, tasks, and cognitive aspects of searching. Prospects for the future of image access, including an outline of future research initiatives, are discussed. Further research in each of these areas will provide basic data to inform and enrich image access system design and will, it is hoped, provide a richer, more flexible, and more satisfactory environment for searching for and discovering images. Harnessing the true power of the digital image environment will only be possible when image retrieval systems are coherently designed from principles derived from the fullest range of applicable disciplines, rather than from isolated or fragmented perspectives.
Ruitenburg, Wim
We characterize the first-order formulas with one free variable that are preserved under bisimulation and persistence or strong persistence over the class of Kripke models with transitive frames and unary persistent predicates.
Miroiu, Adrian
1 Citations
Some logical properties of modal languages in which actuality is expressible are investigated. It is argued that, if a sentence like 'Actually, Quine is a distinguished philosopher' is understood as a special case of world-indexed sentences (the index being the actual world), then actuality can be expressed only under strong modal assumptions. Some rival rigid and indexical approaches to actuality are discussed.
Ghilardi, Silvio; Miglioli, Pierangelo
2 Citations
Using algebraic-categorical tools, we establish four criteria for disproving canonicity, strong completeness, w-canonicity and strong w-completeness, respectively, of an intermediate propositional logic. We then apply the second criterion to obtain the following result: all the logics defined by extra-intuitionistic one-variable schemata, except four of them, are not strongly complete. We also apply the fourth criterion to prove that the Gabbay-de Jongh logic D_{1} is not strongly w-complete.
Gentilini, Paolo
5 Citations
This paper is the final part of the syntactic demonstration of the Arithmetical Completeness of the modal system G; in the preceding parts [9] and [10], the tools for the proof were defined, in particular the notion of a syntactic countermodel. Our strategy treats the PA-completeness of G as a search for interpretations which force the distance between G and a GLLIN-theorem to zero. If the GLLIN-theorem S is not a G-theorem, we construct a formula H expressing the non-G-provability of S, such that ⊢_{GLLIN} ∼ H and such that a canonical proof T of ∼ H in GLLIN is a syntactic countermodel for S with respect to G, whose height θ(T) equals the distance d(S, G) of S from G. Then we define the interpretation ξ of S which represents the proof-tree T in PA. By induction on θ(T), we prove that ⊢_{PA} S^{ξ} and d(S, G) > 0 imply the inconsistency of PA.
Fermé, Eduardo L.; Hansson, Sven Ove
20 Citations
We introduce a constructive model of selective belief revision in which it is possible to accept only a part of the input information. A selective revision operator ο is defined by the equality K ο α = K * f(α), where * is an AGM revision operator and f is a function, typically with the property ⊢ α → f(α). Axiomatic characterizations are provided for three variants of selective revision.
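Two boundary cases help make the definition concrete. The derivation below is a worked illustration, not taken from the paper: it assumes only the standard AGM success and vacuity postulates for *, writes ∘ for the operator ο, writes + for AGM expansion (K + β = Cn(K ∪ {β})), and treats the choices f(α) = α and f(α) = ⊤ purely as illustrative extremes, both of which satisfy ⊢ α → f(α).

```latex
\begin{aligned}
&K \circ \alpha = K * f(\alpha) && \text{(definition)}\\
&f(\alpha) \in K \circ \alpha && \text{(AGM success for } * \text{)}\\
&f(\alpha) = \alpha \;\Rightarrow\; K \circ \alpha = K * \alpha && \text{(input fully accepted)}\\
&f(\alpha) = \top \text{ and } \bot \notin K \;\Rightarrow\; K \circ \alpha = K + \top = K && \text{(AGM vacuity; input discarded)}
\end{aligned}
```

Since ⊢ α → f(α), the sentence f(α) is a consequence of α, so the second line says that some, possibly weaker, part of the input is always accepted.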
Suzuki, NobuYuki
2 Citations
In so-called Kripke-type models, each sentence is assigned either true or false at each possible world. In this setting, every possible world has the two-valued Boolean algebra as its set of truth values. Instead, we take a collection of algebras, each of which is attached to a world as the set of truth values at that world, and obtain an extended semantics based on the traditional Kripke-type semantics, which we call here the algebraic Kripke semantics.
We introduce algebraic Kripke sheaf semantics for superintuitionistic and modal predicate logics and discuss some basic properties. The Gödel-McKinsey-Tarski translation theorem can be stated within this semantics. Further, we show new results on superintuitionistic predicate logics. We prove that there exists a continuum of superintuitionistic predicate logics, each of which has both the disjunction and existence properties and, moreover, the same propositional fragment as intuitionistic logic.
Busquets, Joan
1 Citations
In this paper I argue that the Catalan polarity particles (PPs) sí/no and també/tampoc, which occur in some special realizations of ellipsis in Catalan, impose constraints on discourse structure. By virtue of their explicit or implicit markedness with respect to the [± neg] feature, I distinguish them as strong/weak PPs. I demonstrate that the polarity carried by these PPs provides a testbed for discourse coherence, supervising the processing of the preceding discourse in relation to the last state. It is claimed that such PPs inherit in discourse the locality conditions that are present at the sentence level, in terms of comparison discourse relations like PARALLELISM or similarity (signalled by sí/també) and contrast (signalled by no/tampoc). I propose a formalism within a general discourse representation theory known as Segmented DRT (Asher, 1993).
Minker, Wolfgang
4 Citations
We describe a stochastic component for automatic spoken natural language understanding in an application for train travel information retrieval, the French ARISE (Automatic Railway Information Systems for Europe) task. The focus is on the design and elaboration of processing strategies that are optimally adapted to the task model, the semantic representation and the available training data. A semi-automatic iterative approach is used to produce a corpus of semantic labels for component training.
Mitrana, Victor
2 Citations
The aim of this paper is to survey some language-generating devices based on patterns that have been reported in recent years. We recall, in a uniform and systematic way, nearly all generative mechanisms based on patterns known in the literature. The problems we address are mainly typical problems in formal language theory, namely generative power and closure under operations. Other aspects, such as decidability and descriptional or computational complexity, are briefly discussed in the final section.
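The basic notion behind such devices is the pattern: a string of constants and variables whose language consists of all words obtained by replacing each variable, consistently, with a nonempty string. The Python sketch below checks membership for this nonerasing (Angluin-style) notion only, as an assumed illustration; the mechanisms surveyed in the paper generalize it in various ways. Uppercase letters stand for variables and all other characters for constants, a convention chosen here for brevity.

```python
def in_pattern_language(pattern: str, word: str) -> bool:
    """Nonerasing pattern membership: each variable (uppercase letter) must be
    replaced by the same nonempty string at every occurrence."""

    def match(i: int, j: int, binding: dict) -> bool:
        if i == len(pattern):
            return j == len(word)
        symbol = pattern[i]
        if not symbol.isupper():  # constant: must match literally
            return j < len(word) and word[j] == symbol and match(i + 1, j + 1, binding)
        if symbol in binding:  # variable already bound: reuse its value
            value = binding[symbol]
            return word.startswith(value, j) and match(i + 1, j + len(value), binding)
        for end in range(j + 1, len(word) + 1):  # try every nonempty value
            binding[symbol] = word[j:end]
            if match(i + 1, end, binding):
                return True
            del binding[symbol]
        return False

    return match(0, 0, {})


print(in_pattern_language("XaX", "abaab"))  # True, with X = "ab"
print(in_pattern_language("XaX", "ba"))  # False: X cannot be empty
```

The search is plain backtracking and is exponential in the worst case; this is expected, since membership for such pattern languages is NP-complete in general.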
Kashima, Ryo; Kamide, Norihiro
2 Citations
We introduce several restricted versions of the structural rules in the implicational fragment of Gentzen's sequent calculus LJ. For example, we permit the application of a structural rule only if its principal formula is an implication. We investigate cut-eliminability and theorem-equivalence among various combinations of them. The results include new cut-elimination theorems for the implicational fragments of the following logics: relevant logic E, strict implication S4, and their neighbors (e.g., EW and S4W); BCI-logic, BCK-logic, relevant logic R, and intuitionistic logic.
van Alten, C. J.; Raftery, J. G.
2 Citations
The aim of this paper is to show that the implicational fragment BK of the intuitionistic propositional calculus (IPC) without the rules of exchange and contraction has the finite model property with respect to the quasivariety of left residuation algebras (its equivalent algebraic semantics). It follows that the variety generated by all left residuation algebras is generated by the finite left residuation algebras. We also establish that BK has the finite model property with respect to a class of structures that constitute a Kripke-style relational semantics for it. The results settle a question of Ono and Komori [OK85].
Yashin, Alexander
4 Citations
Extending the language of the intuitionistic propositional logic Int with additional logical constants, we construct a wide family of extensions of Int with the following properties: (a) every member of this family is a maximal conservative extension of Int; (b) additional constants are independent in each of them.
Elgueta, R.; Jansana, R.
5 Citations
Given a structure for a first-order language L, two objects of its domain can be indiscernible relative to the properties expressible in L without using the equality symbol, and yet not actually be the same. This relation, called Leibniz equality, is the subject of this paper. We study systematically the problem of its definability, mainly for classes of structures that are the models of some equality-free universal Horn class in an infinitary language L_{κκ}, where κ is an infinite regular cardinal.
Gentilini, Paolo
6 Citations
This paper is the second part of the syntactic demonstration of the Arithmetical Completeness of the modal system G, the first part of which is presented in [9]. Given a sequent S such that ⊢_{GLLIN} S and ⊬_{G} S, and given its characteristic formula H = char(S), which expresses the non-G-provability of S, we construct a canonical proof-tree T of ∼ H in GLLIN whose height is the distance d(S, G) of S from G. T is the syntactic countermodel of S with respect to G and is a tool of general interest in Provability Logic that allows a classification of the set of arithmetical interpretations.
Giannakidou, Anastasia
74 Citations
Limited distribution phenomena related to negation and negative polarity are usually thought of in terms of affectivity where affective is understood as negative or downward entailing. In this paper I propose an analysis of affective contexts as nonveridical and treat negative polarity as a manifestation of the more general phenomenon of sensitivity to (non)veridicality (which is, I argue, what affective dependencies boil down to). Empirical support for this analysis will be provided by a detailed examination of affective dependencies in Greek, but the distribution of any will also be shown to follow from (non)veridicality.
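One common formulation of the notions involved, stated for a propositional operator F and assumed here for illustration (the paper's own definitions may be more refined), is:

```latex
\begin{aligned}
&F \text{ is veridical} &&\iff F(p) \models p\\
&F \text{ is nonveridical} &&\iff F(p) \not\models p\\
&F \text{ is antiveridical} &&\iff F(p) \models \neg p \quad (\text{a subclass of the nonveridical operators})
\end{aligned}
```

For example, "yesterday" is veridical ("Yesterday it rained" entails that it rained), "perhaps" is nonveridical, and sentential negation is antiveridical.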
Kohlenbach, Ulrich
1 Citations
Goodman's theorem states that intuitionistic arithmetic in all finite types plus full choice, HA^{ω} + AC, is conservative over first-order intuitionistic arithmetic HA. We show that this result does not extend to various subsystems of HA^{ω}, HA with restricted induction.
Degen, J. W.
For each regular cardinal κ, we set up three systems of infinitary type logic, in which the length of the types and the length of the typed syntactical constructs are < κ. For a fixed κ, these three versions are, in order of increasing strength: the local system Σ_{(κ)}, the global system gΣ_{(κ)} (the difference concerns the conditions on eigenvariables) and the τ-system τΣ_{(κ)} (which has anti-selection terms or Hilbertian τ-terms, and no conditions on eigenvariables). A full cut-elimination theorem is proved for the local systems, and for the τ-systems we prove that they admit cut-free proofs for sequents in the τ-free language common to the local and global systems. These two results follow from semantic completeness proofs. Thus every sequent provable in a global system has a cut-free proof in the corresponding τ-system. It is, however, an open question whether the global systems themselves admit cut elimination.
Gentilini, Paolo
5 Citations
This paper is the first of a series of three articles that present the syntactic proof of the PA-completeness of the modal system G by introducing suitable proof-theoretic objects, which also have an independent interest. We start from the syntactic PA-completeness of the modal system GLLIN, previously obtained in [7], [8], and so we assume that we are working with modal sequents S which are GLLIN-theorems. If S is not a G-theorem, we define here a notion of syntactic metric d(S, G): we calculate a canonical characteristic formula H of S (char(S)) such that ⊢_{G} ∼ H → (∼S) and ⊢_{GLLIN} ∼ H, and the complexity σ of ∼ H gives the distance d(S, G) of S from G. Then, in order to produce the whole completeness proof as an induction on this d(S, G), we introduce the tree-interpretation of a modal sequent Q into PA, which sends the letters of Q into PA-formulas describing the properties of a GLLIN-proof P of Q. It is also a d(*, G)-metric-linked interpretation, since it will be applied to a proof-tree T of ∼ H with H = char(S) and σ(∼ H) = d(S, G).
Cantwell, John
10 Citations
The problems that surround iterated contractions and expansions of beliefs are approached by studying hypertheories, a generalisation of Adam Grove's notion of systems of spheres. Using a language with dynamic and doxastic operators, different ideas about the basic nature of belief change are axiomatised. It is shown that, by imposing quite natural constraints on how hypertheories may change, the basic logics for belief change can be strengthened considerably to bring one closer to a theory of iterated belief change. It is then argued that the logic of expansion, in particular, cannot without loss of generality be strengthened any further to allow for a full logic of iterated belief change. To remedy this situation, a notion of directed expansion is introduced that allows for a full logic of iterated belief change. The new operation is given an axiomatisation that is complete for linear hypertheories.
Luchi, Duccio; Montagna, Franco
1 Citations
The logic of proofs was introduced by Artemov in order to analyze the formalization of the concept of proof rather than the concept of provability. In this context, some operations on proofs play a very important role. In this paper, we investigate some very natural operations, paying attention not only to positive information but also to negative information (i.e., information saying that something cannot be a proof). We give a formalization for a fragment of such a logic of proofs, and we prove that our fragment is complete and decidable.
Somers, Harold
In the last ten years there has been a significant amount of research in Machine Translation within a "new" paradigm of empirical approaches, often labelled collectively as "Example-based" approaches. The first manifestation of this approach caused some surprise and hostility among observers more used to different ways of working, but the techniques were quickly adopted and adapted by many researchers, often creating hybrid systems. This paper reviews the various research efforts within this paradigm reported to date, and attempts a categorisation of different manifestations of the general approach.
Santos, Diana
Is adaptation of English NLP applications the right way to go multilingual? Should one prefer "language-independent" systems with a view to applying them to a large number of different languages? Experience from the processing of Portuguese in several different areas (part-of-speech tagging, corpus tools, lexical decomposition, machine translation, etc.) suggests that neither of these offers a satisfactory solution.
This paper argues for a thorough study of the way individual languages work in order to develop applications suited for the language in question, i.e., "language-dependent" systems.
Cohen, Ariel
Generics and frequency statements are puzzling phenomena: they are lawlike, yet contingent. They may be true even in the absence of any supporting instances, and extending the size of their domain does not change their truth conditions. Generics and frequency statements are parametric on time, but not on possible worlds; they cannot be applied to temporary generalizations, and yet are contingent. These constructions require a regular distribution of events along the time axis. Truth judgments of generics vary considerably across speakers, whereas truth judgments of frequency statements are much more uniform. A generic may be false even if the vast majority of individuals in its domain satisfy the predicated property, whereas a frequency statement using, e.g., "usually" would be true. This paper argues that all these seemingly unrelated puzzles have a single underlying cause: generics and frequency statements express probability judgments, and these, in turn, are interpreted as statements of hypothetical relative frequency.
Habert, B.; Fabre, C.
Elementary dependency relationships between words within parse trees produced by robust analyzers on a corpus help automate the discovery of semantic classes relevant to the underlying domain. We introduce two methods for extracting elementary syntactic dependencies from normalized parse trees. The resulting groupings help identify coarse-grained semantic categories and isolate lexical idiosyncrasies belonging to a specific sublanguage. A comparison shows a satisfactory overlap with an existing nomenclature for medical language processing. This symbolic approach is efficient on medium-sized corpora, which resist statistical clustering methods, but seems more appropriate for specialized texts.
Müller, Stefan
At the moment there is no theory of free relative clauses in German in the framework of Head-driven Phrase Structure Grammar (HPSG). From the GB literature on the subject it is known that free relative clauses behave partly like noun phrases. They can fill argument positions of verbs, and although they are finite sentences, they are serialized like noun phrases in the German Mittelfeld. The function that free relative clauses can take is not restricted to complements: depending on the properties of the relative phrase, free relative clauses can be modifiers as well. I will argue that free relative clauses project to a category that is tightly related to the category of the relative phrase.
As Ingria90 has shown, assigning different cases in the relative and the matrix clause poses problems for grammars that rely on unification alone. In this paper I will show that his subsumption-based account is incompatible with standard assumptions in the HPSG framework. The set-based approach of DK97, which is similar in many respects to Ingria's approach, will also be discussed, and it will be shown that some of the problems of the subsumption-based account are still present in the set-based approach. I will provide a different solution to the problem that relies on an additional case feature for the case form of NPs. This feature is projected from the relative phrase and is not affected by the case requirements of the verb.
In general, there are three possible ways to describe the projections of free relative clauses: first, the direct projection of a phrase from the relative phrase and a finite sentence; second, an empty head or a unary projection that projects a relative clause; and third, a lexical rule that changes the subcategorization frames of governing heads so that they subcategorize for relative clauses. I will argue for the unary schema and discuss the alternatives.
Jedrzejowicz, Joanna
In this paper we prove a Kleene-type theorem for shuffle automata. We show that for each shuffle automaton in standard form, one can define a shuffle expression denoting the language of the automaton.
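The shuffle operation on which shuffle expressions are built interleaves two words while preserving the order of letters within each word. As a minimal illustration only (a generic Python sketch, not the paper's automaton construction or its Kleene-type proof; the names `shuffle` and `shuffle_languages` are hypothetical), the shuffle of two words, and its lifting to finite languages, can be computed as follows:

```python
from functools import lru_cache


def shuffle(u: str, v: str) -> set:
    """Return all interleavings of u and v that keep each word's letter order."""

    @lru_cache(maxsize=None)
    def go(i: int, j: int) -> frozenset:
        # One word exhausted: the rest of the other word follows unchanged.
        if i == len(u):
            return frozenset({v[j:]})
        if j == len(v):
            return frozenset({u[i:]})
        # Otherwise the next letter comes either from u or from v.
        take_u = {u[i] + w for w in go(i + 1, j)}
        take_v = {v[j] + w for w in go(i, j + 1)}
        return frozenset(take_u | take_v)

    return set(go(0, 0))


def shuffle_languages(L1, L2) -> set:
    """Shuffle of two finite languages: union of shuffles of all word pairs."""
    result = set()
    for u in L1:
        for v in L2:
            result |= shuffle(u, v)
    return result


print(sorted(shuffle("ab", "c")))  # ['abc', 'acb', 'cab']
```

Memoizing on the pair of positions avoids recomputing the same suffix interleavings; the shuffle closure used in shuffle expressions would iterate this operation and is not covered here.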
Ardeshir, Mohammad
Basic Predicate Logic, BQC, is a proper subsystem of Intuitionistic Predicate Logic, IQC. For every formula ϕ in the language {∨, ∧, →, ⊤, ⊥, ∀, ∃}, we associate two sequences of formulas 〈ϕ0,ϕ1,...〉 and 〈ϕ0,ϕ1,...〉 in the same language. We prove that for every sequent ϕ ⇒ ψ, there are natural numbers m, n, such that IQC ⊢ ϕ ⇒ ψ, iff BQC ⊢ ϕn ⇒ ψm. Some applications of this translation are mentioned.
Halbach, Volker
Some axiomatic theories of truth and related subsystems of second-order arithmetic are surveyed and shown to be conservative over their respective base theories. In particular, it is shown by purely finitistic means that the theory PA + "there is a satisfaction class" and the theory FS↾ of [2] are conservative over PA.
Bellissima, Fabio; Cittadini, Saverio
We define the concepts of minimal p-morphic image and basic p-morphism for transitive Kripke frames. These concepts are used to determine effectively the least number of variables necessary to axiomatize a tabular extension of K4, and to describe the covers and co-covers of such a logic in the lattice of the extensions of K4.
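For finite frames, the p-morphism conditions themselves can be checked directly. The following is a minimal Python sketch assuming the standard definition: a map f from (W, R) to (V, S) satisfies the forth condition (w R u implies f(w) S f(u)) and the back condition (f(w) S x implies some u with w R u and f(u) = x), and a p-morphic image is the target of a surjective p-morphism. It does not implement the paper's notions of minimal p-morphic image or basic p-morphism, the helper names are hypothetical, and it does not enforce transitivity, so for K4 frames the relations passed in should already be transitive.

```python
from typing import Dict, Hashable, Set, Tuple

# A finite frame: (set of worlds, set of accessibility pairs).
Frame = Tuple[Set[Hashable], Set[Tuple[Hashable, Hashable]]]


def is_p_morphism(f: Dict[Hashable, Hashable], src: Frame, dst: Frame) -> bool:
    """Check that f is a p-morphism (bounded morphism) from src to dst."""
    W, R = src
    V, S = dst
    # f must be a total map from W into V.
    if set(f) != set(W) or any(f[w] not in V for w in W):
        return False
    # Forth condition: w R u implies f(w) S f(u).
    if any((f[w], f[u]) not in S for (w, u) in R):
        return False
    # Back condition: f(w) S x implies some u with w R u and f(u) = x.
    for w in W:
        reached = {f[u] for (a, u) in R if a == w}
        required = {x for (b, x) in S if b == f[w]}
        if not required <= reached:
            return False
    return True


def is_p_morphic_image(f, src: Frame, dst: Frame) -> bool:
    """dst is a p-morphic image of src via f if f is a surjective p-morphism."""
    return is_p_morphism(f, src, dst) and set(f.values()) == set(dst[0])


# A root 0 seeing two end points; both end points collapse onto "b".
src = ({0, 1, 2}, {(0, 1), (0, 2)})
dst = ({"a", "b"}, {("a", "b")})
print(is_p_morphic_image({0: "a", 1: "b", 2: "b"}, src, dst))  # True
```

In the example, the two-point frame in which a sees b is a p-morphic image of the three-point frame; mapping the end point 2 onto "a" instead would violate the forth condition.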
Silberztein, Max
INTEX is a linguistic development environment that includes large-coverage dictionaries and grammars, and parses texts of several million words in real time. INTEX has tools to create and maintain large-coverage lexical resources as well as morphological and syntactic grammars. Dictionaries and grammars are applied to texts in order to locate morphological, lexical and syntactic patterns, remove ambiguities, and tag simple and compound words. INTEX can build lemmatized concordances and indices of large texts with respect to all types of finite-state patterns. INTEX is used as a corpus processor, to analyze literary, journalistic and technical texts. I describe here the subset of tools used to perform advanced search requests on large texts.
Brians, Paul
The project of creating annotations to Salman Rushdie's novel, The Satanic Verses, involved drawing on many resources on the Internet, including Web pages, e-mail lists, and individual correspondents. Internet research naturally stimulates collaborative research, and the end product is more flexible, more useful, and available to a wider audience than conventional paper publishing.
Costagliola, Gennaro; Chang, Shi-Kuo
In this paper we present a grammar formalism for the generation and parsing of two-dimensional symbolic languages. Linear Positional Grammars (LPGs for short) are an immediate generalization of context-free string grammars. Through the use of general spatial relations, they allow the definition of pictures whose symbols span a two-dimensional space. Due to their analogy to context-free string grammars, LPGs can be used to construct an LR-based parser which uses the spatial relations to navigate the input. We study ambiguous grammars and present several ways to resolve their ambiguities. Moreover, we provide an algorithm to translate a linear positional grammar into a context-free grammar with actions, and suggest a general methodology for parsing two-dimensional symbolic languages using the well-known tool YACC (Yet Another Compiler-Compiler [25]). As an example, we construct a parser for a subset of the two-dimensional arithmetical expression language.
Flum, Jörg; Schiehlen, Matthias; Väänänen, Jouko
We prove some results about the limitations of the expressive power of quantifiers on finite structures. We define the concept of a bounded quantifier and prove that every relativizing quantifier which is bounded is already first-order definable (Theorem 3.8). We weaken the concept of congruence closed (see [6]) to weakly congruence closed by restricting to congruence relations where all classes have the same size. Adapting the concept of a thin quantifier (Caicedo [1]) to the framework of finite structures, we define the concept of a meager quantifier. We show that no proper extension of first-order logic by means of meager quantifiers is weakly congruence closed (Theorem 4.9). We prove the failure of the full congruence closure property for logics which extend first-order logic by means of meager quantifiers, arbitrary monadic quantifiers, and the Härtig quantifier (Theorem 6.1).
Blankertz, Benjamin; Weiermann, Andreas
In this article we show how, with the use of the Buchholz-Cichon-Weiermann approach to subrecursive hierarchies, a characterization of the provably total number-theoretic functions of KPM and some of its most prominent subsystems can be extracted from Rathjen's 1991 ordinal analysis of KPM in a uniform and direct way.
Babulanam, S.M.; Beena, K.F.
Work on a user-oriented Bengali orthography has been carried out while teaching Bengali as a third language. Learning Bengali is difficult because of the presence of innumerable conjunct letters and the absence of a vowel sign for the first vowel in Bengali orthography. It is especially difficult for foreigners because working memory in learning a foreign language is quite limited. It is easy to make a computer keyboard with a thousand letters and signs, but such a keyboard is difficult to use in practice. This work shows that the Bengali conjunct letters not used as initial letters in word-making are redundant in the orthography and can be dissected into their components, if a missing-letter sign for the unborn first Bengali vowel sign is introduced in accordance with Bengali orthographic rules. Only 30 conjunct letters are used in Bengali as initial letters in word-making, and these could be kept intact. Thus only 108 signs on a keyboard, including 10 digits and 20 punctuation and other signs, were sufficient for a user-oriented Bengali orthography.
Craig, Hugh
The paper presents the results of a series of Principal Components Analyses of the frequencies of very common words in the dialogue of characters in plays by Ben Jonson. The first Principal Component in the data, the most important axis of differentiation, proves in each case to be a spectrum from elaborate, authoritative pronouncements to a dialogue style of reaction and interchange. Reference to other quantitative studies, literary and otherwise, suggests that a version of this axis may often be among the most important in stylistic difference generally. In Jonson it has a chronological aspect: there is a shift over his career from one end to the other, and there is often significant change within the idiolects of his characters as well. Successive segments of Volpone's and Mosca's parts (they are protagonist and antagonist of Volpone, perhaps Jonson's best-known comedy) change markedly along this axis, beginning far apart but coming by the end of the play to resemble each other very closely on this measure.
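The kind of analysis described above can be sketched with NumPy. This is a minimal illustration under stated assumptions, not Craig's actual procedure: rows are hypothetical text segments, columns are relative frequencies of a few very common words, each column is standardized (a correlation-based PCA), and the first principal component is read off a singular value decomposition. The function name `first_principal_component` and the toy numbers are invented for illustration.

```python
import numpy as np


def first_principal_component(freqs):
    """Return (scores, loadings) of each row on the first principal component.

    freqs: 2-D array, rows = text segments, columns = relative
    frequencies of very common words in each segment.
    """
    X = np.asarray(freqs, dtype=float)
    # Standardize each word's column: correlation-matrix PCA.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Rows of Vt are the principal axes, ordered by singular value.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    loadings = Vt[0]
    scores = Z @ loadings
    return scores, loadings


# Hypothetical toy data: 4 segments x 3 common words (not Craig's data).
toy = np.array([
    [0.040, 0.010, 0.030],
    [0.035, 0.015, 0.028],
    [0.020, 0.030, 0.015],
    [0.015, 0.035, 0.012],
])
scores, loadings = first_principal_component(toy)
print(np.round(scores, 2))
```

The sign of a principal component is arbitrary, so only the relative positions of segments along the axis are meaningful; in the toy data the first two segments fall at one end and the last two at the other, which is the sort of separation the abstract tracks across a play.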
Resnik, Philip; Olsen, Mari Broman; Diab, Mona
We report on a project to annotate biblical texts in order to create an aligned multilingual Bible corpus for linguistic research, particularly computational linguistics, including automatically creating and evaluating translation lexicons and semantically tagged texts. The output of this project will enable researchers to take advantage of parallel translations across a larger number of languages than previously available, providing, with relatively little effort, a corpus that contains careful translations and reliable alignment at the near-sentence level. We discuss the nature of the text, our annotation process, preliminary and planned uses for the corpus, and relevant aspects of the Corpus Encoding Standard (CES) with respect to this corpus. We also present a quantitative comparison with dictionary and corpus resources for modern-day English, confirming the relevance of this corpus for research on present-day language.
Romary, Laurent; Bonhomme, Patrice; Bruneseaux, Florence; Pierrel, Jean-Marie
This paper presents some aspects of the Silfide server, a system dedicated to the delivery of linguistic resources on the web. After presenting the main issues behind the design of such a system, we focus on the editorial choices related to the use of the Text Encoding Initiative to represent our textual documents. In particular, we focus on the accommodations we have had to make with regard to the TEI header, and address the trade-off between extensive enrichment and genericity of the primary data when one wants to precisely mark up a given document's content. As a whole, we show how essential the TEI has proven to be for a project such as ours, from both a practical and a conceptual point of view.
DeRose, Steven
Electronic texts are claimed to exhibit features distinct from their more tangible cousins. The Snapshot project aims to observe and capture language usage in an electronic medium by creating an open corpus of World Wide Web documents. These documents are re-encoded using the TEI guidelines to create a flexible, persistent and portable data repository. This report gives an overview of the decisions made with respect to the re-encoding of HTML documents and the structuring of the overall corpus.
Mylonas, Elli; Renear, Allen
Mylonas and Renear introduce a volume of selected papers from The Text Encoding Initiative 10th Anniversary Conference, held at Brown University in November 1997. The Text Encoding Initiative (TEI) was launched in 1987 and sponsored by the Association for Computers and the Humanities, the Association for Literary and Linguistic Computing, and the Association for Computational Linguistics. Its original objective was the development of an interchange language for textual data. This effort was completely successful, and the TEI Guidelines are now widely accepted as the standard interchange format for textual data. Mylonas and Renear also note two other major achievements of the TEI: it has produced a powerful new data description language (which is influencing the development of new WWW standards); and, most importantly, it has motivated the development of an entirely new research community, focused on understanding the role of text structure and markup in the use of emerging information technologies in culture, scholarship, and communication.
Bauman, Syd; Catapano, Terry
The TEI Guidelines provide little detail on how to encode a text within the physical structures of the book in which it is contained. This paper describes the physical structures of an early printed book and presents two methods for encoding a text within that structure through use of the TEI elements <DIV> and <JOIN>.
Bosak, Jon
This volume would not be complete without Jon Bosak's closing keynote address. It served as the perfect end to the conference, giving us all a view of how the TEI stands in relation to similar industry thinking and then placing our work in the context of new developments. The closing keynote you will read has been abridged and edited for clarity and continuity, but no attempt has been made to create a formal journal article out of an oral presentation – we hope that in retaining Jon Bosak's lively informal prose we've conveyed something of the warm, collegial atmosphere of TEI 10.
Birnbaum, David J.; Cournane, Mavis; Flynn, Peter
The TEI's WSD mechanism allows text encoders to document the nature and use of language scripts for a given document or class of documents, but these facilities have not been widely implemented.
This paper describes two implementations which use different approaches, both for encoding and for rendering, and draws some conclusions about the need for improving the utility of WSDs for scholarly texts.
Burnard, Lou; Popham, Michael
The TEI Header plays a vital role in the documentation and interchange of TEI-conformant electronic texts. Moreover, this role is becoming increasingly important as more people follow the recommendations set out in TEI P3, and libraries, archives, and electronic text centres seek to share their holdings of electronic texts. However, the fact that TEI P3 allows for flexibility in the structure and content of TEI Headers has meant that divergent practices have begun to emerge within the numerous projects and initiatives creating TEI texts. With this in mind, the Oxford Text Archive hosted a one-day colloquium of leading TEI exponents, at which invited participants were encouraged to share their views and expertise on creating TEI Headers, and work together to develop some recommendations towards good practice.
Morrison, A.
This paper is an overview of some recent developments within the Oxford Text Archive (OTA). Specifically, it focuses on the various forms of metadata used within the OTA, including the manipulation of the TEI header, as a means of assisting in the discovery and delivery of resources from the OTA. The paper explores the use of metadata throughout the Arts and Humanities Data Service as a whole, and how this has facilitated the building of an integrated gateway to digital humanities resources. Finally, there is a brief discussion of how the OTA currently provides access to its holdings via the WWW and a look at some possible future developments.
Simons, Gary F.
This paper develops a solution to the problem of importing existing TEI data into an existing object-oriented database schema without changing the TEI data or the database schema. The solution is based on architectural processing. Two meta-DTDs are used, one to define the architectural forms for the object model and another to map the existing SGML data onto those forms. A full example using a critical text in TEI markup is developed.
Walker, D.
Electronic texts are claimed to exhibit features distinct from their more tangible cousins. The Snapshot project aims to observe and capture language usage in an electronic medium by creating an open corpus of World Wide Web documents. These documents are re-encoded using the TEI guidelines to create a flexible, persistent and portable data repository. This report gives an overview of the decisions made with respect to the re-encoding of HTML documents and the structuring of the overall corpus.
Welty, Christopher; Ide, Nacy
We are experimenting with the representation of a DTD and associated documents (i.e., documents conformant to the DTD) in a knowledge representation (KR) system, in order to provide more sophisticated query and retrieval from TEI documents than current systems provide. We are using CLASSIC, a frame-based representation system developed at AT&T Bell Laboratories. Like many KR systems, CLASSIC enables the definition of structured concepts/frames, their organization into taxonomies, the creation and manipulation of individual instances of such concepts, and inferences such as inheritance, relation transitivity, inverses, etc. In addition, CLASSIC provides for the key inferences of subsumption and classification. By representing a document as an individual instance of a hierarchy of concepts derived from the DTD, and by allowing the creation of additional user-defined concepts and relations, sophisticated query and retrieval operations can be performed. This paper describes CLASSIC and the formalism of description logic that underlies it, and demonstrates how it can be used for enhanced retrieval from richly encoded documents.
