The Institute of Formal and Applied Linguistics cordially invites you to THE SEMINAR OF FORMAL LINGUISTICS led by Prof. Eva Hajicova
The seminar takes place on Mondays at 1:30 p.m. at the Faculty of Mathematics and Physics (MFF UK), Malostranske nam. 25, 4th floor, room S1 (No. 428).
----
21. 2. 2011
Ondrej Bojar (UFAL MFF UK)
GENERATING CZECH WORD FORMS IN MT: FROM SYSTEM COMBINATION TO BLACK ART
Abstract:
When translating from English to Czech, target-side vocabulary is the critical part. Transfer-based systems like TectoMT are able to generate completely novel forms but do not reach the phrase-based benchmark (yet). Training phrase-based systems to generate new forms is not that straightforward, but I will describe two promising techniques: two-step translation and a new idea, the black art of “reverse self-training”. The best results can be expected from system combination techniques. I will describe the experiments with Aachen MT output combination as I applied and adapted it for English-to-Czech translation. I was combining only UFAL systems (TectoMT and three different configurations of Moses), so I am searching for someone interested in reimplementing one missing bit and allowing the department to win this year's WMT.
----
28. 2. 2011
Nadezda Kudrnacova (Filozoficka fakulta, Masarykova univerzita, Brno)
THE UNERGATIVE VS. THE UNACCUSATIVE STATUS OF ENGLISH VERBS OF LOCOMOTION
Abstract:
According to the hypothesis first proposed by Perlmutter (1978), intransitive verbs (including verbs of locomotion) fall into two categories: unergative verbs and unaccusative verbs. The “Unaccusative Hypothesis” has since become the subject of intensive research. It has been claimed that each verbal category is associated with a specific set of syntactic and semantic properties. The subjects of unergative verbs, which include manner of locomotion verbs, are subjects in both deep and surface structures (the subjects of unergative locomotion verbs are agents). The subjects of unaccusative verbs, which include “path verbs”, originate as deep structure objects (the subjects of unaccusatives are themes or patients). The unergative/unaccusative categorization is dependent on a number of phenomena. For example, unergative verbs of locomotion are re-categorized into unaccusatives when used in a directed motion sense (this categorical shift then underlies the possibility of the verbs’ causativization). In this talk I point out problems that this type of analysis poses, especially with respect to what are taken to be overt signals of the locomotion verbs’ unaccusative/unergative status.
----
7. 3. 2011
Magda Sevcikova, Jarmila Panevova (UFAL MFF UK)
ANNOTATION OF THE GROUP MEANING OF NOUNS IN THE DATA OF THE PRAGUE DEPENDENCY TREEBANK
Abstract:
The morphological category of number in Czech is constituted by the opposition between the singular (denoting a single entity) and the plural (denoting a number of entities). However, the plural forms of nouns such as ruce (hands), boty (shoes), vlasy (hair) and sirky (matches) prototypically refer not to a mere number of entities but to a pair or a usual group of them, or possibly to several pairs or usual groups. We therefore propose to treat the pair meaning, or more broadly the group meaning, as an additional meaning of the plural form of Czech nouns. In the talk we will describe the course and results of an annotation aimed specifically at identifying the group meaning of nouns in the data of the Prague Dependency Treebank. We will present examples of contexts in which the parallel annotations agreed as well as problematic contexts, and we will propose a procedure for incorporating the group meaning into the existing tectogrammatical annotation.
----
14. 3. 2011
Vladimir Petkevic (and Milena Hnatkova, Petr Jäger, Tomas Jelinek, Alexandr Rosen, Hana Skoumalova) (UTKL FF UK)
CORPUS ARBORE À LA CARTE
Abstract:
While all treebanks are very rich and useful resources, they tend to reflect a specific approach to linguistic analysis and may distract users without a background in theoretical linguistics or those subscribing to a different theory. Yet, despite appearances, a comparison of different theory-specific representations reveals a substantial overlap of content. Rather than attempting to accommodate such different views in a single design avoiding a theoretical bias, our aim is to build a syntactically annotated corpus of Czech allowing for various modes of interpretation according to the preferences of the user, including the standard pattern presented in Czech primary and secondary schools. The three-level corpus (consisting of the orthographic, morphological and syntactic levels) will support various external representations of the same internal data being queried and retrieved – dependency or constituent structure, surface or deep syntax, parallel access to data from all three levels, visualization of various phenomena such as agreement, analytical predicates, collocations, etc. The user will be equipped with a query language and tools to customize the interface in a plethora of ways, with underspecification as one of the main options. Taking the analytical level of the Prague Dependency Treebank as the point of departure, the author(s) will focus on the key distinctions between their approach and the concept of PDT.