Temporal entity extraction from historical texts

Thursday, 9 June
13:45 to 14:45
Room 2120

Friday, 10 June
13:45 to 14:45
Room 2120

In my thesis project I consider temporal aspects of historical texts, i.e. the automated extraction and normalisation of temporal information. Time is a crucial dimension in the humanities: a description of a person, place or linguistic entity, for example, should contain temporal terms. Historical texts may be rich in temporal expressions. Manual extraction of this information is time-consuming, so some facts may remain undiscovered and thus unknown to scholars. My research will contribute to the development of an effective tool for temporal entity extraction from historical texts, assisting historical text mining.

This project is funded by the Swiss Law Sources Foundation. As material for my research I will use historical Swiss law texts kindly provided by the Foundation.

The Foundation has been publishing critical editions of Swiss historical legal texts in all languages of Switzerland for over a hundred years, 118 volumes in total. The texts were created between the 10th and the 18th century, so they vary greatly in the overall state of the written language.

There are no temporal entity extraction systems for historical texts. Existing temporal information extraction systems use either rule-based techniques or a combination of machine learning and rule-based methods. Normally, a temporal information extraction system performs the following tasks: detection of a temporal expression, its classification (whether it refers to a date, a time, etc.) and its normalisation to an ISO format.
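
The three tasks named above (detection, classification, normalisation) can be illustrated with a minimal rule-based sketch. It is not the system developed in this project: the pattern, the English month lexicon, the `Timex` record and the function `extract_dates` are all assumptions made for illustration, and the output format assumes that an ISO 8601-style `YYYY-MM-DD` string is the intended ISO format. Real historical texts would need far richer rules and spelling variants.

```python
import re
from dataclasses import dataclass

# Hypothetical month lexicon (modern English only); an assumption for this
# sketch, not a resource taken from the project.
MONTHS = {
    "january": 1, "february": 2, "march": 3, "april": 4,
    "may": 5, "june": 6, "july": 7, "august": 8,
    "september": 9, "october": 10, "november": 11, "december": 12,
}

# Detection rule: day, month name, three- or four-digit year.
DATE_RE = re.compile(r"\b(\d{1,2})\.?\s+([A-Za-z]+)\s+(\d{3,4})\b")


@dataclass
class Timex:
    text: str   # surface form as found in the text
    start: int  # character offset where the expression starts
    end: int    # character offset just past the expression
    type: str   # classification, here always "DATE"
    value: str  # normalised value, YYYY-MM-DD


def extract_dates(text):
    """Detect, classify and normalise full dates in a string."""
    results = []
    for match in DATE_RE.finditer(text):
        day = int(match.group(1))
        month = MONTHS.get(match.group(2).lower())
        year = int(match.group(3))
        # Skip candidates whose month is unknown or whose day is impossible.
        if month is None or not 1 <= day <= 31:
            continue
        value = f"{year:04d}-{month:02d}-{day:02d}"
        results.append(
            Timex(match.group(0), match.start(), match.end(), "DATE", value)
        )
    return results


if __name__ == "__main__":
    for timex in extract_dates("Given at Bern on 9 June 1532 before the council."):
        print(timex)
```

Every new surface form (another language, an abbreviated month, a date given by a feast day) needs its own lexicon entry or pattern in a design like this one.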

In the first stage of my research, I explored rule-based methods, starting with experiments on the rule-based tagger HeidelTime. It soon became clear that, because of high data sparsity, rule-based methods require a lot of manual work: collecting new expressions and writing new rules. To overcome these problems, at the current stage of my project I am developing a system trained with machine learning techniques, capable of learning annotation patterns from the manually annotated Gold Standard Corpus.
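
One common way to make manually annotated spans usable for a machine learning tagger is to convert them into token-level BIO labels (B = beginning, I = inside, O = outside). The sketch below shows only that conversion step. The abstract does not say which learning method or annotation format the project uses, so the BIO scheme, the whitespace tokeniser and the function names `tokenize` and `spans_to_bio` are assumptions for illustration.

```python
import re


def tokenize(text):
    """Split on whitespace and keep character offsets for each token."""
    return [(m.group(0), m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def spans_to_bio(text, spans):
    """Convert (start, end, label) character spans into per-token BIO tags.

    A token is tagged only if it lies entirely inside an annotated span;
    a token with attached punctuation that crosses the span boundary
    stays "O" in this simplified version.
    """
    labelled = []
    for token, start, end in tokenize(text):
        tag = "O"
        for span_start, span_end, span_label in spans:
            if start >= span_start and end <= span_end:
                prefix = "B-" if start == span_start else "I-"
                tag = prefix + span_label
                break
        labelled.append((token, tag))
    return labelled


if __name__ == "__main__":
    sentence = "Given on 9 June 1532 at Bern"
    gold = [(9, 20, "DATE")]  # hypothetical gold annotation: "9 June 1532"
    for token, tag in spans_to_bio(sentence, gold):
        print(token, tag)
```

A sequence labelling model would then be trained on such token and tag pairs instead of on hand-written rules.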

At the end of my project I will assess the cross-domain interoperability of the system by evaluating the best-scoring system on out-of-domain historical corpora. Once the extraction of the temporal expressions is complete, I will conclude my project by mapping this temporal information to the existing entries in the database of historical places and persons of Switzerland, which I am developing from the back-of-the-book indexes of the published volumes of law texts.


Conference organisation: Schweizerische Gesellschaft für Geschichte and the history institutes of the Universität Lausanne