
Where have I been?

Sunday, April 17th, 2016

It has been quite a while since the last post on this blog, and there is a very good reason for it. In September 2015 I completed my two-year Master of Science degree in Natural Language Processing (with Computer Science) at the Centre for Information and Language Processing of the University of Munich (LMU). I did this as a side project, alongside work and family. So I have been busy, and instead of writing blog posts I dedicated my time to projects such as these:

  • Automated text classification with neural networks using the Enron email corpus (details); technologies and methods: Java, SNIPE (NN library), language modeling, supervised learning with cross-validation
  • Model-theoretic semantics; technologies and methods: Prolog, Montague grammar, generation of syntax trees and transformation into discourse semantics (DRT)
  • Implementation of various algorithms; technologies and methods: Java, finite state machines/automata, tries, Hidden Markov models/Viterbi algorithm etc.
  • Functional programming; technologies and methods: Scala, Akka, Play web framework
  • Building a full-text search engine server from scratch, based on a positional inverted index; technologies and methods: Java, Jetty web server, REST API
  • Discourse analysis of violent conflicts; technologies and methods: Python, scikit-learn, supervised learning with SVMs, MaxEnt and Naive Bayes, feature selection with chi-square and mutual information, newspaper data from LexisNexis
  • Deterministic machine translation of Old English; technologies and methods: Grammatical Framework
  • Using Solr for full-text indexing in a digital humanities project (Wittgenstein corpus/Wittfind); technologies and methods: Solr, Lucene, XML-TEI, XSLT
  • Word sense disambiguation in Old English; technologies and methods: Maven machine learning framework, Java, supervised learning. Published together with Alexander Fraser and Paul Sander Langeslag in the proceedings of GSCL 2015 under the title “God Wat Þæt Ic Eom God – An Exploratory Investigation Into Word Sense Disambiguation in Old English” (PDF, BIB)
  • Master's thesis: Argumentation mining and automated discourse analysis; technologies and methods: UIMA, DKPro, Java, Maven, supervised learning, extensive feature engineering