Software and data

Software

  • Evaluation Measures for Hierarchical Classification: The software that accompanies our technical report "Evaluation Measures for Hierarchical Classification: a Unified View and Novel Approaches". Download
  • Greek part-of-speech tagger. The tagger attempts to automatically determine the part of speech (e.g., noun, adjective, verb) of each word occurrence in Greek texts. It can also tag each word occurrence with additional information, such as the gender, number, and case of each noun, or the voice, tense, and number of each verb.
    • Download (version 2.2 alpha): Minor bug fixes.
    • Download (version 2.1 alpha): This version uses Stanford's Maximum Entropy Classifier (see http://nlp.stanford.edu/software/), performs better than version 1, and provides an API. However, it does not yet provide a GUI or active learning facilities.
    • Download (version 1): This version uses a k-nearest neighbour classifier. It includes a GUI and active learning facilities, but no API.
  • NaiveBayesSpamDetector: an experimental e-mail spam filter that uses various forms of the Naive Bayes classifier.
  • Named-entity recognizer for Greek texts.
  • NaturalOWL: a natural language generator for OWL ontologies that supports English and Greek; it can be used within Protégé.
  • NLITDB: a prototype natural language interface for temporal databases. Download
  • Sentence compression software: the software of our HLT-NAACL 2010 paper. Download

Data

  • AspectTermSimilarities: manually specified similarities between aspect terms of English restaurant and laptop reviews, as used in our EACL 2014 paper "Multi-Granular Aspect Aggregation in Aspect-Based Sentiment Analysis". Download
  • Enron-Spam: contains ham (i.e., legitimate) e-mail messages from the Enron corpus and spam messages. Download
  • Ling-Spam: contains ham e-mail messages from a mailing list and spam messages. Download
  • Paraphrases: a collection of sentences and manually scored candidate paraphrases, as used in our EMNLP 2011 paper "A Generate and Rank Approach to Sentence Paraphrasing". Download
  • PU: contains ham e-mail messages (in encoded form) and spam messages. Download