gr.aueb.cs.nlp.postagger
Class SmallSetFunctions

java.lang.Object
  extended by gr.aueb.cs.nlp.postagger.SmallSetFunctions
           Contains the functions that use the coarse (basic) tagset.
public class SmallSetFunctions
extends java.lang.Object


Method Summary
static java.util.List<WordWithCategory> smallSetClassifyFile(java.lang.String filename)

           A static method that classifies (tags) every token (word, symbol etc.) of a text file (in UTF-8 encoding) using the coarse tagset.
static java.util.List<WordWithCategory> smallSetClassifyString(java.lang.String stringToClassify)

           A static method that classifies (tags) every token of a string using the coarse tagset.
static double smallSetEvaluateFile(java.lang.String filename)

           A static method that computes the tagger's accuracy, given a file (in UTF-8 encoding) containing a sequence of tokens and their correct coarse categories (tags).
static void smallSetTrainOtherClassifier(java.lang.String filename)

           A static method that trains the tagger on a file (in UTF-8 encoding) containing a sequence of tokens and their correct coarse categories (tags).
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

smallSetClassifyFile

public static java.util.List<WordWithCategory> smallSetClassifyFile(java.lang.String filename)
                                                             throws java.io.FileNotFoundException,
                                                                    java.io.IOException
A static method that classifies (tags) every token (word, symbol etc.) of a text file (in UTF-8 encoding) using the coarse tagset. All the tokens of the file must be separated by whitespace characters (e.g. " δυνάμεων , δήλωσε κάτοικος της πόλης στο πρακτορείο Reuters . ").

Input: String - the location of the file.
Output: List <WordWithCategory> - a list of every token of the file with its category (tag).
Throws:
java.io.FileNotFoundException
java.io.IOException
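
The whitespace requirement above can be checked before a text is handed to the tagger. The following is a minimal sketch that splits a string on runs of whitespace, so that punctuation such as "," and "." counts as a token only when it is already separated from its neighbours. The class WhitespaceTokens and its method split are hypothetical helpers, not part of gr.aueb.cs.nlp.postagger, and whether the tagger splits its input in exactly this way is an assumption.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper for illustration; not part of gr.aueb.cs.nlp.postagger. */
final class WhitespaceTokens {

    /** Splits text on runs of whitespace, ignoring leading and trailing whitespace. */
    static List<String> split(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    public static void main(String[] args) {
        // Sample text taken from the method description above.
        String sample = " δυνάμεων , δήλωσε κάτοικος της πόλης στο πρακτορείο Reuters . ";
        // Prints one token per line; "," and "." are tokens of their own.
        for (String token : split(sample)) {
            System.out.println(token);
        }
    }
}
```

A string such as "Reuters." (no space before the full stop) would come out of this sketch as a single token, which is why the input is expected to be pre-separated.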

smallSetClassifyString

public static java.util.List<WordWithCategory> smallSetClassifyString(java.lang.String stringToClassify)
A static method that classifies (tags) every token of a string using the coarse tagset. All the tokens of the string must be separated by whitespace characters.

Input: String - the string to classify.
Output: List <WordWithCategory> - a list of every token of the string with its category (tag).

smallSetEvaluateFile

public static double smallSetEvaluateFile(java.lang.String filename)
A static method that computes the tagger's accuracy, given a file (in UTF-8 encoding) containing a sequence of tokens and their correct coarse categories (tags). The file must contain one line for each token, and each line must contain the token followed by the correct tag, separated by a space, as in the example output of method smallSetClassifyString.

Input: String - the location of the file.
Output: double - the tagger's accuracy on the tokens of the input file.
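
As an illustration of what an accuracy figure over such a file means, the sketch below compares the correct tag on each "token tag" line with a predicted tag at the same position and returns the fraction that match. The class AccuracySketch, its methods and the tag names are hypothetical and are not part of gr.aueb.cs.nlp.postagger. The documentation above does not say whether smallSetEvaluateFile returns a fraction or a percentage; the 0.0 to 1.0 range used here is an assumption.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of token-level accuracy; not the tagger's own code. */
final class AccuracySketch {

    /** Returns the tag of a "token tag" line, i.e. the text after the last space. */
    static String tagOf(String line) {
        String trimmed = line.trim();
        return trimmed.substring(trimmed.lastIndexOf(' ') + 1);
    }

    /** Fraction (0.0 to 1.0) of gold lines whose tag equals the predicted tag at the same index. */
    static double accuracy(List<String> goldLines, List<String> predictedTags) {
        if (goldLines.size() != predictedTags.size()) {
            throw new IllegalArgumentException("gold and predicted sequences differ in length");
        }
        if (goldLines.isEmpty()) {
            return 0.0;
        }
        int correct = 0;
        for (int i = 0; i < goldLines.size(); i++) {
            if (tagOf(goldLines.get(i)).equals(predictedTags.get(i))) {
                correct++;
            }
        }
        return (double) correct / goldLines.size();
    }

    public static void main(String[] args) {
        // Tag names are invented placeholders, not the tagger's real coarse tagset.
        List<String> gold = Arrays.asList("Reuters noun", ". punctuation", "the article", "said verb");
        List<String> predicted = Arrays.asList("noun", "punctuation", "article", "noun");
        System.out.println(accuracy(gold, predicted));
    }
}
```

In this sample three of the four predicted tags agree with the gold tags, so the sketch reports 0.75.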

smallSetTrainOtherClassifier

public static void smallSetTrainOtherClassifier(java.lang.String filename)
A static method that trains the tagger on a file (in UTF-8 encoding) containing a sequence of tokens and their correct coarse categories (tags). The file must be in the same format as the example output of method smallSetClassifyString.

Input: String - the location of the file.
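
A training or evaluation file in the format described above (UTF-8, one token per line, followed by a space and its tag) could be produced along the following lines. This is a minimal sketch: the class TaggedFileWriter, its methods and the tag names are hypothetical and are not part of gr.aueb.cs.nlp.postagger, and rejecting tokens or tags that contain whitespace is an assumption based on the single-space separator.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper that writes "token tag" lines; not part of the tagger's API. */
final class TaggedFileWriter {

    private static final Pattern WHITESPACE = Pattern.compile("\\s");

    /** Builds one "token tag" line per pair, rejecting empty or whitespace-containing fields. */
    static List<String> toLines(List<String> tokens, List<String> tags) {
        if (tokens.size() != tags.size()) {
            throw new IllegalArgumentException("tokens and tags differ in length");
        }
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            String token = tokens.get(i);
            String tag = tags.get(i);
            if (token.isEmpty() || tag.isEmpty()
                    || WHITESPACE.matcher(token).find() || WHITESPACE.matcher(tag).find()) {
                throw new IllegalArgumentException("invalid token or tag at index " + i);
            }
            lines.add(token + " " + tag);
        }
        return lines;
    }

    /** Writes the lines to the target path in UTF-8, one per line. */
    static void write(Path target, List<String> tokens, List<String> tags) throws IOException {
        Files.write(target, toLines(tokens, tags), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Tag names are invented placeholders, not the tagger's real coarse tagset.
        Path target = Files.createTempFile("coarse-train", ".txt");
        write(target, Arrays.asList("Reuters", "."), Arrays.asList("noun", "punctuation"));
        System.out.println("Wrote " + target);
    }
}
```

The path of a file written this way is what would then be passed as the filename argument of smallSetTrainOtherClassifier or smallSetEvaluateFile.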