gr.aueb.cs.nlp.postagger
Class BigSetFunctions

java.lang.Object
  extended by gr.aueb.cs.nlp.postagger.BigSetFunctions
public class BigSetFunctions
extends java.lang.Object
Contains the functions that use the extended (fine) tag set.


Method Summary
static java.util.List<WordWithCategory> bigSetClassifyFile(java.lang.String filename)

           A static method that classifies (tags) every token of a text file (in UTF-8 encoding) using the fine tagset.
static java.util.List<WordWithCategory> bigSetClassifyString(java.lang.String stringToClassify)

           A static method that classifies (tags) every token of a string using the fine tagset.
static double bigSetEvaluateFile(java.lang.String filename)

           A static method that computes the tagger's accuracy, given a file (in UTF-8 encoding) containing a sequence of tokens and their correct fine categories (tags).
static void bigSetTrainOtherClassifier(java.lang.String filename)

           A static method that trains the tagger on a file (in UTF-8 encoding) containing a sequence of tokens and their correct fine categories (tags).
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

bigSetClassifyFile

public static java.util.List<WordWithCategory> bigSetClassifyFile(java.lang.String filename)
                                                           throws java.io.FileNotFoundException,
                                                                  java.io.IOException
A static method that classifies (tags) every token of a text file (in UTF-8 encoding) using the fine tagset. All the tokens of the file must be separated by whitespace characters.

Input: String - the location of the file.
Output: List<WordWithCategory> - a list of every word of the file with its category (tag).
Throws:
java.io.FileNotFoundException
java.io.IOException
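
The input requirements above (UTF-8 encoding, whitespace-separated tokens) can be met with a sketch like the following. The class name ClassifyFileInput, the helper joinTokens and the sample tokens are hypothetical and not part of the tagger. The call to bigSetClassifyFile is shown only as a comment, so the sketch compiles without the tagger on the classpath.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

class ClassifyFileInput {

    // Joins the tokens with single spaces, so every token is whitespace-separated.
    static String joinTokens(List<String> tokens) {
        return String.join(" ", tokens);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample tokens; punctuation is a token of its own here.
        List<String> tokens = Arrays.asList("This", "is", "a", "test", ".");

        // Write the text explicitly as UTF-8 rather than the platform default.
        Path input = Files.createTempFile("tokens", ".txt");
        Files.write(input, joinTokens(tokens).getBytes(StandardCharsets.UTF_8));

        // With the tagger on the classpath, the file could then be tagged with:
        // List<WordWithCategory> tagged =
        //         BigSetFunctions.bigSetClassifyFile(input.toString());
        System.out.println("Wrote " + input);
    }
}
```

Because the method declares FileNotFoundException and IOException, a caller has to catch or propagate both.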

bigSetClassifyString

public static java.util.List<WordWithCategory> bigSetClassifyString(java.lang.String stringToClassify)
A static method that classifies (tags) every token of a string using the fine tagset. All the tokens of the string must be separated by whitespace characters.

Input: String - the string to classify.
Output: List<WordWithCategory> - a list of every word of the string with its category (tag).
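
Since tokens must be separated by whitespace, punctuation attached to a word would presumably be treated as part of that word. The following is a minimal sketch of one possible pre-processing step, assuming that every punctuation character should become a token of its own. That assumption is not stated in this documentation, and it will also split abbreviations and decimal numbers. The class and method names are hypothetical.

```java
class WhitespaceTokenPrep {

    // Surrounds every Unicode punctuation character with spaces,
    // then collapses runs of whitespace into a single space.
    static String separatePunctuation(String text) {
        return text.replaceAll("(\\p{P})", " $1 ")
                   .trim()
                   .replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        String prepared = separatePunctuation("Hello, world!");
        System.out.println(prepared);

        // With the tagger on the classpath, the prepared string could be tagged with:
        // List<WordWithCategory> tagged =
        //         BigSetFunctions.bigSetClassifyString(prepared);
    }
}
```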

bigSetEvaluateFile

public static double bigSetEvaluateFile(java.lang.String filename)
A static method that computes the tagger's accuracy, given a file (in UTF-8 encoding) containing a sequence of tokens and their correct fine categories (tags). The file must contain one line for each token. Each line must contain the token followed by its correct tag, separated by a space, as in the example output of method bigSetClassifyString.

Input: String - the location of the file.
Output: double - the tagger's accuracy on the tokens of the input file.
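
For illustration, accuracy over such a file is conventionally the number of tokens whose predicted tag matches the correct tag, divided by the total number of tokens. The sketch below computes that on in-memory data. It is not the tagger's own implementation, and this documentation does not say whether the returned double is a fraction or a percentage. The class name, method names and tags (NOUN, VERB, ...) are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

class AccuracySketch {

    // Extracts the tag from a "token tag" line (the text after the last space).
    static String goldTag(String line) {
        String trimmed = line.trim();
        return trimmed.substring(trimmed.lastIndexOf(' ') + 1);
    }

    // Fraction of tokens whose predicted tag equals the correct tag.
    static double accuracy(List<String> goldLines, List<String> predictedTags) {
        if (goldLines.size() != predictedTags.size()) {
            throw new IllegalArgumentException("one predicted tag per line is required");
        }
        if (goldLines.isEmpty()) {
            return 0.0;
        }
        int correct = 0;
        for (int i = 0; i < goldLines.size(); i++) {
            if (goldTag(goldLines.get(i)).equals(predictedTags.get(i))) {
                correct++;
            }
        }
        return (double) correct / goldLines.size();
    }

    public static void main(String[] args) {
        // Hypothetical gold-standard lines and predictions.
        List<String> gold = Arrays.asList("dog NOUN", "runs VERB", "fast ADV", "home NOUN");
        List<String> predicted = Arrays.asList("NOUN", "VERB", "ADJ", "NOUN");
        System.out.println(accuracy(gold, predicted)); // 3 of 4 correct: prints 0.75
    }
}
```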

bigSetTrainOtherClassifier

public static void bigSetTrainOtherClassifier(java.lang.String filename)
A static method that trains the tagger on a file (in UTF-8 encoding) containing a sequence of tokens and their correct fine categories (tags). The file must be in the same format as the example output of method bigSetClassifyString: one line for each token, containing the token followed by its correct tag, separated by a space.

Input: String - the location of the file.
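
Before training, it may help to check that every line of the file has the expected form. The following sketch reports lines that are not exactly a token, a single space and a tag. Treating blank lines and extra spaces as errors is an assumption made here, not something this documentation states. The class name, method name and tags are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TrainingFileCheck {

    // Returns the 1-based numbers of lines that are not exactly "token tag".
    static List<Integer> malformedLines(List<String> lines) {
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String[] fields = lines.get(i).split(" ");
            boolean ok = fields.length == 2
                    && !fields[0].isEmpty()
                    && !fields[1].isEmpty();
            if (!ok) {
                bad.add(i + 1);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // Hypothetical training lines: line 2 has no tag, line 3 has two spaces.
        List<String> lines = Arrays.asList("dog NOUN", "runs", "fast  ADV", "home NOUN");
        System.out.println(malformedLines(lines)); // prints [2, 3]

        // Once the file passes the check, training could be started with:
        // BigSetFunctions.bigSetTrainOtherClassifier("path/to/training-file.txt");
    }
}
```

To check a real file, its lines could be read with Files.readAllLines(path, StandardCharsets.UTF_8) and passed to malformedLines.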