gr.aueb.cs.nlp.postagger
Class BigSetFunctions
java.lang.Object
gr.aueb.cs.nlp.postagger.BigSetFunctions
Contains the functions that use the extended tag set.
public class BigSetFunctions
- extends java.lang.Object
Method Summary
static java.util.List<WordWithCategory> bigSetClassifyFile(java.lang.String filename)
A static method that classifies (tags) every token of a text file (in UTF-8 encoding) using the fine tagset.
static java.util.List<WordWithCategory> bigSetClassifyString(java.lang.String stringToClassify)
A static method that classifies (tags) every token of a string using the fine tagset.
static double bigSetEvaluateFile(java.lang.String filename)
A static method that computes the tagger's accuracy, given a file (in UTF-8 encoding) containing a sequence of tokens and their correct fine categories (tags).
static void bigSetTrainOtherClassifier(java.lang.String filename)
A static method that trains the tagger on a file (in UTF-8 encoding) containing a sequence of tokens and their correct fine categories (tags).
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
bigSetClassifyFile
public static java.util.List<WordWithCategory> bigSetClassifyFile(java.lang.String filename)
throws java.io.FileNotFoundException,
java.io.IOException
A static method that classifies (tags) every token of a text file
(in UTF-8 encoding) using the fine tagset. All the tokens of the
file must be separated by whitespace characters.
- Input: String - the location of the file.
- Output: List <WordWithCategory> - a list of every word of the file with its category (tag).
- Throws:
java.io.FileNotFoundException
java.io.IOException
bigSetClassifyString
public static java.util.List<WordWithCategory> bigSetClassifyString(java.lang.String stringToClassify)
A static method that classifies (tags) every token of a string using the
fine tagset. All the tokens of the string must be separated by
whitespace characters.
- Input: String - the string to classify.
- Output: List <WordWithCategory> - a list of every word of the string with its category (tag).
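The whitespace requirement above means that punctuation glued to a word would presumably be treated as part of that token. The following is a minimal sketch of whitespace splitting, useful for checking how an input string breaks into tokens before it is classified. The class WhitespaceTokensSketch and its method are hypothetical helpers, not part of this package, and the tagger's own splitting logic may differ.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of gr.aueb.cs.nlp.postagger.
final class WhitespaceTokensSketch {

    // Splits the text on runs of whitespace (spaces, tabs, newlines).
    // Assumption: this mirrors what "separated by whitespace" means for the
    // tagger input; the tagger's own implementation is not shown in this page.
    static List<String> tokens(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }
}
```

If a punctuation mark should be tagged on its own, it has to be separated from the neighbouring word by whitespace before the string is passed in.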
bigSetEvaluateFile
public static double bigSetEvaluateFile(java.lang.String filename)
A static method that computes the tagger's accuracy, given a file (in UTF-8
encoding) containing a sequence of tokens and their correct fine categories
(tags). The file must contain one line for each token, and each line must
contain the token followed by the correct tag, separated by a space, in the
same format as the example output of method bigSetClassifyString.
- Input: String - the location of the file.
- Output: double - the tagger's accuracy on the tokens of the input file.
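As an illustration of the file format and of the usual meaning of tagging accuracy, the sketch below reads "token tag" lines and computes the fraction of tokens whose predicted tag equals the correct tag. This is an assumption about how the accuracy is defined: the page does not state whether the returned double is a fraction or a percentage. TagAccuracySketch is a hypothetical class, not part of this package.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of gr.aueb.cs.nlp.postagger.
final class TagAccuracySketch {

    // Extracts the correct tag from each "token tag" line.
    // Lines without a space (for example empty lines) are skipped.
    static List<String> goldTags(List<String> lines) {
        List<String> tags = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            int sep = trimmed.lastIndexOf(' ');
            if (sep < 0) {
                continue;
            }
            tags.add(trimmed.substring(sep + 1));
        }
        return tags;
    }

    // Assumed definition: correctly tagged tokens divided by all tokens,
    // giving a value between 0.0 and 1.0.
    static double accuracy(List<String> gold, List<String> predicted) {
        if (gold.size() != predicted.size()) {
            throw new IllegalArgumentException("tag sequences differ in length");
        }
        if (gold.isEmpty()) {
            return 0.0;
        }
        int correct = 0;
        for (int i = 0; i < gold.size(); i++) {
            if (gold.get(i).equals(predicted.get(i))) {
                correct++;
            }
        }
        return (double) correct / gold.size();
    }
}
```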
bigSetTrainOtherClassifier
public static void bigSetTrainOtherClassifier(java.lang.String filename)
A static method that trains the tagger on a file (in UTF-8 encoding)
containing a sequence of tokens and their correct fine categories (tags).
The file must be in the same format as the example output of method bigSetClassifyString.
- Input: String - the location of the file.
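A training file in the format described above (one token per line, followed by a space and its correct tag, in UTF-8) could be produced along the following lines. This is only a sketch: TrainingFileSketch is a hypothetical helper, not part of this package, and the tag names a caller passes in must come from the fine tagset, which this page does not list.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of gr.aueb.cs.nlp.postagger.
final class TrainingFileSketch {

    // Builds one "token tag" line per token.
    static List<String> toLines(List<String> tokens, List<String> tags) {
        if (tokens.size() != tags.size()) {
            throw new IllegalArgumentException("each token needs exactly one tag");
        }
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            String token = tokens.get(i);
            if (containsWhitespace(token)) {
                // A token with inner whitespace would break the line format.
                throw new IllegalArgumentException("token contains whitespace: " + token);
            }
            lines.add(token + " " + tags.get(i));
        }
        return lines;
    }

    // Writes the lines to the given path in UTF-8.
    static void write(Path path, List<String> tokens, List<String> tags) throws IOException {
        Files.write(path, toLines(tokens, tags), StandardCharsets.UTF_8);
    }

    private static boolean containsWhitespace(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (Character.isWhitespace(s.charAt(i))) {
                return true;
            }
        }
        return false;
    }
}
```

The location of the written file can then be passed as the filename argument of bigSetTrainOtherClassifier.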