gr.aueb.cs.nlp.postagger
Class SmallSetFunctions
java.lang.Object
gr.aueb.cs.nlp.postagger.SmallSetFunctions
Contains the functions that use the basic (coarse) tag set.
public class SmallSetFunctions
- extends java.lang.Object
Method Summary
- static java.util.List<WordWithCategory> smallSetClassifyFile(java.lang.String filename)
  A static method that classifies (tags) every token (word, symbol etc.) of a text file (in UTF-8 encoding) using the coarse tagset.
- static java.util.List<WordWithCategory> smallSetClassifyString(java.lang.String stringToClassify)
  A static method that classifies (tags) every token of a string using the coarse tagset.
- static double smallSetEvaluateFile(java.lang.String filename)
  A static method that computes the tagger's accuracy, given a file (in UTF-8 encoding) containing a sequence of tokens and their correct coarse categories (tags).
- static void smallSetTrainOtherClassifier(java.lang.String filename)
  A static method that trains the tagger on a file (in UTF-8 encoding) containing a sequence of tokens and their correct coarse categories (tags).
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
smallSetClassifyFile
public static java.util.List<WordWithCategory> smallSetClassifyFile(java.lang.String filename)
throws java.io.FileNotFoundException,
java.io.IOException
A static method that classifies (tags) every token (word, symbol etc.)
of a text file (in UTF-8 encoding) using the coarse tagset. All the
tokens of the file must be separated by whitespace characters (e.g.
" δυνάμεων , δήλωσε κάτοικος της πόλης στο πρακτορείο Reuters . ").
- Input: String - the location of the file.
- Output: List<WordWithCategory> - a list of every token of the file with its category (tag).
- Throws:
java.io.FileNotFoundException
java.io.IOException
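Because every token must be separated by whitespace, raw text usually needs its punctuation split off before it is passed to the tagger. The following is a minimal sketch of such a preprocessing step and is not part of the tagger itself. The class name WhitespaceTokenPrep and the method separateTokens are hypothetical, and the rule used here (every character that is not a letter, a digit, or whitespace becomes its own token) is an assumption that may not match the tokenization the tagger was trained on.

```java
public class WhitespaceTokenPrep {

    /**
     * Hypothetical helper: surrounds every character that is not a letter,
     * a digit, or whitespace with spaces, then collapses runs of whitespace
     * into single spaces and trims the ends.
     */
    static String separateTokens(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (Character.isLetterOrDigit(cp) || Character.isWhitespace(cp)) {
                sb.appendCodePoint(cp);
            } else {
                // Assumption: each punctuation mark or symbol is a separate token.
                sb.append(' ').appendCodePoint(cp).append(' ');
            }
            i += Character.charCount(cp);
        }
        return sb.toString().trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        String raw = "forces, said a resident of the city to Reuters.";
        // Prints: forces , said a resident of the city to Reuters .
        System.out.println(separateTokens(raw));
        // With the tagger library on the classpath, the result could then be
        // written to a UTF-8 file for smallSetClassifyFile, or passed directly
        // to smallSetClassifyString.
    }
}
```

This simple rule also splits numbers such as 3.5 and hyphenated words into several tokens, so a real pipeline would likely need a more careful tokenizer.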
smallSetClassifyString
public static java.util.List<WordWithCategory> smallSetClassifyString(java.lang.String stringToClassify)
A static method that classifies (tags) every token of a string using the
coarse tagset. All the tokens of the string must be separated by
whitespace characters.
- Input: String - the string to classify.
- Output: List<WordWithCategory> - a list of every token of the string with its category (tag).
smallSetEvaluateFile
public static double smallSetEvaluateFile(java.lang.String filename)
A static method that computes the tagger's accuracy, given a file (in UTF-8
encoding) containing a sequence of tokens and their correct coarse categories
(tags). The file must contain one line per token, and each line must
contain the token followed by its correct tag, separated by a space, as in
the example output of method smallSetClassifyString.
- Input: String - the location of the file.
- Output: double - the tagger's accuracy on the tokens of the input file.
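To make the evaluation format concrete, the sketch below parses "token tag" lines and computes accuracy as the fraction of tokens whose predicted tag equals the correct tag. It is only an illustration under stated assumptions. The class AccuracySketch, its methods, and the tag names are hypothetical, and the documentation does not say whether smallSetEvaluateFile returns a fraction or a percentage, nor how it computes the value internally.

```java
import java.util.Arrays;
import java.util.List;

public class AccuracySketch {

    /** Splits a "token tag" line at its last space and returns {token, tag}. */
    static String[] parseGoldLine(String line) {
        int sep = line.lastIndexOf(' ');
        if (sep <= 0 || sep == line.length() - 1) {
            throw new IllegalArgumentException("Expected \"token tag\": " + line);
        }
        return new String[] { line.substring(0, sep), line.substring(sep + 1) };
    }

    /**
     * Assumed definition of accuracy: correctly tagged tokens divided by
     * all tokens, as a value between 0 and 1.
     */
    static double accuracy(List<String> goldLines, List<String> predictedTags) {
        if (goldLines.size() != predictedTags.size()) {
            throw new IllegalArgumentException("Gold and predicted sizes differ");
        }
        if (goldLines.isEmpty()) {
            return 0.0;
        }
        int correct = 0;
        for (int i = 0; i < goldLines.size(); i++) {
            String goldTag = parseGoldLine(goldLines.get(i))[1];
            if (goldTag.equals(predictedTags.get(i))) {
                correct++;
            }
        }
        return (double) correct / goldLines.size();
    }

    public static void main(String[] args) {
        // Tag names are placeholders, not the tagger's real coarse tagset.
        List<String> gold = Arrays.asList("the article", "city noun", ". punctuation");
        List<String> predicted = Arrays.asList("article", "noun", "other");
        System.out.println(accuracy(gold, predicted)); // 2 of 3 tags match
    }
}
```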
smallSetTrainOtherClassifier
public static void smallSetTrainOtherClassifier(java.lang.String filename)
A static method that trains the tagger on a file (in UTF-8 encoding)
containing a sequence of tokens and their correct coarse categories (tags).
The file must be in the same format as the example output of method smallSetClassifyString.
- Input: String - the location of the file.
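As an illustration of preparing such a training file, the sketch below turns token and tag pairs into one "token tag" line each and writes them to a UTF-8 file. The class TrainingFileSketch, the method toLines, and the tag names are hypothetical. The real coarse tagset is not listed in this page, so the tags shown are placeholders only.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class TrainingFileSketch {

    /** Formats each {token, tag} pair as a single "token tag" line. */
    static List<String> toLines(String[][] tokenTagPairs) {
        List<String> lines = new ArrayList<>();
        for (String[] pair : tokenTagPairs) {
            lines.add(pair[0] + " " + pair[1]);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder tags; substitute the tagger's actual coarse categories.
        String[][] pairs = {
            { "the", "article" },
            { "city", "noun" },
            { ".", "punctuation" }
        };
        Path file = Files.createTempFile("training", ".txt");
        Files.write(file, toLines(pairs), StandardCharsets.UTF_8);
        System.out.println("Wrote " + file);
        // With the tagger library on the classpath, one would then call:
        // SmallSetFunctions.smallSetTrainOtherClassifier(file.toString());
    }
}
```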