Tuesday, July 30, 2013

An N-Gram Parser in F#

module NGramParsing =
    let GetNGrams tokens nGramOrder outputDelimiter =
        tokens
        |> Seq.windowed nGramOrder
        |> Seq.map (String.concat outputDelimiter)
This is a very basic n-gram parser in F#. It takes an IEnumerable of the word tokens you want n-grams for, a number giving the n-gram order, and a delimiter for the output. It returns an IEnumerable of strings, each containing a single n-gram whose words are separated by that delimiter. For example, calling it from C#:
NGramParsing.GetNGrams(new [] { "This", "is", "a", "test" }, 2, "|");
Would yield:
{ "This|is", "is|a", "a|test" }
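
The same call can also be made from F# itself. The snippet below is a minimal sketch: it repeats the module so that it runs on its own, and the way the tokens are produced (splitting a sentence on spaces) is an assumption for illustration, not part of the parser. It also shows what happens when there are fewer tokens than the requested order.

```fsharp
module NGramParsing =
    // Same definition as above, repeated so this snippet is self-contained.
    let GetNGrams tokens nGramOrder outputDelimiter =
        tokens
        |> Seq.windowed nGramOrder                  // sliding windows of nGramOrder tokens
        |> Seq.map (String.concat outputDelimiter)  // join each window into one string

// Hypothetical tokenization for the example: split on single spaces.
let tokens = "This is a test".Split(' ')

// Bigrams, matching the example above.
let bigrams = NGramParsing.GetNGrams tokens 2 "|" |> Seq.toList
// ["This|is"; "is|a"; "a|test"]

// Trigrams with a space as the delimiter.
let trigrams = NGramParsing.GetNGrams tokens 3 " " |> Seq.toList
// ["This is a"; "is a test"]

// Fewer tokens than the order: Seq.windowed produces no windows.
let tooShort = NGramParsing.GetNGrams tokens 5 "|" |> Seq.toList
// []
```

Because the result is a lazy sequence, nothing is computed until it is enumerated (here by Seq.toList). Note that Seq.windowed throws an ArgumentException if the order is zero or negative, so callers that accept the order from user input may want to validate it first.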
