* Updated the pagerank call to use igraph::pagerank
* Moved smart_stopwords to be internal data so the package doesn't need to be explicitly loaded with library() to be able to parse
* Changed the idf equation from idf(d, t) = log( n / df(d, t) ) to idf(d, t) = log( n / df(d, t) ) + 1 to avoid zeroing out tf-idf values of common words
* Patched sentence parsing in lexRank and unnest_sentences
* Added unnest_sentences and unnest_sentences_ to parse sentences in a dataframe following tidy data principles
* Added bind_lexrank and bind_lexrank_ to calculate lexrank scores for sentences in a dataframe following tidy data principles (unnest_sentences & bind_lexrank can be used on a df in a magrittr pipeline)
* sentenceSimil is now calculated using Rcpp, improving speed by ~25%-30% over the old implementation that used the proxy package
* Added logic to avoid naming conflicts in proxy::pr_DB in sentenceSimil (#1, @AdamSpannbauer)
* Added a check and error for cases where no sentences are above the threshold in lexRankFromSimil (#2, @AdamSpannbauer)
* tokenize now has stricter punctuation removal: it removes all non-alphanumeric characters, as opposed to removing only [:punct:]
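The motivation for the idf change above can be seen with a quick sketch: under the old equation, a word appearing in every document gets idf of log(1) = 0, which zeroes its tf-idf weight entirely; adding 1 keeps such words in play. The numbers below are illustrative, not from the package.

```r
# Illustrative sketch (not package code): effect of the "+ 1" idf term.
n  <- 10   # total number of documents
df <- 10   # a word that appears in every document

old_idf <- log(n / df)       # log(1) = 0, so tf-idf = tf * 0 = 0
new_idf <- log(n / df) + 1   # 0 + 1 = 1, so tf-idf = tf * 1 = tf

old_idf  # 0
new_idf  # 1
```

With the old formula, maximally common words were silently dropped from similarity scoring; the new formula merely down-weights them.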
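A minimal sketch of the tidy workflow mentioned above, chaining unnest_sentences into bind_lexrank with magrittr's pipe. The input data frame and its column names (doc_id, text, sents) are illustrative assumptions, not names required by the package.

```r
# Hypothetical usage sketch; column names are illustrative.
library(magrittr)
library(lexRankr)

df <- data.frame(
  doc_id = 1,
  text   = "Dogs bark loudly. Cats sleep all day. Dogs chase cats.",
  stringsAsFactors = FALSE
)

df %>%
  unnest_sentences(sents, text) %>%                 # one row per parsed sentence
  bind_lexrank(sents, doc_id, level = "sentences")  # appends a lexrank score column
```

Because both functions take and return data frames, they slot into a pipeline alongside other tidy-style verbs.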
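The difference in the tokenize change can be illustrated by contrasting the two removal strategies directly; this is a sketch of the behavior, not the package's internal code, and exact results for non-ASCII characters can vary by locale.

```r
# Illustrative contrast: [:punct:]-only removal vs. stricter
# removal of every character that is not alphanumeric or a space.
x <- "a+b = c\u2122"   # "\u2122" is the trademark sign, a non-punct symbol

gsub("[[:punct:]]", "", x)    # drops +, = and similar punctuation only
gsub("[^[:alnum:] ]", "", x)  # drops everything but letters, digits, spaces
```

The stricter pattern also catches symbols like the trademark sign that fall outside the [:punct:] class.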