qdap Version 2.1.1
trinker released this 02 Aug 14:52 · 175 commits to master since this release
CHANGES IN qdap VERSION 2.1.1
BUG FIXES
- `syllable_count` returned the sentence (recycled) in the `words` column of the output. This behavior has been fixed. See GitHub issue #188 for details.
- `syn` returned antonyms for some words. This was caused by the dictionary: `qdapDictionaries::key.syn` contained antonyms and elements that were error messages (character). This has been fixed. Reference issue #190. (Jingjing Zou)
- The `pres_debates2012` data set contained three errors in speech attribution. This has been corrected and the turn of talk (`tot`) as well.
- `word_stats` would throw an error if no poly-syllable words existed. This has been corrected (reported by Nicolas Turenne).
NEW FEATURES
- `qdap_df` and `%&%` added to mimic some of the functionality of `dplyr`'s `tbl_df` and chaining pipe in a more specific, less flexible, `qdap`-oriented way.
- `Text` added to view and change the `text.var` attribute of a data.frame of the class `qdap_df`.
- `cumulative` generic method added to view cumulative scores over time.
- `formality` picks up a `cumulative` method.
- `polarity` picks up a `cumulative` method.
- `end_mark` picks up a `class` (`end_mark`), `plot` method, and a `cumulative` method.
- `syllable_sum`, `polysyllable_sum`, and `combo_syllable_sum` pick up a `class`, `plot` method, and a `cumulative` method.
- `wfm` becomes a generic method currently applied to a `text.var` that is: `character`, `factor` (coerced to `character`), or `wfdf`.
- `unbag` added as a complement to `bag_o_words` and friends for undoing string splitting. A convenience wrapper for `paste(collapse = " ")`.
- `as.Corpus.TermDocumentMatrix`, `as.Corpus.DocumentTermMatrix`, and `as.Corpus.wfm` added to convert a matrix format to a `tm::Corpus`.
- `exclude` becomes a generic method for various classes. Functionality is the same but with improved code readability.
- `check_spelling_interactive`, `check_spelling`, `which_misspelled`, and `correct` allow the user to identify potentially misspelled words and optionally suggest replacements.
- `random_data` & `random_sent` added to generate random sentence data sets and vectors.
- `comma_spacer` added to ensure strings with commas contain a space after them.
- `check_text` added to identify potential problems in text.
- `replace_ordinal` added to convert ordinal representations of 1 through 100 to strictly ordinal text (e.g., "1st" becomes "first").
- A vignette: Cleaning Text & Debugging was added to assist users with cleaning and debugging problems in `qdap`.
- `pronoun_type`, `subject_pronoun_type`, and `object_pronoun_type` added to examine usage of subject/object pronouns by grouping variable.
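
The `unbag` entry describes the function as a convenience wrapper for `paste(collapse = " ")`. As a rough illustration of that idiom only, the sketch below uses base R and does not load `qdap`; the `strsplit` call is an assumed stand-in for a bag-of-words style splitter, not `bag_o_words` itself.

```r
# Base R only: qdap is not loaded, so this shows the idiom, not qdap's code.
sentence <- "the quick brown fox jumps"

# Assumed stand-in for a word splitter: break the string on single spaces.
words <- strsplit(sentence, " ", fixed = TRUE)[[1]]

# Undo the split with the paste(collapse = " ") idiom named in the entry.
rejoined <- paste(words, collapse = " ")

# TRUE when the round trip restores the original string.
round_trip_ok <- identical(rejoined, sentence)
```

The round trip only restores the input exactly when the original words were separated by single spaces, which is why the sample sentence is kept that simple.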
MINOR FEATURES
- `dplyr`'s chaining pipe imported for convenience. See http://www.rdocumentation.org/packages/magrittr/functions/magrittr for details.
IMPROVEMENTS
- `wfm` gains a speedup through generic classes and `tm` package integration (`strip` is no longer used in `wfm`).
- `as.tdm.character` and `as.dtm.character` gain a speed boost with `tm` package integration.
- Added message to `as.data.frame.Corpus` for missing end-marks suggesting the use of: `sent.split = FALSE`.
- `as.Corpus` family of functions didn't necessarily respect document names and sometimes used a numeric sequence instead. The introduction of a reader via `tm::readTabular` has fixed this.
- `sentSplit` now gives warnings for text that may contain anomalies such as: non-ASCII characters, factors, missing punctuation, empty cells, and no alphabetic characters found.
- `read.transcript` now gives a warning when reading from a .docx file and the separator (`sep`) used is still found in the text, as this may indicate the data did not split correctly.
- `dispersion_plot` now takes a named list of vectors of terms as the argument to `match.terms`. The vectors are combined as a unified theme named with the names of the list supplied to `match.terms`.
CHANGES
- `as.data.frame.Corpus`'s default value for `sent.split` is now `FALSE`.
- The `state` column in the `qdap::DATA2` data-set is now character (previously factor).