Releases: trinker/qdap
qdap Version 2.2.0
NEWS
Versioning
Releases will be numbered with the following semantic versioning format:
<major>.<minor>.<patch>
And constructed with the following guidelines:
- Breaking backward compatibility bumps the major (and resets the minor and patch).
- New additions without breaking backward compatibility bump the minor (and reset the patch).
- Bug fixes and misc. changes bump the patch.
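The bump rules above can be sketched as a small base R function. bump_version() is a hypothetical helper written for illustration; it is not part of qdap and assumes a plain <major>.<minor>.<patch> string with no suffix.

```r
# Hypothetical helper (not part of qdap): apply the <major>.<minor>.<patch>
# bump rules to a version string.
bump_version <- function(version, type = c("major", "minor", "patch")) {
  type <- match.arg(type)
  parts <- as.integer(strsplit(version, ".", fixed = TRUE)[[1]])
  stopifnot(length(parts) == 3, !anyNA(parts))
  if (type == "major") {
    parts <- c(parts[1] + 1L, 0L, 0L)        # breaking change: reset minor and patch
  } else if (type == "minor") {
    parts <- c(parts[1], parts[2] + 1L, 0L)  # new feature: reset patch
  } else {
    parts[3] <- parts[3] + 1L                # bug fix or misc. change
  }
  paste(parts, collapse = ".")
}

bump_version("2.1.1", "minor")  # "2.2.0"
bump_version("1.3.6", "major")  # "2.0.0"
```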
CHANGES IN qdap VERSION 2.2.0
BUG FIXES
- bag_o_words did not make use of the bag_o_words2 helper function, which has finer-grained control of the output. Arguments passed through ... were ignored but are now respected.
- fry threw an error if a group contained < 300 words but had enough text to generate 2 text chunks of 100 words each (caught by S. Enrico P. Indiogine). This has been fixed: these groups are now dropped and a warning is given.
- phrase_net threw an error caused by dplyr's (0.3) approach to subsetting columns. Previously a vector was returned; now a tbl_df object is returned (tidyverse/dplyr#587). This was addressed by using explicit df[[index]] rather than df[, index].
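The subsetting difference behind the phrase_net fix can be illustrated in base R without dplyr. A plain data.frame drops df[, index] to a vector, while a tbl_df does not; here drop = FALSE is used only to imitate the tbl_df result. The toy data frame is illustrative.

```r
# Toy data frame; column names are illustrative only.
df <- data.frame(word = c("the", "cat", "sat"), n = c(3, 1, 1),
                 stringsAsFactors = FALSE)

x1 <- df[, "word"]                # plain data.frame: drops to a character vector
x2 <- df[, "word", drop = FALSE]  # tbl_df-like result: still a data.frame
x3 <- df[["word"]]                # [[ always extracts the column as a vector

class(x1)  # "character"
class(x2)  # "data.frame"
class(x3)  # "character"
```

Because [[ returns the column vector whatever the table class, code written with df[[index]] does not depend on how a particular data frame subclass implements [.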
NEW FEATURES
- chunker added to break text, optionally by grouping variables, into equal chunks. The chunk size can be specified by giving either the number of words in each chunk or the number of chunks.
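A minimal base R sketch of the two ways of sizing chunks, assuming whitespace tokenization and a single text with no grouping variable. chunk_words() and its size and n arguments are hypothetical, not qdap's chunker implementation.

```r
# Hypothetical sketch (not qdap's chunker): split one text into chunks of
# `size` words, or into `n` roughly equal chunks.
chunk_words <- function(text, size = NULL, n = NULL) {
  words <- unlist(strsplit(trimws(text), "\\s+"))
  if (is.null(size) == is.null(n)) stop("supply exactly one of 'size' or 'n'")
  ids <- if (!is.null(size)) {
    ceiling(seq_along(words) / size)               # fixed number of words per chunk
  } else {
    ceiling(seq_along(words) * n / length(words))  # fixed number of chunks
  }
  # paste the words of each chunk id back into one string
  unname(vapply(split(words, ids), paste, character(1), collapse = " "))
}

txt <- "one two three four five six seven eight nine ten"
chunk_words(txt, size = 4)  # "one two three four" "five six seven eight" "nine ten"
chunk_words(txt, n = 2)     # "one two three four five" "six seven eight nine ten"
```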
IMPROVEMENTS
- all_words gains char.keep and char2space arguments to enable retention of characters and multi-word phrases. These features are passed to freq_terms as well. Suggested by Stack Overflow's lawyeR (http://stackoverflow.com/a/26162401/1000343).
CHANGES
- rm_url has been moved into its own canned regex pattern extraction/replacement package named qdapRegex.
- name2sex now uses the gender package to predict sex. This makes the function slightly slower but much more accurate than previous versions. Because of this increased accuracy and the dependence on gender, the arguments pred.sex, fuzzy.match, and database are no longer necessary and have been removed.
qdap Version 2.1.1
CHANGES IN qdap VERSION 2.1.1
BUG FIXES
- syllable_count returned the sentence (recycled) in the words column of the output. This behavior has been fixed. See GitHub issue #188 for details.
- syn returned antonyms for some words. This was caused by the dictionary qdapDictionaries::key.syn, which contained antonyms and elements that were error messages (character). This has been fixed. Reference issue #190. (Jingjing Zou)
- The pres_debates2012 data set contained three errors in speech attribution. These have been corrected, and the turn of talk (tot) has been corrected as well.
- word_stats would throw an error if no polysyllable words existed. This has been corrected (reported by Nicolas Turenne).
NEW FEATURES
- qdap_df and %&% added to mimic some of the functionality of dplyr's tbl_df and chaining pipe in a more specific, less flexible, qdap-oriented way.
- Text added to view and change the text.var attribute of a data.frame of the class qdap_df.
- cumulative generic method added to view cumulative scores over time.
- formality picks up a cumulative method.
- polarity picks up a cumulative method.
- end_mark picks up a class (end_mark), a plot method, and a cumulative method.
- syllable_sum, polysyllable_sum, and combo_syllable_sum pick up a class, a plot method, and a cumulative method.
- wfm becomes a generic method, currently applied to a text.var that is character, factor (coerced to character), or wfdf.
- unbag added as a complement to bag_o_words and friends for undoing string splitting. It is a convenience wrapper for paste(collapse = " ").
- as.Corpus.TermDocumentMatrix, as.Corpus.DocumentTermMatrix, and as.Corpus.wfm added to convert a matrix format to a tm::Corpus.
- exclude becomes a generic method for various classes. Functionality is the same but with improved code readability.
- check_spelling_interactive, check_spelling, which_misspelled, and correct allow the user to identify potentially misspelled words and optionally suggest replacements.
- random_data & random_sent added to generate random sentence data sets and vectors.
- comma_spacer added to ensure strings with commas contain a space after them.
- check_text added to identify potential problems in text.
- replace_ordinal added to convert ordinal representations of 1 through 100 to strictly ordinal text (e.g., "1st" becomes "first").
- A vignette, Cleaning Text & Debugging, was added to assist users with cleaning and debugging problems in qdap.
- pronoun_type, subject_pronoun_type, and object_pronoun_type added to examine usage of subject/object pronouns by grouping variable.
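The cumulative methods listed above report how a score develops over time. A minimal base R sketch, assuming the cumulative score at position i is the running mean of all scores up to and including i; cumulative_mean() is a hypothetical helper and the scores are toy values, not qdap output.

```r
# Hypothetical helper: running (cumulative) mean of a score vector.
# Assumed semantics for illustration; not qdap's cumulative() code.
cumulative_mean <- function(x) cumsum(x) / seq_along(x)

scores <- c(0.5, -0.25, 0, 1)  # toy sentence-level scores in time order
cumulative_mean(scores)
```

A running mean smooths out single extreme sentences, so a plot of it shows the overall drift of a discourse rather than turn-by-turn noise.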
MINOR FEATURES
- dplyr's chaining pipe imported for convenience. See http://www.rdocumentation.org/packages/magrittr/functions/magrittr for details.
IMPROVEMENTS
- wfm gains a speedup through generic classes and tm package integration (strip is no longer used in wfm).
- as.tdm.character and as.dtm.character gain a speed boost through tm package integration.
- Added a message to as.data.frame.Corpus for missing end marks, suggesting the use of sent.split = FALSE.
- The as.Corpus family of functions didn't necessarily respect document names and sometimes used a numeric sequence instead. The introduction of a reader via tm::readTabular has fixed this.
- sentSplit now gives warnings for text that may contain anomalies such as non-ASCII characters, factors, missing punctuation, empty cells, and no alphabetic characters.
- read.transcript now gives a warning when reading from a .docx file if the separator (sep) is still found in the text, as this may indicate the data did not split correctly.
- dispersion_plot now takes a named list of vectors of terms as the match.terms argument. The terms in each vector are combined into a unified theme, named with the corresponding name of the list supplied to match.terms.
CHANGES
- as.data.frame.Corpus's default value for sent.split is now FALSE.
- The state column in the qdap::DATA2 data set is now character (previously factor).
qdap Version 2.1.0
CHANGES IN qdap VERSION 2.1.0
BUG FIXES
- new_project did not copy the .Rprofile into the new project. This has been fixed. Reference issue #184.
- sentiment_frame coerced words to factor. stringsAsFactors = FALSE has been added to prevent this.
- polarity did not work on n-grams with n > 1 due to a bug in sentiment_frame converting character to factor (thanks for the find @chewth). See GitHub issue #185 for details.
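Before R 4.0.0, data.frame() defaulted to stringsAsFactors = TRUE, so a lookup table built without the explicit argument stored phrases such as "not good" as factor levels rather than strings. A base R illustration; the column names are illustrative and not sentiment_frame's actual layout.

```r
# Same table built both ways; stringsAsFactors = TRUE mimics the old default.
as_factor <- data.frame(words = c("good", "not good"), polarity = c(1, -1),
                        stringsAsFactors = TRUE)
as_char <- data.frame(words = c("good", "not good"), polarity = c(1, -1),
                      stringsAsFactors = FALSE)

is.factor(as_factor$words)   # TRUE: integer codes plus a levels attribute
is.character(as_char$words)  # TRUE: phrases kept as plain strings
as.integer(as_factor$words)  # 1 2: level codes, not the words themselves
```

Setting the argument explicitly makes the column type independent of the R version's default.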
NEW FEATURES
- unique_by added to allow the user to find terms unique to individual elements of a grouping variable.
- build_qdap_vignette replaces the temporary placeholder version of the Introduction to qdap vignette. This function will replace the (1) HTML, (2) source, and (3) R code found in browseVignettes(package = 'qdap').
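The idea behind unique_by can be sketched in base R: pool each group's vocabulary, then keep the terms that no other group uses. terms_unique_by() is a hypothetical helper with a deliberately simple lower-case, letters-and-apostrophes tokenizer; it is not qdap's implementation.

```r
# Hypothetical sketch (not qdap's unique_by): terms used by one group only.
terms_unique_by <- function(text, group) {
  tokens <- strsplit(tolower(text), "[^a-z']+")
  # pool the distinct, non-empty words of every text belonging to each group
  vocab <- lapply(split(tokens, group), function(x) {
    w <- unique(unlist(x))
    w[w != ""]
  })
  # a term is unique to a group if no other group's vocabulary contains it
  out <- lapply(names(vocab), function(g) {
    others <- unlist(vocab[names(vocab) != g])
    sort(setdiff(vocab[[g]], others))
  })
  names(out) <- names(vocab)
  out
}

txt <- c("I like cats", "I like dogs", "cats and dogs fight")
grp <- c("ann", "bob", "ann")
terms_unique_by(txt, grp)  # ann: "and" "cats" "fight"; bob: none
```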
MINOR FEATURES
- sub_holder picks up an alpha.type argument that allows the user to specify whether alpha or numeric keys should be used.
- replace_number picks up a remove argument that removes numbers from text.
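The placeholder idea behind sub_holder, and the choice between alpha and numeric keys, can be sketched in base R. hold_place() and its alpha_keys argument are hypothetical, not qdap's sub_holder or its alpha.type interface; the "qx" key format is also an assumption. One plausible reason to prefer alphabetic keys is that numeric keys could be altered by a later number-handling pass.

```r
# Hypothetical sketch (not qdap's sub_holder): swap each target string for a
# unique key, let other cleaning run, then restore the originals.
hold_place <- function(text, targets, alpha_keys = TRUE) {
  if (alpha_keys) stopifnot(length(targets) <= 26)
  keys <- if (alpha_keys) {
    paste0("qx", letters[seq_along(targets)], "qx")  # e.g. "qxaqx"
  } else {
    paste0("qx", seq_along(targets), "qx")           # e.g. "qx1qx"
  }
  for (i in seq_along(targets)) text <- gsub(targets[i], keys[i], text, fixed = TRUE)
  list(text = text, unhold = function(x) {
    for (i in seq_along(keys)) x <- gsub(keys[i], targets[i], x, fixed = TRUE)
    x
  })
}

h <- hold_place("I love :-) and :-(", c(":-)", ":-("))
h$text                                       # "I love qxaqx and qxbqx"
cleaned <- gsub("[[:punct:]]", "", h$text)   # punctuation removal leaves keys intact
h$unhold(cleaned)                            # "I love :-) and :-("
```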
IMPROVEMENTS
- qheat becomes a generic method. This means some of the internal class checking has been moved to individual methods for those classes. Additionally, qheat now works with logical matrices/data.frames.
- The tm package compatibility functions have been renamed in a more R-ish way and take the form of generic methods for specific classes. For example, df2tm_corpus becomes as.Corpus. Here is a complete list of changes:
  - df2tm_corpus is now as.Corpus
  - tm_corpus2df is now as.data.frame
  - as.wfm is now a generic method
  - tm_corpus2wfm is now as.wfm
  - tm2qdap is now as.wfm
  - tdm is now as.tdm or as.TermDocumentMatrix
  - dtm is now as.dtm or as.DocumentTermMatrix
CHANGES
- colsplit2df and colpaste2df no longer convert character columns to factor.
- df2tm_corpus is deprecated. It will be removed in a subsequent version of qdap. Use as.Corpus instead.
- tm_corpus2df is deprecated. It will be removed in a subsequent version of qdap. Use as.data.frame instead.
- tm2qdap is deprecated. It will be removed in a subsequent version of qdap. Use as.wfm instead.
- tm_corpus2wfm is deprecated. It will be removed in a subsequent version of qdap. Use as.wfm instead.
- tdm is deprecated. It will be removed in a subsequent version of qdap. Use as.tdm or as.TermDocumentMatrix instead.
- dtm is deprecated. It will be removed in a subsequent version of qdap. Use as.dtm or as.DocumentTermMatrix instead.
- The Introduction to qdap .Rmd vignette has been moved to an internal directory. The HTML version is not built by default. This saves CRAN space and time checking the package source. The file has been replaced with a temporary placeholder that contains instructions for building the actual vignette. The user may also use build_qdap_vignette directly.
- qdap incorporates the changes from tm package version 0.6 (http://cran.r-project.org/web/packages/tm/news.html). Reference issue #187.
qdapTools Version 2.0.0.b
CHANGES IN qdap VERSION 2.0.0
The qdapTools package now houses several former qdap functions. While qdapTools is a Dependency and all of these functions will be accessible to the qdap user, there is a break in backward compatibility if these functions are included in code. For this reason this release is a major bump of qdap.
BUG FIXES
- replace_number did not replace single-digit numbers. Spotted by Ben Bolker. This behavior has been fixed and unit testing added for this function. See issue #178.
NEW FEATURES
- sub_holder added; this function holds the place for particular character values, allowing the user to manipulate the vector and then revert the placeholders back to the original values.
- Network method added to make network plots of select qdap objects.
- qtheme, theme_nightheat, theme_duskheat, theme_norah, theme_cafe, theme_grayscale, theme_badkitchen, and theme_hipster added to style Network plots.
- polarity picks up a Network method.
- formality picks up a Network method.
- qdap officially begins utilizing the testthat package for unit testing, though only a few functions have begun the process; more will be added over time.
MINOR FEATURES
IMPROVEMENTS
CHANGES
- The qdapTools package now houses the following former qdap functions: hash, %ha%, hash_look, hms2sec, id, lookup, %l%, %l+%, %l*%, repo2github, sec2hms, text2color, url_dl, v_outer, list2df, matrix2df, vect2df, list_df2df, list_vect2df, counts2list, vect2list, & mtabulate. These functions will continue to be available to qdap users in interactive mode (qdapTools is a Dependency and thus these functions are loaded into the workspace by default). This will allow this bundle of functions to be used outside of qdap without loading the larger qdap package, per the request of Kirill Muller (see issue #165).
- As scheduled, the dissimilarity function has been removed from the qdap package to avoid conflict with the tm package. Use the Dissimilarity function instead.
qdap Version 2.0.0
Initial 2.0.0 bump:
CHANGES IN qdap VERSION 2.0.0
The qdapTools package now houses several former qdap functions. While qdapTools is a Dependency and all of these functions will be accessible to the qdap user, there is a break in backward compatibility if these functions are included in code. For this reason this release is a major bump of qdap.
CHANGES
- The qdapTools package now houses the following former qdap functions: hash, %ha%, hash_look, hms2sec, id, lookup, %l%, %l+%, %l*%, repo2github, sec2hms, text2color, url_dl, v_outer. These functions will continue to be available to qdap users in interactive mode (qdapTools is a Dependency and thus these functions are loaded into the workspace by default). This will allow this bundle of functions to be used outside of qdap without loading the larger qdap package, per the request of Kirill Muller (see issue #165).
- The dissimilarity function has been removed from the qdap package to avoid conflict with the tm package. Use the Dissimilarity function instead.
qdap Version 1.3.6
CHANGES IN qdap VERSION 1.3.6
MINOR FEATURES
- polarity picks up a constrain argument that constrains the polarity values to be between -1 and 1.
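One way to squash an unbounded score into the range -1 to 1 is a rescaled logistic, which is algebraically the same as tanh(x / 2). The base R sketch below uses that transformation as an assumption for illustration; constrain_score() is hypothetical and not necessarily the formula polarity applies when constrain = TRUE.

```r
# Assumed transformation (rescaled logistic, equal to tanh(x / 2)); not
# necessarily qdap's formula. Maps any real score into [-1, 1], keeps 0 at 0,
# and preserves the ordering of scores.
constrain_score <- function(x) ((1 - 1 / (1 + exp(x))) * 2) - 1

scores <- c(-3.2, -0.5, 0, 0.5, 3.2)  # toy unconstrained polarity scores
round(constrain_score(scores), 3)
```

Because the mapping is monotonic, constrained scores rank sentences in the same order as the raw scores.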
IMPROVEMENTS
- polarity's equation now uses primes on the de-amplifiers before they're confined to be >= -1. This avoids confusion in the indicator function that took the de-amplifiers variable and returned the same variable.
- dist_tab's frequency columns used a capital F in Freq. This was not consistent across all column names and has been changed to lower case.
CHANGES
- polarity_frame is deprecated and will be removed in a subsequent release. Please use sentiment_frame instead.