
qdap Version 2.1.1

trinker released this 02 Aug 14:52

CHANGES IN qdap VERSION 2.1.1

BUG FIXES

  • syllable_count returned the whole sentence (recycled) in the words column of
    the output instead of the individual words. This has been fixed. See GitHub
    issue #188 for details.
  • syn returned antonyms for some words. The cause was the dictionary:
    qdapDictionaries::key.syn contained antonyms as well as elements that were
    error messages (character strings). This has been fixed. Reference issue
    #190. (Jingjing Zou)
  • The pres_debates2012 data set contained three errors in speech attribution.
    These have been corrected, and the turn of talk (tot) has been updated
    accordingly.
  • word_stats would throw an error if the text contained no polysyllable words.
    This has been corrected (reported by Nicolas Turenne).

NEW FEATURES

  • qdap_df and %&% added to mimic some of the functionality of dplyr's
    tbl_df and chaining pipe (%>%) in a more specific, less flexible,
    qdap-oriented way.
  • Text added to view and change the text.var attribute of a data.frame of
    class qdap_df.
  • cumulative generic method added to view cumulative scores over time.
  • formality picks up a cumulative method.
  • polarity picks up a cumulative method.
  • end_mark picks up a class (end_mark), plot method, and a cumulative
    method.
  • syllable_sum, polysyllable_sum, and combo_syllable_sum pick up a
    class, plot method, and a cumulative method.
  • wfm becomes a generic method currently applied to a text.var that is:
    character, factor (coerced to character), or wfdf.
  • unbag added as a complement to bag_o_words and friends for undoing string
    splitting. It is a convenience wrapper for paste(collapse = " ").
  • as.Corpus.TermDocumentMatrix, as.Corpus.DocumentTermMatrix, and
    as.Corpus.wfm added to convert these matrix formats to a tm::Corpus.
  • exclude becomes a generic method for various classes. Functionality is the
    same but with improved code readability.
  • check_spelling_interactive, check_spelling, which_misspelled, and
    correct added to identify potentially misspelled words and, optionally,
    suggest replacements.
  • random_data & random_sent added to generate random sentence data sets and
    vectors.
  • comma_spacer added to ensure strings with commas contain a space after them.
  • check_text added to identify potential problems in text.
  • replace_ordinal added to convert numeric ordinals from 1st through 100th
    into words (e.g., "1st" becomes "first").
  • A vignette, "Cleaning Text & Debugging", was added to help users clean text
    and debug problems in qdap.
  • pronoun_type, subject_pronoun_type, and object_pronoun_type added to
    examine usage of subject/object pronouns by grouping variable.
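Three of the text helpers above (unbag, comma_spacer, replace_ordinal) have behavior that is easy to show concretely. The sketch below uses Python purely to illustrate that documented behavior; it is not qdap's R implementation, and the function names unbag_tokens, space_after_commas, ordinal_word, and replace_ordinals are hypothetical. The hyphenated spelling of compound ordinals such as "twenty-second" is an assumption and may differ from qdap's output.

```python
import re

# Hypothetical Python analogues of unbag, comma_spacer, and replace_ordinal.
# They illustrate the behavior described in the notes, not qdap's R code.


def unbag_tokens(tokens):
    """Rejoin split words with single spaces, like paste(collapse = " ")."""
    return " ".join(tokens)


def space_after_commas(text):
    """Insert a space after any comma that is directly followed by a non-space."""
    # Naive rule: digit groups such as "1,000" are not special-cased.
    return re.sub(r",(?=\S)", ", ", text)


ONES_ORD = ["", "first", "second", "third", "fourth", "fifth",
            "sixth", "seventh", "eighth", "ninth"]
TEENS_ORD = ["tenth", "eleventh", "twelfth", "thirteenth", "fourteenth",
             "fifteenth", "sixteenth", "seventeenth", "eighteenth", "nineteenth"]
TENS_CARD = {2: "twenty", 3: "thirty", 4: "forty", 5: "fifty",
             6: "sixty", 7: "seventy", 8: "eighty", 9: "ninety"}
TENS_ORD = {2: "twentieth", 3: "thirtieth", 4: "fortieth", 5: "fiftieth",
            6: "sixtieth", 7: "seventieth", 8: "eightieth", 9: "ninetieth"}


def ordinal_word(n):
    """Spell out an ordinal from 1 to 100 (compound hyphenation is assumed)."""
    if not 1 <= n <= 100:
        raise ValueError("only 1 through 100 are supported")
    if n == 100:
        return "one hundredth"
    if n < 10:
        return ONES_ORD[n]
    if n < 20:
        return TEENS_ORD[n - 10]
    tens, ones = divmod(n, 10)
    if ones == 0:
        return TENS_ORD[tens]
    return f"{TENS_CARD[tens]}-{ONES_ORD[ones]}"


def replace_ordinals(text):
    """Replace numeric ordinals such as "1st" or "22nd" (1 to 100) with words."""
    def repl(match):
        n = int(match.group(1))
        # Leave out-of-range ordinals (e.g., "101st" or "0th") untouched.
        return ordinal_word(n) if 1 <= n <= 100 else match.group(0)
    return re.sub(r"\b(\d{1,3})(st|nd|rd|th)\b", repl, text)


print(unbag_tokens(["I", "like", "it"]))        # -> I like it
print(space_after_commas("red,green, blue"))    # -> red, green, blue
print(replace_ordinals("the 1st, 22nd and 101st entries"))
# -> the first, twenty-second and 101st entries
```

The ordinal helper deliberately rejects values outside 1 to 100, mirroring the stated range of replace_ordinal rather than attempting a general number speller.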

MINOR FEATURES

IMPROVEMENTS

  • wfm gains a speedup through generic classes and tm package integration
    (strip is no longer used in wfm).
  • as.tdm.character and as.dtm.character gain a speed boost with a tm
    package integration.
  • as.data.frame.Corpus now gives a message when end marks are missing,
    suggesting the use of sent.split = FALSE.
  • The as.Corpus family of functions didn't necessarily respect document names
    and sometimes used a numeric sequence instead. The introduction of a reader
    via tm::readTabular has fixed this.
  • sentSplit now gives warnings for text that may contain anomalies such as
    non-ASCII characters, factors, missing punctuation, empty cells, and no
    alphabetic characters.
  • read.transcript now gives a warning when reading from a .docx file and the
    separator (sep) used is still found in the text as this may indicate the
    data did not split correctly.
  • dispersion_plot now accepts a named list of term vectors for match.terms.
    The terms in each vector are combined into a unified theme, labeled with
    the corresponding name from the list supplied to match.terms.
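The theme grouping behind the dispersion_plot change can be sketched in a few lines. This is a hypothetical Python illustration, not qdap's code: theme_positions is an invented name, and exact, case-insensitive whole-word matching is an assumption rather than qdap's matching rule. It shows how each named vector of terms collapses into one theme whose hits are reported under the list name.

```python
# Hypothetical sketch of grouping several match terms into one named theme.
# Exact, case-insensitive whole-word matching is an assumption.


def theme_positions(words, themes):
    """Return {theme name: [word indices where any of its terms occurs]}."""
    lowered = [w.lower() for w in words]
    positions = {}
    for name, terms in themes.items():
        term_set = {t.lower() for t in terms}
        positions[name] = [i for i, w in enumerate(lowered) if w in term_set]
    return positions


words = "I love cats and dogs but hate rain".split()
themes = {"pets": ["cat", "cats", "dogs"], "feelings": ["love", "hate"]}
print(theme_positions(words, themes))
# -> {'pets': [2, 4], 'feelings': [1, 6]}
```

On the R side, the corresponding input would presumably be a named list such as list(pets = c("cats", "dogs"), feelings = c("love", "hate")) passed to match.terms.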

CHANGES

  • as.data.frame.Corpus's default value for sent.split is now FALSE.
  • The state column in the qdap::DATA2 data set is now character (previously
    factor).