NEWS
====
Versioning
----------
Releases will be numbered with the following semantic versioning format:
<major>.<minor>.<patch>
They are constructed according to the following guidelines:
* Breaking backward compatibility bumps the major (and resets the minor
and patch)
* New additions without breaking backward compatibility bump the minor
(and reset the patch)
* Bug fixes and misc. changes bump the patch
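The bump rules above can be expressed as a small function. The sketch below is a hypothetical helper (`bump_version` is not part of qdap). It assumes a plain three-part version string and shows how each kind of change resets the lower components.

```r
# Hypothetical helper, not part of qdap: apply the bump rules above to a
# "<major>.<minor>.<patch>" string.
bump_version <- function(version, type = c("major", "minor", "patch")) {
  type <- match.arg(type)
  parts <- as.integer(strsplit(version, ".", fixed = TRUE)[[1]])
  stopifnot(length(parts) == 3L)
  if (type == "major") {
    parts <- c(parts[1] + 1L, 0L, 0L)        # breaking change: reset minor and patch
  } else if (type == "minor") {
    parts <- c(parts[1], parts[2] + 1L, 0L)  # backward-compatible addition: reset patch
  } else {
    parts[3] <- parts[3] + 1L                # bug fix or misc. change
  }
  paste(parts, collapse = ".")
}

bump_version("1.3.6", "major")  # "2.0.0"
bump_version("2.2.4", "minor")  # "2.3.0"
bump_version("2.2.1", "patch")  # "2.2.2"
```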
CHANGES IN qdap VERSION 2.3.0 -
----------------------------------------------------------------
BUG FIXES
NEW FEATURES
MINOR FEATURES
IMPROVEMENTS
CHANGES
CHANGES IN qdap VERSION 2.2.5-2.2.9
----------------------------------------------------------------
BUG FIXES
* `check_spelling` and other spell checkers threw an error with a custom
dictionary that did not contain at least one word beginning with each of the
26 letters of the alphabet. These functions now automatically use
`assume.first.correct = FALSE` if this occurs. Reported by @CallumH of
StackOverflow: http://stackoverflow.com/q/33516466/1000343. See issue #217 for
details.
* `check_spelling_interactive` replaced substrings rather than bounded words.
This was caught by @chrisjacques. See issue #221.
* `replace_abbreviation` threw an error because `data.frame` converted character
to factor by default and `nchar` no longer works on factors. This was caught
by @karilint. See issue #225.
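The `replace_abbreviation` failure can be reproduced in base R. The sketch below sets `stringsAsFactors = TRUE` explicitly, because R 4.0.0 changed the default to `FALSE`. The abbreviation table is made up for illustration and is not qdap's dictionary.

```r
# Recreate the old default explicitly (R < 4.0.0 used stringsAsFactors = TRUE).
abbs <- data.frame(abv = c("Mr.", "Dr."), rep = c("Mister", "Doctor"),
                   stringsAsFactors = TRUE)
is.factor(abbs$abv)  # TRUE

# nchar() rejects factors, which is the failure described above.
res <- tryCatch(nchar(abbs$abv), error = function(e) "error")
res  # "error"

# Keeping the columns as character avoids the problem.
abbs2 <- data.frame(abv = c("Mr.", "Dr."), rep = c("Mister", "Doctor"),
                    stringsAsFactors = FALSE)
nchar(abbs2$abv)  # 3 3
```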
CHANGES IN qdap VERSION 2.2.4
----------------------------------------------------------------
NEW FEATURES
* `add_s` added to add -s, -es, or -ies to word endings.
MINOR FEATURES
IMPROVEMENTS
* `common` now returns `NULL` invisibly with a message rather than an error if
no groups meet the parameters. Suggested by @bitanshu via issue #213.
* `word_cor`'s default `group.var` is no longer `NULL` but set to use `1:nrow`
via `qdapTools::id(text.var)`. Thanks to Drew Schmidt for bringing this issue
to attention. For clarity, the documentation has been updated and an error is
now given for `group.var = NULL`.
CHANGES
CHANGES IN qdap VERSION 2.2.2
----------------------------------------------------------------
BUG FIXES
* `type_token_ratio` was misnamed as `type_text_ratio`; this has been corrected.
The plot for this class also contained a misspelling, "type-toke ratio", which
has been corrected as well.
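A type-token ratio is the number of distinct word forms (types) divided by the total number of words (tokens). The sketch below is a hypothetical base-R helper (`ttr`), not qdap's `type_token_ratio`. It assumes a simple lowercase, letters-and-apostrophes tokenizer.

```r
# Hypothetical helper, not qdap's type_token_ratio(): distinct word forms
# (types) divided by the total number of words (tokens).
ttr <- function(text) {
  tokens <- unlist(strsplit(tolower(text), "[^a-z']+"))
  tokens <- tokens[nzchar(tokens)]   # drop empty strings left by leading punctuation
  length(unique(tokens)) / length(tokens)
}

ttr("The cat saw the dog.")  # 4 types / 5 tokens = 0.8
```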
NEW FEATURES
* `inspect_text` added to allow for pretty printed viewing of text strings and
**tm** `Corpus`es.
CHANGES
* The following functions were previously deprecated and have now been
removed: `df2tm_corpus`, `tm2qdap`, `tm_corpus2wfm`, `tm_corpus2df`, `tdm`,
`dtm`, and `polarity_frame`.
CHANGES IN qdap VERSION 2.2.1
----------------------------------------------------------------
BUG FIXES
* The internal vignette "An Introduction to qdap" produced errors when compiled
by `build_qdap_vignette`. This behavior has been fixed by using static
reporting. The root cause is that `cm_` functions grab data from the global
environment, which may not be available in a `knitr`/`rmarkdown` generated
environment.
* `polarity` no longer handled phrases (words + spaces) for `polarity.frame`.
This behavior was caught by @Benasso http://stackoverflow.com/q/27156834/1000343.
This bug is a result of the changes made to `bag_o_words` earlier this year.
The bug has been fixed and a unit test put in place to ensure the bug is not
reintroduced.
* `Network.formality` did not include edge width handling. This has been
corrected.
* `word_stats` gave an incorrect warning message for missing endmarks:
"Some sentences not have standard qdap punctuation endmarks." The "do" has
been added: "Some sentences do not have standard qdap punctuation endmarks."
* The `pres_debates2012` data set contained incorrect splits at lines 544 and
1054. These have been corrected (GitHub issue #205).
* `pos` threw an error if only one word was passed to `text.var`. Fix:
`drop = FALSE` has been added to data frame indexing. Caught by
StackOverflow user G_1991 http://stackoverflow.com/q/29896488/1000343.
* `as.tdm.wfm` would error if no grouping variable was supplied. This behavior
has been corrected.
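The `pos` fix relies on a general base-R pitfall: selecting a single column with the default `drop = TRUE` returns a plain vector rather than a table. The toy data below is a hypothetical illustration of that pitfall, not qdap's internal code.

```r
# A one-column table, as can happen when only one word (or one tag) is present.
counts <- data.frame(NN = c(2L, 0L, 1L))

# Default drop = TRUE collapses a single-column selection to a plain vector,
# so matrix-style helpers such as rowSums() fail.
dropped <- tryCatch(rowSums(counts[, 1]), error = function(e) "error")
dropped  # "error"

# drop = FALSE keeps the data.frame shape and the same call succeeds.
kept <- rowSums(counts[, 1, drop = FALSE])
unname(kept)  # 2 0 1
```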
NEW FEATURES
* `word_length` function added to give counts of word length usage by grouping
variable. See `?word_length` for details.
* `word_position` function added to give counts of the position of words within
a sentence.
* `sent_detect_nlp` added in the `sentSplit` family to wrap **NLP** package
functionality into a convenient function.
* `lexical_classification` provides a means of assessing content vs. functional
word usage at the grouping variable and sentence level. The class comes with
generic methods for `preprocessed`, `scores` (and plots of these methods),
`Animated`, `Network`, `cumulative` and `Animate.cumulative`.
* `Animate.character` added as a generic method that allows for the animation of
text. This is useful in conjunction with other `Animate` objects to
create complex animations with accompanying text.
* `add_incomplete` added to give sentences with missing endmarks a `|` endmark,
indicating an incomplete sentence.
* `type_token_ratio` added to determine type-token ratio per grouping variable.
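The idea behind `word_length` (counts of word lengths per grouping variable) can be sketched with base R's `table`. The sketch assumes whitespace tokenization and uses made-up speakers. It is not qdap's implementation and does not reproduce its output format.

```r
text  <- c("I like it", "You do not", "Hello there")
group <- c("sam", "greg", "sam")

words <- strsplit(tolower(text), "\\s+")                 # one character vector per turn
len   <- unlist(lapply(words, nchar))                    # length of every word
grp   <- rep(group, vapply(words, length, integer(1)))   # repeat group label per word

# Rows are groups, columns are word lengths, cells are counts.
tab <- table(group = grp, length = len)
tab
```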
IMPROVEMENTS
* `polarity` takes `polarity.frame` with phrases (words with spaces).
* The `Animate` method for the classes: `polarity` & `formality` gains the
ability to print corresponding animated text for combined use with other
`Animated` methods.
* `multigsub`/`mgsub` get a speed boost through better programming choices. See
issue #201 for details. Thank you to @Alexey Ferapontov for his critical post
http://stackoverflow.com/q/27367914/1000343 that inspired the changes.
* `formality` and `pos` now have minimal unit tests.
* `trans_context` used `message` to print to the console, which resulted in
truncated output. `message` has been replaced with `cat`.
* `strip` gets a speed boost (~10x) by using better regex algorithms,
consolidating code/function calls, and by creating a generic `strip` method
for different classes. Additionally, multiple white spaces are now condensed
to a single white space.
* `scrubber` would automatically take a space and a single last character and
remove the space. This was to remove spaces before ending punctuation. `scrubber`
used `substring` rather than a more controlled regular expression.
This has been corrected. Report thanks to @Fabrizio Maccallini. See issue #207
for more information.
* `pres_debates2012` picks up a `role` column to make filtering out the
candidates easier. The variable order has also changed to put the `dialogue`
last.
CHANGES
* The **ggplot2** package is no longer in Depends. This means the user will
have to manually load the package to use additional ggplot2 features. See
GitHub issue #199 for more.
* `pos` now treats contractions as 2 words. For example, the word count for
"what's" is 2 (what + is). The previous behavior was to strip out the
apostrophes. This was undesirable because the sentence "She's cool" would have
no verb in the `pos` output. This change affects `pos_by` and `formality` as
well.
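Counting "what's" as two words amounts to splitting the contraction into its parts before tagging. The sketch below uses a hypothetical, deliberately naive helper (`expand_s`) that treats every `'s` as "is". It is not how qdap's `pos` necessarily handles contractions, and it would mangle possessives such as "Sam's".

```r
# Naive, hypothetical expansion: treats every "'s" as "is", so possessives
# such as "Sam's book" would be mangled. Real taggers need more care.
expand_s <- function(x) gsub("\\b(\\w+)'s\\b", "\\1 is", x, perl = TRUE)

expand_s("She's cool")                            # "She is cool"
length(strsplit(expand_s("what's"), " ")[[1]])    # 2
```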
CHANGES IN qdap VERSION 2.2.0
----------------------------------------------------------------
BUG FIXES
* `bag_o_words` did not make use of the `bag_o_words2` helper function that has
finer grained control of the output. `...` were ignored but now are respected.
* `fry` threw an error if a group contained < 300 words but had enough text to
generate 2 text chunks of 100 words each (caught by S. Enrico P. Indiogine).
This has been fixed: these groups are now dropped and a warning is given.
* `phrase_net` threw an error caused by **dplyr**'s (0.3) approach to subsetting
columns. Previously a vector was returned, now a `tbl_df` object is returned:
https://github.com/hadley/dplyr/issues/587. This was addressed by using
explicit `df[[index]]` rather than `df[, index]`.
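The `phrase_net` fix works because `[[` extracts a single column as a vector for a plain `data.frame` and for a `tbl_df` alike. By contrast, `df[, index]` drops to a vector only for a plain `data.frame`. The base-R sketch below shows the plain case; the `edges` data are made up.

```r
edges <- data.frame(from = c("a", "b"), to = c("b", "c"),
                    stringsAsFactors = FALSE)

v1 <- edges[, "from"]   # plain data.frame: drops to a character vector
v2 <- edges[["from"]]   # always a vector, for data.frame and tbl_df alike
identical(v1, v2)       # TRUE here; on a tbl_df, edges[, "from"] stays a tibble
```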
NEW FEATURES
* `chunker` added to break text, optionally by grouping variables, into equal
chunks. The chunk size can be specified by giving either the number of words in
each chunk or the number of chunks.
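Chunking by a fixed number of words comes down to assigning each word a chunk id with `ceiling(position / n)` and pasting each group back together. The sketch below uses a hypothetical helper (`chunk_words`), not qdap's `chunker`. It handles only a single string and no grouping variable.

```r
chunk_words <- function(text, n) {
  words <- strsplit(text, "\\s+")[[1]]
  idx   <- ceiling(seq_along(words) / n)      # 1,1,...,1,2,2,... chunk ids
  unname(vapply(split(words, idx), paste, character(1), collapse = " "))
}

chunk_words("a b c d e f g", 3)  # "a b c" "d e f" "g"
```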
IMPROVEMENTS
* `all_words` gains `char.keep` and `char2space` arguments to enable retention
of characters and multi-word phrases. These features are passed to
`freq_terms` as well. Suggested by StackOverflow's lawyeR
(http://stackoverflow.com/a/26162401/1000343).
CHANGES
* `rm_url` has been moved into its own canned regex pattern extraction/replacer
package named `qdapRegex`.
* `name2sex` now uses the **gender** package to predict sex. This makes the
function slightly slower but much more accurate than previous versions.
Because of this increased accuracy and dependence on `gender`, the arguments
`pred.sex`, `fuzzy.match`, and `database` are no longer necessary and have
been removed.
CHANGES IN qdap VERSION 2.1.1
----------------------------------------------------------------
BUG FIXES
* `syllable_count` returned the sentence (recycled) in the `words` column of the
output. This behavior has been fixed. See GitHub issue #188 for details.
* `syn` returned antonyms for some words. This was caused by the dictionary:
`qdapDictionaries::key.syn` contained antonyms and elements that were error
messages (character). This has been fixed. Reference issue #190. (Jingjing Zou)
* The `pres_debates2012` data set contained three errors in speech attribution.
These have been corrected, along with the turn of talk (`tot`) column.
* `word_stats` would throw an error if no poly-syllable words existed. This has
been corrected (reported by Nicolas Turenne).
NEW FEATURES
* `qdap_df` and `%&%` added to mimic some of the functionality of `dplyr`'s
`tbl_df` and chaining pipe in a more specific, less flexible, `qdap` oriented
way.
* `Text` added to view and change the `text.var` attribute of a `data.frame` of
the class `qdap_df`.
* `cumulative` generic method added to view cumulative scores over time.
* `formality` picks up a `cumulative` method.
* `polarity` picks up a `cumulative` method.
* `end_mark` picks up a `class` (`end_mark`), `plot` method, and a `cumulative`
method.
* `syllable_sum`, `polysyllable_sum`, and `combo_syllable_sum` pick up a
`class`, `plot` method, and a `cumulative` method.
* `wfm` becomes a generic method currently applied to a `text.var` that is:
`character`, `factor` (coerced to `character`), or `wfdf`.
* `unbag` added as a complement to `bag_o_words` and friends for undoing string
splitting. A convenience wrapper for `paste(collapse = " ")`.
* `as.Corpus.TermDocumentMatrix`, `as.Corpus.DocumentTermMatrix`, and
`as.Corpus.wfm` added to convert a matrix format to a `tm::Corpus`.
* `exclude` becomes a generic method for various classes. Functionality is the
same but with improved code readability.
* `check_spelling_interactive`, `check_spelling`, `which_misspelled`, and
`correct` allow the user to identify potentially misspelled words and
optionally suggest replacements.
* `random_data` & `random_sent` added to generate random sentence data sets and
vectors.
* `comma_spacer` added to ensure strings with commas contain a space after them.
* `check_text` added to identify potential problems in text.
* `replace_ordinal` added to convert ordinal representations of 1 through 100 to
strictly ordinal text (e.g., "1st" becomes "first").
* A vignette: `Cleaning Text & Debugging` was added to assist users with
cleaning and debugging problems in `qdap`.
* `pronoun_type`, `subject_pronoun_type`, and `object_pronoun_type` added to
examine usage of subject/object pronouns by grouping variable.
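The behavior described for `comma_spacer` (ensuring a space follows each comma) can be approximated with a single Perl-style lookahead. The sketch below uses a hypothetical helper (`space_commas`), not qdap's code. It assumes that a comma followed by any non-space character should get a space.

```r
# Add a space after any comma that is not already followed by whitespace.
space_commas <- function(x) gsub(",(?=\\S)", ", ", x, perl = TRUE)

space_commas("red,green, blue")  # "red, green, blue"
```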
MINOR FEATURES
* `dplyr`'s chaining pipe imported for convenience. See
http://www.rdocumentation.org/packages/magrittr/functions/magrittr for details.
IMPROVEMENTS
* `wfm` gains a speed-up through generic classes and `tm` package integration
(`strip` is no longer used in `wfm`).
* `as.tdm.character` and `as.dtm.character` gain a speed boost with a `tm`
package integration.
* Added message to `as.data.frame.Corpus` for missing end-marks suggesting the
use of: `sent.split = FALSE`.
* `as.Corpus` family of functions didn't necessarily respect document names and
sometimes used numeric sequence instead. The introduction of a reader via
`tm::readTabular` has fixed this.
* `sentSplit` now gives warnings for text that may contain anomalies such as:
non-ASCII characters, factors, missing punctuation, empty cells, and no
alphabetic characters found.
* `read.transcript` now gives a warning when reading from a .docx file and the
separator (`sep`) used is still found in the text as this may indicate the
data did not split correctly.
* `dispersion_plot` now takes a named list of vectors of terms as the argument to
`match.terms`. The vectors are combined as a unified theme named with the
names of the list supplied to `match.terms`.
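The named-list form of `match.terms` groups several terms under one theme name. The base-R sketch below shows only the term-to-theme mapping that idea implies, using made-up themes. It does not call `dispersion_plot` or reproduce its plotting.

```r
themes <- list(pets = c("cat", "dog"), colors = c("red", "blue"))
words  <- c("the", "red", "dog", "ran")

# Map every term to the name of the list element (theme) it belongs to.
term2theme <- setNames(rep(names(themes), lengths(themes)), unlist(themes))
unname(term2theme[words])  # NA "colors" "pets" NA
```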
CHANGES
* `as.data.frame.Corpus`'s default value for `sent.split` is now `FALSE`.
* The `state` column in the `qdap::DATA2` data-set is now character (previously
factor).
CHANGES IN qdap VERSION 2.1.0
----------------------------------------------------------------
BUG FIXES
* `new_project` did not copy the .Rprofile over into the new project. This has
been fixed. Reference issue #184.
* `sentiment_frame` coerced words to factor. `stringsAsFactors = FALSE` has
been added to prevent this.
* `polarity` did not work on > 1 grams due to a bug in `sentiment_frame`
converting character to factor (thanks for the find @chewth). See GitHub
issue #185 for details.
NEW FEATURES
* `unique_by` added to allow the user to find terms unique to individual
elements of a grouping variable.
* `build_qdap_vignette` replaces the temporary place holder version of the
*Introduction to qdap vignette*. This function will replace the (1) HTML,
(2) source, & (3) R code found in `browseVignettes(package = 'qdap')`.
MINOR FEATURES
* `sub_holder` picks up an `alpha.type` argument that allows the user to specify
whether alpha or numeric keys should be used.
* `replace_number` picks up a `remove` argument that removes numbers from text.
IMPROVEMENTS
* `qheat` becomes a generic method. This means some of the internal function
class checking has been moved to individual methods for those classes.
Additionally, `qheat` now works with logical matrices/data.frames.
* The `tm` package compatibility functions have been renamed in a more R-ish
way and take the form of generic methods for specific classes. For example,
`df2tm_corpus` becomes `as.Corpus`. Here is a complete list of changes:
- `df2tm_corpus` is now `as.Corpus`
- `tm_corpus2df` is now `as.data.frame`
- `as.wfm` is now a generic method
- `tm_corpus2wfm` is now `as.wfm`
- `tm2qdap` is now `as.wfm`
- `tdm` is now `as.tdm` or `as.TermDocumentMatrix`
- `dtm` is now `as.dtm` or `as.DocumentTermMatrix`
CHANGES
* `colsplit2df` and `colpaste2df` no longer convert character columns to factor.
* `df2tm_corpus` is deprecated. It will be removed in a subsequent version of
`qdap`. Use `as.Corpus` instead.
* `tm_corpus2df` is deprecated. It will be removed in a subsequent version of
`qdap`. Use `as.data.frame` instead.
* `tm2qdap` is deprecated. It will be removed in a subsequent version of
`qdap`. Use `as.wfm` instead.
* `tm_corpus2wfm` is deprecated. It will be removed in a subsequent version of
`qdap`. Use `as.wfm` instead.
* `tdm` is deprecated. It will be removed in a subsequent version of `qdap`.
Use `as.tdm` or `as.TermDocumentMatrix` instead.
* `dtm` is deprecated. It will be removed in a subsequent version of `qdap`.
Use `as.dtm` or `as.DocumentTermMatrix` instead.
* The *Introduction to qdap* .Rmd vignette has been moved to an internal
directory. The HTML version is not built by default. This saves CRAN space
and time checking the package source. The file has been replaced with a
temporary place holder that contains instructions for building the actual
vignette. The user may also use `build_qdap_vignette` directly.
* `qdap` incorporates the changes from the `tm` package version: 0.6:
http://cran.r-project.org/web/packages/tm/news.html Reference issue #187.
CHANGES IN qdap VERSION 2.0.0
----------------------------------------------------------------
The `qdapTools` package now houses several former qdap functions. While
`qdapTools` is a Dependency and all of these functions will be accessible to
the qdap user, there is a break in backward compatibility if these functions
are included in code. For this reason, this release is a major bump of qdap.
BUG FIXES
* `replace_number` did not replace single-digit numbers. Spotted by Ben Bolker.
This behavior has been fixed and unit testing added for this function. See
issue #178.
NEW FEATURES
* `sub_holder` added; this function holds the place for particular character
values, allowing the user to manipulate the vector and then revert the place
holders back to the original values.
* `Network` method added to make network plots of select qdap objects.
* `qtheme`, `theme_nightheat`, `theme_duskheat`, `theme_norah`, `theme_cafe`,
`theme_grayscale`, `theme_badkitchen`, and `theme_hipster` added to style
`Network` plots.
* `polarity` picks up a `Network` method.
* `formality` picks up a `Network` method.
* qdap officially begins utilizing the `testthat` package for unit testing,
though only a few functions are covered so far; more will be added over
time.
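The place-holder idea behind `sub_holder` works like this: swap protected values for keys, manipulate the text, then swap the keys back. The sketch below is a hypothetical base-R helper (`hold`). Its key format and return structure are assumptions for illustration, not qdap's `sub_holder` interface.

```r
# Hypothetical sketch of the place-holder idea, not qdap's sub_holder().
hold <- function(x, values) {
  keys <- sprintf("qdapkey%03d", seq_along(values))   # hypothetical key format
  for (i in seq_along(values)) x <- gsub(values[i], keys[i], x, fixed = TRUE)
  list(
    text   = x,
    unhold = function(y) {                            # revert keys to originals
      for (i in seq_along(keys)) y <- gsub(keys[i], values[i], y, fixed = TRUE)
      y
    }
  )
}

h <- hold("I said :-) twice :-)", ":-)")
h$text                                      # "I said qdapkey001 twice qdapkey001"
cleaned <- gsub("[[:punct:]]", "", h$text)  # punctuation removal spares the keys
h$unhold(cleaned)                           # "I said :-) twice :-)"
```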
MINOR FEATURES
IMPROVEMENTS
CHANGES
* The `qdapTools` package now houses the following former `qdap` functions:
`hash`, `%ha%`, `hash_look`, `hms2sec`, `id`, `lookup`, `%l%`, `%l+%`, `%l*%`,
`repo2github`, `sec2hms`, `text2color`, `url_dl`, `v_outer`, `list2df`,
`matrix2df`, `vect2df`, `list_df2df`, `list_vect2df`, `counts2list`,
`vect2list`, & `mtabulate`. These functions will continue to be available to
qdap users in interactive mode (`qdapTools` is a Dependency and thus these
functions are loaded into the workspace by default). This will allow this
bundle of functions to be used outside of qdap without calling the larger qdap
package per the request of Kirill Muller (see issue #165).
* As scheduled the `dissimilarity` function has been removed from the qdap
package to avoid conflict with the `tm` package. Use `Dissimilarity` function
instead.
CHANGES IN qdap VERSION 1.3.6
----------------------------------------------------------------
MINOR FEATURES
* `polarity` picks up a `constrain` argument that constrains the polarity values
to be between -1 and 1.
IMPROVEMENTS
* `polarity`'s equation now uses primes on the de-amplifiers before they're
confined to be >= -1. This avoids confusion in the indicator function that
took the de-amplifiers variable and returned the same variable.
* `dist_tab`'s frequency columns used a capital F in Freq. This was not
consistent across all column names and has been changed to lower case.
CHANGES
* `polarity_frame` is deprecated and will be removed in a subsequent release.
Please use `sentiment_frame` instead.
CHANGES IN qdap VERSION 1.3.5
----------------------------------------------------------------
BUG FIXES
* The *An Introduction to qdap* vignette contained a broken link in the tm
Package Compatibility section. This has been fixed. Also, the vignette's
reliance on `Rgraphviz` has been removed. This will eliminate the CRAN
WARN in CRAN checks (for some OS) but not the note for `tm`'s reliance on
`Rgraphviz`.
* `polarity` reported the incorrect number of words for sentences containing
commas. This has been fixed (Max Ghenis).
NEW FEATURES
* `formality` picks up an `Animate` method.
* `end_mark_by` function added as an aggregated grouping version of `end_mark`.
MINOR FEATURES
* `raj.act.1POS` added. `raj.act.1POS` is a data set for Romeo and Juliet: Act 1
broken into parts of speech.
IMPROVEMENTS
* `discourse_map` picks up a `pause` argument that enables the user to pause
between plots in interactive mode.
CHANGES
CHANGES IN qdap VERSION 1.3.4
----------------------------------------------------------------
BUG FIXES
NEW FEATURES
* `gantt` and `gantt_wrap` (single facet) pick up an `Animate` method.
* `polarity` picks up an `Animate` method.
* `vertex_apply` and `edge_apply` added to make uniform changes to lists of
`igraph` objects.
MINOR FEATURES
IMPROVEMENTS
* `discourse_map` picks up a `condense` argument that allows the user to
condense sequential rows for like grouping variable sub groups.
* `list_df2df` names now use a zero padded numeric portion for default names.
For example `c("L1", "L2", "L3", ... "L10")`, becomes
`c("L01", "L02", "L03", ... "L10")`.
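Zero padding matters because character sorting is lexical: unpadded names put "L10" before "L2". The sketch below is a hypothetical helper (`pad_names`) showing one way to generate such names in base R. It is not necessarily how `list_df2df` builds them.

```r
pad_names <- function(n, prefix = "L") {
  width <- nchar(format(n, scientific = FALSE))       # digits in the largest index
  paste0(prefix, formatC(seq_len(n), width = width, flag = "0"))
}

pad_names(10)                    # "L01" "L02" ... "L10"
sort(c("L1", "L2", "L10"))       # unpadded names sort as "L1" "L10" "L2"
```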
CHANGES
CHANGES IN qdap VERSION 1.3.3
----------------------------------------------------------------
BUG FIXES
* `colpaste2df` dropped the column name for a single retained column when
`keep.orig = FALSE`. See GitHub issue #157 for more.
* `multigsub` (`mgsub`) would return `NA` for replacement of length 1 after the
addition of the `order.pattern` (used to prevent substrings from
replacing meta-strings) in version 1.3.2.
NEW FEATURES
* `phrase_net` function provides functionality similar to the Many Eyes
Phrase Net plot.
* `discourse_map` function provides a network mapping of the flow of discourse
between social actors. Function output is `Animate` ready as well. See
`?discourse_map` and http://trinker.github.io/qdap_examples/animation_dialogue
for more.
* `Animate` function added to convert select qdap outputs to an animated
sequence. See `?Animate.discourse_map` for more.
MINOR FEATURES
* `synonyms_frame` (`syn_frame`) added to allow the user to create a synonym
hash for the revamped `synonyms` function.
* `repo2github` function added to send a directory to GitHub upon first commit.
IMPROVEMENTS
* `new_project` has an improved directory structure and works with any version
of the `reports` package.
* `synonyms` function used the `env.syl` hash data from qdapDictionaries
internally. This approach could cause problems if used within other functions
in a package. It also limits the usability of synonyms. The `synonyms`
function picks up a `synonym.frame` argument that allows the user to specify
a synonym hash table. This can be created via the `synonyms_frame` function
(per a request from J. Aravind).
CHANGES
CHANGES IN qdap VERSION 1.3.2
----------------------------------------------------------------
This is a patch release to address the archiving of the `lsa` package.
BUG FIXES
* The **qdap-tm Package Compatibility** vignette contained an error in the
Feinerer I, Hornik K, Meyer D (2008) reference: the pages were listed as 51-54
and have been corrected to 1-54, and the incorrect journal has also been
corrected. Caught by Kurt Hornik.
MINOR FEATURES
* `DocumentTermMatrix` and `TermDocumentMatrix` from the tm package pick up a
`Filter` method.
IMPROVEMENTS
* `multigsub` picks up an argument, `order.pattern`, to prevent substrings from
replacing meta-strings.
* The following data sets were added to qdapDictionaries package:
`Fry_1000`, `Leveled_Dolch`, `Dolch`
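Ordering patterns from longest to shortest stops a short pattern from breaking up a longer one that contains it. Without ordering, "in" would turn "stay inside" into "stay INside" and "inside" would never match. The sketch below is a hypothetical helper (`multi_sub`), not qdap's `multigsub`. It also recycles a length-1 replacement, the case behind the 1.3.3 `NA` bug. Later patterns can still match inside earlier replacements, which this sketch does not guard against.

```r
multi_sub <- function(pattern, replacement, x) {
  replacement <- rep_len(replacement, length(pattern))  # recycle a length-1 replacement
  ord <- order(nchar(pattern), decreasing = TRUE)       # longest pattern first
  for (i in ord) x <- gsub(pattern[i], replacement[i], x, fixed = TRUE)
  x
}

multi_sub(c("in", "inside"), c("IN", "INSIDE"), "stay inside")  # "stay INSIDE"
multi_sub(c("cat", "dog"), "pet", "cat and dog")                # "pet and pet"
```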
CHANGES
* The package `lsa` has been removed from the Suggests field in the DESCRIPTION
file, examples, and vignettes.
CHANGES IN qdap VERSION 1.3.1
----------------------------------------------------------------
A version bump necessary for Re-Submission to CRAN.
CHANGES
* `new_project` was reconfigured with the old code that does not require the
newest version of the reports package.
CHANGES IN qdap VERSION 1.3.0
----------------------------------------------------------------
BUG FIXES
* `read.transcript` could leave a QDAP_PLACE_HOLDER behind if a colon was found
in the person column. This behavior has been fixed.
* `word_cor`'s plotting method threw an error if a word did not have any words
above the r threshold. This behavior has been corrected.
* `Filter` overwrote a base R function; this has been fixed per Joshua Ulrich.
* `scores.polarity`'s print method would return an error if columns were not
indexed yet were rounded. For instance, the following threw an error:
`scores(with(sentSplit(DATA, 4), polarity(state, person)))[, 1:4]`
This behavior has been fixed.
NEW FEATURES
* qdap adds an HTML vignette to better explain the intended work flow and
function use for the package. Use `browseVignettes(package = "qdap")` to
open.
* qdap adds a PDF vignette to describe the compatibility and navigation between
qdap and the `tm` packages. Use `browseVignettes(package = "qdap")` to open.
MINOR FEATURES
IMPROVEMENTS
* `apply_as_df` picks up `stopwords` and `filter` arguments that allow the
user to remove stopwords and min/max length words.
* `plot.word_cor` picks up the argument `ncol` that allows the user to specify
the number of columns used. This uses `ggplot2`'s `facet_wrap` rather than
`facet_grid` (which is the default if `ncol = NULL`).
* `name2sex` relied upon having qdapDictionaries loaded. This could be an issue
if the function were used internally. The user now supplies a dictionary of
names and probabilities.
* `df2tm_corpus` gains a `demographics.vars` argument that allows the user to
add demographic information to the resulting corpus `DMetaData`.
* `tm_corpus2df` gains the ability to convert `DMetaData` into demographic
data.frame columns.
CHANGES
CHANGES IN qdap VERSION 1.2.0
----------------------------------------------------------------
BUG FIXES
NEW FEATURES
* `Filter` added to give the ability to provide a range of character
lengths to filter from a `wfm` object.
* `scores` generic method added to view scores from select qdap objects.
* `counts` generic method added to view counts from select qdap objects.
* `proportions` generic method added to view proportions from select qdap
objects.
* `preprocessed` generic method added to view preprocessed data from select qdap
objects.
* `apply_as_df` added to allow the user to apply qdap functions to a Corpus
directly.
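Filtering a word frequency matrix by a range of character lengths amounts to keeping the rows whose word length falls in the range. The base-R sketch below uses a toy matrix, and its `min.char`/`max.char` names are hypothetical. It is not the `Filter` method's actual signature or code.

```r
# A toy word frequency matrix: rows are words, columns are groups.
wfm <- matrix(c(3, 1, 2, 0, 5, 1), ncol = 2,
              dimnames = list(c("a", "cat", "elephant"), c("g1", "g2")))

min.char <- 2   # hypothetical lower bound on word length
max.char <- 5   # hypothetical upper bound on word length
n <- nchar(rownames(wfm))
filtered <- wfm[n >= min.char & n <= max.char, , drop = FALSE]
filtered        # keeps only the "cat" row
```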
MINOR FEATURES
* `tm_corpus2wfm` added to quickly convert from a **tm** package `Corpus` to a qdap
`wfm` object.
* `as.wfm` added as a means to attempt to coerce a matrix to a `wfm` object.
* `%l+%` added as a counterpart to `%l%` that assumes `missing = NULL`.
* `%bs%` added as quick counterpart to `boolean_search` for indexing.
IMPROVEMENTS
* `df2tm_corpus` now sets metaData information for ID and creator (based on
`Sys.info()["user"]`).
* `matrix2df` now accepts a simple_triplet_matrix object as well.
* `word_cor` output that was a list (not a correlation matrix) did not have a
plot method. The plot method for `word_cor` now handles both matrices and the
list of correlations.
* `rm_row` picks up the `contains` argument that allows the user to search for,
and remove, rows containing the search string anywhere, not just at the beginning.
* `read.transcript` now handles multiple character spaces as an argument to
`sep` when `text` argument is used.
CHANGES
* `dissimilarity` has been renamed to `Dissimilarity` to prevent tm package
conflicts. The old version has been deprecated and will be removed in the
next version (minor or major) push to CRAN.
CHANGES IN qdap VERSION 1.1.0
----------------------------------------------------------------
A version bump necessary for Re-Submission to CRAN.
CHANGES
* Downgraded the version requirement for the reports package to
reports (>= 0.1.2) in order to upload to CRAN. reports (>= 0.2.0) is not yet
available on CRAN.
CHANGES IN qdap VERSION 1.0.0
----------------------------------------------------------------
The word lists and dictionaries in `qdap` have been moved to `qdapDictionaries`.
Additionally, many functions have been renamed with underscores instead of the
former period separators. These changes break backward compatibility. Thus
this is a **major** release (ver. 1.0.0).
It is general practice to deprecate functions within a package before
removal; however, given the number of necessary changes and qdap being
relatively new to CRAN, making these changes at this point was sensible.
BUG FIXES
* `qheat`'s argument `by.column = FALSE` resulted in an error. This behavior
has been fixed.
* `question_type` did not work because of changes to `lookup` that did not
accept a two column matrix for `key.match`. See GitHub issue #127 for more.
* `combo_syllable.sum` threw an error if the `text.var` contained a cell with an
all non-character ([a-z]) string. This behavior has been fixed.
* The `todo` function created by `new_project` did not report completed tasks
when `report.completed = TRUE`.
* `termco` and `termco.d` threw an error if more than one consecutive regex
special character was passed to `match.list` or `match.string`. See GitHub
issue #128 for more.
* `trans.cloud` threw an error if a single list with a named vector was passed
to `target.words`. This behavior has been fixed.
* `sentSplit` now returns the "tot" column when `text.place = "original"`.
* The FREQ column of the `all_words` output dataframe has been changed from
factor to numeric. Additionally, the WORDS column prints using `left.just` but
retains traditional character properties (a print class has been added).
`all_words` also picks up the `apostrophe.remove` and `ldots` (passed to
`strip`) arguments.
* `gantt_plot` did not handle `fill.vars`, particularly if the fill was nested
within the `grouping.vars`. This behavior has been fixed with corresponding
examples added.
* `url_dl` downloaded an empty file when not using a Dropbox key. This
behavior has been fixed.
* The `cm_code.` family of functions had a bug in the output due to
`cm_long2dummy` and `cm_dummy2long`'s handling of stretching spans. This has
been corrected.
* `cm_code.exclude` did not output the correct excluded spans. This behavior
has been corrected.
* The use of `comment` to convey object characteristics has been replaced with
the use of `class`.
* `question_type` did not include question words ending in 'd in their
category. For instance, "How'd you like it?" was not classified as a how
question.
* `beg2char` would not include the `char` if `include = TRUE` and `noc = 1`.
* `cm_range2long` returned `NA`s for vectors containing multiple single values.
See GitHub issue #144 for more.
* The `termco` family of functions did not handle `NA` values. This has been
fixed (Matt Williamson). See GitHub issue #147 for details.
* `pos` threw an error for vectors of length 1. This has been fixed (Kurt
Hornik). See GitHub issue #150 for details.
* `formality` threw an error for vectors of length 1. This has been fixed (Kurt
Hornik). See GitHub issue #151 for details.
NEW FEATURES
* The `cm_xxx2long` family of functions (`cm_df2long`, `cm_range2long` and
`cm_time2long`) now has a generic wrapper, `cm_2long`, to generate the long
formats.
* `hash_look` (and `%ha%`), a counterpart to `hash`, added to allow quick
access to a hash table. It is intended for use within functions or for repeated
use of the same hash table, whereas `lookup` is intended for a single external
(non-function) use, which is more convenient though potentially slower.
* `boolean_search`, a Boolean term search function, added to allow for indexed
searches of Boolean terms.
* `trans_context` is a printing function designed to grab the context (n rows
before and after) of an event (an index from a vector of indices). The
function prints the rows around the episode from a transcript to the console
or to a .csv, .xlsx, .txt, or .doc file.
* `colpaste2df` is a wrapper for `paste2` that pastes dataframe columns together
and outputs a dataframe.
* `colcomb2class` quickly combines columns for a number of qdap classes,
including output from `termco`, `question_type`, `pos_by`, and
`character_table`.
* `lview` added to unclass list output that has a special print method
displaying only a portion of the output. `lview` re-classes the object as
"list".
* `word_cor` added to find words within grouping variables that are associated
based on correlation.
* `tm2qdap` a function to convert `"TermDocumentMatrix"` and
`"DocumentTermMatrix"` to a `wfm` added to allow easier integration with the
`tm` package.
* `apply_as_tm` added to allow functions intended for the `tm` package's
`TermDocumentMatrix` to be applied to a `wfm` object.
* `tm_corpus2df` and `df2tm_corpus` added to convert a tm package corpus to a
dataframe for use in qdap or vice versa.
* `tdm` and `dtm` are now truly compatible with the `tm` package. `tdm` and
`dtm` produce outputs of the class `"TermDocumentMatrix"` and
`"DocumentTermMatrix"` respectively. This change (coupled with the renaming
of `stopwords` to `rm_stopwords`) should make the two packages logical
companions and further extend the qdap package to integrate with the many
packages that already handle `"TermDocumentMatrix"` and
`"DocumentTermMatrix"`.
* `cm_distance` now resamples data from the null model to generate p-values
for the mean code distances. This is useful for determining whether an
association (small distance) between codes is likely to occur if the null
hypothesis is true.
* `dispersion_plot` added to enable viewing of word dispersion through
discourse.
* `word_proximity` added to complement the `dispersion_plot` and `word_cor`
functions. `word_proximity` gives the average distance between words in
units of sentences.
MINOR FEATURES
* `url_dl` now takes quoted string URLs supplied to `...` (there is no `url`
argument).
* `condense` is a function that condenses dataframe columns that are lists of
vectors into a single vector of strings. It outputs a dataframe with condensed
columns that can be written to csv/xlsx.
* `mcsv_w` now uses `condense` to attempt to condense columns that are lists
of vectors into a single vector of strings. This lets `mcsv_w` work with more
data sets. `mcsv_w` now writes lists of dataframes to multiple csvs (e.g., the
output from `termco` or `polarity`). `mcsv_w` also picks up a `dataframes`
argument, an optional character vector supplied in lieu of `...` that grabs
the dataframes from an environment (the default is the Global environment).
* `ngrams` now has an ellipsis (`...`) argument that passes further arguments
to `strip`.
* `dtm` added to complement `tdm`, allowing for easier integration with other R
packages that utilize `tdm`/`dtm`.
* `dir_map` picks up a `use.path` argument that allows the user to specify a
more flexible path to the created pre-formed `read.transcript` scripts, based
on something like `file.path(getwd(), )`. This makes the code portable across
different machines.
* `polarity_frame` a function to make a hash environment lookup for use with the
`polarity` function.
* `DATA.SPLIT` a `sentSplit` version of the `DATA` data-set has been added to
qdap.
* `gantt_plot` accepts NULL for `grouping.var`, in which case it treats all
rows as a single grouping variable.
* `replace_number` now handles numbers up to 10^47, compared to 10^14
previously.
* The `new_project` function gains a `github` argument that optionally sends
the repo to the user's public GitHub account upon creation.
* `qheat`, `polarity.plot` and `formality.plot` pick up the argument `plot`,
which optionally suppresses plotting. This is useful if the user is operating
in **knitr**, **Sweave**, etc. and wishes to alter or add to the plot.
* `lookup` now takes `missing = NULL`. This results in the original values in
`terms` corresponding to the missing elements being retained.
* `cm_time.temp` picks up a `grouping.var` argument that works similarly to
`cm_range.temp`'s `grouping.var`. `cm_time.temp` also takes hour values for
`start` and `end` as in `end = "01:22:03"`.
* `gantt_rep` picks up a generic `plot` method.
* Functions in the `cm_code.xxx` and `cm_xxx2long` families pick up a generic
plot method that utilizes `gantt_wrap` to produce a Gantt plot of the span
data.
* Functions in the `cm_code.xxx` and `cm_xxx2long` families pick up a generic
summary method. This summary method has its own plot method that utilizes
`qheat` to plot a heatmap of the summary statistics. The generic print method
(`print.sum_cmspans`) is useful for output intended for publication.
* `qheat` picks up a `facet.vars` argument that allows a character vector of
length 1 or 2 to facet by.
* `question_type` gives the indices of questions via `$inds`.
* `colsplit2df` now splits multiple columns to match the capabilities of
`colpaste2df`.
* `sentSplit` now handles repeated measures and picks up a turn of talk plot
method.
* `tot_plot` now handles repeated measures and `grouping.var` to be nested