-
Notifications
You must be signed in to change notification settings - Fork 1
/
index.Rmd
365 lines (349 loc) · 20 KB
/
index.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
---
author: "Chris Hubertus Joseph Hartgerink"
bibliography: [library.bib]
description: "PhD dissertation by CHJ Hartgerink, written during 2014-2019, mostly at Tilburg University."
documentclass: book
classoption: a5paper
github-repo: chartgerink/dissertation
link-citations: true
biblio-style: "apalike"
title: "Contributions towards understanding and building sustainable science"
always_allow_html: yes
output:
bookdown::gitbook:
config:
toc:
collapse: subsection
scroll_highlight: yes
before: null
after: null
toolbar:
position: fixed
edit : null
download: null
search: yes
fontsettings:
theme: white
family: sans
size: 2
sharing:
facebook: no
twitter: yes
github: yes
google: no
linkedin: no
weibo: no
instapaper: no
vk: no
all: ['facebook', 'google', 'twitter', 'github', 'linkedin', 'weibo', 'instapaper']
bookdown::pdf_book:
keep_tex: yes
includes:
in_header: preamble.tex
---
# Prologue {-}
```{r echo = FALSE}
suppressPackageStartupMessages(require(magrittr))
```
The history and practice of science is convoluted
[@isbn:9781846142109], but when I was a student it was taught to me in a
relatively uncomplicated manner. Among the things I remember from my
high school science classes are how to convert a metric distance into
Astronomical Units (AUs) and that there was something called the
research cycle (I always forgot the separate steps and their order,
which will ironically be a crucial subject of this
dissertation). Those classes presented things such as the AU and the
empirical cycle as unambiguous truths. In hindsight, it is difficult
to imagine these constructed ideas as historically unambiguous. For
example, I was taught the AU as simple arithmetic while that
calculation implies accepting a historically complex process full of
debate on how an AU should be defined
[@doi:10.1017/s1743921305001365]. As such, that calculation was
path-dependent, similar to how the history and practice of science in
general is also path-dependent [@isbn:0692094187;@gelman-forking].
Scientific textbooks understandably present a distillation of the
scientific process. Not everyone needs the (full) history of
discussions after broad consensus has already been reached. This is a
useful heuristic for progress but also minimizes (maybe even
belittles) the importance of the process [@isbn:0692094187]. As such,
textbook science [vademecum science; @isbn:9780226253251], with which
science teaching starts, provides high certainty, little detail, and
provides the breeding ground for a view of science as producing
certain knowledge. Through this kind of teaching, storybook images of
scientists and science might arise, often as the actors and process of
discovering absolute truths rather than of uncertain and iterative
production of coherent and consistent knowledge. Such storybook images
likely result in the impression that scientists versus non-scientists
are more objective, rational, skeptical, rigorous, and ethical, even
after taking into account educational level
[@doi:10.1080/08989621.2016.1268922].
Scientific research articles tend to provide more details and less
certainty than scientific textbooks, but still present storified
findings that simplify a complicated process into a single, linear
narrative [particularly salient in tutorials on writing journal
publications; @doi:10.1017/cbo9780511807862.002]. Compared to
scientific textbooks, which present a narrative across many studies,
scientific articles provide a narrative across relatively few
studies. Hence, scientific articles should be relatively better than
scientific textbooks for understanding the validity of findings
because they get more space to nuance, provide more details, and
contextualize research findings. Nonetheless, the linear narrative of
the scientific article distills and distorts a complicated non-linear
research process and thereby provides little space to encapsulate the
full nuance, detail, and context of findings. Moreover, storification
of research results requires flexibility, where its manifestation in
the flexibility of analyses may be one of the main culprits of false
positive findings [i.e., incorrectly claiming an effect;
@doi:10.1371/journal.pmed.0020124] and detracts from accurate
reporting. The lack of detail and (excessive) storification go hand in
hand with the misrepresentation of event chronology to present a more
comprehensible narrative to the reader and researcher. For example,
breaks from a main narrative (i.e., nonconfirming results) may be
excluded from the reporting. Such misrepresentation becomes
particularly problematic if the validity of the presented findings
rests on the actual and complete order of events --- as it does in the
prevalent epistemological model based on the empirical research
cycle [@isbn:9789023228912]. Moreover, the storification within
scholarly articles can create highly discordant stories across
scholarly articles, leading to conflicting narratives and confusion in
research fields or news reports and, ultimately, less coherent
understanding of science by both general- and specialized audiences.
When I started as a psychology student in 2009, I implicitly perceived
science and scientists in the storybook way. I was the first in my
immediate family to go to university, so I had no previous informal
education about what 'true' scientists or 'true' science looked like
--- I was only influenced by the depictions in the media and popular
culture. In other words, I thought scientists were objective,
disinterested, skeptical, rigorous, ethical (and predominantly
male). The textbook- and article based education I received at the
university did not disconfirm or recalibrate this storybook image and,
in hindsight, might have served to reinforce it (e.g., textbooks
provided a decontextualized history that presented the path of
discovery as linear, 'the truth' as unequivocal, multiple choice exams
which could only receive correct or wrong answers, and certified
stories in the form of peer reviewed publications). Granted, the
empirical scientist was warranted the storybook qualities exactly
*because* the empirical research cycle provided a way to overcome
human biases and provided grounds for the widespread belief that
search for 'the truth' was more important than individual gain.
As I progressed throughout my science education, it became clear to me
how naive the storybook image of science and the scientist was through
a series of events that undercut the very epistemological model that
granted these qualities. As a result of these events, I had what I
somewhat dramatically called two 'personal crises of epistemological
faith in science' (or put plainly: wake up calls). These crises
strongly correlated with several major events within the psychology
research community and raised doubts about the value of the research I
was studying and conducting. Both these crises made me consider
leaving scientific research and I am sure I was not alone in
experiencing this sentiment.
My first, local crisis of epistemological faith was when the
psychology professor who got me interested in research publicly
confessed to having fabricated data throughout his academic career
[@isbn:9789044623123]. Having been inspired to go down the path of
scholarly research by this very professor and having worked as a
research assistant for him, I doubted myself and my abilities and
asked whether I was critical enough to conduct and notice valid
research. After all, I had not had even an inch of suspicion while
working with him. Moreover, I wondered what to make of my interest in
research, given that the person who got me inspired appeared to be
such a bad example to model myself after. This event also unveiled to
me the politics of science and how validity, rigor, and 'truth'
finding was not a given [see for example
@isbn:9780671447694]. Regardless, the self-reported prevalence of
fraudulent behaviors among scientists [viz. 2%;
@doi:10.1371/journal.pone.0005738] was sufficiently low to not
undermine the epistemological effort of the scientific collective
(although it could still severely distort it). Ultimately, I
considered it unlikely that the majority of researchers would be
fraudsters like this professor and simply realized that research could
fail at various stages (e.g., data sharing, peer review). As a result,
I became more skeptical of the certified stories in peer reviewed
journals and in my own and other's research. I ultimately shifted my
focus towards studying statistics to improve research.
A second, more encompassing epistemological crisis arose when I took a
class that indicated that scientists undermine the empirical research
cycle at a large scale. These behaviors were sometimes intentional,
sometimes unintentional, but often the result of misconceptions and
ill procedures in order to play the game of getting published
[@doi:10.1177/1745691612459060]. More specifically, this
epistemological crisis originated from learning about how loose
application of statistical procedures could produce statistically
significant results from pretty much anything [e.g.,
@doi:10.1177/0956797611417632]. Additionally, these behaviors result
in biased publication of results [@doi:10.1007/bf01173636] through the
invisible (and often unaccountable) hand of peer review
[@cogprints1646] that in itself suffers from various
misconceptions. This combination potentially leads to a vicious cycle
of overestimated (and sometimes false positive) effects, leading to
underpowered research that is selectively published, leading to
overestimated effects and underpowered research, and so on until that
cycle gets disrupted. These issues are not necessarily new and have
been discussed for over 40 years in some way or form
[@doi:10.1037/0033-2909.105.2.309;@doi:10.1037/h0045186;@doi:10.1037/0033-2909.86.3.638;@doi:10.2466/03.11.pms.112.2.331-348;@doi:10.1207/s15327957pspr0203_4;@doi:10.1056/nejm199310143291613]. Given
this longstanding vicious cycle, it seemed unlikely the issues in
empirical research would resolve themselves --- they seemed more
likely to be further exacerbated if left unattended. Progress on these
issues would not be trivial or self-evident, given that previous
awareness subsided and attempts to improve the situation did not stick
in the long run. It also indicated to me that the reforms needed had
to be substantial, because the improvements made over the last decades
remained insufficient [although the historical context is highly
relevant, see @doi:10.1177/1745691615609918]. Because of the failed
attempts in the past and the awareness of these issues throughout the
last six years or so, my epistemological worries are ongoing and
oscillate between pessimism and optimism for improvement.
Nonetheless, these two epistemological crises caused me to become
increasingly engaged with various initiatives and research domains to
actively contribute towards improving science. This was not only my
personal way of coping with these crises and more specific incidents,
it also felt like an exciting space to contribute to. In late 2012, I
was introduced to the concept of Open Science for my first big
research project. It seemed evident to me that Open Science was a
great way to improve the verifiability of research [see also
@doi:10.15200/winn.144232.26366]. The Open Science Framework had
launched only recently [@doi:10.31237/osf.io/t23za], which is where I
started to document my work openly. I found it scary, difficult, and
did not know where to start simply because I had never been taught to
do science this way nor did anyone really know how. It led me to
experiment with these new tools and processes to find out the
practicalities of actually making my own work open, and I have
continued to do so ever since. It made me work in a more reproducible,
open manner, and also led me to become engaged in what are often
called the Open Access and Open Science movements. Both these
movements aim to make knowledge available to all in various ways,
going beyond dumping excessive amounts of information but also making
it comprehensible by providing clear documentation to for example
data. Not only are the communities behind these movements supportive
in educating each other in open practices, they also activated me to
help others see the value of Open Science and how to implement it [my
first steps taken in @doi:10.6084/m9.figshare.928315.v2]. Through
this, activism within the realm of science became part of my daily
scientific practice.
Actively improving science through doing research became the main
motivation for me to pursue a PhD project. Initially, we set out to
focus purely on statistical detection of data fabrication (linking
back to my first epistemological crisis). The proposed methods to
detect data fabrication had not been tested widely nor validated and
there was a clear opportunity for a valuable contribution. Rather
quickly, our attention widened towards a broader set of issues,
resulting in a broad perspective on issues in science by looking at
not only data fabrication, but also at questionable research
practices, statistical results and the reporting thereof, complemented
by thinking about incentivizing rigorous practices. This dissertation
presents the results of this work in two parts.
Part 1 of this dissertation (chapters 1-6) pertains to research on
understanding and detecting the tripartite of research practice (the
good [responsible], the bad [fraudulent], and the ugly [questionable]
practices so to speak). Chapter 1 reviews literature on research
misconduct, questionable research practices, and responsible conduct
of research. In addition to providing an introduction to these three
topics in a systematic way by asking 'What is it?', 'What do
researchers do?' and 'How can we improve?', the chapter also proposes
a practical computer folder structure for transparent research
practices in an attempt to promote responsible conduct of research.
In Chapter 2, I report the reanalysis of data indicating widespread
$p$-hacking across various scientific domains
[@doi:10.1371/journal.pbio.1002106;@doi:10.5061/dryad.79d43]. The
original research was highly reproducible itself, but slight and
justifiable changes to the analyses failed to confirm the finding of
widespread $p$-hacking across scientific domains. This chapter offered
an initial indication of how difficult it is to robustly detect
$p$-hacking. In an attempt to improve the detection and estimation of
$p$-hacking, Chapter 3 replicated and extended the findings from
Chapter 2. We replicated the analyses using an independent data set of
statistical results in psychology [@doi:10.3758/s13428-015-0664-2] and
found that $p$-value distributions are distorted through reporting
habits (e.g., rounding to two decimals). Additionally, we set out to
create and apply new statistical models in an attempt to improve
detection of $p$-hacking. Chapter 4 focuses on the opposite of false
positive results, namely false negative results. Here we argue that,
based on the published statistically nonsignificant results in
combination with typically small sample sizes, researchers are letting
a lot of potential true effects slip under their radar if
nonsignificant findings are naively interpreted as true zero
effects. We introduce the adjusted Fisher method for testing the
presence of non-zero true effects among a set of statistically
nonsignificant results, and present three applications of this
method. In Chapter 5 I report on a data set containing over half a
million statistical results extracted with the tool `statcheck` from
the psychology literature. This chapter, in the form of a data paper,
explains the methodology underlying the data collection process, how
the data can be downloaded, that there are no copyright restrictions
on the data, and what the limitations of the data are. This data set
was documented and shared for further research on understanding the
reporting and reported results [original research using these data has
already been conducted; @doi:10.1371/journal.pone.0182651]. Chapter 6
presents results on two studies where we tried to classify genuine and
fabricated data solely using statistical methods. In these two
studies, we relied heavily on openly shared data from two Many Labs
projects
[@doi:10.1027/1864-9335/a000178;@doi:10.1016/j.jesp.2015.10.012] and
had a total of 67 researchers fabricate data in a controlled setting
to determine which statistical methods distinguish between genuine-
and fabricated data the best.
Part 2 of this dissertation (chapters 7-9) pertains to practical ways
to improve the epistemological sustainability of
science. Epistemological sustainability of science pertains to both
the reliability of the knowledge produced as the longevity of the
system that produces it. Chapter 7 specifically focuses on data
retrieval from empirical research articles presenting vector
images. We developed and tested software to this end, which is a
promising way to mitigate the effect of rapidly decreasing odds of
data retrieval as a paper gets older [@doi:10.1016/j.cub.2013.11.014].
In Chapter 8 we present a conceptual redesign of the scholarly
communication system based on modules, focusing on how
networked scholarly communication might facilitate improved research
and researcher evaluation. This conceptual redesign takes into account
the issues of restricted access, researcher degrees of freedom,
publication biases, perverse incentives for researchers, and other
human biases in the conduct of research. The basis of this redesign is
to shift from a reconstructive and text-based research article into a
decomposed set of research modules that are communicated continuously
and contain information in any form (e.g., text, code, data, video).
Chapter 9 extends this new form of scholarly communication in its
technical foundations and contextualizes it in the library- and
information sciences (LIS). From LIS, five key functions of a
scholarly communication system emerge: registration, certification,
preservation, awareness, and incentives
[@roosendaal1998;@doi:10.1045/september2004-vandesompel]. First, I
extend how the article-based scholarly communication system takes a
narrow and unsatisfactory approach to the five functions. Second, I
extend how new Web protocols, when used to implement the redesign
proposed in Chapter 8, could fulfill the five scholarly communication
functions in a wider and more satisfactory sense. In the Epilogue, I
provide a high level framework to inform radical change in the
scientific system, which brings together all the lessons from this
dissertation.
The order of the chapters in this dissertation does not reflect the
exact chronological order of events. Table \@ref(tab:overview)
re-sorts the chapters in the chronological order and provides
additional information for each chapter. More specifically, it
includes a direct link to the collection of materials underlying that
chapter (if relevant), whether the chapter was shared as a preprint,
and the associated peer reviewed article (if any). If published,
the chapters in this dissertation may be slightly different in word
use or formatting, but contain substantively the same content. These
are additional aspects to the chapters that attempt to improve the
reproducibility of the chapters, in order to prevent the issues
causing my epistemological crises.
```{r overview, echo = FALSE}
df <- read.csv('assets/tables/ch1-table.csv')
df[,1] <- df[,1] - 1
names(df)[2] <- 'Data package'
df <- df[c(-8,-11,-12),]
if (!knitr::is_html_output()) {
knitr::kable(df, format = 'latex',
booktabs = TRUE,
caption = 'Chronologically ordered dissertation chapters, supplemented with identifiers data package, preprint, and peer reviewed article.') %>%
kableExtra::landscape() %>%
kableExtra::kable_styling(
latex_options = c("striped"), position = "center")
} else {
knitr::kable(df,
booktabs = TRUE, row.names = FALSE,
caption = 'Chronologically ordered dissertation chapters, supplemented with identifiers data package, preprint, and peer reviewed article.') %>%
kableExtra::kable_styling(
bootstrap_options = c("striped", "hover", "condensed", "responsive", full_width = F))
}
```