-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathsec0_meta.tex
145 lines (115 loc) · 6.82 KB
/
sec0_meta.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%% Abstracts & etc
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\keywords{Process discovery, Machine learning, Event logs}
%% Abstract text
\begin{abstractpage}[english]
% - super quick intro to problem
Diagnostics and real-time data is crucial to running modern software operations.
Different stakeholders need different granularities of information displayed in a quick and informative fashion.
In addition, software operations change rapidly which makes it necessary for the diagnostic systems to be able to adapt quickly.
\vspace{0.8ex}
% - describe methods
In this thesis, I present a method to analyze past event logs to discover the underlying process models;
I describe methods to combine these models with real-time information to create visualizations
that are useful to even non-technical users;
I leverage the models to show the users predictions about future events;
I explore the possibilities of using machine learning to improve the accuracy of these predictions;
and, I examine how the models can be used in conjunction with real-time events to automatically notify relevant people about any faults or anomalies in a system.
\vspace{0.8ex}
% - describe conclusion
I developed a system fulfilling the requirements of this project that was released to production.
I tested the machine learning approach against a simple statistical model and, based on the testing, the machine learning models did not significantly increase the prediction accuracy in the data sets used.
Despite the lack of benefits of the machine learning component, the system was successful and valuable to the end-users
\end{abstractpage}
\newpage
%% Abstract in Finnish.
% Analyzing event logs to discover process models and to detect anomalies in real-time
\thesistitle{Tapahtumalokien analysointi prosessien mallintamiseen ja poikkeavuuksien havaitsemiseen reaaliajassa}
\supervisor{Professori\ Petri Vuorimaa}
\advisor{M.Sc.\ Tao Zhu}
\advisor{M.Sc.\ Rafael Forsbach Valle}
\department{Tietotekniikan laitos}
\degreeprogram{Computer, Communication and Information Sciences}
\professorship{Tietotekniikka}
\date{13.9.2017}
%% Avainsanat
\keywords{Prosessitutkinta, Koneoppiminen, Tapahtumalokit}
%% Tiivistelmän tekstiosa
\begin{abstractpage}[finnish]
Diagnostiikka ja reaaliaikainen informaation on elintärkeää nykyaikaisten ohjelmistoprojektien hallinnassa.
Projektin eri osapuolilla on erilaisia tarpeita tiedon koostamiseen ja sen esittämiseen.
Toisille yksinkertainen tieto on tärkeää, toisia suuret suuntaviivat hyödyttävät enemmän.
Lisäksi ohjelmisto-operaatiot muuttuvat nopeasti, ja diagnostiikan on sopeuduttava muutoksiin nopeasti.
\vspace{0.8ex}
Tässä diplomityössä esittelen tavan analysoida menneitä tapahtumalokitietoja niiden taustalla olevien prosessien mallintamiseksi.
Kuvailen tapoja käyttää näitä malleja reaaliaikaisen informaation visualisointiin tavalla, josta on hyötyä jopa maallikkokäyttäjille.
Käytän malleja hyväksi näyttääkseni käyttäjille ennusteita tulevista tapahtumista.
Kokeilen myös mikäli ennustusten tarkkuutta voisi parantaa hyödyntämällä koneoppimista.
Lopuksi tarkastelen kuinka malleja voi hyödyntää reaaliaikaisten tapahtumien analysoinnissa mahdollistaen automaattisten virheilmoitusten lähettämisen niistä kiinnostuneille ihmisille.
\vspace{0.8ex}
Projektin aikana kehitin järjestelmän joka täytti projektin alussa määritellyt vaatimukset.
Tutkin koneoppimisen tehokkuutta verraten sitä yksinkertaisiin tilastollisiin malleihin.
Testauksessa koneoppiminen ei merkittävästi lisännyt ennustusten tarkkuutta.
Tästäkin huolimatta järjestelmä vastasi sille asetettuja tarpeita ja se julkaistiin loppukäyttäjille, jotka kokivat projektin onnistuneeksi ja arvokkaaksi.
\end{abstractpage}
%% Preface
\mysection{Preface}
This thesis work was carried out at the Microsoft headquarters at the campus in Redmond, USA.
It has been written as the last part of my Master's degree.
I would like to thank the people at Microsoft for the opportunity.
Thank you Kristin and Britt at recruiting for figuring out the unusual circumstances for me.
Thank you Tao and Rafael for mentoring and guiding me during my work.
I would also like to thank the rest of the ``ICH Catalog Health'' team for dealing with my constant questions
and helping me feel at home in my team.
Great thanks to professor Petri Vuorimaa for being my supervisor and for helping me with the practicalities.
I would like to thank my friends for supporting me.
I would like to thank my girlfriend Kara for her unending support and understanding during this stressful time.
It is not an understatement that without you I would not be where I am today.
Thank you Max and David for your endless support and for proofreading.
Thank you Xane, Jorge, Cody, and other friends and birds for keeping me sane.
Lastly, special thanks to all of my family for standing by me and dealing with my travels and schedules.
\vspace{1cm}
\noindent
Espoo, September 13, 2017
\vspace{5mm}
\noindent
Teemu T.\ Vartiainen
\newpage
\thesistableofcontents
\mysection{Symbols and Abbreviations}
\subsection*{Symbols}
\begin{tabularx}{\linewidth}{l X}
$\mathbb{B}(X)$ & a set of multi-sets over $X$ \\
$A^*$ & Kleene star, the set of all strings over symbols in $A$ \\
$\langle a,b,c \rangle$ & ordered trace containing events $a$, $b$, and $c$ \\
${}^\bullet x$ & \emph{pre-set}, the set of all nodes having a directed arc to $x$ \\
$x^\bullet$ & \emph{post-set}, the set of all nodes having directed arc from $x$ \\
$a \rightarrow_L b$ & $b$ directly follows $a$ in event log $L$ \\
$a \leftarrow_L b$ & $b$ directly precedes $a$ in event log $L$ \\
$a \#_L b$ & $a$ and $b$ are unrelated in event log $L$ \\
$a_i ||_L a_j$ & $a$ and $b$ are parallel in event log $L$ \\
$m_{ij} \in M$ & matrix cell value, row $i$ column $j$ of matrix $M$ \\
\end{tabularx}
\subsection*{Abbreviations}
\begin{tabular}{ll}
BDT & boosted decision tree \\
BFGS & Broyden–Fletcher–Goldfarb–Shanno algorithm \\
BI & business intelligence \\
ETA & estimated time of arrival \\
HTTP & hypertext transfer protocol \\
ID & identifier \\
JSON & JavaScript object notation \\
L-BFGS & limited-memory BFGS \\
MAE & mean absolute error \\
MBI & medium business intelligence \\
ML & machine learning \\
MSE & mean square error \\
PII & personally identifiable information \\
RMSE & root (of) mean square error \\
RQ & research question \\
SLA & service level agreement \\
SQL & structured query language \\
TP & top percentile \\
WF & workflow \\
\end{tabular}