-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathREADME
139 lines (107 loc) · 6.58 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
== Building and running ========================================================
Build the program with "make". It assumes that "uuagc" is installed. The "futil"
library in the "lib" directory might need to be build and installed using
"cabal" first.
Running the program requires the "graphviz" package to be installed!
== Source overview =============================================================
/lib
/futil Helper library
/samples Sample PHP programs and output generated
by our analyzer.
/src
Main.hs Main module
/MF
Analysis.hs Generic embelished monotone framework.
Includes the fixed point solver.
DirectlyRelevantVariables.hs A monotone analysis used by the program
slicing to find the "directly relevant
variables" as described in [1].
Program.hs An AST for desugared PHP. Helper functions
for the Program AST.
ProgramSlicing.hs The actual program slicing algorithm
(described below).
/PHP
/Simple
SimpleAST.ag Attribute grammar working on desugared
PHP. Responsible for constructing the
(inter- and intra-procedural) control
flow.
AST2Simple.hs Converts the concrete syntax tree from
the php parser to the abstract syntax
tree defined in SimepleAST.ag
== Limitations and known issues ================================================
As of the moment of writing we were aware of the following limitations and
issues with our anaylizer:
* All variables are assumed to be local and unshadowed.
* Arrays and objects, switch statements are not supported.
* Calls to "undefined" functions (i.e. functions not defined in the PHP module
being analyzed, e.g library functions) do not make its return value depend
on the actual parameters.
Language Features that are supported:
* Ternary ifs
* Increment operators
* Assign and increment operator (+=)
igh Level Overview
== Program Slicing algorithm ===================================================
The actual program slicing analysis will be executed on the simplified and de-sugared
AST build in the preprocessing phase. The program slicing analysis has two step:
1. First, the directly relevant variables are calculated. In backwards program slicing,
a variable is directly relevant if it is used in the right hand side of and
assignment in which the left hand side is another relevant variable. Initially,
the slicing criterium is used to start the analysis. To compute the directly
relevant variables an instance of the monotone framework called DirectlyRelevantVariables
is solved. The result of the solve function has the type:
Map Label (Map CallContext (Set SymbolType))
This indicates that support for procedures is added by allowing inter procedural
directly relevant variables analysis within the monotone framework.
2. In step two the dependent control statements like if and while statements are
taken into account. To do this, first the notion of directly relevant statements
is introduced. A statement is directly relevant if a directly relevant variable,
as calculated in step one, is assigned in this statement. We consider a control
statement relevant if a relevant statement is found within its range of influence.
Variables referenced in a relevant control statement are added to the map of relevant
variables, and a new fix point is calculated. Step two is executed recursively until
the relevant variables remains unchanged.
The relevant variables are returned as the output of the program slicing algorithm.
Post processing
In the post processing phase visual representation of the program slice is
generated. Statements that are part of the slice are coloured green, statements
that are not part of the slice are coloured gray. A statement is part of the
program slice if a relevant variable is defined, or in case of a control statement
if a relevant variable is referenced.
== Interprocedural Flow ========================================================
Interprocedrual flow is represented by the 4-tuple of labels described in the book.
We have decided to implement the interprocedural-flow part within the Monotone
Framework. To accomplish this we extended the basic monotone framweork with a few
extra functions/data structures:
* Calling context is added to the analysis result.
* The monotone framwork lifts the standard intra-procedural functions into the calling contexts.
* Pattern matching is done on the statements which constitute to the interprocedura
flow and different behaviour is implemented for them.
* The monotone framework now requires 4 more functions which provide functionality
for translating formal/actual parameters, merging result values and specifying when
to cut off the calling contexts. Moving in and out of context is done by the framwork itself.
== Creating the Labels/Control flow ============================================
Starting from the AST defined in SimpleAST.hs we try to construct the 6
elementary data structures needed for our analysis:
* Program labels
* Control flow
* Inter- procedural flow
* Entry labels
* Exit labels
* Range of Influence of block-statements (if/while).
The system also desugars the expressions into a form containing only assignments,
unary and binary operators for expressions, thus making the evaluation order
explicit. To accomplish this, dummy variables are introduced for subexpressions.
Function calls, which are seen as expressions in the original sytnax tree,
will be converted to statements using at the call site: FuncCall and FuncBack.
At the definition side: FuncIn and Return.
To construct the control flow, each statement in the original AST has 2
important attributes:
* an inherited attribute exit to which it can flow.
* a synthesized attribute entry which tells the node up in the tree where
it can enter.
This way, complete control is given to the subexpressions with respect to
what flow and labels they generate, this makes them context-independant.
== References ==================================================================
[1] Frank Tip. A Survey of Program Slicing Techniques.