Commit d11b77e

Add files via upload
1 parent 16b8a11 commit d11b77e

6 files changed

+264
-205
lines changed

09-DescriptiveStatistics.md

+55-26
Original file line number | Diff line number | Diff line change
@@ -1,27 +1,24 @@
11

2-
3-
```python
42
Python Pandas - Descriptive Statistics
3+
=======================================
54

65
A large number of methods collectively compute descriptive statistics and other related operations on DataFrame.
76
Most of these are aggregations like sum(), mean(), but some of them, like cumsum(), produce an object of the
87
same size. Generally speaking, these methods take an axis argument, just like ndarray.{sum, std, ...}, but the axis can be
98
specified by name or integer. For a DataFrame: “index” (axis=0, default), “columns” (axis=1).
109

1110
Let us create a DataFrame and use this object throughout this chapter for all the operations.
12-
```
13-
1411
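
As a minimal sketch of the axis argument described above, the snippet below uses a small made-up frame (the column names and values are illustrative, not the chapter's data) to contrast aggregations with `cumsum()`, which keeps the original shape.

```python
import pandas as pd

# Small illustrative frame; the values are made up for this sketch
df = pd.DataFrame({"Age": [25, 26, 25], "Rating": [4.0, 3.5, 3.0]})

# Aggregations reduce along an axis: one value per column for axis=0 ...
col_sums = df.sum(axis="index")      # same as axis=0, the default
# ... or one value per row for axis=1
row_sums = df.sum(axis="columns")    # same as axis=1

# cumsum() is not a reduction: the result has the same shape as df
running = df.cumsum()

print(col_sums)
print(row_sums)
print(running.shape)
```

Passing the axis by name (`"index"`, `"columns"`) or by integer (`0`, `1`) selects the same behaviour; the name form tends to read more clearly in longer scripts.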

15-
```python
16-
Example
17-
```
12+
### Example
1813

1914

2015
```python
2116
import pandas as pd
2217
import numpy as np
2318
#Create a Dictionary of series
24-
d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack', 'Lee','David','Gasper','Betina','Andres']), 'Age':pd.Series([25,26,25,23,30,29,23,34,40,30,51,46]), 'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8,3.78,2.98,4.80,4.10,3.65])}
19+
d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack', 'Lee','David','Gasper','Betina','Andres']),
20+
'Age':pd.Series([25,26,25,23,30,29,23,34,40,30,51,46]),
21+
'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8,3.78,2.98,4.80,4.10,3.65])}
2522
#Create a DataFrame
2623
df = pd.DataFrame(d)
2724
print( df )
@@ -53,7 +50,9 @@ Returns the sum of the values for the requested axis. By default, axis is index
5350
import pandas as pd
5451
import numpy as np
5552
#Create a Dictionary of series
56-
d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack', 'Lee','David','Gasper','Betina','Andres']), 'Age':pd.Series([25,26,25,23,30,29,23,34,40,30,51,46]), 'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8,3.78,2.98,4.80,4.10,3.65])}
53+
d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack', 'Lee','David','Gasper','Betina','Andres']),
54+
'Age':pd.Series([25,26,25,23,30,29,23,34,40,30,51,46]),
55+
'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8,3.78,2.98,4.80,4.10,3.65])}
5756
#Create a DataFrame
5857
df = pd.DataFrame(d)
5958
print( df.sum() )
@@ -98,18 +97,17 @@ print(df.sum(1))
9897
dtype: float64
9998

10099

101-
102-
```python
103-
mean()
100+
### ``mean()``
104101
Returns the average value
105-
```
106102

107103

108104
```python
109105
import pandas as pd
110106
import numpy as np
111107
#Create a Dictionary of series
112-
d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack', 'Lee','David','Gasper','Betina','Andres']), 'Age':pd.Series([25,26,25,23,30,29,23,34,40,30,51,46]), 'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8,3.78,2.98,4.80,4.10,3.65])}
108+
d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack', 'Lee','David','Gasper','Betina','Andres']),
109+
'Age':pd.Series([25,26,25,23,30,29,23,34,40,30,51,46]),
110+
'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8,3.78,2.98,4.80,4.10,3.65])}
113111
#Create a DataFrame
114112
df = pd.DataFrame(d)
115113
print(df.mean())
@@ -129,7 +127,9 @@ Returns the Bessel-corrected (sample) standard deviation of the numerical columns.
129127
import pandas as pd
130128
import numpy as np
131129
#Create a Dictionary of series
132-
d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack', 'Lee','David','Gasper','Betina','Andres']), 'Age':pd.Series([25,26,25,23,30,29,23,34,40,30,51,46]), 'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8,3.78,2.98,4.80,4.10,3.65])}
130+
d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack', 'Lee','David','Gasper','Betina','Andres']),
131+
'Age':pd.Series([25,26,25,23,30,29,23,34,40,30,51,46]),
132+
'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8,3.78,2.98,4.80,4.10,3.65])}
133133
#Create a DataFrame
134134
df = pd.DataFrame(d)
135135
print(df.std() )
@@ -152,15 +152,16 @@ Let us now understand the functions under Descriptive Statistics in Python Panda
152152
```python
153153

154154

155-
The following table list down the important functions − S.No. Function Description
155+
The following table lists the important functions
156+
156157
1. count() Number of non-null observations
157158
2. sum() Sum of values
158159
3. mean() Mean of Values
159160
4. median() Median of Values
160161
5. mode() Mode of values
161162
6. std() Standard Deviation of the Values
162-
7.min() Minimum Value
163-
8. max() Maximum Value
163+
7. min() Minimum Value
164+
8. max() Maximum Value
164165
9. abs() Absolute Value
165166
10. prod() Product of Values
166167
11. cumsum() Cumulative Sum
@@ -180,10 +181,16 @@ The ``describe()`` function computes a summary of statistics pertaining to the D
180181

181182

182183
```python
183-
import pandas as pd import numpy as np
184-
#Create a Dictionary of series d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack', 'Lee','David','Gasper','Betina','Andres']), 'Age':pd.Series([25,26,25,23,30,29,23,34,40,30,51,46]), 'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8,3.78,2.98,4.80,4.10,3.65])}
185-
#Create a DataFrame df = pd.DataFrame(d)
186-
print df.describe()
184+
import pandas as pd
185+
import numpy as np
186+
#Create a Dictionary of series
187+
d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack', 'Lee','David','Gasper','Betina','Andres']),
188+
'Age':pd.Series([25,26,25,23,30,29,23,34,40,30,51,46]),
189+
'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8,3.78,2.98,4.80,4.10,3.65])}
190+
191+
#Create a DataFrame
192+
df = pd.DataFrame(d)
193+
print(df.describe())
187194
```
188195

189196

@@ -200,6 +207,11 @@ Takes the list of values; by default, 'number'. object − Summarizes String col
200207

201208
Now, use the following statement in the program and check the output − import pandas as pd import numpy as np
202209
#Create a Dictionary of series
210+
```
211+
212+
213+
```python
214+
203215
d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack', 'Lee','David','Gasper','Betina','Andres']), 'Age':pd.Series([25,26,25,23,30,29,23,34,40,30,51,46]), 'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8,3.78,2.98,4.80,4.10,3.65])}
204216
#Create a DataFrame df = pd.DataFrame(d) print df.describe(include=['object']) Its output is as follows − Name count 12 unique 12 top Ricky freq 1
205217

@@ -217,10 +229,27 @@ Now, use the following statement and check the output −
217229

218230
import pandas as pd
219231
import numpy as np
220-
#Create a Dictionary of series d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack', 'Lee','David','Gasper','Betina','Andres']), 'Age':pd.Series([25,26,25,23,30,29,23,34,40,30,51,46]), 'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8,3.78,2.98,4.80,4.10,3.65])}
221-
#Create a DataFrame df = pd.DataFrame(d)
222-
print df. describe(include='all')
232+
#Create a Dictionary of series
233+
d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack', 'Lee','David','Gasper','Betina','Andres']),
234+
'Age':pd.Series([25,26,25,23,30,29,23,34,40,30,51,46]),
235+
'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8,3.78,2.98,4.80,4.10,3.65])}
236+
#Create a DataFrame
237+
df = pd.DataFrame(d)
238+
print(df.describe(include='all'))
223239

224-
Its output is as follows − Age Name Rating count 12.000000 12 12.000000 unique NaN 12 NaN top NaN Ricky NaN freq NaN 1 NaN mean 31.833333 NaN 3.743333 std 9.232682 NaN 0.661628 min 23.000000 NaN 2.560000 25% 25.000000 NaN 3.230000 50% 29.500000 NaN 3.790000 75% 35.500000 NaN 4.132500 max 51.000000 NaN 4.800000
225240

226241
```
242+
243+
Age Name Rating
244+
count 12.000000 12 12.000000
245+
unique NaN 12 NaN
246+
top NaN Ricky NaN
247+
freq NaN 1 NaN
248+
mean 31.833333 NaN 3.743333
249+
std 9.232682 NaN 0.661628
250+
min 23.000000 NaN 2.560000
251+
25% 25.000000 NaN 3.230000
252+
50% 29.500000 NaN 3.790000
253+
75% 35.500000 NaN 4.132500
254+
max 51.000000 NaN 4.800000
255+
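
The three `include` variants used in this file can be compared side by side. The following is a hedged sketch on a shortened, made-up version of the frame; the `Name` column is given an explicit `object` dtype here (an assumption of this sketch) so that `include=['object']` selects it regardless of the pandas version's default string dtype.

```python
import pandas as pd

# Shortened illustrative frame; Name is forced to object dtype for this sketch
df = pd.DataFrame({
    "Name": pd.Series(["Tom", "James", "Ricky", "Vin"], dtype="object"),
    "Age": [25, 26, 25, 23],
    "Rating": [4.23, 3.24, 3.98, 2.56],
})

numeric_summary = df.describe()                    # default: numeric columns only
object_summary = df.describe(include=["object"])   # object (string) columns only
full_summary = df.describe(include="all")          # every column, NaN where a statistic does not apply

print(numeric_summary)
print(object_summary)
print(full_summary)
```

With `include='all'`, the numeric rows (`mean`, `std`, the quartiles) are NaN for `Name`, and the categorical rows (`unique`, `top`, `freq`) are NaN for the numeric columns, as in the output above.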

12-iteration.md

+89-72
Original file line number | Diff line number | Diff line change
@@ -42,46 +42,40 @@ for col in df:
4242
y
4343

4444

45-
46-
```python
4745
To iterate over the rows of the DataFrame, we can use the following functions −
48-
iteritems() − to iterate over the (key,value) pairs
49-
iterrows() − iterate over the rows as (index,series) pairs
50-
itertuples() − iterate over the rows as namedtuples
51-
iteritems()
52-
Iterates over each column as key, value pair with label as key and column value as a Series object.
53-
```
5446

55-
56-
```python
47+
* ``iteritems()`` - iterate over the (key, value) pairs
48+
* ``iterrows()`` - iterate over the rows as (index, Series) pairs
49+
* ``itertuples()`` - iterate over the rows as namedtuples
50+
* ``iteritems()`` - iterates over each column as a (key, value) pair, with the column label as the key and the column values as a Series object.
5751

5852

53+
```python
5954
import pandas as pd
6055
import numpy as np
6156

6257
df = pd.DataFrame(np.random.randn(4,3),columns=['col1','col2','col3'])
6358
for key,value in df.iteritems():
64-
print key,value
65-
Its output is as follows −
66-
col1 0 0.802390
67-
1 0.324060
68-
2 0.256811
69-
3 0.839186
70-
Name: col1, dtype: float64
71-
72-
col2 0 1.624313
73-
1 -1.033582
74-
2 1.796663
75-
3 1.856277
76-
Name: col2, dtype: float64
77-
78-
col3 0 -0.022142
79-
1 -0.230820
80-
2 1.160691
81-
3 -0.830279
82-
Name: col3, dtype: float64
59+
print(key,value)
8360
```
8461

62+
col1 0 1.141317
63+
1 0.289031
64+
2 -1.269689
65+
3 -1.668425
66+
Name: col1, dtype: float64
67+
col2 0 1.561011
68+
1 0.391033
69+
2 0.083089
70+
3 0.106299
71+
Name: col2, dtype: float64
72+
col3 0 -0.436407
73+
1 0.136565
74+
2 0.444321
75+
3 0.738629
76+
Name: col3, dtype: float64
77+
78+
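
One caveat for readers on newer installations: `DataFrame.iteritems()` was deprecated in pandas 1.5 and removed in pandas 2.0, where `items()` provides the same column-wise iteration. A minimal sketch with made-up values (the `collected` dict is only there to make the result easy to inspect):

```python
import pandas as pd

# Small illustrative frame; fixed values instead of random ones
df = pd.DataFrame({"col1": [1.0, 2.0], "col2": [3.0, 4.0]})

# items() yields (column label, column Series) pairs, one per column
collected = {}
for label, column in df.items():
    collected[label] = column.tolist()
    print(label, column.tolist())
```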
8579

8680
```python
8781

@@ -92,68 +86,84 @@ iterrows() returns the iterator yielding each index value along with a series co
9286

9387

9488
```python
95-
9689
import pandas as pd
9790
import numpy as np
9891

9992
df = pd.DataFrame(np.random.randn(4,3),columns = ['col1','col2','col3'])
10093
for row_index,row in df.iterrows():
101-
print row_index,row
102-
Its output is as follows −
103-
0 col1 1.529759
104-
col2 0.762811
105-
col3 -0.634691
106-
Name: 0, dtype: float64
107-
108-
1 col1 -0.944087
109-
col2 1.420919
110-
col3 -0.507895
111-
Name: 1, dtype: float64
112-
113-
2 col1 -0.077287
114-
col2 -0.858556
115-
col3 -0.663385
116-
Name: 2, dtype: float64
117-
3 col1 -1.638578
118-
col2 0.059866
119-
col3 0.493482
120-
Name: 3, dtype: float64
94+
print("\n")
95+
print(row_index,row)
96+
```
97+
98+
99+
100+
0 col1 -0.469367
101+
col2 -1.466803
102+
col3 0.493435
103+
Name: 0, dtype: float64
104+
105+
106+
1 col1 0.686016
107+
col2 -1.293819
108+
col3 -1.087791
109+
Name: 1, dtype: float64
110+
111+
112+
2 col1 0.646084
113+
col2 -0.312096
114+
col3 1.518408
115+
Name: 2, dtype: float64
116+
117+
118+
3 col1 -2.464781
119+
col2 0.211235
120+
col3 -0.238992
121+
Name: 3, dtype: float64
122+
123+
124+
121125
Note: Because iterrows() returns each row as a Series, it does not preserve the data types across the row. 0, 1, 2 are the row indices and col1, col2, col3 are the column labels.
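
The dtype caveat in the note is easier to see with mixed column types. The sketch below uses a hypothetical frame with one integer and one float column (names `n` and `score` are made up for illustration) and compares what `iterrows()` and `itertuples()` hand back for the first row.

```python
import pandas as pd

# Hypothetical frame with one int column and one float column
df = pd.DataFrame({"n": [1, 2], "score": [0.5, 1.5]})

# iterrows() packs each row into a single Series, so the int is upcast to float
first_index, first_row = next(df.iterrows())
print(df.dtypes["n"])     # integer dtype in the frame
print(first_row.dtype)    # one common dtype for the whole row

# itertuples() keeps each value's own type
first_tuple = next(df.itertuples())
print(first_tuple)
```

When the per-column types matter, `itertuples()` is usually the safer of the two choices.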
122126

123-
```
124127

125128

126129
```python
127130
itertuples()
128131
itertuples() method will return an iterator yielding a named tuple for each row in the DataFrame. The first element of the tuple will be the row’s corresponding index value, while the remaining values are the row values.
132+
133+
134+
```
135+
136+
137+
```python
129138
import pandas as pd
130139
import numpy as np
131140

132141
df = pd.DataFrame(np.random.randn(4,3),columns = ['col1','col2','col3'])
133142
for row in df.itertuples():
134-
print row
143+
print("\n")
144+
print(row)
135145
```
136146

147+
148+
149+
Pandas(Index=0, col1=-0.9771341226396765, col2=0.2724475615802741, col3=-0.6589499024186599)
150+
151+
152+
Pandas(Index=1, col1=1.6177467086432253, col2=-0.9763868574908899, col3=0.08317561529190409)
153+
154+
155+
Pandas(Index=2, col1=-0.988445247281908, col2=-0.6366889592765412, col3=0.3956289433362847)
156+
157+
158+
Pandas(Index=3, col1=-0.19595952598665276, col2=-0.13115863172857256, col3=-0.04519796025813786)
137159

138-
```python
139-
140-
Its output is as follows −
141-
Pandas(Index=0, col1=1.5297586201375899, col2=0.76281127433814944, col3=-
142-
0.6346908238310438)
143160

144-
Pandas(Index=1, col1=-0.94408735763808649, col2=1.4209186418359423, col3=-
145-
0.50789517967096232)
146161

147-
Pandas(Index=2, col1=-0.07728664756791935, col2=-0.85855574139699076, col3=-
148-
0.6633852507207626)
149162

150-
Pandas(Index=3, col1=0.65734942534106289, col2=-0.95057710432604969,
151-
col3=0.80344487462316527)
152163
Note: Do not try to modify any object while iterating. Iterating is meant for reading; depending on the data types, the iterator may return a copy of the original data rather than a view, so changes made to it will not be reflected in the original object.
153164

154165

155166

156-
```
157167

158168

159169
```python
@@ -163,14 +173,21 @@ import numpy as np
163173
df = pd.DataFrame(np.random.randn(4,3),columns = ['col1','col2','col3'])
164174

165175
for index, row in df.iterrows():
166-
row['a'] = 10
167-
print df
168-
Its output is as follows −
169-
col1 col2 col3
170-
0 -1.739815 0.735595 -0.295589
171-
1 0.635485 0.106803 1.527922
172-
2 -0.939064 0.547095 0.038585
173-
3 -1.016509 -0.116580 -0.523158
176+
row['a'] = 10
177+
178+
print(df)
179+
180+
```
181+
182+
col1 col2 col3
183+
0 0.325979 0.892602 -1.034127
184+
1 2.267333 -0.356288 -2.088448
185+
2 -1.159300 1.004701 0.742375
186+
3 0.132715 -1.565420 -1.142597
187+
188+
189+
190+
```python
174191
Observe that no changes are reflected in df.
175192

176193
```
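
Since assignments made to `row` inside `iterrows()` do not reach the frame, changes have to be written through the DataFrame itself. The following is a sketch of one common way to do that, using made-up values; the column name `a` follows the example above, while the threshold and the `unchanged_after_loop` flag are assumptions added for illustration.

```python
import pandas as pd

# Small illustrative frame with fixed values
df = pd.DataFrame({"col1": [1.0, 2.0, 3.0], "col2": [4.0, 5.0, 6.0]})

# Assigning to the row inside the loop only changes the temporary row Series
for index, row in df.iterrows():
    row["a"] = 10

unchanged_after_loop = "a" not in df.columns
print(unchanged_after_loop)

# Writing through the DataFrame does change it
df["a"] = 10                             # whole-column assignment
df.loc[df["col1"] > 1.5, "col2"] = 0.0   # label-based, conditional assignment
print(df)
```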
