Commit 537623b

renames README.txt to .md and adds shebang to logsanalysis.py
0 parents  commit 537623b

3 files changed

+171
-0
lines changed

README.md

+72
@@ -0,0 +1,72 @@
# Logs Analysis Project

This application answers three questions about the newsdata.sql dataset.
It uses PostgreSQL and psycopg2, and defines three functions: popular_article
(which finds the three most popular articles), popular_author (which lists
authors by total page views, in descending order), and error_days
(which reports the days on which more than 1% of requests led to errors).

## Getting Started

To run this program, the user needs access to newsdata.sql, psycopg2, and
Python (preferably 3.0 or newer), plus several views that must be created in
the database. Breaking the queries into views keeps the logsanalysis.py file
as readable as possible.
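
### Necessary Views:

As stated above, the user must create these views before running the analysis: popularthree, questionone,
mostviews, popularauthor, mostpopularauthors, days, okrqsts, badrqsts, dailyerrors, and baddays.
Create them in the order listed, because later views select from earlier ones.
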
```sql
-- popularthree: the three most-requested article slugs.
-- The unnamed first column takes PostgreSQL's default name,
-- regexp_split_to_array, which questionone relies on.
CREATE VIEW popularthree AS
SELECT (regexp_split_to_array(path, E'/article/'))[2], COUNT(*) AS views FROM log
WHERE path != '/' GROUP BY (regexp_split_to_array(path, E'/article/'))[2]
ORDER BY views DESC LIMIT 3;

-- questionone: titles and view counts of the three most popular articles.
CREATE VIEW questionone AS
SELECT articles.title, popularthree.views FROM articles, popularthree
WHERE articles.slug = popularthree.regexp_split_to_array
ORDER BY views DESC;

-- mostviews: views per article slug (aliased as title), excluding 404 responses.
CREATE VIEW mostviews AS
SELECT (regexp_split_to_array(path, E'/article/'))[2] AS title, COUNT(*) AS views FROM log
WHERE path != '/' AND status != '404 NOT FOUND' GROUP BY title ORDER BY views DESC;

-- popularauthor: total article views per author id.
CREATE VIEW popularauthor AS
SELECT articles.author, SUM(mostviews.views) AS articleViews FROM articles, mostviews
WHERE articles.slug = mostviews.title GROUP BY author ORDER BY articleViews DESC;

-- mostpopularauthors: author names with their total article views.
CREATE VIEW mostpopularauthors AS
SELECT authors.name, popularauthor.articleViews FROM authors, popularauthor
WHERE authors.id = popularauthor.author ORDER BY popularauthor.articleViews DESC;

-- days: the date and status of every request.
CREATE VIEW days AS
SELECT time::TIMESTAMP::DATE, status FROM log ORDER BY time;

-- okrqsts: successful requests per day.
CREATE VIEW okrqsts AS
SELECT time, COUNT(status) AS ok FROM days
WHERE status = '200 OK' GROUP BY time ORDER BY time;

-- badrqsts: failed requests per day.
CREATE VIEW badrqsts AS
SELECT time, COUNT(status) AS bad FROM days
WHERE status = '404 NOT FOUND' GROUP BY time ORDER BY time;

-- dailyerrors: failed requests as a percentage of successful requests
-- (not of all requests) for each day.
CREATE VIEW dailyerrors AS
SELECT okrqsts.time, CAST(badrqsts.bad AS FLOAT) / CAST(okrqsts.ok AS FLOAT) * 100 AS percent
FROM okrqsts, badrqsts WHERE okrqsts.time = badrqsts.time;

-- baddays: days on which that percentage exceeds 1.
CREATE VIEW baddays AS
SELECT time, percent FROM dailyerrors WHERE percent > 1;
```
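
Two details of the views above are easy to misread: the `[2]` subscript (PostgreSQL arrays are 1-indexed, so `[2]` is the text after `/article/`) and the denominator of `percent` in dailyerrors. The following is a minimal Python sketch of both, not code from this project. The function names, the example path, and the request counts are hypothetical, and `error_percent_of_total` assumes the log contains only the two statuses the views filter on.

```python
import re


def article_slug(path):
    """Mirror (regexp_split_to_array(path, '/article/'))[2] from the views.

    PostgreSQL arrays are 1-indexed, so element [2] there is index 1 here.
    Returns None when the path has no '/article/' segment, where SQL gives NULL.
    """
    parts = re.split('/article/', path)
    return parts[1] if len(parts) > 1 else None


def error_percent_of_ok(bad, ok):
    """Errors as a percentage of successful requests, as dailyerrors computes."""
    return float(bad) / float(ok) * 100


def error_percent_of_total(bad, ok):
    """Errors as a percentage of all requests (assumes only these two statuses)."""
    return float(bad) / float(bad + ok) * 100


if __name__ == '__main__':
    # Hypothetical path and counts, chosen only for illustration.
    print(article_slug('/article/example-slug'))
    print(article_slug('/'))
    print(error_percent_of_ok(25, 1000))
    print(error_percent_of_total(25, 1000))
```

Dividing by the successful count always gives a slightly larger figure than dividing by the total, so a day close to the 1% threshold could be classified differently depending on which definition is intended.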

logs_analysis.txt

+44
@@ -0,0 +1,44 @@
The top three articles are:

Candidate is jerk, alleges rival - 338647 views

Bears love berries, alleges bear - 253801 views

Bad things gone, say good people - 170098 views


The most popular authors are as follow:

Ursula La Multa - 507594 hits

Rudolf von Treppenwitz - 423457 hits

Anonymous Contributor - 170098 hits

Markoff Chaney - 84557 hits

These days more than 1 percent of requests resulted in error:

2016-07-17 - 2.31506899455 percent

logsanalysis.py

+55
@@ -0,0 +1,55 @@
#!/usr/bin/env python3
"""When this file is run from the command line, all three operations
are called in order."""

import psycopg2


def popular_article():
    """Print the three most popular articles along with their view counts."""
    db = psycopg2.connect("dbname='news'")
    c = db.cursor()
    # questionone joins article titles to the top-three view counts.
    c.execute("select * from questionone;")
    rows = c.fetchall()
    print('The top three articles are: ')
    for title, views in rows:
        print('\n %s - %s views' % (title, views))
    db.close()


def popular_author():
    """Print the most popular authors and their views in descending order."""
    db = psycopg2.connect("dbname='news'")
    c = db.cursor()
    # mostpopularauthors is already sorted by total article views.
    c.execute("select * from mostpopularauthors;")
    authors = c.fetchall()
    print('\n\n The most popular authors are as follow:')
    for name, hits in authors:
        print('\n %s - %s hits' % (name, hits))
    db.close()


def error_days():
    """Print the days on which more than 1% of requests resulted in errors."""
    db = psycopg2.connect("dbname='news'")
    c = db.cursor()
    # baddays holds only the days whose error percentage exceeds 1.
    c.execute("select * from baddays;")
    days = c.fetchall()
    print('\n These days more than 1 percent of requests resulted in error:')
    for day, percent in days:
        print('\n %s - %s percent' % (day, percent))
    db.close()


def logs_analysis():
    """Run all three analyses consecutively."""
    popular_article()
    popular_author()
    error_days()


# Run the report only when executed as a script, not when imported.
if __name__ == '__main__':
    logs_analysis()
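
The three report functions repeat the same connect, execute, fetch, close sequence. One possible way to factor that out is sketched below; it is an illustration, not part of this commit. `fetch_rows`, `format_rows`, and `demo_connect` are hypothetical names, and the demo swaps in the standard-library `sqlite3` module with an in-memory table standing in for the questionone view so the sketch runs without a PostgreSQL server. The two seeded rows are copied from logs_analysis.txt.

```python
import sqlite3
from contextlib import closing


def fetch_rows(connect, query):
    """Run one query on a fresh connection and return every row.

    `connect` is any zero-argument callable returning a DB-API connection.
    closing() ensures the cursor and connection are closed even on error.
    """
    with closing(connect()) as db:
        with closing(db.cursor()) as c:
            c.execute(query)
            return c.fetchall()


def format_rows(rows, unit):
    """Turn (name, count) rows into 'name - count unit' report lines."""
    return ['%s - %s %s' % (name, count, unit) for name, count in rows]


def demo_connect():
    """Hypothetical stand-in for psycopg2.connect, used only for this demo."""
    db = sqlite3.connect(':memory:')
    db.execute('CREATE TABLE questionone (title TEXT, views INTEGER)')
    db.executemany('INSERT INTO questionone VALUES (?, ?)', [
        ('Candidate is jerk, alleges rival', 338647),
        ('Bears love berries, alleges bear', 253801),
    ])
    return db


if __name__ == '__main__':
    query = 'select * from questionone order by views desc;'
    for line in format_rows(fetch_rows(demo_connect, query), 'views'):
        print(line)
```

Against the real database, the same helper would presumably be called with `lambda: psycopg2.connect("dbname='news'")` as the `connect` argument, since psycopg2 connections and cursors also provide `close()`.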
