| 1 | +# Logs Analysis Project |
| 2 | + |
| 3 | +This application answers three questions about the newsdata.sql dataset. |
| 4 | +It uses PostgreSQL and psycopg2 and defines three methods: popular_article |
| 5 | +(finds the three most popular articles), popular_author (lists the authors |
| 6 | +by total page views, in descending order), and error_days (finds the days |
| 7 | +on which more than 1% of requests led to errors). |
| 8 | + |
| 9 | + |
| 10 | +## Getting Started |
| 11 | + |
| 12 | +To run this program you need access to newsdata.sql, psycopg2, and Python |
| 13 | +(preferably 3.0 or newer), plus several views that must be created in the |
| 14 | +database. Breaking the queries into views keeps the logsanalysis.py file as |
| 15 | +readable as possible. |
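
To illustrate how the views keep the Python side short, here is a minimal sketch of what the three methods might look like. It is not the actual logsanalysis.py: the database name `news`, the helper names `fetch_rows`, `format_views`, and `format_error_days`, and the output wording are all assumptions made for this example. Only the method names and the view names come from this README.

```python
"""Hypothetical sketch of the three report methods; not the real logsanalysis.py."""

DBNAME = "news"  # assumed database name; change it to match your setup


def fetch_rows(query, dbname=DBNAME):
    """Run one read-only query and return all rows as a list of tuples."""
    import psycopg2  # imported here so the formatting helpers work without it

    conn = psycopg2.connect(dbname=dbname)
    try:
        with conn.cursor() as cur:
            cur.execute(query)
            return cur.fetchall()
    finally:
        conn.close()


def format_views(rows):
    """Turn (name, views) rows into printable report lines."""
    return ["{} - {} views".format(name, views) for name, views in rows]


def format_error_days(rows):
    """Turn (day, percent) rows into printable report lines."""
    return ["{} - {:.1f}% errors".format(day, percent) for day, percent in rows]


def popular_article():
    # questionone already joins article titles to the three most viewed slugs.
    return format_views(fetch_rows(
        "SELECT title, views FROM questionone ORDER BY views DESC;"))


def popular_author():
    # mostpopularauthors already sums the views per author.
    return format_views(fetch_rows(
        "SELECT name, articleViews FROM mostpopularauthors "
        "ORDER BY articleViews DESC;"))


def error_days():
    # baddays already filters to the days above the 1% threshold.
    return format_error_days(fetch_rows(
        "SELECT time, percent FROM baddays ORDER BY time;"))
```

With the views in place, something like `print("\n".join(popular_article()))` would print one line per article. Each query repeats its own `ORDER BY` because the ordering of rows read from a view is best not relied on.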
| 16 | + |
| 17 | +### Necessary Views: |
| 18 | + |
| 19 | +As stated above, you must create these views before running the analysis: popularthree, questionone, |
| 20 | +mostviews, popularauthor, mostpopularauthors, days, okrqsts, badrqsts, dailyerrors, and baddays. |
| 21 | + |
| 22 | +```sql |
| 23 | +-- popularthree |
| 24 | + CREATE VIEW popularthree AS |
| 25 | + SELECT (regexp_split_to_array(path, E'/article/'))[2], COUNT(*) AS views FROM log |
| 26 | + WHERE path != '/' GROUP BY (regexp_split_to_array(path, E'/article/'))[2] |
| 27 | + ORDER BY views DESC LIMIT 3; |
| 28 | + |
| 29 | +-- questionone |
| 30 | + CREATE VIEW questionone AS |
| 31 | + SELECT articles.title, popularthree.views FROM articles, popularthree |
| 32 | + WHERE articles.slug = popularthree.regexp_split_to_array |
| 33 | + ORDER BY views DESC; |
| 34 | + |
| 35 | +-- mostviews |
| 36 | + CREATE VIEW mostviews AS |
| 37 | + SELECT (regexp_split_to_array(path, E'/article/'))[2] AS title, COUNT(*) AS views FROM log |
| 38 | + WHERE path != '/' AND status != '404 NOT FOUND' GROUP BY title ORDER BY views DESC; |
| 39 | + |
| 40 | +-- popularauthor |
| 41 | + CREATE VIEW popularauthor AS |
| 42 | + SELECT articles.author, SUM(mostviews.views) AS articleViews FROM articles, mostviews |
| 43 | + WHERE articles.slug = mostviews.title GROUP BY author ORDER BY articleViews DESC; |
| 44 | + |
| 45 | +-- mostpopularauthors |
| 46 | + CREATE VIEW mostpopularauthors AS |
| 47 | + SELECT authors.name, popularauthor.articleViews FROM authors, popularauthor |
| 48 | + WHERE authors.id = popularauthor.author ORDER BY popularauthor.articleViews DESC; |
| 49 | + |
| 50 | +-- days |
| 51 | + CREATE VIEW days AS |
| 52 | + SELECT time ::TIMESTAMP::DATE, status FROM log ORDER BY time; |
| 53 | + |
| 54 | +-- okrqsts |
| 55 | + CREATE VIEW okrqsts AS |
| 56 | + SELECT time, COUNT(status) AS ok FROM days |
| 57 | + WHERE status = '200 OK' GROUP BY time ORDER BY time; |
| 58 | + |
| 59 | +-- badrqsts |
| 60 | + CREATE VIEW badrqsts AS |
| 61 | + SELECT time, COUNT(status) AS bad FROM days |
| 62 | + WHERE status = '404 NOT FOUND' GROUP BY time ORDER BY time; |
| 63 | + |
| 64 | +-- dailyerrors: errors as a percentage of all requests that day (assumes only '200 OK' and '404 NOT FOUND' occur) |
| 65 | + CREATE VIEW dailyerrors AS |
| 66 | + SELECT okrqsts.time, CAST(badrqsts.bad AS FLOAT) / (okrqsts.ok + badrqsts.bad) * 100 AS percent |
| 67 | + FROM okrqsts, badrqsts WHERE okrqsts.time = badrqsts.time; |
| 68 | + |
| 69 | +-- baddays |
| 70 | + CREATE VIEW baddays AS |
| 71 | + SELECT time, percent FROM dailyerrors WHERE percent > 1; |
| 72 | +``` |
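
The chain of views from days to baddays boils down to a small calculation: truncate each timestamp to a date, count requests and errors per date, and keep the dates whose error share exceeds 1%. The sketch below mirrors that logic in plain Python so the arithmetic can be checked without a database. The function name `find_error_days` and the sample records are invented for illustration. The sketch divides errors by all requests for the day, which is how the 1% criterion is worded in the project description; treat that reading as an assumption.

```python
from collections import Counter
from datetime import datetime


def find_error_days(records, threshold=1.0):
    """Return [(date, percent)] for days whose error share exceeds threshold.

    records is an iterable of (timestamp, status) pairs, like rows of the
    log table. Only '404 NOT FOUND' is counted as an error, matching the
    status string used in the views.
    """
    total = Counter()  # requests per day
    bad = Counter()    # error responses per day
    for timestamp, status in records:
        day = timestamp.date()  # same idea as casting time to DATE
        total[day] += 1
        if status == "404 NOT FOUND":
            bad[day] += 1

    result = []
    for day in sorted(total):
        percent = 100.0 * bad[day] / total[day]
        if percent > threshold:
            result.append((day, percent))
    return result


# Invented sample data: day one has 1 error in 100 requests (exactly 1%,
# so it is not reported); day two has 3 errors in 100 requests.
sample = (
    [(datetime(2016, 7, 1, 12, 0), "200 OK")] * 99
    + [(datetime(2016, 7, 1, 13, 0), "404 NOT FOUND")]
    + [(datetime(2016, 7, 2, 12, 0), "200 OK")] * 97
    + [(datetime(2016, 7, 2, 13, 0), "404 NOT FOUND")] * 3
)
error_report = find_error_days(sample)
```

Because the comparison is strictly greater than the threshold, a day with exactly 1% errors is left out, just as `percent > 1` leaves it out of baddays.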