Provide an inspirational example #57

Open
c-martinez opened this issue Sep 27, 2017 · 2 comments

@c-martinez
Collaborator

I had a half-day workshop with humanities scholars this week. Towards the end of the workshop, the group was getting a bit tired and starting to struggle. In particular, someone voiced his concern about slicing lists: "it is too abstract, I don't understand when this might be useful". So we spent the last half hour of the workshop with me showing them how to make a plot of the word counts of a book (code below).

Pros:

  • It made concepts a bit more concrete as I was using mostly things we had already covered during the workshop.
  • It was good for them to see "how things are done in real life": I couldn't remember how to load text from a URL, so I quickly googled it -- I think it was a bit of a relief for them to see that, yes, you can google for the answer (and that you will most likely find it).

Cons:

  • I think it goes a bit against the interactive style of workshops -- I was doing all the typing and explaining what I was doing and what problems I was encountering, while they were just watching and listening. It could be very easy for people to get distracted.
  • It went a bit too fast for some people (I pasted my command history and added comments on a shared document afterwards, but still...)

Question: Is it useful to include such a demo as an inspirational example of what you can do with Python?

# This is the book we are going to work with (Dracula)
urlBook = 'http://www.gutenberg.org/cache/epub/345/pg345.txt'

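# We use the urllib library to load data from the web.
# Note: the request submodule must be imported explicitly;
# a plain "import urllib" does not make urllib.request available.
import urllib.request

# We first use urlopen to make our request and
# then read the content of our response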
resp = urllib.request.urlopen(urlBook)
data = resp.read()

# We have the text of the book on a variable called data, so we inspect it a bit
print(type(data))   # What is its type?
print(len(data))     # How big is it?
print(data[:100])   # Let's have a look at the first 100 bytes of the data...

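# We will pre-process our data a little bit
# (data is bytes; str(data) would keep the b'...' wrapper and escaped line breaks,
# so we decode it instead, assuming the file is UTF-8 encoded)
dataStr = data.decode('utf-8')   # convert it into a string
dataLower = dataStr.lower()  # lowercase it
dataWords = dataLower.split()  # split our string on whitespace (spaces and line breaks)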

# Let’s have a look at how our data looks now
print(type(dataWords))   # What is the type
print(dataWords[:10])     # look at different parts of the data
print(dataWords[100:120])
print(dataWords[1100:1120])

# Now we will use the Python Counter.
from collections import Counter

# A counter tells us how many items of which kind we have
# for example, how many 1’s, 2’s and 3’s
print(Counter([1, 1, 1, 2, 3]))

# Now we use Counter to count how many times each word appears in the text
counts = Counter(dataWords)

# We can see which are the 10 most common words in our text:
print(counts.most_common(n=10))

# Now we are going to use the Python plotting library matplotlib.
import matplotlib.pyplot as plt

# First we convert our counts to a list and sort the list in reverse order
values = list(counts.values())
values.sort(reverse=True)

# We can have a look at the top 5 and bottom 5 counts
print(values[:5])
print(values[-5:])

# Now we can plot this on a log-log scale
plt.loglog(values)
plt.show()   # needed when running as a script rather than in an interactive session

DISCLAIMER: this is probably not the prettiest way of doing word counts, but the aim was to illustrate the point of how to load and work with data.
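
For anyone picking this up later, here is a minimal sketch of one slightly tidier way to do the counting step. It is not what was typed in the workshop: the helper name `count_words`, the regular expression, and the sample sentence are all assumptions made up for illustration, and the sample stands in for the downloaded book so the sketch runs without a network connection.

```python
import re
from collections import Counter


def count_words(text):
    """Lowercase the text, pull out runs of letters/apostrophes, and count them.

    Hypothetical helper: using a regular expression strips punctuation,
    so 'coffins.' and 'coffins,' are both counted as 'coffins'.
    """
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


# Made-up sample sentence (not taken from the book).
sample = "The Count counted the coffins. The coffins, the Count said, were his."
counts = count_words(sample)

# The most frequent words, as in the demo above
print(counts.most_common(3))

# Slicing the sorted counts gives the head and the tail of the distribution
values = sorted(counts.values(), reverse=True)
print(values[:2])    # the two largest counts
print(values[-2:])   # the two smallest counts
```

The slices at the end are the same list-slicing idea that prompted the demo: once the counts are sorted, `values[:n]` and `values[-n:]` are a quick way to look at both ends of the data.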

@c-martinez c-martinez assigned c-martinez and richyvk and unassigned c-martinez Sep 27, 2017
@richyvk
Collaborator

richyvk commented Sep 28, 2017

Hey. I like it. I have to say I have done about zero plotting, but I can see how plotting is a good use case for a wide range of libraries. I'm thinking usage stats for example.

I did start work on an introduction that I'd thought could include a complete compelling example like this to get people interested.

Would you see this example being used like that? That way we could maybe break it apart for some of the episodes?

Might mean a fair bit of work though.

I will admit I've been looking at the lesson a bit lately, and the feeling it leaves me with is usually something like, wow, this is long!

Perhaps it's too long?

@c-martinez
Collaborator Author

I like the plotting because it is something you always need in order to understand what your data looks like, and it makes things more concrete.

I would be inclined to keep it as a one-block demo to wrap things up and make all the abstract concepts which we've seen through the lesson a bit more concrete.

As for lesson length, I agree, it is a bit long, and what I find I end up doing is skipping some episodes (which is the nice thing about having episodes).
