I’ve been telling everyone that I’d do something “data fun” when I hit 20K Twitter followers, so I posted an analysis of my podcast listeners! I used Python and pandas in a Jupyter notebook for the first part, then built a dashboard in Tableau for the last part.
Category: python
Data Science Learning Club Update
For anyone who hasn’t yet joined the Becoming a Data Scientist Podcast Data Science Learning Club, I thought I’d write up a summary of what we’ve been doing….
Data Science Tutorials Flipboard Magazine
I have been getting great feedback on my “Becoming a Data Scientist” Flipboard magazine, and I had this other set of articles bookmarked that didn’t quite fit into it. I want the Becoming a Data Scientist one to be the “best of the best” of articles I find on Twitter about data science, and to…
Playing With Google Cloud Datalab
This weekend, I played around with the newly released Google Cloud Datalab. I learned how to use BigQuery and also compared Google Charts with Pandas+Matplotlib plots, since you can do both in Datalab. I had a few frustrations with it because the documentation isn’t great, and also sometimes it would silently time out and it…
API and Market Basket Analysis
I was considering waiting until I’m done before posting about this project, but instead I thought I’d post my progress and plans while I think about the next steps. I posted earlier about using the UsesThis API to retrieve data about what other software people who use X software also use. I thought I was…
The Setup (usesthis.com) API
There’s a really interesting site, usesthis.com (AKA “The Setup”), which interviews people and lists all of the gear that they use, including software. I found out that they have an API (documented here), and I wanted to use my new API skills in Python to test it out! This one returns JSON unlike the NPR…
IPython, Requests, lxml, and the NPR API
Last week, I decided to learn how to use Python to get data from an API. I started with the Codecademy “Introduction to APIs in Python” course, which got me oriented to how requests work, and in the subsequent NPR API lesson, specifically how the NPR stories API works. Certain parts of the course assumed…
Data Science Practice – Classifying Heart Disease
This post details a casual exploratory project I did over a few days to teach myself more about classifiers. I downloaded the Heart Disease dataset from the UCI Machine Learning repository and thought of a few different ways to approach classifying the provided data. “MANUAL” APPROACH USING EXCEL: So first I started out by…
Codecademy Python Course: Completed
I can cross off another item on my Goals list since I finally jumped back into the Codecademy “Python Fundamentals” course and completed the final topics this afternoon. I think the course would be good for people who have had at least an introductory programming course in the past. I didn’t have much trouble with…
Machine Learning Project 4
Immediately after I turned in Project 3, I started on Project 4, the final project in my Machine Learning grad class. The professor gave us a few options, but we could also propose our own. One of the options was learning how to implement Random Forest (an ensemble learning method using many decision trees) and analyzing a given data set, so I proposed using Random Forest on University Advancement (Development/Fundraising) data I got from my “day job”. The professor approved it, so I started learning about Random Forest Classification.