DataSciGuide Update
I finally had a chance this weekend to make some progress on my “Data Science Directory” website, DataSciGuide.com, and I would love your feedback on it! That site isn’t open for comments yet, so I’m directing people to leave feedback here. If you haven’t kept up with the development of DataSciGuide, here are a few things to read: the original vision for the site, updates on my progress, and the content that has been posted so far. Let me know if you want an account to post some reviews while I test things out! (I’ll even post content that you want to review, just for you.) Also, tell me any thoughts you have about the site in the comment form below! (or tweet...
The Imitation Game, and the Human Element in Data Science
Last night, my husband and I watched The Imitation Game. First of all, it’s a great movie and you should see it. Secondly, there was a moment that got me thinking about the human element of machine learning.
[Spoiler Alerts – but you probably already know much of the story, and the movie is still good even if you know the historical outcome.]
I thought a moment like this might be coming when Alan Turing was first applying to work at Bletchley Park, and Denniston can’t believe he’s applying to be a Nazi codebreaker without even knowing how to speak German. Alan emphasizes that he is masterful at games and solving puzzles, and that the Nazi Enigma machine is a puzzle he wants to solve. He starts designing and building a machine that will theoretically be able to decode the Nazi radio transmissions, but the decoder settings change every day at 12am, so the machine must solve for the settings before the stroke of midnight every day in order for the day’s messages to be decoded in time to be useful and not interfere with the next day’s decoding process. Turing can’t prove his machine will work, because it is simply taking too long to solve the daily puzzle. In the meantime, people are dying in the war, and the Nazis go on transmitting their messages over normal radio waves, believing the code is “unbreakable”.
My “Secret” Side Project, Revealed
OK, so I was actually hoping to show this to you all long ago, but I kept coming up with more and more ideas for it, so it’s not going to be “ready” to reveal for a while. I figured I’d go ahead and show it to you anyway. My main motivation is that I keep hearing people say (and sometimes feel myself) that learning to become a data scientist on your own using online resources is totally overwhelming: there are so many different possible topics to dive into, few really good guides, lots of impostor-syndrome-inducing posts by people you follow that make you feel like they’re so far ahead of where you are and you’ll *never* get there…. but there’s so much great data science learning content online for everyone from beginners to experienced data scientists! We need a better way to navigate it. Hence my new website: “Data Sci Guide”. It will eventually have a personalized recommender system and structured learning guides and all kinds of other features to help you find the resources to go from where you are to where you want to be, but for now it’s “just” a directory / content rating site. And it’s not ready for you to interact with yet, but it’s getting there, and I’ll need your help fleshing it all out soon. So go take a look! Then come back here to give me feedback and suggestions, because you have to be registered to comment there and I haven’t turned on new user registration yet. OK go now. Don’t forget to come back! >>>> DATA SCI GUIDE.COM <<< So…. what did you think? What do you think of the overall idea and plans? What should I be sure to remember to include? Tell me below!...
Entry Level Data Analyst Skills
Between an interview from a local TV station about my job and going through the process of hiring someone onto our team, I’ve been thinking about what would be the bare minimum skills someone would need to have a chance at being hired as a data analyst. Maybe this would be a helpful list for someone trying to change careers and trying to decide where to focus their learning time. I posted this picture on Twitter: and got some interesting responses:
@BecomingDataSci I'd include familiarity with business process in one of those columns. Can't analyze in a vacuum,. — Karen Clark (@clarkkaren) July 17, 2015
@BecomingDataSci @aflyax You've got analytical thinking & problem solving. Maybe add "adaptable to a variety of environments" as generic? — Karen Clark (@clarkkaren) July 20, 2015
@barbarafenton i mentioned that as a misconception! i spend a lot more time communicating than most people think — Data Science Renee (@BecomingDataSci) July 17, 2015
@DataSkeptic yes i think that's important, but you can get an entry level job w/just basic charting skills. was trying to keep to minimum. — Data Science Renee (@BecomingDataSci) July 17, 2015
@BecomingDataSci so e.g. "SQL" could be "data manipulation skills (e.g. SQL)" – don't get hung up on a specific tool to to the job! 2/2 — Martin Monkman (@monkmanmh) July 17, 2015
@BecomingDataSci This is great! My ready-fire-aim data science side says to add "asking forgiveness is easier than permission" to traits :P — Shannon Quinn (@SpectralFilter) July 17, 2015
@BecomingDataSci I'd add : autodidact — craig pfeifer (@aCraigPfeifer) July 17, 2015
What do you think? I’ll revisit this topic later, and I’ll also post about the conference I’m attending (APRA Data Analytics Symposium) when I have a chance to summarize. For the moment, heading back to the...
The Data Science Central “Incident”
I’m writing this post to respond both to what many of you saw Vincent Granville say about me on Facebook a couple of days ago, which was brought to my attention yesterday: (in context) and to his apology this evening: I didn’t want to write a second post about Data Science Central, but after the huge response on twitter today, I want to document everything in one place so anyone looking back at this has all of the info to evaluate what has been said. I have thought a lot about Vincent Granville’s apology this evening, and honestly when I heard he had apologized, I hoped (but doubted) it would be sincere. I would have loved to be able to accept his apology and move on from all this. However, I can’t bring myself to accept the apology because it’s not really an apology, it’s an accusation. After writing a truly vile post about me, his “apology” accuses me of harassing *him*. He says that I have “attacked” him for 14 months, and is casting himself as a victim. He’s basically saying “I acted a fool in a heated moment because she’s been attacking me non-stop for over a year” (the heated moment apparently being a Facebook post about Ellen Pao that reminded him of me, and the “attacking” being me pointing out his questionable practices in a blog post and on twitter, I guess). Because of that, and because there are a lot of people who are *actually* harassed online who I think would be offended by his characterization, I want to document everything I’ve said about him, and challenge his definition of harassment. What I have done is document what I saw as some very questionable (if not unethical) behaviors, and occasionally initiate or participate in conversation on twitter about that. I have never “attacked” him in any way, but I want to leave it up to you readers to decide.
Here is the history of my comments about Data Science Central and Vincent Granville:
April 21-22, 2014: Initial twitter conversation with @tesherista about Data Science Central’s contest to find fake accounts created to attract women and minorities to Data Science Central, where Vincent Granville deleted Cory’s comments questioning the practice. (screenshots by @AltonDataSci)
@tesherista went back to refer to @DataScienceCtrl. were your comments deleted?? — Data Science Renee (@BecomingDataSci) April 22, 2014
July 1, 2014: Original blog post “Something has been bothering me about Data Science Central” here on Becoming a Data Scientist, where I wrote about the above experience, as well as exposing one of the fake Data Science Central profiles “Amy Cordan” as having a fake LinkedIn profile (still there as “Amy Sangrene”) with a fake Stanford Computer Science PhD, violating LinkedIn terms & conditions. (He mentioned me questioning his advanced degrees in his “apology”, and this is the only academic credential I have brought under scrutiny, that of “Amy”.) In response to this post, I received the following comments from readers (among others; you can see them at the end of the post linked above):
Alton discussing his negative experience with the Data Science Central contest
Ellie talking about losing trust in DSC when she tried to contact “Amy” and realized she wasn’t real
Hubart and David questioning his academic background (maybe this is why he thought I did? because commenters on my post did?)
“System Administrator” recalling another use of the name Amy Cordan by Vincent Granville online in the past
Eric mentioning he found that Vincent Granville was accused of paying people to write positive Amazon reviews of his Developing Analytic Talent book
A comment by someone who claims to be the “real” Amy Cordan (Henriques) and used to be close friends with Vincent Granville’s wife
July 1-5, 2014: Twitter conversation with @altondatasci following the blog post above, as well as an explanation for why I wrote the post:
My hope in writing this post is that @DataScienceCtrl will reach out and hire some real female and minority data scientists/writers. — Data Science Renee (@BecomingDataSci) July 2, 2014
October 26, 2014: Tweet to @kissmetrics (and conversation following) alerting them that Amy was a fake profile.
June 8-10, 2015: Tweets after the real Amy Cordan commented on my blog, conversation between @ellieaskswhy, @metabrown312, and @tesherista on Twitter about fake Amy and deleted comments. Follow-up tweets warning people again about what we had found, and talking about the suspended accounts.
June 24, 2015: Tweets about finding out my Data Science Central account was suspended.
Here is every tweet I've ever mentioned @DataScienceCtrl. I have never contacted him otherwise. https://t.co/Er0ndmlbZO — Data Science Renee (@BecomingDataSci) July 7, 2015
Throughout all...