Something has been bothering me about Data Science Central

So, what I’m about to write about actually occurred a few months ago, but I am reminded of it every day when I receive an email from Data Science Central or see someone tweet an article from the blog network (which includes Analytics Bridge, Big Data News, etc.), so I figured if it’s still bothering me, it’s worth writing about.

In April, I saw a post by Vincent Granville, owner of and primary author at Data Science Central, which said something like

One way we attract women and minorities to Data Science Central is to create accounts that post articles with female profiles and photos, which are not actually written by women. Can you use data science to find these 5 faux bloggers? The winner will receive $500.

I have to try to remember the original post and paraphrase here (I’m sure this is not close to the original text, but I hope I am capturing the message), because the post has now been modified to appear as if the contest was only to find the one “example” fake account with a camel avatar (current version here).

However, you can tell that the original contest was different based on the submission by “Alton” on the site, who was nice enough to hold back the names of the accounts he found in case they weren’t decoys, but was clearly trying to find more accounts than just the “camel” decoy. Below is a screenshot in case it gets modified on the site.

[Screenshot: Alton’s comment on the Data Science Central contest post]

My initial response to the original post with the fake female bloggers was, “How many sites do this? Is this sexist? It sure is off-putting for this guy Vincent Granville to post articles under fake accounts, pretending to be a woman or underrepresented minority. Do we actually fall for this kind of thing? Is it a widely accepted practice?” I scrolled down and saw that Cory Teshera had posted a comment questioning the practice, and responding incredulously to the approach as well. I posted about it on Twitter, showing my surprise and asking questions about the approach (including her name and tweets here with her permission):

No one had responded. Then I checked back on the post and saw that Cory’s comments had been deleted! I couldn’t believe that I was seeing a post talking about attracting women to the site, but the first woman to comment on the approach was being silenced!

I found Cory on Twitter and asked whether she was the one that posted and whether she had deleted the comment or the site had, and she responded:

Then I let her know I might blog about it and we chatted a bit via tweets and DMs. At this point, the blog post had been modified to remove all reference to the practice. Neither of us had received responses from the site. The only response was the “silent” deletion of comments and editing of the contest post.

I was curious at this point, and started browsing Data Science Central to see if I could find any of these fake female accounts. I didn’t have to use any data science methods to find one right away. I just looked at the top featured posts, clicked on one with a female avatar, and found this article:
Good and Not So Good Companies for Data Scientists
Here is Amy’s profile: http://www.datasciencecentral.com/profile/Amy
I see “she” is blogging heavily now since I last looked. She lists no last name, so I can’t look her up anywhere else that way, but I did an image search on Google and found: “Amy” Image Search Results
Amy is spending a lot of time posting on all of the Data Science Central network sites. On the Hadoop360 site, she is listed as “Amy Cordan”. At this point, I was still holding out a glimmer of hope that Amy could be a real woman, and looked her up on LinkedIn. I would have been happy to find out she was actually working for Data Science Central as a writer. Oh look! There is an Amy Cordan on LinkedIn who is listed as a “Data Scientist” with a PhD in Computer Science from Stanford! Her photo looks a little different though… she sure has a lot of endorsements… but she only has one experience listed… “Co-Founder for Data Science Foundation”… let’s check out their site… DataShaping.com. Uh… well, this is clearly a dummy site, and the email address is apparently Vincent Granville’s. I did find this “staff” page, which strangely doesn’t list “co-founder” Amy. It appears the whole profile, including the sparse LinkedIn profile page with the Stanford PhD but no experience other than working on a data science blog, is totally fake.

Anyway, you get the point. Amy does not appear to be a real woman. What really got me is that there is an apparently off-topic response to “Amy”’s post (linked above) by Vincent Granville about how “Amazon should hire people to improve security on AWS and deal with fake reviews.” Excuse me, mister… you are replying to a post on your own site, which was written by a fake author, which was probably written by you! How hypocritical.

At this point I was totally turned off from Data Science Central, so if the intent of these fake profiles was to attract women to the site, it definitely backfired for me.

Here are my questions now. How many of the females on the site are actually real? Are there many women and minorities joining the site, and are they influenced by these fake accounts falsely making it appear as if more females are participating than actually are? Is this a common practice among technology networking websites? Does it work? Should we accept it as necessary? Has Vincent Granville made any real effort to ask females to write for Data Science Central?

Do the people endorsing “Amy”’s LinkedIn profile know it is fake? Are they all fake profiles that Vincent Granville created and had endorse each other?

I have so many questions, and am not coming up with many satisfactory answers myself, other than feeling sad and put off by it all. Please let me know what you think!

(P.S. If you’re reading this, Mr. Granville, posts like this that say things like “The first to prove or disprove our conjecture will win $500 and will have his name associated with the theorem in question” aren’t helping you attract any female readers.)

As for me, it has left a bad taste in my mouth, and I’m currently not retweeting anything that I recognize as being from the Data Science Central network, because I just can’t trust anything it produces at this point.

If anyone wants to do any analysis on the site posts, I’m sure there are algorithms out there that can determine whether they’re likely to have all been written by the same author (same typos, style, etc.). I know there is also this analysis tool which is supposed to be able to tell whether a male or female likely wrote a clip of text: Text Gender Classifier (h/t Paul Marks)
The software is of course not perfect, but the text from “Amy”’s short article above came out to 68% likely to be written by a male, while this article you’re reading right now was classified as 65% likely female.
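For anyone who wants to try the "same author" idea, here is a minimal sketch of one very simple approach: compare posts by the cosine similarity of their character trigram counts, a basic stylometric signal. To be clear, this is not how the Text Gender Classifier works and not the method anyone mentioned in this post used; the function names (char_ngrams, cosine_similarity, style_similarity) and the sample strings are hypothetical, chosen only for illustration.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in lowercased, whitespace-normalized text."""
    cleaned = " ".join(text.lower().split())
    return Counter(cleaned[i:i + n] for i in range(len(cleaned) - n + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two n-gram count vectors (0.0 if either is empty)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def style_similarity(text_a, text_b, n=3):
    """Score in [0, 1]; higher means the two texts share more character n-grams."""
    return cosine_similarity(char_ngrams(text_a, n), char_ngrams(text_b, n))


if __name__ == "__main__":
    # Made-up sample strings, not real posts from any site.
    post_a = "Here are the top ten companies for data scientists to work for."
    post_b = "Here are the worst ten companies for data scientists to avoid."
    print(round(style_similarity(post_a, post_b), 3))
```

A high score between two short texts proves very little on its own; a real analysis would need many long posts per account and a baseline of known, distinct authors to compare against.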

I guess I’m surprised at how little “sleuthing” I needed to do to see right through all of this. I didn’t spend hours poring over the site; I clicked on the first article I saw with a female author photo, and researched that author’s profile using Google. It’s practically out there in the open, and since Mr. Granville posted the contest – which has now been edited – to identify these faux bloggers, it appears he wasn’t trying to hide the practice.

And by the way, though Vincent Granville apparently has trouble finding females in Data Science to write for his blog, they do exist and aren’t hard to find on Twitter or LinkedIn. I’ve started following the data science women I find on Twitter using a Twitter list (please suggest more in the comments!):
Women in Data Science Twitter List

Also check out Meta Brown’s “Binder fulla Women in Analytics” posts on LinkedIn!

56 comments

  1. I’m glad I’m not the only one who has had this on their mind. Normally I’m a live and let live kind of guy but it has bothered me too. However, ultimately I think that Vincent has good intentions and that his efforts are helping many, but sometimes the ends don’t justify the means.

    I thought it was a little arrogant to try the community on such a sensitive topic, but as a good data scientist, when I have the time, I’m usually up for a good challenge. From the comments you’ll notice that Vincent agreed to award me the prize. Because I was deemed the winner, I took screenshots to capture the agreement and hold him accountable. I didn’t think they would come in handy, but since he hasn’t shown the willingness to live up to his side of the deal, I don’t feel guilty for exposing his strange practices regarding fake profiles, both on his own network and on several other networks including LinkedIn.

    The screenshots are located at https://drive.google.com/file/d/0B4JAreDAupYgb0ZEcFNMdHM0YU0/edit?usp=sharing
    https://drive.google.com/file/d/0B4JAreDAupYgeG1iYlZzQWxQb28/edit?usp=sharing

    Also, if you are curious: to solve this problem I used a classification method on a feature set of all users of his websites. The feature set included things like time since joined, number of posts, number of likes, favorite website, number of comments, and then the frequency and nature of each of those, including common text analytics like word count.

    Note that I used this feature set to also run another post which was deleted but seems to be cached at google http://webcache.googleusercontent.com/search?q=cache:ju9MLUdF3JoJ:www.datasciencecentral.com/xn/detail/6448529:Comment:162518%3Fxg_source%3Dactivity+&cd=1&hl=en&ct=clnk&gl=us

    Lastly, I can’t find the complete data set that I used, but I do have a portion of it saved for those interested in exploring these users: https://dl.dropboxusercontent.com/u/96237511/recent_DSC_members.csv

    I only spent a couple of hours on this (the bulk of it spent collecting and cleaning the data) so naturally I expected my results to be wrong. Still, here is the list of individuals that I originally found having high probability of being fake according to the training metric and feature set I had.

    http://www.datasciencecentral.com/profile/Amy

    http://www.datasciencecentral.com/profile/Alesia

    http://www.analyticbridge.com/profile/DorothyHewittSanchez

    The one with the camel icon:
    http://www.analyticbridge.com/profile/Titus

    Thanks again for your great journalism and for using best practices while paving the way for many aspiring data scientists.

    Sincerely,
    Alton Alexander
    @10altoids

    1. Hi Alesia!

      The fact that you’re real doesn’t “disappoint” me (unlike Amy, your LinkedIn profile looks believable). I’m curious, have you ever interacted with “Amy”? What’s your take on that account?

      Are you on twitter? I’d like to add you to my Women in Data Science list.

      Renee

  2. I’ve noticed a very similar practice on social media sites although it appears to be more of a ploy to reach younger demographics than necessarily women or people of color.
    It seems to be commonly used as a tool in politics – ie: one side or the other in a particular election will create fake profiles to engage in the political discussions on Facebook.
    Luckily, enough goes into an authentic online presence that aping one realistically becomes pretty difficult after about 2-3 weeks, especially if anyone does even pretty basic Google work to check you out – as you prove here.
    Anything along these lines definitely turns me off whoever does it.

    On a feminist note here, I wonder how many people have tried to contact “Amy” or “Alesia” as speakers for an event and took their inevitable refusal as evidence that it’s hard to find female speakers? I wonder how many people have seen their LinkedIn profiles and took their lack of experience as evidence that women don’t bother to get involved in the field? I am even less cool with the legitimate potential harm to real diversity efforts than I am with someone attempting to manipulate me with a false effort.

    1. I have been active on Analytic Bridge since 2007. I thought his Ning websites might be useful in my career and job searches. I have education in operations research and math and work in quantitative risk analysis. I didn’t care for Vincent Granville’s anti-vaccination attitude, nor his dislike of the US Census Bureau, both of which he stated on Analytic Bridge. These are my profiles on his websites, and yes, I am real, not a Granville figment!

      http://www.datasciencecentral.com/profile/LKW
      http://www.analyticbridge.com/profile/lek

      I was quite angry about the Titus fake profile, as I had interacted with Titus (him?), not realizing. I thought he was real. Now I feel like a fool.

      As for Amy, I became suspicious in March 2013 when I tried to contact her, and realized she didn’t seem to exist. I came to the same conclusion about Granville’s Amy Cordon today. This post was the third search result returned for her name by Google! I am in agreement with author Renee and E Rose. This dishonest behavior by Granville is unprofessional, regardless of whether or not it pertains to women. Titus was an elderly man, I thought. The admin of a professional group shouldn’t deliberately deceive members, then arrogantly play guessing games with identity. Talk about losing trust! That he would do this with women for his absurd reasons, including creating fraudulent profiles on LinkedIn, is contemptible and violates LinkedIn Terms of Service, at a minimum.

  3. Thanks for your comments, everyone. I appreciate your additions to this conversation!

    Ellie, your comment about trust is an important one. I think a key factor that will help bring more women into tech is knowing they can trust the people they’re working with to have their best interests in mind. When a major website in the field is faking female profiles in order to appear more diverse, it breaks that trust and drives women away from what could otherwise be a valuable networking resource.

    I’ve been seriously thinking about starting my own site that can serve as an alternative to Data Science Central after I finish grad school (in the spring).

    1. Renee,

      Please consider joining our LinkedIn Group: About Data Analysis, as a prelude to your introducing your alternative.

      A number of us met on blogs like Data Science Central. What we saw led us to found our own group, About Data Analysis. There are so many founders watching everything that systematic fraud is unlikely. We are open to more founders/managers too.

    2. Renee,
      I put a link to this discussion about Data Science Central into the middle of our discussion about same.
      https://www.linkedin.com/groups/Data-science-versus-statistics-solve-8156839.S.5932985228309069826?trk=groups_items_see_more-0-b-cmr
      By Diego Kuonen

      RE: Amy Cordon
      RESP: We noticed that Amy never answered our emails. Then we found that she has at least three completely different pictures. One is from an Obamacare ad and that girl was named in the media—her name is not Amy.

  4. Thanks for the very enlightening post. What little I have read of Granville/DSC posts has been off-putting at best. He frequently makes bizarre & uninformed generalizations about various disciplines of quantitative analysis (“traditional statisticians don’t like newer machine learning algorithms…” huh???); although more often than not I find myself getting lost in his rambling and incoherent writing style. Hearing that he suggests that companies post fake profiles to attract women/minorities is certainly disgusting, but not surprising. This is just more evidence to stay away from whatever bottles of snake oil this guy peddles.

  5. Niubius,
    RE: He frequently makes bizarre & uninformed generalizations about various disciplines of quantitative analysis (“traditional statisticians don’t like newer machine learning algorithms..” huh?)
    RESP: You are correct. The term ‘Machine Learning’ was coined by statisticians. Statistical ML is the part for analyzing data. IT ML is the other part for managing data. The term ML just describes the mechanism, not the application. Traditional statisticians deal in the application of analyzing data and are masters of the rebranded Statistical ML tool box.

  6. Censorship

    Here is another LinkedIn group that practices censorship: ‘Statistics And Analytics Consultants Group.’

  7. @Nubius :
    “He frequently makes bizarre & uninformed generalizations about various disciplines of quantitative analysis (“traditional statisticians don’t like newer machine learning algorithms…” huh???); although more often than not I find myself getting lost in his rambling and incoherent writing style.”

    +1 I am happy to see that I am not the only one to think that of him.

    1. Sounds like word is starting to get around as more and more people get suspicious of Data Science Central’s content.

  8. I checked Vincent Granville’s background, he does not even have a college degree, just high school. Cambridge, PhD, patents, VC funding – it’s all fake. Maybe he/she/it is fake too, maybe a robot. But you can say the same thing about all of us here, how many are real?

    So why don’t we use real data science to beat him? For now, we are just a little group of complainers who claim we know better than VG, but what about applying our data science knowledge to make Renee’s blog followed by one million people, rather than a couple dozen? That’s the only way to discredit him, Amy and all the fakes. Making a business out of trashing someone else is not the best way to become successful, but what else can we do?

    1. I’m seriously considering starting an alternative to DSC… in which case I will be very happy to get millions of followers! I’ll keep you all updated on that front :)

      (Not likely to happen until at least May when I finish grad school)

  9. Hi Hubart,

    I’m going to produce a list of fake data science profiles. Please contact me at rstanzach@datascience.stanford.edu, I would be happy to read your thoughts about Vincent Granville, Amy Cordon and a few others. Claiming to have a PhD when you don’t even have a college degree is very unethical. These people should be barred from ever earning a college degree.

    Best regards,
    Ronald

  10. My, I’m pretty late to the party on this discussion, but how enlightening!
    I had noticed all the suspicious female profiles that Alton mentioned here (no algorithms required, you can see with the naked eye that they don’t look real), but it never before occurred to me that Vincent would fake profiles, let alone publicly admit it. It’s a shame that he would do that and taint his reputation.
    On another note, thank you for mentioning Meta’s Binder Fulla Women in Analytics! I have a mammoth post on women authors in analytics in the works.

  11. Went looking for his dissertation – couldn’t find it. Maybe I’m not very good at this kind of thing (although I have been fact-checking job candidates in analytics for a number of years now, so I shouldn’t be entirely clueless). Can anyone else find it? His LinkedIn Bio says Facultés universitaires ‘Notre-Dame de la Paix’
    Ph.D., Statistics, Mathematics, Science
    1983 – 1993

    I also looked into the patents he lists. Now, I *know* I’m no good at reading patent documents. I can see the application dates on the patents and “publication date” – 18 months after the application, the US Patent Bureau “publishes” patents – prior to that, the contents are secret. I don’t see any evidence that any of the patents were granted, but maybe I just don’t know how or where to look.

  12. Glad I saw this article before joining the Data Science Central website. It is unethical to attract people using dubious practices, and as one commenter points out, faking profiles on LinkedIn violates their terms and conditions. This makes Data Science Central and its founder nothing other than a dishonest commercial website trying to trick people into buying their products.

  13. I read this thread with growing personal interest. I’ll explain…

    Years ago, I was surfing Craigslist (w4m) and came upon a posting that had a photo of a luscious babe – self-described as a “fractal expert”. Fractals were big in the nineties.

    I wrote to the poster, and received a reply from “Amy Cordan”. However, I was never able to raise up a subsequent response from “her”. I did some web surfing and came up with a link of some sort indicating that “Amy” worked for one Vincent Granville….

    Soooo… being slightly psychotic, I found Vincent’s home phone number up there in the northwest, and I called him to inquire about Amy. He became instantly and intensely agitated, and yelled into the phone “Don’t ever mention her name again to me!” and hung up.

    This was all a long time ago, maybe ten years or so…

    I still have a copy of the fractal, and a sexy photo of Amy in my Temp directory…

    Write to me if you want to see them.

    I also count both Vincent and “Amy” among my LinkedIn contacts. I wonder if Vincent will figure out which one I am. I know that, if he does, Amy will too.

    With odds of .002%, the exercise will prove just how sharp he is.

  14. I can’t believe my attention has been so intensely hijacked by this thread.

    The result of my Sherlocking is:

    1) There is a real Amy Cordan. Her present name is Amy Henriques and you can find her on LinkedIn. You can also verify her other surnames with a simple veromi.net search (specify Pennsylvania).

    2) Paris Granville is married to Vincent. You can find her on LinkedIn as well, listed as working for the Washington Office of Superintendent of Public Instruction. Interestingly, though she does not mention that she is also employed by her husband’s company, she does show that Amy Cordan has endorsed her French and other skillsets.

    3) The connection I have discovered is that both Amy and Paris studied French at the University of Northern Iowa at the same time (1994 – 1998). There are other nuances involving the French connection, but that one grabs me best.

    So what do we have here, folks? Did Amy have her name appropriated by Vincent or Paris? Or does Amy have a second, hidden identity? I doubt the latter given that Amy’s entire formal education is in the foreign language arts.

    It is all almost as intriguing as it is stupid. I feel my interest waning as my mattress beckons.

    I’ll check back to see if anyone contributes an epilogue… or epitaph… or even some epinephrine if you have any.

    1. It’s hard to imagine someone wouldn’t know that a contact is using their former name to create a public profile… I see her LinkedIn profile is pretty sparse, though, so maybe she just isn’t on the site much and isn’t aware of the other.

      She’s actually in my “day job” area of work (development/fundraising), so maybe I’ll contact her and find out whether she knows…

  15. Wow, sorry, just logged in to approve comments and didn’t realize interest in this article had kind of taken off of late!

    Thanks for your research, everyone. This is… interesting!

  16. I have avoided DSC and the other related sites as well as Mr. Granville due to the lack of integrity and cloud of suspicion about the content there. Interesting to find this and learn of the practices being used on the site. Very interesting.

  17. I regularly accidentally open articles on Data Science Central because someone will share something on twitter and I’ll click through the short URL without realizing what I’m clicking on.

    Did that today and it was a post by “Amy”. Apparently her handle on DSC is now “Data Science Girl”. Ugh.

    Well, maybe Mr. Granville saw us calling him out for the fake LinkedIn profiles and such and decided to anonymize “Amy”.

  18. Made the “mistake” of googling myself. This is intriguing and disturbing given that I once was close friends with Vincent Granville’s wife.

  19. I happened across this thread after Googling “Vincent Granville fraud”! I too have the same opinion of DSC as most of those who have responded here. My primary beef with him, however, has been around his book. While I’ve never read it, it has HORRIBLE reviews on Amazon (which is the primary reason I’ve never read it), but what really stands out to me is the fact that he was offering to pay people to write positive reviews about his book (on Amazon), which is blatantly unethical (and, to their credit, many people called him out on it!).

  20. I decided to post this here instead of creating a new post so it’s in context with everything else we have all said above.

    I’m not going to add any comment other than to say that we hadn’t heard a response from Vincent Granville about all this (except for suspending my Data Science Central account), until now…

    thanks to a reader for telling me about this facebook comment he saw by Vincent Granville (on his Analytic Bridge account) about me:
    https://www.facebook.com/burton.lee/posts/10153315598360971?comment_id=10153316728250971&offset=0&total_comments=5&comment_tracking=%7B%22tn%22%3A%22R1%22%7D

    and in case it gets edited/deleted, here is a screenshot:
    https://www.becomingadatascientist.com/wp-content/uploads/2015/07/vincent_granville_data_science_renee_comment_single.jpg
    in context:
    https://www.becomingadatascientist.com/wp-content/uploads/2015/07/vincent_granville_data_science_renee_comment.png

    1. I wanted to make sure I had permission to thank him publicly before posting, but now I can: Thanks Gerard de Melo for alerting me to this post on Facebook!

      (which has since been deleted, so see the screenshots)

  21. The guy is extremely disturbing.

    It’s super unfortunate that he moderates the very popular LI group: Advanced Business Analytics, Data Mining and Predictive Modeling

    As soon as you challenge his ideas – comment or thread deleted.

  22. Update: “Amy”’s Data Science Central and Analytic Bridge profiles have been deleted, the fake LinkedIn profile has been modified to remove all references to Amy Cordan or a Stanford PhD, and it looks like they deleted all of “her” 100+ blog posts!
    https://www.becomingadatascientist.com/2015/07/08/the-data-science-central-incident/#comment-1909

    I would consider this confirmation that she was indeed a made-up person, and also say this is a step in the right direction for Data Science Central. I hope they did the same with any other faked accounts.

  23. All very shocking and disconcerting. Maybe sites like LinkedIn should require an educational degree and background verification of hosts of groups.

    1. Based on a LinkedIn account with 1 item under experience, location in Belgium, and 0 connections, plus “contact us at [his email address]” in one of “her” DSC posts, I’m inclined to agree. I see recent usage, too.

  24. Someone close to me read an article from DSC, not knowing it was from that site because it was on Flipboard, and decided to show it to me because he thought it was ridiculous that the “resident data scientist” would question the author’s analysis and approach at the top of the article before you even read what she wrote.

    It turns out, it was another Laetitia Van Cauwenberge post. She could be a real woman, but if she is, she’s just acting as a front/editor for Granville’s writing. Every article she writes (which scrolling through top posts appears to be daily) is basically just quoting VG and linking to his other blog posts and papers.

    They posted the author’s photo as if she was writing for DSC (after questioning her approach and analysis). What’s worse? There were actually 6 authors on the original article and it was a tutorial from a coding school. So, it wasn’t even an analysis, it was an example of how to implement kNN. I suspect he just had to call it out because he thought it was a woman writing it. And you know what? It was a good article. I’m going to be sure to share hers on twitter, and let the coding camp know DSC posted it that way.

    Original:
    http://blog.cambridgecoding.com/2016/01/16/machine-learning-under-the-hood-writing-your-own-k-nearest-neighbour-algorithm/

    DSC re-post:
    http://www.datasciencecentral.com/profiles/blogs/k-nearest-neighbor-algorithm-using-python

  25. I became interested in the topic because I joined Data Science Central more than 5 years ago and liked some of the posts. I focused on finding his publications. A journal search found his article in the Journal of Number Theory (dating back to 1988 and confirming his previous affiliation with a university in Namur, Belgium) and on IEEE, but I was unable to find any paper in the Journal of the Royal Statistical Society: Series B. I found 17 documents in Scopus, ISI and Wiley; more than 5 of them at symposiums (i.e., not full papers). Far from his alleged “40 papers in statistical journals”… at least far from 40 papers in high quality statistical journals. Really disappointing. A list of such publications should be public due to all the concerns raised in this blog.

  26. Taken from KD Nuggets’ article ‘Top Datapreneurs’
    http://www.kdnuggets.com/2015/09/top-datapreneurs-data-science-analyticsvidhya.html/3

    The commentary inside brackets is mine:

    Vincent Granville (AnalyticsBridge, DataScienceCentral)

    “Vincent co-founded AnalyticsBridge, DataScienceCentral in 2011 and claims to perform…

    [article continues describing both companies and then in the second paragraph, goes to describe his background which ends with…]

    “His area of expertise lies in analytics, big data and data science.”

    Which I think should be now changed to:

    His areas of expertise: lies, analytics, big data and data science.

  27. Very interesting information. One of my students asked about him and his website. His writings seemed, let’s say, odd, and so I poked around a bit more.

    I may have missed this tidbit in postings from some other folks, and if so I apologize, but I do find it interesting.

    In one area Granville lists these accomplishments

    Post Doctorate at Cambridge University and U.N.C. Chapel Hill.
    Ph.D. in Mathematics, University of Namur, Belgium, 1993, summa cum laude.
    Laureate of the Belgian Olympiad of Mathematics.

    and specifically states extensive post-doctoral work with Dr. R. L. Smith, both at Cambridge and at U.N.C. Chapel Hill. However, nothing in any of Dr. Smith’s publications or c.v. has any mention of Granville.

    Those discrepancies, together with his extremely uninformed (and much worse, with regard to women and people with disabilities) views on vaccinations, have led me to advise students to avoid his work and articles.
