Thursday, June 2, 2016

Taxpayer-funded inaccuracies: another data story.

Ever since the dawn of mankind, decisions have been made for a variety of reasons, e.g. leisure, work or sheer survival. And, as humans are prone to error, wrong decisions are made, too: choosing the wrong partner for a venture, taking a longer route on your daily commute or simply regretting a dish ordered at a restaurant.

Most of the time, wrong decisions are made on the basis of incomplete, inaccurate, non-existent or even false information (e.g. the Iraq wars). Nonetheless, there are times when data are improperly presented, even though they are accurate per se. Data published by government institutions are no exception to this rule, as you will see in the next picture.

Number of convicts admitted to penitentiary centers due to felonies of the common regime in Mexico by state - Year: 2014 (source: http://www3.inegi.org.mx/sistemas/mexicocifras/default.aspx)
There are a couple of things that should be highlighted here:

  • Color scale: although the color scale goes from an almost blood red to a sky blue, it does so in a discrete (i.e. in predefined ranges) and irregular way, leading to:
    • Confusion: a state near the lower bound of a color range and another state near the upper bound of the same range will be portrayed with the same color.
    • More confusion: as the chrominance of the color scale is not determined directly by the number of convicts, the color scale becomes less intuitive and harder to interpret.
  • Lack of context: even though the number of convicts of each state is accurate, it leads to merely informative insights rather than actionable ones. This is because it is common sense that states with larger populations will tend to have more convicts.
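
The "same color for very different values" problem can be illustrated with a minimal sketch. The range boundaries, color names and convict counts below are invented for illustration; they are not the ones used in the INEGI map.

```python
from bisect import bisect_right

# Hypothetical, irregular range boundaries (NOT taken from the INEGI map).
BIN_EDGES = [500, 2000, 10000]
BIN_COLORS = ["sky blue", "light orange", "orange", "dark red"]


def discrete_color(convicts: int) -> str:
    """Return the color of the predefined range that contains `convicts`."""
    return BIN_COLORS[bisect_right(BIN_EDGES, convicts)]


# Made-up counts for two imaginary states that fall in the same range.
low_state, high_state = 2100, 9900
print(discrete_color(low_state), discrete_color(high_state))
```

Under these assumed ranges, a state with 2,100 convicts and one with 9,900 get the same color, although the second has almost five times as many.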

The consequences of this map being released to the public may be:

  • Inability to interpret the map correctly (or to interpret it at all);
  • Opportunity for policy makers to turn a blind eye to marginal states and overstate government action in the most populated states.

In order to counter the aforementioned issues, these could be possible actions:

  • Use relative numbers instead of absolute ones. By using, for example, the number of convicts per 100,000 inhabitants, a more realistic picture of the convict situation in each state will emerge, leading to better interpretations and, thus, decisions.
  • Use a continuous color scale instead of a discrete palette. A continuous color scale (e.g., a "rainbow" one) will make the map more useful, as the color will be directly related to the number of convicts.
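
Both suggestions can be sketched in a few lines of Python. The state names and figures are made up for illustration, and the color scale is a simple two-color gradient (sky blue to red) rather than a full "rainbow", just to keep the example short.

```python
def rate_per_100k(convicts: int, population: int) -> float:
    """Convicts per 100,000 inhabitants."""
    return convicts * 100_000 / population


def continuous_color(value: float, vmin: float, vmax: float) -> tuple:
    """Linearly interpolate between sky blue (vmin) and red (vmax).

    Returns an (r, g, b) tuple; values outside [vmin, vmax] are clamped.
    """
    t = (value - vmin) / (vmax - vmin) if vmax > vmin else 0.0
    t = min(max(t, 0.0), 1.0)
    low, high = (135, 206, 235), (200, 0, 0)  # assumed endpoint colors
    return tuple(round(a + (b - a) * t) for a, b in zip(low, high))


# Made-up figures: (convicts, population) for two imaginary states.
states = {"State A": (9000, 15_000_000), "State B": (1500, 1_000_000)}

rates = {name: rate_per_100k(c, p) for name, (c, p) in states.items()}
vmin, vmax = min(rates.values()), max(rates.values())

for name, rate in rates.items():
    print(name, rate, continuous_color(rate, vmin, vmax))
```

In this invented example, State A has six times as many convicts as State B in absolute terms, yet its rate per 100,000 inhabitants is lower, so the ranking of the two states flips once population is taken into account.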


Friday, March 18, 2016

A (rather funny) Data Science story.


(At Lana's apartment) 
  • Lana: I like Roger. I think he could make me happy. (Hypothesis) 
  • Steph: No way. He’s a professional cheater, I am pretty sure about it. (Another hypothesis) 
  • Lana: I am so confused… What's more important: his loyalty to me or his ability to make me happy? (Data Science’s foremost concern: Are we asking the right question?)
  • Steph: Bah… what's happiness anyways? It's in the eye of the beholder. Cheating, instead, is something that is “black and white”: either he cheats on you or he doesn't. (Every Data Science question should be specific) 
  • Lana: Let’s stick to the basics: what would upset me the most is him flirting with other girls; I will dump him if I catch him doing that. (Refining the question)
  • Steph: I have a great idea! I will ask Gloria about how to determine whether a man is a womanizer or not. (Model selection) 


(WhatsApp conversation)
  • Steph: Wussup, Gloria! How do you determine if a man is a womanizer or not? (Consulting an expert) 
  • Gloria: Womanizers do party a lot, and they wear very fancy clothes. Also, they tend to be tall people. Trust me, it's been years of dating a lot of guys, hun. (Domain expertise)
  • Steph: That seems pretty reasonable, except for the height; it just sounds crazy! Thanks a lot! (Variable selection) 


(At Lana’s apartment -again-)
  • Lana: Steph, have you figured out a way to determine if he’s a womanizer or not? Today, he’s attending his sister's wedding reception, and there are going to be a lot of girls; I don't want to be seen as the “poor victim”. (Business asking for results)
  • Steph: Calm down, Lana! I did figure out a way to do that. You know, Roger is a fancy guy, and does party a lot, so he surely is a womanizer. I knew it, I knew it, I knew it! (Data Insight) 
  • Lana: Do you really think that he's a womanizer? I don't think so. I need a firmer ground to stand on. (Skepticism about findings) 
  • Steph: I’ve asked Gloria, who is an expert on these matters, and I've got her feedback. Also, I've added some ideas of my own and experiences I’ve heard about in the past, so I can't be wrong. (Model training) 
  • Lana: Have you asked several people about this? Just make sure this is not the opinion of only one or two people. (Statistical power) 
  • Steph: I would bet my honor on that without any fear! In fact, I will go and crash the wedding, and thus I will demonstrate that my way of thinking is accurate. See ya! (Model validation)


(At Roger’s sister’s wedding)
  • Steph: What a filthy animal! Roger is dancing with that girl, and he’s speaking right into her ear, feeling so confident and comfy… And look at her! I can't believe it! (Model in production before proper validation) 
  • Neighbor in the next chair: She's another sister of his. (Statistical evidence is not enough to reject null hypothesis) 
  • Steph: Oops… (Type I error) 


(At Lana’s house -with an XXXL hangover-) 

  • Steph: I've got to tell you that life is unfair. (Bad result communication) 
  • Lana: Say what??? (Message is unclear for intended audience) 
  • Steph: Can't you get my point? I thought Roger was a womanizer, but I've found out he does not really seem to be one. (Model not valid for deployment) 
  • Lana: Are you kidding me? I dumped him as soon as you told me what you thought… (Making wrong, rushed decisions due to a wrong Data Science process) 

Yikes! (Source: https://letsalldogood.files.wordpress.com/2015/12/yikes-4.png)