Garbage in, garbage out: Irresponsible use of trafficking data

It takes all kinds in the gravy train of trafficking research, so I shouldn’t be surprised that newcomers to prostitution and sex-industry issues jump on with a statistical model attempting to prove that the Swedish anti-prostitution law works. They made this thing known after the government published its methodology-and-evidence-free evaluation of the law criminalising the buying of sex.

Niklas Jakobsson and Andreas Kotsadam, of the University of Gothenburg, did it on a blog, with ‘The Law and Economics of International Sex Slavery’, a working paper – a term academics use when they haven’t yet published an article in an academic journal. Journals send contributors’ submissions out to be reviewed by people in the same field; the process, called peer review, is usually double-blind, which means neither writer nor reviewer knows the other’s name. This is not always required with a university-published ‘working paper’ (I don’t know whether it was carried out with this paper or not).

The authors engaged briefly with me, Louise Persson and others on Niklas Dougherty’s blog, shortly after Louise and I published an article critiquing the government’s evaluation in Svenska Dagbladet. Niklas queried some of the information claimed by the authors, pointing out the egregious error they committed when accepting erroneous Danish figures on street prostitution – data that was debunked in the Danish parliament last year as well as in the media more recently. I find it inconceivably irresponsible that researchers desiring to present themselves as ‘scientific’ would use data known to be false.

On Niklas’s blog (see comments), I confronted the authors for failing to recognise that the ‘data’ they claim to be using is inherently faulty and therefore unusable. I said:

It’s a fantasy to think you can talk about ‘data’ when there is no agreement about who is to be counted. Some counting projects call all women migrants who sell sex trafficked. Others call all undocumented migrants trafficked. Some call all women who sell sex trafficked. The numbers come from small NGOs and police departments who use different definitions and often admit to being confused.

I also take exception to being given evidence from tiny, super-homogeneous places like Bergen (Norway). Nordic research is about very small places with recent, short histories of in-migration, undocumented migration being even smaller. It is misleading and silly to compare ‘data’ from such sites with whole large countries with long and varied migration histories.

The defensive (and inexperienced) response was to accuse me of being anti-science. This is nonsense. The principle here is known everywhere as Garbage In, Garbage Out: it doesn’t matter how pretty your statistical model looks or what a fancy machine you have to crunch the numbers in if the original information you put in is rubbish, and I am far from the only one to think so. The ‘science’ we want to see is honest.

Here is the peer review the authors would have received had their working paper been sent to Paula Thomas, mathematician and statistics analyst in the UK (and if you are cowed by the language, look at the final paragraph).

Comments on The Law and Economics of International Sex Slavery

1. The vector X_i

Only indicative information is given as to what this is. We are told (p12) that it includes population, GDP, migration share (is this immigration only?), heroin seizures and a measure of the rule of law. It would appear that there were other things in the vector but we are not told what they are.

But the main weaknesses here are threefold:-

(a) The use of categorical data

Categorical data is, in my view, dangerous, because:-

(I) It imposes value judgements.

(II) More seriously it obscures the extent of a problem whilst appearing to clarify it.

Point (II) is best illustrated using crime figures. London’s Metropolitan Police Service has an excellent crime mapping system. However, it does have some weaknesses, and the one that is relevant here is its use of categorical data (fortunately this is mitigated by the use of actual figures as well). My own area went from above average for residential burglary in May 2010 to low for June 2010 on the basis that there were 3 fewer crimes! Do I need to say more?
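To make the point concrete, here is a minimal Python sketch. The band thresholds and the monthly counts are invented for illustration; they are not the Metropolitan Police’s cut-offs or figures. The only thing taken from the example above is a drop of 3 crimes moving an area from ‘above average’ to ‘low’.

```python
# Hypothetical band lower bounds: NOT the real cut-offs used by any police force.
BANDS = [(0, "low"), (11, "average"), (13, "above average")]

def band(count):
    """Return the label of the highest band whose lower bound is <= count."""
    label = BANDS[0][1]
    for lower, name in BANDS:
        if count >= lower:
            label = name
    return label

# Invented monthly burglary counts that differ by only 3.
may, june = 13, 10

print(may, band(may))
print(june, band(june))
```

With bands this narrow, a change of 3 crimes jumps two categories, while the raw counts show how small the change really is.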

(b) The lack of any attempt to model (Delta Trafficking), that is the change in trafficking over time.

(c) The lack of any clarity regarding how the weighting variables beta_0 and beta_1 were chosen. In particular doubt must surround beta_1 as it is a single weighting for a whole vector and the elements of the vector have different units, so some dimensional analyses should have been performed.

It would be most helpful if there was a proper ‘methodology’ section explaining the processes used to get the results quoted.
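Point (c) can be illustrated with a short Python sketch. The two covariates and all their values are made up, and z-scoring is only one common way of putting quantities with different units on a shared, unitless scale; nothing here is claimed to be what the authors did. The sketch shows that when a single weight is applied to unscaled values, the variable with the largest numbers swamps the rest.

```python
from statistics import mean, pstdev

def zscore(values):
    """Rescale values to mean 0 and (population) standard deviation 1."""
    m = mean(values)
    s = pstdev(values)
    return [(v - m) / s for v in values]

# Hypothetical covariates for four imaginary countries.
population = [5.5e6, 9.4e6, 16.6e6, 62.0e6]         # people
gdp_per_capita = [56_000, 49_000, 47_000, 36_000]   # currency units

beta_1 = 0.5  # a single, arbitrary weight applied to everything

# Unscaled: the sum is almost entirely population; GDP barely registers.
raw = [beta_1 * (p + g) for p, g in zip(population, gdp_per_capita)]

# Scaled: both covariates now contribute on a comparable footing.
scaled = [
    beta_1 * (p + g)
    for p, g in zip(zscore(population), zscore(gdp_per_capita))
]

print(raw)
print(scaled)
```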

2. The model used

The model is a logistic regression model, the normal formula for which is:-

z=beta_0+sum(from i=1 to n)(beta_{i} x_{i})

Normally this model only applies where the data are modelled by a binomial distribution.

One question then must be is the data here binomially distributed? This is for the originators of the report to justify.

I also notice the use of a ‘normally distributed error term.’ What error is this term expressing? And how?

Another point is that the variable ‘z’ is not used directly. The probability calculation is the standard logistic function:-

f(z)=1/(1+e^(-z))

Which indicates the blindingly obvious point that trafficking is not an appropriate z.

The event whose probability the variable z gives must be a yes-or-no event. Since trafficking is only yes/no on an individual basis (i.e. the level of trafficking is not yes or no), the model is suspect.
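As a rough illustration of this point, the Python sketch below computes z from the linear formula given earlier and passes it through the standard logistic function 1/(1+e^(-z)). The coefficients and covariate values are invented and the function names are my own. Whatever value z takes, the result lies strictly between 0 and 1, so it can only be read as the probability of a yes/no event, not as a level or a count.

```python
import math

def linear_predictor(beta_0, betas, xs):
    """z = beta_0 + sum of beta_i * x_i."""
    return beta_0 + sum(b * x for b, x in zip(betas, xs))

def logistic(z):
    """Standard logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

# Invented coefficients and covariates, for illustration only.
z = linear_predictor(1.0, [2.0, -1.0], [0.5, 3.0])
p = logistic(z)

print(z, p)
```

A count of trafficked people could be 0, 40 or 4,000; the output here can never leave the interval between 0 and 1.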

Reviewer’s probable advice to journal: Article not publishable without major revisions.


11 thoughts on “Garbage in, garbage out: Irresponsible use of trafficking data”

  3. Iamcuriousblue

    Thank you for making this point, because it’s something I try to point out again and again in discussions about sex work and pornography. One very common feature of the opponents of these things (the “antis” as many of us refer to them) is that they love to spout statistics without reference to where they came from, how they were derived, or their applicability to the phenomenon they’re addressing. A notable example of this is the often-repeated statistic that “the average age of entry into prostitution is 12 years old”. The actual paper this was taken from was a study of 13-17 year old prostitutes and not representative of prostitutes more generally. And as an example of the slippage that takes place, the religious anti-trafficking group The Defenders is now claiming that the average age of entry into pornography is 12 years old.

    The thing is, no matter how impressive the statistical model you use, the “garbage in, garbage out” principle always applies. If your sampling and/or raw counting is off-base, no statistical model in existence is going to fix that. “Look at the r-values!” simply isn’t an answer.

    It looks like the working paper you discuss suffers from this problem, plus it looks like there are additional problems with the statistical modeling itself. I’ll note that the model versus distribution problem is a subtle but very real one; some statistical models can only be used with normally distributed data, others with binomial distributions or Poisson distributions, and still others for ranked data (as opposed to numerical data). Use the wrong model and you don’t get real numbers.

    There have been many recent papers coming out of the anti-prostitution/ anti-pornography camp that just don’t hold up. One I’ve been hearing about a lot lately is a content analysis of “aggression” in mainstream pornography by Wosnitzer, et al. (Mostly, this has been circulated in the form of a non-peer reviewed conference paper, though a version of it was later published in a peer-reviewed journal.) Besides using a questionable measure of “top-renting” pornography, the actual measurement of what constituted “aggression” was extremely subjective and I think the anti-pornography conference video where the authors of this paper discuss this work says a lot about their biases going into the project. I think the section starting at 37:28 is particularly telling, where one of the coders who actually viewed and scored the films presents her experiences. Clearly, she is somebody who holds the strong anti-pornography views of her mentors, and this person’s subjective views clearly are reflected in the numbers the papers report. I have little doubt that if the exact same methodology was followed using the exact same videos, but using coders that were comfortable with pornography, the numbers would be very different.

    But, once again, this study creates lots of exact-sounding numbers that will be endlessly told and retold, with little regard for how they were actually derived. That creates a situation where the burden falls on critics of these statistics to deconstruct the numbers offered, and from a rhetorical point of view, this creates the problem of being perceived as denialists, etc.

  4. Laura Agustín

    thanks very much for this comment and the links, which i am sending on to some beleaguered folk i know who study pornography.

    using data produced by a student/advisee who is meant to ‘score’ film action scientifically seems particularly bad. so much of this research comes down in the end to someone’s personal feelings.

    the trafficking data suffers from this when social workers or police officers classify people, often based on a single hurried interview in completely non-scientific or non-controlled environments. it’s not the only problem with the so-called data, unfortunately, but it is there.

  5. Kris

    Iamcuriousblue, I think that if you just view the average porn movie and judge for yourself, pornography is pretty rough, isn’t it? I mean, the hardcore anal sex, gaping anal holes, ass-to-mouth, etc. That’s mainstream nowadays; see for yourself. Luckily, lately I don’t have the urge to view these movies as I did in the past. I surely concur with the belief that pornography is violence.

  6. Nathan Shachar

    If there existed a solid empirical base from which to generalize on this touchy subject, there would be no need to couch this hypothesis in formal language. There is nothing in the functions and equations presented by the two gentlemen which cannot be expressed in quite plain ordinary language. The problem to overcome is not formalization of the argument, which serves no purpose here, but the gauging of data as relevant or not – if there are any clear-cut data at all. (What a common move this has become, alas, in the social sciences of today!)
    Good for the Swedes, as they wrestle with this huge and un-Swedish challenge – to keep a cool head while discussing sin and morals – to be able to count on Laura Agustin, the world’s leading authority on the subject,
    Nathan Shachar

  7. Iamcuriousblue

    No, Kris, I don’t buy the “pornography is self-evidently rough” argument for several reasons. First, define what “the average” porn movie is, because porn has *a lot* of subgenres and they are not all the same. Some very much do feature a lot of sexual aggression, even mock violence when you start getting into the BDSM and rough sex genres. However, many genres of porn are not at all like this. Gross generalizations about the content of “mainstream porn” are something that the anti-porn movement seems to traffic in, and it’s something that really contributes a lot of obfuscation to the debate around pornography.

    Also, there’s a failure to understand aggressive sex in context where it is shown in porn. Some, like the kind of movies Max Hardcore was making, really do mean to depict violence against women. Others, like the kind of movies Belladonna makes, are rough-and-tumble sex very much done in the spirit of play. That the anti-porn folks are unable to make this distinction speaks to their own tone deafness to context and failure of analysis, not to problems with most pornography.

    If you’re *starting* with the idea that porn is violence, it’s no surprise that given that framing you’re going to see it everywhere.

  8. Iamcuriousblue

    Another problem with trying to “model” the success of Swedish-style prostitution laws is just how limited the data set is. Only three countries currently have this legal model, and only one has had it for more than three years. And these are countries with a lot of other social factors that make them quite different from other countries. The Nordic countries, with the possible exception of Denmark, are all relatively homogeneous, with less of a wealth gap, smaller migrant communities, etc. than countries like the UK, France, or the Netherlands.

    Not to mention the fact that they were never major centers of prostitution to begin with. There would be significant differences in the prostitution situation between Sweden and the Netherlands even if they had the same legal model just based on the fact that several Netherlands cities are centers of “vice tourism”, whereas Sweden has never been this way.

    Given that, it’s difficult to make real comparisons. I think comparisons can be made between American-style prohibition models versus German/Netherlands/Nevada style legalization versus UK/Canada/Spain style half-decriminalization, but otherwise there’s just not a whole lot of comparable countries you can put into a model. Australia/NZ style decriminalization also suffers a bit from this, though Australia is at least comparable to the US in many ways.

  9. Clarisse Thorn

    I do wonder how people who claim that certain types of porn are “mainstream” arrive at that conclusion? Are they deciding this because they see it a lot on late-night channels or because it’s easy for them to find on the internet or … what? Are there any studies that show what kind of porn most people watch and actually do track shifting social preferences? Speaking as an advocate of S&M sex, I’m not at all appalled by representations of violent sex; and as someone who believes that censorship is wrong, I’m certainly not trying to limit porn; but I am curious to know if porn preferences have changed over time.

