Friday 23 August 2013

The lamp posts are not for bending....

In my ongoing explorations of the statistics and rationale underlying the Government's resolute commitment to Payment by Results (PbR), I asked them a series of further questions after their responses to my last set. (These two blog posts show their answers and my queries). Here is a copy of the letter I received yesterday (my questions in bold, their answers in italics):

Many thanks for your email & attached response to my questions. I note what you say about using section 22 to avoid answering my questions 15 & 16. I will naturally await publication of the new batch of statistics on 25/7/13. I reserve the right to appeal against your decision, subject to whether this information is able to answer my questions.

With regard to some of your other answers, I would seek further clarification as follows:

a)      With regard to Q2, you have not answered my question. You have merely reported a series of facts and not explained the reasoning behind the decision to produce the statistics early. May I ask again for my question to be answered fully? If that is difficult, I am happy to submit an FoI request for all the email correspondence between senior civil servants and the Minister which led up to the early publication of this data. Which would you prefer?

As set out in the publication, we published the figures in an ad hoc bulletin on 13 June, rather than waiting to publish them in the Proven Re-offending statistics bulletin on 25 July, to ensure the information was made public as soon as it was available. This is in accordance with the Code of Practice for Official Statistics (http://www.statisticsauthority.gov.uk/assessment/code-of-practice/) which requires us to “release statistical reports as soon as they are judged ready, so that there is no opportunity or perception of opportunity, for the release to be withheld or delayed”. 

Once the MoJ Chief Statistician had judged that we were in a position to publish statistically robust interim re-conviction figures, we published them at the earliest opportunity.

b)     In your answer to 6/7 (you correctly identified that this is one question where a rogue carriage return had crept in) you referred me to “Table B3 of annex B from the MoJ’s proven re-offending statistics quarterly bulletin”. I looked at this table carefully but I could not see how it answered my query about the extent of the difference between national stats and the pilot’s stats. Please could you be more precise and show me more clearly how this factor is likely to affect comparisons. Thank you.

The two middle columns of table B3 show (for offenders released from prison or starting court orders) re-offending figures including and excluding cautions. 

Looking at the top section headed ‘Proportion’, column 2 (“Previous measure: re-convictions (prison and probation offenders only), whole year”) is the re-conviction rate excluding cautions – i.e. the Doncaster measure. Column 3 (“New measure: re-offending (prison and probation offenders only), whole year”) is the re-offending rate including cautions – i.e. the National Statistics measure. This shows, for example, that for offenders discharged from prison or starting a court order in 2009 the proportion re-convicted was 34.7 per cent (column 2), but when we also count offences that receive a caution the proportion increases to 36.2 per cent (column 3). The difference over time is small: between 1 and 2 percentage points.

As noted in our previous response, we have not produced alternative interim figures on what the impact would be if different rules (such as including cautions) had applied to the pilots. However, the figures in table B3 show the impact at a national level of including/excluding cautions.

c)     Your answer to Q9 confirms, I think, that there is no element of randomisation in the selection of the ‘control’ comparator groups. As a scientist I find this most disturbing; I am not sure about you, but I don’t think I would be prepared to undergo a course of medical treatment that had only been through a “quasi-experiment”. As PbR spreads (as I assume the Government wants), it will become increasingly difficult to find comparator groups on this basis. Moreover, I cannot see how, no matter how independent the people choosing the comparator groups are, this process will control for hidden factors. As a consequence, I do not think that you have yet answered my query: “what is your considered professional judgement as a statistician as to the validity of these results to guide future practice?” I look forward to your thoughts. Thanks

The Payment by Results pilots were set up to test a range of approaches to achieving reductions in re-offending through paying by results, and different pilots use different payment mechanism designs. 

Propensity Score Matching (PSM) is a well-established statistical method for creating a control group when it is not possible to carry out a randomised controlled trial (as it is not in this case). As set out in our previous response, the control group will be selected, using the published PSM methodology, by an Independent Assessor.
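
[An aside from me, for readers unfamiliar with PSM: here is a minimal sketch of how propensity score matching works in general. It is purely illustrative, not the MoJ's published methodology, and the covariates and numbers are invented.]

```python
# Illustrative PSM sketch: NOT the MoJ's published methodology.
# Covariates (age, previous convictions) and all numbers are invented.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)

# Hypothetical covariates: pilot offenders (treated) and a national pool.
X_pilot = rng.normal(loc=[30.0, 5.0], scale=[8.0, 3.0], size=(200, 2))
X_pool = rng.normal(loc=[34.0, 4.0], scale=[10.0, 3.0], size=(5000, 2))

# Step 1: estimate each offender's propensity score, i.e. the probability
# of being in the pilot, given the covariates.
X = np.vstack([X_pilot, X_pool])
y = np.concatenate([np.ones(len(X_pilot)), np.zeros(len(X_pool))])
model = LogisticRegression().fit(X, y)
p_pilot = model.predict_proba(X_pilot)[:, 1]
p_pool = model.predict_proba(X_pool)[:, 1]

# Step 2: match each pilot offender to the pool offender with the closest
# propensity score (1-nearest-neighbour, with replacement, for brevity).
nn = NearestNeighbors(n_neighbors=1).fit(p_pool.reshape(-1, 1))
_, idx = nn.kneighbors(p_pilot.reshape(-1, 1))
control_group = X_pool[idx.ravel()]
```

[The matched group then stands in for "what would have happened anyway". My concern above is precisely that matching on observed covariates, however carefully, cannot control for hidden factors.]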

The Ministry of Justice’s consultation response, Transforming Rehabilitation: a Strategy for Reform, described how, under our proposals, to be fully rewarded, providers will need to achieve both an agreed reduction in the number of offenders who go on to commit further offences, and a reduction in the number of further offences committed by the cohort of offenders for which they are responsible.

The consultation response stated that we would discuss the final details of the payment mechanism with practitioners and potential providers. To support this engagement, we have since published a Payment Mechanism Straw Man, available at www.justice.gov.uk/downloads/rehab-prog/payment-mechanism.pdf.

While the final design of the payment mechanism is still to be determined, the model set out in the straw man discusses setting a baseline for all re-offending reduction targets for each Contract Package Area on the basis of average quarterly re-offending figures for the most recent year for which data are available.

d)    In answer to Q10 you say “The five percentage point reduction target was agreed after analysis of historic reconviction rates established that this would illustrate a demonstrable difference which could be attributed to the new system and not just natural variation” (my added highlight). However, later on you also say “we have not carried out statistical significance tests on the interim figures because, when it comes to the final results, neither pilot will be assessed on the basis of whether they have achieved a statistically significant change”. How can these two statements be compatible? Forgive me, but it seems to me you are invoking statistical significance when it suits you and not when it does not. Please justify this approach.
&
e)     Moreover, given this last statement, may I confirm that taxpayers’ money may well be doled out to the suppliers on the basis of what could be a random, happenstance difference in results rather than one which is (say) beyond a standard 5% threshold of statistical significance? I am interested in your views here too.

I can confirm that testing was used in the design of the Payment by Results pilots at both Peterborough and Doncaster, to ensure that the minimum targets for outcome-based payments in each pilot are set at such a level that we can be confident that, to achieve them, a provider must deliver an improvement which is attributable to their interventions and not just natural variation. Because this significance testing is built in at the target-setting stage, there is no need to conduct tests for significance again once outcomes are calculated; instead, outcomes can be judged on whether or not they exceed the targets.

The benefit of carrying out the statistical significance testing prior to the start of the pilot, rather than at the end, is that the ‘goal posts’ can then be set and known by all parties at the outset. In addition, because these targets are set in terms of the final 12-month re-offending measure, it is not helpful to carry out statistical significance tests on the interim figures, which measure re-offending over just 6 months and, in the case of the figures for the Doncaster pilot, have smaller offender cohorts than the final measure.
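
[Another aside from me: the letter does not give the actual calculation, but the logic of “building significance into the target” could look something like this sketch, assuming a simple binomial model. The baseline rate, cohort size and significance level below are all invented.]

```python
# Illustrative only: not the MoJ's actual calculation.
# Under a binomial model, a drop in the re-conviction rate is
# distinguishable from natural variation only if it exceeds
# z * (standard error of the baseline proportion).
import math

p0 = 0.58   # hypothetical baseline re-conviction rate
n = 1400    # hypothetical cohort size
z = 1.645   # one-sided 5% significance level

se = math.sqrt(p0 * (1 - p0) / n)
min_detectable_drop = z * se
print(f"Target must beat the baseline by at least "
      f"{min_detectable_drop * 100:.1f} percentage points")
```

[On these invented numbers the minimum detectable drop is about 2 percentage points, so a 5 percentage point target would sit comfortably above natural variation; whether the actual cohorts support that is exactly what I am trying to establish.]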

f)       Your answer to my question 12 surprised me. It is well known, I thought, that certain crimes, such as burglary, rise in the winter due to the darker evenings and so on. Whilst I recognise that you are comparing ‘like with like’, that does not exclude a seasonal effect; it could merely exacerbate one, since your time sample does not span the whole year. Why not provide the summer six-monthly data as well?

Using Doncaster as an example, we are not saying that the re-conviction rate for the Oct-Mar 6 months will necessarily match the re-conviction rate for the Apr-Sep 6 months. In fact, because of seasonality it is more likely that they will differ, as you say. Therefore, because we want to compare re-conviction rates over time, we must use the same period for the comparison in each year; that is, comparing the various Oct-Mar periods over time. If instead we compared the pilot period of Oct11-Mar12 with, say, Jan09-Jun09, any difference could reflect a real change, but it could also simply reflect seasonal effects. By comparing the equivalent period in each year, we eliminate this risk of seasonality.
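
[A final aside from me: their like-with-like argument is easiest to see with toy numbers. Everything below is invented for illustration.]

```python
# Toy illustration of like-for-like comparison; all rates are invented.
rates = {
    ("2009", "Oct-Mar"): 0.370, ("2009", "Apr-Sep"): 0.345,
    ("2010", "Oct-Mar"): 0.365, ("2010", "Apr-Sep"): 0.341,
    ("2011", "Oct-Mar"): 0.352,  # hypothetical pilot period
}

# Like for like: compare Oct-Mar only against earlier Oct-Mar periods.
baseline = (rates[("2009", "Oct-Mar")] + rates[("2010", "Oct-Mar")]) / 2
change = rates[("2011", "Oct-Mar")] - baseline
print(f"Change vs same-season baseline: {change:+.3f}")

# Comparing Oct-Mar 2011 with Apr-Sep 2009 instead would mix any real
# change with the roughly 2.5-point seasonal gap between the two halves
# of each year, which is the MoJ's point.
```

[Fair enough as far as it goes; my point remains that a winter-only window tells us nothing about summer performance.]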

g) I hear what you say about the 19 month period but it really does look shady! Why not 18 months? Why not 6 months? Hopefully the overall data will clear all this up.

We process and analyse re-offending data on a quarterly basis. For the interim figures released on 13 June, the latest quarter for which we could provide 6-month re-conviction figures was the quarter ending March 2012. The Peterborough pilot started in September 2010 (partway through a quarter), which meant we were able to report on a maximum of 19 months of the first Peterborough cohort period. We could have chosen to round this down to a more conventional 18 months, but we took the decision to include as much of the data as possible to maximise the robustness of the figures. The Doncaster pilot began in October 2011, at the start of a quarter, meaning we reported on a more conventional-looking 6-month period.


It is all getting rather convoluted (which is one of the problems I have with PbR: payments will steadily become more and more like arguments about how many angels can dance on the head of a pin). However, there are some points I will be raising from all this... (for another day)

But what are your thoughts? What questions now need to be asked?

Meanwhile, if you have not read it, here is my blog post about the next batch results that were published a couple of weeks ago.
