LDC :: Better stats than a programmer, better programming than a statistician

So down to brass tacks: what skills do we need in our data teams and how do we get ‘em?

Deriving insight from large data sets essentially takes us into the world of data science. This is variously defined, but a useful and common explanation is given here. Essentially, when mathematical and statistical skills are combined with computer science skills and a deep understanding of the subject area, data science can occur. The last of these three prerequisites is, of course, knowledge rather than a skill, and it is disputed whether one person can hold all of these skills and knowledge simultaneously.

But if we return to the skills required across maths and computer science, we quickly come to the somewhat glib but useful definition[1] of a person who is “better at statistics than most programmers and better at programming than most statisticians”. Given the importance of communication and influence, I would add data visualisation (which could be considered a subset of programming, although I would argue that it is distinct enough to be treated as a separate skill in its own right).

My long paper, “So what? And for that matter how?” gives more detail on these skills, but in headline terms the following are consistently discussed.

Programming skills sufficient to:

Manage large datasets to allow automation of routine reporting (to create time to use other data for insight)
Query and analyse large, complex and often linked structured data sets
Harvest and categorise unstructured data

Statistical skills beyond the routine in order to:

Identify and analyse relationships between features
Describe and explain complex systems and interactions within a system
Predict behaviours within a system based upon numerous simultaneously interacting variables

Visualisation skills in order to:

Allow non-analysts to ask analytic questions without needing analytic support. This helps target managerial attention and action effectively.
Illustrate complex problems simply and intuitively

There is of course enormous devilment in the detail here, and the boundaries of these skills are open to interpretation. Of more immediate interest to me, however, is where we currently stand in terms of these skills.

As in most countries, one of the strengths of the New Zealand public sector is the transferability of skills, so that an individual can work across different sectors drawing on a common base of skills. However, the shadow side of this flexibility is an unstated but quite common belief that clever people can turn their hand to anything with minimal training. The level of sophistication of skills required is unlikely to come from “on the job training”, certainly not at the scale and spread that we need.

The question then is “how to create the cadre?”

Two excellent solutions to this problem are being developed by the Office for National Statistics Data Science Campus. The first is the creation of apprentice roles for school leavers, an innovation now being extended to a three-year programme that leads to a bachelor’s degree, with an expectation that 80 percent of learning will be “on the job”. The second is “Accelerator” programmes, essentially free mentoring programmes for the broader public service, in which participants bring a data science project with them and are provided with tools, mentorship and three months to deliver. The approach builds skills in real-world situations and leaves behind a group of champions. These approaches are practical, focused on real-world problems and relatively low cost, and they seem to me to be fairly transportable to our context.

[1] Variously attributed
