Statbel, big data and privacy
The challenge
Big data, such as scanner data, mobile phone data or satellite data, enable Statbel to produce better and faster public statistics for citizens, enterprises and policymakers. However, big data are often personal data, and many people are concerned about a possible breach of their privacy.
These concerns must be taken seriously, but they can be mitigated by a better understanding of two things: the strict legal framework in which Statbel operates, and the concrete procedures and practices that guarantee that the data are and remain anonymous. The data are used exclusively for anonymous, aggregated statistics and, in certain cases and only if 'pseudonymised', for scientific research.
Big data are necessary
Policies cannot be made blindly, companies need to make informed decisions, and citizens need to be well informed. Statbel’s traditional data sources, i.e. surveys and administrative data, have their limits. Big data have limits of their own, of course, but they make it possible to produce statistics in a more modern and intelligent way: faster, in more detail and at a lower cost, without burdening citizens or enterprises, and even on matters that were previously impossible to measure.
Example of a possible future application
A better residence-workplace matrix
The residence-workplace matrix shows where people living in a given place go to work, which is important for, among other things, mobility, public infrastructure and labour market policy. The current statistics, produced at municipal level from the Population and NSSO registers, could be replaced by a matrix at neighbourhood level based on mobile phone data, which would be available much faster and more frequently. For this, Statbel does not need to receive individual data from the telecom operators, only the result of an agreed selection and calculation.
There are strict laws and rules...
For both its traditional data collection and big data, Statbel is subject to Belgian and European statistical legislation, which states that data may only be collected to produce anonymous and aggregated statistics and that the confidentiality of the data (personal and other) must be strictly guaranteed.
In addition, European and Belgian privacy legislation regulates the use of personal data, but allows storage and processing for statistical purposes under certain conditions.
... that are also put into practice
Statbel implements a strict separation between, on the one hand, the data collection, which immediately 'pseudonymises' incoming data by replacing each identifier with a code, and, on the other hand, the statistical processing, which only works with such pseudonymised data. The statistical results that are disseminated are never individual or personal, but are always aggregated in such a way that individual data cannot be derived by combining results.
These processing procedures are complemented by a database infrastructure that is secured both physically and procedurally.
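The separation between collection and processing described above can be illustrated with a minimal sketch. It assumes that the code replacing each identifier is derived with a keyed hash (HMAC-SHA256) and that the key stays with the collection side; this is one common technique, not a description of Statbel's actual method. The field names, the key and the sample records are hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret, assumed to be held only by the data-collection side.
SECRET_KEY = b"example-key-held-by-collection-unit"


def pseudonymise(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Map an identifier to a stable code using a keyed hash (assumed technique)."""
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def pseudonymise_record(record: dict, id_field: str = "national_id") -> dict:
    """Return a copy of the record with the identifier replaced by its code."""
    cleaned = {k: v for k, v in record.items() if k != id_field}
    cleaned["pseudo_id"] = pseudonymise(str(record[id_field]), SECRET_KEY)
    return cleaned


# Hypothetical incoming records; the first and third concern the same person.
incoming = [
    {"national_id": "A123", "municipality": "Gent", "age": 34},
    {"national_id": "B456", "municipality": "Gent", "age": 51},
    {"national_id": "A123", "municipality": "Gent", "age": 34},
]

# Statistical processing only ever sees the pseudonymised records.
processed = [pseudonymise_record(r) for r in incoming]
```

Because the same identifier always yields the same code, records about one person can still be linked during processing, while the identifier itself never reaches the processing side.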
A step further: privacy by design
Neither privacy legislation nor practice is specific to big data. But big data offer the possibility to go even further in data security, via so-called 'privacy by design'; this is the way Statbel wants to work, now and in the future. Privacy by design in this case means that individual data do not leave the data warehouse of the owner (e.g. a telecom operator): an agreed query and/or calculation is run on the data at the owner's site, and only the result is delivered. Such processing can be set up in such a way that the result is completely anonymous and non-individual. This avoids every possible privacy problem, and it has the additional advantage that the data set supplied is much smaller and therefore easier to handle.
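As a rough illustration of this idea, the sketch below shows an agreed calculation that would run inside the data owner's environment and return only aggregated counts, here a small residence-workplace matrix. The record layout, the area codes and the minimum cell count are assumptions made for the example; suppressing small cells is one common way to keep a result non-individual, and the source does not say which rule is actually applied.

```python
from collections import Counter

# Hypothetical threshold: cells with fewer people than this are not delivered.
MIN_CELL_COUNT = 3


def residence_workplace_matrix(records, min_count=MIN_CELL_COUNT):
    """Count people per (home area, work area) pair and drop small cells.

    Intended to run at the data owner's site; only the returned
    dictionary of aggregated counts would be delivered.
    """
    counts = Counter((r["home_area"], r["work_area"]) for r in records)
    return {cell: n for cell, n in counts.items() if n >= min_count}


# Hypothetical individual records that never leave the owner's data warehouse.
operator_records = [
    {"subscriber": "s1", "home_area": "H1", "work_area": "W1"},
    {"subscriber": "s2", "home_area": "H1", "work_area": "W1"},
    {"subscriber": "s3", "home_area": "H1", "work_area": "W1"},
    {"subscriber": "s4", "home_area": "H2", "work_area": "W1"},
]

# Only this aggregated result would be handed over.
delivered = residence_workplace_matrix(operator_records)
```

In this toy data set, the single person commuting from H2 to W1 falls below the threshold and is left out, so the delivered result contains no cell that points to one individual. The result is also far smaller than the underlying records.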
Conclusions
Statbel is not interested in personal data as such but in relevant, reliable and accurate statistical results that are indispensable for citizens, enterprises and policymakers to make decisions. Personal and other individual data are protected by strict legal provisions regarding use, storage, combination of data, and access. These protections are physical, organisational and procedural; they are recorded and documented, and they are applied in practice.
Moreover, in the case of big data, it is often unnecessary to retrieve individual data (which is often not even possible or practical, given the enormous volumes involved), thus ensuring the protecti