The consumer price index (CPI) with reference year 2013 = 100, which was introduced in January 2014, is a chain index that is updated every year in January. The purpose of the annual updates is to keep the CPI representative over time and to avoid bias in the measured inflation as the index ages. This is in contrast to an index with a fixed base year, which ages over time, possibly making the measured inflation less accurate after a certain period of time.
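How the annual links of a chain index fit together can be illustrated with a minimal sketch. It assumes that December of the previous year is the link month (as is the case for 2020) and uses purely hypothetical index levels; `chain_link` is an illustrative helper, not a Statbel routine.

```python
def chain_link(december_level, short_term_indices):
    """Express short-term indices (December of previous year = 100)
    on the long-term reference base (e.g. 2013 = 100)."""
    return [december_level * s / 100.0 for s in short_term_indices]

# Hypothetical figures, for illustration only.
dec_2019_level = 109.0               # December 2019 on the base 2013 = 100
short_term_2020 = [100.2, 100.5]     # Jan and Feb 2020, December 2019 = 100

chained_2020 = chain_link(dec_2019_level, short_term_2020)
print([round(x, 2) for x in chained_2020])
```

Because each year starts from a fresh link, the basket and weights behind the short-term indices can change every January without breaking the long-term series.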
The representativeness and the quality of the index are guaranteed over time by, among other things, keeping the product basket up to date, adjusting calculation methods, integrating new price sources and maintaining a representative shop sample.
This is the sixth annual update in a row, and what follows is a short overview of the main changes. The Index Commission gave a unanimous positive opinion on these changes to the Minister of Economy. The Minister follows this opinion, and Statbel will therefore implement the changes in the consumer price index of January 2020.
For private rent, a new method of data collection and compilation will be used from 2020 onwards. Furthermore, the index calculation will be refined for products that are followed with scanner data or web scraping.
Methodological changes for private rent
With a weight of 7.3 %, rent is an important item in the index basket. The data collection and index calculation method for private rent has been thoroughly changed. In the past, a sample of about 2,000 rented houses was used. The addresses were extracted from the database of registered leases of the General Administration of Heritage Documentation (GAHD) of the FPS Finances. The tenants were interviewed every year around the anniversary of their lease, and an index was calculated every month based on this information.
However, this method had a number of disadvantages, such as a low response rate and houses that could no longer be followed, either because the tenant had left and the new tenant did not cooperate or because no new tenant was found soon enough.
From 2020 onwards, a new method of data collection and index calculation will be applied to address these shortcomings. Instead of measuring the evolution of rents using a limited sample, the complete database of the GAHD going back to 2011 will be used. More than 200,000 new leases are registered in this database every year. Since it is an administrative data source, it first had to be made usable for statistical purposes through data cleaning.
As before, the actual index calculation uses a stratification model down to the provincial level. However, the indices at the lowest level (province) are now calculated using a regression method (the time product dummy method) with a rolling window, instead of a Dutot index (an index based on the ratio of average prices). In simple terms, we follow the same houses in the database over a period of time (a window of 8 years). This exercise is repeated for successive periods (rolling window) and the measured price movements are linked together.
Concretely, this means that with the new method, the price increase of the same house will be measured between successive leases over a period of 8 years. This measured price increase gives a complete picture of the price evolution, i.e. the annual indexations and any price increases which occur when a new lease is concluded for the same rented house.
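The idea of a time product dummy index within one window can be sketched as follows. This is a minimal, unweighted illustration under stated assumptions: the dwellings and rents are hypothetical, periods are simple integers, and the least-squares fit of log(rent) = dwelling effect + period effect is obtained by alternating averages rather than with a regression library. The stratification by province and the linking of successive windows are not reproduced.

```python
import math

def tpd_index(observations, n_iter=200):
    """Unweighted time product dummy index (first period = 100).

    observations: list of (dwelling_id, period, rent).
    Model: log(rent) = dwelling_effect + period_effect + error,
    with the effect of the first period fixed at 0.
    """
    periods = sorted({t for _, t, _ in observations})
    dwellings = sorted({d for d, _, _ in observations})
    period_eff = {t: 0.0 for t in periods}
    dwelling_eff = {d: 0.0 for d in dwellings}
    for _ in range(n_iter):
        # Best dwelling effects given the current period effects.
        for d in dwellings:
            res = [math.log(p) - period_eff[t]
                   for dd, t, p in observations if dd == d]
            dwelling_eff[d] = sum(res) / len(res)
        # Best period effects given the current dwelling effects.
        for t in periods:
            res = [math.log(p) - dwelling_eff[dd]
                   for dd, tt, p in observations if tt == t]
            period_eff[t] = sum(res) / len(res)
        # Normalise so that the first period is the reference.
        base = period_eff[periods[0]]
        period_eff = {t: v - base for t, v in period_eff.items()}
    return {t: 100.0 * math.exp(period_eff[t]) for t in periods}

# Hypothetical leases: dwelling A has no lease registered in period 1.
rents = [
    ("dwelling_A", 0, 500.0), ("dwelling_A", 2, 550.0),
    ("dwelling_B", 0, 800.0), ("dwelling_B", 1, 832.0), ("dwelling_B", 2, 880.0),
]
rent_index = tpd_index(rents)
print({t: round(v, 2) for t, v in rent_index.items()})
```

In this toy data set both dwellings are consistent with the same price path, so the index comes out at about 104 in period 1 and 110 in period 2 even though dwelling A is not observed in every period. A Dutot index on the same data would mix the price change with the change in composition of the dwellings observed.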
Methodological changes for scanner data and web scraping
One of the many advantages of the use of scanner data is that turnover information is available at product level. However, when this information is directly used in a monthly chain index, there is a risk of chain drift. This means that the index does not return to the initial level when the prices and turnover in the current period are again equal to those in the initial period.
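Chain drift can be made concrete with a small numerical sketch. The example below is an illustration under assumptions, not Statbel's calculation: it uses a Törnqvist formula for the monthly links and invented prices and quantities for two products, where product A is on sale in period 1 and consumers stock up.

```python
import math

def tornqvist(p0, q0, p1, q1):
    """Bilateral Törnqvist price index between two periods (as a ratio)."""
    items = sorted(p0.keys() & p1.keys())
    e0 = sum(p0[i] * q0[i] for i in items)
    e1 = sum(p1[i] * q1[i] for i in items)
    log_index = sum(
        0.5 * (p0[i] * q0[i] / e0 + p1[i] * q1[i] / e1) * math.log(p1[i] / p0[i])
        for i in items
    )
    return math.exp(log_index)

# Hypothetical data: period 3 is identical to period 0.
prices = [
    {"A": 1.0, "B": 1.0},
    {"A": 0.5, "B": 1.0},   # A on sale
    {"A": 1.0, "B": 1.0},
    {"A": 1.0, "B": 1.0},
]
quantities = [
    {"A": 10, "B": 10},
    {"A": 50, "B": 10},     # consumers stock up during the sale
    {"A": 2, "B": 10},      # and buy little right after it
    {"A": 10, "B": 10},
]

chained = 100.0
for t in range(1, len(prices)):
    chained *= tornqvist(prices[t - 1], quantities[t - 1], prices[t], quantities[t])

direct = 100.0 * tornqvist(prices[0], quantities[0], prices[-1], quantities[-1])
print(round(chained, 2), round(direct, 2))
```

Although prices and quantities in the last period equal those in the first, the chained index ends at roughly 89 instead of 100, while the direct comparison gives 100. That gap is the drift.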
To avoid chain drift, unweighted indices have been used at the lowest level until now. The turnover information was of course used for other purposes, e.g. to determine which products were included in the sample. In the meantime, methods have been developed at international level that make use of turnover information at product level without causing drift. These are called multilateral methods because several periods are compared with each other, as opposed to traditional index calculations, in which only two periods are compared. A consequence of multilateral methods is that past indices would have to be revised every time the time series is extended, because older periods are also compared with the most recent additions. To avoid revising published figures, the calculation is carried out over window periods, which are then linked together. From 2020 onwards, the so-called GEKS method will be used for scanner data. This is a specific multilateral method, named after Gini, Éltető, Köves and Szulc. More background information is available in the analysis ‘Évaluation des méthodes multilatérales de calcul de l'indice’, published by Statbel on 23 October 2019.
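A minimal sketch of the GEKS idea within a single window is given below. It rests on several assumptions: the bilateral formula is taken to be Törnqvist (the text above does not say which bilateral index is used), the prices and quantities are invented, all products are available in every period, and the linking of successive windows is left out.

```python
import math

def tornqvist(p0, q0, p1, q1):
    """Bilateral Törnqvist price index between two periods (as a ratio)."""
    items = sorted(p0.keys() & p1.keys())
    e0 = sum(p0[i] * q0[i] for i in items)
    e1 = sum(p1[i] * q1[i] for i in items)
    log_index = sum(
        0.5 * (p0[i] * q0[i] / e0 + p1[i] * q1[i] / e1) * math.log(p1[i] / p0[i])
        for i in items
    )
    return math.exp(log_index)

def geks(prices, quantities):
    """GEKS indices (first period = 100) for all periods in one window.

    The index for period t is the geometric mean, over every link period l
    in the window, of index(0 -> l) * index(l -> t).
    """
    n = len(prices)
    result = []
    for t in range(n):
        log_sum = sum(
            math.log(
                tornqvist(prices[0], quantities[0], prices[l], quantities[l])
                * tornqvist(prices[l], quantities[l], prices[t], quantities[t])
            )
            for l in range(n)
        )
        result.append(100.0 * math.exp(log_sum / n))
    return result

# Hypothetical data: product A is on sale in period 1; period 3 equals period 0.
prices = [
    {"A": 1.0, "B": 1.0},
    {"A": 0.5, "B": 1.0},
    {"A": 1.0, "B": 1.0},
    {"A": 1.0, "B": 1.0},
]
quantities = [
    {"A": 10, "B": 10},
    {"A": 50, "B": 10},
    {"A": 2, "B": 10},
    {"A": 10, "B": 10},
]

geks_index = geks(prices, quantities)
print([round(x, 2) for x in geks_index])
```

Because every period is compared with every other period, the index for the last period returns to 100 when its prices and quantities equal those of the first period, while the turnover weights are still used.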
A similar improvement will be applied to the index calculation of footwear from 2020 onwards. Although, unlike scanner data, turnover information is not available with web scraping, the sum of the monthly value of all articles within a homogeneous product can be used as a weight. This method is similar to a manual data collection, where items with a higher frequency of availability are more likely to be recorded by an interviewer. With weights at the homogeneous level, a weighted multilateral method can therefore be applied, just like for scanner data.
Product basket and weights
At each annual update, representative items can be added to the basket and less representative items can be removed.
In concrete terms, eight new items will be added to the basket in 2020, one of which will be followed by scanner data (non-alcoholic beers) and one by web scraping (sunglasses). Two items will be removed: tanning shop membership and wireless fixed telephone. The weight of these two items has been decreasing year after year and is negligible for 2020. Two similar items are merged: the items ‘regular’ and ‘combination’ microwave oven are combined into the item ‘microwave oven’. As a result, two separate larger samples are no longer needed and a single sample covering both items can be used.
|New representative items|
|07.2.3.0.07||Replacement and storage of summer and winter tyres|
|Removed representative items|
|08.2.0.1.01||Wireless fixed telephone|
|184.108.40.206.01||Tanning shop membership|
No index is published at the level of the representative items, but the introduction of the new representative items
- 02.1.3.3.01 Non-alcoholic beers
- 12.3.2.9.01 Umbrella
- 12.3.2.9.02 Sunglasses
will result in the publication of indices for two additional COICOP groups as of January 2020 (down to the 5th level). The groups concerned are:
- 02.1.3.3. Non-alcoholic beers
- 12.3.2.9. Other personal effects, not elsewhere classified (n.e.c.).
As in the previous years, the weighting scheme has also been adapted. The weights are now based on the 2018 household budget survey. This survey takes place every two years and was conducted in 2018 among more than 6,000 households. The results were published in November 2019. The weights therefore initially relate to 2018. They were subsequently updated to 2019 with a so-called price update, since December 2019 is the new reference month for the chain index in 2020. Obviously, new items were also integrated in the weighting scheme.
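The price update of the weights can be sketched as follows. This is a generic illustration of the technique, not Statbel's exact procedure: each survey-year weight is multiplied by the price change of its item between the survey year and the new reference month, and the results are rescaled to sum to 1,000 ‰. The item names, weights and price relatives are hypothetical.

```python
def price_update(weights, price_relatives, total=1000.0):
    """Move expenditure weights from the survey year to the reference month.

    weights: item -> weight in the survey year (per mille).
    price_relatives: item -> price in the reference month divided by the
                     average price in the survey year.
    """
    updated = {item: weights[item] * price_relatives[item] for item in weights}
    scale = total / sum(updated.values())
    return {item: value * scale for item, value in updated.items()}

# Hypothetical two-item basket, for illustration only.
weights_2018 = {"food": 600.0, "rent": 400.0}
relatives_to_dec_2019 = {"food": 1.02, "rent": 1.00}

weights_dec_2019 = price_update(weights_2018, relatives_to_dec_2019)
print({k: round(v, 2) for k, v in weights_dec_2019.items()})
```

Items whose prices rose faster than average between the survey year and the reference month gain weight, since the same quantities now represent a larger share of expenditure.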
Measure of the price evolution of products and services via scanner data and web scraping
Statbel will continue the integration of ‘big data’ (scanner data and web scraping) as sources for the consumer price index. The use of scanner data and web scraping improves the accuracy of the CPI. The price index of a product group no longer has to be based on a relatively limited sample of products; instead, the prices of a multitude of items sold can be processed. This results in an index that more closely matches actual consumption habits.
Scanner data have been progressively introduced in the CPI since 2015. The weight of the part of the basket followed using scanner data will amount to 23.7 % in 2020. These are the cash register data from the largest supermarkets. They are supplemented with price recordings in shops (e.g. bakers and butchers).
In addition to scanner data, tariff prices, catalogue prices and traditional price recordings in shops, prices are also collected via web scraping. This is a technique for automatically extracting data from web pages. The data are collected and processed in a structured way, so that they can be used for statistical purposes. Given the growing importance of web shops and of online sales by "classic shops", it is necessary to include these data in the calculation of price indices.
The use of these data also makes the data collection more efficient. In addition, the representativeness of the price indices increases, as the prices of far more products can be followed than with traditional price recordings.
Web scraping results are currently incorporated in the index for DVDs, Blu-ray discs, video games, international train tickets, footwear, seaside and Ardennes weekends, hotel rooms, student room rental and second-hand motor cars. Sunglasses will be added to this list in 2020. The weight share of product groups followed via web scraping will amount to 4.3 % in 2020.
In total, 28 % of the index basket weight will be followed via scanner data or web scraping in 2020.
|COICOP||Big data in the CPI||Weight 2020 (‰)|
|01||Food and non-alcoholic beverages||175.50|
|02||Alcoholic beverages and tobacco||24.66|
|05.5.2.2||Miscellaneous small tool accessories||3.50|
|05.6.1||Non-durable household goods||9.06|
|09.3.4.2||Products for pets||7.44|
|09.5.4.9||Other stationery and drawing materials||1.79|
|12.1.3||Other appliances, articles and products for personal care||14.66|
|Total scanner data||237.47|
|03.2.1.1||Footwear for men||3.55|
|03.2.1.2||Footwear for women||5.36|
|03.2.1.3||Footwear for infants and children||3.32|
|04.1.2.1||Student room rental||3.45|
|07.1.1.2||Second-hand motor cars||15.38|
|07.3.1.1.11||Train journey abroad||0.41|
|09.1.4.1.02||DVD (music or film)||0.40|
|09.3.1.1.02||Video game for console||0.46|
|09.6.0.1.01||Seaside and Ardennes weekends||2.10|
|Total web scraping||43.02|
|Total big data||280.49|
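The link between the per-mille totals in the table and the percentages quoted in the text can be checked with a few lines of arithmetic. The variable names are illustrative; the figures are the three totals from the table.

```python
# Totals from the table, in per mille of the index basket.
scanner_data = 237.47
web_scraping = 43.02

big_data = scanner_data + web_scraping

def to_percent(per_mille):
    """Convert a per-mille weight to a percentage, rounded to one decimal."""
    return round(per_mille / 10.0, 1)

print(to_percent(scanner_data), to_percent(web_scraping), to_percent(big_data))
```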