Tag archive: nowcasting

Predicting New Car Registrations: Nowcasting with Google Search and Macroeconomic Data

Published as: E. Tomczyk, T. Doligalski, Predicting New Car Registrations: Nowcasting with Google Search and Macroeconomic Data, [in:] Sł. Partycki (ed.),  E-społeczeństwo w Europie Środkowej i Wschodniej. Teraźniejszość i perspektywy rozwoju (e-Society in Middle and Eastern Europe. Present and Development Perspectives), Wydawnictwo KUL, Lublin 2015, p. 228-236.

Download the paper as pdf from SSRN:  Predicting New Car Registrations: Nowcasting with Google Search and Macroeconomic Data


Abstract

Based on search query data and a macroeconomic index (PMI), we attempt to predict new car registrations. As the forecasting horizon is short, the modelling is performed in accordance with the idea of nowcasting. The study covers 48 monthly observations for sixteen car producers present on the Polish market. The proposed model explains the level of new car registrations for the five major makes and makes it possible to forecast the number of registrations for the current and next month.

Keywords: nowcasting, prediction, modelling, car, automotive, demand, registrations,  Internet, search, Google, Poland, CEE, PMI

 

Introduction

In modern economies there are several sources of data on real-time activities which may help in modelling the behaviour of various entities, such as consumers or businesses.  These sources of information include online auctions, parcel shipment companies, credit card or mobile operators, as they possess precise data on transactions in certain locations [4, p.1]. A special role among them is played by search engines which provide data on frequencies of their queries. Possibly the most popular is Google Trends presenting both number and location of chosen searches.  Availability of such data enables modelling in accordance with the idea of  nowcasting.

Nowcasting is defined as the prediction of the present, the very near future and the very recent past [2, p.4]. The reason for modelling the current or recent past events is the delay in availability of data, which in modern economies can amount to weeks or even months. The other circumstances increasing the application value of nowcasting are macroeconomic turbulences, great uncertainty and unique shocks, as they cause past values to lose their predictive power [8, p.4].  As Bańbura et al. state ‘nowcasting is based on exploitation of data which is published early and possibly at higher frequencies than the target variable of interest in order to obtain an ‘early estimate’ before the official figure becomes available’ [2, p.4].  The scope of nowcasted activities is wide and includes current changes in unemployment, private consumption or – beyond just the economic activities – development of infectious diseases [4, p. 2].

As mentioned above, search engines are a valuable source of data on various entities’ behaviour (mostly consumers’ behaviour) in modern economies. Their usefulness results from both their popularity and type of data gathered. Search engine queries, as opposed to questionnaires, are not biased by submitting false information aimed at creating a desired image. Below we present three conditions the fulfillment of which should increase the application value of nowcasting with search data.

  • The nowcasted activity is preceded with search engine queries.

The searching behaviour depends heavily upon the type of activity. Consumers are more likely to search for information on high-involvement products or for information related to some risk [5, p. 1340]. Under certain circumstances (e.g. on local markets) people tend to rely more on friends’ opinions and less on searching for information online.

  • It is possible to identify search queries associated with the nowcasted activity.

A common problem is selecting queries that are not only typical for the activity but also precede it. Queries including a brand or product name may be related not only to pre-purchase search, but also to post-purchase services or to buying a used product. On the other hand, the list of specific queries related strictly to the pre-purchase phase may be long and difficult to identify. Moreover, these phrases may be entered rarely, and data on their search frequency may not be available.

  • Searches lead to the nowcasted activity to a similar extent.

In other words, searching consumers have similar purchasing potential. A consumer looking for information on a movie to watch in cinema is likely to do it only once (or not at all). An investor entering abbreviation of company quoted on stock exchange may in theory buy or sell any number of company’s stocks. In the first case, search queries are more likely to serve as a valuable predictor of the demand.  Demand nowcasting with Google search is probably easier on B2C than on B2B market as the size of transactions following the search varies less.  Some brands or products may however attract interests of consumers who do not intend to purchase them (e.g. prestigious brands, innovative products).

Interestingly, Choi and Varian provide an example which demonstrates that more sophisticated methods may help when the three conditions listed above are not met [4, p. 2]. They modelled a confidence index for Australian consumers by identifying categories of phrases whose search frequency is correlated with historical levels of consumer confidence. The prediction of the consumer confidence index is based on the assumption that the identified correlations will persist in the future.

 

Nowcasting of automotive markets

We decided on car registrations as the modelled variable for the following two reasons. First, data on car registrations are available with monthly frequency thus offering relatively long time series. This is not typical; many macroeconomic data are available in quarterly or yearly intervals only. And second, scope of these data includes 20 bestselling car manufacturers in a given month and offers a wide cross-section through Polish new car market.

Customer behaviour on the automotive market meets the three conditions mentioned in the previous section only imperfectly. Potential buyers are likely to conduct search queries before the purchase in order to learn the vehicle parameters or find out the dealer’s location. The names of the car makes are the phrases associated with the purchase. Unfortunately, these phrases, as broad matches, may also refer to other activities (e.g. looking for spare parts). As we decided on modelling both consumer and business car registrations, a single search may lead to the purchase of more than one car. However, the great majority of Polish companies are small and medium businesses, so we can safely assume that an average transaction will include a rather small number of vehicles.

The automotive market was the subject of a number of nowcasting analyses. Sun, Li, Li and Zhang  forecast automotive demand in China with macroeconomic, price, consumer and other factors (i.e. sales of competitive automobile types, advertising investment) [9, p. 431]. The category of consumer factors includes consumer satisfaction index as well as searching index based on queries of Baidu, the leading Chinese search engine. Their model offers more accurate car sales prediction than popular benchmark models, especially during market fluctuations. Researching the Chilean automotive market, Carrière-Swallow and Labbé create Google Trends Automotive Index, which together with an autoregressive component form the set of explanatory variables. The proposed model also outperforms and provides forecasts more rapidly than benchmark models [3, p. 5].

Results obtained in our previous article point to importance of autoregressive component (that is, lagged number of car registrations) and Internet search queries in modelling number of passenger car registrations in Poland. In case of four automotive brands (i.e. Fiat, Opel, Skoda, Toyota) autoregressive component and search data were two major factors influencing number of first registrations. The registrations of the remaining two brands (i.e. Peugeot, Renault) can be explained with previous level of registrations and car manufacturer web site traffic [6, p.6].

In this paper, we add a macroeconomic component to verify the dependence of car registrations on the general situation in the economy. We also extend the scope of research to the 16 best-selling brands on the Polish market. The purpose of this paper is to model new car registrations with Google search and macroeconomic data and to evaluate the forecasting quality of the nowcasting models.


Description of data

Our sample covers 48 monthly observations from January 2011 to December 2014.[1] Monthly data on new registrations of passenger cars are provided by the Polish Association of Automotive Industry (PAAI)[2] on the basis of the Central Register of Vehicles database administered by the Ministry of the Interior. A one-month lag is allowed by Polish law between the purchase of a private car and its registration. Current data on number of registrations becomes available on the PAAI webpage around the 5th day of the next month. Due to availability of data, the following 16 passenger car makes were included in the initial empirical analysis: BMW, Citroen, Dacia, Fiat, Ford, Honda, Hyundai, Kia, Nissan, Opel, Peugeot, Renault, Skoda, Suzuki, Toyota, and Volkswagen. Together, they constitute 85% of new car registrations in Poland.

Relative numbers of queries pertaining to car makes are defined in the way proposed by Choi and Varian [4, p.3]. Original search data is defined relative to BMW queries in the week of June 6–14, 2014. For aggregated monthly series, data is rescaled relative to the maximum monthly query in the period analysed (that is, the number of BMW queries in June 2014, equal to 280.71) and defined in percentage terms.
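The monthly rescaling described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `rescale_to_max` and all raw values except the reference maximum of 280.71 are hypothetical and not taken from the dataset.

```python
def rescale_to_max(series_by_make):
    """Express each monthly value as a percentage of the largest value
    observed across all makes and months."""
    peak = max(v for series in series_by_make.values() for v in series)
    return {
        make: [100.0 * (v / peak) for v in series]
        for make, series in series_by_make.items()
    }

# Hypothetical raw monthly query volumes; only 280.71 comes from the text.
raw = {
    "BMW": [250.10, 280.71, 265.40],
    "Skoda": [90.25, 84.20, 101.05],
}
index = rescale_to_max(raw)
print(index["BMW"])    # the peak month is mapped to 100
print(index["Skoda"])  # all other values are percentages of that peak
```

With this scaling, every series is comparable across makes because all values share one reference point.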

Table 1 presents each car producer’s share in the total volume of searches and the ratio of the number of searches to the number of registrations (henceforth, S/R ratio). The share of a car producer in searches is calculated as the ratio of searches for that producer to the sum of searches for all producers. Thus, the shares sum to 100%. The S/R ratio is calculated by dividing the producer’s share in Google searches by its share in registrations. If the ratio exceeds 1, the make is searched proportionally more often than it is registered. In the case of BMW and Honda, the ratios are the highest. This illustrates that these brands to the highest extent attract the interest of consumers who do not follow through with an actual purchase. On the other hand, Dacia, Volkswagen and Skoda are the producers that are searched relatively rarely compared to their number of registrations. The diverse behaviour can be explained by the fact that BMW and Honda can be perceived as aspirational brands, while the latter three makes offer utilitarian rather than symbolic value, and thus draw proportionally less interest.


Table 1. Car manufacturers’ shares in Google searches and ratios of number of searches to number of registrations (S/R)

 

Make         Share in Google searches   S/R ratio
BMW          12%                        4.98
Citroen      8%                         3.06
Dacia        4%                         1.73
Fiat         8%                         1.63
Ford         7%                         1.47
Honda        12%                        1.37
Hyundai      10%                        1.16
Kia          5%                         1.15
Nissan       5%                         0.94
Opel         6%                         0.90
Peugeot      7%                         0.74
Renault      4%                         0.58
Skoda        3%                         0.46
Suzuki       6%                         0.44
Toyota       3%                         0.29
Volkswagen   1%                         0.28
Total        100%

Source: authors’ calculations
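The S/R ratio defined above can be computed as in the short sketch below. The make labels and counts are hypothetical placeholders, not values from the dataset, and the helper names `share` and `sr_ratio` are introduced here for illustration only.

```python
def share(values):
    """Convert raw counts per make into shares that sum to 1."""
    total = sum(values.values())
    return {make: v / total for make, v in values.items()}

def sr_ratio(searches, registrations):
    """Share in searches divided by share in registrations, per make."""
    s = share(searches)
    r = share(registrations)
    return {make: s[make] / r[make] for make in searches}

# Hypothetical counts for two makes.
searches = {"make_A": 300.0, "make_B": 100.0}
registrations = {"make_A": 100.0, "make_B": 100.0}
ratios = sr_ratio(searches, registrations)
print(ratios)  # make_A is searched more often than registered, make_B less
```

A ratio above 1 marks a make that draws proportionally more searches than registrations, which is how the aspirational brands are identified in the text.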

Our previous results [6, p.6] suggest that web traffic variables and seasonal effects do not exhibit statistically significant or economically substantial influence on number of car registrations in Poland, and autoregressive component and search data remain the major factors in explaining the dependent variable. To extend our analysis, we add Purchasing Managers’ Index (PMI) to our set of regressors to account for the impact of macroeconomic environment on car registrations. PMI data, published by the Polish economic portal Bankier.pl,[3] becomes available on their webpage with one-month lag and free of charge. In comparison with other macroeconomic data PMI may be considered current and easily accessible, and our preliminary data analysis suggested that it is better suited to modelling car registrations than indicator of business conditions in retail trade. It also reflects the changes of activities related rather to B2B than B2C market, as opposed to data on frequency of search queries which may over-represent the consumer purchasing behaviour. We do not include explanatory variables taken from the Polish car market, for two reasons. First, autoregressive component in our models takes account of recent (lagged one month) number of car registrations; and second, car market data is not easily accessible on the Internet, and therefore less useful for the purpose of real-time analysis.

 

Estimation results

Based on the average number of registrations per month, car makes considered for empirical analysis can be grouped into three categories of car sellers: major (five makes), medium (three makes), and small (the remaining eight car makes; see Table 2).

 

Table 2. Average number of car registrations in 2011–2014

 

No.   Make         Category of producer   Average monthly registrations
1.    Skoda        major                  2 958
2.    Volkswagen   major                  2 075
3.    Toyota       major                  1 904
4.    Opel         major                  1 763
5.    Ford         major                  1 724
6.    Renault      medium                 1 288
7.    Hyundai      medium                 1 251
8.    Kia          medium                 1 260
9.    Nissan       small                  1 020
10.   Peugeot      small                  999
11.   Fiat         small                  985
12.   Citroen      small                  825
13.   Dacia        small                  795
14.   Honda        small                  523
15.   Suzuki       small                  506
16.   BMW          small                  502

Source: authors’ calculations

 

We found that results of the subsequent stages of empirical analysis are very dependent on the number of car registrations. For medium and small producers we did not find an economically valid and statistically significant dependence of number of car registrations on PMI, and a limited one only on lagged search queries. However, for the five major players, the results look more promising.

Linear models with HAC standard errors (to account for serial correlation in the error term) have been estimated for five dependent variables describing number of first registrations of Ford, Opel, Skoda, Toyota and Volkswagen. Results are summarised in Table 3. AR(1) component (that is, number of car registrations lagged one month) and internet search data lagged two months were included on the basis of our previous analysis of car registrations data. Also, PMI data lagged two months were added; the lag is meant to account for one-month delay in publishing the index plus an additional one-month lag before it can be reflected in car registration numbers because of the delay allowed by Polish law between the purchase of a vehicle and its registration.
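Written out, the specification described above takes the following form for each make i. The symbols R (registrations) and S (search index) follow the row labels of Table 3; the make subscript i and the coefficient names β are notational assumptions added here for readability.

```latex
R_{i,t} = \beta_{0,i} + \beta_{1,i}\, R_{i,t-1} + \beta_{2,i}\, S_{i,t-2} + \beta_{3,i}\, \mathit{PMI}_{t-2} + \varepsilon_{i,t}
```

Because the longest lag is two months, a sample of T monthly observations leaves T − 2 usable observations for estimation.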

 

 

Table 3. Summary of estimation results

 

                                              Ford          Opel          Skoda         Toyota        Volkswagen
constant                                      −3778.03 **   −2451.22 **   −3339.64 **   −2667.55 *    −1102.73
AR(1)                                         0.281 **      0.348 **      0.394 **      0.457 **      0.563 **
S_{t-2}                                       18.190 *      24.497 **     69.425 **     49.779 **     79.434
PMI_{t-2}                                     79.708 **     39.612 **     53.611 **     36.241        14.579
R²                                            0.525         0.421         0.485         0.448         0.451
RESET p-value                                 0.745         0.341         0.592         0.206         0.157
normality p-value                             0.084         0.016         0.956         0.900         0.101
maximum VIF                                   1.234         1.189         1.287         1.297         1.099
omit S_{t-2}: improved information criteria   2/3           0/3           0/3           0/3           0/3

** – coefficient statistically different from zero at 0.05 significance level; * – coefficient statistically different from zero at 0.10 significance level

Source: authors’ calculations

 

The strongest explanatory power is exhibited by the autoregressive component (that is, number of registrations lagged one month) and search queries lagged two months. Lagged dependent variable has positive and statistically significant influence in all cases, and so does lagged search data, with the sole exception of Volkswagen. Hypothesis of dependence of car registrations on Purchasing Managers’ Index is confirmed in three out of five cases: for Ford, Opel and Skoda.

All five models have satisfactory coefficients of determination, comparable to those obtained in previous research, and all are correctly specified according to the RESET test. In all models but one (Opel), the hypothesis of normally distributed residuals is not rejected at the 0.05 significance level; since the sample size is adequate, the departure from normality does not negatively influence estimation results. Maximum VIF values do not exceed 1.3, so there is no sign of multicollinearity in any of the models. In addition, we conducted the omitted variable test for lagged search queries to verify whether dropping internet search data improves the statistical quality of the estimated models. In only one case (Ford) do two out of three information criteria point to higher quality of the reduced model; in the four remaining cases, models with the search variable perform better than models without it.
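The comparison of full and reduced models by information criteria can be sketched as below. The paper does not name the three criteria, so AIC, BIC (Schwarz) and Hannan-Quinn are an assumption made here; the residual sums of squares in the example are hypothetical, and the criteria are written up to an additive constant that is identical for both models.

```python
import math

def information_criteria(rss, n, k):
    """AIC, BIC and Hannan-Quinn for a Gaussian linear model with
    residual sum of squares rss, n observations and k parameters
    (common additive constant omitted)."""
    base = n * math.log(rss / n)
    return {
        "AIC": base + 2 * k,
        "BIC": base + k * math.log(n),
        "HQC": base + 2 * k * math.log(math.log(n)),
    }

def criteria_favouring_reduced(rss_full, k_full, rss_reduced, k_reduced, n):
    """Count how many criteria are lower (better) for the reduced model."""
    full = information_criteria(rss_full, n, k_full)
    reduced = information_criteria(rss_reduced, n, k_reduced)
    return sum(reduced[name] < full[name] for name in full)

# Hypothetical values: dropping one regressor barely changes the fit ...
small_loss = criteria_favouring_reduced(1000.0, 4, 1010.0, 3, n=46)
# ... versus a case where the fit deteriorates clearly.
large_loss = criteria_favouring_reduced(1000.0, 4, 1200.0, 3, n=46)
print(small_loss, large_loss)
```

A count such as "2/3" in Table 3 corresponds to the value returned by `criteria_favouring_reduced` under this reading.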

 

Evaluation of forecasting quality

 

To assess forecasting quality of the models, they were re-estimated on the basis of a limited sample: from January 2011 to June 2014 (that is, on the basis of 42 observations). The remaining six months of 2014 were used to evaluate forecasting quality of the models using mean errors (ME) and mean absolute percentage errors (MAPE). Results are reported in Table 4.
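The two measures can be computed as in the sketch below. The sign convention (actual minus forecast, so that a negative ME indicates overestimation) is an assumption chosen to match the interpretation given in the text; the numbers in the example are hypothetical.

```python
def mean_error(actual, forecast):
    """Average of (actual - forecast); negative values mean the model
    overestimates on average."""
    return sum(a - f for a, f in zip(actual, forecast)) / len(actual)

def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    terms = [abs(a - f) / abs(a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(terms) / len(terms)

# Hypothetical hold-out values for two months.
actual = [1000.0, 2000.0]
forecast = [1100.0, 1800.0]
me = mean_error(actual, forecast)
err = mape(actual, forecast)
print(me, err)
```

ME allows positive and negative errors to cancel and therefore measures bias, while MAPE measures the typical size of the error relative to the actual level.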

 

Table 4. Measures of forecasting quality

 

       Ford      Opel     Skoda     Toyota   Volkswagen
ME     −160.12   316.57   −200.57   346.79   122.23
MAPE   12.8%     17.3%    15.8%     14.6%    13.8%

Source: authors’ calculations

 

 

Results presented in Table 4 suggest that the numbers of registrations of Ford and Skoda are somewhat overestimated by the models, and those of Opel, Toyota and Volkswagen slightly underestimated. However, the size of the bias does not appear substantial compared to the actual number of registrations. Out-of-sample forecast errors, as measured by mean absolute percentage errors, range from 12.8% for Ford to 17.3% for Opel. As far as we are aware, there exist no similar studies for the Polish market to compare our results with, but we consider them acceptable.

 

Discussion and limitations

 

Empirical analysis of passenger car registrations suggests a notable disparity between small/medium and major car producers. For the first category of car makes, we were not able to define and estimate economically meaningful and statistically significant relationships with lagged search queries and Purchasing Managers’ Index representing the macroeconomic environment. It seems that registration volumes of minor car producers are not directly influenced by aggregated economic variables, and it is generally difficult to fit a model which would meet standard quality criteria.

For the five major sellers, we find that number of registrations of Ford, Opel, Skoda and Toyota cars are adequately explained by the autoregressive component, internet search data lagged two months, and – with the exception of Toyota –  Purchasing Managers’ Index lagged two months. It appears that major new car sellers share a common pattern of dependence of their registration numbers on internet search and macroeconomic factors. It is interesting to note that Volkswagen registration numbers are statistically significantly influenced only by the autoregressive component.  It seems that web searches of this car make do not fall in step with purchasing decisions. As shown in Table 1, Volkswagen is characterized by one of the lowest ratios of number of searches to number of registrations.

Forecasting quality of the models constructed for the five major sellers seems satisfactory. We also found that among the estimated coefficients, the ones that exhibited the largest instability when shortened sample was used were those associated with lagged search variables. This may suggest that search data, being volatile and subject to major variation from month to month, reduces stability of results of econometric analysis and therefore presents additional challenges when used for nowcasting.

Estimation results show that Google search and macroeconomic data can be used to explain the number of registrations of the five major car producers in the short term (i.e. one month). The estimates are based upon publicly available data and do not require any insider knowledge. The nowcasting estimates can also serve to assess the current sales level of major producers, as sales precede registrations by about two weeks (to be exact: from zero to four weeks). This conclusion remains in accordance with the definition of nowcasting as the prediction of the present [4, p. 2; 2, p.4].

Nowcasting models which use search data may serve as a source of marketing insight into current trends in consumer purchasing behaviour and help to identify changes in it. Knowledge of this type is difficult to acquire via traditional research methods such as surveys or in-depth interviews. It is especially useful for businesses that have to maintain efficient infrastructure or carry out extensive resource planning.

Our results, as well as conclusions of other studies [8, 7, 3] show that modelled activity is explained also with the autoregressive component (i.e. level of the activity from previous period). Thus nowcasting seems to be more feasible in industries in which data on subjects’ behaviour is recorded and publicly available. If the data is collected in higher frequencies (e.g. weeks instead of months), nowcasting models may provide faster and more accurate results.

The research is subject to the following limitations. Data on the popularity of Google searches are so-called “broad matches”, presenting the volume of all searches that include the given keyword (here: automotive make). Thus they also include queries not related to purchasing a new car, but referring to e.g. spare parts or used cars.

Furthermore, in this paper we attempt to model the number of car registrations made by both consumers and businesses. Many models using Google search data as an explanatory variable refer to private consumption [8]. Data on registrations in the separate markets are available, but only as shorter time series. In the future it will be possible to model private car registrations and, hopefully, to achieve better results. On the other hand, the current research reflects the entire sales volume and thus is of greater practical use.

Among other potentially productive directions of further analysis we would suggest readdressing the question of factors influencing number of registrations for smaller car producers since they seem to differ from those influencing registrations of major players; and searching for additional macroeconomic explanatory variables that would be available easily, free of charge and (almost) in real time.

 

Literature

 

  1. Askitas N., Zimmermann K.F. (2009). Google Econometrics and Unemployment Forecasting. Applied Economics Quarterly, 55 (2), 107-120.
  2. Bańbura M., Giannone D., Modugno M., Reichlin L. (2013). Now-Casting and the Real-Time Data Flow, Working Papers, European Central Bank, no. 1564.
  3. Carrière-Swallow Y., Labbé F. (2013). Nowcasting with Google Trends in an Emerging Market. Journal of Forecasting, 32 (4), 289–298.
  4. Choi H., Varian H. (2011). Predicting the Present with Google Trends. Economic Record, 88: 2–9. doi: 10.1111/j.1475-4932.2012.00809.x
  5. Dholakia U.M. (2001). A motivational process model of product involvement and consumer risk perception, European Journal of Marketing, vol. 35 iss: 11/12, pp.1340 – 1362.
  6. Doligalski T., Tomczyk E. (2015). Nowcasting New Car Registrations with Google Search Data and Car Manufacturers’ Website Traffic. Working paper.
  7. Li N., Peng G., Chen H., Bao J. (2013). A Prediction Study on E-commerce Orders Based on Site Search Data. 6th International Conference on Information Management, Innovation Management and Industrial Engineering, 2, 314-318.
  8. Schmidt T., Vosen S. (2009). Forecasting Private Consumption: Survey-based Indicators vs. Google Trends. Ruhr Economic Papers, 155.
  9. Sun B., Li B., Li G., Zhang K. (2013). Automobile Demand Forecasting: An Integrated Model of PLS Regression and ANFIS. Advances in Information Sciences & Service Sciences, 5(8), 429-436.

 

 

[1] With the exception of Honda registrations data which are available up to September 2014 (45 observations).

[2] Polish Association of Automotive Industry, http://www.pzpm.org.pl/en, [2015.04.02].

[3] http://www.bankier.pl/gospodarka/wskazniki-makroekonomiczne/pmi-polska-pol, [2015.03.10].

Nowcasting New Car Registrations with Google Search Data and Car Manufacturers’ Website Traffic

Tymoteusz Doligalski, Emilia Tomczyk, Nowcasting New Car Registrations with Google Search Data and Car Manufacturers’ Website Traffic, paper accepted at the 6th EMAC Regional Conference, Vienna 2015.

 

Abstract: The purpose of this paper is an attempt to nowcast (here: to predict in a short time horizon) new car registrations in Poland based on data of Google search queries and website traffic of car manufacturers. The study covers 47 monthly observations for six automotive makes. The strongest explanatory power is exhibited by the autoregressive component (number of registrations lagged one month), followed by the number of search queries. The website traffic of car manufacturers significantly influences the number of registrations in two out of six cases.

Keywords: nowcasting, prediction, car, automotive, Internet, search, Google, website traffic, Poland, CEE


Introduction

Nowcasting is defined as “the prediction of the present, the very near future and the very recent past” (Bańbura, Giannone, Modugno, and Reichlin, 2013, p. 4). The reason for nowcasting is the delay in availability of data, which in a modern economy can amount to weeks or even months. Other circumstances increasing the application value of nowcasting are macroeconomic turbulences, great uncertainty and unique shocks, as they cause past values to lose their predictive power (Schmidt & Vosen, 2009). As Bańbura et al. state, nowcasting is based on ‘exploitation of information which is published early and possibly at higher frequencies than the target variable of interest in order to obtain an ‘early estimate’ before the official figure becomes available’ (2013, p. 4).

According to Choi and Varian (2011), nowadays there are several sources of data on real time economic activities which may help in predicting the present (as opposed to predicting the future).  Possibly the most often used is data from Google Trends presenting number and location of chosen search queries.  The data may be used to identify the current changes in unemployment, private consumption or – beyond the economic activities – development of infectious diseases (Choi & Varian, 2011).  The other sources of information are parcel shipment companies or credit card operators, as they possess precise real-time data on transactions in certain locations.

There exists another source of data which can be used in nowcasting: data on website traffic. Usually these data are fragmented. The website owner possesses thorough knowledge of his or her own traffic, but does not know the traffic of other websites. There exist, however, research entities that provide data on traffic of various websites in a certain category. Megapanel PBI/Gemius is such a research program. It monitors the online behaviour of Polish internet users, thus providing monthly data on traffic on the most popular websites in Poland. This kind of data meets the above-mentioned requirements of nowcasting.

The purpose of this paper is an attempt to nowcast new car registrations in Poland based on the data of Google search queries and website traffic of car manufacturers. Usefulness of web search data in predicting behaviour of economic variables has already been noted in literature (Askitas & Zimmermann, 2009; Choi & Varian, 2011; Li, Peng, Chen, Bao, 2013). A few publications on nowcasting concern the automobile markets (Choi & Varian, 2011; Sun, Li, Li, Zhang, 2013; Carrière-Swallow & Labbé, 2013). Their common approach is the use of search data as the independent variable. The data on car manufacturers’ website traffic, to the best of our knowledge, has not yet served as a predictor of car sales or registrations.

Description of data

Our sample covers 47 monthly observations from January 2011 to November 2014. Monthly data on new registrations of passenger cars[1] are provided by the Polish Association of Automotive Industry (PAAI)[2] on the basis of the Central Register of Vehicles database administered by the Ministry of the Interior. A one-month lag is allowed by Polish law between the purchase of a private car and its registration. Current data on number of registrations becomes available on the PAAI webpage around the 5th day of the next month. First registrations of makes of passenger cars included in the empirical analysis (that is, Fiat, Opel, Peugeot, Renault, Skoda, and Toyota) are coded with variables starting with the letter R; for example, R_fiat stands for the number of Fiat cars first registered in a given month. Seasonal effects are expected: so-called summer inertia, that is, lower numbers of first registrations in the summer months (June, July and August), and higher end-of-year sales in the winter months (November, December and January).
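The seasonal indicators described above can be constructed as in this brief sketch. The month sets follow the definitions in the text; the function and variable names are hypothetical.

```python
SUMMER_MONTHS = {6, 7, 8}    # June, July, August
WINTER_MONTHS = {11, 12, 1}  # November, December, January

def seasonal_dummies(month):
    """Return (summer, winter) 0/1 indicators for a calendar month 1-12."""
    return int(month in SUMMER_MONTHS), int(month in WINTER_MONTHS)

# Calendar months for 47 observations starting in January 2011.
months = [(i % 12) + 1 for i in range(47)]
dummies = [seasonal_dummies(m) for m in months]
summer_count = sum(s for s, _ in dummies)
winter_count = sum(w for _, w in dummies)
print(summer_count, winter_count)
```

Each dummy enters the regression as a separate 0/1 regressor, so its coefficient measures the average shift in registrations during the marked months.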

Relative numbers of queries pertaining to car makes are coded by variables starting with the letter S and defined in the way proposed by Choi and Varian: “The query index is based on query share: the total query volume for the search term in question within a particular geographic region divided by the total number of queries in that region during the time period being examined. The maximum query share in the time period specified is normalized to be 100 and the query share at the initial date being examined is normalized to be zero.” (Choi & Varian, 2011, p. 3). Original search data is defined relative to Opel queries in the week of March 31 – April 6, 2013 (the maximum query share in the period analysed). For aggregated monthly series, data is rescaled relative to the maximum monthly query in the period analysed, that is, the number of Opel queries in October 2014, equal to 396.43. For example, S_fiat stands for the percentage of queries on Fiat cars in a given month relative to the maximum level of 396.43.

Traffic variables, coded with variables starting with the letter T, reflect the number of unique visitors of car manufacturers’ websites in a given month. This type of data has not been previously used to explain first registrations or other sales numbers. The source of traffic time series is Megapanel PBI/Gemius, which monitors online behaviour based on a panel of Polish internet users. The data on website traffic are estimates and cannot be interpreted in a straightforward way, but they allow comparison of the various websites visited by Polish consumers. The data provider records missing values when the number of visits is lower than a defined minimum level (in the case of car manufacturers, 40,000 visits); for the purpose of this paper, two approaches to missing data were undertaken:

  • imputation of the “lower bound” value of 40,000 visits,
  • calculation of an average of the two months preceding and the two months following a missing value; in the special case of Renault, the missing value for November 2014 is calculated as an average of the four preceding months.
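The two approaches to missing traffic values listed above can be sketched as follows. Missing observations are represented by `None`, the traffic figures are hypothetical, and the neighbour-based function simply averages whichever of the two preceding and two following months are available; it does not reproduce the four-preceding-months rule used for Renault.

```python
def impute_lower_bound(series, floor=40_000):
    """Replace missing values with the reporting threshold."""
    return [floor if v is None else v for v in series]

def impute_neighbour_mean(series, window=2):
    """Replace a missing value with the mean of the observed values among
    the `window` months before and after it; leave it missing if none exist."""
    out = list(series)
    for i, v in enumerate(series):
        if v is None:
            neighbours = [
                series[j]
                for j in range(i - window, i + window + 1)
                if j != i and 0 <= j < len(series) and series[j] is not None
            ]
            if neighbours:
                out[i] = sum(neighbours) / len(neighbours)
    return out

# Hypothetical monthly unique-visitor counts with one missing month.
traffic = [50_000, 60_000, None, 70_000, 80_000]
by_floor = impute_lower_bound(traffic)
by_mean = impute_neighbour_mean(traffic)
print(by_floor[2], by_mean[2])
```

Estimating the models on both imputed series is a simple way to check whether conclusions depend on how the gaps are filled.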

Results of empirical analysis (see next section) show that treatment of missing values does not influence outcomes in a significant way.

We limit our dataset to internet data (that is, search and traffic data) and lagged values of the dependent variable (car registrations), omitting car market data and macroeconomic variables. The rationale for this approach is that internet data are available almost in real time, and car registrations data become accessible speedily, on the 5th of the next month. On the other hand, macroeconomic data are published with at least two-month lags which limits their application for nowcasting.

 

Empirical results

Linear models with HAC standard errors (to account for serial correlation in the error term) have been estimated for six variables describing the number of first registrations of the following makes: Fiat, Opel, Peugeot, Renault, Skoda, and Toyota. Results are summarised in Table 1. The strongest explanatory power is exhibited by the autoregressive component (that is, the number of registrations lagged one month); it is the only regressor that is statistically significant at the 0.05 significance level in all six models, and the size of the estimated coefficient varies from 0.314 for Toyota to 0.642 for Fiat.
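A linear model of this kind can be sketched as follows. The paper does not state which HAC estimator, kernel or bandwidth was used, so the sketch assumes the Newey-West estimator with a Bartlett kernel, no small-sample correction and an arbitrary truncation lag of 3. It relies on NumPy, runs on synthetic data, and all function names are hypothetical.

```python
import numpy as np


def design_matrix(reg, search):
    """Rows for t = 2..T-1: constant, registrations at t-1, search index at t-2."""
    reg = np.asarray(reg, dtype=float)
    search = np.asarray(search, dtype=float)
    X = np.column_stack([np.ones(len(reg) - 2), reg[1:-1], search[:-2]])
    return reg[2:], X


def ols_hac(y, X, max_lag):
    """OLS coefficients with Newey-West (Bartlett kernel) standard errors."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    xtx_inv = np.linalg.inv(X.T @ X)
    scores = X * resid[:, None]            # x_t * u_t for every observation
    meat = scores.T @ scores               # lag-0 (White) term
    for lag in range(1, max_lag + 1):
        weight = 1.0 - lag / (max_lag + 1.0)   # Bartlett weights
        gamma = scores[lag:].T @ scores[:-lag]
        meat += weight * (gamma + gamma.T)
    cov = xtx_inv @ meat @ xtx_inv         # sandwich covariance estimate
    return beta, np.sqrt(np.diag(cov))


# Synthetic data only: 48 months, coefficients chosen for illustration.
rng = np.random.default_rng(0)
n_months = 48
search = rng.uniform(20.0, 100.0, n_months)
reg = np.empty(n_months)
reg[:2] = 5000.0
for t in range(2, n_months):
    reg[t] = 800.0 + 0.6 * reg[t - 1] + 25.0 * search[t - 2] + rng.normal(0.0, 100.0)

y, X = design_matrix(reg, search)
beta, hac_se = ols_hac(y, X, max_lag=3)
```

The point estimates are ordinary least squares; only the standard errors change, which is why HAC corrections affect significance tests but not the coefficients reported in the table.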

As far as search data is concerned, in four of the models (for Fiat, Opel, Peugeot and Skoda) search variables lagged either one month (in the case of Peugeot) or two months exhibit a statistically significant influence on the number of registrations; the estimated effect is positive for Fiat, Opel and Skoda, and negative for Peugeot. This result is consistent with the one-month delay between car sale and its registration allowed by Polish law, taking into account that an additional delay may be expected between internet search and the actual signing of the contract.


Table 1. Summary of estimation results

Variable            Fiat      Opel      Peugeot   Renault   Skoda     Toyota
AR(1)               0.642     0.396     0.382     0.399     0.485     0.314
S_t                                                                   64.987
S_t-1                                   -25.100
S_t-2               25.285    22.428                        49.425
T_t-1                                   0.749     1.085
summer dummy                                                -348.374  -354.485
winter dummy                            -117.006
R2                  0.485     0.371     0.368     0.301     0.503     0.533
RESET p-value       0.189     0.527     0.068     0.503     0.677     0.170
normality p-value   0.000     0.166     0.043     0.780     0.927     0.924
maximum VIF         1.004     1.081     1.657     1.000     1.154     1.557

All variables are statistically significant at 0.05 level.
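To illustrate how such estimates translate into predictions for the current and the next month, the sketch below applies a model of the Fiat type (registrations lagged one month plus search lagged two months). The intercepts are not reported in Table 1, so the intercept used here is purely hypothetical, as are the input values; the search coefficient is read from the two-month-lag search row of Table 1, assuming its values are listed in column order.

```python
def nowcast(intercept, ar_coef, search_coef,
            last_registrations, search_lag2, search_lag1):
    """Return (current-month, next-month) predictions from
    reg_t = intercept + ar_coef * reg_(t-1) + search_coef * S_(t-2)."""
    current = intercept + ar_coef * last_registrations + search_coef * search_lag2
    # The next month reuses the current-month prediction as its lagged value
    # and the search index observed one month ago.
    following = intercept + ar_coef * current + search_coef * search_lag1
    return current, following


current, following = nowcast(
    intercept=500.0,            # hypothetical: not reported in Table 1
    ar_coef=0.642,              # Fiat AR(1) coefficient from Table 1
    search_coef=25.285,         # assumed to be Fiat's two-month-lag search coefficient
    last_registrations=2000.0,  # hypothetical
    search_lag2=40.0,           # hypothetical
    search_lag1=45.0,           # hypothetical
)
```

Because the search variable enters with a two-month lag, the next-month prediction needs no forecast of search activity, only the already observed index for the previous month.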

 

Traffic variables do not exhibit a systematic and statistically significant influence on the number of car registrations for any lag considered in the models. In two cases (Peugeot and Renault) the traffic variable lagged one month is statistically significant; however, these two models exhibit the lowest coefficients of determination (0.368 and 0.301, respectively) and their descriptive value is therefore limited.

The models provide only limited support for the hypothesis of seasonal behaviour of car registrations. The estimated coefficient of the summer dummy variable has the expected negative sign (reflecting summer consumer inertia) and is statistically different from zero in two cases only, for Skoda and Toyota. The winter dummy variable is statistically significant only in the Peugeot model, and its negative coefficient contradicts the expectation of high end-of-year sales.

As far as the general statistical quality of the estimated models is concerned, the RESET test does not reject correct specification at the 0.05 level for any of them. All but two (for Fiat and Peugeot) have normally distributed residuals, and since the sample size can be considered sufficient, lack of normality does not negatively influence estimation results. There is no sign of multicollinearity in any of the models (the maximum VIF does not exceed 1.657). Treatment of missing values in traffic data does not influence the general conclusion that traffic numbers do not have a statistically significant impact on the dependent variable.
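The variance inflation factor used as the multicollinearity check can be computed as sketched below: each regressor is regressed on the remaining ones (plus a constant) and VIF = 1 / (1 - R2) of that auxiliary regression. The sketch assumes NumPy; the function name and the sample data are hypothetical. Values close to 1 indicate nearly uncorrelated regressors, while common rules of thumb treat values above 5 or 10 as problematic.

```python
import numpy as np


def variance_inflation_factors(X):
    """VIF for each column of X (regressors only, without a constant column)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = []
    for j in range(k):
        target = X[:, j]
        # Auxiliary regression of regressor j on a constant and the others.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        vifs.append(1.0 / (1.0 - r2))
    return vifs


# Hypothetical regressors: two orthogonal columns give VIFs of about 1.
orthogonal = [[1, 1], [-1, 1], [1, -1], [-1, -1]]
vif_orthogonal = variance_inflation_factors(orthogonal)

# Two nearly identical columns give large VIFs.
collinear = [[1, 1.1], [2, 1.9], [3, 3.2], [4, 3.9], [5, 5.1]]
vif_collinear = variance_inflation_factors(collinear)
```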

 

Discussion

There are two major factors influencing the number of first registrations: the autoregressive component and search data. The result is consistent with Choi and Varian’s conclusion that “simple seasonal AR models that include relevant Google Trends variables tend to outperform models that exclude these predictors by 5% to 20%” (Choi & Varian, 2011, p. 8).

What remains to be explained are the differences in lag lengths between internet search and car registration. The lag may be non-existent (for Renault, where search variables do not exhibit significant impact at all, and Toyota, where only the current value does), equal to one month (for Peugeot) or to two months (for the three remaining car makes). A certain delay between search and registration is expected but should be similar for all car makes; to explain the differences, factors such as the length of order fulfilment and the sales policies of car manufacturers should be taken into consideration.

Econometric analysis of car registrations data suggests that three subsets of car makes may be distinguished: Fiat and Opel (whose registration numbers seem to follow similar patterns based on search data lagged two months); Skoda and Toyota (which exhibit significantly lower registrations in the summer months); and Peugeot and Renault, where the influence of lagged traffic data may be observed. Otherwise, car manufacturer website traffic appears to have limited predictive value. Interestingly, for Peugeot and Renault the search factor is either non-existent (Renault) or negatively correlated with the number of registrations (Peugeot). For a Polish consumer these two brands are more difficult to spell than the others included in the research. As we included only correctly spelled brand names, the search data might not have fully reflected the number of queries. The result suggests that website traffic may be considered as a predictor when search data are unavailable or when clear attribution of search queries to the nowcasted activity is problematic.

This study has the following limitations. Its purpose was to nowcast new passenger car registrations made by both consumers and businesses, whereas Google search data is most often used to predict the activities of private consumers. Due to the limited availability of detailed data we attempted to nowcast all passenger car registrations. This might have resulted in lower prediction quality, but it is of greater practical use as it reflects the entire sales volume.

Macroeconomic variables (e.g. consumer satisfaction index, general business conditions index) and automobile market data were not included in the research. They are published with a delay and thus do not meet nowcasting requirements. However, lagged data of this kind could be included in more sophisticated models.

 

Acknowledgment

The research project would not have been possible without the support of Polskie Badania Internetu Sp. z o.o. (PBI). The company provided us with the data on website traffic of car manufacturers in the period from January 2011 to November 2014. The research program Megapanel PBI/Gemius presents the behaviour of Polish Internet users based on the study Net Track Millward Brown SMG/KRC, conducted on a sample selected and weighted by PBC.

 

References

Askitas N., Zimmermann K.F. (2009). Google Econometrics and Unemployment Forecasting. Applied Economics Quarterly, 55 (2), 107–120.

Bańbura M., Giannone D., Modugno M., Reichlin L. (2013). Now-Casting and the Real-Time Data Flow. Working Papers, European Central Bank, no. 1564.

Carrière-Swallow Y., Labbé F. (2013). Nowcasting with Google Trends in an Emerging Market. Journal of Forecasting, 32 (4), 289–298.

Choi H., Varian H. (2011). Predicting the Present with Google Trends. Economic Record, 88, 2–9. doi: 10.1111/j.1475-4932.2012.00809.x

Li N., Peng G., Chen H., Bao J. (2013). A Prediction Study on E-commerce Orders Based on Site Search Data. 6th International Conference on Information Management, Innovation Management and Industrial Engineering, 2, 314–318.

Schmidt T., Vosen S. (2009). Forecasting Private Consumption: Survey-based Indicators vs. Google Trends. Ruhr Economic Papers, 155.

Sun B., Li B., Li G., Zhang K. (2013). Automobile Demand Forecasting: An Integrated Model of PLS Regression and ANFIS. Advances in Information Sciences & Service Sciences, 5 (8), 429–436.

[1] For legal reasons, the category of personal cars also includes cars with a cargo compartment (Polish: samochody z kratką) in the period January–July 2014.

[2] Polish Association of Automotive Industry, http://www.pzpm.org.pl/en, [2015.04.02].