{"id":1491,"date":"2025-12-04T09:49:51","date_gmt":"2025-12-04T07:49:51","guid":{"rendered":"https:\/\/2026.inimareng.ee\/aruanne\/%chapter%\/introduction-3\/"},"modified":"2026-06-09T08:00:57","modified_gmt":"2026-06-09T06:00:57","slug":"introduction-3","status":"publish","type":"article","link":"https:\/\/2026.inimareng.ee\/en\/aruanne\/hariduse-andmetarkus\/introduction-3\/","title":{"rendered":"Introduction"},"content":{"rendered":"\n    <div class=\"highlight-box highlight-box-purple p-8 xl:p-12 my-10\">\n                    <div class=\"mb-6 font-bold text-3xl uppercase text-purple\">KEY MESSAGES<\/div>\n        \n        <ul>\n<li><strong>In Estonia, the truism that administrative register data offer broad coverage but limited depth while surveys provide rich detail but limited coverage does not hold.<\/strong> High-quality research can be conducted directly using register data, and survey research can be strengthened through their integration.<\/li>\n<li><strong>Stronger data protection in education may inadvertently shield us from future opportunities.<\/strong> If sensitive administrative register data cannot be linked with education system data, we risk substantially underestimating problems within the system.<\/li>\n<li><strong>The underuse of register data generates recurring financial costs, while commissioning new surveys entails opportunity costs.<\/strong> Archiving virtual datasets created for register-based research would help reduce future research expenses.<\/li>\n<li><strong>The revolutionary use of microdata should be accompanied by a corresponding shift in policy impact assessment and monitoring.<\/strong> Alongside cross-sectional surveys, greater emphasis should be placed on automated evaluation of policy measures.<\/li>\n<\/ul>\n    <\/div>\n\n<h2 class=\"mb-6 text-3xl uppercase font-medium text-purple\">\n    INTRODUCTION<\/h2>\n<p class=\"wp-block-paragraph\">Data-driven decision-making and governance require the collection and storage 
of data, but their real value lies in using enhanced data to make better decisions. Data are not an end in themselves but a means of reducing uncertainty in future decisions and of assessing the efficiency and effectiveness of those already taken, allowing course corrections where necessary. In theory, this sounds straightforward; in practice, it is more complex. The first article in this chapter (Terje Trasberg, Marre Karu, Liina Osila and Kadri Rootalu) illustrates how data can be used more effectively, while the second (Eneli Kindsiko and Liis Roosaar) examines what happens when valuable data are left unused. The first draws on broader examples; the second focuses on a specific case and shows, using new data, how even a high-quality study may significantly underestimate developments in the education system.<\/p>\n\n<p class=\"wp-block-paragraph\">The broader aim of both articles is to demonstrate the substantial potential of Estonia\u2019s administrative register data to improve policymaking.<\/p>\n\n<h2 class=\"mb-6 text-3xl uppercase font-medium text-purple\">\n    LINKING REGISTERS AND ENHANCING SURVEYS<\/h2>\n<p class=\"wp-block-paragraph\">Administrative register data are collected to carry out public functions and typically cover the population as a whole. However, from the perspective of targeted research, such data are secondary and often limited in depth, containing only indirect or partial information relevant to specific research questions. Surveys, by contrast, are designed to answer clearly defined questions and usually gather highly detailed data. This richness, however, applies only to the survey sample, and findings must therefore be generalised to the wider population. Register data are generated automatically through the operation of state systems, whereas survey data are collected at considerable additional cost in response to identified research needs. 
In practice, this often means that data with broad population coverage but less detail are continuously updated, while more detailed data cover only a narrow sample and are either not updated at all or updated only infrequently through repeat surveys.<\/p>\n\n<p class=\"wp-block-paragraph\">Both types of data present distinct challenges. With register data, researchers may need to rely on proxy indicators where precise, purpose-built measures are unavailable. Registers also tend to capture behavioural outcomes rather than the motivations behind them. Survey data, in turn, raise concerns about sampling bias, the rapid ageing of data and, in the case of questionnaires, potential inaccuracies in self-reported information compared with administrative register data.<\/p>\n\n<p class=\"wp-block-paragraph\">In this chapter, we show that these generalisations do not apply in the Estonian context, as we have both the theoretical and technical capacity to link register data with one another and with survey data. The article by Terje Trasberg, Marre Karu, Liina Osila and Kadri Rootalu outlines several examples of how the limitations of both registers and standalone surveys can be addressed through linkage, thereby enhancing the value of the data. For example, in the recent register-based population census, place of residence was determined using around 20 data sources \u2013 that is, a more accurate indicator was constructed through the combination of proxy measures. Another approach is to link a purpose-designed survey with a register in order to collect separately only the data missing from the register and to incorporate directly those already available there as reliable indicators. For example, data from the Estonian Education Information System (EHIS) on ongoing studies are integrated into the Estonian Labour Force Survey to reduce the need for respondents to provide the same information again in full. 
<\/p>\n\n<p class=\"wp-block-paragraph\">Linkage can serve not only as a way to improve registers or surveys but also as a default principle to avoid collecting the same data repeatedly.<\/p>\n\n<p class=\"wp-block-paragraph\">The Public Information Act requires institutions collecting data for the performance of public functions to follow the once-only principle and to justify any exceptions. However, how can it be justified that, in surveys commissioned by these same institutions, the repeated collection of identical data remains common practice, while linkage with registers \u2013 that is, the application of the once-only principle \u2013 must be separately requested and justified? It would be reasonable to apply the once-only principle more consistently. Where data already exist in registers, it should be the decision not to link them that requires justification, rather than the linkage itself. <\/p>\n\n<h2 class=\"mb-6 text-3xl uppercase font-medium text-purple\">\n    DATA PROTECTION SHOULD NOT SHIELD US FROM A BETTER FUTURE<\/h2>\n<p class=\"wp-block-paragraph\">The article by Eneli Kindsiko and Liis Roosaar on the educational inequality hidden in PISA results offers a striking example: because an imprecise indicator is used, the impact of socioeconomic background is underestimated by roughly half. In the most recent PISA study \u2013 whose methodological quality is generally not in doubt \u2013 13% of the variance in mathematics test results is attributed to parental background. However, when mathematics examination results are linked with administrative register data on parents\u2019 income at aggregate level, mothers\u2019 or fathers\u2019 income explains 24\u201326% of the variance \u2013 approximately twice as much (see Article\u00a02.2, Table\u00a02.2.2). The difference is even more pronounced in Tallinn, where nearly 60% of the variance is explained in this way (see Table\u00a02.2.3). 
<\/p>\n\n    <div class=\"highlight-box highlight-box-purple p-8 xl:p-12 text-2xl xl:text-3xl text-brown font-semibold my-10\">\n        \n        In the case of surveys as well, the principle that data should be collected only once could be applied more consistently. Where data already exist in registers, it should be the failure to link them that requires justification, rather than their linkage.\n    <\/div>\n\n<p class=\"wp-block-paragraph\">How can the PISA results and the picture derived from administrative register data differ so markedly? The explanation lies in how PISA measures parental income: in the absence of a better option, pupils are asked about household possessions. While possessions may at times serve as a proxy for family wealth, in a context where administrative registers contain parents\u2019 actual income and education levels, it would be more appropriate to enable the use of these data rather than rely on pupils\u2019 assessments of proxy indicators. The article concludes: \u2018Educational inequality in Estonia has the face of a child who studies in an under-resourced school and comes from a disadvantaged socioeconomic background.\u2019 Although Estonia\u2019s internationally high PISA results \u2013 which are not in dispute \u2013 are a source of justified pride, the underlying inequality appears blurred, as if viewed through lenses that fail to bring the picture into focus. If, out of concern for children\u2019s and parents\u2019 privacy, we refrain from enabling data linkage, we may paradoxically be shielding these children from better educational opportunities in the future. 
Educational inequality is both growing and underestimated by roughly half, and its measured scale directly influences whether and how education and social policy are directed to address it.<\/p>\n\n<h2 class=\"mb-6 text-3xl uppercase font-medium text-purple\">\n    UNDERUSED DATA, RECURRING COSTS AND RECURRING RETURNS<\/h2>\n<p class=\"wp-block-paragraph\">This Human Development Report is data-driven and presents several findings in the field of education based on data that have so far been underused. A central premise of data-driven decision-making and governance is that the real value of data lies in their repeated and varied use. Data collection and storage are pure costs, whereas linking datasets creates added value, and using them to inform better decisions generates a return. Terje Trasberg, Marre Karu, Liina Osila and Kadri Rootalu describe studies conducted entirely or largely on the basis of administrative registers as so-called extra-programme statistical work. Such projects produce temporary, unique virtual linked datasets, which are typically deleted once the study is completed. There are valid reasons for deletion, but an unavoidable consequence \u2013 beyond the initial cost \u2013 is that each similar study, or any study relying on the same data, generates new costs. This creates a recurring and significant opportunity cost in both time and money, as recreating the same virtual dataset requires new authorisations and reconstruction of the data, even though those resources could be used elsewhere. One solution would be to archive the virtual datasets created and, subject to approval by the relevant ethics committee, allow controlled access to them. 
This would reduce recurring data-use costs and increase the returns from existing data.<\/p>\n\n<h2 class=\"mb-6 text-3xl uppercase font-medium text-purple\">\n    FROM CROSS-SECTIONAL MEASUREMENT TO CONTINUOUS MONITORING<\/h2>\n<p class=\"wp-block-paragraph\">Milton Friedman argued that policies should be judged not by their intentions but by their results.<a href=\"#references\" id=\"reference-1\" class=\"reference-number\">1<\/a> However noble the objective, a policy that is ineffective or produces the opposite outcome is ultimately a poor policy. Since the 2010s, Europe has witnessed what can be described as a microdata revolution: raw administrative register data have increasingly been linked and used for analysis, particularly to assess the impact of policy measures. Yet even when based on linked datasets, impact assessments typically take the form of static reports, and each new question requires a separate evaluation. <\/p>\n\n    <div class=\"highlight-box highlight-box-purple p-8 xl:p-12 text-2xl xl:text-3xl text-brown font-semibold my-10\">\n        \n        If, out of concern for children\u2019s and parents\u2019 privacy, we refrain from enabling data linkage, we may paradoxically be shielding these children from better educational opportunities in the future.\n    <\/div>\n\n<p class=\"wp-block-paragraph\">In the conclusion of their article, Eneli Kindsiko and Liis Roosaar propose practical steps to address the identified educational inequality. Implementing such measures, however, presupposes a data-driven approach to impact evaluation. Administrative register data are generated automatically through the routine activities of public institutions, meaning that when policy impacts are assessed using register data, additional data collection costs are minimal or already covered by the register owner. 
We should therefore commission fewer static cross-sectional studies and place greater emphasis on the automated generation of knowledge from data that already exist. <\/p>\n\n<p class=\"wp-block-paragraph\">Accurately identifying the scale of educational inequality determines the form and scope of possible policy interventions, while their effectiveness can be assessed only through continuous, automated monitoring. It would be problematic to discover three or four years later, after spending millions of euros, that although a policy was well intentioned, its real-world effects were limited or even counterproductive. The need for course correction must be identified before substantial and irreversible costs are incurred.<\/p>\n\n<p class=\"wp-block-paragraph\">The reflections accompanying the articles in this chapter consider how to move forward in light of these challenges. Dan Bogdanov notes that the Official Statistics Act both permits and restricts data processing: Statistics Estonia is well equipped to process data but less able to share them. Liiri Oja explores how to balance data protection and data use through the proportionate application of legal principles and the use of technological solutions. At times, achieving change may require renaming an established institution and using that moment to redefine its substance and expand its mandate. 
Dan Bogdanov raises the idea of transforming Statistics Estonia into a Data Agency \u2013 a proposal that warrants discussion.<\/p>\n\n    <div class=\"highlight-box highlight-box-purple p-8 xl:p-12 text-2xl xl:text-3xl text-brown font-semibold my-10\">\n        \n        We should commission fewer static cross-sectional studies and instead prioritise the automated generation of knowledge from data that already exist.\n    <\/div>\n","protected":false},"featured_media":0,"parent":0,"menu_order":0,"template":"","chapter":[3],"class_list":["post-1491","article","type-article","status-publish","hentry","chapter-hariduse-andmetarkus"],"acf":[],"_links":{"self":[{"href":"https:\/\/2026.inimareng.ee\/en\/wp-json\/wp\/v2\/article\/1491","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/2026.inimareng.ee\/en\/wp-json\/wp\/v2\/article"}],"about":[{"href":"https:\/\/2026.inimareng.ee\/en\/wp-json\/wp\/v2\/types\/article"}],"wp:attachment":[{"href":"https:\/\/2026.inimareng.ee\/en\/wp-json\/wp\/v2\/media?parent=1491"}],"wp:term":[{"taxonomy":"chapter","embeddable":true,"href":"https:\/\/2026.inimareng.ee\/en\/wp-json\/wp\/v2\/chapter?post=1491"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}