The FDZ is now celebrating an anniversary: for twenty years, the Research Data Centre (FDZ) of the German Federal Employment Agency (BA) at the Institute for Employment Research (IAB) has provided external researchers with access to microdata for the purpose of non-commercial research. IAB-Forum took this as an opportunity to talk to two women who have provided direction and support for the work of the FDZ for many years: Dana Müller, Head of the FDZ, and Monika Jungbauer-Gans, Chair of the German Data Forum (RatSWD).

Ms Müller, first of all congratulations on the anniversary! To start with, we’d like to ask you what you feel is special about the FDZ?

Dana Müller has been head of the Research Data Centre (FDZ) of the Federal Employment Agency at the IAB since October 2016.

Dana Müller: The name “Research Data Centre” actually says it all – what’s special about the FDZ is our data. It’s truly a unique resource for labour market and occupational research. The law states that this data is to be made available for research, and it’s our job to make this happen.

We create standardised data products from administrative data provided by the Federal Employment Agency. This enables researchers worldwide to answer their research questions. My colleague in Data Management always remarks that what we do at the FDZ is to “refine” the data. We also provide access to IAB surveys and link them with administrative data to increase their analytic potential.

What are the biggest challenges or milestones that the FDZ has faced over the last 20 years?

Müller: 20 years sounds like a very long time – but for me it’s actually gone by fairly quickly. The biggest challenge from my point of view was setting up the data infrastructure. We started out in 2005 with a single guest workstation in the office of one of my colleagues. This enabled external researchers to work with our data on-site for the first time. Over the years we’ve created many additional ways to access the data. Safe rooms where researchers can access our data are now available in Germany, other European countries, the USA and Canada.

Another challenge is the fact that our social data is subject to special protection. Data privacy is a very important aspect: that’s why our projects can’t be implemented overnight – they simply take time.

One milestone that we have achieved is linking surveys to the administrative data. We started 20 years ago with the Linked Employer-Employee data sets, linking the responses of participating employers from the IAB Establishment Panel to administrative data of the people employed at those companies. For the first time, this enabled researchers to analyse the supply and demand side of the labour market at the same time.

Jungbauer-Gans: “My colleagues in Nuremberg would often come to the IAB to work on the data at a guest workstation.”

Prof. Monika Jungbauer-Gans teaches at the Friedrich-Alexander University Erlangen-Nuremberg, heads the German Centre for Higher Education Research and Science Studies (DZHW) and is Chairwoman of the German Data Forum (RatSWD).

Ms Jungbauer-Gans, how did you first come into contact with the BA’s FDZ at the IAB?

Monika Jungbauer-Gans: That actually does go back quite a long way. When I held a professorship at the University of Kiel I was in touch with the IAB’s regional research group there. We were doing a joint analysis of diversity in companies as part of two projects funded by the German Research Foundation (DFG). For this purpose we were using precisely the IAB’s linked data set that Dana Müller just mentioned. While these projects were running I moved to the University of Erlangen-Nuremberg, which of course fitted in perfectly. My colleagues in Nuremberg would often come to the IAB to work on the data at a guest workstation. That wouldn’t have been so easy to do from Kiel at the time.

So originally you were first and foremost an FDZ user?

Jungbauer-Gans: Yes, exactly. Later, as a member of the IAB Scientific Advisory Council, I was even more aware of the work being done, of course, and that included the Institute’s work with data.

You’re also the Chair of the German Data Forum (RatSWD), which is dedicated to improving access to high-quality data for research purposes. Where do you currently see a particular need to catch up in terms of data access, the interlinking of data records, and data infrastructure?

Jungbauer-Gans: I see a major problem in the fact that in Germany it’s not per se possible to link anonymised data in compliance with data privacy regulations, as is the case in Scandinavian countries or in the Netherlands. In Germany data is not viewed in cross-sectoral terms, it’s seen from the point of view of individual policy areas. The German Social Code only regulates the protection of labour market data, for example. It would be very worthwhile to finally be able to link education data to labour market data, for instance.

Is that possible in the Netherlands and in Scandinavia?

Jungbauer-Gans: A lot of data can be linked in Scandinavia, including register data. Depending on the question, you can link labour market and health data, for example, or data on schools and teacher qualifications. Here in Germany, these data spaces are worlds apart.

Jungbauer-Gans: “In Germany, we have to trawl through all kinds of things to find out what data is available where.”

Even though the mindset in the European Union is much more progressive, does the EU still tend to think more in terms of data spaces?

Jungbauer-Gans: The European Union has in fact been a relatively strong pioneer. The Data Governance Act says there should be a central point where you can check who has which data, for example. We’re a long way away from that in Germany. Here we have to trawl through all kinds of things to find out what data is available where.

But at EU level there’s also a tendency to focus solely on certain particular data fields such as health, mobility and traffic. There’s often a failure to look at the interlinking options here. As a researcher, you’re always dealing with questions that are relevant to a whole range of areas. If I were a pharmaceutical company, health data would be enough for me, but if I want to analyse the impact of the coronavirus pandemic on commuter mobility, I need two data spaces.

As one of 41 accredited data centres, the BA’s FDZ is involved in the process that aims to ensure easier data access for research purposes in Germany. How do you rate the work done by our FDZ to date?

Jungbauer-Gans: The BA’s FDZ is one of the pioneers among the research data centres. It was one of the first to be founded after those of the statistical offices, so it did essential groundwork in terms of data privacy requirements, anonymisation of data and data access, for example. That was a truly pioneering achievement.

How many new enquiries does the FDZ receive each year from researchers who have never worked with FDZ data before?

Müller: We have around 200 new users every year. Our data is used by young researchers, too: after all, you can use our data to write a bachelor’s thesis, a master’s thesis or a doctoral dissertation. At the same time, our data is also used by a lot of “old hands” who provide policy advice, so they know exactly whether our data is particularly well-suited to answering a particular research question compared to other sources.

Müller: “About half of our user projects have collaborators from academic institutions outside Germany.”

Do you get many requests for FDZ data from abroad?

Müller: By now, about half of our user projects have collaborators from academic institutions outside Germany. IAB data is used by well-known international researchers, too, such as Nobel Prize winner David Card.

How would a researcher based in a place like California access the FDZ’s valuable data? Would she have to fly to Nuremberg?

Müller: Not necessarily – although Nuremberg is always well worth a visit, of course. We have data access points worldwide; there has been one at the University of California in Berkeley since 2013, for instance. But it always depends on the question you’re asking and how much data you need. Our data cannot simply be downloaded.

Normally you have to start by filling out an application form online, and this asks you to meet certain requirements. Once we’ve checked whether the research question can be addressed with our data, we have to anonymise the data for all data access points outside Europe in accordance with the General Data Protection Regulation and conclude a special data use agreement with the institution where the researcher is employed. Provided there are no hiccups, the researcher will be able to work with the data within a period of around six weeks.

Müller: “We’ve set our sights on improving access to confidential data in an international context – and that’s no small undertaking.”

Are there comparable research data centres in other countries which the FDZ cooperates with?

Müller: In Europe, we’re particularly involved with the International Data Access Network. This is a network of data providers who, like us, offer confidential data and are focused on enabling access to it across national borders.

Here we collaborate with the Central Agency for Statistics (CBS) in the Netherlands, for example, which is the Dutch equivalent of Germany’s Federal Statistical Office, if you like. Other institutions involved are the French Secure Data Access Centre (CASD), the UK Data Service (UKDS) and the Leibniz Institute for the Social Sciences (GESIS). We’ve set our sights on improving access to confidential data in an international context – and that’s no small undertaking.

Can you give us a specific example of why such collaborations are important?

Müller: If we’re talking about data links, it would be nice to be able to pool the data of two countries so as to be able to undertake a cross-national comparison of Germany and France, for example. At the moment, this type of analysis has to be carried out separately for each country. It would be much easier if you could analyse the data simultaneously. This is what we do in the International Data Access Network.

The BA’s FDZ also cooperates at national level with other research data centres in Germany. Why?

Müller: Even though the research data centres in Germany are subject to different laws – the Federal Statistical Office is covered by the Federal Statistics Act, for example, while we’re regulated by the German Social Code – we can still learn and benefit from each other. If the data protection officers at the Federal Statistical Office have previously worked on a topic or created working aids, for example, we can adapt these for our own purposes. There’s no need to reinvent the wheel.

We can also get together to create new data products. We’re currently running a major collaborative project with the Leibniz Institute for Educational Trajectories’ (LIfBi) research data centre where we’re interlinking the data from the National Educational Panel Study and making it available. Another example is a cooperation with the research data centre at the Centre for European Economic Research: here we’ve linked administrative operating data which we’ll soon be able to make available.

Müller: “The FDZ’s own research is diverse, and it’s incredibly important in terms of enabling us to provide good advice on the data.”

The FDZ not only offers access to data and advice on its use, it also conducts research itself. Why?

Müller: We do our own research because that’s the best way to develop new data products: it enables us to take a closer look at data quality. If you set up research data products in a standardised way and you have a research project or a doctorate in progress at the same time, you delve deep into the data – and then you know how to group certain variables, for example. Or you find out that there are certain variables we shouldn’t investigate further for reasons of data privacy or quality. The FDZ’s own research is diverse, and it’s incredibly important in terms of enabling us to provide good advice on the data.

Big data has been the buzzword in recent years when it comes to collecting large amounts of data. Are these big volumes of data becoming increasingly important in research, too? And what is the role of the FDZ in this connection?

Jungbauer-Gans: I see big data as an interesting opportunity to obtain data in areas that we haven’t yet tapped into so well. However, the problem is that big data initially involves huge, unstructured and usually one-dimensional volumes of data. For example, loyalty card data shows how much a consumer buys, but reveals nothing else about them. It doesn’t tell us about their regional mobility or personal background. A researcher would first have to find an interesting question with which to analyse this enormous, unstructured mass of individual numbers.

Müller: “Big data is not the same as good data, and the effort it involves is enormous.”

Müller: Exactly. Which data should be used depends primarily on the research project. Big data is not the same as good data, and the effort it involves is enormous. The IAB collected mobility data via respondents’ smartphones with their consent, for example. That was millions of pieces of information. This data was elaborately processed into mobility indicators, but it has its limitations – gaps in the data when a smartphone was switched off, for example.

This is where machine learning might come into play. How do you assess the potential of artificial intelligence in terms of data processing in general and for the FDZ in particular?

Müller: Our users already apply machine learning to our standardised data by searching for patterns, for example, and some researchers are experimenting with ChatGPT to help them write program code.

In the future, artificial intelligence might be useful in the service sector, too – for advising users, for example. So, these are issues we can’t avoid. But we need to keep a close eye on where developments are heading, and data privacy has to be taken into careful consideration, too.

Jungbauer-Gans: “It’s important to me not to glorify artificial intelligence and to always take a close look at what’s going on behind the scenes.”

Jungbauer-Gans: It’s important to me not to glorify artificial intelligence, but rather view it as a tool – and to always take a close look at what’s going on behind the scenes. We need to understand and assess the algorithms before we incorporate artificial intelligence in our data work. How does it select data, and what does that mean in terms of the subject matter? Quite simply, we have to be careful not to draw the wrong conclusions.

Ms Müller, looking ahead to the next 20 years: how would you like to see the FDZ develop going forward and where do you see the biggest challenge?

Müller: First of all, I have high hopes for the Research Data Act. We urgently need a legal foundation in Germany that strengthens research and improves our ability to link data to answer research questions. I’d like to see a link between our data and the data from the Family Benefits Office (Familienkasse), for example. Up to now, our administrative data has mainly enabled us to look at individuals. Human beings don’t live in isolation, however, but often in partnerships and families. Information on the latter could be provided by the Family Benefits Office, and this would significantly advance our research.

Ms Jungbauer-Gans, what recommendations do you have for the FDZ for the next 20 years?

Jungbauer-Gans: Generally speaking, it’s vital to remain open to innovation and to persevere in trying to improve the overall conditions in which we operate. But I don’t think that’s something we need to be telling the FDZ. Our colleagues are already doing this – and they provide a good example for other data areas.

More about the FDZ

Located in the Institute for Employment Research (IAB), the Research Data Centre (FDZ) of the Federal Employment Agency (BA) was founded in 2004 with the aim of improving the exchange of information between research and statistics. Its mission is to make access to BA and IAB microdata transparent and standardised for the purpose of non-commercial empirical research.

The FDZ not only provides access to data, it also helps researchers select, access and handle data. The FDZ’s consulting services cover analysis options as well as the scope and validity of the data. A total of 1,409 researchers made use of the FDZ’s data products in the second half of 2023. The FDZ is not only a provider of data, however: it also conducts empirical research itself. Its focus here is on topics such as the linking of process and survey data, the internationalisation of data access, and content analyses relating to labour market research.

The FDZ promotes cooperation with national and international research institutions. It now has 20 locations with data access points in seven countries. In addition to Germany, these are the USA, France, England, Luxembourg, Canada, Spain and Poland.

Image: beeboys/stock.adobe.com

DOI: 10.48720/IAB.FOO.20240410.01

Winters, Jutta; Keitel, Christiane; Deckbar, Laura (2024): “The Research Data Centre provides a good example for other data areas”, In: IAB-Forum 10th of April 2024, https://www.iab-forum.de/en/the-research-data-centre-provides-a-good-example-for-other-data-areas/, Retrieved: 5th of November 2024