FIDA
The Film Industry Data Repository
Introduction
The Film Industry Data Repository (FIDA) is a lifecycle-wide data infrastructure developed by the CresCine consortium to support evidence-based decision-making in Europe’s film sector, with a particular focus on small and mid-sized markets. FIDA brings together fragmented information from across the film value chain – from production and festival circulation to theatrical exhibition, streaming availability, and television programming – and integrates it into a single, interoperable analytical environment.
Designed for producers, distributors, sales agents, film agencies, and scholars alike, FIDA enables comparative analysis across countries, release windows, and time periods that are rarely visible in commercial analytics services. The repository combines public and open data with licensed industry data, cleaned, harmonised, and linked via a shared internal identifier to allow robust cross-source analysis while respecting licensing constraints.
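The linking step can be pictured with a minimal sketch. It assumes a crosswalk table that maps each source-specific ID to one internal film ID; the crosswalk contents, the ID formats, and the field names below are hypothetical and are not taken from FIDA itself.

```python
from collections import defaultdict

# Hypothetical crosswalk: (source name, source-specific ID) -> internal film ID.
# All identifiers here are invented for illustration.
CROSSWALK = {
    ("tmdb", "tmdb-001"): "F0001",
    ("lumiere", "lum-001"): "F0001",
    ("wikidata", "wd-001"): "F0001",
    ("tmdb", "tmdb-002"): "F0002",
}


def link_records(records):
    """Group source records under the shared internal identifier.

    Returns (linked, unmatched): linked maps internal ID -> merged fields,
    unmatched lists records with no crosswalk entry.
    """
    linked = defaultdict(dict)
    unmatched = []
    for rec in records:
        internal_id = CROSSWALK.get((rec["source"], rec["source_id"]))
        if internal_id is None:
            unmatched.append(rec)
            continue
        # Prefix each field with its source so values from licensed and
        # open sources stay distinguishable after merging.
        for field, value in rec["fields"].items():
            linked[internal_id][f"{rec['source']}.{field}"] = value
    return dict(linked), unmatched


records = [
    {"source": "tmdb", "source_id": "tmdb-001", "fields": {"genre": "Drama"}},
    {"source": "lumiere", "source_id": "lum-001", "fields": {"admissions": 12000}},
    {"source": "tmdb", "source_id": "tmdb-404", "fields": {"genre": "Comedy"}},
]
linked, unmatched = link_records(records)
```

Keeping the source name in each merged field is one way to respect licensing constraints downstream, since licensed values can be filtered out before aggregation.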
FIDA integrates data from The Movie Database (TMDB), Wikidata, the World Bank, Lumiere (European Audiovisual Observatory), Cinando, International Showtimes, UsherU, the European Audiovisual Observatory Yearbook, media-press.tv, and curated CresCine festival datasets. Together, these sources cover production metadata, festival selections and awards, admissions and box office, cinema showtimes, streaming availability, television broadcasts, and socio-economic context.
Access to FIDA is provided through analytical dashboards and aggregated datasets, offering a durable evidence base for industry strategy, academic research, and film policy development across Europe. The FIDA dashboards are free for all to use.
In the case of derivative analytical use, CresCine and FIDA must be credited, and this explanatory paper referenced: https://zenodo.org/records/17829236.
VOD Distribution of Films Produced in Europe
This dashboard provides an overview of how European films circulate on Video-on-Demand (VOD) services. The dashboard uses aggregated data from CresCine’s Film Industry Data Repository (FIDA) to observe when films enter VOD catalogues, how long they are available, and how distribution may differ across streaming services, production countries, and genres. Any publication that uses the data must cite the paper describing the database: https://zenodo.org/records/17829236
The streaming data was originally provided by UsherU, an Irish film industry data collection and analytics firm. FIDA includes four years (2021–2024) of monthly data on the availability of European films on VOD streaming services across the world. The streaming data has been enriched with data from other sources (TMDB, Lumiere Pro, Wikidata).
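As an illustration of how monthly availability data can answer "when did a film enter a catalogue and how long did it stay", here is a minimal sketch. It assumes one record per film, service, and month in which the film was listed; the record layout and the sample values are hypothetical and do not reflect the actual UsherU or FIDA schema.

```python
from collections import defaultdict

# Hypothetical monthly snapshots: (film_id, service, "YYYY-MM").
SNAPSHOTS = [
    ("F0001", "ServiceA", "2021-03"),
    ("F0001", "ServiceA", "2021-04"),
    ("F0001", "ServiceA", "2021-06"),
    ("F0001", "ServiceB", "2022-01"),
]


def availability_summary(snapshots):
    """Summarise first month, last month, and months listed per film and service."""
    months = defaultdict(set)
    for film_id, service, month in snapshots:
        months[(film_id, service)].add(month)
    # "YYYY-MM" strings sort chronologically, so min/max give first/last month.
    return {
        key: {
            "first_month": min(seen),
            "last_month": max(seen),
            "months_available": len(seen),
        }
        for key, seen in months.items()
    }


summary = availability_summary(SNAPSHOTS)
```

Counting distinct months rather than the span between first and last month keeps gaps visible: in the sample, the film is listed on ServiceA in three months across a four-month span.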
Explanation of the acronyms used in the dashboard: EU Big 5 are Spain, Italy, Germany, France and Poland; EU Small Countries are all other EU countries; CoE Small Countries are the remaining small European countries that are members of the Council of Europe; Turkey, the UK and Ukraine form a separate group as large countries neighbouring the EU; “International” refers to all other countries.
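The grouping described above can be expressed as a small lookup. The sketch below assumes ISO 3166-1 alpha-2 country codes and a snapshot of current EU and Council of Europe membership; the function name and the exact membership lists used inside FIDA are assumptions.

```python
# Membership lists are a snapshot and an assumption; FIDA's own lists may differ.
EU_BIG_5 = {"ES", "IT", "DE", "FR", "PL"}

EU_MEMBERS = {
    "AT", "BE", "BG", "HR", "CY", "CZ", "DK", "EE", "FI", "FR", "DE", "GR",
    "HU", "IE", "IT", "LV", "LT", "LU", "MT", "NL", "PL", "PT", "RO", "SK",
    "SI", "ES", "SE",
}

# Large countries next to the EU, treated as their own group.
LARGE_NEIGHBOURS = {"TR", "GB", "UA"}

# Council of Europe members outside the EU, excluding the three large neighbours.
COE_SMALL = {
    "AL", "AD", "AM", "AZ", "BA", "GE", "IS", "LI", "MD", "MC", "ME", "MK",
    "NO", "SM", "RS", "CH",
}


def country_group(code):
    """Map an ISO alpha-2 country code to the dashboard's country group."""
    code = code.upper()
    if code in EU_BIG_5:
        return "EU Big 5"
    if code in EU_MEMBERS:
        return "EU Small Countries"
    if code in LARGE_NEIGHBOURS:
        return "Turkey, UK and Ukraine"
    if code in COE_SMALL:
        return "Council of Europe Small Countries"
    return "International"
```

For example, `country_group("EE")` returns "EU Small Countries" and `country_group("US")` returns "International".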
The overall aim of this dashboard is to provide an accessible analytical tool for researchers, policymakers, sales agents, and cultural institutions seeking to understand the reach and mobility of European film in VOD services.
Using a Python-based framework, the Box Office Simulation Tool explores the potential market performance of a film by estimating a range of expected box-office outcomes across different territories. It simulates a scenario in which a film, regardless of its background, is distributed and promoted in a given country as an average release – screened nationwide for a typical theatrical window and marketed using standard promotional practices. Users can model alternative exhibition scenarios by selecting combinations of production and distribution variables, including country of production, primary language, genre, target market(s), and estimated budget. This allows for comparative analysis of how different strategic choices may shape a film’s market prospects.
The Box Office Simulation Tool draws on data from FIDA, a film-industry data repository built on the cloud-based Databricks platform. FIDA integrates datasets from The Movie Database (TMDB), Wikidata, Cinando, Lumiere Pro, the European Audiovisual Observatory, International Showtimes, and festival film data collected within the CresCine project. When citing this tool, please reference the FIDA data paper available at https://doi.org/10.5281/zenodo.17829236.
Box Office Simulation Tool
Source: FIDA data repository (2025). This simulation tool employs data from The Movie Database (TMDB), Wikidata, Cinando, Lumiere Pro, the European Audiovisual Observatory, International Showtimes, and festival film data gathered by CresCine.
How it Works
The Box Office Simulation Tool is built on CatBoost, a machine-learning algorithm well suited to handling categorical variables such as genre or language alongside numerical factors such as production budget. After the user selects the relevant parameters and runs the simulation, the model estimates the most likely box-office outcome based on historical data from the past decade (2015–2024).
The model is trained on films that received a theatrical release in Europe and generated at least 5% of the average box office of foreign films in a given export market in their year of release. Re-screenings and later re-releases are excluded, ensuring that predictions reflect initial theatrical performance rather than long-term circulation.
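The 5% inclusion rule can be sketched as follows. The sketch assumes that each input row is one film's initial release in an export market (a market other than its production country) and that the average is taken over those rows per market and year; the field names and figures are hypothetical, and the tool's actual preprocessing may differ.

```python
from collections import defaultdict


def eligible_releases(rows, threshold=0.05):
    """Keep releases earning at least `threshold` of the average foreign-film
    box office in the same market and release year.

    Each row is a dict with keys: film_id, market, year, box_office.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        key = (row["market"], row["year"])
        sums[key] += row["box_office"]
        counts[key] += 1

    kept = []
    for row in rows:
        key = (row["market"], row["year"])
        average = sums[key] / counts[key]
        if row["box_office"] >= threshold * average:
            kept.append(row)
    return kept


# Hypothetical export releases in one market and year.
rows = [
    {"film_id": "F0001", "market": "EE", "year": 2019, "box_office": 100_000},
    {"film_id": "F0002", "market": "EE", "year": 2019, "box_office": 60_000},
    {"film_id": "F0003", "market": "EE", "year": 2019, "box_office": 2_000},
]
kept = eligible_releases(rows)
```

In this sample the market average is 54,000, so the cut-off is 2,700 and the third release is dropped.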
Limitations
Unified data format.
The model requires input data in a unified and simplified format. For categorical attributes that may have multiple values at the same time – such as films belonging to several genres (e.g., Action–Drama, Action–Adventure) – only the primary (first-listed) category is used. To address this limitation, users are encouraged to run multiple simulations using each relevant genre separately.
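The workaround described above, one simulation per relevant genre, can be prepared with a few lines. This is a minimal sketch: the scenario dictionary and its keys are hypothetical stand-ins for the tool's input form, not its actual interface.

```python
def primary_category(values):
    """Return the first-listed value, or None for an empty list."""
    return values[0] if values else None


def scenarios_per_genre(base_scenario, genres):
    """Build one scenario per genre, leaving all other inputs unchanged."""
    return [{**base_scenario, "genre": genre} for genre in genres]


# Hypothetical inputs for a film tagged with two genres.
genres = ["Action", "Drama"]
base = {"production_country": "EE", "language": "Estonian", "market": "FI"}

single = {**base, "genre": primary_category(genres)}  # what a single run would use
scenarios = scenarios_per_genre(base, genres)         # one run per genre
```

Comparing the outcomes across the resulting scenarios gives a rough sense of how sensitive the estimate is to the choice of primary genre.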
Limited data coverage.
The training dataset is restricted to films released in European markets. As a result, some genres, languages, and production-country combinations are under-represented. When the model lacks sufficient historical examples to generate a reliable estimate, it will display a warning message instead of a prediction.
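One plausible way to implement such a warning is a support check on the training data before predicting. The sketch below is an assumption about how this could work, not a description of the tool's actual criterion: the threshold, the choice of attributes, and the `predict` callable are all hypothetical.

```python
from collections import Counter

MIN_EXAMPLES = 30  # hypothetical threshold, not documented by the tool
KEYS = ("genre", "language", "production_country")


def support_counts(training_rows, keys=KEYS):
    """Count training films for each combination of the given attributes."""
    return Counter(tuple(row[k] for k in keys) for row in training_rows)


def predict_or_warn(scenario, counts, predict, keys=KEYS, min_examples=MIN_EXAMPLES):
    """Return a prediction only when enough comparable films exist."""
    combo = tuple(scenario[k] for k in keys)
    n = counts.get(combo, 0)
    if n < min_examples:
        return {"warning": f"Only {n} comparable films in the training data; no estimate shown."}
    return {"prediction": predict(scenario)}


# Hypothetical training data: 40 French-language dramas from France.
training = [{"genre": "Drama", "language": "French", "production_country": "FR"}] * 40
counts = support_counts(training)

common = {"genre": "Drama", "language": "French", "production_country": "FR"}
rare = {"genre": "Western", "language": "Estonian", "production_country": "EE"}

result_common = predict_or_warn(common, counts, predict=lambda s: 1_000_000)
result_rare = predict_or_warn(rare, counts, predict=lambda s: 1_000_000)
```

Here `result_common` carries a prediction, while `result_rare` carries only a warning because no matching films exist in the sample.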
Data availability and consistency.
The model relies on data provided through FIDA. Data availability varies across countries due to differing reporting practices: some territories do not publish box-office figures at all, while others suspended reporting during the COVID-19 period. In addition, production budget data is frequently missing, which reduces the influence of budget on simulation results compared to what might be theoretically expected.
Film Production in Europe
This dashboard provides an overview of the types of films produced in Europe. It uses aggregated data from the Film Industry Data Repository (FIDA), which pools European film industry information from a variety of sources. The data for this dashboard derives from Lumiere Pro, Wikidata, the World Bank, and TMDB. If you use the data, please reference the data paper accessible here: https://doi.org/10.5281/zenodo.17829236
Explanation of specific design choices and acronyms used in the dashboard: The graphs that analyse films by genre include only titles for which genre information is available. Of the 104,721 films in the database, 31.62% lack genre data. These entries originate primarily from the Lumiere database and consist mainly of short films, TV movies, festival compilations, and similar formats. The remaining dataset is broadly representative of films that have been in active circulation, including feature-length fiction films, animations, and documentaries. Regarding the acronyms used: EU Big 5 are Spain, Italy, Germany, France and Poland; EU Small Countries are all other EU countries; CoE Small Countries are the remaining small European countries that are members of the Council of Europe; Turkey, the UK and Ukraine form a separate group as large countries neighbouring the EU; “International” refers to all other countries.
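To put the genre coverage figures in absolute terms, the short calculation below derives approximate film counts from the two numbers stated above. Because the percentage is rounded to two decimals, the resulting counts are estimates rather than exact database totals.

```python
TOTAL_FILMS = 104_721
SHARE_WITHOUT_GENRE = 0.3162  # 31.62%, as stated for the database

# Approximate counts; the published share is rounded, so these are estimates.
without_genre = round(TOTAL_FILMS * SHARE_WITHOUT_GENRE)
with_genre = TOTAL_FILMS - without_genre

print(f"Films without genre data: about {without_genre:,}")
print(f"Films included in genre graphs: about {with_genre:,}")
```

The genre graphs therefore rest on roughly 71,600 of the 104,721 films.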
The overall aim of this dashboard is to provide an accessible analytical tool for researchers, policymakers, sales agents, and cultural institutions seeking to understand the reach and mobility of European film.
Source: FIDA data repository (2025). This dashboard includes data from The Movie Database (TMDB), Wikidata, Lumiere Pro, and the World Bank.
Theatrical Demand
This Theatrical Demand dashboard is designed to enable analysis of film consumption in cinemas and demand for European films by presenting aggregated data from the Film Industry Data Repository (FIDA). The dashboard focuses on the years 2013-2014. Created by the CresCine consortium, FIDA pools European film industry information, including production output, film festival performance, theatrical admissions, box office, cinema showtimes, distribution in streaming services, and TV showtimes. By integrating structured and unstructured data from public and private sources, FIDA is designed to enable the study of the lifecycles of European films across release windows and territories. In case of use, please reference the data paper accessible here: https://doi.org/10.5281/zenodo.17829236
Explanation of specific design choices and acronyms used in the dashboard: The graphs that analyse films by genre include only titles for which genre information is available. Of the 104,721 films in the database, 31.62% lack genre data. These entries originate primarily from the Lumiere database and consist mainly of short films, TV movies, festival compilations, and similar formats. The remaining dataset is broadly representative of films that have been in active circulation, including feature-length fiction films, animations, and documentaries. Regarding the acronyms used: EU Big 5 are Spain, Italy, Germany, France and Poland; EU Small Countries are all other EU countries; CoE Small Countries are the remaining small European countries that are members of the Council of Europe; Turkey, the UK and Ukraine form a separate group as large countries neighbouring the EU; “International” refers to all other countries.
FIDA is built on the cloud-based Databricks platform and compiles data from The Movie Database (TMDB), Wikidata, Cinando, Lumiere Pro, the European Audiovisual Observatory, UsherU, International Showtimes, media-press.tv, and festival film data gathered by CresCine.
The overall aim of this dashboard is to provide an accessible analytical tool for researchers, policymakers, sales agents, and cultural institutions seeking to understand the demand for European film.
Source: FIDA data repository (2025). This dashboard includes data from The Movie Database (TMDB), Wikidata, Lumiere Pro, and the World Bank.

