Data Manager (1 Position)

  • Location:
  • Salary:
    negotiable / YEAR
  • Job type:
    FULL_TIME
  • Posted:
    2 months ago
  • Category:
    Education, Information and Communication Technology, Management and Strategy, Research and Data
  • Deadline:
    14/11/2024

JOB DESCRIPTION

Job Summary

ECMWF is building a world-leading, machine-learning-based probabilistic weather forecasting system (AIFS) to complement our existing physics-based system (IFS). We are pioneering the operationalisation of machine learning forecasting models in this domain. ECMWF now runs both deterministic and probabilistic AIFS forecasts daily, providing open data and products to users around the world. Within the Destination Earth initiative, AIFS workflows are being expanded towards an Earth-system model capturing land, ocean, sea-ice and wave processes.

Data is the lifeblood of machine learning, with well-curated datasets being vital for learning accurate models. In this position you will play a leading role in the management of training datasets for machine learning models, including the AIFS. You will manage machine learning datasets for ECMWF activities, such as operational configurations, Destination Earth applications and the undertakings of ECMWF’s Member and Co-operating States. This involves liaising with users inside and outside ECMWF, understanding the requirements for new datasets, and managing the life cycle and curation of datasets across HPC systems in Europe.

This role is in the Data Archives and Dissemination Services Team of the Production Services Section. The team is responsible for archiving of operational and research data into the MARS archive and Fields DataBase (FDB) and the generation and dissemination of ECMWF’s products. The Production Services Section is responsible for the operational production services of ECMWF, including in the framework of DestinE and Copernicus services, working closely with teams across the organisation to maintain, develop and manage the operational forecasting systems and associated data services.

About ECMWF

The European Centre for Medium-Range Weather Forecasts (ECMWF) is a world leader in weather and environmental forecasting. As an international organisation, we serve our members and the wider community with global weather predictions and data that are critical for understanding and solving the climate crisis. We function as a 24/7 research and operational centre with a focus on medium and long-range predictions, holding one of the largest meteorological data archives in the world. The success of our activities builds on the talent of our scientists and experts, strong partnerships with 35 Member and Co-operating States and the international community, some of the most powerful supercomputers in the world, and the use of innovative technologies and ML across our operations.

ECMWF has also developed a strong partnership with the European Union and has been entrusted with the implementation and operation of the Climate Change and Atmosphere Monitoring Services of the EU Copernicus Programme. We also contribute to the Copernicus Emergency Management Service. Other areas of work include High Performance Computing and the development of digital tools that enable ECMWF to extend provision of data and products covering weather, climate, air quality, fire and flood prediction and monitoring.

ECMWF is a multi-site organisation, with a main office in Reading, UK, a data centre/supercomputer in Bologna, Italy, and a large presence in Bonn, Germany. We appreciate the need for flexibility in the way our staff work. We have adopted a hybrid work model that allows staff to mix office working and teleworking, including away from the duty station for up to 10 days/month (within the area of our Member and Co-operating States).

See www.ecmwf.int for more info about what we do.

About Destination Earth (DestinE)

ECMWF is one of the three entities entrusted to implement the DestinE initiative of the European Commission, alongside ESA and EUMETSAT as partners. DestinE aims to deploy several highly accurate thematic digital replicas of the Earth, called Digital Twins. The Digital Twins will help monitor and predict environmental change and human impact, in order to develop and test scenarios that would support sustainable development and corresponding European policies for the Green Deal. ECMWF is responsible for the delivery of these Digital Twins and of the Digital Twin Engine, the software infrastructure needed to power them on some of Europe’s largest supercomputers, those of the European HPC Joint Undertaking (EuroHPC).

The second phase of DestinE covers the period June 2024 – May 2026, and future phases are foreseen (subject to funding). Phase 2 will focus on early operations with consolidation, maintenance, and continuous evolution of the DestinE system components developed in the first phase. There will also be an enhanced focus on ML activities, including the deployment of workflows for components of an ML model for the Earth system, optimisation of the Digital Twin Engine to enable efficient model training and simulations, and other activities. One key element of the ML activities in Phase 2 is training. This will build on recent ML training initiatives at ECMWF, including the Massive Open Online Course (MOOC) on ML for Weather and Climate.

(see https://learning.ecmwf.int/course/index.php?categoryid=1)

For more information on DestinE, see https://ec.europa.eu/digital-single-market/en/destination-earth-destine and https://www.ecmwf.int/en/about/what-we-do/environmental-services/destination-earth

Duties:

  • Manage and support the data handling requirements of ML applications.
  • Take responsibility for relevant elements of creating, storing and serving datasets to be used for machine learning applications, in close collaboration with ECMWF ML experts and with experts in ECMWF’s Member and Co-operating States.
  • Collaborate with research and technical teams at ECMWF on ML developments, including the gathering of future requirements, producing projections of usage and performing resource capacity planning.
  • Act as Data Governance (DGov) facilitator for matters related to machine learning, working collaboratively with ECMWF’s existing DGov facilitators and building expertise in the data format standards involved.
  • Act as Data Curator for ML datasets, with emphasis on the maintenance of catalogues and ensuring data accuracy and completeness.

Personal Attributes:

  • Excellent interpersonal and communication skills are vital, to communicate with a wide range of technically skilled colleagues at ECMWF and in partner organisations, as well as with non-technical staff.
  • Dedication and enthusiasm to work as part of a team, taking initiative to work collaboratively with other team members and project partners.
  • Ability to work independently at times, while knowing when it is important to seek advice.
  • Excellent analytical and problem-solving skills with a proactive continuous improvement approach.
  • Ability to work with precision and care, with a good awareness of the need to test thoroughly and document changes appropriately.
  • Ability and willingness to collaborate with internal and external experts on related aspects of governance and exchange of data for machine learning applications.
  • Active listener who seeks and respects the views of others.
  • Highly organised with the capacity to work on a diverse range of tasks to tight deadlines.
  • Flexibility, with a willingness to adapt plans if partner organisations work at different paces, and if priorities shift over time.
  • Candidates must be able to work and communicate effectively in English.

Education:

  • An advanced university degree (EQF Level 7 or above) or equivalent professional experience is required.

Experience:

  • Experience curating large datasets (terabytes to petabytes) is required.
  • Programming skills in Python and/or scripting languages in a UNIX/Linux environment are required.
  • Experience handling scientific data formats in High Performance Computing environments is desirable.
  • Experience of working collaboratively on software development projects is desirable.

Knowledge and Skills (including Language):

  • Experience working in a scientific environment
  • Experience working in a High Performance Computing environment
  • Familiarity with big-data analytics, cloud technologies and machine learning fundamentals
  • An understanding of numerical weather prediction or meteorological applications
  • Knowledge of standardised data formats for international exchange

The following skills and experience would be an advantage. However, you are encouraged to apply even if you feel you don’t precisely meet all the requirements.
