CERN Digital Storage Demand Will Soar With More Intense Physics Experiments
In late May, I had a chance to visit CERN, near Geneva, and meet with Jakub Mościcki, head of CERN’s Storage and Data Management Group in the CERN IT Department, and Alberto Di Meglio, head of IT innovation, including CERN openlab. CERN conducts one of the world’s largest science experiments, and the facility houses the world’s largest particle accelerator, the Large Hadron Collider (LHC). We had a discussion and a tour of CERN’s main data center, and I also got to see the antimatter factory (which I mentioned in an earlier blog). We spoke about digital storage and data management for the large volume of scientific data generated by CERN particle physics experiments.
The CERN Storage and Data Management Group designs and operates large-scale, open-source storage services for experiments, covering both data taking and long-term archiving for the various research programs (there are 37 of these), including the LHC. The group also supports global data distribution and management to about 200 computer centers worldwide, as well as end-user access and analysis on edge devices such as laptops, desktops and computer clusters.
This group also supports cloud and infrastructure storage, including home directories and block and object data access. The image below, from my briefing, gives an idea of CERN storage services and their support of experiments, users and data centers. Ceph provides block devices for compute services such as batch and interactive clusters, while CERNBox provides storage for data analysis integrated with end-user access to data on edge devices. EOS is the large-scale disk system, with much of the facility’s data stored on hard disk drives (over 1 exabyte of storage on over 100,000 HDDs). CERN’s Tape Archive has about 750PB of data, served by 180 tape drives in 5 library systems.
Currently, CERN makes minimal use of SSDs and other NVMe storage devices, mainly using them for efficient metadata handling and some high-IOPS applications. Digging deeper into our understanding of particle physics requires regular upgrades to CERN’s facilities. The image below shows the history of increases in the luminosity (a measure of the intensity of particle collisions) in the LHC over the years and projected into the future (up to 2040).
The increase in the intensity of particle collisions results in more data being generated during experiments, which then needs to be stored, archived and shared with researchers all over the world. The charts below show CERN HDD (left) and magnetic tape (right) storage capacity increases from 2010 through 2024.
Looking out beyond 2024, the data generated during CERN experiments is projected to grow even further, with hard disk drive demand likely to be in a range of 5-10EB by 2037. With the growing amount of data archived, magnetic tape is expected to increase in installed capacity by much more, with over 6EB of tape archival storage likely required by 2032. The image below shows Alberto Di Meglio with me in front of a Spectra Logic Infinity tape library in the CERN main data center.
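To put these projections in perspective, the short Python sketch below works out the compound annual growth rates they imply. It rests on several assumptions that are not stated in the briefing: that the current figures (about 750PB of tape, and "over 1 exabyte" of disk taken as exactly 1EB) describe 2024, that "over 6EB" is taken as exactly 6EB, and that growth compounds evenly each year. The function and constant names are illustrative only.

```python
def implied_annual_growth(start, end, years):
    """Compound annual growth rate that takes `start` to `end` in `years` years."""
    return (end / start) ** (1.0 / years) - 1.0


BASE_YEAR = 2024           # assumption: the current capacities describe 2024

# Tape: about 750PB today, over 6EB (taken here as 6000PB) by 2032.
tape_rate = implied_annual_growth(750, 6000, 2032 - BASE_YEAR)

# Disk: over 1EB today (taken here as 1EB), 5-10EB by 2037.
hdd_rate_low = implied_annual_growth(1, 5, 2037 - BASE_YEAR)
hdd_rate_high = implied_annual_growth(1, 10, 2037 - BASE_YEAR)

# Average capacity per drive: 1EB = 1,000,000TB spread over 100,000 HDDs.
avg_hdd_tb = 1_000_000 / 100_000

print(f"Tape: {tape_rate:.1%} per year")
print(f"HDD:  {hdd_rate_low:.1%} to {hdd_rate_high:.1%} per year")
print(f"Average HDD capacity today: {avg_hdd_tb:.0f}TB")
```

Under these assumptions, tape capacity would need to grow by roughly 30% per year, compared with roughly 13% to 19% per year for disk, and today's disk fleet averages about 10TB per drive. These are back-of-the-envelope figures, not CERN's own planning numbers.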
CERN follows the evolution of the technology and is ready to explore the move from HDDs using conventional magnetic recording (CMR) with perpendicular magnetic recording (PMR) to PMR HDDs with shingled magnetic recording (SMR, where tracks are written partially over each other to increase the track density and thus the HDD storage capacity). SMR drives have no significant read performance penalty. However, they are most useful for applications where data is written only once, since rewriting data on an SMR HDD requires extra steps, slowing write performance.
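The write penalty described above can be illustrated with a toy model. The sketch below is a deliberate simplification, not how any particular drive firmware works: it assumes a band of overlapping tracks in which writing one track damages every later track already written in that band, so an in-place update must also rewrite everything after it. The class name `ShingledBand` and its methods are hypothetical.

```python
class ShingledBand:
    """Toy model of one SMR band of overlapping (shingled) tracks."""

    def __init__(self, num_tracks):
        self.tracks = [None] * num_tracks
        self.write_pointer = 0   # index of the next track for a sequential write
        self.track_writes = 0    # count of physical track writes performed

    def append(self, data):
        """Sequential write: costs exactly one track write."""
        if self.write_pointer >= len(self.tracks):
            raise ValueError("band is full")
        self.tracks[self.write_pointer] = data
        self.write_pointer += 1
        self.track_writes += 1

    def read(self, index):
        """Reads touch a single track, so there is no read penalty."""
        return self.tracks[index]

    def rewrite(self, index, data):
        """In-place update: also rewrites every later track in the band."""
        if not 0 <= index < self.write_pointer:
            raise IndexError("track has not been written yet")
        # Save the downstream tracks before they are overwritten.
        tail = self.tracks[index + 1:self.write_pointer]
        self.tracks[index] = data
        self.track_writes += 1
        for position, saved in enumerate(tail, start=index + 1):
            self.tracks[position] = saved
            self.track_writes += 1


band = ShingledBand(8)
for i in range(8):
    band.append(f"t{i}")
writes_after_fill = band.track_writes      # 8 sequential track writes

band.rewrite(2, "new")
rewrite_cost = band.track_writes - writes_after_fill
print(f"Updating one track cost {rewrite_cost} track writes")
```

In this model, filling the band costs one write per track, while updating track 2 of 8 costs six writes (the track itself plus the five after it). That is why write-once workloads such as archived physics data are a natural fit for SMR.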
In addition, CERN plans to explore heat-assisted magnetic recording (HAMR) HDDs as these become available. HAMR drives from Seagate are now available with 32TB capacities. I also spoke with my hosts about dual-actuator HDDs, which could improve HDD data rates and might be useful to support the large data streams from their physics experiments.
Regarding future plans for archiving, CERN will continue to use magnetic tape but they are also looking into other data archiving technologies, including optical storage approaches such as those of Cerabyte (using a ceramic disc) and Project Silica (using quartz glass). They are also keeping an eye on using DNA storage for archiving applications.
CERN is pursuing continuous innovation using open-source software, where software-defined storage can be used to maximize performance at minimum hardware cost. They also seek to optimize the use of expensive tape infrastructure and provide new ways of accessing and sharing data for end-users “on the edge.” CERN also wants to integrate with Open Science FAIR services as well as data management and worldwide research infrastructures.
CERN, based near Geneva, Switzerland, hosts big physics experiments, including the LHC. The amount of data from these physics experiments is projected to swell over the next 10+ years, especially the data that must be archived. This will drive innovation in data management software as well as storage hardware.