Diamond Light Source - Annual Review 2022/23

Introduction to Scientific Software, Controls and Computation

The Scientific Software, Controls and Computation (SSCC) department manages all software, computing and control systems to facilitate and support the science programme of Diamond. The department functions as nine groups: Scientific Computing, Data Analysis, Data Acquisition, Beamline Controls, Accelerator Controls, Electronic Systems, Scientific Information Management Systems, Diamond-II Integrated Software and Cyber Security. The overall structure and function of these areas recognises the importance of, and is optimised to provide, the best possible delivery and support for software, computing, and control systems.

Over the last year there has been an increasing emphasis on planning for Diamond-II. SSCC will deliver new software, control systems and computing as part of the machine upgrade and beamline development for Diamond-II. In addition, it was recognised that the underlying software and computing capabilities needed to be developed to prepare for the substantial increase in data rates that will come with Diamond-II. This has been addressed through the design of a new software architecture for photon beamlines and the definition of an extensive core software and computing programme for Diamond-II to deliver new enabling capabilities. (See below: Developing the Diamond-II Core Software and Computing.)

Diamond produced more than 10 PB of data (a PB is equivalent to roughly 213,000 DVDs) in 2022 from photon beamlines and electron microscopes. To support increases in data rates, an additional 10 PB of first-level storage has been procured, along with 24 additional computing nodes (960 cores).
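A back-of-the-envelope sketch of the figures quoted above, assuming a 4.7 GB single-layer DVD and decimal petabytes:

```python
# Rough arithmetic behind the storage and compute figures quoted above.
PB = 10**15                 # bytes in a petabyte (decimal definition)
DVD = 4.7 * 10**9           # assumed bytes on a single-layer DVD

dvds_per_pb = PB / DVD
print(round(dvds_per_pb))   # ~213,000 DVDs per petabyte

nodes, cores = 24, 960
print(cores // nodes)       # 40 cores per node
```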
It is recognised that provisioning all computing services within Diamond is not sustainable, so work is ongoing to decouple applications and services from the existing computing infrastructure using containerisation technologies. This will enable the containers to be deployed on both private and public cloud infrastructure. (See below: The Journey to Cloud Native.)

As experiments conducted at Diamond produce increasingly large and complex data sets, it becomes more difficult for users to transport their data back to their home institutes and process it there. To address this, Diamond increasingly provides users with data processing services to enable information to be extracted from their data, developing and maintaining a suite of data analysis applications to support the photon science and electron microscope programmes. These same tools can also provide near real-time feedback to the user as their experiment progresses. However, some tools are computationally demanding, and there is a programme to accelerate the processing using faster computing technologies, such as GPUs, so that near real-time feedback can be provided during experiments. (See below: Accelerating Ptychography Reconstruction Codes.)

Toolkits for applying Artificial Intelligence and Machine Learning (AI and ML) techniques have evolved considerably in recent years. They now provide an important opportunity to automate data reduction further and to improve data analysis. (See below: Applications of AI and ML to Diamond Science.)

Historically, user-facing applications have run within graphical user interfaces based on the host operating system. These are now being superseded by web-based technologies, whose communication protocols have evolved sufficiently to deliver the level of performance required for scientific and engineering applications.
A programme of developments to migrate these applications to a web-based environment is therefore ongoing. (See below: A New Engineering Web User Interface.)

Developing the Diamond-II Core Software and Computing

Significant advancement of Diamond's software and computing capability will be required to extract data optimally from the Diamond-II upgrade, and so maximise the scientific opportunity and knowledge gained. Developments will be required to:

• Handle faster detectors and deliver rapid data processing and reduction;
• Support greater automation of experiments, data reduction and analysis;
• Introduce and develop new data processing techniques, including exploitation of AI/ML;
• Provide a more open software environment, facilitating greater collaboration between software developers and scientists;
• Address obsolescence and modernise the beamline software stack;
• Adapt to the changing needs and expectations of Diamond's users.

The Diamond-II core software and computing project will deliver developments focused across key areas identified in Summer 2021 and explored in depth since, covering:

• High Performance Sample Stages;
• Detector Readout, Data Compression and Reduction;
• Modernisation of the Data Acquisition Software Framework;
• Science-Specific Data Analysis Software Developments;
• Data Archiving;
• Post-visit Data Analysis Services;
• User Administration and Information Management.

The project is one of the five pillars of the Diamond-II programme and is described by a work breakdown structure (WBS) with six work streams: Hardware Infrastructure, Software Infrastructure, Data to Information, Real-time Data, Experiment Management and Information Management. These are hierarchically decomposed into over thirty detailed work packages. A project delivery plan based on phased and early migrations, where appropriate, has been developed.
So, whilst the ultimate ambition of the project is to harness the brightness of the new Diamond-II machine and enable new flagship capabilities, the project will also realise a continuous stream of incremental benefits to Diamond before this era: reducing technical debt, addressing critical obsolescence, and deploying new capabilities with greater flexibility and extensibility. These developments were documented in the Diamond-II Core Software, Controls & Computing Technical Design Report, which was released in March.

The project has enjoyed early success with the roll-out of a new web-based engineering user interface to the machine. The development of a new Acquisition Platform is underway, with initial core services deployed on a test beamline. A series of projects has explored live data streaming to move from existing "serial" to live "on-the-fly" setups for ptychography. Live analysis of X-ray diffraction data for experiment feedback is a critical part of the macromolecular crystallography unattended data collection workflow. Initial work has proven the viability of GPUs to reduce image processing time, with verified identical algorithmic results.

The Journey to Cloud Native

Diamond is transitioning to a Cloud Native architecture across its scientific computing infrastructure, utilising key Cloud Native enabling technologies such as containerisation and microservices design patterns. These software encapsulation and development methods drive higher speed and agility in software development, deliver higher application reliability, and produce more portable code. Ultimately, these methods will allow Diamond to develop and deploy software more quickly, decrease time to science, and increase our ability to respond to scientific drivers, opportunities and collaborations.
The most important technology in this landscape is containerisation, which Diamond is adopting heavily in several areas:

• High Performance Computing (HPC) – Adoption of an HPC-specific containerisation technology is enabling off-premise computing for lower-priority data processing. By using containers, codebases are no longer coupled to Diamond's on-premise infrastructure and can easily be executed on other public or private off-premise cloud services (such as STFC's OpenStack cloud, IRIS).
• Web Applications – Web-based applications are the most common target for Cloud Native architectures and tooling. Diamond has a number of production, in-house developed and third-party web applications running in containers. These include user interfaces to scientific software, as well as stand-alone data analysis notebooks served by JupyterHub.
• Distributed Control Systems – Both the beamline and accelerator controls groups are now running production applications in containers, thereby delivering a consistent bootstrapping environment enabling control of, and communication with, embedded devices.
• Data Analysis Microservices – Data analysis middleware, deployed as a microservices application in containers, now underpins all MX data processing pipelines. In turn this is monitored using Cloud Native tools such as Prometheus, Grafana, and Alertmanager to surface and report any issues in the operation of the workflow and so maintain operation of these key services.

Diamond is currently running thousands of containers across the areas listed above. To achieve this in a reliable way, an on-premise cloud infrastructure has been deployed specifically for container execution, based on the industry-standard Kubernetes container orchestration system. The Diamond Kubernetes Cloud has approximately 3,000 CPU cores deployed as a cluster of machines for high availability and scalability, and hosts a number of production applications that are essential to the operation of Diamond.
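To illustrate the kind of containerised service an orchestrator like Kubernetes manages, here is a minimal Python microservice exposing a health endpoint that a liveness probe could poll. This is a generic sketch, not one of Diamond's actual services; the `/healthz` path is simply a common convention:

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class HealthHandler(BaseHTTPRequestHandler):
    """Minimal service endpoint an orchestrator's liveness probe could poll."""
    def do_GET(self):
        if self.path == "/healthz":
            body = json.dumps({"status": "ok"}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):
        pass  # keep the example quiet

# Bind to an ephemeral port; a real container would expose a fixed one.
server = HTTPServer(("127.0.0.1", 0), HealthHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

with urllib.request.urlopen(f"http://127.0.0.1:{server.server_port}/healthz") as r:
    payload = json.load(r)
print(payload)  # {'status': 'ok'}
server.shutdown()
```

In a cluster, the orchestrator restarts any container whose probe stops answering, which is one way the reliability benefits described above are realised.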
Currently Diamond deploys one large Kubernetes cluster which services the whole organisation. During the coming year, Diamond will investigate how multiple clusters can be operated, with a view to providing a cluster per beamline. This will provide a separate performance, failure, and administrative domain for each beamline, and is the preferred deployment architecture for Diamond-II.

Accelerating Ptychography Reconstruction Codes

Ptychography is an increasingly important quantitative high-resolution imaging technique used across multiple Diamond imaging beamlines (I08, I13, I14) as well as the electron microscopy facility (ePSIC). Unlike conventional microscopy, ptychography is a lens-less imaging method in which a series of diffraction images (patterns) is collected while raster-scanning an object with an X-ray or electron beam. Using iterative phase retrieval algorithms, this collection of diffraction patterns is then converted computationally into a reconstructed high-resolution image of the object.

The time taken to perform this reconstruction creates a gap between running the experiment and being able to view an image, and limits our ability to understand the samples under study in near real-time during the experiment. There is also a drive to use this technique to look at larger samples and more dynamic experiments, so it is critical to minimise the time needed for the reconstruction process. Diamond, in collaboration with the Ada Lovelace Centre, has invested in accelerating the ptychography reconstruction codes that are used across the different beamlines and instruments, including the PtyPy and PtyREX software. Most ptychographic iterative phase retrieval algorithms include many steps that can be performed in parallel and are therefore suitable for distributed processing approaches and for harnessing the computational power of modern GPUs.
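The pixel-level parallelism referred to above can be illustrated with the Fourier modulus constraint found at the heart of most phase-retrieval iterations. The following is a toy NumPy sketch of that single step, not PtyPy's actual implementation; on a GPU, essentially the same array code could run by swapping NumPy for CuPy:

```python
import numpy as np

def modulus_constraint(exit_wave, measured_amplitude, eps=1e-12):
    """One core step of iterative phase retrieval: propagate the exit wave
    to the detector plane, keep its phase, and impose the measured
    diffraction amplitude as its modulus. Every pixel is updated
    independently, which is why this step maps so well onto GPUs."""
    f = np.fft.fft2(exit_wave)
    f = measured_amplitude * f / (np.abs(f) + eps)
    return np.fft.ifft2(f)

rng = np.random.default_rng(0)
shape = (256, 256)                        # one 256x256 diffraction pattern
wave = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
measured = np.abs(np.fft.fft2(rng.standard_normal(shape)))

updated = modulus_constraint(wave, measured)
# After the update, the detector-plane modulus matches the measurement.
print(np.allclose(np.abs(np.fft.fft2(updated)), measured, rtol=1e-3))
```

Because the update is elementwise, many patterns can be processed concurrently across processes or GPU threads, which is what the distributed and CUDA-accelerated implementations exploit.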
To demonstrate the impact of code acceleration in ptychography, an example data set of 1,257 diffraction patterns, each of 256x256 pixels, was collected within 74 seconds on the I08-1 instrument for soft X-ray ptychography. 200 iterations of the difference map (DM) algorithm as implemented in PtyPy resulted in a high-resolution reconstruction of the nanometre-sized structure of a ground scale from a female Junonia orithya butterfly wing (Figure 1 below). Running this reconstruction on a single CPU would take 1,736 seconds, but distributing the work over multiple processes reduces the time to 114 seconds on a single node of Diamond's Hamilton HPC cluster with 40 CPUs per node. To reduce the time further, we have implemented a GPU-accelerated version of the DM (and other) algorithms using raw CUDA kernels for all major parts of the algorithm, wrapped together in Python using CuPy. Running on a single HPC cluster node with 4 NVIDIA V100S GPUs reduces the reconstruction time to just 5 seconds – a performance boost by a factor of 22 compared to 40 CPUs. The GPU-accelerated ptychography code has enabled us to implement efficient auto-processing pipelines for ptychographic reconstruction and has substantially improved the experience for beamline users.

Applications of Machine Learning and Artificial Intelligence to Science

Recently there have been significant advances in the fields of Machine Learning (ML) and Artificial Intelligence (AI), which have attracted widespread interest. Both independently and in collaboration with others, Diamond has sought to embrace these advances for the facility. In Life Sciences, machine learning tools have been developed to help beamline scientists, and Diamond users, to automatically monitor experiments in which protein crystals are grown.
Obtaining diffracting crystals of protein is often a challenging task that requires performing hundreds of experimental trials, which are monitored by repeatedly taking microscope images of the small liquid droplets that are used as a growth environment. To automatically monitor these experiments, a set of tools has been created, collectively known as CHiMP (Crystal Hits in My Plate), that utilise deep learning neural networks. The first type of tool classifies the microscope images into classes such as "clear droplet" or "crystals", giving a quick overview of the progress of the experiment. The second tool pinpoints the location of any crystals and the surrounding droplet in an image, thereby allowing automated targeting of the X-ray beam for collection of diffraction data from the crystals whilst still in the droplet. These tools are currently in use on the VMXi beamline, enabling fully autonomous operation.

Figure 1: Within 5 seconds, a series of 1,257 diffraction patterns is converted by GPU-accelerated ptychography code into a high-resolution image. The colour wheel represents absorption contrast as brightness and phase contrast as hue (colour).

X-ray Technologies
