Background
When science meets application portability. A team of scientists has set out to build the world’s largest radio telescope, the Square Kilometer Array (SKA), in the desert of South Africa. The telescope is currently being built in South Africa and partially in Australia. Once finished, this giant radio telescope will consist of thousands of 20-meter dishes.
Challenges
The scientists wanted to crunch data on platforms they had created themselves. Many of them are self-taught in software engineering: they have taken courses, but never learned how to design a project or practice extreme programming. The data their projects generate is huge, on the scale of petabytes to zettabytes, arriving at about 50 GB per second. The SKA telescope is also estimated to generate about 10 times the volume of global internet traffic. A data center built today to handle that load would need about a quarter of the output of a nuclear power station, so the team needed a more efficient way of processing its data.

The software the scientists use is complex and often badly written or untested. On top of that, the platforms were created internally and were not built for collaboration, which makes it hard to share data across them. These self-created clusters run on Linux computers, and because the data rates are so high, the scientists need to use their private clusters. The team needed a solution that works on all of the platforms the scientists had gone off and created on their own. In some cases it took scientists 2-3 days to compile their complete tool chain, a massive waste of time.
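To put the quoted rate in perspective, the arithmetic can be sketched in a few lines of Python. This is only a back-of-the-envelope estimate: it assumes the figure means gigabytes (not gigabits), decimal units, and a constant rate, none of which the case study states.

```python
# Rough data-volume estimate for a constant rate of about 50 GB per second.
# Assumptions (not from the case study): gigabytes rather than gigabits,
# decimal units (1 PB = 1,000,000 GB), and a rate that never drops.

GB_PER_SECOND = 50
SECONDS_PER_DAY = 24 * 60 * 60
GB_PER_PB = 1_000_000


def petabytes_per_day(gb_per_second: float) -> float:
    """Return the daily data volume in petabytes for a constant rate."""
    return gb_per_second * SECONDS_PER_DAY / GB_PER_PB


def petabytes_per_year(gb_per_second: float, days: int = 365) -> float:
    """Return the yearly data volume in petabytes for a constant rate."""
    return petabytes_per_day(gb_per_second) * days


if __name__ == "__main__":
    print(f"{petabytes_per_day(GB_PER_SECOND):.2f} PB per day")
    print(f"{petabytes_per_year(GB_PER_SECOND):.1f} PB per year")
```

Under these assumptions the rate works out to roughly 4.3 PB per day, which is why moving the data off the private clusters is not practical.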
Solution
Docker makes it easier to deploy software and bring it to the scientists, and to contain very fragile software. The team uses Docker in two ways.

First, Docker is used to distribute software to the scientists working on the project. The team packs up all the important radio astronomy libraries, compiles them on Ubuntu, and creates Debian packages from them. The Debian packages are then uploaded to Canonical: source packages go to the service, which builds them and places them in a public repository. People using Ubuntu can access the repository and install the packages in their environment. Using Docker simplifies the process of getting software onto a supercomputer. It also lets scientists run programs that previously ran only on Linux on their MacBooks with very simple commands, which makes the applications more portable.

Second, the team used Docker to create its new simulator tool, called Rodriquez, an online telescope calibration simulator. It enables scientists to tweak calibrations coming from the telescope and to calibrate the images coming out of it. Docker makes it possible to offload the simulations generated by the web application onto more powerful virtual machines.

In the future, the team intends to use Docker to build a long-term tool for building pipeline solutions that are data aware but system independent, and to learn what Docker’s role will be in the project.
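As an illustration of what a "very simple command" for running a Linux-only program on a MacBook might look like, the sketch below assembles a `docker run` invocation that mounts a local data directory into a container. The image name `example/radio-tools`, the tool name `imager`, and the `/data` mount point are hypothetical placeholders, not names used by the SKA team; only the standard `docker run` flags `--rm`, `-v`, and `-w` are relied on.

```python
import shlex
from pathlib import Path


def docker_run_command(image: str, tool_args: list[str], data_dir: str) -> list[str]:
    """Build a `docker run` command that runs a tool against local data.

    The local directory is bind-mounted at /data (a hypothetical mount
    point) and used as the working directory inside the container.
    """
    data_path = Path(data_dir).resolve()
    return [
        "docker", "run",
        "--rm",                       # remove the container when the tool exits
        "-v", f"{data_path}:/data",   # share the local data directory
        "-w", "/data",                # run the tool from inside that directory
        image,
        *tool_args,
    ]


if __name__ == "__main__":
    # Hypothetical image and tool names, for illustration only.
    cmd = docker_run_command("example/radio-tools", ["imager", "--help"], ".")
    print(shlex.join(cmd))
    # To execute it, pass the list to subprocess.run(cmd, check=True).
```

Building the command as a list rather than a single string avoids shell quoting problems when a data path contains spaces.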