In November 2019, the U.S. Department of Energy’s Argonne National Laboratory (ANL) and Cerebras Systems, a leader in artificial intelligence (AI) compute, announced a partnership to explore how deep learning can advance research in cancer, traumatic brain injury and the properties of black holes, among other areas, using Cerebras’ industry-leading CS-1 system. When COVID-19 reached pandemic status in March 2020, however, ANL quickly pivoted its research to help in the race to find life-saving treatments and therapeutics.
Inventing a new drug, obtaining regulatory approval, and bringing it to market often takes a decade or more. A faster route to a COVID-19 therapeutic is to repurpose drugs that are already approved for use in humans. The problem then becomes determining which existing drugs will be effective against the virus: that is, which molecules from already-approved drugs bind best to the docking sites on the virus’s proteins and so keep the virus from attaching to human host cells.
The first step in rendering the virus inactive is to identify which candidate molecules from existing approved drugs can bind to the pockets of the proteins of SARS-CoV-2, the virus that causes COVID-19, and block its docking sites. Historically, this challenge was tackled through brute-force computation. Each molecule was assigned a computed “docking score” based on its characteristics, or descriptors, which is a very compute-intensive task given that there are billions of molecules and each of the virus’s proteins has dozens of docking sites.
ANL took a different approach, using AI and machine learning (ML) to develop a model that processes ANL’s massive datasets far more efficiently to determine which molecules have the best docking scores. This ML model allows ANL to quickly predict which drug molecules are the best candidates for the next stage of testing. Rather than relying on brute-force computation, which would take substantially more time and computing power, ANL applied its supercomputing infrastructure and state-of-the-art AI technology, including the Cerebras CS-1 system, to the problem.
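The surrogate-model idea can be illustrated with a deliberately tiny sketch: run the expensive docking calculation on a small subset, fit a cheap model that maps descriptors to scores, then let that model score the whole library. Everything here is an assumption for illustration: `expensive_docking_score` is a hypothetical toy stand-in for a real docking run, the three numeric descriptors are made up, and the model is plain linear regression fit by gradient descent. ANL’s actual model, features, and training setup are not described in this article.

```python
import random

# Hypothetical "true" relationship, used only to generate toy labels.
TRUE_WEIGHTS = [-2.0, 0.5, -1.0]

def expensive_docking_score(descriptors):
    """Hypothetical stand-in for a physics-based docking run.

    In practice this is the compute-intensive step; here it is a toy
    linear function so the example runs instantly.
    """
    return sum(w * x for w, x in zip(TRUE_WEIGHTS, descriptors))

def fit_linear_surrogate(X, y, lr=0.05, epochs=500):
    """Fit score ~ w.x + b with batch gradient descent on mean squared error."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, xi)) + b - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Gradient of the mean squared error is (2/n) * sum(err * x).
        w = [wj - lr * 2.0 * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * 2.0 * grad_b / n
    return w, b

def predict(model, descriptors):
    """Cheap surrogate prediction of a docking score."""
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, descriptors)) + b

rng = random.Random(0)
# Toy "library": 10,000 molecules, each described by 3 numeric descriptors.
library = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(10_000)]

# Run the expensive calculation on a small subset only...
train = library[:200]
labels = [expensive_docking_score(x) for x in train]

# ...then let the fitted surrogate score the whole library.
model = fit_linear_surrogate(train, labels)
predicted = [predict(model, x) for x in library]
```

The payoff is the ratio of expensive calls to molecules scored: here 200 docking runs stand in for 10,000. A real surrogate would be a far more expressive model trained on much more data, but the division of labor is the same.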
The CS-1, introduced in November 2019, was designed for exactly these kinds of challenges, delivering record-breaking performance and scale for AI compute. ANL is using the CS-1 to train the models that predict docking scores. The fast turnaround allows the team to work through the massive datasets much more quickly, enabling faster experimentation at a time when the problem is urgent and the space of possible candidates is enormous.
ANL and Cerebras are also collaborating on training a second model that represents molecular characteristics as images rather than as numbers, and in testing it has yielded excellent results. By using image-based representations of the drug molecules and virus proteins, ANL can train models on the CS-1 that learn in the same “language” chemists use to communicate about molecules: diagrams that detail molecular structure and shape.
Once the ML models are trained on the CS-1, the next step is to run inference over several billion samples to pick out the molecules most likely to bind to the virus’s pocket and prevent it from binding to host cells. From there, ANL is building in-silico models to further understand how the selected drug molecules interact with the virus. Finally, the most promising drug candidates are passed to the wet lab for verification. This enables drug development facilities to test only the most viable drug molecules, saving valuable time and resources.
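Picking the best candidates out of billions of predictions does not require holding every prediction in memory at once. The sketch below streams predictions and keeps only the top k, assuming (as with many docking programs, such as AutoDock Vina) that a lower, more negative score means stronger predicted binding. `predicted_scores` is a hypothetical generator standing in for batched model inference; its scores are toy values drawn from a normal distribution, not real data.

```python
import heapq
import random
from typing import Iterable, Iterator

def predicted_scores(n: int, seed: int = 0) -> Iterator[tuple[str, float]]:
    """Hypothetical stand-in for batched surrogate inference.

    Yields (molecule_id, predicted_docking_score) pairs one at a time, so the
    full set of predictions never has to sit in memory.
    """
    rng = random.Random(seed)
    for i in range(n):
        yield f"mol-{i}", rng.gauss(-7.0, 1.5)

def top_k_binders(stream: Iterable[tuple[str, float]], k: int) -> list[tuple[str, float]]:
    """Return the k molecules with the lowest (assumed best) predicted scores.

    heapq.nsmallest keeps a heap of at most k items while it consumes the
    iterable, so memory stays O(k) however long the stream is.
    """
    return heapq.nsmallest(k, stream, key=lambda pair: pair[1])

shortlist = top_k_binders(predicted_scores(100_000), k=10)
for mol_id, score in shortlist:
    print(mol_id, round(score, 2))
```

Because the selection is a single pass with bounded memory, the same pattern works whether the stream comes from one machine or is merged from many inference workers; the shortlist is what would move on to in-silico study and then the wet lab.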
Just weeks after testing began, the collaboration has already produced very promising results. Using the CS-1, ANL can train models hundreds of times faster than before, enabling the team to quickly identify which drug molecules will be most effective at binding to the virus’s proteins and keeping the Pandora’s box of COVID-19 closed. In the war against COVID-19 and other novel viruses, AI supercomputers promise to build more robust workflows, accelerate research and development of deep learning models, and greatly advance the future of disease research.