Why protein modeling is poised for global industrialization
- Posted on May 5, 2020
- Estimated reading time 4 minutes
This article was originally written by Avanade alum Dael Williamson.
Proteins are often called the working molecules of the human body. They do most of the work in cells and are involved in many functions essential to human life, including building up antibodies to fight off disease.
A complex, vital part of the human function
A typical body has more than 20,000 different types of proteins. That variety makes protein modeling both complex and important: without stable proteins, our bodies would cease to function. Some diseases, for example, are caused by an unstable protein being expressed (Type-1 diabetes) or by the disruption of another protein that binds healthy insulin to cells so they can process sugar (Type-2 diabetes).
Example of a 3D structure of a protein
Growing up in Zimbabwe and South Africa, I had never heard much about human immunodeficiency virus (HIV) until the 1980s. Even then, the disease was still relatively unknown, but I saw the health and financial impact HIV had on communities around me.
As a graduate student at a university research center in South Africa, I studied the science of protein modeling with a view to applying it to drug discovery. I recognized its potential to unlock answers inside each of us and inside other species, including those that threaten us (e.g. viruses). Protein modeling can provide new insights into many areas, from drug design to environmental contamination and industrial waste.
Flash freezing proteins
In our lab at the university, we used a method called cryo-electron microscopy, which flash-freezes proteins at cryogenic temperatures. The result is not actually ice in the everyday, crystalline sense. The sample goes into a glass-like, frozen liquid state that preserves the protein's structure and the arrangement of its components.
Inside each structure are two primary arrangements of amino acids called the α-helix and the β-sheet. α-helices are right-handed and resemble spiral staircases. β-sheets come in two forms: parallel and anti-parallel. If you freeze proteins too slowly, ice crystals form and these fragile structures fall apart.
Viewing proteins at atomic resolution
One of the challenges back then was viewing proteins at atomic resolution, which is an extremely small scale. For comparison, if a soccer ball were inflated to the size of the Earth and an atom grew at the same rate, the atom would still only be between 10 and 20 millimeters across. The microscope resolution available for seeing fine details was inadequate. Images would turn out smudgy and undefined, making them more time-intensive to analyze with any precision.
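The scale comparison above can be checked with a few lines of arithmetic. The sketch below assumes a soccer ball about 22 cm across, an Earth diameter of roughly 12,742 km, and atomic diameters of 0.2 to 0.3 nanometers. These figures are assumptions chosen for illustration, not measurements from the lab work described here.

```python
# Rough check of the soccer-ball-to-Earth analogy.
# All input figures are approximate assumptions, not values from the article.
SOCCER_BALL_DIAMETER_M = 0.22      # regulation ball, roughly 22 cm
EARTH_DIAMETER_M = 12_742_000.0    # mean diameter, roughly 12,742 km


def scaled_size_mm(diameter_m: float) -> float:
    """Size of an object in millimeters after the ball-to-Earth magnification."""
    scale = EARTH_DIAMETER_M / SOCCER_BALL_DIAMETER_M
    return diameter_m * scale * 1000.0


# Atomic diameters vary by element; 0.2 to 0.3 nm is an assumed typical range.
for atom_nm in (0.2, 0.3):
    print(f"{atom_nm} nm atom -> {scaled_size_mm(atom_nm * 1e-9):.1f} mm")
```

Under these assumptions the magnified atom comes out at roughly 12 to 17 millimeters, which is consistent with the 10 to 20 millimeter range quoted above.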
Pre-cloud: Scarcity of compute and storage = time and cost
The idea I had was to industrialize the modeling of a protein into a repeatable framework and platform. The Protein Data Bank offered a limited archive that could possibly help jump-start the process. However, the available templates had to be derived experimentally, and the method used to develop them was still highly manual and costly.
Unfortunately, I attempted to tackle the ‘industrialization’ of protein modeling before the cloud, before advances in software and before most modern-day technology existed. As a result, the time and cost to model proteins limited its ability to scale at any industrial level.
Sail across the sea
The time it takes to sail across the sea, 3,600 miles from Cape Town to Rio de Janeiro, is about the same amount of time it took me to produce one protein modeling simulation. The problem was that you could not run a second simulation at the same time without extending the timeline.
For example, I would run simulation A, the oligomeric helical reconstruction using experimentally captured cryo-electron images, and it took 27 days. Then I would run simulation B, the homology modeling to predict unknown tertiary structures from known structures, and it took about 20 days. The last step, the oligomerization and docking, took another two days. Altogether, the three steps took about seven weeks.
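The arithmetic behind that timeline, and what running stages concurrently could change, can be sketched as follows. The stage durations come from the text. Treating simulations A and B as independent of each other, with docking waiting on both, is an assumption made purely for illustration, and the stage names are hypothetical labels.

```python
# Stage durations in days, as quoted in the text.
DURATIONS = {
    "A_helical_reconstruction": 27,
    "B_homology_modeling": 20,
    "C_docking": 2,
}

# Hypothetical dependency graph: C needs both A and B;
# A and B are ASSUMED to be independent of each other.
DEPENDS_ON = {
    "A_helical_reconstruction": [],
    "B_homology_modeling": [],
    "C_docking": ["A_helical_reconstruction", "B_homology_modeling"],
}


def sequential_days() -> int:
    """Total time when stages run strictly one after another."""
    return sum(DURATIONS.values())


def finish_day(stage: str) -> int:
    """Earliest finish if a stage starts as soon as its dependencies end."""
    start = max((finish_day(dep) for dep in DEPENDS_ON[stage]), default=0)
    return start + DURATIONS[stage]


def concurrent_days() -> int:
    """Total time when independent stages are allowed to overlap."""
    return max(finish_day(stage) for stage in DURATIONS)


print(sequential_days())  # 49
print(concurrent_days())  # 29
```

Under these assumptions the one-at-a-time schedule takes 49 days, while overlapping A and B would take 29. That gap is the kind of saving that was out of reach when a second simulation could not run alongside the first.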
Machines catching on fire
We had machines catching on fire from the heat generated while processing and simulating massive datasets. Our research demands also regularly exceeded the hard drive space available. As a result, we ended up writing our own low-level code to connect machines together, wiring up clusters of computers. It is amazing how innovative you can get when you have no other choice.
Protein modeling ready for global industrialization
While my days in the lab were frustrating at times, I have never been more convinced of the relevance of protein modeling, given the current disruption across the globe. I am convinced we can put protein modeling to work efficiently and effectively with the technology we now have available. Just look at the power behind cloud computing technologies and how we can use platforms like Microsoft Azure to efficiently build mission-critical applications. We can use tools like Apache Spark and Azure Databricks to auto-scale and collaborate on projects in an interactive workspace.
Technology is now available to help industrialize protein modeling. This is important at a time when we need therapeutics to help us slow down the spread of viral diseases, simulate vaccine efficacy and emerge even stronger.