全ての生命活動の基本的役割を担うタンパク質の立体構造がAIにより高い精度で予測可能に。DEEP LEARNINGのお陰により、生物学における中心的テーマの一つであるタンパク質の立体構造の解明が高精度・迅速にできる時代が到来。
わたしのnoteにおいては、最新の科学・経済・社会等の問題に関して、英語の記事を引用し、その英文が読み易いように加工し、英語の勉強ツールと最新情報収集ツールとしてご利用頂くことをmain missionとさせて頂きます。勿論、私論を書かせて頂くこともしばしです。
本記事は、AIによって、タンパク質の立体構造の予測の精度と速度が飛躍的に上がったこと、更に、複数のタンパク質で構成されるタンパク質複合体の立体構造の予測もかなりの精度で可能になったことが紹介されている。タンパク質の立体構造解析は、従来、非常な困難と時間を伴うもので、例えば、COVID-19のスパイク蛋白の立体構造と、スパイク蛋白が結合する生体側の蛋白の立体構造が分かれば、阻害剤の開発はかなり容易になるが、その立体構造を明らかにすることにおいて、多大な労力と時間と困難が伴うため、これまでの阻害剤の開発は、結合を阻害するか否かに関し、数万種類という物質をスクリーニングにかけて、そこから、阻害剤候補を拾い上げてくるという方法を取らざるをえなかった。しかし、今回、AIによりタンパク質の立体構造の予測が高精度かつ迅速にできるようになれば、スパイク蛋白と生体側の結合蛋白の立体構造を明らかにし、その明らかになった2つの構造から、両者が結合する部位を同定し、その結合部位に強力に結合する化合物(スパイク蛋白と生体側蛋白との結合を強力に阻害する化合物)を設計すれば、COVID-19の治療薬開発の成功確率および開発スピードは各段に上がることが予想される。
今回、AIによって、タンパク質の立体構造が精度良く、迅速に予測できるようになったことにより、タンパク質の構造と機能の理解が一気に加速され、今後、医薬品の開発、病態の解明、生命科学の理解等へのインパクトは絶大なものがある。
[参考]
タンパク質構造予測
出典: フリー百科事典『ウィキペディア(Wikipedia)』
タンパク質構造予測 (たんぱくしつこうぞうよそく、英: protein structure prediction) は、タンパク質についてそのアミノ酸配列をもとに3次元構造(立体配座)を推定することであり、バイオインフォマティクスおよび計算化学における研究分野の一つである。専門的な言葉では「タンパク質の一次構造をもとに二次構造や三次構造を予測すること」と表現できる。タンパク質のアミノ酸配列は一次構造と呼ばれる。タンパク質のアミノ酸配列は、その遺伝子が記録されたDNAの塩基配列から、遺伝コード(コドン)の対応表に基づいて、導出することができる。生体内において、ほとんどのタンパク質は、一次構造(アミノ酸配列)が決定されれば、一意的(自動的)に自ずと3次元構造(三次構造、コンフォメーション)が形成される。これをタンパク質が折りたたまれる(フォールディング)という。タンパク質の3次元構造を知ることは、そのタンパク質の機能を理解する上で有力な手がかりとなる。医学(例:医薬品設計)や、バイオテクノロジー(例:新しい酵素の設計)等々、幅広い領域において重要な役割を果たしている。
AI Can Now Model the Molecular Machines That Govern All Life. Thanks to deep learning, the central mysteries of structural biology are falling like dominos.
SingularityHub / By Shelly Fan -Nov 16, 2021
Just last year, DeepMind shocked the biomedical field with AlphaFold, [an algorithm that predicts protein structures with jaw-dropping accuracy]. The University of Washington (UW) soon unveiled RoseTTAFold, [an AI that rivaled AlphaFold in predictive ability]. A few weeks later, DeepMind released a near complete catalog of all protein structures in the human body.
Together, the teams (DeepMind and University of Washington (UW)) essentially solved a 50-year-old grand challenge in biology, and because proteins are at the heart of most medications, they may also have seeded a new era of drug development. For the first time, we have unprecedented (今までに例のない、かつてない、前代未聞の、未曽有の、史上初の、 /ʌnprésidèntəd) insight into the protein engines of our cells (細胞の機能の殆どはタンパク質が担っている), many of which had remained impervious (解析困難な/impə́ːrviəs) to traditional lab techniques.
Yet one glaring (顕著な/glériŋ) detail was missing. Proteins don’t operate alone. They often associate into complexes—small groups that interact to carry out critical tasks in our cells and bodies.
This month, the UW (University of Washington) team upped their game.
Tapping into (~に入り込む、利用する) both AlphaFold and RoseTTAFold, they tweaked (微調整する/twíːk) the programs to predict which proteins are likely to tag-team (どのタンパク質と、どのタンパク質が複合体を形成するか?) and sketched up the resulting complexes into a 3D models.
Using AI, the team (UW team) predicted hundreds of complexes—many of which are entirely new—that regulate DNA repair, govern the cell’s digestive system, and perform other critical biological functions. These under-the-hood (念入りに調べる) insights (物事の実態[真相]を見抜く力、洞察力/ínsàit) could impact the next generation of DNA editors (DNA改変・編集技術者) and spur (刺激する、〔~を〕促進させる、〔~に〕拍車を付ける/spə́ːr) new treatments for neurodegenerative disorders or anti-aging therapies.
“It’s a really cool result,” said Dr. Michael Snyder at Stanford University, [who was not involved in the study], to Science.
Like a compass, the results (予測されたタンパク質の立体構造) can guide experimental scientists as they test the predictions (予測された立体構造を検証しながら) and search for new insights into how our cells grow, age, die, malfunction, and reproduce. Several predictions further highlighted how our cells absorb external molecules—a powerful piece of information that could help us coerce normally reluctant cells to gulp up medications (通常、投与された薬剤を取り込まない細胞に、薬剤を細胞内に取り込ませ、薬効を発揮させる).
“It…gives you a lot of potential new drug targets,” said study author Dr. Qian Cong at the University of Texas Southwestern Medical Center.
The Cell’s Lego Blocks
Our bodies are governed by proteins, each of which intricately (複雑に/íntrikət) folds into 3D shapes. Like unique Lego bricks, these shapes allow the proteins to combine into larger structures, which in turn conduct the biological processes that propel life.
Too abstract? An example: when cells live out their usual lifespan, they go through a process called apoptosis (細胞の自然死◆プログラム化された細胞死/æ̀pəptóusis)—[in Greek, the falling of the leaves]—in which the cell gently falls apart without disturbing its neighbors by leaking toxic chemicals. The entire process is a cascade of protein-protein interactions. One protein grabs onto another protein to activate it. The now-activated protein is subsequently released to stir up (刺激する、引き起こす、惹起する) the next protein in the chain, and so on, eventually causing the aging or diseased cell to sacrifice itself.
Another example: in neurons during learning, synapses (sínæ̀ps) (the hubs that connect brain cells) call upon a myriad of proteins that form a complex together. This complex, in turn, spurs the neuron’s DNA to make proteins that etch the new memory into the brain.
“Everything in biology works in complexes. So, knowing who works with who is critical,” said Snyder.
For decades, scientists have relied on painfully slow processes to parse out (解析する、明らかにする) those interactions. One approach is computational: map out a protein’s structure down to the atomic level and predict “hot spots” that might interact with another protein. Another is experimental: using both biological lab prowess (優れた能力/práuəs) and physics ingenuity (創造力/ìndʒənúːəti), scientists can isolate protein complexes from cells—like sugar precipitating from lemonade when there’s too much of it—and use specialized equipment to analyze the proteins. It’s tiresome, expensive, and often plagued with errors.
Here Comes the Sun
Deep learning is now shining light on the whole enterprise.
The main idea is deceptively simple. Proteins are made of twisting (〔糸などを〕よる、より合わせる) strands (糸) and sheets of a single line of amino acids, like beads strung (1列に並べる、ひと続きにする) onto a tangled (もつれる、絡む) but semi-predictable mess of yarn (糸). Deep learning can parse (解析する) how the yarn folds into 3D shapes based on the structure of the amino acid “beads” alone.
Last year, DeepMind and a team from UW led by Dr. David Baker both took a crack (試みる) at the problem. Without knowing anything else about a protein, the teams’ algorithms, [AlphaFold2 and RoseTTaFold], were able to churn out (量産する) thousands of protein structures. Though both were impressive, compared to AlphaFold2, Baker’s AI wasn’t as accurate for single-protein predictions. But [where RoseTTAFold shone (異彩を放つ)] was in predicting proteins with multiple sub-units—in essence, a single protein made up of a handful of structures, each physically grabbing onto the next. It’s a perfect jumping-off point for diving into protein handshakes. (タンパク質複合体の立体構造解明への大きな飛躍点)
At the time, the AI only worked on proteins in simple creatures, like bacteria. In the new study, Baker’s team focused on a more complicated organism—the common yeast, which has a cellular structure similar to that of humans. The choice of focusing on yeast proteins was deliberate (よく考えた、熟考した上での/dilíbərət): as a lab favorite, its genome is relatively small, and there’s a “gold standard” set of protein interactions to test out the updated algorithm.
Almost immediately, the team ran into problems.
Compared to bacteria, which the older AI had tackled, yeast had a far more complicated system for translating its DNA into proteins. Each step added noise. To get around (〔問題などを〕うまく避ける、〔障害物を避けて〕回り道をする) the hiccup (しゃっくり/híkʌp), the team used an evolutionary approach. If a protein-protein interaction is important for biology, they reasoned, then the “hands”—the protein interface—[where they grab onto each other] should change together as species evolve to maintain the interaction.
They compared the amino acid sequences—[20 “letters” total, compared to DNA’s 4 (アミノ酸は20種類、DNAは4種類)]—of over 6,000 yeast proteins to nearly 6,500 proteins in other similar species. Like cracking a cipher (暗号), this allowed the team to home in on (~に狙いを定める、~に的を絞る、~に目を付ける、注目する) the amino acids that change in lockstep (足並みをそろえて). They then traced the “letters” to their protein owners and hypothesized that these owners likely formed a complex.
Using both AlphaFold and RoseTTAFold, the team next solved the 3D structure of these protein candidates. Surprisingly, each algorithm on its own (ひとりでに、勝手に、自然に、そのままで) struggled in performance and power consumption. But by tag-teaming, with RoseTTAFold [screening protein pairs] first, followed by AlphaFold [predicting 3D structure], they achieved “excellent performance,” the team said, with a precision of 95 percent for the gold standard set.
They next expanded their test to over eight million co-evolved yeast protein pairs. Together, the new algorithm found over 1,500 pairs likely to interact, and drew up 3D models for about 800 that hadn’t previously been characterized—that is, about half.
The success rate is a triumph for biology. Digging deeper, the team found that most of the newly predicted complexes and interactions “play roles in almost all key processes” and “provide broad insights into biological function.”
Among the AI-predicted complexes are those that control DNA repair after damage, a process dubbed homologous recombination. Recombination is the cellular machinery that CRISPR and its variants tap into (利用[活用]する). Understanding the protein members and complexes involved could potentially lead to new avenues for gene editing.
Other complexes are involved in the cell’s recycling mechanism, which often goes awry in diseases involving neurodegeneration. Over time, toxic proteins build up and overwhelm vulnerable neurons, causing them to malfunction. Other complexes include those needed for cells to swallow nutrients and medication, those that unwind chromosomes[—which house DNA—]during reproduction, and those that translate RNA into proteins.
Like any simulation, the results are only hypotheses for now. But they offer unprecedented clues, at a large scale, into potentially new complexes and functions that escaped previous study. These predictions are a great example of the promise of 3D structures, said Dr. John Jumper, one of the lead developers for AlphaFold. Just last month, his team at DeepMind posted a pre-print on AlphaFold-Multimer, an algorithmic variant that predicts protein complexes at about 67 percent accuracy in nearly 4,500 test cases.
The study is just the start. “As with any new method, it is important when interpreting the results to keep in mind the limitations of the approach,” the team warned. For example, the AI doesn’t work as well for protein complexes that only transiently interact or those that have extremely complicated structures. The results have so far only been tested in yeast protein complexes, and may miss (見落とす) those restricted to another species. The AI also isn’t very confident in its predictions—tests show confidence levels of about 70 percent for each complex.
Still, that’s the thrill. Thanks to deep learning, we’re cracking the protein complexes underpinning (〔主張などを〕支持する、根拠を与える、実証する/ʌ̀ndərpín) biology at a massive scale. “It’s a really exciting time,” said Baker.