Vol. 21 - Predictive models and the changing value of data
Because predictive models can do a lot more with sequence data, we need a robust access and benefit-sharing (ABS) system more than ever.
Last month, I argued on the Verena blog that open science and access and benefit-sharing are at least compatible.
(I still stand by this argument.)
One thing that I have been thinking about in the past few weeks is how advances in predictive models change the value of existing data. Specifically, I would like to make the case that this is relevant to discussions about access to digital sequence information on genetic resources.
So let’s talk about AlphaFold (it goes without saying that I like AlphaFold, and I am using it as a framing example here).
The value proposition of AlphaFold is simple: given a textual representation of a protein (its amino-acid sequence), it predicts the spatial configuration of this protein. How this prediction is produced is incredibly complicated and largely irrelevant here. What matters is that, without ever accessing a physical sample, we can get a structure that often approaches the quality of one determined in the lab.
Looking at the case studies on the AlphaFold website, it is clear that drug discovery is a prime target for this sort of approach. Which is good!
But who gets access to the sequence information that leads to drug discovery (and the profits derived from it) has been described as “make or break” for the negotiations about the WHO Pandemic Agreement.
When the Convention on Biological Diversity entered into force in 1993, and even when the Nagoya Protocol entered into force in 2014, the kind of predictions that AlphaFold can deliver were unimaginable. In a sense, our understanding of the value of a piece of text describing a sequence was framed by what this information could be used for at that time.
But what of now?
Almost 10 years after Nagoya, we can do a lot more (predictions) with a lot less (physical access to material), and so publishing a sequence enables a lot more work than it used to.
It is well established that countries from the Global South do not receive a fair share of therapeutics. The more we can predict without access to samples, the more data from the Global South are at risk of being pillaged for profit.