Episode 86 PANEL DISCUSSION Near-Term Challenges for ML/AI in Biotherapeutic R&D

June 9, 2026 | At this year’s PEGS Boston, industry experts gathered on a panel to explore how AI and machine learning are deployed in biologics R&D today. Moderated by Peter M. Tessier, Ph.D., Albert M. Mattocks professor of pharmaceutical sciences and chemical engineering at University of Michigan, the panel consisted of Andrew Buchanan, Ph.D., head of discovery at a stealth-mode biotech company; Norbert Furtmann, Ph.D., head of biologics AI and design of large molecules research at Sanofi; Konrad S. Krawczyk, Ph.D., founder and CSO at NaturalAntibody SA; Andrew C.R. Martin, Ph.D., emeritus professor of bioinformatics and computational biology at University College London; Melody Shahsavarian, Ph.D., senior director of data strategy and digital transformation of biotherapeutics discovery research at Eli Lilly & Company; and Bernhardt L. Trout, Ph.D., professor of chemical engineering at Massachusetts Institute of Technology.

The discussion covered data readiness, the feasibility of designing complex targets, and the boundaries of current predictive capabilities, as well as practical workflow strategies, where AI already provides measurable impact, and future opportunities, such as immunogenicity prediction.

Links from this episode:

Pharmaceutical Sciences & Chemical Engineering, University of Michigan
University of Michigan

Sanofi

NaturalAntibody SA

Bioinformatics, UCL
Biosciences Computational Biology, UCL
University College London

Eli Lilly & Company

PANELIST BIOS

Andrew Buchanan, Ph.D., FRSC, Head of Discovery, Stealth Mode Biotech
Andrew Buchanan is SVP and head of discovery at a stealth-mode biotechnology company. He has extensive expertise in large molecule drug discovery and development, spanning target selection through first-in-human (FiH) studies. Andrew has led pipeline projects and multifunctional teams that have delivered over 20 FiH drug candidates in oncology, inflammation, and cardiovascular therapy areas, resulting in three marketed medicines to date. A key highlight of his career has been fostering interdisciplinary collaborations with academic and industry partners in AI/ML, PKPD, and translational biology. These efforts have led to new partnerships and over 50 original publications and patents. As a science leader, Andrew is passionate about mentoring early-career scientists and advancing the development of tomorrow’s medicines.

Norbert Furtmann, Ph.D., Head, Biologics AI & Design, Large Molecules Research, Sanofi
Upon finishing his studies in pharmaceutical sciences, Dr. Furtmann pursued his interdisciplinary doctoral thesis in computational life sciences and pharmaceutical chemistry at the University of Bonn, focusing on computer-aided design, synthesis, and biological evaluation of protease inhibitors. After starting his professional career at Merck KGaA as principal scientist, he joined Sanofi in 2016 as lab head for data science within the biologics research department. Currently, Dr. Furtmann heads the global biologics AI and design teams responsible for the computational design and optimization of next-generation protein therapeutics.

Konrad S. Krawczyk, Ph.D., Founder & CSO, NaturalAntibody SA
Konrad Krawczyk specializes in computational methods to develop antibody-based therapeutics. He obtained his doctorate and pursued a postdoc at the Oxford Protein Informatics Group, contributing to leading software currently used by academia and the pharma industry alike. He was a postdoc and assistant professor of precision medicine at the University of Southern Denmark. Currently, he is the technical founder of NaturalAntibody, a company focused on developing and implementing computational innovations for antibody-based therapeutics in the pharma sector.

Andrew C.R. Martin, Ph.D., Emeritus Professor, Bioinformatics and Computational Biology, University College London
Andrew Martin is emeritus professor of bioinformatics and computational biology at University College London (UCL). He obtained his undergraduate degree and doctorate degree from the University of Oxford. After some time self-employed developing scientific software, including working for Oxford Molecular on software developed during his doctorate program, he joined Professor Dame Janet Thornton's group at UCL. He then went on to be technical director at Inpharmatica before joining the University of Reading and then returning to UCL in 2004. His interests have been in developing software and methods for application to protein sequence and structure. In particular, he has focused on the effects of mutations and on antibodies. He developed one of the first methods for antibody modelling (AbM) and the first searchable antibody-specific sequence database (KabatMan), which has since developed into an integrated workbench for analysis of antibody sequence and structure (abYsis). He is co-founder of abYsis, Ltd. and consults for several companies. He has acted as an expert witness in a number of high-profile patent disputes and is an adviser to the WHO-INN on the naming and annotation of antibody-based drugs.

Melody Shahsavarian, Ph.D., Senior Director, Data Strategy & Digital Transformation, Biotherapeutics Discovery Research, Eli Lilly & Company
Melody Shahsavarian, Ph.D. is senior director of data strategy and digital transformation at Eli Lilly. Throughout her career, she has worked on development of different technology platforms for biologics discovery, such as implementation of immune in vitro phage display antibody libraries, droplet microfluidics for high-throughput sequencing of antibodies at the single-cell level, and next-generation sequencing for immune repertoire profiling. She has a bachelor’s degree in bioengineering from the Jacobs School of Bioengineering at University of California San Diego and a master’s degree in biotechnology from Joseph Fourier University in Grenoble. During her doctorate program at the Enzymatic and Cellular Engineering Laboratory of Sorbonne University, she studied the presence and implication of catalytic antibodies in health and disease using mouse models and high-throughput in vitro display technologies. In her current role, she leads a team responsible for the establishment and execution of data strategy at BioTDR, in line with the company’s aspirations for digital transformation toward an AI-enabled biologics discovery process.

Bernhardt L. Trout, Ph.D., Professor, Chemical Engineering, Massachusetts Institute of Technology
Bernhardt L. Trout is the Raymond F. Baddour, ScD, (1949) professor of chemical engineering at MIT. He received his bachelor’s and master’s degrees from MIT and his doctorate degree from the University of California at Berkeley. In addition, he performed post-doctoral research at the Max-Planck Institute. Trout’s research focuses on the development of advanced manufacturing processes and rational tools for formulation and product design, primarily liquid formulations, but also lyophilized formulations. A major aspect of his research focuses on developing both microscopic and macroscopic models to design stable formulations efficiently. In addition, he is co-chair of the International Symposium on Continuous Manufacturing of Pharmaceuticals. He has published over 200 papers and has 18 patents issued or pending.

MODERATOR BIO

Peter M. Tessier, Ph.D., Albert M. Mattocks Professor, Pharmaceutical Sciences & Chemical Engineering, University of Michigan
Peter Tessier is the Albert M. Mattocks (Endowed) professor in the departments of chemical engineering, pharmaceutical sciences, and biomedical engineering, and a member of the Biointerfaces Institute at the University of Michigan in Ann Arbor, MI. He received his doctorate degree in chemical engineering from the University of Delaware (2003, NASA Graduate Fellow) and performed his postdoctoral studies at the Whitehead Institute for Biomedical Research at MIT (2003-2007, American Cancer Society Fellow). Tessier started his independent career as an assistant professor in the department of chemical and biological engineering at Rensselaer Polytechnic Institute in 2007, and he was an endowed full professor at Rensselaer prior to moving to the University of Michigan in 2017. Tessier’s research focuses on designing, optimizing, characterizing, and formulating a class of large therapeutic proteins (antibodies) that hold great potential for detecting and treating human disorders ranging from cancer to Alzheimer’s disease. He has received a number of awards and fellowships in recognition of his pioneering work: Pew Scholar Award in Biomedical Sciences (2010-2014), Humboldt Fellowship for Experienced Researchers (2014-2015), Young Scientist Award from the World Economic Forum (2014), Young Investigator Award from the American Chemical Society (2015), and NSF CAREER Award (2010-2015).

TRANSCRIPT

Welcome And Panel Introductions

Announcement 0:02

Welcome to The Chain, the podcast covering the lives, careers, research, and discoveries of protein engineers, scientists, and biotech professionals. In today's special episode, we hear a panel discussion on near-term challenges for ML and AI in biotherapeutic R&D, recorded live at the PEGS Summit in May in Boston. Peter Tessier, Albert M. Mattocks Professor of Pharmaceutical Sciences and Chemical Engineering at the University of Michigan, moderates the conversation. Here he is, introducing the panel.

Peter Tessier 0:35

So before we get started, I just want to introduce this distinguished panel. First, Melody Shahsavarian, who's senior director of data strategy and digital transformation at Eli Lilly. Her group is really involved in helping make data and models available and usable for both wet lab and dry lab scientists across biologics discovery and optimization. Andrew Martin is emeritus professor of bioinformatics and computational biology at University College London. Of course, a lot of us know him as a co-founder of abYsis, which I pronounce either "Absis" or "Abesis." He can tell us which way it should be said. He's worked in this field for a very long time, roughly 40 years, and was involved in very early application of AI in the area of antibody structure prediction and so on.

Peter Tessier 1:28

Andrew Buchanan is senior vice president and head of discovery at a stealth-mode biotech, developing new antibody molecules for clinical development. He's worked for 25 years in this area of antibody discovery and early development. And he's actually been working on AI and ML for over 10 years now, with investigators in both industry and academia.

Peter Tessier 1:49

Norbert Furtmann is head of biologics AI and design in large molecules research at Sanofi. His work is closely aligned with computational design and optimization of protein therapeutics, including VHHs and multi-specifics. Bernhardt Trout is a professor at MIT and has extensive work in the area of molecular engineering, protein therapeutics, formulation, and machine learning. And a lot of us know his group for contributing to many ML methods predicting antibody properties like aggregation, viscosity, and so on.

Peter Tessier 2:24

Konrad, I'm gonna say his name wrong, so let me take a minute. Krawczyk is founder and CSO of NaturalAntibody, a company focused on AI/ML-driven biologics engineering. He's also assistant professor at University of Southern Denmark, and he's recognized as an expert in the area of computational antibody design. Okay, so let's get started. And again, my request to the panel is let's make this practical. Let's try to make our thoughts something where the audience can leave this panel smarter and sort of ready on Monday morning to start thinking about how to change how they're using AI and ML.

Where AI Works And Where It Fails

Peter Tessier 3:08

Okay, so opening question. And before I ask the question, please let's make this specific. I'd like to avoid a lot of theoretical thoughts and try to make this as specific as possible. What's one biologics R&D problem where AI/ML is already genuinely useful today? You're using it, it's useful, it's changed the way you do your work or the work of the colleagues that you're involved with. And what's one area where we're just not there yet? We could say it's hype, we could say it's just not ready, however you want to say that. So, what is a practical example of where it's working, and where is it not working yet? Who would like to start? Konrad?

Konrad Krawczyk 3:56

Right, it works. So I would definitely say that for lead optimization, for instance, that's where it can actually work. Because even if you have just a few data points, I mean tens, maybe hundreds, then you can actually iterate to make your biologic better. So that's where it does work. One area where we know that it is, to put it diplomatically, challenging is mostly with clinical stuff. Because we do look at data from clinical trials, we look at the readouts from the different cohorts, and it is extremely difficult to forecast how a biologic would fare in clinical trials. So that's where I would say that it's very challenging.

Peter Tessier 4:43

Okay, and can you be a little more specific on the clinical part? What are we talking about? Immunogenicity, efficacy, safety, anything in particular that comes to mind?

Konrad Krawczyk 4:53

Yeah, so initially we focused on immunogenicity. And the challenge was, of course, just data collection. Because even if you do have different data points for different cohorts, it is not a given that the data are going to be comparable. And if you want to train the model, then of course you do want a uniform, essentially comparable, distribution between them. So that was the first point. But then also with any other readouts, like you mentioned, they are extremely, extremely dependent on the metadata. So we find that it might be easier to look at the metadata of the indication that you're targeting to actually forecast where the issues might lie, not at the actual sequence, which is, you know, quite controversial.

Peter Tessier 5:38

Okay, Norbert?

Norbert Furtmann 5:40

So maybe I'll start with where we still see challenges, and I'm very much aligned with what Konrad just said. But maybe to go even a bit further in the workflow, predicting function is, I think, incredibly tricky. I mean, many of the tools and programs we are using are kind of directed at affinity prediction, which by itself I think is a holy grail problem, to have generalizable affinity prediction. But in the end, what we are aiming for is not only binding, it's interference with a specific pathway. And many things can be explained by structure, but not everything. And I think specifically prediction of functional readouts is something where there is lots of room to further improve. And going to things which are working well, I think it's not all solved, but in the domain of developability predictions, I think there was lots of progress.

Norbert Furtmann 6:28

And I think there are quite some models, pre-trained on data sets for stability or aggregation, which can help to filter molecules, and where AI can provide value by replacing wet lab experiments with predictions, or at least by filtering so that you have to test fewer molecules. And another part where AI can help is, I think, diversification. I mean getting to more diverse panels in terms of, I don't know, addressing different epitopes in early selections, or making sure that you have a broader distribution of general biophysical properties of your molecules. I think AI or computational tools can do a good job there and provide an additional perspective over what we can get from the wet lab readouts.

Peter Tessier 7:10

Could I push you a little bit on this? You mentioned developability, and that's a place where you're seeing impact and it's working. How well is it working? Are you making actual decisions based on these models? Is it changing what you're advancing? How confident are you? Could we push on that a little?

Norbert Furtmann 7:32

I think it's hard to give a general answer because it depends on which property. I mean, we might have properties which are better understood and where we have more data, where we have data sets which have higher quality. So, for example, let's say thermostability. I think we are quite good already in selecting molecules or filtering molecules based on predicted thermostability. Also, what we presented here at the conference is a model for multispecifics colloidal stability prediction, which is a rather simple model based on surface patch information, where we can get quite a solid correlation from the predicted colloidal stability to the experimental colloidal stability. And specifically, in the space of multi-specifics where the design spaces are huge, if we have solid models, it can simply help to explore the huge design spaces and to limit and de-risk the panels of molecules we then actually put into experimental testing. But there are also other properties where we failed to build models to predict those biophysical properties. And I think it depends on the quality of the data, on the diversity, and on how well the property is understood. Can we get the right molecular descriptors, whether sequence-derived, PLM-derived, or structure-derived, to get that correlation? But as compared to properties like affinity, where you have that antigen layer, I feel confident that with improvements in high-throughput technologies and in the quality, size, and diversity of the data, the advancements we see will continue, while other areas like function and affinity prediction are much more challenging.

Peter Tessier 9:08

I think one practical implication here is: is the field going to get to a place where we stop measuring certain properties because we're confident enough in predicting them? Or is the field gonna get to a place where they're doing many fewer experiments because the confidence in the models is high enough to afford that? Could you comment on that?

Norbert Furtmann 9:30

I mean, I don't see us fully replacing the experimental testing with predictions as of today or in the near future. I think we are very much convinced that you need a tight integration between wet lab and in silico, in iterative loops to learn from each other. But I think where we see benefit is de-risking, so that we can select, based on the predictions, molecules which go into experimental testing where we do not see the surprise afterwards: okay, we have a good binder, but in the end stability fails, aggregation fails. So I think we can move ahead more high-quality molecules than in the past, but we still need to test and confirm. And I think that's very important, because we see progress in data sets, but we need to keep testing and funnel that information back into the models. Okay.

Peter Tessier 10:17

Melody, do you have any thoughts on that or, more broadly, on what's working and where the biggest gap is right now?

Melody Shahsavarian 10:26

Yeah, I agree with Norbert. What has proven to work is this triaging step, where we can implement AI both for predicting the attributes that we're looking for in the molecules and for getting better diversity based on structure prediction. So, to your question of whether we have proof of this working, I'll mention that we actually did a study. This AI-enabled, sort of intelligent triaging workflow was established in the discovery organization at Lilly a few years before my time. And they did a study to look at molecules that get to development and their success rate, and the number of surprises or liabilities that development gets has been significantly reduced during this time. So there is definitely proof that it's impactful.

Peter Tessier 11:19

Could I ask you just when you talk about triaging, you know, what are you considering? Is it function? Is it affinity? Is it developability? What what what does that triage look like?

Melody Shahsavarian 11:30

So it's developability, yeah. Developability, but also diversity, you know, trying to get larger diversity in both sequence space and epitope space.

Peter Tessier 11:42

Okay. Absolutely. Yeah, and where do you see the opportunities or what's missing?

Melody Shahsavarian 11:46

Yeah, the opportunity, I think we were talking about this before the panel started. I really would like us to be able to apply these sorts of methods to more complex molecules.

Peter Tessier 11:55

By complex, do you mean multi-specifics? Do you mean non-antibodies?

Melody Shahsavarian 12:01

It could be multi-specifics, perhaps conjugates, you know, ADCs, ARCs.

Peter Tessier 12:06

Okay, thank you. Andrew, any thoughts on this, on where you see things working or where the opportunities are? And by the way, I love your socks. He's got antibody socks on, okay? So very appropriate. Thank you.

Andrew Buchanan 12:25

Good thought. In essence, the technology is brilliant, but the limitation in trying to apply it is that it will only work if you have the appropriate data. Where teams tend to have the appropriate data is in developability, because it's a generic property of the Ig fold. You could be really challenging and say, are these models really learning anything? They're just neighborhood, anyway. But it is really helpful in the triage, because you can design hundreds, thousands, tens of thousands. You're not limited in what you can design computationally, and the models are really useful for triaging down. There are really good examples of classifier models. That's real-world useful, but even that is only for aspects of developability. There are other important aspects. But even in that, how well do we understand how well a CHO cell will make these? You might do cell-free expression, you might do HEK expression, but that's not CHO expression.

Andrew Buchanan 13:23

And in research, we will do transient CHO. That's not a stable CHO cell line. And these are data gaps that are really worth exploring, but there's also getting your institutes to buy into doing that, to generate essentially machine-ready data. I think, Norbert, you talked about it a bit in your talk: actually committing to generate this data. There are papers about it, but those papers hardly ever publish all the data that you really need to implement it, even between our institutes. So those are some of the challenges in that one aspect. The other aspect is about structure: has anything really changed in the amount of structural data that's available in the public domain? And we need to get beyond the static view, because we essentially take some aspect of an antibody's biology or pharmacology out of the system, we study it by structure, by sequence, by function, and then we try to insert it back into the context that it all exists in. And that's the kind of pinch of reality. And part of that in the structure space is dynamics and finding ways to get beyond static views.

How Reliable Developability Predictions Are

Peter Tessier 14:33

Maybe just push back for a minute. You know, it seems like there's some advocacy for the idea that we're making progress in the developability space. For you or anyone else on the panel, what would be the hardest developability property? Even though the field's making progress, what's still the hardest for us? Do you have a thought?

Andrew Buchanan 14:53

I have a thought. Most of the time in the industry, we run platform processes, and those processes work really effectively, both in discovery and in development. But when you enter the world of multi-specifics, although we're getting good at it, we don't have such a track record. And I think a simple aspect is understanding expression in stable CHOs, because stable CHOs are metabolically stressed in a way that our transient systems aren't. I think that's a fascinating area to explore. Yep.

Peter Tessier 15:27

Okay. Bernhardt, can we take it to you, especially as a developability expert? What's your perspective? You know, you've been working in this field a long time. Where do you see the field? Where do you see success and what's missing?

Bernhardt Trout 15:41

Sure. So it's good to hear that they're used for screening or triage, whatnot. I think these methods started being used in the 2010s and have been propagating more and more. Since you said to be practical, I'll say a few practical things at the end regarding that. And again, by developability, we mean aggregation, sorting, triage, viscosity, and various chemical liabilities, deamidation, oxidation, and whatnot. What's missing is that we can do, as you said, triage with maybe 80% accuracy, as a rough estimate. In our lab, we keep working with more complicated models to get to 90%, but I don't think that's generally the case. And probably, to do more than just sorting or triage, we need to get above 90%. So I think that's challenge number one. And then as far as the next step, I would say formulation design. We started doing some of that, and you actually generate a lot of data when you do formulation work, which machine learning can use. So I think that could be an exciting new area, which has already begun. As far as the practicality, I'll just mention two things. One is that these sorts of methods are generally available in a lot of off-the-shelf software. We work with BIOVIA Pipeline Pilot, but I think most of the major companies have this. So there's not a huge investment, because your companies probably already have the software.

Peter Tessier 17:25

Could you repeat the name of that again?

Bernhardt Trout 17:28

Oh, it's Pipeline Pilot, that's the one that we use, but there are many, many others.

Andrew Buchanan 17:32

Even Amazon, the everything bookstore, can now deliver you said software.

Bernhardt Trout 17:36

Yeah, you can get it from Amazon. Yeah, exactly. The biggest challenge that I've seen, and I've worked with companies in collaboration for over 20 years, is making a decision, like MedImmune did many years ago, that they're gonna commit to this, and then actually committing resources, that is, people. I've been involved with so many companies that said, we want to implement developability, this person's gonna do it, and then a month later, two months later, they're called away on another high-priority project, and then it kind of fizzles away.
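The rough triage accuracies mentioned in this exchange (about 80% today, 90% as a target) can be put in context with a short calculation. The sketch below is an illustration only, not anything the panel presented: it assumes the filter's sensitivity and specificity both equal the quoted accuracy, and the 30% share of well-behaved molecules in the starting panel is a hypothetical number chosen for the example.

```python
def precision_after_triage(accuracy, base_rate):
    """Fraction of molecules passing the filter that are truly well-behaved.

    Assumption (not from the panel): sensitivity and specificity of the
    filter are both equal to `accuracy`. `base_rate` is the share of
    well-behaved molecules in the panel before filtering.
    """
    true_pass = accuracy * base_rate                    # good molecules kept
    false_pass = (1.0 - accuracy) * (1.0 - base_rate)   # bad molecules kept
    return true_pass / (true_pass + false_pass)


base_rate = 0.30  # hypothetical starting share of well-behaved molecules

for acc in (0.80, 0.90):
    kept_good = precision_after_triage(acc, base_rate)
    print(f"accuracy {acc:.0%}: {kept_good:.0%} of passed molecules are well-behaved")
```

Under these assumptions, an 80% filter roughly doubles the share of well-behaved molecules in the panel (from 30% to about 63%), and a 90% filter takes it to about 79%. In both cases a meaningful fraction of the passed molecules would still fail, which is consistent with using such models to triage rather than to replace measurement.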

Structure, Epitopes, And Affinity Reality

Bernhardt Trout 18:12

Okay.

Peter Tessier 18:12

Thank you. Andrew, do you have thoughts on this, on where you see things working and where the opportunities are?

Andrew Martin 18:20

Sure. So not in the antibody space, but in the general protein space, I think we really have now come to the stage of solving the protein folding problem with AlphaFold-type software. It really is working very well, but it still doesn't work so well for the CDRs, which is the bit that we're interested in in terms of protein structure prediction, and particularly for CDR H3. And despite this idea that a lot of people have that antibody CDRs are very flexible, this is really not generally true. We published a paper on that a year or two ago. So, yes, dynamics is important, but it's probably not as important in terms of the structure of the CDRs as people tend to think. There are some exceptions, of course, there are always exceptions in biology, and some CDR H3s are more flexible, but in general they're not. So that is still a problem and an opportunity in modelling antibody structure, which is something I've worked on for 40 years. So that's one area. Another area, well, everybody's spoken about developability, and that's something that we've worked on recently as well, and which I'll be talking about in my talk tomorrow. But also related to that, there's a big problem if you have a panel of antibodies that you've raised against a particular protein and you want to know the ones that actually bind to the epitope that you're interested in. So predicting epitopes is a big problem. And there was an independent assessment done a few years ago which really showed that none of the methods worked. We like to think we have now got slightly beyond that using fairly novel AI approaches. It's still not brilliant, but it's a lot better, particularly when you know the antibody that you're thinking about using. And taking that a step further, then, can you rank antibodies on affinity? You know, affinity measurements are difficult to make consistent across different assays and so on.
That's why we decided not to take a regression approach, where you're actually trying to predict numbers, but to predict ranking. And we have had some success with that. So I think there are a number of areas where things are improving at the moment. Can I make a comment on this affinity question?
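The ranking-versus-regression point above can be illustrated with a small, hedged sketch. It is not the speaker's method: it only shows why a rank-based score is insensitive to a constant offset between assays, whereas an error computed on the raw numbers is not. All names and values (`scores`, `pkd_assay_a`, `pkd_assay_b`) are hypothetical.

```python
from itertools import combinations


def pairwise_ranking_accuracy(predicted, measured):
    """Fraction of antibody pairs whose predicted order matches the measured order.

    Pairs that are tied in either list are skipped.
    """
    concordant = 0
    comparable = 0
    for i, j in combinations(range(len(measured)), 2):
        d_meas = measured[i] - measured[j]
        d_pred = predicted[i] - predicted[j]
        if d_meas == 0 or d_pred == 0:
            continue  # ties carry no ordering information
        comparable += 1
        if (d_meas > 0) == (d_pred > 0):
            concordant += 1
    return concordant / comparable if comparable else float("nan")


def mean_absolute_error(predicted, measured):
    """Average absolute difference between predicted and measured values."""
    return sum(abs(p - m) for p, m in zip(predicted, measured)) / len(measured)


# Hypothetical data: predicted pKd for four antibodies, and the same
# antibodies measured in two assays that differ by a constant offset.
scores = [7.0, 9.0, 8.2, 8.5]
pkd_assay_a = [7.1, 9.3, 8.0, 8.4]
pkd_assay_b = [x - 0.8 for x in pkd_assay_a]

for name, measured in (("assay A", pkd_assay_a), ("assay B", pkd_assay_b)):
    rank_acc = pairwise_ranking_accuracy(scores, measured)
    mae = mean_absolute_error(scores, measured)
    print(f"{name}: ranking accuracy {rank_acc:.2f}, absolute error {mae:.2f}")
```

In this toy case the ranking score is identical for both assays, while the absolute error changes with the offset. Real assay differences are rarely a clean offset, so this is only the simplest case of the argument.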

Peter Tessier 21:08

I think a short one, please.

Andrew Buchanan 21:10

Yeah, going to ranking is good, but I think this is where, rather than just the AI/ML bubble, it's actually about integrating with physics. And I don't necessarily agree with your comments about dynamics. Molecule flexibility is key. We're so used to static views. Yes, CDRs might not look flexible, but we only look at them when they're solved in a structure and we can see them. There was some brilliant work that Charlotte Deane's group has done, repeatedly trying to get us to look beyond the static. What its impact will be, I don't know. But on affinity prediction, I think that's another problem really worth working on, and this is where physics will help, particularly the kind of FEP approaches that chemistry pioneered. Now these methods are moving into biologics, into antibodies.

Data Becomes The Real Bottleneck

Peter Tessier 21:58

I want to transition here to the topic of data. In our meeting, when we talked before this, I actually started to get worried this whole panel would be just about data, because everybody has very strong feelings here about data. And I think a lot of us understand that, if we accept that data, not the models, model architecture, or computational methods, is now the bottleneck for AI in biologics R&D, the question here is, what do we do about it? Right? What specific data is most urgently missing? Who's responsible for generating it? Is it a good idea that everybody's generating it internally and keeping it? Should there be sharing? How do we curate it? So, what I'd like to do is go through, because I think everybody has their opinions about data here. And I'd ask for non-overlapping opinions. So if somebody makes a comment about something, then okay, let's move on. Because there are so many, and you guys had very strong feelings about this in many different aspects, right? Everything from how the data is curated and collected to SOPs, sharing, and so on. So, Konrad, I'm gonna start with you. When this question of data comes up, what's your first thought about what is needed next? If we're providing practical advice to people here, you know, actionable advice, guidance for the future, where would you start?

Konrad Krawczyk 23:22

So essentially, I think it's not a secret that we are collecting a lot of data. We started with public data sets, just cleaning them up.

Peter Tessier 23:31

And what kind of data? Just be a little specific. What data are you collecting? Everything?

Konrad Krawczyk 23:35

Okay, yeah, that's everything: essentially structures, literature, and patents. The reason for that was that we wanted to see where the gaps are. And quite clearly, one very big gap is developability data. It's not that there are only a few data points out there, but they were measured under very different conditions. So what I would say is, essentially, walk before you can run. It's great to focus on a lot of formats, it's great to focus on different conditions. But if you make a foundational data set for developability under just one condition, and we can predict it very, very well, then this could be transferable to other problems. This is something that we have seen. If you train your model on a set of developability properties measured under a single condition, and then you apply it to smaller data sets that might have been measured under slightly different conditions, then there is predictive power where there wasn't any before. So, walk before you can run.

Peter Tessier 24:43

Okay. Norbert?

Norbert Furtmann 24:44

Yeah. I mean, I agree. I see the same: we need to understand the data and get to an understanding of the readout, which shows us that we predict something meaningful in the end. But to move away from what Konrad has been saying, I think we see a gap between utilizing legacy data and asking what kind of data is really the most informative to train AI models. So I think it's definitely worth investing in curation of legacy data. But on top of that, I think it's worth investing in data generation specifically to feed AI systems, outside, let's say, of the context of portfolio programs. In our portfolio programs, the goal is to be as fast as possible, to walk up the hill, to get to the molecules which ultimately check all of the different boxes in terms of function, CMC, developability, whatever. But in the end, to get to good AI models, you need to have diversity in the data sets. You need to have a good balance between well-behaving and not well-behaving molecules. And getting there, it really helps to change the mindset from how we generate data within programs to how we generate data that could have the best benefit, or the most information, to improve our models. I think this balance and this mindset change would be very beneficial to advance how we are using data and how data could contribute to increasing the performance of our models.

Announcement 26:15

Are you enjoying the conversation? We'd love to hear from you. Please subscribe to the podcast and give us a rating. It helps other people find and join the conversation. If you've got speaker or topic ideas, we'd love to hear those too. You can send them in a podcast review.

Legacy Data, Metadata, And Standards

Peter Tessier 26:31

You know, one quick question on this: are we making a mistake in still using legacy data? Is that interfering with progress, as the world advances and methods get better at generating data? Do you see cases where we should not be using legacy data, or where it's biasing us? Does anyone have an opinion on this?

Andrew Buchanan 26:53

I definitely have an opinion. No, I don't think it's hampering teams. All these methods move on. When you're generating data, there's data for platform foundational models, but for the candidate drug that you're going to take to phase one, make the data as translationally relevant as possible. So, for example, one way we're hampered: what temperature do we all measure our affinity at? That would be room temperature, because it's convenient. That is not translationally relevant. You need to measure affinity at 37 degrees Celsius because that's where your drug is going to work. So we probably are hampering ourselves with that old data, but technically it's more of a challenge. So that's a practical example: measure your affinity at 37 and feed it into the models.

Norbert Furtmann 27:40

I mean, I think we should not forget about legacy data. It's there, and there might be big treasures in it which we can use. But you could see that maybe the data grew over time and assay conditions have been changed. So you need to figure out whether what you have is comparable. It might still be a valuable starting point. But from that starting point, if you define the goal as improving your model, then maybe the best way is not to just keep collecting data from programs, but to think about, for example, utilizing active learning strategies: what would be the next data point to generate, not in terms of generating a good molecule, but in terms of generating information which helps to improve the model.
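
The active learning idea mentioned here can be illustrated with a minimal Python sketch. This is not something the panel presented: the ensemble of toy models, the candidate values, and the function names are all hypothetical, and disagreement across an ensemble is only one of several possible ways to score how informative a new measurement might be.

```python
import statistics

def disagreement(models, x):
    """Spread of the ensemble's predictions at x (population standard deviation)."""
    return statistics.pstdev([m(x) for m in models])

def pick_next_experiment(models, candidates):
    """Return the candidate on which the ensemble disagrees most.

    The goal is not to pick the molecule predicted to be best, but the one
    whose measurement is expected to teach the models the most.
    """
    return max(candidates, key=lambda x: disagreement(models, x))

# Hypothetical ensemble: three toy models mapping a single descriptor value
# to a predicted property. Real models would be trained on assay data.
models = [
    lambda x: 1.0 * x,
    lambda x: 1.2 * x,
    lambda x: 0.8 * x + 0.5,
]

# Hypothetical descriptor values for molecules that have not been measured yet.
candidates = [0.0, 1.0, 5.0]

next_x = pick_next_experiment(models, candidates)
print(next_x)
```

In this toy setup the models agree reasonably well near the low descriptor values and diverge at the high one, so the high-value candidate is selected for the next experiment even though nothing suggests it is a good molecule.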

Peter Tessier 28:17

Melody, do you have thoughts on data different from what's been shared already, or different areas we should think about?

Melody Shahsavarian 28:24

I think it's been touched upon, but I was going to mention standardization and having common ontologies when we capture data. So establishing that and ensuring that the data we capture moving forward complies with it: for example, enriching the data with all the metadata around conditions, how the data has been generated, and the methods that have been used to generate it. That's very important within an enterprise, but also across industry, right? If we get to a place where we have common data standards across the industry, it will open huge opportunities for cross-sharing of data and for some of the efforts that are ongoing to collect it.
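
To make the point about capturing conditions as metadata concrete, here is a small Python sketch. It is an illustration under stated assumptions, not an existing standard: the record fields, the example values, and the one-degree tolerance are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AffinityRecord:
    """One affinity measurement together with the conditions it was made under.

    All field names are hypothetical; a real standard would be agreed on
    across groups.
    """
    molecule_id: str
    kd_nm: float          # measured dissociation constant, in nanomolar
    temperature_c: float  # assay temperature, in degrees Celsius
    method: str           # measurement method
    buffer: str           # buffer description

def comparable(a, b, temp_tol_c=1.0):
    """Treat two records as comparable only if method, buffer, and temperature match."""
    return (
        a.method == b.method
        and a.buffer == b.buffer
        and abs(a.temperature_c - b.temperature_c) <= temp_tol_c
    )

# Hypothetical example: the same molecule measured at room temperature (legacy)
# and at 37 degrees Celsius (new).
legacy = AffinityRecord("mAb-001", 2.5, 25.0, "SPR", "PBS pH 7.4")
new = AffinityRecord("mAb-001", 4.0, 37.0, "SPR", "PBS pH 7.4")

print(comparable(legacy, new))
```

With the conditions stored next to each value, a simple check like this can flag legacy measurements that should not be pooled with newer ones without adjustment.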

Peter Tessier 29:06

Is that realistic, do you think? Is it realistic for us to get to a place where we have common standards for developability data, for example? I mean, I don't know the answer. Does it feel realistic?

Andrew Martin 29:20

Can I just chip in on that? In the field of microarrays, that was something that was done fairly early on. Lots and lots of groups working on microarrays sat down together and developed an ontology for storing all the relevant information. So I think it's really about motivation and getting people to sit down together. As one of the people in microarrays said, they sat and argued over pizza for hours; they wouldn't be let out of the room until they came to a conclusion. I think that's what we need to do for all these sorts of data as well.

Andrew Buchanan 30:01

I think there is a will in industry to do it. You see it in some of the federated learning opportunities. I'm no longer involved in them, but there are at least two, AISB and FAIT, where big pharmas and different biotechs are attempting to federate and share data. But it's going to come back to Melody's point, which is so valid, in terms of how comparable our data is. What can we do as an industry to agree on a standard, to say this is a good way to report data? How can vendors who provide kits adopt data standards and make it easy to export data from their products? I think industry standards are so important. The telephone industry has them. We don't in discovery, but development does.

Peter Tessier 30:42

Bernhardt, do you have a thought?

Bernhardt Trout 30:44

Sure, I have a thought. I agree with the points on standardization and whatnot. I have a suggestion, though, of something that could be done today without standardization, or actually sort of an auto-standardization, but someone needs to collect the data. So we can go company by company and interview the scientists who did the development projects and then categorize them. Was it extremely difficult? Was it pretty difficult or very difficult or just difficult? And you could categorize them and that could potentially help with screening, but someone has to do that.

Andrew Buchanan 31:21

So the image groups are very good at doing that. I haven't seen it done from a CMC perspective. But in these federated efforts, that's essentially what they're doing. People have strong opinions, but there should be more dialogue in this space to try and, as somebody said, sit over the pizzas and work it out. Sorry, Andrew, you had a comment.

Peter Tessier 31:41

Can you just expand on that, just so we're all clear? When you say difficult or extremely difficult, can you be more specific about collecting the data? Flesh that out a little bit.

Bernhardt Trout 31:53

Ah just going through the whole development process.

Peter Tessier 31:56

I see, I see.

Bernhardt Trout 31:57

So looking at sort of molecules and how difficult they were to develop. Right, imagine interviewing,

Peter Tessier 32:01

I see. Right.

Bernhardt Trout 32:02

Imagine if you could then use that for predictive methods when you start.

Peter Tessier 32:07

That makes sense. Very interesting. Andrew, do you have thoughts around data?

Andrew Martin 32:12

Yeah, again, I think the problem, certainly working in an academic setting, is getting hold of large enough data sets, particularly of things that have failed. All these companies are sitting on huge amounts of data related to molecules that they haven't then taken forward. But those data are actually invaluable for making predictions. Obviously the data on things that have succeeded are also hugely useful, but those data do tend to be published somewhere; as Konrad was saying, you're collecting from patents and so on. For the things that have failed, that's much harder to find.

Andrew Buchanan 32:59

So on publishing failed data, there are one or two journals now that are committing to publish it, essentially to deal with the reproducibility crisis. So there are ways to do it, but again, people don't do it.

Andrew Martin 33:10

Yeah, people don't see a great advantage in doing it, but it's actually much more of an advantage to the whole community. But again, some sort of federated database to store all these things would be absolutely fantastic.

Peter Tessier 33:25

Oh, we need to transition here. How about a 30-second comment? We're gonna transition.

Norbert Furtmann 33:30

I don't want to make a comment, but maybe there are some quick thoughts on synthetic data generation. We have all been thinking about the gaps in terms of collecting the data, generating the data. But, not putting out an opinion here, think about the advancements in structure prediction tools. There's also a gap in how slowly the number of structures is still growing to get to better structure prediction, but there is the power of AlphaFold-like structure prediction tools. So any thoughts on augmenting experimental data with synthetic data here?

De Novo Design Without The Hype

Peter Tessier 33:58

Anyway, yeah, we're gonna sort of get there. So we need to get to the elephant in the room, right? Everybody knows this is an AI/ML panel, and the elephant in the room is: where are we in de novo design, in zero-shot predictions? How impactful is this today? Are you using it? Is it being used as one of your main tools in parallel with other antibody generation methods, in parallel with other antibody optimization methods? Is this a good use of our resources right now, or is it proving to be a distraction? Konrad, can we start with you?

Konrad Krawczyk 34:39

All right, so I'm slightly biased here because I do work with companies, and we do consult on the usage and deployment of such AI tools. Oh, we'd like to hear your bias. Yeah, so my bias is essentially that those tools can work sometimes, but they do not just work if you download them and press the button. Okay, yeah.

Peter Tessier 35:02

So should the people in this room be using them, after your experience and all you've seen? Should they be used today?

Konrad Krawczyk 35:09

I would definitely say they should be explored, because in certain cases they can produce answers: they can produce binders faster, or at least hypotheses faster, than, let's say, a big library campaign. On the other hand, usage of such tools should not stop you from doing what you can do in the lab. This is actually a conversation that we have with teams. We're looking at their workflows, and if using such tools would prevent them from doing what they have always been doing, then we just say, look, that's perhaps not for you. Okay. So, looking for opportunities.

Peter Tessier 35:51

Okay. Norbert?

Norbert Furtmann 35:53

I mean, we are very much committed to making it work. To give an insight, the majority of our programs are driven by traditional discovery approaches. But we are very much committed: we set up a specific group just focusing on de novo, and we use it in parts of our programs. I don't see at the moment that de novo will replace all of the other discovery technologies, but it will be another tool in the toolbox. Even in past programs, you might have had cases where immunization was not working and you went with synthetic libraries. So now you have maybe immunization, synthetic libraries, and de novo: multiple tools, and I would definitely recommend it. We see a future in which this technology will evolve and generate value. It's simply a completely different approach to tackling the problem. But we are not convinced that it's the magic button which you have been mentioning. So I think you should build know-how around it. And instead of thinking that you have a pipeline, and that pipeline works for each and every target, for each and every epitope, you should build a de novo toolbox with different complementary methods, where one method can maybe overcome the bias from another method. You build expertise, you have the experts in-house, you have access to the right tools, and then you customize it to the problem.

Norbert Furtmann 37:08

So we are very much committed, we see it as the future, but I think we would also like to have a realistic view: it's a complex, tricky technology where you need to build the expertise and know what method to use when. And then, yeah, we see successes. To get very concrete, we have targets where we struggled to generate binders with traditional approaches, and we were successful with de novo approaches. We could evolve the binders, so it's not zero shot where you get a high-affinity binder, but you get to weak binders, you evolve them not only in terms of binding but also in terms of function, and you end up with a functional building block originating from de novo, where you could close a gap where traditional approaches failed. On the other hand, de novo also fails on targets, so it's not a magic bullet that always solves the problem.

Peter Tessier 37:56

Melody?

Melody Shahsavarian 37:58

I essentially agree with everything that Norbert said. It's early days. We also see early positive signal. It's improving, but we're also using it as another tool in the toolbox along with other traditional methods. There are case studies where, similar to what Norbert said, de novo has worked where other platforms have failed. It hasn't been zero shot, not yet at least. And in programs where we use it, it definitely gives you different options than the other platforms that we run. So at the end we get a more diverse panel of hits.

Peter Tessier 38:40

I mean, to me, this seems like a very, very important point: are you seeing successes in cases where conventional methods are not succeeding, even if it's only hit generation that needs to be further optimized? And you're both saying you've seen that. That's correct. That seems like a very important moment in the field, you know, when you start to see that. Andrew.

Andrew Buchanan 39:02

Epitope-specific design is the holy grail for this space. A few years ago it was nonsense, but now it actually can work, as you say, alongside the other methods that you choose. It has an appropriate use. I don't use it, so I'll not cover it any more.

Peter Tessier 39:16

Okay. Bernhardt, I wondered if we could rephrase this a little bit. With zero shot, we're often thinking about affinity prediction, but in the developability space you could think in the same way. Suppose you have a problematic molecule. How close are we to being able to predict a small panel of things that solves that problem? Do you have thoughts on that?

Bernhardt Trout 39:42

Yeah, so again, right now we're at the stage where we can kind of screen for potential problems. As far as actually solving those problems, I actually think that the technology is there in terms of, just broadly, the descriptors and the machine learning methods. The issue is again generating the data and then training models. So we're far away from the point of training the models.

Peter Tessier 40:13

Okay, but probably more case studies and you know, testing and implementation.

Bernhardt Trout 40:18

Well, even within a company, there's lots of information that needs to be gathered together. But that's a project in itself. Okay. Andrew, do you have thoughts around this area?

Andrew Martin 40:27

Yeah, again, I agree very much with what others have said, but I've always been very skeptical about the de novo idea. You're gonna have to do some testing. That relates to a question that came up earlier: are you going to need to test? Well, yes, clearly you are. And a lot of the papers that have been published on de novo methods essentially produce better libraries, which I think is good, but it's not really de novo. There are a couple of papers now that do seem to succeed, or come much closer to succeeding, at de novo work. But it's very interesting that other panel members have said that it works for things that they haven't been able to raise antibodies against through conventional methods, because the papers that have been published are on fairly standard targets. Also, taking those ideas back to what happened early on in 3D protein structure prediction in general, there were a lot of cases where people published things that seemed to work fantastically well. But that's because they tried a hundred different examples and published the one that worked, the protein that they could get to work. So with the papers that are saying we're doing really well, it's difficult to know how many failures they've also had. It's very interesting to see that it is providing some success for people. But I don't think it would be a first route for doing anything for quite some time.

Norbert Furtmann 42:13

I think you made a very good point. Maybe it comes down to how you define de novo, right? Because some things could be libraries. If you see how many molecules are tested, you could argue about whether it's a hit by chance or a hit by design, if you go with those library approaches. We fully agree, and it's also a question of how hard you try, right? And even if something works, there's still a difference between it working and it giving you an edge over the traditional discovery technologies. There I think it's still the early stages, but it's worthwhile to invest in as another component within a toolbox.

Andrew Martin 42:46

Absolutely, about the libraries. There have been lots of companies working on improved libraries for some time. I was a consultant for one company that wanted to do de novo design, and I said, don't do that, do better libraries. And that's what they did, and they've been very successful. So, you know, I think it's a differentiation.

Peter Tessier 43:07

Short response.

Andrew Buchanan 43:08

Yeah, I think these structure-guided, structure-informed, or structure-enriched libraries are super helpful.

Peter Tessier 43:13

Okay.

Hard Targets And Epitope Specific Design

Peter Tessier 43:14

Okay. I want to change this a little bit and think about targets for a minute. When we hear about de novo and think about the potential, the question is about high-value targets. Now, there's some difference of opinion about how much demand there is for these high-value targets, but there are certainly these attractive membrane proteins, GPCRs, ion channels, and so on that have been hard to target with conventional methods. The question that I have for those of us who are practitioners is: are you seeing success against hard targets? This is often portrayed as the killer application of de novo, because it's an antigen problem in many ways: a very difficult antigen, very difficult to get immune responses, difficult to analyze; or it's the nature of the target, where you need an agonist or something where it's just complex. Are you seeing practical successes there? And since some of us have commented on this specifically, Norbert, can we start with you?

Norbert Furtmann 44:25

Sure. I cannot share too much about the targets in particular, but take the example I gave a few minutes ago. It depends on how you define a challenging target. The example I was referring to is a target where it was hard to get an immune response with the traditional approaches, and that could be solved via de novo. We have experience with more than a single target, with a panel of targets, but I think we are not yet at the point where I could give you a statistically significant answer. We pick the targets not based on how challenging they are, but based on relevance for our portfolio programs. Sometimes a target is challenging because of the immune response. Sometimes it's because you need a challenging mode of action, like agonism. And for this agonism, you know you need to address a specific epitope. With de novo, maybe that's one of the advantages: you don't fish randomly, but you can go against that specific epitope. And as a last comment, on success rates: what we see, and what you also see in the literature, is that they are defined a bit by the properties of the epitope. It seems to be a bit more tricky to address hydrophilic epitopes, where you need to design charged or hydrogen bond-based interactions, than to address hydrophobic interfaces. Truly interesting. Melody, your thoughts?

Melody Shahsavarian 45:45

Yeah, I mean, it's early days, but I would say that the response to your question is yes. We have seen success against GPCRs, let's say. Again, early days, but there's promise there.

Peter Tessier 45:57

Exciting. Comment?

Andrew Martin 45:59

Yeah, in my work for a previous big pharma, they also see success, but again, they're early leads.

Peter Tessier 46:08

Okay. So we're gonna take questions from the audience. If anyone wants to come up, please come to the mic while we have our last question for the panel.

Where To Invest Next

Peter Tessier 46:18

Our last question for the panel is really about where we are going. What does the future look like? I'd like a 30-second response, to keep it short. If you could reset the field's priorities for AI/ML in biologics, or just emphasize, in your opinion, where the most important investments are or should be made right now, what would you say? If you want to be controversial, you could also go as far as what we should stop doing, if you have any opinions like that. But what is a takeaway you have? What would you say to practitioners, people in the trenches: where should they or their company invest, and maybe even where should they stop investing? Konrad.

Konrad Krawczyk 47:10

Yeah, so altogether I would say that investing in looking at how those computational AI methods can help with the workflows is very important, because otherwise, well, there are many tools coming out every week. Some of them might be downloaded, some of them might be internalized, but they are essentially gathering dust. So if you make an effort to look at how those methods can improve your workflow, then this is going to be something.

Peter Tessier 47:43

But still, for the practitioners in the room, that's not easy, right? Like, how do you actually do that? Because it's so overwhelming with the pace of things.

Konrad Krawczyk 47:49

So walk before you can run, yeah. Try not to do everything, but just do something you can do. We do have methods, even very, very simple models: if you have a few data points for lead optimization, just looking at certain readouts, you can make a model out of that. It already generates hypotheses.

Peter Tessier 48:07

Okay, okay. Norbert?

Norbert Furtmann 48:09

So maybe picking up on that: you covered a bit how to get into it, right? And how we did it was by talking to the wet lab scientists, identifying the biggest gap and where we could help by bringing in in silico approaches, and then prioritizing: is it realistic that we could solve that via computational approaches or not? But besides that, as a second take, I feel that at the moment we are trying to augment wet lab-driven workflows with computational tools, to make the traditional discovery pipelines more efficient and faster with in silico tools. If you think about de novo and where drug discovery might go, you might rethink that whole approach and think, okay, maybe the future workflow might be computationally driven, AI driven. How do you then build that lab-in-the-loop concept, where you customize your wet lab to be the perfect counterpart to the in silico tools? This is where I would invest. These are two different concepts. And I think we need to rethink how we run our value chain, or how we would like to run our value chain in future, considering the advancements in AI and computational tools.

Peter Tessier 49:15

You know, we didn't talk about this a lot in this panel, but you know, the panel had a very strong feeling about this idea, you know, of this lab in the loop of data-driven AI usage, right? And sort of that was a common theme that implementation of AI needed to be paralleled with strong data generation. Melody, do you have any takeaway message or any suggestion for the audience and where to put their efforts?

Melody Shahsavarian 49:42

So I guess, I mean, I've said this like five times. I think I already have a biased view, but from my perspective, it's investing more in complex molecules, and data generation is a part of it, but also starting to work on models that will address these. You know, Melody was emphasizing the fact that often information about the monospecifics is not predicting certain aspects of the bispecifics or multispecifics. So actually directly generating data on multispecifics is really important.

Peter Tessier 50:14

Andrew?

Andrew Buchanan 50:15

Generate wet lab data that's relevant to your kind of drug target profile. There was a brilliant example of a paper from a group from Seoul, where they have a new platform with which you can measure church expression and affinity. They used fairly standard machine learning models with structure, and they could predict things that would beat an empirical scientist day in and day out. So I think wet lab generated data will transform the actual use of this kind of tech, because it's data constrained.

Peter Tessier 50:45

And can we just push you on that? Do you see that as it's going to accelerate? I mean, do you see this as a declining enterprise where you know data generation is very important but will become progressively less important as the models get better? You don't agree.

Andrew Buchanan 51:01

No way is wet lab data going away. If you're an early career person, become good at data science, talk to them. But if you like the lab, the lab is not going away.

Peter Tessier 51:11

And will it diminish some, or you'd push back on that too?

Andrew Buchanan 51:14

I'm also gonna push back on that. Okay, there are some brilliant folks who show you examples, but say you're gonna push this molecule through. It's a multispecific: biology hasn't seen it before, a CHO cell hasn't seen it before. You don't know what's gonna happen. You can't triage your way through it.

Peter Tessier 51:26

Okay.

Bernhardt Trout 51:28

I'll answer your question for those who are developing machine learning methods. We need new descriptors. Currently, our chemistry is hydrophobic, hydrophilic, charge-charge, and various structure factors. Chemistry is probably a lot more complicated than that.

Peter Tessier 51:44

And what's it gonna take to do that? It does feel like we need the disruption. You have experience in fields outside of this field. Can we learn something from others, you know, in catalysis? Can we bring things in? How do we get out of the same feature perspective we've had for years? Thoughts on that?

Bernhardt Trout 52:06

I think bringing ideas from other fields is always helpful. I think it's being creative. In a way it's the opposite of implementing machine learning: it's thinking about the chemistry and developing new theoretical approaches to chemistry.

Peter Tessier 52:23

Makes sense. Okay. Andrew?

Andrew Martin 52:25

Just a couple of words, really: I think the strength, the value, is in developability. It's triaging your data so that you don't waste time down the pipeline. That's really, I think, the biggest advantage.

Peter Tessier 52:46

Okay. Anybody else have a short comment, Bernhardt?

Bernhardt Trout 52:49

I would always go with structure-based methods if I could. Okay.

Andrew Martin 52:54

Yeah, I would echo what's been said already, that it depends. For a lot of things, the power of protein language models and antibody language models is that they really encode structural data in some sort of hidden way within the sequence information. So I think that can be useful for an awful lot of things. But for certain things we find that using structure in prediction helps; for things like epitopes and affinity, I think you need structure.

Peter Tessier 53:25

Thank you. Well, unfortunately, we've come to the end of our time. And I think it would be nice if we all thank this very thoughtful panel of scientists. Thank you so much for inspiring us.