Not so elementary, Watson: the roadblocks for AI in pharma

The high-profile flaws shown by IBM’s Watson for Oncology in 2018 have dented the reputation of AI-driven systems for healthcare and pharma R&D. So what are the roadblocks standing in the way of finding success in using machine learning to drive clinical research and drug discovery? Chris Lo reports.

Wherever you look in today’s rapidly evolving pharma industry, the shadow of artificial intelligence (AI) isn’t far away.

The high failure rate for experimental drugs, coupled with the sheer cost and time commitment required to back a drug candidate through R&D and commercialisation, makes the promise of AI all the more enticing to drug developers, particularly in data-heavy applications such as drug discovery.

AI brings with it the prospect of using complex machine learning algorithms to screen for disease targets and drug candidates with a speed and accuracy that would be impossible for human researchers, potentially saving pharma and biotech firms billions in drug development.

The AI hype bubble

The attractiveness of the proposition has been borne out by the stacks of pharma and biotech investment that have been flowing towards AI drug discovery tech and machine learning-focused start-ups in the last few years.

From Merck’s AI partnerships with Numerate and Atomwise to GSK’s $43m collaboration with Exscientia and the rise of AI-centric scientific innovators such as BenevolentAI, pharma AI has become a lucrative business, even before substantial evidence of its impact on drug discovery has been fully explored.

As with any exciting up-and-coming technology, AI in pharma has been prone to overhype, with the complex realities of using machine learning models in the drug development process still unable to compete with the extravagant promises coming from the tech world.

“A little knowledge is a dangerous thing,” says Elsevier’s consulting director of text and data analytics Jabe Wilson, a 30-year veteran in the AI field. “I think some of the generic AI systems have not really reached their potential in some cases. I know some stories about pharma companies that have worked with different platforms, which have then found out they’ve had to do a great deal of work in curating the information themselves to feed into the platform.

“There’s a lot of hype being talked in the business literature about AI tools. They certainly have potential to speed up the performance of looking for patents, of sifting vast amounts of data. There’s the potential there, and then where that hype meets the road is when you have to put teams together to really create and tune the tools for the context and the use case. That’s where there’s been a challenge.”

Pharma AI: no free lunch

In recent months, a growing number of events in healthcare AI have pricked the hype bubble. At the beginning of 2018, Mostapha Benhenda, a mathematician and founder of AI expert network Startcrowd, published a piece on Medium’s AI Lab that criticised the overhype surrounding AI systems for drug discovery, arguing that “pretty often, AI researchers overhype their achievements, to say the least”. The piece presented examples of “overhyped” AI research from AstraZeneca, Harvard and Stanford universities and Insilico Medicine, all of which, he argued, contained flaws limiting their impact for drug discovery.

And then, of course, there was the high-profile failure of IBM’s Watson for Oncology application, a cognitive computing cloud platform designed to sift through patient data and medical studies to provide treatment recommendations for cancer patients. As initially reported by STAT News in July 2018, internal IBM documents revealed that the system had a tendency to return “unsafe and incorrect treatment recommendations”. The brunt of the blame for the failures was placed on the raw data fed to Watson for training purposes, which included hypothetical patient data rather than real-world cases.

“I think that’s the critical piece,” says Wilson. “Where these generalist systems can fail is not having the components. It could be not having the dictionaries and the ontologies necessary to extract the semantic data that you need, or not having enough of the content to process through those ontologies to get your semantic data. We’re [also] very keen to try and help our customers be aware of the bias in terms of the data that’s input to the models. That’s really critical, because it’s one thing to have bad data leading to you not being able to make predictions. But the worst thing is that you can end up with biased data that leads you to make biased predictions that negatively impact certain populations.”

It’s an issue that brings to mind Wolpert and Macready’s ‘no free lunch’ theorem in machine learning, which states that “any two optimisation algorithms are equivalent when their performance is averaged across all possible problems”. In other words, no general-purpose AI system, IBM’s Watson included, offers a shortcut that solves every problem, and such a system will tend to be outperformed by models designed for a specific purpose.
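For readers who want to see the theorem's claim in miniature, the Python sketch below is one way to illustrate it. It is a toy under stated assumptions, not anything taken from Wolpert and Macready's paper: the four-point search space, the three possible objective values and the three search strategies (including the made-up `adaptive` rule) are all hypothetical choices made for this illustration. It enumerates every possible objective function on that tiny space and compares the average best value each strategy has found after a given number of evaluations.

```python
from itertools import product

# Toy search space: 4 candidate points, each scoring 0, 1 or 2.
# These sizes are illustrative assumptions, chosen so every
# possible objective function (3**4 = 81) can be enumerated.
VALUES = (0, 1, 2)
N_POINTS = 4


def fixed_order(order):
    """Build a strategy that visits points in a fixed, pre-set order."""
    def choose(history):
        return order[len(history)]
    return choose


def adaptive(history):
    """Hypothetical adaptive rule: after a good score take the lowest
    unvisited point, otherwise jump to the highest unvisited point."""
    if not history:
        return 0
    visited = {x for x, _ in history}
    unvisited = [x for x in range(N_POINTS) if x not in visited]
    last_value = history[-1][1]
    return unvisited[0] if last_value >= 1 else unvisited[-1]


def best_after_k(choose, f, k):
    """Best score seen after k evaluations of objective f (never revisiting)."""
    history = []
    for _ in range(k):
        x = choose(history)
        history.append((x, f[x]))
    return max(y for _, y in history)


def average_performance(choose, k):
    """Average best-after-k over every possible objective function."""
    all_functions = list(product(VALUES, repeat=N_POINTS))
    total = sum(best_after_k(choose, f, k) for f in all_functions)
    return total / len(all_functions)


if __name__ == "__main__":
    strategies = {
        "left to right": fixed_order([0, 1, 2, 3]),
        "scrambled": fixed_order([2, 0, 3, 1]),
        "adaptive": adaptive,
    }
    for k in range(1, N_POINTS + 1):
        scores = {name: average_performance(s, k) for name, s in strategies.items()}
        print(k, scores)
```

Averaged over all 81 functions, the three strategies score identically at every evaluation budget. Any advantage for one strategy therefore has to come from restricting the set of problems, which is the article's point about tuning tools to a specific context and use case.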

“One of the things I’m hearing more and more as I speak to people in the industry is that an important aspect of working with machine learning models is choosing the right architecture, choosing the right type of machine learning model, as well as the training data,” Wilson says. “People are interested in whether their partners – the suppliers or whoever – can help them choose the right machine learning model for their particular problem.”

The human element in the AI system

These sorts of issues have been at the front of Elsevier researchers’ minds as they developed the company’s own Entellect system, a cloud-based data platform launched this year. It is designed to bring together clinical data from thousands of unstructured sources, then add context and connect drug, target and disease data to give AI-enabled research teams a leg up in drug discovery and R&D.

“[Entellect] comes as a logical outcome from our heritage,” says Wilson. “We’ve been creating these databases for a long time, and working with customers on using them as tools. We created this tool for our own products and professional services, and that is then something that we can make available to our customers in the life sciences process.”

By focusing the Entellect project on open design and data curation and governance, Elsevier is hoping to empower clinical research teams to make their own decisions on what data to trust and how best to move forward. This emphasises the importance of getting dedicated data scientists together with pharma subject experts to achieve what Wilson calls “informed, subject-focused outcomes”.

“We’re not expecting these AI systems to be able to replace people,” he says. “You really need these systems within the context of a workflow. You need biologists, pharmacologists, and to bring those together with the data scientists. I like the analogy of Lego blocks; you’re building this system, this toy, and you need the data to plug together, you need the people to plug together, to get this system that you can then answer questions with.”

It’s a philosophy that has driven Elsevier’s recent ‘data-thon’, which brought together data scientists, pharma groups and clinical consultants to work on drug repurposing opportunities for treating chronic pancreatitis, a rare inflammatory condition that affects an estimated five to 12 people per 100,000 in developed countries. The data-thon allowed the data scientists to use their preferred tools, from JupyterHub notebooks to programming languages such as Python and R, to develop functions based on the data, with pharma experts on hand to advise on relevance.

“What’s been so lovely about this data-thon is seeing people come together, sparking each other’s insights and interests, and being able to work together on these machine learning models,” Wilson enthuses. “By the end of the year we’ll have the outcomes validated, and then potentially with our partners, we might even be able to move to some clinical trials to see where we can build on the value. You’re using predictive tools to create new knowledge. It’s like an art form, so you need the subject matter experts and the data scientists working together on platforms like Entellect, which has all the Lego blocks that you can then use to build your predictive tool.”

AI tools for the future

For all the hype in the industry, it’s clear we’re still a long way off from achieving the incredible potential that AI offers to pharma R&D and drug discovery. The way forward lies in moving “quickly but carefully”, as espoused by Benhenda: investing in the right machine learning models to solve particular problems and building the interdisciplinary teams necessary to validate and make the most of the data.

Data discipline will also be incredibly important as the pharma industry builds its R&D tools for the future. Wilson believes data auditing – essentially checking the workings of a given machine learning model – will become increasingly vital.

“I think in the future we’re going to see that if a machine learning model has been used in defining an outcome – like a drug and a treatment – then it will be necessary to audit that model to ask, ‘How did you come up with this answer?’ You can do that by circling back to the other known scientific data. So I think that’s certainly one of the avenues that we’re looking at very actively right now.”

High-profile failures and a certain scepticism around the reliability of AI-generated drug discovery conclusions may have scuffed pharma AI’s gleaming reputation, but any damage caused is superficial. Buffing it out will involve a relentless focus on putting the right algorithm in the right hands, and for the right application.

“There’s a lot of hype, and these generic systems potentially are not able to deliver if they don’t have the background and the insights baked into them,” says Wilson. “But when you do have that, there really is huge opportunity there.”
