Lessons Learned
Creating a new drug discovery software platform is a complex process. We’ve spent the last 15 years creating new and innovative tools for drug discovery companies – everything from custom reagent management systems, to kinase profiling tools, automated chemistry, protein production and DEL-based drug discovery platforms.
In addition to managing the day-to-day aspects of research, these tools have resurrected drug discovery programs that were previously thought to be beyond salvaging, generated new libraries, and identified new drug candidates. It is work that we hope will eventually make a difference in the lives of patients.
In this blog post we’ll take a closer look at each of the lessons learned along the way and discuss how they influenced the design of the Pipeline Drug Discovery Platform.
Modularity
In many of our custom software projects, we were often called in to build an application to support the middle of a drug discovery process. This meant that there were applications either upstream or downstream from our application that we either needed data from, or to which we had to supply data. It also meant that these upstream and downstream applications were often given short shrift. Moreover, it highlighted the need for customers to be able to focus on the modules whose functionality would support the highest priority parts of their process.
So how did this affect Pipeline’s design? We knew that we needed to create a modular platform, one that allowed companies to prioritise where they spent their money. We also realised that there were common bits of functionality that multiple modules could take advantage of. For example, a customer might need a way to manage both assay requests and chemistry requests, and these modules might share basic features such as configurable fields for different types of requests, calculations, security, and notifications.
Moreover, these modules all depended on common concepts like Projects and Targets. So we created a Foundation module that incorporated not only tools for managing Projects and Targets, but also tools for identifying targets from literature.
At this point in their evolution, companies had often already purchased different point solutions for registration or inventory, and needed to be able to integrate these applications. This meant that although we might create a default implementation for registration, we might also need to create an adapter so that Pipeline could interact with the company’s official registration system. More pointedly, we needed to treat our own default implementations as just another plugin that could be replaced by whatever system the customer already had in place.
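To make the plugin idea concrete, here is a minimal sketch of the adapter approach. Every name in it (RegistrationService, DefaultRegistration, LegacyRegistrySystem, register_compound) is hypothetical and chosen for illustration; it is not Pipeline's actual API.

```python
from typing import Dict, Protocol


class RegistrationService(Protocol):
    """Hypothetical interface that every registration plugin satisfies."""

    def register(self, name: str) -> str:
        """Register an entity and return its identifier."""
        ...


class DefaultRegistration:
    """Stand-in for a built-in default implementation."""

    def __init__(self) -> None:
        self._count = 0

    def register(self, name: str) -> str:
        self._count += 1
        return f"DEF-{self._count:04d}"


class LegacyRegistrySystem:
    """Stand-in for a customer's existing system, which has a different API."""

    def create_record(self, payload: dict) -> dict:
        return {"record_id": f"CORP-{payload['label'].upper()}"}


class LegacyRegistryAdapter:
    """Adapter exposing the legacy system through the common interface."""

    def __init__(self, legacy: LegacyRegistrySystem) -> None:
        self._legacy = legacy

    def register(self, name: str) -> str:
        # Translate the common call into the legacy system's own vocabulary.
        return self._legacy.create_record({"label": name})["record_id"]


# The default implementation is just one entry in the plugin registry.
PLUGINS: Dict[str, RegistrationService] = {
    "default": DefaultRegistration(),
    "legacy": LegacyRegistryAdapter(LegacyRegistrySystem()),
}


def register_compound(name: str, plugin: str = "default") -> str:
    """Route a registration call to whichever plugin is configured."""
    return PLUGINS[plugin].register(name)
```

Calling code only ever sees register_compound, so swapping the default for the customer's system becomes a configuration change rather than a rewrite.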
Seeing Beyond The Horizon
One of the hardest things for scientists to do is to see beyond a set of experiments that they’re working on or beyond the present workflow; but they will be the first to admit that change is a constant. So how would we accommodate changes that we couldn’t predict?
After a particularly telling conversation with one of our customers, we realised that what they really needed was a workflow-based system, where every step in the workflow was configurable. The fields of data that needed to be collected at each step needed to be configurable, not hardcoded.
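As a rough illustration of what "configurable, not hardcoded" can look like, the sketch below builds workflow steps from plain configuration data and validates submitted values against it. The step names, field names and the load_workflow helper are assumptions made up for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class FieldDef:
    """One configurable data field collected at a workflow step."""
    name: str
    py_type: type
    required: bool = True


@dataclass
class StepDef:
    """A workflow step whose fields come from configuration."""
    name: str
    fields: List[FieldDef] = field(default_factory=list)

    def validate(self, data: Dict[str, Any]) -> List[str]:
        """Return a list of problems with the submitted data (empty if none)."""
        errors = []
        for fd in self.fields:
            if fd.name not in data:
                if fd.required:
                    errors.append(f"{self.name}: missing '{fd.name}'")
            elif not isinstance(data[fd.name], fd.py_type):
                errors.append(
                    f"{self.name}: '{fd.name}' should be {fd.py_type.__name__}"
                )
        return errors


def load_workflow(config: List[Dict[str, Any]]) -> List[StepDef]:
    """Build step definitions from configuration instead of code."""
    types = {"str": str, "float": float, "int": int}
    return [
        StepDef(
            name=step["name"],
            fields=[
                FieldDef(f["name"], types[f["type"]], f.get("required", True))
                for f in step["fields"]
            ],
        )
        for step in config
    ]


# Hypothetical configuration; in practice this might live in a database.
WORKFLOW_CONFIG = [
    {"name": "expression", "fields": [
        {"name": "host", "type": "str"},
        {"name": "volume_l", "type": "float"},
    ]},
    {"name": "purification", "fields": [
        {"name": "yield_mg", "type": "float"},
        {"name": "notes", "type": "str", "required": False},
    ]},
]

workflow = load_workflow(WORKFLOW_CONFIG)
```

Adding a field to a step then means editing WORKFLOW_CONFIG, not the validation code.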
For example, in our Protein Production module, customers wanted to be able to create custom expression-system dependent workflows, and to tie those workflows to a request. A chemist might put in a request for a particular protein target form, and the protein scientist would review it and perhaps assign one or more workflows to be executed in parallel. Those workflows might include steps performed by CROs who would be responsible for the gene synthesis or expression work. When the work was completed, notifications would be sent to the requesting party along with the information needed to register the resulting clones, biomass, or purified protein. The shipping, receiving, reconciliation, registration and inventory steps would flow together naturally.
In our Chemistry module, customers wanted workflows to manage internal synthesis processes. This meant supporting the import of library design information from spreadsheets or SD files, sourcing reagents from on-site or off-site reagent inventories, and managing purification queues, along with the logistical steps (such as shipping and receiving) described previously.
No Company Is An Island
Since the early 2000s, there has been an increase in the number of programs that rely on collaborations between drug discovery companies and CROs [1]. A study from Grand View Research projected that the drug discovery chemistry market would grow by 5-10% annually over the next decade, reaching $17.7B by 2025 [2].

To streamline the planning and tracking of CRO requests, we developed the PharmExchange platform. It integrates with Pipeline, enabling seamless data exchange with CROs. With PharmExchange, biotech companies can import CRO request queues into their workflows, making it possible to foster a more integrated relationship between customer and service provider.
Previously, sharing data with CROs and service providers required exporting metadata along with the data itself. This meant including the definition of columns so that recipients could interpret the data accurately.
However, sending just the data and metadata was insufficient. To ensure a comprehensive understanding, we needed to publish the protocol, which clarified not only what the data represented but also how it was generated.
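One way to picture this is a single self-describing package that carries the data, the column definitions and the protocol together. The sketch below is an assumption about how such a package could be laid out; the JSON structure, the build_package and describe helpers, and the example assay are all invented for illustration and do not represent the PharmExchange format.

```python
import json
from typing import Any, Dict, List


def build_package(
    rows: List[Dict[str, Any]],
    columns: Dict[str, Dict[str, str]],
    protocol: Dict[str, Any],
) -> str:
    """Bundle rows with their column definitions and generating protocol."""
    # Refuse to export any column the recipient could not interpret.
    undefined = sorted({key for row in rows for key in row} - set(columns))
    if undefined:
        raise ValueError(f"columns without a definition: {undefined}")
    return json.dumps(
        {"protocol": protocol, "columns": columns, "rows": rows}, indent=2
    )


def describe(package_json: str) -> List[str]:
    """Summarise what each column means, as a recipient might."""
    package = json.loads(package_json)
    return [
        f"{name} [{meta['unit']}]: {meta['description']}"
        for name, meta in package["columns"].items()
    ]


# Hypothetical example content.
COLUMNS = {
    "compound_id": {"unit": "n/a", "description": "Registered compound identifier"},
    "ic50": {"unit": "nM", "description": "Half-maximal inhibitory concentration"},
}
PROTOCOL = {
    "name": "Example inhibition assay",
    "version": "1.0",
    "steps": ["dispense compound", "incubate", "read plate"],
}
ROWS = [{"compound_id": "CMP-1", "ic50": 12.5}]

package = build_package(ROWS, COLUMNS, PROTOCOL)
```

Because the protocol travels with the data, the recipient can see both what a value represents and how it was generated.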
The Importance of Good UX Design
Traditionally, User Experience (UX) design in the life sciences has been given short shrift – both by software companies and by internal developers. It’s a long-standing problem that the Pistoia Alliance’s User Experience for Life Science (UXLS) group has been working to address.
One of the first lessons that I learned in this industry is that it has a higher prevalence of colour blindness. This meant that when it came to using colour throughout the application, we needed to make sure that the palette we were using would fit our users’ needs.
Traditional web applications often relied on a limited number of elements such as text fields, radio buttons, and the like. These elements have been a part of the HTML standard for web development since the inception of the web. However, scientific applications needed elements whose functionality exceeded simple web forms. We needed reusable components that could be used to represent a plate, the 3D structure of a protein, or the structure of a compound.
In 2013, Google created the Polymer project as a means of jumpstarting the effort to create reusable web components. The Polymer project and the subsequent adoption of the W3C standard for Web Components revolutionised the development of web applications. Polymer introduced the concept of reusable, encapsulated HTML elements with their own styles and behaviors, greatly simplifying the process of creating complex and interactive UIs. This approach aligned with the W3C’s goal of modularizing and componentizing the web, enabling developers to create custom elements that could be easily shared and reused across different applications. The combination of Polymer and Web Components provided developers with a powerful toolkit for building scalable, maintainable, and consistent web applications, ultimately improving the overall development experience and the quality of web applications.
As the web component standard wended its way through the W3C standards body, it shifted and continued to evolve. Google created the Lit framework – a lightweight framework for creating web components. In addition, they created the Material Web Component (MWC) library. This library contained a number of common web components backed by hundreds of hours of UX research for both web and mobile development. Around this time, libraries of components similar to MWC began to emerge, such as Vaadin and the Salesforce Lightning Component library.
Similar component-based frameworks began to emerge, like React, Svelte, and Vue. Each of these purported to be better than the others, but the problem that they all shared was that they were not standards-based. If you had asked me what framework to choose for front-end development in 2003, I probably would have said JSP tags or similar tag libraries used for server-side components. In 2010, perhaps Angular might have been the choice.
Changing from one framework to another, though, would have meant a massive amount of software engineering. And while in the tech world this might be justifiable at some level, in most businesses it’s not. There is not only the cost of implementing the new framework, but also the opportunity cost of not being able to support new science while you make that switch. This made the choice clear – we needed to use a standards-based framework.
At Aspen Biosciences, our own experience with web components started with the BioPolymer project – an open source project that we created to experiment with web components and share them throughout the community.
Beyond these early experiments though, it became readily apparent that we needed an easy way to inject metadata into the web components themselves. If a customer changed the name of a field, we didn’t want to have to go to every page where that field was and relabel it. This meant that we also needed a single common repository for metadata that those components could use. That metadata would include information like:
Moreover, since we were using the Material Design Guidelines from Google, we knew that our UI designs would be clean, and user friendly. Many of the components themselves would be familiar to users since they were by that point fairly ubiquitous in many web and mobile applications. From a practical standpoint, this meant that by adopting these components and standards, we were adopting a UX that was already well understood by scientists, and therefore would not require a lot of training in order for them to use Pipeline.
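Returning to the common metadata repository for a moment, the idea can be sketched in a few lines. The example is in Python rather than front-end code purely for brevity, and the names (FieldMetadata, MetadataRepository, render_field) and the two metadata attributes shown are assumptions for illustration, not Pipeline's actual schema.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class FieldMetadata:
    """Hypothetical metadata for one field; real systems would hold more."""
    label: str
    help_text: str = ""


class MetadataRepository:
    """Single source of truth that every component consults."""

    def __init__(self) -> None:
        self._fields: Dict[str, FieldMetadata] = {}

    def define(self, key: str, label: str, help_text: str = "") -> None:
        self._fields[key] = FieldMetadata(label, help_text)

    def rename(self, key: str, label: str) -> None:
        # One change here is picked up wherever the field is rendered.
        self._fields[key].label = label

    def get(self, key: str) -> FieldMetadata:
        return self._fields[key]


def render_field(repo: MetadataRepository, key: str, value: object) -> str:
    """Stand-in for a UI component that looks up its label at render time."""
    return f"{repo.get(key).label}: {value}"


repo = MetadataRepository()
repo.define("mol_weight", "Molecular Weight", "Average molecular weight")
```

Since components hold only the key ("mol_weight") and never the label itself, renaming the field in the repository relabels every page that shows it.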
Flexibility
Each company had different data standards that our applications had to comply with. Some customers used Oracle, others PostgreSQL or MongoDB, and still others exposed their data through GraphQL, a data lake, or some combination of these.
Customers had different cloud computing standards – some used Google, others AWS, or Azure and some required on-prem installations. When it came to cybersecurity, here again customers used a variety of solutions, and we had to be able to adapt to them all – whether it was Okta, or Azure AD, Amazon Cognito or Google Authentication.
All of these variables had to be taken into account.
Configurability
The tools that customers already had often lacked the flexibility needed to support changes in workflows. If there were a change in the workflow, or in the data collected at a step in the workflow, code changes were often required. This would create both a real cost (the cost of doing the code change) and an opportunity cost (the delay before the scientist could actually start using the system).
But configurability can be a two-edged sword. Customers wanted the flexibility that came with configurability, but also wanted rational default options and templates to cut down on the amount of work required to adapt the system to their processes.
To address these issues, we ensured that many of the common configuration activities (be they workflows or request queues or simply default configuration options) had templates or preloaded data sets.
This meant that in the Chemistry Module we created default chemistry workflows that incorporated everything from creating library requests, to managing purifications, registration and inventory.
In Protein Production, we created workflows for E. coli and baculovirus expression systems, and a library of common sequence elements that could be used to ensure consistent protein construct design.
In Inventory we created a library of common container types for plates and racks, as well as common hierarchy templates to shortcut the process of creating an inventory.
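The balance between sensible defaults and configurability can be sketched as a template plus customer-specific overrides. The template contents and the configure and deep_merge helpers below are hypothetical and only meant to show the shape of the idea.

```python
import copy
from typing import Any, Dict, Optional

# Hypothetical preloaded template for a chemistry workflow.
TEMPLATES: Dict[str, Dict[str, Any]] = {
    "chemistry-library": {
        "steps": [
            "library request",
            "synthesis",
            "purification",
            "registration",
            "inventory",
        ],
        "notifications": {"on_complete": True, "on_delay": False},
    },
}


def deep_merge(base: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of base with overrides applied, merging nested dicts."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def configure(
    template_name: str, overrides: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
    """Start from a template and change only what the customer needs."""
    return deep_merge(TEMPLATES[template_name], overrides or {})


# A customer keeps the default steps but also wants delay notifications.
custom = configure("chemistry-library", {"notifications": {"on_delay": True}})
```

The template itself is never modified, so every customer starts from the same known-good defaults.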
Computability
Sometimes customers needed to be able to make on-the-fly changes to the business logic for parts of the application, but lacked the expertise, staff or time to be able to do it. And with the increasing use of new technologies like generative AI, there needed to be a way for informatics teams to quickly add new features to the application without interfering with future upgrades.
Importing data often meant writing code to support the bulk uploading of data, and then augmenting that data from public sources like PubMed, the PDB, or UniProt. Changing the data import format often meant writing more code.
To enable this level of computability, we created a feature called the Calculation Engine. Originally designed to allow informatics teams to perform property calculations on compounds or proteins, the Calculation Engine is now being integrated into most of the modules in Pipeline. The Calculation Engine allows developers to create either compiled calculations, or scripted calculations in any language supported by the Java Virtual Machine, including Java, Groovy, Python, JavaScript and more.
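The distinction between compiled and scripted calculations can be illustrated with a small registry. This is a conceptual sketch only: the decorator, the register_script helper and the property names are invented here and are not the Calculation Engine's real API, which runs on the JVM. The two formulas are commonly used approximations for ligand efficiency and lipophilic efficiency.

```python
from typing import Callable, Dict

Calculation = Callable[[Dict[str, float]], float]
_CALCULATIONS: Dict[str, Calculation] = {}


def calculation(name: str) -> Callable[[Calculation], Calculation]:
    """Decorator registering a 'compiled' calculation (ordinary code)."""
    def decorator(func: Calculation) -> Calculation:
        _CALCULATIONS[name] = func
        return func
    return decorator


def register_script(name: str, expression: str) -> None:
    """Register a 'scripted' calculation supplied as an expression string."""
    code = compile(expression, f"<calc:{name}>", "eval")

    def run(props: Dict[str, float]) -> float:
        # Builtins are withheld as a convenience; this is NOT a security sandbox.
        return eval(code, {"__builtins__": {}}, dict(props))

    _CALCULATIONS[name] = run


def calculate(name: str, props: Dict[str, float]) -> float:
    """Look up a calculation by name and apply it to a set of properties."""
    return _CALCULATIONS[name](props)


@calculation("ligand_efficiency")
def ligand_efficiency(props: Dict[str, float]) -> float:
    # Common approximation: LE = 1.37 * pIC50 / heavy atom count.
    return 1.37 * props["pic50"] / props["heavy_atoms"]


# Added at runtime without touching or redeploying the code above.
register_script("lipophilic_efficiency", "pic50 - clogp")
```

Because scripted calculations are data rather than code, an informatics team could add or change them without interfering with a future upgrade of the application itself.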
In addition, the APIs that we use for our web services are also accessible, making it possible to use KNIME protocols, Apache Zeppelin, or Jupyter Notebook scripts with Pipeline.
No More Gantt Charts
One of the common themes we heard from Leadership Teams, Project Leads and Discovery PMs was “No More Gantt Charts”. The problem with Gantt charts is that they often present drug discovery as a linear process that only goes in one direction. In reality, the process is often cyclic, as the answer to one question raises three others.
The scientists tasked with managing these drug discovery programs were often told to simply “pick up some PM skills and get on with it”, with little formal training. Alternatively, someone from the development PMO might be asked to help out the discovery organisation. This latter case often came about when a program was “thrown over the wall” to the development team.
Our initial releases of Pipeline included support for Task Planning and Assay Planning. The latter allowed scientists not only to plan the work that they needed to do, but also to manage assay requests, track progress, provide feedback, and get alerts whenever significant events happened. The tasks and assays could be viewed as a list, a Gantt chart, a Kanban view, or a calendar view depending on the needs of the user. As these tasks were completed by various members of the team, they were automatically marked as completed, thereby reducing the workload of the project lead or PM.
Projects could be visualised in one or more pipeline diagrams. This would allow the Oncology Therapeutic Area team to view just the oncology programs, but would allow the leadership team to view the entire corporate pipeline.
With the addition of the Inventory module to the Pipeline platform, a project lead could look at all of the planned assays for a project, and determine if the reagents necessary to perform those assays were in stock, or if there were sufficient amounts of a given set of compounds to run the screens. They could also check for the availability of the protein necessary for those assays.
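The kind of check described here boils down to summing what the planned assays need and comparing it with what is in stock. The sketch below assumes a simplified data shape (a "requires" mapping per assay and a flat stock dictionary, all in the same units); the find_shortfalls function and the example items are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List


def find_shortfalls(
    planned_assays: List[Dict[str, Any]], stock: Dict[str, float]
) -> Dict[str, float]:
    """Return how much of each item is missing across all planned assays."""
    # Total requirement per item, summed over every planned assay.
    required: Dict[str, float] = defaultdict(float)
    for assay in planned_assays:
        for item, amount in assay["requires"].items():
            required[item] += amount
    # Keep only the items where demand exceeds what is on hand.
    return {
        item: needed - stock.get(item, 0.0)
        for item, needed in required.items()
        if needed > stock.get(item, 0.0)
    }


# Hypothetical plan and inventory, amounts in arbitrary consistent units.
PLANNED = [
    {"name": "kinase panel", "requires": {"ATP": 5.0, "CMP-1": 2.0}},
    {"name": "cell viability", "requires": {"CMP-1": 2.0, "protein-A": 1.0}},
]
STOCK = {"ATP": 10.0, "CMP-1": 3.0}

shortfalls = find_shortfalls(PLANNED, STOCK)
```

Here neither assay alone exhausts the CMP-1 stock, but together they do, which is exactly the kind of problem that only shows up when planning and inventory are viewed side by side.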
References
[1] Drug Discovery: Collaborations between Contract Research Organizations and the Pharmaceutical Industry – ACS Med Chem Letters
[2] Drug Discovery Outsourcing Market Size, Share & Trends Analysis Report By Drug Type – Grand View Research
—
If you’d like to find out more about Pipeline, contact us at info@aspen.bio for a demonstration.
