Network graphs have always amazed me. The concept has been around for a long time and has been used by scientists to explain a wide variety of phenomena across disciplines such as psychology, economics, physics, and biology. My interest in network analysis stems from the desire to understand how “science” spreads. How do new concepts, technologies, and products gain adoption in scientific communities, and what factors influence the rate of adoption? This sub-segment of network analysis, termed bibliometric analysis, uses quantitative and statistical methods to describe publication patterns within a given field or body of literature.
Several studies have demonstrated that a researcher’s “centrality” correlates directly with the impact of their research. As life science consultants, we hypothesized that we could extract interesting signals from these networks and combine them with our domain knowledge to develop a data-driven approach to trend prediction.
The data required to conduct a network analysis needs to be extremely well curated. Existing public scientific datasets such as PubMed, NIH RePORTER, and CMS are semi-structured at best and are plagued by data integrity issues. We spent considerable resources developing a scalable workflow to ingest and clean this data. It took us ~3 months, but we now have a solid framework in place that can create a curated dataset in a matter of minutes.
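To give a flavor of the kind of cleaning involved, here is a minimal sketch of one common step: collapsing spelling variants of the same author name into a single key. This is not our production workflow; the function names `author_key` and `group_variants` are hypothetical, and the "last name plus first initial" rule is a deliberately crude assumption that will merge distinct people who share a surname and initial.

```python
import unicodedata
from collections import defaultdict


def author_key(name):
    """Reduce 'Last, First' or 'First Last' to a crude 'last f' key.

    Assumption: last name + first initial is "good enough" to merge
    variants. Real author disambiguation needs more signals
    (affiliation, co-authors, ORCID, and so on).
    """
    # Strip accents so 'García' and 'Garcia' map to the same key.
    ascii_name = (
        unicodedata.normalize("NFKD", name)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    if "," in ascii_name:
        last, _, first = ascii_name.partition(",")
    else:
        parts = ascii_name.split()
        if not parts:
            return ""
        last, first = parts[-1], " ".join(parts[:-1])
    last = last.strip().lower()
    first = first.strip().lower()
    initial = first[0] if first else ""
    return f"{last} {initial}".strip()


def group_variants(names):
    """Map each key to the set of raw name strings that produced it."""
    groups = defaultdict(set)
    for name in names:
        groups[author_key(name)].add(name)
    return dict(groups)


# Hypothetical raw author strings, as they might appear across records.
raw_names = ["García, María", "Maria Garcia", "M. Garcia", "Chen, Li"]
variants = group_variants(raw_names)
```

In this toy input, the three Garcia spellings land under one key and "Chen, Li" under another. The same idea (normalize, derive a key, group) applies to institution names, which tend to be even messier.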
We started by extracting ~3,500 publications from the past 2 years using more than 30 different keyword queries associated with liquid biopsies.
A sample list of keyword queries used to extract liquid biopsy publications
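As a rough illustration of how keyword queries like these can be turned into a PubMed search, the sketch below builds a request URL for the NCBI E-utilities `esearch` endpoint. The three keywords are hypothetical stand-ins rather than our actual query list, the helper names are made up for this example, and the snippet only constructs the URL without sending a request.

```python
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Hypothetical keywords; the real analysis used 30+ queries.
KEYWORDS = ["liquid biopsy", "circulating tumor DNA", "cell-free DNA"]


def build_term(keywords):
    """OR together exact-phrase searches restricted to title/abstract."""
    return " OR ".join(f'"{kw}"[Title/Abstract]' for kw in keywords)


def build_esearch_url(keywords, days_back=730, retmax=100):
    """Build an esearch URL limited to the last `days_back` days.

    Assumption: 730 days approximates "the past 2 years".
    """
    params = {
        "db": "pubmed",
        "term": build_term(keywords),
        "datetype": "pdat",    # filter on publication date
        "reldate": days_back,  # only records from the last N days
        "retmax": retmax,      # number of PubMed IDs returned per call
        "retmode": "json",
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


url = build_esearch_url(KEYWORDS)
```

The response to such a request is a list of PubMed IDs, which would then be fetched in batches to get authors, affiliations, and journal details.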
These publications represent the research output of over 25k researchers spanning more than 5,000 institutions across the globe. Researchers from U.S. institutions produced the highest volume of publications (55%), followed by China (20%) and the UK (10%). In terms of impact, ~5% of these articles were published in journals with an H-index greater than 300. Additional segmentation will be included in future posts on the topic.
This dataset includes ~25k authors (nodes) and over 150k connections (edges). As depicted in the network graph, researchers from MD Anderson and MSK are at the core of this community, producing some of the most influential research driven by extensive collaborations. The graph also highlights researchers from other influential academic institutions such as NCI, Massachusetts General Hospital, and Mayo Clinic. It is also interesting to explore and contrast the collaboration networks of diagnostic companies such as Foundation Medicine and Guardant Health vs. that of a pharma company like AstraZeneca.
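For readers curious how authors become nodes and collaborations become edges, here is a minimal sketch using only the Python standard library. It assumes an edge exists between two authors whenever they share at least one paper, and it uses degree centrality as the simplest possible stand-in for "centrality"; the measure behind the graph above may well differ. The author labels and function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: each publication is a list of author names.
publications = [
    ["A", "B", "C"],
    ["A", "B"],
    ["C", "D"],
]


def build_coauthor_graph(pubs):
    """Return {author: {coauthor: number of shared papers}}."""
    graph = defaultdict(lambda: defaultdict(int))
    for authors in pubs:
        unique = sorted(set(authors))
        for a in unique:
            graph[a]  # touch the key so solo authors still become nodes
        for a, b in combinations(unique, 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return {a: dict(nbrs) for a, nbrs in graph.items()}


def degree_centrality(graph):
    """Fraction of the other nodes each author is directly connected to."""
    n = len(graph)
    if n < 2:
        return {a: 0.0 for a in graph}
    return {a: len(nbrs) / (n - 1) for a, nbrs in graph.items()}


graph = build_coauthor_graph(publications)
centrality = degree_centrality(graph)
edge_count = sum(len(nbrs) for nbrs in graph.values()) // 2
```

In this toy example, author "C" has co-authored with everyone else and so gets the highest score. At the scale of ~25k nodes, a dedicated graph library and richer measures (betweenness, eigenvector centrality) would be the more practical choice.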
Over the next few weeks, we will be taking a deeper dive into each of these sub-networks by overlaying other secondary datasets to answer questions like:
Disclaimer: Some of the companies listed above may be DeciBio clients or customers.
Pranay Madan is a Data Product Manager at DeciBio Consulting, where he develops and curates business intelligence products and services for companies in the research tools, clinical diagnostics, and health technology markets. He holds a Bachelor of Engineering from Panjab University and a Master of Business and Science (MBS) from the Keck Graduate Institute.