This week, in lieu of a status update for week 15, I present to you an article I’ve been working on for the past month or two.
Officially, the funding of my PhD program designates me as part of an ‘Innovative Training Network (ITN)’ in epigenetics, systems biology, and stem cells. I am comfortable enough with epigenetics and stem cells, but I can’t help feeling a little hesitant to speak about my relationship with systems biology. Perhaps its role in my program’s description is to cover the bioinformatics I’ll likely encounter. And despite sitting on the periphery of the ‘data deluge’ and the rise of ‘systems thinking’ in biology, I can’t really say that I truly understand what systems biology means in a practical sense, or how it would be relevant to my project and the way I think about my research.
In pursuit of some form of self-education, I first turned to Wikipedia, which describes ‘Systems Biology’ as such:
“Systems biology is the computational and mathematical modeling of complex biological systems. It is a biology-based interdisciplinary field of study that focuses on complex interactions within biological systems, using a holistic approach (holism instead of the more traditional reductionism) to biological research.”
From this description I gather that much of the applicability of systems biology comes from considering biological data holistically rather than piecemeal. Models are then built with established computational and mathematical techniques in an attempt to say something more about the data or to make predictions.
In search of a real-world example of this kind of approach, I came across an interesting piece published a few months ago* titled ‘The Crisis of Reproducibility, the Denominator Problem and the Scientific Role of Multi-scale Modeling’ by Gary An, a general surgeon at the University of Chicago. His paper articulates how and when a ‘systems approach’ can organise data so that predictions are made in a more reliably representative fashion.
An implicated the so-called ‘Denominator Problem’ as an under-appreciated culprit of the major disappointments of biomedical research and the biological “scientific” process in general. The Denominator Problem is ostensibly an iteration of the age-old problem of induction. Here, the robustness of the ‘denominator’ is questioned, which, in the case of biological research, is the biosystem being studied – or more specifically, ‘the population distribution of the total possible behaviour/state space… of that biosystem’.
The paper argues that any biological system, the object of all biological endeavour, is invariably wrapped up in multiple layers of uncertainty and variability, and must therefore be treated formally as a ‘system’ rather than as a conglomerate of disparate sources of error. Scientists use statistics to reconcile this inherent messiness of biology, a methodology that may eventually be considered crude in the face of the profound unknowability of complex systems. The paper asserts that scientists draw conclusions from ‘microstate characterisations’, which increasingly means the outputs of ‘-omics’ technologies: vast amounts of data that capture a biological state at a point in time.
Although analysing and generating data in this way has proven profoundly beneficial, it does not carry robust explanatory power. The dilemma is that we do not know the true nature of the entire population in question, and precisely because of this we cannot know whether our assumptions about the generalisability of a given phenomenon hold in the way we anticipate. For instance, suppose a variable such as cell-membrane rigidity is linearly correlated with temperature under the conditions of an experiment. You repeat this experiment multiple times and find the relationship statistically significant at p < 0.05. However, the cells of this particular biological system only behave in this way for a set number of rounds of sub-culturing, after which the linear relationship becomes exponential. Depending on what is initially known about your biological system, it may or may not occur to you to run the same experiment with cells that have undergone many more rounds of cell division, since you might naively believe this parameter would not affect your findings. This is the crux of the Denominator Problem: when we perform experiments and record observations, we make a whole set of assumptions, many of them implicit, and never consider the complete space of “biological possibility”. The following diagram, taken from the paper, shows graphically how the ‘true’ nature of a biological system (A) can be poorly represented by empirical sampling (B) and even by data derived from well-designed experiments (C1 & C2).
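The membrane-rigidity thought experiment can be simulated in a few lines. The sketch below is my own illustration, not something from An’s paper; the function, the coefficients, and the passage-number cutoff are all invented purely to show how a model fitted on a narrow slice of a system’s state space can fail badly outside it:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 'true' biosystem: membrane rigidity depends on temperature,
# but the *form* of the dependence changes after many rounds of sub-culturing.
def rigidity(temp, passage, noise=0.1):
    if passage <= 10:                       # low-passage cells: linear response
        signal = 0.5 * temp
    else:                                   # high-passage cells: exponential response
        signal = 0.05 * np.exp(0.15 * temp)
    return signal + rng.normal(0, noise, size=np.shape(temp))

temps = np.linspace(20, 40, 50)

# An experiment that only ever samples low-passage cells...
observed = rigidity(temps, passage=5)
slope, intercept = np.polyfit(temps, observed, 1)

# ...supports a linear model, and repeating it reproduces the result,
# but says nothing about the unsampled (high-passage) region of state space.
hidden = rigidity(temps, passage=30)
linear_prediction = slope * temps + intercept
error_seen = np.mean((observed - linear_prediction) ** 2)
error_hidden = np.mean((hidden - linear_prediction) ** 2)
```

Repeating the sampled experiment keeps confirming the linear fit; the failure only becomes visible once the unsampled part of the ‘denominator’ is probed.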
Concerns of this nature occasionally bubble to the surface when theories are put to the test; real-life feedback in the form of clinical trials is perhaps the most brutal of these reality-checks. The Translational Dilemma, or so-called “Valley of Death”, occurs when drug-discovery efforts move into the territory of preclinical therapeutic application. In the basic sciences, however, the practical applicability or ‘generalisability’ of a phenomenon is often only an afterthought, tagged onto the end of an overly thin discussion section or exaggerated in a grant proposal. What is ‘good enough’ for basic science relies solely on the realm of statistics for its legitimacy.
So how can systems biology solve this predicament?
Returning to the paper, An proposes a solution: the use of ‘multi-scale models’ (MSMs) can expose, or at least subdue, the effect of hidden variables inherent in biological systems. An describes these MSMs as follows:
“When used to represent complex biological objects by mapping in a modular fashion to the multiple levels of organisation seen in those objects, multi-scale models (MSMs) are able to encapsulate what is conserved from one biological instance to another.”
By ‘biological instance’ An refers to the ‘microstate characterisations’ described above: outputs from data-heavy analyses such as RNA-seq or ChIP-seq, captured at a point in time. Movement from one microstate to another can be formally defined by a function, and it is these functions that collectively determine a more accurate account of the behaviour of a biological model. The functions may then be refined and cross-validated across numerous populations, no matter how heterogeneous. One can imagine that with further increases in computational power and technical capability, we could come close to modelling, with precision, the dynamic nature of complex biological systems.
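To make this idea concrete, here is a toy sketch of my own (not drawn from the paper; the state vector, interaction matrix, and decay term are all invented): a microstate is represented as a vector of measured quantities, and a single shared transition function advances any instance from one microstate to the next. It is the function, not the snapshots, that encodes what is conserved between instances:

```python
import numpy as np

# Hypothetical microstate: a vector of expression levels for three genes.
# The transition function maps one microstate to the next; different
# 'biological instances' share the rule but start from different measurements.
def transition(state, interaction_matrix, decay=0.1):
    # linear regulatory influence plus first-order decay; purely illustrative
    return state + interaction_matrix @ state - decay * state

# Invented regulatory interactions between the three genes.
W = np.array([[ 0.00, 0.05, 0.00],
              [-0.02, 0.00, 0.03],
              [ 0.01, 0.00, 0.00]])

def trajectory(initial_state, steps=5):
    states = [np.asarray(initial_state, dtype=float)]
    for _ in range(steps):
        states.append(transition(states[-1], W))
    return states

# Two instances (e.g. two cell populations) evolved under the same rule.
traj_a = trajectory([1.0, 0.5, 0.2])
traj_b = trajectory([0.8, 0.9, 0.1])
```

Refining and cross-validating the transition rule against many such trajectories, rather than comparing isolated snapshots, is (as I read it) the spirit of what An means by encapsulating what is conserved from one instance to another.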