Introduction
Modeling is the process of developing a model, which has been traditionally used in the field of engineering. As its analytical power and effectiveness interlock with biological complexity, modeling has been used for understanding biological systems (Kitano, 2002b). Because we can define a model as an artificial construct using mathematics to represent biological phenomena, it has been used to simulate scenarios in target systems under various conditions which are not accessible through experimental approaches (Voit, 2013b).
Systems biology is a recent field of study that attempts to solve problems in complex biological systems by integrating independent studies. As this field requires database processing and the analysis of interacting systems, such as genetic circuits, cells, biochemical pathways, and metabolism, modeling is actively used through so-called computational biology (Kitano, 2002b). Metabolic reconstruction, which has recently received much attention, is an example of a domain that intensively uses computational modeling (Saha et al., 2014). Metabolic reconstruction is an organism-specific process that uses high-throughput genome-scale and “-omics” data to investigate the properties of a metabolic network and its components (Feist et al., 2009). A typical protocol for reconstructing metabolic networks has been reviewed (Thiele and Palsson, 2010). That review highlighted the constraint-based reconstruction and analysis (COBRA) approach (Schellenberger et al., 2011), which has been used to study a wide range of organisms from microorganisms (Famili et al., 2003; Raghunathan et al., 2009) to humans (Duarte et al., 2007; Mo et al., 2007; Ruppin et al., 2010). Plants and animals have also been studied as target systems of systems biology. Because of the great complexity of metabolic pathways and the highly compartmentalized structure of plants, modeling has only recently been applied to them, and possible applications have been reviewed (Baghalian et al., 2014). Animals have been a source of target systems in computational biology as well. Specifically, animal cells, generally from rats, have been studied using mathematical models to elucidate the functions and mechanisms of the corresponding cells and organs in humans (Finegood et al., 1995; Heethaar et al., 1973; Pandit et al., 2001).
The field of food and nutrition has traditionally used modeling to evaluate the metabolism of nutrients, such as absorption and digestion (Bastianelli et al., 1996). Among nutrients, metal-nutrients, e.g., calcium, have been studied by actively employing modeling techniques to estimate their flows which cannot be directly measured in an experiment. Calcium kinetic modeling is a popular technique used for exploring calcium metabolism differences under various conditions (Smith et al., 1996; Wastney et al., 1996). Moreover, there are reviews available for calcium kinetics and their modeling in terms of introducing theories, methodologies, types, and previous notable research (Lee and Cho, 2012; Lee and Cho, 2013; Weaver et al., 2002).
When we look at agriculture and food systems-related fields, one of the research areas which has traditionally used modeling is computer-aided process design. This technique has been used to simulate operations in food and bio-processes which are hard to adapt to an experimental design due to cost and size matters (Petrides, 2001; Shanklin et al., 2001). Also, there are some studies that use computational analysis to investigate the complex phenomena arising during the storage of agricultural products (Beukema et al., 1982), food processing (Fryer and Robbins, 2005; Otero and Sanz, 2003), heating (Shim et al., 2010), and drying (Mayor and Sereno, 2004). In addition, computational fluid dynamics has been used to evaluate the efficiency of greenhouse ventilation (Mistriotis et al., 1997). Another research applied mathematical modeling for the characterization of agricultural sprays (Farooq et al., 2001). However, those studies only used agricultural systems as a background while they focused more on heat and fluid movements. Quality measurement and sorting of agricultural and food products are fields which actively use mathematical models. As spectrometers have been applied to post-harvest processes as non-destructive tools, its data, i.e., spectra, which represent characteristics of target material, have been statistically modeled to sort and evaluate the quality of the product. This technique has been utilized for many agricultural and food products, such as fruits (ElMasry et al., 2008; Lee et al., 2014; Xing and De Baerdemaeker, 2005), crops (Nicolaï et al., 2007), vegetables (Pereira et al., 2008), animal products (Cho et al., 2009; Lohumi et al., 2016), and even seeds (Ambrose et al., 2016). However, these studies did not use mathematical models for agriculture and foods themselves, but to analyze spectral data. Another topic for classical usage of modeling is population dynamics, a branch of ecology (Voit, 2013b). 
In particular, a software-based prediction of the distribution of insects, plants, animals, and even disease has been studied in response to environmental conditions (Sutherst, 2013; Sutherst and Maywald, 1985). Based on this software, the risk to the agricultural sector caused by invasive pests has been assessed and biological control has been tested. This supports the application of software-based predictive modeling in the field of agriculture (Park et al., 2014).
There exist various research targets in agriculture and food systems which are as complex as biological systems. In addition, these systems produce and require storage of as many data as biological systems. Aforementioned researches related to plants and animals use them as model systems instead of humans for studying biochemical, genetic, and metabolic components (Faraji et al., 2015; Lee et al., 2012; van Milgen, 2002). Therefore, these types of studies may be closer to the field of systems biology than agriculture. Thus, there is a large possibility for applying modeling techniques into the fields of agriculture and food in order to effectively analyze target systems in the same way systems biology researchers have attempted (Kitano, 2002a). For example, it is expected that the model-based approach will be used for helping the development of crops and foods which meet the demands of current issues such as climate change response, massive production, and genetically modified organisms.
This study aimed to introduce modeling methodology by reviewing contemporary modeling studies in the field of biological, food, and agricultural science. In more detail, modeling methodology can be categorized into the following 5 sub-sections: modeling procedure, types of modeling, types of model analyses, types of model formulation, and software used in modeling studies with a review of corresponding noteworthy literature.
Modeling methodology
Modeling procedure
Modeling procedure has been reported in many references and is slightly different depending on sources, but the basic order is very similar. Based on the results of Voit’s study (2012), the modeling process is composed of 5 steps: selection of goals and objectives, model selection, model design, model analysis and diagnosis, and model use and application. A textbook written by Haefner (1996) also describes a classical view of the modeling process divided into the following steps: selecting objectives and hypothesis, mathematical formulation, verification, calibration, and analysis and evaluation. The present study proposes a similar procedure to those in previous studies consisting of 3 large categories and 6 detailed steps (Fig. 1).
Fig. 1. The modeling procedure is considered to be composed of 3 processes and 6 sub-processes, and includes system reconstruction and system analysis besides model development.
The first category is defined as knowledge mining which identifies and reconstructs a target system based on previous research. In this step, it is required to set objectives and hypothesis to be tested by the model and to map the target system. The second category, named model development, is the core process of modeling. Model development is composed of 3 steps: model design, parameter estimation, and model validation. Model design selects a model structure and develops a mathematical formulation using physical, chemical, and biological theories. Parameter estimation is the process that determines parameters in the formulated equation. This step demands time- and data-intensive work, which needs many experimental, or previously published, data sets. Once parameters are estimated, it is necessary to verify the model (model validation). In this step, data sets which were not used for developing the model and estimating the parameters have to be used to avoid false accuracy (Sutherst et al., 2015). In case the available data is quite limited, the model should at least be able to generate a known system behavior. Sometimes, a model developed without enough data is called theoretical model (Lee et al., 2013; Parfitt et al., 1996). After finishing the model validation step, it is possible to say that model development is completed, but often, modeling is still on-going. In bio-systems including both biological and agriculture/food systems, a model should be used to find meaningful results, test hypothesis, or achieve objectives. Hence, the final category is system analysis performed by model simulation and model-based analysis. Model simulation finds the solutions of the developed model. It is generally performed by numerical analysis with computational tools. Model-based analysis includes sensitivity, stability and robustness tests (Voit, 2013b), and model-driven system discovery (McCloskey et al., 2013). 
In these steps, the most powerful function of modeling is used: it allows us to investigate various scenarios and predict system responses in a cost- and input-effective manner (Kreutz and Timmer, 2009). Finally, it should be noted that every step in the modeling procedure is important; a model can be wrong if even a single step is processed incorrectly. For this reason, the modeling procedure is a circular process in which each step has to be checked repeatedly.
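The parameter estimation and model validation steps described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the first-order decay model, the "true" parameter values, and the synthetic observations are hypothetical and are not taken from any of the cited studies. The point it shows is that the data used for validation are kept separate from the data used for estimating the parameters.

```python
import math

# Hypothetical "true" system used only to generate synthetic observations.
TRUE_C0, TRUE_K = 10.0, 0.3
TIMES = [0, 1, 2, 3, 4, 5, 6, 7]
# Fixed multiplicative measurement errors (illustrative, not real data).
ERRORS = [1.02, 0.98, 1.01, 0.99, 1.03, 0.97, 1.00, 1.02]


def model(c0, k, t):
    """First-order decay: c(t) = c0 * exp(-k * t)."""
    return c0 * math.exp(-k * t)


OBSERVED = [model(TRUE_C0, TRUE_K, t) * e for t, e in zip(TIMES, ERRORS)]


def estimate_parameters(times, values):
    """Least-squares fit of ln(c) = ln(c0) - k * t; returns (c0, k)."""
    n = len(times)
    logs = [math.log(v) for v in values]
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    slope = sxy / sxx
    return math.exp(y_mean - slope * t_mean), -slope


def rmse(c0, k, times, values):
    """Root-mean-square error between model predictions and observations."""
    squared = [(model(c0, k, t) - v) ** 2 for t, v in zip(times, values)]
    return math.sqrt(sum(squared) / len(squared))


# Parameter estimation uses the even-indexed points only ...
train_t, train_y = TIMES[0::2], OBSERVED[0::2]
# ... and validation uses the odd-indexed points that the fit never saw.
valid_t, valid_y = TIMES[1::2], OBSERVED[1::2]

c0_hat, k_hat = estimate_parameters(train_t, train_y)
validation_rmse = rmse(c0_hat, k_hat, valid_t, valid_y)
print(f"c0 = {c0_hat:.3f}, k = {k_hat:.3f}, validation RMSE = {validation_rmse:.3f}")
```

If the validation error is unacceptably large, the circular nature of the procedure applies: one returns to model design or parameter estimation before the model is used for system analysis.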
Types of mathematical modeling
Types of models can be classified by various criteria such as basic language (e.g., descriptive model, diagrammatic or mathematical model), model state (e.g., dynamic vs. static model), inclusion or exclusion of random events (e.g., stochastic vs. deterministic model), characteristics of time-series data (continuous vs. discrete model), and so on (Haefner, 1996). An example is a review study which categorized the modeling used for calcium metabolic studies into 3 classes (Lee and Cho, 2012). Because there are many reviews and textbooks that categorize modeling, the present study will only focus on mathematical modeling and will briefly introduce its two types: statistical and mechanistic modeling.
Using statistical modeling is popular in various studies. Because this type of modeling generally focuses on the relationship between dependent variable and independent variables, it is sometimes called a correlative model (Voit, 2013b). This modeling is simple, but powerful for predicting and quantifying the relationship between variables. Hence, it requires experimental data and uses well-known statistical methods including regression analysis and multi-variate analysis. As an illustration, non-linear regression has been used to optimize the synthesis of cycloamyloses from sucrose (Kim et al., 2011). Multi-variate analysis has been the main tool suitable for developing a model using spectrometry data produced for targeting agricultural products. Because numerous wavelengths result from spectrometry, it is required to find the optimal wavelength which can explain the targets. Lots of recent studies regarding non-destructive sorting of agricultural products are based on spectrometry and image analysis, and consequently have used multi-variate analysis-based modeling (Baek et al., 2014; Barthus and Poppi, 2001; Guo et al., 2016; Zude, 2003). However, correlative models do not account for underlying mechanisms causing biological and agricultural phenomena.
In contrast, a mechanistic model, also described as an explanatory model, explains the biological processes and mechanisms that drive the target phenomena (Voit, 2013b). That is, this type of model generally has explicit mathematical representations of mechanistic processes such as regulatory networks in gene circuits and metabolism (Haefner, 1996). This model has the advantage that it can scrutinize how the system works under given conditions, and it allows us to simulate various scenarios in order to study the effect of system perturbations. For this reason, most mechanistic models are written as differential equations. Consequently, this type of modeling is much more complex and has more variables and parameters than statistical modeling. In other words, the mathematical formulation is explicitly complex, and intensive parameter estimation is required.
Types of analyses in mechanistic modeling
According to the modeling state and types of data, mechanistic modeling can perform 2 types of analyses: static and dynamic (Voit, 2013b). Static analysis generally deals with the steady-state of the target system which means that components of interest in the target system do not change with time. Hence, it is sometimes called steady-state analysis and estimates flux distributions or degree of gene expression after reaching the steady-state. This analysis is relatively simple compared to dynamic analysis in terms of data required for modeling, mathematical formulation, and solving the system. Petri net analysis is a modeling approach used to describe physical locations where the metabolic reaction occurs (Chaouiya, 2007; Voit, 2013b). This modeling has been generally used to graphically represent either a genetic network (Steggles et al., 2007) or a metabolic network (Genrich et al., 2001). The most popular analysis under steady-state is flux balance analysis (FBA). FBA assesses the flow of metabolites through a specific network, particularly a biochemical network, with assumption that the flows are under steady-state. The core feature of FBA is the stoichiometric matrix, which is a numerical tabulation composed by coefficients of each reaction, while the core tool for solving the stoichiometric system is linear programming (Orth et al., 2010). Because of its simplicity and power for predicting flux distributions in the target network, a wide range of applications has been fulfilled for microorganisms (Kim et al., 2008; Lee et al., 2009), animals (Hanigan and Baldwin, 1994), and plants (Lee and Voit, 2010). Another illustration for steady-state analysis is calcium kinetic modeling which is represented by physical compartments and estimates calcium fluxes among them (Wastney et al., 1999). 
As mentioned in the introduction, the calcium kinetic model has been intensively used in the field of food and nutrition to study the differences in calcium metabolic fluxes caused by various conditions (Lee et al., 2011; Wastney et al., 2000; Weaver et al., 2009).
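The FBA problem described above is commonly written as a linear program. The following is a sketch of that standard form; the symbols are notational assumptions introduced here for illustration: S is the m × n stoichiometric matrix (m metabolites, n reactions), v is the vector of reaction fluxes, c is a vector of weights defining the objective (for example, a biomass-producing flux), and v_j^min and v_j^max are lower and upper bounds on each flux.

```latex
\begin{aligned}
\text{maximize}\quad   & Z = \mathbf{c}^{\mathsf{T}} \mathbf{v} \\
\text{subject to}\quad & \mathbf{S}\,\mathbf{v} = \mathbf{0}, \\
                       & v_j^{\min} \le v_j \le v_j^{\max}, \qquad j = 1, \dots, n
\end{aligned}
```

The equality constraint expresses the steady-state assumption: for every metabolite, the fluxes producing it balance the fluxes consuming it. Because the objective and all constraints are linear in v, the problem can be solved by linear programming.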
In contrast, dynamic analysis studies the change of the target system through time, and thus is generally written as a series of differential equations. As dynamic modeling follows system responses over time, parameter estimation and model validation in this approach require time-series data, which means that intensive data acquisition and modeling work are inevitable. In general, this modeling is used to investigate a system response or behavior under specific conditions including artificially manipulated metabolic perturbations, biologically testable circumstances, meteorological changes, and nutritional interventions. A good illustration is given in a textbook (Voit, 2013b), showing an infectious disease problem (the so-called SIR model) (Kermack and McKendrick, 1927). This model explains population changes of immune, infected, and susceptible individuals over time with respect to the rates of birth, immigration, death, and vaccination, and the initial number of individuals. This modeling example also shows that the structure of the model, i.e., the terms in the mathematical formulation, is one of the key factors affecting the results. Numerous studies in various fields, including current biotechnology and metabolic research, have intensively employed dynamic models. E. coli is a popular target of mathematical modeling as its genetic information and metabolism have been fully reconstructed (Feist et al., 2007; Overbeek et al., 2000). The genetic toggle switch has been mathematically modeled, and the expression of green fluorescent protein has been studied through time (Gardner et al., 2000). The lactose operon and its feedback regulation are historically famous in mathematical modeling (Vilar et al., 2003; Wong et al., 1997; Yildirim and Mackey, 2003). 
Besides the aforementioned classical usages of dynamic modeling in systems biology, many other targets have been dynamically analyzed by mathematical modeling, such as calcium homeostasis (Peterson and Riggs, 2010; Raposo et al., 2002), calcium metabolic changes due to perturbations (Lee et al., 2011), specific types of cells including bone cells (Komarova, 2005; Lemaire et al., 2004; Pivonka et al., 2010) and mammary gland cells (Hanigan and Baldwin, 1994), lignin biosynthesis in plants (Faraji et al., 2015; Lee et al., 2012), and population dynamics with consideration of climate (Park et al., 2014; Poutsma et al., 2008; Sutherst and Maywald, 1985).
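As a concrete illustration of dynamic analysis, the SIR model mentioned above can be simulated numerically. The sketch below assumes the simplest form of the model, without the birth, immigration, death, and vaccination terms; the parameter values (beta, gamma), the initial fractions, and the fixed-step fourth-order Runge-Kutta integrator are illustrative assumptions and are not taken from the cited textbook.

```python
def sir_rhs(state, beta, gamma):
    """Right-hand side of the basic SIR model (population fractions)."""
    s, i, r = state
    new_infections = beta * s * i   # susceptible -> infected
    recoveries = gamma * i          # infected -> immune (recovered)
    return (-new_infections, new_infections - recoveries, recoveries)


def rk4_step(f, state, dt, *args):
    """One classical fourth-order Runge-Kutta step for dx/dt = f(x)."""
    k1 = f(state, *args)
    k2 = f(tuple(x + 0.5 * dt * k for x, k in zip(state, k1)), *args)
    k3 = f(tuple(x + 0.5 * dt * k for x, k in zip(state, k2)), *args)
    k4 = f(tuple(x + dt * k for x, k in zip(state, k3)), *args)
    return tuple(
        x + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for x, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate_sir(s0, i0, r0, beta, gamma, t_end, dt=0.1):
    """Return a list of (time, S, I, R) tuples from t = 0 to t_end."""
    state = (s0, i0, r0)
    trajectory = [(0.0, *state)]
    for n in range(1, int(round(t_end / dt)) + 1):
        state = rk4_step(sir_rhs, state, dt, beta, gamma)
        trajectory.append((n * dt, *state))
    return trajectory


# Hypothetical scenario: 1% initially infected, beta = 0.5, gamma = 0.1.
trajectory = simulate_sir(s0=0.99, i0=0.01, r0=0.0,
                          beta=0.5, gamma=0.1, t_end=100.0)
peak_time, peak_infected = max(
    ((t, i) for t, _, i, _ in trajectory), key=lambda pair: pair[1]
)
print(f"peak infected fraction {peak_infected:.3f} at t = {peak_time:.1f}")
```

Adding or removing a term in `sir_rhs`, such as a vaccination flux from the susceptible to the immune pool, changes the model structure and therefore the simulated behavior, which is the point the textbook example makes.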
Finally, it should be noted that neither type of analysis is inherently better than the other. The model structure, formulation, and type of analysis must be decided based on the study objectives and the characteristics of the target systems.
Types of model formulations
As one of the most effective modeling theories, biochemical systems theory (BST) has been applied to develop models for systems composed of a series of biochemical reactions, i.e., metabolism in general. BST is a canonical model whose explicit formulation is already determined as a series of power functions (Savageau, 1969; Voit, 2000). There are two types of modeling frameworks: general mass action (GMA) and S-system. They are very similar and share similar principles for the development of a biochemical model. The S-system focuses on pools by aggregating all the fluxes entering and leaving each variable, while GMA focuses on fluxes and contains all terms for the individual influxes and out-fluxes (Voit, 2013b). Detailed aspects of the theory have been reviewed (Voit, 2013a) and published as a textbook (Voit, 2000). There are a few notable studies that have applied BST for modeling target systems. A mathematical model of lignin biosynthesis in a plant has been developed using BST (Lee and Voit, 2010). In this study, the developed BST model for lignin biosynthetic pathways in Populus xylem was used to identify enzymes that control lignin synthesis, in order to optimize biofuel production since lignin was reported to suppress it. Two following studies from the same laboratory applied BST to lignin metabolism in different plants: Medicago (Lee et al., 2012) and Panicum virgatum (Faraji et al., 2015). Both studies used BST for the similar purpose of unveiling unknown regulatory mechanisms in the lignin biosynthetic pathway that was reconstructed by integrating pieces of information. These three studies show an effective use of BST in model-driven analysis of biochemical reactions in series. Another interesting application of BST in biological systems is the modeling of interactions between the two main bone cells which govern bone metabolism (Komarova, 2005; Komarova et al., 2003). 
Those studies coded the two main bone cells (osteoblasts and osteoclasts) using two ordinary differential equations composed of power functions. Furthermore, they evaluated autocrine and paracrine controls of bone cells to show endocrine actions of regulating hormones, i.e., parathyroid hormone (PTH), in the bone remodeling process. The power function-based model designed by Komarova and colleagues was expanded by incorporating it into the mechanics of bones in response to external loading (Hambli, 2014), and by associating it with myeloma bone disease (Ayati et al., 2010). An interesting aspect of these extensions is that the power function model was connected to finite element analysis and to partial differential equations for considering the spatial distribution of bone cells.
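For readers unfamiliar with the power-function formulation, the two BST frameworks are usually written as follows. The notation is an assumption introduced here for illustration: X_i are the system variables, γ_ik, α_i, and β_i are non-negative rate constants, and f_ikj, g_ij, and h_ij are real-valued kinetic orders.

```latex
% GMA: one power-function term per individual flux
\frac{dX_i}{dt} = \sum_{k} \pm\, \gamma_{ik} \prod_{j} X_j^{f_{ikj}}

% S-system: one aggregated influx term and one aggregated out-flux term
\frac{dX_i}{dt} = \alpha_i \prod_{j} X_j^{g_{ij}} - \beta_i \prod_{j} X_j^{h_{ij}}
```

In the S-system form, each variable has exactly two terms regardless of how many reactions feed or drain the pool, which is why a two-variable system such as an osteoblast-osteoclast model can be written with only two power-function equations.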
In contrast to canonical model like BST, a non-canonical model does not require a standardized formulation, which has been observed in many types of modeling studies. Instead, the model directly develops a suitable formula which can explain the mechanisms embedded in target systems, generally based on either chemical or physical theories. The law of mass action is a famous example which explains kinetics of chemical reactions in the equilibrium system and estimates the rate of reaction at a given time (Chellaboina et al., 2009). Because of the flexibility in mathematical formulation, a non-canonical model has been widely used in various levels of systems from genes to population dynamics, and thus a term of mathematical modeling often tends to indicate non-canonical modeling. However, it is more complex than canonical modeling in terms of mathematical design, parameter estimation, and solving the model. Transcriptional regulators are of great importance in the field of biology. After identifying network motifs and when architecture has been identified in a transcriptional regulatory network in eukaryotes (Lee et al., 2002), some notable mathematical models have been developed to explain changes in lac operon expression in response to concentrations of glucose and lactose (Wong et al., 1997) and the dynamics of lac operon in response to feedback regulation (Yildirim and Mackey, 2003). At the level of metabolism, recent calcium metabolic studies have actively used a mathematical model for time-dependent response of calcium and bone metabolism, while kinetic models were the main tools in past decades (Wastney et al., 1999). As aforementioned, calcium homeostasis which balances calcium metabolic pathways to maintain serum calcium concentration has been targeted. Raposo et al. (2002) derived a minimal but comprehensive model for calcium homeostasis considering the effect of the two regulating factors: PTH and calcitriol. 
The developed model is composed of non-linear differential equations based on mass balance, which represents a change in calcium concentration in a compartment by influxes minus out-fluxes and by activation/inactivation of regulators. The model was used to simulate the effect of external perturbations on calcium homeostasis in order to test, in silico, calcium-related diseases such as hypo/hyper-calcemia and renal failure. This model was expanded by incorporating a model for bone metabolism (Peterson and Riggs, 2010), showing that models can be assembled to address a larger target. The bone remodeling process has been mathematically modeled by BST as introduced in canonical modeling, but other modeling studies used non-canonical forms of bone cell modeling. Lemaire et al. (2004) developed a model to explore the interactions between osteoblasts and osteoclasts in response to the core regulating mechanism called the Receptor Activator for Nuclear Factor κB (RANK)-RANK Ligand (RANKL)-osteoprotegerin (OPG) pathway. The interesting point was that they developed differential equations for the action of regulators based on the binding reaction scheme. A similar theoretical model was developed to investigate the role of the RANK-RANKL-OPG pathway in detail (Pivonka et al., 2010). In recent years, the models by Lemaire et al. (2004) and Pivonka et al. (2010) were expanded upon by embedding the insulin-like growth factor I (IGF-1), which had been indicated as another critical factor for regulating bone cell interactions, particularly during adolescence (Lee and Okos, 2016).
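The two building blocks mentioned above, mass-action kinetics and compartmental mass balance, can be written in generic form as follows. This is only an illustrative sketch, not the formulation of any cited model; the symbols (rate constant k, stoichiometric coefficients a and b, concentration C_i of compartment i, and fluxes J) are assumptions introduced here.

```latex
% Law of mass action for an elementary reaction  aA + bB -> products
v = k\,[\mathrm{A}]^{a}\,[\mathrm{B}]^{b}

% Mass balance for compartment i: change = influxes minus out-fluxes
\frac{dC_i}{dt} = \sum_{\text{in}} J_{\text{in}} - \sum_{\text{out}} J_{\text{out}}
```

In a non-canonical model, each flux J is given whatever functional form the underlying chemistry or physiology suggests (mass action, saturation kinetics, regulator-dependent activation, and so on), which is the source of both the flexibility and the added complexity noted above.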
Types of modeling software
There is notable software available for modeling. In modeling studies, software may be divided into two types. The first type is universal software such as MATLAB, Mathematica, Mathcad, COMSOL multi-physics, and SAS statistical package. This type of software is like a programming language, and thus we may call them platform-type software, having a high degree of freedom in designing the system but requiring a specific grammar for coding. Compared to general programming languages, they possess internal functions which conveniently operate designated functions without further coding process. For this reason, they are popular to help solve a developed model that is generally written by differential equations. For example, a function named ode45 in MATLAB numerically solved a series of ordinary differential equations (Shampine and Reichelt, 1997). Among them, COMSOL is specialized for solving partial differential equations which are generally used for explaining spatial distribution and diffusion in biological compartments (Claussen et al., 2011; Umulis et al., 2006). SAS may differ from aforementioned software because it is a general purpose tool for statistical analysis. Plenty of scientific studies have used SAS to compare control and experimental groups to find correlations between variables and to identify the most critical factors for explaining variations in data. However, we should not overlook the ability of SAS in developing statistical models (or correlative models as previously introduced), generally based on regression analysis (Hill et al., 2008; Lee et al., 2010a; Lee et al., 2010b).
Besides platform-type universal software, there is specialized software used for accomplishing specific objectives. For instance, there is a specialized tool suitable for metabolic engineering, food and bioprocess design, and population dynamics. Some of these tools have certainly been developed based on programming languages or aforementioned platform software. COBRA (Constraint-Based Reconstruction Analysis) toolbox is a MATLAB-based tool which is specialized for metabolic reconstruction and metabolic network modeling. A good review for this toolbox is reported by the research group who originally developed COBRA (Schellenberger et al., 2011). In terms of food and bio processing, computer-aided design has been implemented by software named SuperPro. As food and bioprocessing plants cannot be adapted to experimental design due to size and cost issues, SuperPro has been used to design and simulate processes without any unfavorable interruptions (Flora et al., 1998). For example, wastewater treatment facilities have been evaluated and optimized on computer using SuperPro (Petrides, 2001). For larger target sizes, ecological research has sometimes used CLIMEX software which evaluates potential inhabitation of a specific species in response to climate (Sutherst and Maywald, 1985). In particular, recent highlights of the world’s climate change coupled with intensive interactions between nations have demonstrated the increased risk of introducing invasive species that may lead to disease dispersion and the destruction of local agriculture and ecosystem. CLIMEX is considered to be a powerful tool for risk assessment and for the establishment of a strategy for preventing the above risk (Park et al., 2014; Poutsma et al., 2007).
Conclusion
This study provided reviews of some notable model-based studies in various fields of bio-system research from genetics to population dynamics. Because of its predictive and analytical powers, coupled with low cost and labor, modeling has been widely used in various fields of study. In particular, biology-related studies including genetics, biochemistry, metabolic engineering, and systems and synthetic biology have been intensively using modeling techniques to analyze accumulated experimental data sets and background knowledge which need to be comprehensively analyzed to elucidate complex behaviors in the target systems. This means that the fields of agriculture, food, and animal science also require effective use of modeling to explain system responses and their interactions with environmental conditions. Nevertheless, agriculture and food systems still focus on producing and storing experimental data, and pay less attention to processing and analyzing the system and its data. Therefore, we expect that this study will provide an initial step toward the prospective usage of modeling in the fields of agriculture and food.