Method
Overview
Fluxes (exchange fluxes of a metabolite \(M_i\), \(q_{M_i}\); growth rate, \(\mu\)), initial concentrations of species (biomass, \(X\); metabolites, \(M_i\)) and possibly other growth parameters (e.g. lag time) are estimated by fitting time-course measurements of metabolite and biomass concentrations, as detailed below.
The flux values provided by PhysioFit correspond to the best fit. A global sensitivity analysis (Monte Carlo approach) is available to evaluate the precision of the estimated fluxes (mean, median, standard deviation, 95% confidence intervals). Plots are generated for visual inspection of the fitting quality, a χ² test is performed to assess the statistical goodness of fit, and the Akaike Information Criterion (AIC) is calculated to compare different models.
Models
Models are at the heart of the flux calculation approach implemented in PhysioFit. A flux model contains i) equations that describe the dynamics of biomass and metabolite concentrations as a function of the model parameters (used to simulate time-course metabolite concentrations) and ii) the list of all parameters (including fluxes) with their default initial values and bounds (used for flux calculation).
Different models are shipped with PhysioFit, and tailor-made models can be provided by users, as detailed in the Models section.
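To illustrate what a flux model contains, the sketch below encodes a simple batch model assuming exponential growth and constant exchange fluxes, for which biomass follows \(X(t) = X_0 e^{\mu t}\) and each metabolite follows \(M_i(t) = M_{i,0} + \frac{q_{M_i} X_0}{\mu}(e^{\mu t} - 1)\). This is not PhysioFit's model API: the class names, parameter names, default values and bounds are all hypothetical, and only show how the equations (`simulate`) and the parameter list with initial values and bounds fit together.

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass
class Parameter:
    """A model parameter with a default initial value and (lower, upper) bounds."""

    name: str
    init: float
    bounds: tuple[float, float]


@dataclass
class BatchExponentialModel:
    """Hypothetical model: exponential growth with constant exchange fluxes.

    X(t) = X_0 * exp(mu * t)
    M(t) = M0 + q * X_0 / mu * (exp(mu * t) - 1)
    """

    metabolites: list[str]
    parameters: list[Parameter] = field(init=False)

    def __post_init__(self) -> None:
        # Biomass parameters: initial concentration X_0 and growth rate mu.
        self.parameters = [
            Parameter("X_0", 0.01, (1e-3, 10.0)),
            Parameter("mu", 0.5, (1e-3, 3.0)),
        ]
        # One exchange flux and one initial concentration per metabolite.
        for met in self.metabolites:
            self.parameters.append(Parameter(f"{met}_q", 1.0, (-50.0, 50.0)))
            self.parameters.append(Parameter(f"{met}_M0", 1.0, (1e-6, 100.0)))

    def simulate(self, values, t):
        """Return an array of shape (len(t), 1 + n_metabolites): biomass, then metabolites."""
        p = dict(zip((par.name for par in self.parameters), values))
        t = np.asarray(t, dtype=float)
        growth = np.exp(p["mu"] * t)
        columns = [p["X_0"] * growth]
        for met in self.metabolites:
            columns.append(
                p[f"{met}_M0"] + p[f"{met}_q"] * p["X_0"] / p["mu"] * (growth - 1.0)
            )
        return np.column_stack(columns)
```

For example, `BatchExponentialModel(["glucose", "acetate"]).simulate(values, [0, 1, 2])` returns a 3×3 array (biomass, glucose, acetate), where `values` follows the order of `parameters`.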
Flux calculation
First, PhysioFit constructs a model that is used to simulate the dynamics of the concentrations of biomass and metabolites (substrates and products) provided in the input data. Model parameters (such as fluxes, growth rate, and initial concentrations of biomass and metabolites) are then estimated by fitting the experimental metabolite and biomass dynamics. PhysioFit minimizes the following cost function:
\[residuum = \sum_{i} \left(\frac{sim_{i} - meas_{i}}{sd_{i}}\right)^{2}\]
where \(sim\) is the simulated data, \(meas\) denotes measurements, \(sd\) is the standard deviation of the measurements, and the sum runs over all measured data points \(i\).
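The cost function translates directly into NumPy. This is a minimal sketch, not PhysioFit's implementation: the function name `cost` is hypothetical, and skipping missing measurements (NaN) is an assumption.

```python
import numpy as np


def cost(sim, meas, sd):
    """Weighted residual sum of squares: sum(((sim - meas) / sd) ** 2).

    Missing measurements (NaN in meas) do not contribute to the sum.
    """
    sim, meas, sd = (np.asarray(a, dtype=float) for a in (sim, meas, sd))
    residuals = (sim - meas) / sd
    return float(np.nansum(residuals**2))
```

For instance, `cost([1.0, 2.0], [1.1, 1.8], [0.1, 0.2])` returns approximately 2.0, since each measurement lies one standard deviation away from its simulated value.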
For this optimization step, PhysioFit uses SciPy's differential evolution method to approximate the global solution, and the best solution is then polished using the L-BFGS-B method (see scipy.optimize for more information on the optimization process).
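The two-step optimization can be sketched with `scipy.optimize` on a toy problem. The dataset (noise-free exponential biomass growth), the bounds and the measurement standard deviation are assumptions chosen for illustration. SciPy's `differential_evolution` already polishes its best member with L-BFGS-B by default (`polish=True`); here that built-in polishing is disabled and the L-BFGS-B step is written out explicitly to mirror the description above.

```python
import numpy as np
from scipy.optimize import differential_evolution, minimize

# Hypothetical noise-free biomass data: X(t) = 0.05 * exp(0.4 * t).
t = np.linspace(0.0, 8.0, 9)
x_meas = 0.05 * np.exp(0.4 * t)
sd = np.full_like(t, 0.02)  # assumed measurement standard deviation


def cost(params):
    """Weighted residual sum of squares for the exponential growth model."""
    x0, mu = params
    x_sim = x0 * np.exp(mu * t)
    return float(np.sum(((x_sim - x_meas) / sd) ** 2))


bounds = [(1e-3, 1.0), (1e-3, 2.0)]  # illustrative bounds on (X0, mu)

# Step 1: global search with differential evolution (built-in polishing disabled).
de_result = differential_evolution(cost, bounds, seed=0, tol=1e-8, polish=False)

# Step 2: polish the best candidate locally with L-BFGS-B, within the same bounds.
result = minimize(cost, de_result.x, method="L-BFGS-B", bounds=bounds)
x0_hat, mu_hat = result.x
```

With noise-free data, `x0_hat` and `mu_hat` should end up close to 0.05 and 0.4.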
Sensitivity analysis
To determine the precision of the fit and of the estimated parameters, PhysioFit performs a Monte Carlo analysis. Briefly, PhysioFit generates several synthetic datasets by adding noise to the dynamics simulated from the best fit, and calculates fluxes and other growth parameters for each of these datasets. This enables PhysioFit to compute statistics (mean, median, standard deviation and 95% confidence interval) for each parameter. We recommend always running a sensitivity analysis when using PhysioFit.
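A minimal sketch of this Monte Carlo procedure on a toy exponential growth model: Gaussian noise scaled by the measurement standard deviation is added to the best-fit simulation, each synthetic dataset is refitted, and summary statistics are computed per parameter. Several choices here are assumptions rather than PhysioFit's documented behaviour: the best-fit values, the number of iterations, refitting with L-BFGS-B only (started from the best fit, for speed), and deriving the 95% confidence interval from the 2.5th and 97.5th percentiles.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(42)

t = np.linspace(0.0, 8.0, 9)
sd = np.full_like(t, 0.02)           # assumed measurement standard deviation
bounds = [(1e-3, 1.0), (1e-3, 2.0)]  # illustrative bounds on (X0, mu)


def simulate(params):
    """Exponential biomass growth X(t) = X0 * exp(mu * t)."""
    x0, mu = params
    return x0 * np.exp(mu * t)


def fit(meas, start):
    """Refit (X0, mu) to one dataset with L-BFGS-B, starting from `start`."""

    def cost(params):
        return float(np.sum(((simulate(params) - meas) / sd) ** 2))

    return minimize(cost, start, method="L-BFGS-B", bounds=bounds).x


best_fit = np.array([0.05, 0.4])  # assumed best-fit (X0, mu)
sim_best = simulate(best_fit)

# Generate noisy synthetic datasets around the best-fit simulation and refit each.
n_iter = 100
estimates = np.array(
    [fit(sim_best + rng.normal(0.0, sd), best_fit) for _ in range(n_iter)]
)

# Summary statistics per parameter (columns: X0, mu).
summary = {
    "mean": estimates.mean(axis=0),
    "median": np.median(estimates, axis=0),
    "sd": estimates.std(axis=0, ddof=1),
    "ci95": np.percentile(estimates, [2.5, 97.5], axis=0),
}
```

Percentile intervals make no assumption about the shape of the parameter distributions, which is one reason to prefer them over mean ± 1.96 sd in a sketch like this.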
Goodness-of-fit evaluation
PhysioFit performs a χ² test to assess the goodness of fit. A χ² test describes how well a model fits a set of observations by summarizing the discrepancy between the observed values and the values expected under the model, here the values simulated by the model used in PhysioFit (see Flux calculation). The χ² statistic is calculated as the sum of the squared differences between measured and simulated values, each divided by the variance of the corresponding measurement (\(sd^2\)), i.e. the cost function minimized during flux calculation. A good fit corresponds to small differences between measured and simulated values, and thus to a low χ² value. In contrast, a bad fit corresponds to large differences between simulations and measurements, and thus to a high χ² value.
The resulting χ² value is then compared with a χ² distribution to determine the goodness of fit. The p-value of the one-tailed χ² test is calculated by PhysioFit from the best fit and is given in the log file (see the Tutorial section). A p-value close to 1 means a poor fit, and a p-value close to 0 means a good fit (keeping in mind that a p-value very close to 0 suggests that the standard deviations might be overestimated). A p-value between 0 and 0.95 means that the model fits the data well enough with respect to the provided standard deviations (at a 95% confidence level). PhysioFit provides an explicit message stating whether or not the data are satisfactorily fitted (at a 95% confidence level).
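The test can be sketched with `scipy.stats.chi2`. This sketch assumes that the statistic is the weighted residual sum of squares, that the degrees of freedom are the number of measurements minus the number of estimated parameters, and that the reported value is the χ² cumulative distribution function evaluated at the statistic (close to 0 for a good fit, close to 1 for a poor fit). The fit is accepted when that value is below `level`, i.e. when the statistic does not exceed the corresponding quantile of the χ² distribution. The function name `chi2_test` is hypothetical.

```python
import numpy as np
from scipy.stats import chi2


def chi2_test(sim, meas, sd, n_params, level=0.95):
    """Return (statistic, degrees of freedom, chi2 CDF value, fit accepted?)."""
    arrays = (np.asarray(a, dtype=float) for a in (sim, meas, sd))
    sim, meas, sd = np.broadcast_arrays(*arrays)
    mask = ~np.isnan(meas)  # ignore missing measurements
    stat = float(np.sum(((sim[mask] - meas[mask]) / sd[mask]) ** 2))
    dof = int(mask.sum()) - n_params
    if dof <= 0:
        raise ValueError("need more measurements than estimated parameters")
    cdf = float(chi2.cdf(stat, dof))
    return stat, dof, cdf, cdf < level
```

With five measurements, two parameters and every residual equal to about one standard deviation or less (statistic of about 4, 3 degrees of freedom), the CDF value is roughly 0.74 and the fit is accepted; dividing the standard deviations by ten raises the statistic to about 400 and the fit is rejected.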
Model comparison and selection
PhysioFit calculates the Akaike Information Criterion (both classical and corrected, AIC and AICc) to help users compare different models and select the most appropriate one for their data. The AIC and AICc values can be found in the statistical output file or directly in the graphical user interface. Briefly, the AIC is a statistical metric that estimates the relative quality of a model for a given dataset by balancing its goodness of fit against its complexity (number of parameters). Among candidate models, the one with the lowest AIC value is thus considered the best.
The AIC is calculated as follows:
\[AIC = n \cdot \ln\left(\frac{residuum}{n}\right) + 2k\]
where \(k\) is the number of parameters in the model (plus 1), \(n\) is the number of data points, and \(residuum\) is the weighted residual sum of squares, i.e. the cost function (see Flux calculation). For datasets with a low number of measurements (typically fewer than 40 data points), it is recommended to use the AICc (corrected AIC), which is calculated as follows:
\[AICc = AIC + \frac{2k(k+1)}{n-k-1}\]
In practice, because the AICc converges to the AIC for large sample sizes, it is often advised to use the AICc by default.
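Both criteria are straightforward to compute once the residuum, \(n\) and \(k\) are known. A minimal sketch, assuming the least-squares form \(AIC = n \ln(residuum/n) + 2k\) and the usual small-sample correction; the function names are hypothetical.

```python
import math


def aic(residuum, n, k):
    """Classical AIC for a least-squares fit: n * ln(residuum / n) + 2k."""
    return n * math.log(residuum / n) + 2 * k


def aicc(residuum, n, k):
    """Small-sample corrected AIC: AIC + 2k(k + 1) / (n - k - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    return aic(residuum, n, k) + 2 * k * (k + 1) / (n - k - 1)
```

For instance, `aic(12.0, 24, 4)` is about -8.64 and `aicc(12.0, 24, 4)` about -6.53. The correction term \(2k(k+1)/(n-k-1)\) shrinks as \(n\) grows, which is why the AICc approaches the AIC for large datasets.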
To identify the best model, candidate models that differ in structure or complexity can be used to fit the data and then compared based on their AIC. The model with the lowest AIC value is considered the best-supported model among the candidates and should thus be used to fit the dataset. However, it is crucial to consider the differences in AIC values (ΔAIC) between models, as models with small ΔAIC values (typically < 2) are considered to have similar support from the data.
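The ΔAIC comparison can be sketched as follows, given AIC (or AICc) values already computed for each candidate model. The function name, model names and values are hypothetical; the threshold of 2 follows the rule of thumb above.

```python
def rank_models(aic_values, threshold=2.0):
    """Rank candidate models by AIC and flag those with similar support.

    aic_values: mapping of model name -> AIC (or AICc) value.
    Returns (name, aic, delta_aic, similar_support) tuples sorted by AIC,
    where similar_support is True when delta_aic < threshold.
    """
    best = min(aic_values.values())
    ranked = sorted(aic_values.items(), key=lambda item: item[1])
    return [(name, value, value - best, value - best < threshold) for name, value in ranked]
```

For example, `rank_models({"model_A": -6.5, "model_B": -5.2, "model_C": 3.1})` ranks model_A first, flags model_B (ΔAIC ≈ 1.3) as having similar support, and model_C (ΔAIC ≈ 9.6) as clearly less supported.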
Detailed information on the AIC can be found in the original publication by Akaike (1974), and a practical guide ("what it is, how and when to apply it and what it achieves") has been published by Symonds and Moussalli (2010).