Introduction
Edible oils are subjected to various processes to ensure their suitability for human consumption. They have a critical effect on the taste and mouth-feel of foods, whilst enhancing their nutritive value (Hong et al., 2018). Sesame oil has a mild odour and a pleasant taste and, as such, is used as a natural salad oil requiring little or no winterization. It is described as having high nutritive and biological value as well as excellent taste (Park et al., 2013). It is widely used as a cooking oil, in shortening and margarine, as a soap fat, in pharmaceuticals, and as a synergist for insecticides (Budowski and Markely, 1951). Cottonseed oil, in turn, has long been considered a good vegetable oil for frying, in part because it tends to impart a toasted aroma to fried products (Dowd et al., 2010).
Chemical oil extraction, which uses a solvent, is a popular commercial procedure because it produces high yields quickly and inexpensively. In a previous study, Jomtib et al. (2011) used hexane, benzene, and toluene as co-solvents to determine the effect of adding co-solvents to the oil at various concentrations (10 - 50% v/v) on the formation of methyl esters. Benzene has been used for oil extraction because it can extract a higher quantity of oil than other solvents. Because benzene is carcinogenic, however, n-hexane is the globally preferred solvent owing to its extraction efficiency, easy availability, high stability, low evaporation loss, low corrosion, low greasy residue, and better odour and flavour (Saxena et al., 2011). The use of hexane as a solvent during oil extraction may also contribute to the occurrence of benzene in food (Masohan et al., 2000). Although benzene has a low boiling point compared with edible oil, some residues may remain in the oil after extraction.
Recently, benzene residues were found in cottonseed oil. This finding motivates the identification and quantification of benzene in edible oil, because its presence is directly related to consumer health.
Benzene, a volatile organic compound, has been classified as a human carcinogen by the Environmental Protection Agency (EPA). It can form when benzoate and ascorbic acid are present under the influence of heat, UV light, and metal ions acting as catalysts (Styarini et al., 2011). Because there is no specific limit for benzene in food and beverages, the limits for benzene in drinking water adopted by the WHO (10 µg/L) and the United States Environmental Protection Agency (USEPA; 5 µg/L) are generally used as references (Aprea et al., 2008; Vinci et al., 2012). In addition, a maximum limit for benzene concentration based on toxicity has been set in Europe at 5 mg/L (Atkinson et al., 1995). Most cases of benzene toxicity have been reported in Italy (Vigliani and Saita, 1964; Vigliani, 1976) and Turkey (Aksoy et al., 1972; Aksoy et al., 1974), which together had the highest rate of chronic myelogenous and myelogenous leukaemia.
Several analytical techniques have been applied to the detection of benzene in edible oil samples. Masohan et al. (2000) estimated the benzene residue in crude and refined samples of rice bran and soy oil, and in the oil-extracted cakes, using gas chromatography and UV spectroscopy. In another study, Styarini et al. (2011) detected benzene in beverages using headspace gas chromatography. Furthermore, solid-phase micro-extraction and gas chromatography were used for the detection of benzene in beverages, i.e., soft-drink, juice, and tea samples (Sanchez et al., 2012). However, these methods require sample preparation and the use of chemicals, which makes them destructive and consumes the sample. It is therefore necessary to develop analytical techniques that can identify solvent residues in edible oils without these drawbacks.
Fourier-transform infrared (FTIR) spectroscopy is a method used to determine the structures of molecules from their characteristic absorption of infrared radiation and the resulting molecular vibrational spectra. Spectroscopy is regularly used for both the qualitative and quantitative analysis of agricultural and food products and presents an alternative to time-consuming, wet-chemical, and destructive techniques (Lim et al., 2017; Mo et al., 2017; Qin et al., 2017; Ning et al., 2018). A key advantage of this technique is its high-speed operation: a sample can be analysed in seconds, and the spectrometer simultaneously collects all light frequencies that are transmitted or reflected from the sample. FTIR spectroscopy measurements are also non-destructive, which makes them well suited to evaluating the quality of agricultural products and beverages.
FTIR spectroscopy has previously been combined with discriminant analysis and partial least-squares (PLS) analysis and has been successfully used to quantify adulterants, such as refined oils and different types of vegetable and nut oils, in extra virgin olive oil (Lai 1995; Marigheto et al., 1998; Kupper et al., 2001). IR studies of edible oils generally use specific absorbance bands to evaluate traditional indices and other parameters of interest in relation to the composition of edible oils (van de Voort et al., 1992; Che Man and Setiowaty, 1999; Che Man and Mirghani, 2000; Setiowaty et al., 2000). For example, a common fraud is the adulteration of Moroccan olive oil with other edible oils of lower commercial value (Flores et al., 2006). IR spectroscopy and PLS analysis have also been used to quantify the percentage of adulterants such as soybean oil, pure tea seed oil, and sunflower oil in virgin walnut oil (Liang et al., 2013).
Currently, a wide range of vibrational spectroscopic techniques in combination with chemometrics have shown potential as sensitive and rapid techniques for the authentication and quality analysis of a variety of food products. This study aimed to achieve quantitative detection of benzene residues in edible oil using FTIR spectroscopy. Specifically, we attempted to predict the benzene concentration in edible oils using FTIR spectroscopy with an integrated PLS model. We conducted spectral analysis of six concentration levels of benzene in edible oils: 0, 100, 200, 300, 400, and 500 mg/L. Based on the results, the different concentrations were identified and categorized using a multivariate analytical method.
Materials and Methods
Sample Preparation
Commercial samples of two different edible oils (cottonseed oil and sesame oil) were purchased from a market in South Korea. Benzene, which is used to extract cooking oil from seeds, was purchased from Sigma Aldrich (St. Louis, USA). The edible oil samples were spiked with benzene at various concentrations (100, 200, 300, 400, and 500 mg/L). The spiked oil samples were placed in snap-cap vials and shaken with a high-speed shaker (Vortex-Genie® 2, Scientific Industries, Inc., Model G560, USA) for over 2 min. Ten samples were used at each concentration level, including the pure (0 mg/L) oil; therefore, a total of 120 samples (60 samples for each edible oil) were measured by FTIR.
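As a quick check of the sample design described above, the following minimal sketch enumerates every combination of oil, concentration level, and replicate. The variable names are illustrative and are not taken from the original work.

```python
from itertools import product

oils = ["cottonseed", "sesame"]
concentrations_mg_per_l = [0, 100, 200, 300, 400, 500]  # 0 mg/L is the pure oil
replicates = 10

# One record per (oil, concentration, replicate) combination.
samples = [
    {"oil": oil, "benzene_mg_per_l": conc, "replicate": rep}
    for oil, conc, rep in product(oils, concentrations_mg_per_l, range(1, replicates + 1))
]

per_oil = sum(1 for s in samples if s["oil"] == "sesame")
print(len(samples), per_oil)  # prints: 120 60
```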
Spectral Measurements
The sample measurements were performed using a Nicolet 6700 FTIR spectrophotometer (Thermo Scientific Co., Madison, USA) configured with an attenuated total reflectance (ATR) accessory, a deuterated triglycine sulphate (DTGS) detector, and a KBr beam splitter, and controlled by OMNIC software. The spectra were measured separately for each sample between 4,000 and 650 cm-1, and a total of 32 successive scans were collected from each sample at 4 cm-1 intervals. For each measurement, a drop of oil was deposited on the surface of the diamond crystal sampling plate using a 1 mL syringe. After a sample was measured, the oil was removed with a dry tissue, and the surface was rinsed first with alcohol and then with water before the next sample was applied. Finally, the surface was dried with a clean tissue. Because benzene is a highly volatile compound, we believe that no traces remained on the ATR crystal after cleaning. Before the sample spectra were recorded, a background scan was obtained with an empty sample plate, once for the pure oil samples and once for the adulterated oil samples.
Data Pre-processing
Spectral data pre-processing is one of the most critical steps in a data mining process; it deals with the preparation and transformation of the initial dataset. Several pre-processing methods have been proposed to model the effect of multiplicative light scattering (Chen et al., 2006), among which multiplicative scatter correction (MSC) is widely used (Geladi et al., 1985). In our study, MSC was used to remove the non-linearity in the data caused by scattering from the samples. The MSC operation proceeds in two steps: estimation of the correction coefficients,
$$x_{org} = b_0 + b_{ref,1} \cdot x_{ref} + e \qquad (1)$$
and correction of the spectra,
$$x_{corr} = \frac{x_{org} - b_0}{b_{ref,1}} \qquad (2)$$
where $b_0$ and $b_{ref,1}$ are the correction coefficients, $e$ is the unmodelled part, $x_{org}$ is the original spectrum measured by the IR instrument, $x_{ref}$ is the reference spectrum (the average over a set of samples), and $x_{corr}$ is the corrected spectrum (Rinnan et al., 2009).
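The two MSC steps can be illustrated with a minimal NumPy sketch. It assumes that spectra are stored as rows and that the reference is the mean spectrum. The function name `msc` and the synthetic data are illustrative only and do not reproduce the processing pipeline used in this study.

```python
import numpy as np

def msc(spectra, reference=None):
    """Multiplicative scatter correction; rows are spectra, columns are wavenumbers."""
    spectra = np.asarray(spectra, dtype=float)
    if reference is None:
        reference = spectra.mean(axis=0)  # assumed reference: mean spectrum
    corrected = np.empty_like(spectra)
    for i, x_org in enumerate(spectra):
        # Step 1: least-squares fit of x_org = b0 + b1 * x_ref + e.
        b1, b0 = np.polyfit(reference, x_org, deg=1)
        # Step 2: remove the fitted offset and scaling.
        corrected[i] = (x_org - b0) / b1
    return corrected, reference

# Synthetic demonstration: one underlying band shape with different
# additive offsets and multiplicative scaling factors.
base = np.sin(np.linspace(0.0, 3.0, 50)) + 2.0
raw = np.vstack([0.1 * k + (1.0 + 0.2 * k) * base for k in range(4)])
corrected, ref = msc(raw)
```

Because each synthetic spectrum is an exact offset-and-scale copy of the same shape, all corrected spectra collapse onto the reference. Real spectra retain the chemical differences contained in the unmodelled part.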
Multivariate Analysis
Chemometrics and multivariate data analysis provide solutions to many problems in qualitative and quantitative analysis and are especially useful in adulteration and quality assessment of food products (Muick et al., 1998). Multivariate analysis is a collection of methods that can be used when several measurements are made on each object. Multivariate linear regression is an extension of multiple linear regression that models multiple responses (Jung and Park, 2015); it is concerned with data sets that have more than one response variable for each observational or experimental unit. A measurement of a given phenomenon can be stored in a univariate or multivariate variable y = (y1, y2, …, ym)T, where m is the number of independent variables (Ami et al., 2000).
Principal component analysis (PCA) and PLS are useful multivariate tools for spectral data analysis because of the quality of their calibration models and their ease of implementation (Goodarcre et al., 2003; Tapp et al., 2003; Wang et al., 2003). These methods generate components, each a linear combination of the original input variables, that serve as new input variables for multivariate data analysis and modeling (Yang et al., 2015). Generally, the first few transformed variables are sufficient to account for most of the variation (e.g., PCA) or to maximize the separability (e.g., PLS) of the whole data. In our data analysis, PCA was carried out on the MSC-processed FTIR spectra because it can be readily applied to spectroscopic data to reveal the nature and scattering level of the data. PCA is a well-known multivariate method that finds linear combinations of the variables with maximum variance. It transforms several possibly correlated variables into a smaller number of variables, the principal components (PCs) (Richardson, 2009). The PCs are orthogonal, and the first few (i.e., PC1, PC2, etc.) provide most of the information about the material. The linear combination created by the principal components can be expressed in the form:
$$PC = a_1 x_1 + a_2 x_2 + \dots + a_n x_n \qquad (3)$$
where PCA treats the peak positions as vectors ($x_1, x_2, \dots, x_n$) and forms a linear combination of the vectors by assigning a weight ($a_1, a_2, \dots, a_n$) to each vector in the spectra (Rusak et al., 2003). When the predictors are reduced to a smaller set of uncorrelated components, partial least-squares regression (PLSR) can be performed on these components rather than on the original data. PLSR is especially useful when the predictors are highly collinear or when there are more predictors than observations. It provides information about the correlation structures of the variables and about their structural similarities or dissimilarities. In this study, a PLSR model was developed on the preprocessed spectral data to predict the content of benzene residues in the edible oil samples. The PLSR equations are given as:
$$X = TP^{T} + E \qquad (4)$$
$$Y = UQ^{T} + F \qquad (5)$$
where the spectral data matrix X is decomposed into the score matrix T, loading matrix P, and error matrix E, and the reference value matrix Y is decomposed into the score matrix U, loading matrix Q, and error matrix F. The present study focuses mainly on constructing and selecting subsets of features that are useful for building a good predictor. This approach contrasts with the problem of finding or ranking all potentially relevant variables. Selecting the most relevant variables is usually suboptimal for building a predictor, particularly if the variables are redundant. Conversely, a subset of useful variables may exclude many redundant variables. The quality of the calibration model is described by the coefficient of determination (R2) and the standard error of prediction (SEP). The best calibration model for prediction was the one with the highest R2 and the lowest SEP.
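To make the decomposition and the two quality measures concrete, the sketch below fits a single-response PLS model with the NIPALS algorithm on synthetic data and evaluates R2 and SEP on held-out samples. It is an illustration under stated assumptions, not the MATLAB implementation used in this study: the function names are hypothetical, the data are simulated, and SEP is computed in its bias-corrected form (some authors report the root mean square error instead).

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model with the NIPALS algorithm."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean            # mean-centred working copies
    n_samples, n_vars = X.shape
    W = np.zeros((n_vars, n_components))       # weights
    P = np.zeros((n_vars, n_components))       # X loadings
    T = np.zeros((n_samples, n_components))    # X scores
    q = np.zeros(n_components)                 # y loadings
    for a in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)
        t = Xk @ w
        tt = t @ t
        p = Xk.T @ t / tt
        q[a] = yk @ t / tt
        Xk = Xk - np.outer(t, p)               # deflate X
        yk = yk - q[a] * t                     # deflate y
        W[:, a], P[:, a], T[:, a] = w, p, t
    beta = W @ np.linalg.solve(P.T @ W, q)     # regression vector
    return {"beta": beta, "x_mean": x_mean, "y_mean": y_mean,
            "W": W, "P": P, "T": T, "q": q}

def pls1_predict(model, X):
    return (np.asarray(X, dtype=float) - model["x_mean"]) @ model["beta"] + model["y_mean"]

def r2_and_sep(y_true, y_pred):
    """Coefficient of determination and bias-corrected standard error of prediction."""
    y_true = np.asarray(y_true, dtype=float)
    residuals = y_true - np.asarray(y_pred, dtype=float)
    r2 = 1.0 - np.sum(residuals**2) / np.sum((y_true - y_true.mean())**2)
    sep = np.sqrt(np.sum((residuals - residuals.mean())**2) / (residuals.size - 1))
    return r2, sep

# Synthetic data: 30 collinear "spectral" variables driven by 3 latent factors.
rng = np.random.default_rng(1)
latent = rng.normal(size=(40, 3))
X = latent @ rng.normal(size=(3, 30))
y = latent @ np.array([1.0, -2.0, 0.5]) + 0.01 * rng.normal(size=40)

model = pls1_fit(X[:24], y[:24], n_components=3)   # 6:4 calibration/validation split
r2_val, sep_val = r2_and_sep(y[24:], pls1_predict(model, X[24:]))
```

With three latent factors behind the simulated data, three PLS components recover the response almost exactly. Real spectra generally need the number of components to be tuned on validation error.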
Variable Selection
The variable importance in projection (VIP) accounts for the dependency among variables, which is considered a benefit of this multivariate filter method. VIP measures how much a variable contributes to the description of both the dependent or reference data (Y) and the independent or spectral variables (X) (Lohumi et al., 2015). The VIP score of variable j is calculated by the following equation:
$$VIP_j = \sqrt{\frac{J \sum_{f=1}^{F} W_{jf}^{2} \, SSY_f}{SSY_{total}}} \qquad (6)$$
where $W_{jf}$ is the weight value for component f of variable j, $SSY_f$ is the sum of squares of explained variance for the fth component, J is the number of variables, $SSY_{total}$ is the total sum of squares for the dependent variable, and F is the total number of components. A variable with a higher VIP score is more relevant to the prediction of the response variable. Normally, the average of the squared VIP values is equal to 1 (Cho et al., 2002). A VIP value greater than 1 is therefore often used as the cut-off criterion for variable selection (Lazraq et al., 2003; Chong and Jun, 2005).
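A minimal sketch of a VIP calculation is given below. It assumes that the weights W, scores T, and y loadings q come from an already fitted single-response PLS model; random placeholder matrices stand in for them here. It also assumes the normalization in which each weight vector is scaled to unit length and the total is the sum of the per-component explained sums of squares, so that the mean squared VIP equals 1. Variants that additionally divide by the number of components differ from this one only by a constant factor. The function name is illustrative.

```python
import numpy as np

def vip_scores(W, T, q):
    """VIP scores from PLS weights W (J x F), scores T (n x F) and y loadings q (F,)."""
    W = np.asarray(W, dtype=float)
    T = np.asarray(T, dtype=float)
    q = np.asarray(q, dtype=float)
    n_vars = W.shape[0]
    ssy = q**2 * np.sum(T**2, axis=0)                   # explained sum of squares per component
    w_sq = (W / np.linalg.norm(W, axis=0))**2           # squared unit-length weights
    return np.sqrt(n_vars * (w_sq @ ssy) / ssy.sum())

# Placeholder model matrices (hypothetical values, not results from this study).
rng = np.random.default_rng(0)
W = rng.normal(size=(6, 2))
T = rng.normal(size=(10, 2))
q = np.array([0.8, 0.3])

vip = vip_scores(W, T, q)
selected = vip > 1.0    # common cut-off; a stricter threshold can be substituted
```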
The selectivity ratio (SR) denotes the ratio between the explained and the residual (unexplained) variance for each variable in the target projection (Farres et al., 2015). A high value denotes a variable with good predictive performance (Anderssen et al., 2006). This method essentially visualizes the important variables of a multivariate data set in the prediction of a property (Rajalahti et al., 2009a). The target projection model that calculates the explained and residual variance for each variable can be written as:
$$X = t_{TP} \, p_{TP}^{T} + E_{TP} \qquad (7)$$
where $t_{TP}$ is the target-projection score, $p_{TP}$ is the target-projection loading, and $E_{TP}$ is the target-projection residual (Lohumi et al., 2015). All multivariate analyses were performed using MATLAB software version 7.0.4 (The Mathworks, Natick, USA).
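The selectivity ratio can be sketched as follows, assuming the usual target-projection construction in which the normalized regression vector defines the target direction. The regression vector and the three-variable data set below are hypothetical placeholders chosen so that two variables carry the signal and one is pure noise.

```python
import numpy as np

def selectivity_ratio(X, beta):
    """Explained-to-residual variance ratio per variable after target projection."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    w_tp = np.asarray(beta, dtype=float) / np.linalg.norm(beta)  # target direction
    t_tp = Xc @ w_tp                                   # target-projection scores
    p_tp = Xc.T @ t_tp / (t_tp @ t_tp)                 # target-projection loadings
    X_tp = np.outer(t_tp, p_tp)                        # part of X explained by the target
    E_tp = Xc - X_tp                                   # residual part
    return np.sum(X_tp**2, axis=0) / np.sum(E_tp**2, axis=0)

rng = np.random.default_rng(2)
signal = rng.normal(size=50)
X = np.column_stack([
    signal + 0.1 * rng.normal(size=50),   # informative variable
    signal + 0.1 * rng.normal(size=50),   # informative variable
    rng.normal(size=50),                  # noise variable
])
beta = np.array([0.5, 0.5, 0.0])          # hypothetical regression vector

sr = selectivity_ratio(X, beta)
```

The two informative variables receive a much larger ratio than the noise variable, which is the behaviour a threshold on SR exploits.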
Results and Discussion
Spectral Exploration
The raw FTIR spectra of both sesame and cottonseed oils are shown in Fig. 1. The raw spectra revealed peaks in both the fingerprint and functional group regions for both oils. However, in the 4000 - 3156 cm-1 part of the functional group region, which is assigned to the hydroxyl stretching band, no notable differences in intensity were observed among the spectra of oils with different benzene concentrations; therefore, we discarded this region. The absorption in the 2700 - 3000 cm-1 region is associated with methylene stretching (McMullin et al., 2015), and a small peak in the 2435 - 2246 cm-1 region was caused by background CO2. The variations in these spectral regions were not attributed to changes in the sample composition. Therefore, only the remaining spectral region was used for further data analysis to minimize the influence of these regions on model development.
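The region selection can be sketched as a boolean mask over the wavenumber axis. The example assumes that only the two regions explicitly described as discarded or affected by background CO2 are excluded; the text does not state whether the methylene stretching region was also removed. The 1 cm-1 grid is illustrative and does not correspond to the instrument's actual data spacing.

```python
import numpy as np

# Regions excluded from modelling (cm-1): hydroxyl stretching band and background CO2.
EXCLUDED_REGIONS_CM1 = [(3156.0, 4000.0), (2246.0, 2435.0)]

def keep_mask(wavenumbers, excluded=EXCLUDED_REGIONS_CM1):
    """Return True for wavenumbers that lie outside every excluded region."""
    wavenumbers = np.asarray(wavenumbers, dtype=float)
    mask = np.ones(wavenumbers.shape, dtype=bool)
    for low, high in excluded:
        mask &= ~((wavenumbers >= low) & (wavenumbers <= high))
    return mask

wavenumbers = np.arange(650.0, 4001.0, 1.0)   # illustrative 1 cm-1 grid
mask = keep_mask(wavenumbers)
kept = wavenumbers[mask]
# A spectra matrix with one row per sample would be reduced with spectra[:, mask].
```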
The peak at ~ 3009 cm-1 was assigned to the C=CH (cis) stretching vibration mode, and the band at ~ 1742 cm-1 was related to the stretching mode of –C=O bonds in ester groups, which are found in samples with a high content of saturated fatty acids (Lerma-Garcia et al., 2010; Rohman and Man, 2010). The peaks at ~ 1461 cm-1 and ~ 1378 cm-1 represented the –C–H (CH2) bending (scissoring) mode of vibration and the –C–H (CH3) symmetric bending mode, respectively. In the fingerprint region, the peaks at 1235 and 1161 cm-1, and at 1118 and 1098 cm-1, were related to the C–O (ester) and C–O stretching modes of vibration. Trans –CH=CH– out-of-plane bending peaks were observed at ~ 964, 914, 871, and 844 cm-1, while the peak at ~ 721 cm-1 is related to the –C=O stretching mode. The functional groups and vibrational modes in the FTIR spectra of the edible oils were similar to those reported previously (Lerma-Garcia et al., 2010; Rohman and Man, 2010).
Principal Component Analysis Interpretation
The selected FTIR data were preprocessed using the MSC method before the multivariate data analysis, and PCA was then applied to the edible oil data to check for both possible outliers and natural data groupings. The purpose of PCA is to concentrate the sources of variability in the data into the first few PCs by decomposing the data matrix (Hori and Sugiyama, 2002). The score plot is a projection of the data onto a subspace and is used to interpret the relations between observations. In Fig. 2, the representative points of the sesame oil (Fig. 2a) and cottonseed oil (Fig. 2b) samples are mapped in the space spanned by the first two principal components. The resulting scatter plots of the PC scores showed one outlier from each oil group (marked with a blue box in Fig. 2). The score plots also showed reasonable clustering of the different concentrations of benzene added to both edible oils.
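The decomposition behind such score plots can be sketched with a singular value decomposition of the mean-centred data. This is a generic illustration on random data, not the analysis performed in this study; the function name and the data are placeholders.

```python
import numpy as np

def pca(X, n_components):
    """PCA by SVD of the mean-centred data; returns scores, loadings, explained variance ratio."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]    # coordinates used in a score plot
    loadings = Vt[:n_components].T                      # variable weights of each PC
    explained = s**2 / np.sum(s**2)
    return scores, loadings, explained[:n_components]

# Placeholder data: 20 samples, 8 variables, one dominant direction of variation.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 8))
X[:, 0] += np.linspace(-5.0, 5.0, 20)

scores, loadings, explained = pca(X, n_components=3)
```

Plotting the first two columns of `scores` against each other gives the kind of score plot used to look for groupings and outlying samples.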
Further, we attempted to interpret the loadings of the first three PCs (PC1, PC2, and PC3), which together explained about 98% of the total variance, in terms of chemical composition. The loadings give the correlation between a component and a variable and thus estimate the information they share. These plots can be used to identify which variables have the largest effect on each component and are therefore helpful for characterizing each component in terms of the variables. The loading of PC1 (Fig. 3a) shows a small peak at around 3000 cm-1, a distinct peak at around 1500 cm-1, and an upward trend at the end. In addition, PC2 (Fig. 3b) shows two strong negative peaks at ~ 2000 cm-1 and a small positive peak at ~ 1600 cm-1, while PC3 (Fig. 3c) shows a negative peak in the same region (1600 cm-1) caused by the variation in benzene concentration among the samples, as shown in Fig. 3d.
PLSR Model
A PLSR model was employed to predict the concentration of benzene added to the edible oils. After the two outlier samples were discarded (sesame oil: 400 mg/L; cottonseed oil: 0 mg/L), the samples were categorized based on the adulterant concentration. The remaining 118 samples from both edible oils were divided into calibration (70 samples) and validation (48 samples) sets at a ratio of approximately 6:4 to evaluate the accuracy of the model. The PLSR model was developed using the MSC-preprocessed spectra of the pure and adulterated oil samples. In the multivariate analysis, two data sets were used for calibration, X (independent variables, i.e., spectral data) and Y (dependent variable, i.e., added benzene concentration), which were regressed to develop the prediction model. The validation set, which was not used in model development, was used to test the predictive accuracy of the developed model.
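The split sizes follow directly from the stated ratio, as the short sketch below shows. It assumes a simple random split with a fixed seed; how the samples were actually assigned to the two sets is not described, so the helper is purely illustrative.

```python
import random

def split_indices(n_samples, cal_fraction=0.6, seed=0):
    """Randomly split sample indices into calibration and validation sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_cal = int(cal_fraction * n_samples)   # 0.6 * 118 = 70.8, truncated to 70
    return sorted(indices[:n_cal]), sorted(indices[n_cal:])

cal_idx, val_idx = split_indices(118)
print(len(cal_idx), len(val_idx))  # prints: 70 48
```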
The calibration model gave a very high coefficient of determination (R2) of 0.99 with a standard error of calibration (SEC) of 15.1 mg/L. For prediction, the R2 and SEP were 0.95 and 38.5 mg/L, respectively (Table 1). Fig. 4a shows the actual and predicted concentrations of benzene in edible oils obtained with the PLSR model for the validation set. The optimal number of factors was determined from the lowest root mean square error (RMSE) in the validation process, and seven factors were selected.
Two model-based variable selection methods, VIP and SR, were applied to the PLS results to select the optimal variables. The VIP measure accounts for the dependency among variables, which is a key benefit of this multivariate filter method, whereas SR is used to avoid model overfitting and to improve predictive performance. SR is usually applied to filter out irrelevant variables (Kvalheim and Karstang, 1989; Rajalahti et al., 2009b). With threshold values of 1.2 for VIP and 0.03 for SR, we selected a total of 166 and 141 variables, respectively. PLSR models were then developed with the variables selected by each method. A summary of the results is shown in Table 1. The values in Table 1 suggest that the models developed with the selected variables afforded either higher or similar prediction accuracy compared with the PLS model developed with all corrected variables. The highest coefficient of determination for prediction (Rp2 = 0.96), with a standard error of prediction (SEP) of 33.7 mg/L, was obtained using the VIP variable selection method. Fig. 4b and 4c show the excellent prediction ability of the PLSR models developed with the selected variables.
The variables selected by VIP and SR are shown against the spectrum of pure benzene in Fig. 3d. Most of the variables selected by the VIP method were related to the benzene-sensitive bands, whereas the variables selected by SR differed from those selected by VIP. A simple visual comparison of the variables selected by the two methods suggested that the VIP-selected variables agreed better with the pure benzene spectrum than the SR-selected variables. The better performance of VIP could be because SR is limited by the need to choose a reliable threshold for assessing the significance of a discriminating variable. The variables selected by VIP and SR were 15.9% and 13.5% of the total variables, respectively. Visual inspection of the beta coefficients from the PLSR model (Fig. 5a) showed that certain peaks within certain spectral regions were important for differentiating between pure and adulterated oil samples. These distinct peaks are influenced by the differing benzene concentrations among the samples, whereas minor peaks can be caused by the spectral variations between the two different kinds of edible oil.
Residual plots, which show the residuals against the corresponding fitted values or the explanatory variables, have been widely used to diagnose the structure of a regression model, such as the number and type of variables, the inclusion or exclusion of interaction terms, and the need for higher-order or non-linear terms (Kutner et al., 2008). An increasing trend in the residual plot suggests that the error variance increases with the independent variable, while a decreasing trend indicates that the error variance decreases with the independent variable. In addition, a plot of the standardized residuals against the predicted values is useful for detecting violations of linearity (Stevens, 2009). Fig. 5b shows the residuals plotted against the sample number to examine the relationship between the different concentrations of benzene and the values predicted by the whole-variable, PLS-VIP, and PLS-SR models for the edible oil samples. The residuals show an identical pattern for the whole-variable and PLS-VIP models, both of which give negative values at low concentrations. Compared with these two models, PLS-SR yielded more negative residuals at all concentrations, which slightly changed the random pattern of the residuals. Overall, the residuals indicate agreement between the actual and predicted benzene concentrations and a reasonable fit for a linear model.
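The quantities shown in such a residual plot can be computed as in the following sketch. It assumes that a residual is defined as the actual minus the predicted value and that standardization divides by the sample standard deviation of the residuals. The predicted values are invented for illustration and are not results from this study.

```python
import numpy as np

def residual_summary(actual, predicted):
    """Raw residuals (actual - predicted) and residuals scaled to unit standard deviation."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    residuals = actual - predicted
    standardized = residuals / residuals.std(ddof=1)
    return residuals, standardized

actual = np.array([0, 100, 200, 300, 400, 500], dtype=float)      # mg/L
predicted = np.array([12, 95, 210, 290, 405, 488], dtype=float)   # made-up predictions

residuals, standardized = residual_summary(actual, predicted)
```

Under this sign convention, a negative residual means that the model over-predicts the concentration of that sample.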
In this study, FTIR spectroscopy combined with PLS multivariate analysis was demonstrated to be capable of detecting trace amounts of benzene in edible oils. Variable selection methods were additionally adopted to select the informative variables and avoid model overfitting, and they improved the accuracy of the PLS model. The selected variables were also chemically plausible, as they showed distinct peaks in the same spectral regions as the pure benzene spectrum. The results of this study indicate that specific FTIR spectral regions are effective for the determination of benzene traces in edible oils. Our approach highlights that FTIR spectroscopy is a rapid technique that can be performed with no sample preparation and thus has the potential to be an effective analytical tool for the detection of benzene traces in a variety of edible oils.
Acknowledgment
This work was supported by the Korea Institute of Planning and Evaluation for Technology in Food, Agriculture, Forestry and Fisheries (IPET) through the Agriculture, Food and Rural Affairs Research Center Support Program, funded by the Ministry of Agriculture, Food and Rural Affairs (MAFRA), Republic of Korea (No. 717001071WT111).
Authors Information
Ritu Joshi, Chungnam National University Department of Biosystems Machinery Engineering College of Agricultural and Life Science, Ph.D. student
Byoung-Kwan Cho, https://orcid.org/0000-0002-8397-9853
Santosh Lohumi, Chungnam National University Department of Biosystems Machinery Engineering College of Agricultural and Life Science, Postdoctoral researcher
Rahul Joshi, Chungnam National University Department of Biosystems Machinery Engineering College of Agricultural and Life Science, Ph.D. student
Jayoung Lee, Chungnam National University Department of Biosystems Machinery Engineering College of Agricultural and Life Science, Master student
Hoonsoo Lee, https://orcid.org/0000-0001-8074-4234
Changyeun Mo, https://orcid.org/0000-0002-9088-5978