

A structural weight estimation model of FPSO topsides using an improved genetic programming method

Article  in  Ships and Offshore Structures · November 2015


DOI: 10.1080/17445302.2015.1099246



Ships and Offshore Structures, 2017
Vol. 12, No. 1, 43–55, http://dx.doi.org/10.1080/17445302.2015.1099246

A structural weight estimation model of FPSO topsides using an improved genetic programming method

Sol Ha(a), Tae-Sub Um(b), Myung-Il Roh(c,*) and Hyun-Kyoung Shin(d)

(a) Department of Ocean Engineering, Mokpo National University, Muan-gun, Republic of Korea; (b) Maritime Research Institute, Hyundai Heavy Industries Co., Ltd., Ulsan, Republic of Korea; (c) Department of Naval Architecture and Ocean Engineering & Research Institute of Marine Systems Engineering, Seoul National University, Seoul, Republic of Korea; (d) School of Naval Architecture and Ocean Engineering, University of Ulsan, Ulsan, Republic of Korea
(Received 2 June 2015; accepted 21 September 2015)

The weight of an FPSO (floating production, storage, and offloading) plant is one of the important data needed to estimate the amount of production material (e.g., plates) and to determine a suitable production method for its construction. In addition, the weight is a key factor that affects the building cost and the production period of the FPSO plant. Although the importance of the weight has long been recognised, the weight, especially of the topsides, has been only roughly estimated using existing similar data and the designer's experience. To improve this task, a weight estimation model for FPSO plant topsides was developed in this study using an improved genetic programming (GP) method. For this purpose, various past records on the weight of FPSO plants were collected through a literature survey, and the weight estimation model using GP was then established by fixing the independent variables based on these data. In addition, correlation analysis was performed to make up for a weak point of genetic programming, which is prone to overfitting when the number of data is small relative to the number of independent variables. That is, by reducing the number of variables through the analysis of the correlation between the independent variables, an effect similar to an increase in the number of weight data can be expected. Finally, to evaluate the applicability of the suggested model, it was applied to an example of the weight estimation of FPSO plant topsides. A comparison with the results of the multiple nonlinear regression analysis conducted in the previous study showed that the suggested model can be applied to the weight estimation process of the FPSO plant at the early design stage.
Keywords: weight estimation; topsides weight; FPSO plant; genetic programming; correlation analysis; optimisation;
statistics

1. Introduction

The engineering of FPSO (floating production, storage, and offloading) plants is divided into two phases: the front-end engineering design (FEED) phase and the detailed engineering phase (Hwang 2013). Of these two phases, the FEED phase is more critical for determining the feasibility of specific well area development. An economic analysis of the development of a specific well area is performed based on the outputs of the FEED phase. The final outputs of the FEED phase are the weight, the total costs, and the layout of an offshore plant.

It is essential to accurately estimate the weight of the FPSO plant topsides at the FEED phase in terms of both cost management and performance satisfaction. Accurately estimating the topsides weight of the FPSO plant as early as possible is critical in controlling the costs and schedules of building these facilities. Furthermore, weight estimation of the FPSO plant topsides is necessary to provide the information required for hull structural design, to estimate the equipment to be built and the amount of material that needs to be procured, to manage the stability of the platform, and to estimate the total cost and construction period of the project. If the topsides weight can be accurately estimated at the FEED phase, the weight can be efficiently controlled, and the material cost can be kept stable. Figure 1 shows the concept of weight control in FPSO plant design.

It is not easy, however, to accurately estimate the weight of the FPSO plant topsides at the early design stage. In particular, when parent or similar FPSO plants are not available, it is necessary to choose a reliable method for weight estimation. Thus, this study proposes a method that can be used to develop a weight estimation model for FPSO plant topsides at the early design stage. Many approaches, such as optimisation and statistical methods, can be used for this purpose. In the authors' previous study (Ha et al. 2015), a simplified model was proposed for the weight estimation of FPSO plant topsides using the statistical method. That study used variable transformation to consider a nonlinear form of independent variables, and it used correlation analysis and multiple regression analysis to generate


*Corresponding author. Email: miroh@snu.ac.kr

© 2015 Informa UK Limited, trading as Taylor & Francis Group

a simplified model for the weight estimation of FPSO plant topsides. This method has the advantage of obtaining the result in a short time, but the quality of the estimated weight might be low compared with the actual data. Therefore, this study proposes a new method involving the use of both the optimisation and statistical methods to solve the accuracy and time consumption issues.

Figure 1. Concept of weight control in FPSO plant design.

2. Related studies

There are many methods of estimating the weight of a system such as a ship or an offshore plant. Among these, the methods based on the particulars of the system (particulars-based methods) represent a simple approach based on the assumption that the weight of a system is related to the particulars of the system, such as the length, height, volume, and density. An example of this approach is the volumetric density method, which estimates the detailed weight of a weight group by multiplying the space volume by the bulk factor (density). Bolding (2001) used this bulk factor to estimate the weight of an FPSO plant's topsides.

The parametric method represents the weight with several parameters, which is an essential prerequisite of the subsequent reasoning. The weight of the hull structure, for instance, can be estimated as L^1.6 (B + D) (Lee et al. 2001), and the tubular weight for carbon steel in a wind farm is defined as W = 24,660(D − n)nL (Kaiser and Snyder 2014).

These methods, however, are based on the domain knowledge of the target system, and as such, it is not easy to determine which parameters the system weight depends on. The statistical or optimisation method can help solve this problem. It can be used to develop a weight equation from the analysis of various past records, and then to estimate the weight using the equation. It helps in estimating the relationships among the variables of a system, and includes many techniques for modelling and analysing several variables.

Based on the statistical method, some studies related to the development of models for weight estimation have been conducted in the naval architecture and ocean engineering fields. Koike and Minoura (2011) applied a statistical method to predict ship performance using onboard measurement data. Hussin et al. (2012) presented a systematic methodology for analysing the maintenance data of an offshore system to gain insights into the system reliability performance and to identify the critical factors influencing the performance based on the statistical method. Kaiser and Snyder (2013) developed a linear regression model to predict the rig weight using the hull length and breadth (width), water depth capability, designer, environmental class (harsh vs. moderate), and building years as predictor variables. Ha et al. (2015) applied nonlinear multiple regression analysis to the weight estimation of the FPSO plant topsides.

If an engineer uses the statistical method, she/he will obtain the result in a short time, but if she/he does not have the needed past records of the target system, the estimated data can be affected by some errors compared with the actual data. The genetic programming (GP) approach can address this weakness and can thus help improve the estimation result.

Some researchers have applied the GP approach to the domains of naval architecture and ocean engineering. Gaur and Deo (2008) used GP to forecast waves in real time. They used the samples for a period of 15 years, and also tested the samples for the last 5-year period. Charhate et al. (2009) used GP to forecast offshore wind in real time. They suggested that the GP approach can be used to predict the wind speed and direction at two offshore locations along the west coast of India over the future time steps of 3–24 hours based on a sequence of past wind measurements made by floating buoys.

All of these studies using GP claimed that they could come up with a better estimation model for each domain, but it is time-consuming to obtain a proper model, depending on the GP options, such as the population, independent variables, and generations. Thus, this study combined correlation analysis as a statistical method with the GP approach to reduce the number of independent variables, and the resulting approach showed good effects on the calculation time.

3. Weight estimation model using GP

In this study, a weight estimation model was developed using GP. To improve the calculation time, correlation analysis was performed prior to GP. In this chapter, a brief description of each of these analyses is given.

3.1. Overview

Figure 2 shows an overview of the weight estimation model proposed in this study. First, the past records of FPSO plants were inputted, and the initial variables for weight estimation were selected. Correlation analysis was performed to

Figure 2. Overview of the weight estimation model using GP. (This figure is available in colour online.)

choose the dependent and independent variables based on any statistical relationship. Here, correlation refers to any of a broad class of statistical relationships involving dependence. Correlation analysis reduces the number of independent variables and thus shortens the calculation duration of GP. Next, analysis was again done, this time using GP. In this step, GP was performed with the variables chosen in the correlation analysis. It uses an evolutionary algorithm-based methodology with operators such as crossover, mutation, and reproduction. Finally, the output model for weight estimation was obtained. The following sections will describe each of the aforementioned steps in detail.

A computational program for generating a model for weight estimation was developed in this study through the procedure described in Figure 2. Compared with the conventional program, the program developed in this study can perform the procedure in Figure 2 automatically, and can also derive the model for weight estimation. The program has the functions for correlation and GP, and as such, it is necessary to verify its results. For this reason, the results of the correlation analysis in this program were compared with the results obtained from Microsoft Excel, a well-known conventional program. The applications to FPSO plant topsides, which will be discussed in the next chapter, were also verified by comparing the results of the nonlinear multiple regression analysis performed in the previous study (Ha et al. 2015).

3.2. Genetic programming

GP was developed by Koza (1992, 1994), with the original idea inspired by evolution to automatically develop computer programs without programming them. Essentially, GP is a set of instructions and a fitness function for measuring how well a computer has performed a task. It is a machine learning technique used to optimise a population of computer programs according to a fitness landscape determined by a program's ability to perform a given computational task (Banzhaf et al. 1997).

GP is a specialised genetic algorithm (GA), where each individual is a computer program, and as such, it has many features in common with GA. The main difference between GP and GA is the representation of chromosomes. Table 1 shows the difference between the two in this regard. While GA uses fixed-length-string-based chromosomes, GP uses tree-based chromosomes with variable sizes and shapes. Its tree-based representation makes GP flexible, but unfortunately, it is not very efficient. Thus, GA is used for the task

Table 1. Difference between genetic algorithms and genetic programming.

Algorithm       Genetic algorithms (e.g., binary-string coding)   Genetic programming
Expression      Binary string of 0 and 1                          Function
                String                                            Tree
                Fixed length                                      Variable length
Main operator   Crossover                                         Crossover
Structure       1010110010101011                                  (tree diagram)

of optimising parameters for solutions when their structure is known, while GP is more often used to learn and discover both the contents and structures of solutions. It has produced many novel and outstanding results in areas such as quantum computing (Spector et al. 1998), electronic design (Koza et al. 1997), game playing (Alhejali and Lucas 2013), sorting (Wagner et al. 2015), and searching (Vidal et al. 2012) due to the improvements in the GP technology and the exponential growth of computing power.

The main genetic operators in GP are reproduction, crossover, and mutation, which are similar to those in GA. Figure 3 shows the GP cycle using these operators. They change subtrees in the chromosomes. For instance, the crossover operator exchanges the subtrees of two chromosomes if they can be attached to the opposite tree. The genetic operators in GP change not only the values in the tree but also the structure of the tree. As such, compared to GA, GP has many operators, which affect the individuals in GP more diversely.

3.3. GP and correlation analysis

GP is a domain-independent problem solving method, similar to GA. The fact that these stochastic, genetically inspired algorithms perform a global search and are robust can be regarded as both their advantage and their disadvantage, depending on the type of problem being solved (Takač 2003). GP does not know the domain of the problem to be solved and may thus generate an overfitted solution. Similarly, one of the disadvantages of GP would be the time required to find a solution. The efficiency of the evaluation function greatly impacts the efficiency of the whole algorithm, and therefore also the application of GP. For this reason, it is important to implement fast evaluation of individuals. To reduce the overfitting problem and to improve the performance of GP, this study used correlation analysis for statistical analysis, which can check the dependence among the variables and can reduce the number of independent variables.

In the field of statistics, dependence means any statistical relationship between two random variables or two sets of data, and correlation refers to any of a broad class of statistical relationships involving dependence. There are several correlation coefficients (often denoted as r) that measure the degree of correlation. The most common of these is the Pearson correlation coefficient, which is sensitive only to a linear relationship between two variables (which may exist even if one is a nonlinear function of the other). It is obtained by dividing the covariance of the two variables by the product of their standard deviations, as in Equation (1).

$$
r = \frac{n \sum X_i Y_i - \sum X_i \sum Y_i}{\sqrt{n \sum X_i^2 - \left(\sum X_i\right)^2} \cdot \sqrt{n \sum Y_i^2 - \left(\sum Y_i\right)^2}}. \tag{1}
$$

In Equation (1), r is the Pearson correlation coefficient, X_i is an independent variable, Y_i is the dependent variable, and n is the number of data.

The correlation coefficient between two variables indicates the degree of the said variables' correlation with each other. Table 2 shows the relationship between variables according to the correlation coefficient.

In statistical significance testing, the p-value is the probability of obtaining a test statistic at least as extreme as the one that was actually observed, assuming that the null hypothesis is true (Goodman 1999). A researcher will often

Figure 3. Cycle of GP and its main operators: a, reproduction; b, crossover; and c, mutation.

reject the null hypothesis when the p-value turns out to be lower than a certain significance level, often 0.05 or 0.01 (Dallal 2012). Such a result indicates that the observed result would be highly unlikely under the null hypothesis. To calculate the p-value, the following formula can be used:

$$
p\text{-value} = \operatorname{erfc}\left( \frac{1}{2} \left| \ln \frac{1 + r}{1 - r} \right| \cdot \frac{\sqrt{n - 3}}{\sqrt{2}} \right). \tag{2}
$$

In Equation (2), erfc refers to the complementary error function.

Table 2. Relationship between variables according to the correlation coefficient.

Correlation coefficient (absolute value)   Relationship
1.0–0.7                                    Strong relation
0.69–0.4                                   Moderate relation
0.39–0.2                                   Weak relation
0.19–0.0                                   No relation

4. Application of the weight estimation model to FPSO plant topsides

To examine the applicability of the proposed method for developing the weight estimation model, it was applied to examples of the weight estimation of the topsides of an FPSO plant.

4.1. Collection of past records

In this study, various past records for estimating the weight of an FPSO plant were collected through a literature survey (Kerneur 2010; Clarkson 2012). Table 3 shows such records.

In Table 3, L, B, D, T, DWT, S_C, O_P, G_P, W_P, CREW, W_D, and T_LWT are the length, breadth, depth, draft, deadweight, storage capacity, oil production capacity, gas production capacity, water production capacity, complements, well depth, and light weight of the topsides (simply, topsides weight), respectively. N/C refers to whether the FPSO was newly built or was converted from other ships or offshore structures, and TM/SM indicates whether the FPSO's mooring type is turret mooring or spread mooring. MMBBL means million barrels; MMBOPD refers to million barrels of oil per day; MMCFPD is million cubic feet per day; and MMBWPD refers to million barrels of water per day.

To make the simplified model for this FPSO example, 37 records in Table 3 were used as the sample data (training set), and the others (test set) were used as the validation data for testing the applicability of the model.

4.2. Selection of initial variables

From Table 3, 11 initial variables for estimating the topsides weight (T_LWT) were selected. As independent variables, the principal dimensions (L, B, D, T, and DWT), capacities (S_C, O_P, G_P, and W_P), and others (CREW and W_D) were initially selected, and the dependent variable to be estimated was the topsides weight (T_LWT).

4.3. Generation of weight estimation model using GP

In this section, the generation of a weight estimation model for FPSO plant topsides through GP will be presented.
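As a rough illustration of the tree-based chromosomes and the crossover operator described in Section 3.2, the following Python sketch grows random expression trees from a function set and a terminal set, evaluates a tree for one record, and swaps subtrees between two parents. It is a minimal sketch under stated assumptions, not the program developed in this study: the nested-tuple encoding, the guards on division, square root, and exponential, the leaf probabilities, and every name (random_tree, evaluate, crossover, and so on) are hypothetical choices made here for illustration. The record holds the values of the Captain FPSO from Table 3.

```python
import math
import random

# Function set: symbol -> (arity, implementation). The guards on division,
# square root, and exp are assumptions so that any tree can be evaluated.
FUNCTIONS = {
    "+": (2, lambda a, b: a + b),
    "-": (2, lambda a, b: a - b),
    "*": (2, lambda a, b: a * b),
    "/": (2, lambda a, b: a / b if abs(b) > 1e-12 else 1.0),
    "sin": (1, math.sin),
    "cos": (1, math.cos),
    "exp": (1, lambda a: math.exp(min(a, 50.0))),
    "sqrt": (1, lambda a: math.sqrt(abs(a))),
}
# Independent variables only; the target T_LWT is not used as an input here.
TERMINALS = ["L", "B", "D", "T", "DWT", "SC", "OP", "GP", "WP", "CREW", "WD"]


def random_tree(max_depth, rng):
    """Grow a random tree no deeper than max_depth (a leaf has depth 1)."""
    if max_depth <= 1 or rng.random() < 0.3:
        if rng.random() < 0.7:
            return rng.choice(TERMINALS)      # variable leaf
        return rng.uniform(-20.0, 20.0)       # constant leaf in [-20, 20]
    symbol = rng.choice(list(FUNCTIONS))
    arity = FUNCTIONS[symbol][0]
    children = tuple(random_tree(max_depth - 1, rng) for _ in range(arity))
    return (symbol,) + children


def depth(tree):
    """Number of levels in the tree."""
    if isinstance(tree, tuple):
        return 1 + max(depth(child) for child in tree[1:])
    return 1


def evaluate(tree, record):
    """Evaluate a tree for one record (a dict of variable values)."""
    if isinstance(tree, tuple):
        args = [evaluate(child, record) for child in tree[1:]]
        return FUNCTIONS[tree[0]][1](*args)
    if isinstance(tree, str):
        return record[tree]
    return tree


def subtrees(tree, path=()):
    """Yield (path, subtree) pairs; a path is a tuple of child indices."""
    yield path, tree
    if isinstance(tree, tuple):
        for i in range(1, len(tree)):
            yield from subtrees(tree[i], path + (i,))


def replace_at(tree, path, new):
    """Return a copy of tree with the subtree at path replaced by new."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace_at(tree[i], path[1:], new),) + tree[i + 1:]


def crossover(parent_a, parent_b, rng):
    """Swap one randomly chosen subtree between the two parents."""
    path_a, sub_a = rng.choice(list(subtrees(parent_a)))
    path_b, sub_b = rng.choice(list(subtrees(parent_b)))
    return replace_at(parent_a, path_a, sub_b), replace_at(parent_b, path_b, sub_a)


captain = {"L": 214.7, "B": 38.0, "D": 23.7, "T": 18.0, "DWT": 88326.0,
           "SC": 0.56, "OP": 0.06, "GP": 12.0, "WP": 0.25, "CREW": 50.0,
           "WD": 105.0}
hand_built = ("+", ("*", "D", "SC"), 10.0)    # D * SC + 10
print(evaluate(hand_built, captain))

rng = random.Random(0)
a, b = random_tree(4, rng), random_tree(4, rng)
child_a, child_b = crossover(a, b, rng)
print(depth(a), depth(b), depth(child_a), depth(child_b))
```

Note that this crossover does not enforce the maximum tree depth, so an offspring can be deeper than its parents. A fuller implementation would reject or trim such offspring and would also guard the evaluation against non-finite intermediate values.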

Table 3. Principal particulars of FPSOs.

FPSO N/C TM/SM Year L (m) B (m) D (m) T (m) DWT (ton) S_C (MMBBL) O_P (MMBOPD) G_P (MMCFD) W_P (MMBWPD) CREW (person) W_D (m) T_LWT (ton)

Captain N TM 1996 214.7 38 23.7 18 88,326 0.56 0.06 12 0.25 50 105 6200
Balder N TM 1996 211.1 36 20.8 14 88,420 0.38 0.083 39 0.085 60 127 3700
Bleo Holm N TM 1997 242 42 21.2 14.9 119,000 0.69 0.1 58 0.135 80 110 9000
Petrojarl Varg N TM 1998 214.7 38.2 22.2 16 60,000 0.42 0.057 53 0.057 77 84 2500
Jotun A N TM 1999 232 41.5 23.5 16 92,800 0.58 0.089 38 0.122 60 126 6000
Northern Endeavour N TM 1999 273 50 28 18.8 180,000 1.4 0.003 35 0.174 84 366 11,000
Asgard A N TM 1999 276.4 45 26.6 19.7 105,000 0.94 0.22 840 0.063 116 320 15,000
Terra Nova N TM 2000 292.2 45.5 28.2 20 120,000 0.96 0.125 300 0.115 120 95 32,000
Girassol N SM 2001 300 59.6 30.5 22.8 343,000 2 0.27 280 0.18 140 1400 23,500
Kizomba A N SM 2004 285 60 32.3 24.4 340,660 2.2 0.25 400 0.525 100 1180 23,000
Kizomba B N SM 2005 285 60 32.3 24.4 340,660 2.2 0.25 400 0.525 100 1010 23,000
Erha N SM 2005 296 63 32.3 24 375,600 2.2 0.21 340 0.15 100 1220 30,000
Dalia N SM 2005 312.4 60 33.2 24.3 329,000 2 0.24 280 0.265 190 1365 30,000
Bonga N SM 2005 305.1 58 32 23.4 312,500 2 0.225 170 0.1 70 1030 22,000
Belanak Natuna N SM 2005 285 58 26 16.7 210,000 1.14 0.1 500 0.06 120 100 25,000
Nganhurra N TM 2006 260 46 25.8 18.5 142,000 0.9 0.1 80 0.1 80 390 8000
Greater Plutonio N SM 2006 319 58 31 23.4 360,000 1.77 0.24 400 0.45 120 1310 24,000
Agbami N SM 2007 320 58.4 32 24 337,859 2.2 0.25 450 0.12 100 1462 35,000
Akpo N SM 2008 310 61 30.5 23.5 321,000 2 0.225 530 0.42 240 1325 37,000
Usan N SM 2011 320 61 32 24.7 353,200 2 0.18 176.6 0.1 180 750 27,700

Petrojarl Foilnaven C TM 1988/1996 250.2 34 19.1 12.8 43,276 0.28 0.14 100 0.12 70 450 4500
Haewene Brim C TM 1996/1997 253 42 23.2 15 103,000 0.6 0.07 110 0.022 55 85 5000
Kuito C SM 1979/1999 334.9 43.7 27.7 21.4 228,033 1.5 0.1 52 0.02 76 383 3500
Perintis C TM 1984/1999 245.4 39.6 20.6 14.7 94,238 0.65 0.035 100 0.018 85 75 1900
Fluminense C TM 1974/2003 362 60 28.3 23 356,400 1.3 0.081 75 0.05 100 700 4500
Searose C TM 1999/2004 271.8 46 26.6 18 150,000 0.94 0.13 150 0.18 80 120 12,000
Mystras C SM 1976/2004 271 44 22.4 17 138,900 1.04 0.08 85 0.032 100 70 5500
Petrobras 43 C SM 1975/2004 337 54.5 27 21 273,191 2 0.15 162 0.2 100 785 14,000
Petrobras 48 C SM 1973/2005 328.6 54.5 27 21 273,622 1 0.15 210 0.251 194 1035 14,000
Global Producer 3 C TM 1999/2006 217.2 38 23 17 85,943 0.45 0.1 75 0.3 90 113 6000
Maersk Ngujjima-Yin C TM 1999/2008 332.9 58 31 22.7 308,492 1.2 0.12 100 0.2 80 420 7000
Cidade de Sao Mateus C SM 1989/2009 322.1 56 29.5 19.8 276,000 0.7 0.035 353 0.01 85 790 20,000
Maersk Peregrino C TM 2008/2010 345.2 58 31 20.6 277,450 1.6 0.1 13 0.023 100 100 6500
Petrobras 57 C SM 1988/2010 318.8 56 29.5 19.8 255,271 1.6 0.18 71 0.232 110 1260 14,500
Pazflor N SM 2011 325 61 32.5 25.6 320,000 1.9 0.22 150 0.382 240 800 32,000
CLOV N SM 2013 305 61 32 24 350,000 2 0.22 250 0.319 240 1200 37,478
Petrobras 63 C SM 1983/2010 346.3 57.3 28.5 22.9 322,911 2 0.14 35 0.325 46 1200 14,000
Skarv N TM 2010 295 50.6 29 19.9 128,000 0.95 0.085 671 0.02 126 350 16,000
OSX 1 N TM 2010 271.7 46 26.6 18.2 148,192 0.95 0.08 53 0.06 89 134 12,000
Glas Dowr C TM 1995/2010 242.3 42 21.1 14.9 105,000 0.66 0.06 85 0.065 96 350 4500

The 37 records in Table 3 were used for generating the model. The results of the generated model could change according to the optimisation parameters of GP, such as the population size and the number of generations. Thus, in this study, case studies were performed and suitable values for each parameter were selected. Table 4 shows examples of the parameters that can be used for developing a weight estimation model for FPSO plant topsides. The maximum depth means the maximum tree depth of the chromosomes in the tree structure. As for the range of constants, only the constants within this range can be chosen during the process.

Table 4. Examples of parameters for developing a weight estimation model for FPSO plant topsides using GP.

Parameter               Value
Function                +, −, ×, ÷, sin (sine function), cos (cosine function), exp (exponential), √ (square root)
Terminal (variables)    L, B, D, T, DWT, S_C, O_P, G_P, W_P, CREW, W_D, T_LWT
Population size         100
Maximum generation      10,000
Reproduction rate       0.1
Crossover probability   0.8
Mutation probability    0.1
Maximum depth           4
Range of constants      [−20, 20]

To determine how accurately the estimation model estimates the dependent variable, T_LWT in this example, the fitness of each generation was also calculated during the calculation. This study used root mean square error (RMSE), which is generally used in statistics, to check the fitness of each generation.

4.3.1. Case study

To choose the proper value for each parameter of GP, some case studies were performed. In this section, an example of such case studies, where the effect of the maximum tree depth on the fitness of the final solution was checked, will be introduced.

The case studies were performed using the parameter values in Table 5. All the values, except that of the maximum depth, were the same for all the cases.

Table 5. Three cases of parameters for selecting the maximum tree depth.

Parameter               Value
Function                +, −, ×, ÷, sin, cos, exp, √
Terminal (variables)    L, B, D, T, DWT, S_C, O_P, G_P, W_P, CREW, W_D, T_LWT
Population size         100
Maximum generation      1000
Reproduction rate       0.05
Crossover probability   0.85
Mutation probability    0.1
Maximum depth           Case 1: 4/Case 2: 5/Case 3: 6/Case 4: 7
Range of constants      [−20, 20]

Figure 4 and Table 6 show the results of the case study for choosing the maximum tree depth. Although these results are indicated as being from the final model, they are actually from the intermediate model, from which the final model was obtained through short optimisation. The results show that both the RMSE and the variation were improved when the maximum tree depth increased. Both the training and test sets showed this tendency.

If the maximum tree depth increases, however, the final model for weight estimation may have a more complicated mathematical expression. The expression for each case is shown in Equations (3)–(6). The expressions were generated by simplifying the final chromosome in tree structure, and as such, they could have some constants outside the range of −20 to 20. Thus, it is thought that the proper maximum tree depth is 6, as a compromise between the fitness and the complexity of the expression.


TLW T = 2.655 · D · SC · cos WP GP
· (D − 15.88) + 5158, (3)

 √
TLW T = 2.452 SC · GP · eT · cos WP + 4505, (4)

$$
T_{LWT} = 0.01182 \left[ D^4 - S_C (23.22 L + DWT \cdot W_P) + B \cdot S_C (T + W_P)(G_P + CREW) \right] + 552.8, \tag{5}
$$

√  2GP
TLW T = 46, 100 − 4647 D + D · CREW − 12.36 · CREW + SC · GP − 20.8 + eSC + − 133.8. (6)
WD

4.3.2. Final model generated by GP

The optimisation parameters for GP, which were derived from the case studies, are summarised in Table 4. For the case of the maximum tree depth, its value was set to 6, as shown in the previous case study. All the values of the parameters were also chosen by performing case studies for each parameter.

Figure 5 shows the results of GP for developing a weight estimation model for FPSO plant topsides. At this time, full optimisation was performed with GP, as opposed to the short optimisation for the case studies. As shown in Figure 5a, the best fitness decreased continuously during the optimisation. The RMSE of the training set with 37 data in Table 3 was 1971.28, and its variation was 96.79%. Additionally, the RMSE of the test set with three data in

Table 6. Results of the case study for selecting the maximum tree depth.

Item                                    Case 1 (maximum    Case 2 (maximum    Case 3 (maximum    Case 4 (maximum
                                        tree depth: 4)     tree depth: 5)     tree depth: 6)     tree depth: 7)
Training set (37 records)   Error       4968.20            4416.08            3802.56            3339.75
                            Variation   79.60%             83.22%             88.05%             90.78%
Computation time                        67 seconds         76 seconds         83 seconds         98 seconds

Figure 4. Results of the case study for selecting the maximum tree depth: (a) maximum tree depth = 4; (b) maximum tree depth = 5; (c) maximum tree depth = 6; and (d) maximum tree depth = 7. Each panel plots the log of the best fitness against the generation (0 to 1000). Panel titles: (a) calculation time 67 s, best fitness 4968.2028 found at generation 655; (b) calculation time 76 s, best fitness 4416.0838 found at generation 946; (c) calculation time 83 s, best fitness 3802.5627 found at generation 963; (d) calculation time 98 s, best fitness 3339.7512 found at generation 962.
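The RMSE used as the fitness measure in the case study can be sketched in a few lines of Python. This is an illustrative sketch only: the function name rmse and the predicted values are hypothetical and invented here, while the three actual weights are the first three T_LWT entries of Table 3. The last line takes the natural logarithm of the best fitness reported for case 1, on the assumption (suggested by the axis range of the figure, not stated in the text) that "Log Fitness" denotes the natural log.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between actual and predicted weights."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Actual topsides weights (ton) of the first three records in Table 3.
actual = [6200.0, 3700.0, 9000.0]
# Hypothetical model outputs, invented here purely for illustration.
predicted = [5900.0, 4300.0, 8100.0]

error = rmse(actual, predicted)
print(round(error, 2), round(math.log(error), 3))

# Natural log of the best fitness reported for case 1 (assumed log base e).
print(round(math.log(4968.2028), 3))
```

The "variation" percentages reported alongside the RMSE are not reproduced here because the text does not give their definition.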

Table 3 was 1859.92, and its variation was 84.78%. The final expression of the weight estimation model is

√

T 24.13·CREW
B · SC GP (B − D) B − D + SC + CREW + 4.439Wcos
P
+ WD sin D
0.0121
TLW T =  − 5256. (7)
B
0.0101 · OP + D − sin B

The final model has many variables, and as such, it is suspected that this equation has an overfitting problem,

Figure 5. Results of GP for developing a weight estimation model for FPSO plant topsides.

which generally occurred in GP, as mentioned in Section 4.4.2. Final model generated by GP
3.3. Moreover, it is very difficult to understand the relation From the correlation analysis, nine variables are selected for
between the nine variables (B, D, T , SC , OP , GP , WP , GP. Their parameter values were the same as those in the
CREW , and WD ) and the dependent variable TLW T . previous section. The results of the GP including correlation
analysis are shown in Figure 6.
4.4. Generation of weight estimation model using GP with correlation analysis
In the previous section, the final expression of the weight estimation model generated by GP had too many independent variables for estimating the dependent variable TLWT, and thus, the model may have an overfitting problem. This problem arises when GP is applied without regard to the dependencies among the variables. Thus, this study adopted correlation analysis, a statistical technique that measures the relationship between two variables.

4.4.1. Correlation analysis
Correlation analysis was performed on the independent variables and the dependent variable TLWT. Table 7 shows the results of the correlation analysis. The results in the table were verified by comparing them with the results obtained from Microsoft Excel.
As shown in Table 7, all the independent variables, except L and WP, were selected. The selection criteria were the following: (1) a correlation coefficient (r) of over 0.5; and (2) a p-value of less than 0.15. The usual criterion for the p-value is that it should be less than 0.05 or 0.01 (Dallal 2012). A higher threshold was used here, however, so that the number of independent variables to be included in the weight estimation model would not be too small in this example.

As shown in Figure 6a, the best fitness decreased continuously during the optimisation. The RMSE of the training set with 37 records in Table 3 was 2258.85, and its variation explained was 95.78%. Additionally, the RMSE of the test set with three records in Table 3 was 2014.85, and its variation explained was 82.13%. The final expression of the weight estimation model is

$$
\begin{aligned}
\mathit{TLWT} = {} & 2824 - 0.1489 \\
& \times \left( \frac{2\,\mathit{GP}}{\sin D} - \frac{0.3158 \cdot \mathit{DWT}}{\mathit{GP}} + (D - 17.63)(12.99B - D + \mathit{GP}) \right) \\
& \times \left( \cos \mathit{SC} - \sqrt{\mathit{CREW}} - \sin B + \sin D + \frac{\sin(\mathit{OP} \cdot \mathit{WD})}{0.7049\,\mathit{SC} + \sin \mathit{WD}} \right).
\end{aligned}
\tag{8}
$$

This model consists of eight independent variables for estimating the dependent variable TLWT.

4.4.3. Comparison with the results of GP without correlation analysis
Compared with the results of the GP without correlation analysis, Equation (8) has fewer variables for expressing the dependent variable TLWT. Table 8 compares the results of the two GP methods. It shows that GP with correlation analysis is somewhat less accurate than GP alone, but the difference between the two GP methods is acceptable when the shorter calculation time is considered.
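The trade-off summarised in Table 8 can be put in relative terms with a few lines of arithmetic. The sketch below only re-expresses the tabulated values; the helper name `relative_change` is an illustrative choice and not part of the original study.

```python
# Values transcribed from Table 8: (GP alone, GP with correlation analysis).
table8 = {
    "training RMSE": (1971.28, 2258.85),
    "test RMSE": (1859.92, 2014.85),
    "computation time [s]": (2519.0, 2045.0),
}


def relative_change(baseline: float, value: float) -> float:
    """Signed change of `value` relative to `baseline`, as a fraction."""
    return (value - baseline) / baseline


# Positive values mean the variant with correlation analysis is larger.
changes = {
    name: relative_change(gp_only, gp_corr)
    for name, (gp_only, gp_corr) in table8.items()
}

for name, change in changes.items():
    print(f"{name}: {change:+.1%}")
```

On these figures, adding the correlation filter raises the RMSE by roughly 15% on the training set and 8% on the test set, while cutting the computation time by roughly 19%.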

Table 7. Results of the correlation analysis of the independent variables and of the topsides weight of all the FPSOs.

Item  Statistic  L     B     D     T     DWT   SC    OP    GP    WP    CREW  WD    TLWT    Criteria
L     r          1.00  0.80  0.73  0.75  0.81  0.69  0.39  0.20  0.14  0.39  0.59  0.4158  X
L     p-Value    –     0.00  0.00  0.00  0.00  0.00  0.03  0.30  0.47  0.03  0.00  0.0218  O
B     r          0.80  1.00  0.91  0.88  0.96  0.87  0.62  0.40  0.42  0.55  0.77  0.7176  O
B     p-Value    0.00  –     0.00  0.00  0.00  0.00  0.00  0.03  0.02  0.00  0.00  0.0000  O
D     r          0.73  0.91  1.00  0.95  0.90  0.88  0.68  0.40  0.45  0.51  0.74  0.7448  O
D     p-Value    0.00  0.00  –     0.00  0.00  0.00  0.00  0.03  0.01  0.00  0.00  0.0000  O
T     r          0.75  0.88  0.95  1.00  0.92  0.89  0.71  0.37  0.53  0.55  0.78  0.7089  O
T     p-Value    0.00  0.00  0.00  –     0.00  0.00  0.00  0.04  0.00  0.00  0.00  0.0000  O
DWT   r          0.81  0.96  0.90  0.92  1.00  0.90  0.65  0.31  0.45  0.49  0.83  0.6561  O
DWT   p-Value    0.00  0.00  0.00  0.00  –     0.00  0.00  0.09  0.01  0.01  0.00  0.0001  O
SC    r          0.69  0.87  0.88  0.89  0.90  1.00  0.75  0.35  0.51  0.47  0.80  0.7104  O
SC    p-Value    0.00  0.00  0.00  0.00  0.00  –     0.00  0.06  0.00  0.01  0.00  0.0000  O
OP    r          0.39  0.62  0.68  0.71  0.65  0.75  1.00  0.59  0.60  0.53  0.78  0.7341  O
OP    p-Value    0.03  0.00  0.00  0.00  0.00  0.00  –     0.00  0.00  0.00  0.00  0.0000  O
GP    r          0.20  0.40  0.40  0.37  0.31  0.35  0.59  1.00  0.25  0.39  0.40  0.6330  O
GP    p-Value    0.30  0.03  0.03  0.04  0.09  0.06  0.00  –     0.18  0.03  0.03  0.0002  O
WP    r          0.14  0.42  0.45  0.53  0.45  0.51  0.60  0.25  1.00  0.40  0.56  0.4648  X
WP    p-Value    0.47  0.02  0.01  0.00  0.01  0.00  0.00  0.18  –     0.03  0.00  0.0093  O
CREW  r          0.39  0.55  0.51  0.55  0.49  0.47  0.53  0.39  0.40  1.00  0.48  0.7032  O
CREW  p-Value    0.03  0.00  0.00  0.00  0.01  0.01  0.00  0.03  0.03  –     0.01  0.0000  O
WD    r          0.59  0.77  0.74  0.78  0.83  0.80  0.78  0.40  0.56  0.48  1.00  0.6893  O
WD    p-Value    0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.03  0.00  0.01  –     0.0000  O
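The selection rule applied to the TLWT column of Table 7 (r over 0.5 and p-value below 0.15) can be sketched as a simple filter. This is a minimal illustration using the tabulated values; the names `corr_with_tlwt`, `R_MIN`, `P_MAX`, and `passes` are assumptions introduced here, and the sketch does not reproduce how the authors computed r and p.

```python
# (r, p-value) of each candidate variable against TLWT,
# transcribed from the TLWT column of Table 7.
corr_with_tlwt = {
    "L": (0.4158, 0.0218),
    "B": (0.7176, 0.0000),
    "D": (0.7448, 0.0000),
    "T": (0.7089, 0.0000),
    "DWT": (0.6561, 0.0001),
    "SC": (0.7104, 0.0000),
    "OP": (0.7341, 0.0000),
    "GP": (0.6330, 0.0002),
    "WP": (0.4648, 0.0093),
    "CREW": (0.7032, 0.0000),
    "WD": (0.6893, 0.0000),
}

R_MIN = 0.5   # criterion (1): correlation coefficient over 0.5
P_MAX = 0.15  # criterion (2): p-value below 0.15


def passes(r: float, p: float) -> bool:
    """True when a variable satisfies both selection criteria."""
    return r > R_MIN and p < P_MAX


selected = [name for name, (r, p) in corr_with_tlwt.items() if passes(r, p)]
rejected = [name for name, (r, p) in corr_with_tlwt.items() if not passes(r, p)]

print("selected:", selected)
print("rejected:", rejected)
```

Only L and WP fail, both on the r criterion, which matches the X marks in the Criteria column. Nine variables pass the filter; they form the candidate set offered to GP, which need not use all of them in the evolved expression.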

Table 8. Comparison of the results of the two genetic programming methods.

Item                           Genetic programming   Genetic programming with correlation analysis
No. of independent variables   9                     8
Training set   Error (RMSE)    1971.28               2258.85
Training set   Variation       96.79%                95.78%
Test set       Error (RMSE)    1859.92               2014.85
Test set       Variation       84.78%                82.13%
Computation time               2519 seconds          2045 seconds

Figure 6. Results of GP with correlation analysis for developing a weight estimation model for FPSO plant topsides.

Table 9. Difference between the actual and estimated weights of the FPSO topsides – 37 records.

                                  Estimated by GP with      Estimated by
                                  correlation analysis      nonlinear model
                      Actual      Weight   Difference       Weight   Difference
Name                  weight (A)  (B)      ((A − B)/A)      (C)      ((A − C)/A)
Captain               6200        3474     0.44             4138     0.33
Balder                3700        3586     0.03             3288     0.11
Bleo Holm             9000        4918     0.45             4355     0.52
Petrojarl Varg        2500        5010     −1.00            4478     −0.79
Jotun A               6000        5910     0.02             4683     0.22
Northern Endeavour    11,000      9323     0.15             9793     0.11
Asgard A              15,000      15,390   −0.03            25,402   −0.69
Terra Nova            32,000      32,760   −0.02            15,348   0.52
Girassol              23,500      23,394   0.00             23,018   0.02
Kizomba A             23,000      29,065   −0.26            27,510   −0.20
Kizomba B             23,000      23,363   −0.02            27,510   −0.20
Erha                  30,000      29,209   0.03             26,238   0.13
Dalia                 30,000      33,189   −0.11            29,794   0.01
Bonga                 22,000      21,771   0.01             19,770   0.10
Belanak Natuna        25,000      24,315   0.03             18,416   0.26
Nganhurra             8000        8586     −0.07            7636     0.05
Greater Plutonio      24,000      26,295   −0.10            23,260   0.03
Agbami                35,000      29,025   0.17             28,263   0.19
Akpo                  37,000      37,222   −0.01            39,247   −0.06
Usan                  27,700      27,458   0.01             25,324   0.09
Petrojarl Foinaven    4500        4547     −0.01            4012     0.11
Haewene Brim          5000        6362     −0.27            6013     −0.20
Kuito                 3500        7872     −1.25            10,226   −1.92
Perintis              1900        3166     −0.67            5047     −1.66
Fluminense            4500        4877     −0.08            10,884   −1.42
Searose               12,000      10,683   0.11             9738     0.19
Mystras               5500        6029     −0.10            6548     −0.19
Petrobras 43          14,000      13,784   0.02             15,925   −0.14
Petrobras 48          14,000      12,968   0.07             18,113   −0.29
Global Producer 3     6000        7837     −0.31            5630     0.06
Maersk Ngujima-Yin    7000        7034     0.00             12,949   −0.85
Cidade de Sao Mateus  20,000      19,994   0.00             16,055   0.20
Maersk Peregrino      6500        8381     −0.29            13,293   −1.05
Petrobras 57          14,500      14,869   −0.03            13,488   0.07
Pazflor               32,000      30,212   0.06             32,336   −0.01
CLOV                  37,478      34,763   0.07             34,763   0.07
Petrobras 63          14,000      9340     0.33             13,487   0.04
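The Difference columns of Table 9 are signed relative errors, (A − B)/A and (A − C)/A. As a hedged sketch, the summary statistics quoted for the 37 records can be approximated from the tabulated weights by taking the mean of the signed differences, their sample standard deviation, and the largest absolute difference. Reading the reported "average difference" as the magnitude of the signed mean and the reported "COV" as the sample standard deviation is an inference from the numbers, not something the authors state; the helper names are illustrative.

```python
from statistics import mean, stdev

# (name, actual A, GP estimate B, nonlinear estimate C), transcribed from Table 9.
records = [
    ("Captain", 6200, 3474, 4138),
    ("Balder", 3700, 3586, 3288),
    ("Bleo Holm", 9000, 4918, 4355),
    ("Petrojarl Varg", 2500, 5010, 4478),
    ("Jotun A", 6000, 5910, 4683),
    ("Northern Endeavour", 11000, 9323, 9793),
    ("Asgard A", 15000, 15390, 25402),
    ("Terra Nova", 32000, 32760, 15348),
    ("Girassol", 23500, 23394, 23018),
    ("Kizomba A", 23000, 29065, 27510),
    ("Kizomba B", 23000, 23363, 27510),
    ("Erha", 30000, 29209, 26238),
    ("Dalia", 30000, 33189, 29794),
    ("Bonga", 22000, 21771, 19770),
    ("Belanak Natuna", 25000, 24315, 18416),
    ("Nganhurra", 8000, 8586, 7636),
    ("Greater Plutonio", 24000, 26295, 23260),
    ("Agbami", 35000, 29025, 28263),
    ("Akpo", 37000, 37222, 39247),
    ("Usan", 27700, 27458, 25324),
    ("Petrojarl Foinaven", 4500, 4547, 4012),
    ("Haewene Brim", 5000, 6362, 6013),
    ("Kuito", 3500, 7872, 10226),
    ("Perintis", 1900, 3166, 5047),
    ("Fluminense", 4500, 4877, 10884),
    ("Searose", 12000, 10683, 9738),
    ("Mystras", 5500, 6029, 6548),
    ("Petrobras 43", 14000, 13784, 15925),
    ("Petrobras 48", 14000, 12968, 18113),
    ("Global Producer 3", 6000, 7837, 5630),
    ("Maersk Ngujima-Yin", 7000, 7034, 12949),
    ("Cidade de Sao Mateus", 20000, 19994, 16055),
    ("Maersk Peregrino", 6500, 8381, 13293),
    ("Petrobras 57", 14500, 14869, 13488),
    ("Pazflor", 32000, 30212, 32336),
    ("CLOV", 37478, 34763, 34763),
    ("Petrobras 63", 14000, 9340, 13487),
]


def relative_differences(rows, column):
    """Signed relative error (A - estimate) / A for the chosen estimate column."""
    return [(row[1] - row[column]) / row[1] for row in rows]


def summarise(diffs):
    """Mean, sample standard deviation and largest magnitude of the differences."""
    return {
        "mean": mean(diffs),
        "spread": stdev(diffs),
        "max_abs": max(abs(d) for d in diffs),
    }


gp_summary = summarise(relative_differences(records, 2))
nonlinear_summary = summarise(relative_differences(records, 3))

for label, summary in (("GP", gp_summary), ("nonlinear", nonlinear_summary)):
    print(label, {key: round(value, 3) for key, value in summary.items()})
```

Under this reading the values land close to the figures quoted in the text (about 7% and 0.32 for the GP-based model, about 17% and 0.57 for the nonlinear model, with maxima of about 125% and 192%). Because positive and negative errors cancel in a signed mean, the mean of the absolute differences would be a stricter measure of typical error.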

4.5. Comparison with the statistical method
In the authors' previous study (Ha et al. 2015), a simplified model for the weight estimation of FPSO plant topsides was developed using the statistical method. In this section, to confirm the usability of the proposed method, its results are compared with those of the statistical method, specifically nonlinear multiple regression analysis.

4.5.1. Nonlinear form of weight estimation model analysed using the statistical method
In brief, the nonlinear multiple regression analysis in the previous study consisted of three analysis steps: variable transformation, correlation analysis, and multiple regression analysis. The whole procedure of nonlinear multiple regression analysis was introduced in detail in the authors' previous study (Ha et al. 2015). Nonlinear multiple regression analysis was performed with the 37 training records presented in Table 3. The final nonlinear form of the weight estimation model is

$$
\mathit{TLWT} = -764.8522 + 0.3304\,D^{3} + 720.6451\,\mathit{SC}^{3} + 21.2018\,\mathit{GP} + 0.0009863\,\mathit{CREW}^{3}. \tag{9}
$$

The final model in Equation (9) shows that the topsides weight of the FPSO example can be represented as a nonlinear relationship among four independent variables (D, SC, GP, and CREW). It also satisfied the F-test and t-test criteria. In addition, the adjusted R² of the final regression model is 0.797. The adjusted R² is a statistic that gives

Table 10. Differences between the actual and estimated weights of the FPSO topsides – three unused records.

                         Estimated by GP with      Estimated by
                         correlation analysis      nonlinear model
             Actual      Weight   Difference       Weight   Difference
Name         weight (A)  (B)      ((A − B)/A)      (C)      ((A − C)/A)
Skarv        16,000      18,313   −0.14            24,111   −0.51
OSX 1        12,000      9560     0.20             7891     0.34
Glas Dowr    4500        5434     −0.21            5221     −0.16

some information about the goodness of fit of a model. An adjusted R² value close to 1.0 indicates that the regression line fits the data well.

4.5.2. Comparison of the results of the two methods
Using the two final models in Equations (8) and (9), the actual and estimated weights of the FPSO topsides for 37 records were compared. Table 9 shows the difference between the actual and estimated weights for the 37 records. The average difference between the actual and estimated weights of the FPSO topsides according to the GP-based model was 7.08%, the coefficient of variation (COV) was 0.324, and the variation explained (R²) was 95.78%. For the statistical-method-based model, the average difference was 16.89%; the COV, 0.566; and the R², 81.93%. The GP-based model thus yielded a smaller average difference and a smaller COV than the statistical-method-based model. The maximum difference according to GP was 125%, and that according to the statistical method was 192%.
For the validation of the two models, the actual and estimated weights of the FPSO topsides were also compared for the three unused records that form the test set, namely the last records shown in Table 3. Table 10 shows the differences between the actual and estimated weights of the FPSO topsides for the three unused records.
The average difference obtained by the GP-based model was 4.96%; the COV, 0.221; and the R², 86.34%. For the statistical-method-based model, the average difference was 10.83%; the COV, 0.427; and the R², 71.19%. From this it can be seen that GP with correlation analysis can yield a better model than the statistical method.

5. Conclusions and future studies
In spite of the importance of the topsides weight in the design of an FPSO plant, it has been roughly estimated using the existing similar data and based on the designer's experience. To solve this problem, a weight estimation model for FPSO plant topsides was developed in this study using an optimisation method, specifically genetic programming (GP). Various past records of FPSO plants were first collected through a literature survey, and then analysis using GP was performed to develop a weight estimation model for FPSO plant topsides. To improve the computation time and to overcome the overfitting problem, correlation analysis was adopted in this study. A comparative test for the models based on GP and nonlinear multiple regression analysis was also performed. As a result, the GP-based model showed a better estimation capacity than the nonlinear multiple regression analysis-based model in terms of accuracy. Finally, to evaluate the applicability of the developed models, they were applied to an FPSO example. The results showed that the developed models can be used to estimate the topsides weight of future FPSOs.
Furthermore, the overall performances of the developed models were shown to depend on the past records collected through literature survey. Thus, if there is noise or wrong information in the past records, the applicability and reliability of the models can be reduced. In addition, the engineering meaning of the developed models should be further investigated, and a parametric test for some variables used in the method should be performed to identify the impact of such variables on the developed models. Finally, in the future, the database of past FPSO records will be continuously updated and corrected, and the developed models will be improved through their application to various examples.

Disclosure statement
No potential conflict of interest was reported by the authors.

Funding
This work was partially supported by Global Leading Technology Program of the Office of Strategic R&D Planning (OSP) funded by the Ministry of Trade, Industry & Energy, Korea [10042556-2012-11]; New & Renewable Energy of the Korea Institute of Energy Technology Evaluation and Planning (KETEP) funded by the Ministry of Trade, Industry & Energy, Korea [number 20124030200110]; BK21 Plus Program (Education and Research Center for Creative Offshore Plant Engineers) funded by the Ministry of Education, Korea; Engineering Research Institute of Seoul National University, Korea; and Research Institute of Marine Systems Engineering of Seoul National University, Korea.

References

Alhejali AM, Lucas SM. 2013. Using genetic programming to evolve heuristics for a Monte Carlo Tree Search Ms Pac-Man agent. Paper presented at: 2013 IEEE conference on Computational Intelligence in Games (CIG); Niagara Falls, ON, Canada.
Banzhaf W, Nordin P, Keller RE, Francone FD. 1997. Genetic programming: an introduction: on the automatic evolution of computer programs and its applications (The Morgan Kaufmann Series in Artificial Intelligence). 1st ed. San Francisco (CA): Morgan Kaufmann.
Bolding A. 2001. Bulk factor method estimates FPSO: topsides weight. Oil Gas J. 99:49–53.
Charhate SB, Deo MC, Londhe SN. 2009. Genetic programming for real-time prediction of offshore wind. Ships Offshore Struct. 4:77–88.
Clarkson. 2012. The mobile offshore production units register 2012. 10th ed. London (UK): Clarkson.
Dallal GE. 2012. The little handbook of statistical practice [Internet]. [cited 2012 December 31]. Available from: http://www.jerrydallal.com/LHSP/LHSP.HTM
Gaur S, Deo MC. 2008. Real-time wave forecasting using genetic programming. Ocean Eng. 35:1166–1172.
Goodman SN. 1999. Toward evidence-based medical statistics. 1: the p value fallacy. Ann Int Med. 130:995–1004.
Ha S, Seo SH, Roh MI, Shin HK. 2015. Simplified nonlinear model for the weight estimation of FPSO plant topside using the statistical method. Ships Offshore Struct. doi:10.1080/17445302.2015.1038870.
Hwang JH. 2013. Selection of optimal liquefaction process system considering offshore module layout for LNG FPSO at FEED stage [Ph.D. thesis]. [Seoul (Korea)]: Seoul National University.
Hussin H, Hashim FM, Mokhtar AA. 2012. Systematic approach to maintainability analysis at operational phase. J Appl Sci. 12:2562–2567.
Kaiser MJ, Snyder BF. 2013. Empirical models of jackup rig lightship displacement. Ships Offshore Struct. 8:468–476.
Kaiser MJ, Snyder BF. 2014. Offshore wind structure weight algorithms. Ships Offshore Struct. 9:551–556.
Kerneur J. 2010. 2010 Worldwide survey of FPSO units. Houston (TX): Offshore Magazine.
Koike K, Minoura M. 2011. Application of a statistical prediction method of ship performance by using onboard measurement data. J Jpn Soc Nav Arch Ocean Eng. 13:51–58.
Koza JR. 1992. Genetic programming: on the programming of computers by means of natural selection. Cambridge (MA): MIT Press.
Koza JR. 1994. Genetic programming. II. Automatic discovery of reusable programs. Cambridge (MA): MIT Press.
Koza JR, Bennett FH, Andre D, Keane MA, Dunlap F. 1997. Automated synthesis of analog electrical circuits by means of genetic programming. IEEE Trans Evol Comput. 1:109–128.
Lee KY, Roh MI, Cho SH. 2001. Multidisciplinary design optimization of mechanical systems using collaborative optimization approach. Int J Veh Des. 25:353–368.
Spector L, Barnum H, Bernstein HJ, Swamy N. 1998. Genetic programming for quantum computers. Genet Program. 365–373.
Takač A. 2003. Genetic programming in data mining: cellular approach [Master's thesis]. [Bratislava (Slovakia)]: Physics and Informatics Comenius University.
Vidal T, Crainic TG, Gendreau M, Lahrichi N, Rei W. 2012. A hybrid genetic algorithm for multidepot and periodic vehicle routing problems. Oper Res. 60:611–624.
Wagner M, Neumann F, Urli T. 2015. On the performance of different genetic programming approaches for the SORTING problem. Evol Comput. doi:10.1162/EVCO_a_00149.
