
University of Waterloo UAE

Process Data Analysis CHE 220 Probability and Statistics CIVE 224
Mazda Biglari
Fall 2011

Introduction
CivE 224: Probability and Statistics ChE 220: Process Data Analysis

Better title: The collection, analysis and interpretation of data for decision making in chemical/civil engineering.

Introduction
Historical origin of statistics: making and dealing with measurements related to the state, such as population, number of children, income, etc.
In general, statistics deals with the collection and analysis of data to maximize the information obtained.
Why do engineers need to study statistics?

The Engineering/Scientific Method

Develop a clear description

Identify the important factors

Propose or refine a model

Manipulate the model

Confirm the solution

Conclusions and recommendations

Conduct experiments

Loop

An important aid to this method is STATISTICS, because the field of statistics deals with the gathering, analysis, presentation, and use of data.
An engineer is someone who solves problems of interest to society by the efficient application of scientific principles.

Examples


All technical, managerial and financial decisions made by engineers depend upon the interpretation of data:
• Interpreting the results of experiments conducted in laboratories
• Using operating plant data for optimization, troubleshooting or process control
• Analyzing density and compressive strength of concrete or timber
• Analyzing air quality data to determine if a plant is in compliance
• Implementing statistical quality control schemes
It is essential that engineers be exposed to statistical reasoning early in their training.

Example: Polypropylene Production




Polymer produced by connecting propylene units to form a long chain:

Example: Polypropylene Production




The size of the chain is determined by the molecular weight, which can be inferred from melt flow.
Applications:
• Thermoplastics used in automotive parts such as battery cases, child safety seats, etc.
• Fibres for carpets, rope and cords
• Films for packaging
• Medical devices such as syringes, implants, etc.
• Blow-molded bottles
Different applications require different melt indices, so melt index is a control variable.

Stem and Leaf Diagrams




Typically melt index is measured every hour:

1.91, 1.85, 1.93, 1.73, 1.87, 1.96, 1.75, 2.11, 2.21, 1.81,
1.72, 1.99, 2.01, 1.88, 1.93, 1.95, 2.13, 2.25, 1.98, 1.82

Stem | Leaf
1.7  | 235
1.8  | 12578
1.9  | 1335689
2.0  | 1
2.1  | 13
2.2  | 15
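The diagram above can be reproduced with a few lines of code. The following is a minimal sketch in Python, assuming the first decimal place is taken as the stem and the second as the leaf; the function name `stem_and_leaf` is illustrative and not part of the course material.

```python
from collections import defaultdict

# Hourly melt index readings from the slide
melt_index = [
    1.91, 1.85, 1.93, 1.73, 1.87, 1.96, 1.75, 2.11, 2.21, 1.81,
    1.72, 1.99, 2.01, 1.88, 1.93, 1.95, 2.13, 2.25, 1.98, 1.82,
]

def stem_and_leaf(values):
    """Return {stem: leaves}, with stem = value to one decimal, leaf = second decimal."""
    rows = defaultdict(list)
    for v in values:
        hundredths = round(v * 100)          # integer arithmetic avoids float artifacts
        stem, leaf = divmod(hundredths, 10)  # e.g. 191 -> stem 19, leaf 1
        rows[stem].append(leaf)
    return {
        f"{stem // 10}.{stem % 10}": "".join(str(d) for d in sorted(leaves))
        for stem, leaves in sorted(rows.items())
    }

table = stem_and_leaf(melt_index)
for stem, leaves in table.items():
    print(f"{stem} | {leaves}")
```

Sorting the leaves within each stem makes the shape of the distribution easy to read off, much like a histogram turned on its side.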

What are some potential sources that could explain the variability in melt index?

Another Example: Stem and Leaf Diagrams


105 97 245 163 207 134 218 199 160 196 221 154 228 131 180 178 157 151 175 201 183 153 174 154 190 76 101 142 149 200 186 174 199 115 193 167 171 163 87 176 121 120 181 160 194 184 165 145 160 150 181 168 158 208 133 135 172 171 237 170 180 167 176 158 156 229 158 148 150 118 143 141 110 133 123 146 169 158 135 149

Compressive strength (in psi) of 80 alloy specimens

Box plot

Stem-and-Leaf Diagram, Histogram and Box Plot for Compressive Strength Data


Error or Uncertainty or Noise




In statistics, error is an emotionally neutral term. It refers to often unavoidable variation and does not imply blame.
Errors, uncertainty or noise in measurements are caused by known and unknown factors.
Statistical methods help us to describe and understand the variability arising from errors.

Data Analysis and Inference




Frequency histogram based on 200 observations gives us an estimate of the shape and location of the distribution


Questions?
1. What value represents the truth and what do we mean by the truth?
2. What is the probability of getting a Melt Index > 2.1?
3. How do we characterize the distribution of Melt Index?
4. How should we design our collection scheme to ensure that our sample is representative?
5. How many measurements are required?
6. How do we compare the data against a pre-specified target value?
7. What about variability? How can we quantify it? How can we reduce it? How do we know that we have reduced it?
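As a first, rough answer to the probability question, one can use the relative frequency in the 20 hourly readings listed earlier. This is only a sketch in Python: a relative frequency from 20 observations is a crude estimate, and it makes no assumption about the shape of the distribution.

```python
# Hourly melt index readings from the earlier slide
melt_index = [
    1.91, 1.85, 1.93, 1.73, 1.87, 1.96, 1.75, 2.11, 2.21, 1.81,
    1.72, 1.99, 2.01, 1.88, 1.93, 1.95, 2.13, 2.25, 1.98, 1.82,
]

threshold = 2.1

# Observations strictly above the threshold
exceed = [x for x in melt_index if x > threshold]

# Relative frequency as an empirical estimate of P(MI > 2.1)
p_hat = len(exceed) / len(melt_index)

print(f"{len(exceed)} of {len(melt_index)} readings exceed {threshold}")
print(f"Estimated P(MI > {threshold}) = {p_hat:.2f}")
```

Four of the 20 readings (2.11, 2.21, 2.13 and 2.25) exceed 2.1, so the estimate is 0.20.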

Comparing Two Samples

1. How do we compare two samples? Are the two lines producing significantly different Melt Indices?
2. What is the best way to collect data to compare two samples? Does it matter?

Modeling: Regression Analysis

Suppose we know that the hydrogen flow rate has an effect on Melt Index and we have collected some data.
1. How do we develop a model to represent this data?
2. Given a mechanistic or empirical model, $MI = k_1 [H_2]^{k_2}$, the model contains some unknown constants. Given data, how do we estimate them?
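One common way to estimate the constants of a power-law model is to take logarithms and fit a straight line by least squares. The sketch below assumes the model has the form MI = k1·[H2]^k2, so that ln(MI) = ln(k1) + k2·ln([H2]). The data are synthetic and noise-free, generated from hypothetical constants purely for illustration; they are not plant data, and the function name `fit_power_law` is an assumption of this example.

```python
import math

def fit_power_law(h2, mi):
    """Least-squares fit of ln(MI) = ln(k1) + k2 * ln(H2); returns (k1, k2)."""
    x = [math.log(v) for v in h2]
    y = [math.log(v) for v in mi]
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    k2 = sxy / sxx                       # slope of the log-log line
    k1 = math.exp(y_bar - k2 * x_bar)    # intercept, transformed back
    return k1, k2

# Hypothetical constants and synthetic hydrogen levels (illustration only)
true_k1, true_k2 = 1.5, 0.8
h2 = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
mi = [true_k1 * h ** true_k2 for h in h2]

k1_hat, k2_hat = fit_power_law(h2, mi)
print(f"k1 = {k1_hat:.3f}, k2 = {k2_hat:.3f}")
```

With real, noisy measurements the estimates would not match the underlying constants exactly, and fitting on the log scale weights the errors differently from fitting the original model directly.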

Experimental Design


Suppose you want to determine the effects of several variables like hydrogen flow rate, impurities, temperature, catalyst concentration, production rate etc., on Melt Index

1. How should you design and carry out your study: observational versus designed experiments?
2. Can you minimize the number of experiments while maximizing the information gained?
3. How can you use this data to build a model?

Summary: Why Statistics?




Decision-making in the face of uncertainty!
Q: Why do we have to be able to do this?
A: Because any real-world method of measurement is incapable, in principle, of giving the correct or true value; i.e., a measurement is always subject to uncertainty.

Measurement Model


One very useful way of looking at measurements is to hypothesize the following model:

$y = \mu + \varepsilon$

where $y$ = measured value, $\mu$ = true value to be estimated, $\varepsilon$ = error.

The error cannot be predicted exactly, but can be dealt with using probability theory.
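Under this model, a natural estimate of the true value is the sample mean, and the sample standard deviation describes the spread of the errors. The sketch below applies this to the 20 hourly melt index readings from the earlier slide, under the assumption (made here only for illustration) that the readings share a single true value and the errors average to zero.

```python
import statistics

# Hourly melt index readings from the earlier slide
melt_index = [
    1.91, 1.85, 1.93, 1.73, 1.87, 1.96, 1.75, 2.11, 2.21, 1.81,
    1.72, 1.99, 2.01, 1.88, 1.93, 1.95, 2.13, 2.25, 1.98, 1.82,
]

# Estimate of the true value: the sample mean
true_value_hat = statistics.mean(melt_index)

# Estimated errors: what is left after removing the estimated true value
residuals = [y - true_value_hat for y in melt_index]

# Spread of the errors: sample standard deviation (n - 1 in the denominator)
error_sd = statistics.stdev(melt_index)

print(f"Estimated true value: {true_value_hat:.4f}")
print(f"Sample standard deviation: {error_sd:.4f}")
```

The residuals sum to zero by construction, so they describe variation around the estimate rather than any error in the estimate itself.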

Extra slides


Box Plots

The box plot is a graphical display that simultaneously describes several important features of a data set, such as center, spread, departure from symmetry, and observations that lie unusually far from the bulk of the data (mild and extreme outliers).
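The quantities behind a box plot can be computed directly. The sketch below uses the 80 compressive strength values from the earlier slide. It assumes one common convention, which may differ from the one used in the course: quartiles interpolated at positions (n + 1)p, mild outliers lying more than 1.5 interquartile ranges beyond the box, and extreme outliers lying more than 3 interquartile ranges beyond it.

```python
import statistics

# Compressive strength (psi) of 80 alloy specimens, from the earlier slide
strength = [
    105, 97, 245, 163, 207, 134, 218, 199, 160, 196,
    221, 154, 228, 131, 180, 178, 157, 151, 175, 201,
    183, 153, 174, 154, 190, 76, 101, 142, 149, 200,
    186, 174, 199, 115, 193, 167, 171, 163, 87, 176,
    121, 120, 181, 160, 194, 184, 165, 145, 160, 150,
    181, 168, 158, 208, 133, 135, 172, 171, 237, 170,
    180, 167, 176, 158, 156, 229, 158, 148, 150, 118,
    143, 141, 110, 133, 123, 146, 169, 158, 135, 149,
]

# Quartiles; method="exclusive" interpolates at positions (n + 1) * p
q1, median, q3 = statistics.quantiles(strength, n=4, method="exclusive")
iqr = q3 - q1

# Fences at 1.5 and 3 interquartile ranges beyond the box
inner_low, inner_high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outer_low, outer_high = q1 - 3.0 * iqr, q3 + 3.0 * iqr

extreme = sorted(x for x in strength if x < outer_low or x > outer_high)
mild = sorted(
    x for x in strength
    if (x < inner_low or x > inner_high) and x not in extreme
)

# Whiskers reach the most extreme observations still inside the inner fences
inside = [x for x in strength if inner_low <= x <= inner_high]
whisker_low, whisker_high = min(inside), max(inside)

print(f"Q1 = {q1}, median = {median}, Q3 = {q3}, IQR = {iqr}")
print(f"Whiskers: {whisker_low} to {whisker_high}")
print(f"Mild outliers: {mild}, extreme outliers: {extreme}")
```

Other quartile conventions give slightly different box edges, so the set of points flagged as outliers can change with the software used.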

Description of a box plot

