
Question 1 (15 points)

Software effort estimation, measured in the number of hours required to develop software, is an important activity for any software development company. It is used for investment planning and for pricing software development. One common approach to software effort estimation is Function Point Analysis (FPA). First made public by Allan Albrecht of IBM in 1979, the FPA technique quantifies the functions contained within software in terms that are meaningful to the software users. The measure relates directly to the business requirements that the software is intended to address. It can therefore be readily applied across a wide range of development environments and throughout the life of a development project, from early requirements definition to full operational use. Other business measures, such as the productivity of the development process and the cost per unit to support the software, can also be readily derived.

Data were collected from all software projects completed by the AT&T data center from 1986 through 1991. The data contain 104 observations and 5 variables: 1. Number of Worker Hours, 2. Function Point Count, 3. Operating System used, 4. Database Management System and 5. Programming Language. Function point counts, operating system, database and programming language are often used to help predict the number of worker hours that will be required to complete a proposed software project.

Data description:

S.No  Variable Name               Variable Type                                                      Code used in Regression
1     Number of Worker Hours      Continuous                                                         NWH
2     Functional Point Counts     Continuous                                                         FPC
3     Operating System            Categorical (0: Unix, 1: MVS)                                      0 = Unix, 1 = MVS
4     Database Management System  Categorical (1: IDMS, 2: IMS, 3: INFORMIX, 4: INGRESS, 5: Other)   D1: IDMS, D2: IMS, D3: INFORMIX, D4: INGRESS
5     Language                    Categorical (1: COBOL, 2: PL/I, 3: C, 4: Other)                    L1: COBOL, L2: PL/I, L3: C

A simple linear regression is carried out between the number of worker hours (response) and functional point counts (predictor) using SPSS. The SPSS output for the model $NWH = \beta_0 + \beta_1 \times FPC$ is shown in tables 1.1-1.3.

Table 1.1 Descriptive Statistics

                      N     Minimum   Maximum   Mean      Std. Deviation
Function Points       104   102       3472      620.89    639.324
Number of Hours       104   283       72219     9976.23   11944.580
Valid N (listwise)    104

Table 1.2 Model Summary

Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .810 (a)                                  7046.957

a. Predictors: (Constant), Function Points
b. Dependent Variable: Number of Worker Hours

Table 1.3 Coefficients

                     Unstandardized Coefficients
Model                B          Std. Error
1 (Constant)         585.664    965.521
  Function Points               1.086

Dependent Variable: Number of Worker Hours

Use tables 1.1-1.3 to answer questions 1.1-1.3.

1.1 Is there a statistically significant (assume α = 0.05) relationship between the function points and the number of worker hours? (2 points)

1.2 What is the rate at which the number of worker hours changes when there is a change in the number of functional point counts? (1 point)

1.3 For software with functional point count of 2000, what is the maximum number of worker hours required to develop the software at 95% confidence level? (3 points)
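As a rough sketch of how questions 1.1-1.3 could be approached from the printed summary statistics alone, the snippet below recovers the slope from the correlation and the two standard deviations, forms its t statistic, and computes an upper prediction bound at FPC = 2000. Reading "maximum" as a one-sided 95% upper prediction bound, and using a normal quantile in place of the t quantile with 102 degrees of freedom, are both assumptions made here, not something the question states.

```python
from math import sqrt
from statistics import NormalDist

# Values read from tables 1.1-1.3
n = 104
r = 0.810
mean_fpc, sd_fpc = 620.89, 639.324
sd_nwh = 11944.580
intercept = 585.664
se_slope = 1.086
se_estimate = 7046.957

# Simple regression slope: b1 = r * s_y / s_x
slope = r * sd_nwh / sd_fpc
t_slope = slope / se_slope  # compare with the critical t value at alpha = 0.05

# Point prediction and standard error of prediction at FPC = 2000
x0 = 2000
y_hat = intercept + slope * x0
se_pred = se_estimate * sqrt(
    1 + 1 / n + (x0 - mean_fpc) ** 2 / ((n - 1) * sd_fpc ** 2)
)

# ASSUMPTION: normal approximation to the t distribution (102 df)
z = NormalDist().inv_cdf(0.95)
upper_bound = y_hat + z * se_pred

print(f"slope = {slope:.3f}, t = {t_slope:.2f}")
print(f"prediction = {y_hat:.0f}, upper bound = {upper_bound:.0f}")
```

The slope obtained this way differs slightly from the SPSS value because R is printed to three decimals only.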

Table 1.4 shows the regression output between the number of worker hours and all the predictors. Answer questions 1.4-1.5 based on the output provided in table 1.4.

Table 1.4 Coefficients (a)

                     Unstandardized Coefficients   Standardized Coefficients
Model                B            Std. Error       Beta                        t         Sig.
1 (Constant)         -1054.810    2788.973                                     -.378     .706
  Function Points    14.101       1.025            .755                        13.761    .000
  D1                 6718.913     1941.627         .271                        3.460     .001
  D2                 10043.06     2419.735         .279                        4.150     .000
  D3                 -1239.219    2197.356         -.041                       -.564     .574
  D4                 -589.396     2942.136         -.012                       -.200     .842
  L1                 2400.876     1745.254         .095                        1.376     .172
  L2                 185.927      2432.629         .007                        .076      .939
  L3                 -279.390     2755.877         -.006                       -.101     .919
  Operating System   -3368.162    2998.553         -.140                       -1.123    .264

a. Dependent Variable: Number of Hours

1.4 On average, how many additional worker hours are required if the software is developed using the database INFORMIX instead of INGRESS? Explain. (2 points)

1.5 Among the predictors, which predictor has the least influence on the number of worker hours? (2 points)
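One possible way to read table 1.4 for questions 1.4 and 1.5 is sketched below: the INFORMIX versus INGRESS comparison is taken as the difference of the two dummy coefficients (D3 minus D4), and "least influence" is taken as the smallest absolute standardized coefficient. Both readings are assumptions; note also that neither D3 nor D4 is individually significant in the table (Sig. .574 and .842).

```python
# (B, Beta) pairs copied from table 1.4; the constant has no Beta and is omitted
coefficients = {
    "Function Points": (14.101, 0.755),
    "D1": (6718.913, 0.271),
    "D2": (10043.06, 0.279),
    "D3": (-1239.219, -0.041),
    "D4": (-589.396, -0.012),
    "L1": (2400.876, 0.095),
    "L2": (185.927, 0.007),
    "L3": (-279.390, -0.006),
    "Operating System": (-3368.162, -0.140),
}

# Expected difference in worker hours, INFORMIX (D3) relative to INGRESS (D4),
# holding the other predictors fixed
informix_vs_ingress = coefficients["D3"][0] - coefficients["D4"][0]

# ASSUMPTION: influence is ranked by |standardized Beta|
least_influential = min(coefficients, key=lambda name: abs(coefficients[name][1]))

print(f"D3 - D4 = {informix_vs_ingress:.3f} hours")
print(f"smallest |Beta|: {least_influential}")
```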

A stepwise regression is carried out between number of worker hours (response variable) with functional point counts and the operating system as predictors. The results are shown in tables 1.5 and 1.6. Use tables 1.5 and 1.6 to answer question 1.6.

Table 1.5 Model Summary (c)

Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .810 (a)   .655       .652                7046.957
2                                                 6848.999

a. Predictors: (Constant), Function Points
b. Predictors: (Constant), Function Points, Operating System
c. Dependent Variable: Number of Worker Hours

Table 1.6 Coefficients

                            Unstandardized Coefficients         Correlations
Model                       B           Std. Error         t    Zero-order   Partial   Part
1 (Constant)                585.664     965.521
  Function Point counts                 1.086                   0.810        0.810     0.810
2 (Constant)                -1303.13    1179.658
  Function Point counts     14.546      1.062                   0.810        0.811     0.787
  Operating System          3614.936    404.753                 0.240        0.254     0.149

1.6 Which of the following statements is true? Justify your response based on regression models 1 and 2. (1 point)
(A) Unix has more functional points than MVS.
(B) MVS has more functional points than Unix.

The stepwise regression output obtained after including all the predictors in the model building is shown in the following table (table 1.7).

Table 1.7 Coefficients (dependent variable: number of worker hours)

                     Unstandardized Coefficients   Standardized Coefficients
Model                B            Std. Error       Beta                        t         Sig.
1 (Constant)         585.664      965.521                                      .607      .545
  Function Points    15.124       1.086            .810                        13.926    .000
2 (Constant)         -202.474     956.404                                      -.212     .833
  Function Points    15.101       1.040            .808                        14.524    .000
  D2                 6418.217     2000.313         .179                        3.209     .002
3 (Constant)         -1958.301    998.000                                      -1.962    .053
  Function Points    14.382       .989             .770                        14.548    .000
  D2                 8628.863     1951.181         .240                        4.422     .000
  D1                 5413.766     1368.884         .218                        3.955     .000

1.7 At the 95% confidence level, test whether using the database IMS requires, on average, at least 2000 worker hours more than the base category. (2 points)
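A minimal sketch of one way to set up the test in question 1.7 follows. It assumes the final stepwise model (model 3 in table 1.7) is the one to use, frames the hypotheses as H0: β_D2 ≤ 2000 against H1: β_D2 > 2000, and substitutes a normal quantile for the t quantile with 100 degrees of freedom; each of these choices is an assumption, and other formulations of the hypotheses are defensible.

```python
from statistics import NormalDist

# D2 (IMS) row of model 3 in table 1.7
b_d2 = 8628.863
se_d2 = 1951.181
hypothesized = 2000  # worker hours above the base category

# One-sided test statistic for H0: beta_D2 <= 2000 vs H1: beta_D2 > 2000
t_stat = (b_d2 - hypothesized) / se_d2

# ASSUMPTION: normal approximation to t with 104 - 4 = 100 df
critical = NormalDist().inv_cdf(0.95)
reject_h0 = t_stat > critical

print(f"t = {t_stat:.3f}, critical = {critical:.3f}, reject H0: {reject_h0}")
```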

1.8 When all the predictor values are zero, the regression model gives a negative value for worker hours. How can you explain the negative value of the constant? (2 points)

Question 2 (10 points)

Highly publicized CEO salaries in the US have generated sustained interest in the factors related to their compensation packages. Data on the total compensation (in dollars) of 164 CEOs in the financial sector, defined as the sum of salary plus any bonuses including stock options, were collected along with some potential explanatory variables. The details are provided in the following table:

Variable Name                     Variable Type
MBA                               Categorical (1 for MBA and 0 for No MBA)
Age (in years)                    Numerical
Years in firm (in years)          Numerical
Return Over 5 years (%)           Numerical
Sales (in millions of dollars)    Numerical

Several models were developed for the analysis. The models and related information are given below:

Model 1: R square = .439

Coefficients (a)

                  Unstandardized Coefficients   Standardized Coefficients
Model             B         Std. Error          Beta                        t         Sig.
1 (Constant)      10.443    .325                                            32.098    .000
  ln(Sales)       .523      .047                .662                        11.253    .000

a. Dependent Variable: ln(Total Comp)
b. ln(Sales) is the natural logarithm of sales and ln(Total Comp) is the natural logarithm of total compensation.

Model 2: R square = .441

Coefficients (a)

                  Unstandardized Coefficients   Standardized Coefficients
Model             B         Std. Error          Beta                        t         Sig.
1 (Constant)      10.420    .327                                            31.865    .000
  ln(Sales)       .522      .047                .661                        11.213    .000
  MBA             .093      .118                .047                        .790      .431

a. Dependent Variable: ln(Total Comp)

Model 3: Model

R square = .463 Coefficientsa Unstandardized Standardized Coefficients Coefficients B Std. Error Beta 10.575 .380 .500 .094 .054 .046 .633

(Constant) ln(Sales) MBA

t 27.834 9.183

Sig.

lnSales_MBA .085 .040

a. Dependent Variable: ln(Total Comp)
b. lnSales_MBA is the interaction between ln(Sales) and MBA, that is, ln(Sales)*MBA.

2.1 Which variables in Model 3 have a significant relationship with Total Compensation? Clearly state any hypotheses used and assumptions made to draw your inference. (2 points)

2.2 From the models given above what can you conclude about having an MBA - does it or does it not have a significant impact on Total Compensation? If so, what is the impact? Support your inference with adequate explanation/work. (2 points)

2.3 Is Model 3 above better than Model 1? Why or why not? Use appropriate test(s) and give adequate explanations in support of your answer. (2 points)
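One candidate test for question 2.3 is a partial F test on the change in R square between the nested models. The sketch below assumes Model 3 contains three predictors (ln(Sales), MBA and their interaction) and Model 1 contains one, so two restrictions are tested with n = 164. The helper name partial_f is hypothetical, introduced only for this illustration.

```python
def partial_f(r2_full, r2_reduced, n, k_full, q):
    """Partial F statistic for q restrictions on a model with k_full predictors."""
    numerator = (r2_full - r2_reduced) / q
    denominator = (1 - r2_full) / (n - k_full - 1)
    return numerator / denominator

# R square values printed for Model 3 (full) and Model 1 (reduced)
f_stat = partial_f(r2_full=0.463, r2_reduced=0.439, n=164, k_full=3, q=2)

# Compare with the tabulated F critical value for (2, 160) degrees of freedom
print(f"partial F = {f_stat:.3f} on (2, 160) df")
```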

Model 4:

Coefficients (a)

                  Unstandardized Coefficients    Standardized Coefficients
Model             B              Std. Error      Beta                        t          Sig.
  (Constant)      2330831.847    450243.626                                  5.177      .000
  MBA             276331.922     139020.2                                    1.98771    .048

a. Dependent Variable: Total Comp


2.4 Using Model 4 can we conclude that the average total compensation for CEOs who have an MBA is at least 5% more than those who do not have an MBA? State the hypotheses clearly and show all work. (2 points)
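A simplified sketch for question 2.4 is given below. It reads the Model 4 constant as 2330831.847 (the printed value is broken across cells, so this reading is an assumption), treats 5% of that constant as a fixed threshold and ignores the sampling error of the constant itself, and uses a normal quantile for the one-sided critical value. Under those assumptions the hypotheses are H0: β_MBA ≤ 0.05 × β_0 against H1: β_MBA > 0.05 × β_0.

```python
from statistics import NormalDist

# Model 4 output
b_const = 2330831.847   # ASSUMPTION: constant as reconstructed from the printout
b_mba = 276331.922
se_mba = 139020.2

# 5% of the average compensation of CEOs without an MBA, treated as fixed
threshold = 0.05 * b_const

t_stat = (b_mba - threshold) / se_mba

# ASSUMPTION: normal approximation to t with 162 df, one-sided, alpha = 0.05
critical = NormalDist().inv_cdf(0.95)
reject_h0 = t_stat > critical

print(f"threshold = {threshold:.2f}, t = {t_stat:.3f}, reject H0: {reject_h0}")
```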

For question 2.5, use the following information: The stepwise method was used to develop a model for predicting the total compensation of CEOs using the data and independent variables described above. The SPSS output obtained is given below:

Coefficients

                        Unstandardized Coefficients   Stand. Coeff.             Correlations
Model                   B         Std. Error          Beta            t         Zero-order   Partial   Part
1 (Constant)            10.443    .325                                32.098
  ln(Sales)             .523      .047                .662            11.253
2 (Constant)            10.223    .322                                31.753
  ln(Sales)             .532      .045                .673            11.784                 .680      .672
  Return over 5 yrs     .010      .003                .194            3.389     .156         .258      .193
3 (Constant)            9.233     .522                                17.694
  ln(Sales)             .516      .045                .652            11.451                 .671      .644
  Return over 5 yrs     .010      .003                .199            3.523     .156         .268      .198
  Age                   .020      .008                .136            2.389     .228         .186      .134
4 (Constant)            8.826     .548                                16.105
  ln(Sales)             .523      .045                .662            11.724                 .681      .651
  Return over 5 yrs     .011      .003                .211            3.776     .156         .287      .210
  Age                   .030      .009                .208            3.194     .228         .246      .177
  YearsFirm             -.011     .005                -.143           -2.196    .070         -.172     -.122


ANOVA (e)

Model            Sum of Squares
1 Regression     62.816
  Residual       80.357
  Total
2 Regression
  Residual
  Total
3 Regression
  Residual
  Total
4 Regression     72.883
  Residual       70.290
  Total

a. Predictors: (Constant), ln(Sales)
b. Predictors: (Constant), ln(Sales), Return over 5 yrs
c. Predictors: (Constant), ln(Sales), Return over 5 yrs, Age
d. Predictors: (Constant), ln(Sales), Return over 5 yrs, Age, YearsFirm
e. Dependent Variable: ln(Total Comp)

2.5 Determine the value of the coefficient of determination for Model 3 above. (2 points)
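One route to question 2.5, sketched under stated assumptions: the total sum of squares is the same for every model, so it can be taken from the model 1 row of the ANOVA table; R square for model 4 then follows from its regression sum of squares; and the squared part (semipartial) correlation of the last variable entered equals the drop in R square when that variable is removed. The sketch assumes "Model 3" means the third stepwise model and that -.122 is the part correlation of YearsFirm in model 4.

```python
# Sums of squares from the stepwise ANOVA output
ss_reg_m1, ss_res_m1 = 62.816, 80.357
ss_reg_m4, ss_res_m4 = 72.883, 70.290

# The total sum of squares does not depend on the model
ss_total = ss_reg_m1 + ss_res_m1

r2_m1 = ss_reg_m1 / ss_total
r2_m4 = ss_reg_m4 / ss_total

# ASSUMPTION: part correlation of YearsFirm in model 4, read from the output
part_yearsfirm = -0.122

# Removing YearsFirm from model 4 gives stepwise model 3
r2_m3 = r2_m4 - part_yearsfirm ** 2

print(f"R2 model 1 = {r2_m1:.3f}, model 4 = {r2_m4:.3f}, model 3 = {r2_m3:.3f}")
```

The model 1 value reproduces the R square of .439 reported earlier for the regression on ln(Sales) alone, which is a useful consistency check on the reading of the table.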

Question 3. (10 marks)

With agriculture in crisis, the issue of crop insurance has become more important of late. Since it is difficult to verify the crop yield for each farmer, rainfall-based insurance has been introduced on a pilot scale in selected districts. It was introduced in Anantapur, a district with low and erratic rainfall about 100 km from Bangalore. Given the poor soil conditions, only groundnut is grown there. Insurance payments to farmers in a district are based on the rainfall recorded there. The ABC Insurance Company wanted to come up with a model to see how the total production depends on the rainfall. The complication is that production also depends on other factors, such as the total acreage under irrigation. After some trial regressions, the company analysts settled on a model with the following variables:

PROD   Total production in thousands of tons
IRR    Total irrigated area in thousands of hectares
NON    Total non-irrigated area in thousands of hectares
RAIN   Total rainfall in millimeters

The regression output is given below.


Table 3.1 Model Summary (b)

R          R Square   Adjusted R Square   Std. Error of the Estimate
.895 (a)   .801       .787                703.6283

Change Statistics
R Square Change   F Change   Degrees of freedom (SSR)   Degrees of freedom (SSE)   Sig. F Change
.801                         3                          44                         .000

a. Predictors: (Constant), RAIN, IRR, NON
b. Dependent Variable: PROD

3.1 If stepwise regression was used to arrive at the table above (table 3.1), how many models did SPSS consider? Give reasons. [1 point]

3.2 How many observations were included in the regression? [1 point]


Table 3.2 ANOVA (b)

Model          Sum of Squares   Df   Mean Square   F   Sig.
Regression     8.762E7                                 .000 (a)
Residual
Total

a. Predictors: (Constant), RAIN, IRR, NON
b. Dependent Variable: PROD

3.3 Fill in the blanks in the ANOVA table (table 3.2) above for Sum of Squares, Df, Mean Square and F value. [2 points]
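The blanks asked for in questions 3.2 and 3.3 can be derived from quantities already printed in tables 3.1 and 3.2. The sketch below is one way to do this: it uses the identity that the mean square residual equals the squared standard error of the estimate, and then checks the result against the reported R square.

```python
# Printed values from tables 3.1 and 3.2
df_regression = 3
df_residual = 44
ss_regression = 8.762e7
std_error_estimate = 703.6283

# Number of observations: total df = n - 1
n_observations = df_regression + df_residual + 1

# Mean square residual is the square of the standard error of the estimate
ms_residual = std_error_estimate ** 2
ss_residual = ms_residual * df_residual

ms_regression = ss_regression / df_regression
f_value = ms_regression / ms_residual

ss_total = ss_regression + ss_residual
r_square_check = ss_regression / ss_total  # should be close to the printed .801

print(f"n = {n_observations}")
print(f"SSE = {ss_residual:.0f}, SST = {ss_total:.0f}")
print(f"MSR = {ms_regression:.0f}, MSE = {ms_residual:.1f}, F = {f_value:.2f}")
```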

Table 3.3 Coefficients (a)

               Unstandardized Coefficients   Standardized Coefficients                      95.0% Confidence Interval for B   Collinearity Statistics
Model          B          Std. Error         Beta                        t         Sig.    Lower Bound    Upper Bound        Tolerance   VIF
1 (Constant)   -929.61    1097.238                                       -.847     .401    -3140.948      1281.728
  IRR          2.281      .192               .809                        11.906    .000    1.895          2.667              .979        1.021
  NON          .622       .159               .266                        3.906     .000    .301           .943               .977        1.023
  RAIN         2.112      .670               .213                        3.152     .003    .761           3.462              .992        1.008

a. Dependent Variable: PROD

3.4 Does the regression exhibit any collinearity? Give reasons. [1 point]
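For question 3.4, the collinearity columns of table 3.3 are linked by VIF = 1 / Tolerance. The short sketch below recomputes the VIF values from the printed tolerances and compares them with a cutoff; the cutoff of 10 is a common rule of thumb and is an assumption here, since the exam does not name one.

```python
# Tolerance values from table 3.3
tolerance = {"IRR": 0.979, "NON": 0.977, "RAIN": 0.992}

# Variance inflation factor is the reciprocal of tolerance
vif = {name: 1 / tol for name, tol in tolerance.items()}

# ASSUMPTION: VIF above 10 is treated as a sign of serious collinearity
VIF_CUTOFF = 10
collinear = [name for name, value in vif.items() if value > VIF_CUTOFF]

for name, value in vif.items():
    print(f"{name}: VIF = {value:.3f}")
print(f"predictors above cutoff: {collinear}")
```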

3.5 The Normal and Residual Plots are given below. Which assumptions of regression are tested by them, and are they satisfied? [1 point]


The residual plot is given below:


Selected data from the SPSS output are given below:

Table 3.4 SPSS output on influential observations for a portion of the sample

Year      Prod ('000 Ton)   ZRE       MAH       COO      LEV      DFF        DFB0       DFB1     DFB2     DFB3
1952-53   2930              0.05093   9.3422    0.0002   0.1988   10.0838    30.8270    0.0019   0.0036   0.0101
1953-54   3450              0.28396   14.4868   0.0147   0.3082   -97.9940   160.9821   0.0090   0.0297   0.0715
1954-55   4250              1.18262   5.1603    0.0604   0.1098   125.0308   421.8451   0.0426   0.0409   0.1986
1955-56   3860              0.07316   5.0852    0.0002   0.1082   7.6260     22.6753    0.0022   0.0035   0.0041
1956-57   4370              0.15086   6.0918    0.0012   0.1296   -18.7976   -17.8381   0.0047   0.0046   0.0307
1957-58   4710              0.211     1.1081    0.0005   0.0236   6.8998     8.5996     0.0064   0.0000   0.0007
1958-59   5180              1.1752    1.3166    0.0186   0.0280   42.4644    101.3696   0.0372   0.0064   0.0259
1959-60   4560              0.21501   1.4856    0.0007   0.0316   -8.3727    1.6727     0.0068   0.0006   0.0116
1960-61   4810              0.32379   1.0865    0.0013   0.0231   10.4735    11.8070    0.0098   0.0003   0.0027
1961-62   4990              0.45626   1.9852    0.0037   0.0422   21.6117    -2.3511    0.0147   0.0066   0.0383
1962-63   5060              0.14592   2.9114    0.0005   0.0619   -9.2660    27.0517    0.0054   0.0050   0.0038
1963-64   5300              0.74572   1.6166    0.0086   0.0344   30.6734    -32.8552   0.0246   0.0122   0.0258
1964-65   6000              0.81488   3.6257    0.0200   0.0771   62.2774    198.7153   0.0290   0.0311   0.0662
1965-66   4260              1.15514   5.6126    0.0633   0.1194   132.5887   208.5132   0.0454   0.0551   0.1539
1966-67   4410              2.07939   4.1774    0.1496   0.0889   180.3088   527.7487   0.0528   0.0634   0.3594
1967-68   5730              0.40227   2.4797    0.0035   0.0528   22.4848    -58.1897   0.0102   0.0133   0.0209
1968-69   4630              0.96035   1.0713    0.0110   0.0228   -30.8239   83.6995    0.0228   0.0170   0.0143
1969-70   5130              0.33885   0.8750    0.0012   0.0186   -9.7921    25.1432    0.0071   0.0056   0.0000


3.6 Identify observations that are leveraged and/or influential in the table (table 3.4). Explain clearly. [2 points]
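As an illustration of how question 3.6 might be screened, the sketch below flags rows of table 3.4 whose leverage (LEV) or Cook's distance (COO) exceeds a rule-of-thumb cutoff. The cutoffs used, 2p/n for leverage and 4/n for Cook's distance with p = 3 predictors and n = 48 observations, are assumptions; textbooks differ on the exact thresholds, and other diagnostics in the table (MAH, DFF, DFB) could be screened in the same way.

```python
# Year, LEV and COO columns copied from table 3.4
years = ["1952-53", "1953-54", "1954-55", "1955-56", "1956-57", "1957-58",
         "1958-59", "1959-60", "1960-61", "1961-62", "1962-63", "1963-64",
         "1964-65", "1965-66", "1966-67", "1967-68", "1968-69", "1969-70"]
lev = [0.1988, 0.3082, 0.1098, 0.1082, 0.1296, 0.0236,
       0.0280, 0.0316, 0.0231, 0.0422, 0.0619, 0.0344,
       0.0771, 0.1194, 0.0889, 0.0528, 0.0228, 0.0186]
coo = [0.0002, 0.0147, 0.0604, 0.0002, 0.0012, 0.0005,
       0.0186, 0.0007, 0.0013, 0.0037, 0.0005, 0.0086,
       0.0200, 0.0633, 0.1496, 0.0035, 0.0110, 0.0012]

n, p = 48, 3  # observations (3 + 44 + 1 df) and number of predictors

# ASSUMPTION: rule-of-thumb cutoffs, not prescribed by the exam
leverage_cutoff = 2 * p / n
cooks_cutoff = 4 / n

high_leverage = [y for y, h in zip(years, lev) if h > leverage_cutoff]
influential = [y for y, d in zip(years, coo) if d > cooks_cutoff]

print(f"high leverage (LEV > {leverage_cutoff:.3f}): {high_leverage}")
print(f"influential (COO > {cooks_cutoff:.3f}): {influential}")
```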

Year 1952-53 in the table has the following values for the dependent and independent variables:

Year      IRR    NON      RAIN   PROD*
1952-53   57.6   4742.4   351    2894.16551

PROD* is the predicted value of PROD using the model with all observations.

3.7 If the regression model was developed without this observation (1952-53) in the sample, how much would the predicted value for this observation change? [2 points]
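For question 3.7, the DFF column of table 3.4 is the relevant diagnostic if it is SPSS's DfFit, that is, the predicted value with all cases minus the predicted value when the case is excluded. That interpretation of the column label is an assumption. Under it, the change can be read off directly, as this small sketch shows.

```python
# Values for 1952-53 from table 3.4 and from the row given with question 3.7
predicted_with_all = 2894.16551   # PROD*, model fitted on all observations
dffit_1952_53 = 10.0838           # DFF column, ASSUMED to be SPSS DfFit

# DfFit = prediction with the case - prediction without the case
predicted_without_case = predicted_with_all - dffit_1952_53

print(f"change in prediction = {dffit_1952_53:.4f} thousand tons")
print(f"prediction without 1952-53 = {predicted_without_case:.4f}")
```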
