Please reload this page in your browser (F5 in my browser) to be sure
you're seeing the latest version.
----------------------------------------------------------------------
Notation: "_" indicates a subscript, and "^" indicates a superscript.
For example, "x_i^2" means "x sub i, squared".
----------------------------------------------------------------------
9.4 #9

Here's the book's answer:
  Any value of MSE satisfying 5.099 < MSE < 6.035
I think this should be:
  Any value of MSE satisfying 5.099 < MSE < 6.11
----------------------------------------------------------------------
9.4 #1

Since there's only one observation per treatment (#replicates K=1),
follow the advice on the bottom of the 9.3 (part 2) lecture notes: use
SSAB for SSE and MSAB for MSE. (This advice is also on p. 435 of the
textbook.)
----------------------------------------------------------------------
9.3 #5

I suggest a first step of tabulating the data as we did on p. 2 of
www.stat.wisc.edu/~jgillett/224/notes/9.3.pdf:

            |                     B
            | 25                                        | 37
  ----------|-------------------------------------------|-----------
A NaCl      | 138.40 130.89 94.646 96.653 116.90 88.215 | ...
            |                                           |
  ----------|-------------------------------------------|-----------
  Na_2HPO_4 | ...                                       | ...
            |                                           |
  ----------|-------------------------------------------|-----------

Then find the sample means in this order:
 -- cells
 -- rows and columns
 -- grand
Then do the ANOVA table calculations.

Here are the data in case you want to do arithmetic in Calc or Excel:

Solution   Temperature  Yield Stress (MPa)
NaCl       25           138.40 130.89 94.646 96.653 116.90 88.215
NaCl       37           92.312 147.28 116.48 88.802 114.37 90.737
Na_2HPO_4  25           120.18 129.43 139.76 132.75 137.23 121.73
Na_2HPO_4  37           123.50 128.94 102.86 99.941 161.68 136.44
----------------------------------------------------------------------
8.3 #18

Feel free to make a spreadsheet like 8.3.xls, which is posted in the
8.3 line of the syllabus, to ease the calculations.
Remember that to use "=LINEST(...)", you need to press
"Ctrl-Shift-Enter" (Macintosh: "Command-Enter") (not just "Enter")
after entering the formula so that it will be treated as an array
formula.

[ I want you to use OpenOffice's Calc instead of Excel to avoid
confusion, but here's a tip for Excel users. In Excel (but not in
Calc), you need to pre-select space for the entire array output. This
is 5 rows and (p+1) columns, where p is the number of x variables: one
column each for beta_0 and beta_1 through beta_p. Then click in the
formula bar above the spreadsheet and enter the =LINEST() formula.
Then type "Ctrl-Shift-Enter" (Macintosh: "Command-Enter"). ]

Here are the data for #18, all columns at once:

y    x_1  x_2  x_3
730  152  198  91
760  173  201  81
850  166  202  69
840  161  202  72
720  152  198  91
730  153  205  91
840  166  204  70
730  157  204  90
650  136  172  47
850  142  218  59
740  151  207  88
720  145  209  60
710  147  190  63

Here they are again, one column at a time:

y
730
760
850
840
720
730
840
730
650
850
740
720
710

x_1
152
173
166
161
152
153
166
157
136
142
151
145
147

x_2
198
201
202
202
198
205
204
204
172
218
207
209
190

x_3
91
81
69
72
91
91
70
90
47
59
88
60
63
----------------------------------------------------------------------
8.3 #11

(b) The book's answer, ".3521", is a typo for ".3512".

(e) The "MS" column refers to "mean square", which is the "SS" ("sum
of squares") column divided by the "DF" ("degrees of freedom") column.
----------------------------------------------------------------------
8.2 #3

There's a typo in the answer to (d): "The model ln(y) = ..." should be
"The model y = ..."
Here are the data as two columns:

t    y
0.0  1.000
0.1  0.999
0.2  0.998
0.3  0.997
0.4  0.996
0.5  0.995
0.6  0.993
0.7  0.992
0.8  0.990
0.9  0.988
1.0  0.987
1.1  0.985
1.2  0.983
1.3  0.981
1.4  0.979

Here are the data as two separate columns:

t
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1.0
1.1
1.2
1.3
1.4

y
1.000
0.999
0.998
0.997
0.996
0.995
0.993
0.992
0.990
0.988
0.987
0.985
0.983
0.981
0.979
----------------------------------------------------------------------
8.1 #9

Feel free to make a spreadsheet like 8.1.6.xls, which is posted in the
8.1 line of the syllabus, to ease the calculations. Its key formula is
"=LINEST(B2:B11;A2:A11;1;1)", which I entered in cell E2. Then I
pressed "Ctrl-Shift-Enter" (Macintosh: "Command-Enter") (not just
"Enter") so that it would be treated as an array formula.

Note: Ctrl-Shift-Enter sometimes seems to fail to give array output;
that is, the output is sometimes only a single cell, which isn't
enough. You can instead use the menu choice "Insert > Function ...",
which brings up a dialog box. Check the "Array" box in the lower left
corner. Then in the "Formula" text box, type
  =LINEST(
and then use the mouse to select the ranges of cells you want to use
as input. Click "OK".

The function
  LINEST(yvalues; xvalues; allow_const; stats)
is documented at
  http://wiki.services.openoffice.org/wiki/Documentation/How_Tos/Calc:_LINEST_function
You can read more at
  http://wiki.services.openoffice.org/wiki/Documentation/How_Tos/Using_Arrays
and
  http://wiki.services.openoffice.org/wiki/Documentation/How_Tos/Calc:_Statistical_functions

Here is the data set three times, to accommodate your favorite
copy-and-paste method: first, as it appears in the book; second, as
two parallel columns; and third, as two separate columns.
(1) Here's how the data set looks in the book:

Temp  Evap  Temp  Evap  Temp  Evap  Temp  Evap
11.8  2.4   11.8  3.8   18.6  3.5   14.0  1.1
21.5  4.4   24.2  5.0   25.4  5.5   13.6  3.5
16.5  5.0   15.8  2.6   22.1  4.8   25.4  5.1
23.6  4.1   26.8  8.0   25.4  4.8   17.7  2.0
19.1  6.0   24.8  5.4   22.6  3.2   24.7  5.7
21.6  5.9   26.2  4.2   24.4  5.1   24.3  4.7
31.0  4.8   14.2  4.4   15.8  3.3   25.8  5.8
18.9  3.0   14.1  2.2   22.3  4.9   28.3  5.8
24.2  7.1   30.3  5.7   23.2  7.4   29.8  7.8
19.1  1.6   15.2  1.2   19.7  3.3   26.5  5.1

(2) Here's the data set as two parallel columns:

Temp  Evap
11.8  2.4
21.5  4.4
16.5  5.0
23.6  4.1
19.1  6.0
21.6  5.9
31.0  4.8
18.9  3.0
24.2  7.1
19.1  1.6
11.8  3.8
24.2  5.0
15.8  2.6
26.8  8.0
24.8  5.4
26.2  4.2
14.2  4.4
14.1  2.2
30.3  5.7
15.2  1.2
18.6  3.5
25.4  5.5
22.1  4.8
25.4  4.8
22.6  3.2
24.4  5.1
15.8  3.3
22.3  4.9
23.2  7.4
19.7  3.3
14.0  1.1
13.6  3.5
25.4  5.1
17.7  2.0
24.7  5.7
24.3  4.7
25.8  5.8
28.3  5.8
29.8  7.8
26.5  5.1

(3) Here's the data set as two separate columns:

Temp
11.8
21.5
16.5
23.6
19.1
21.6
31.0
18.9
24.2
19.1
11.8
24.2
15.8
26.8
24.8
26.2
14.2
14.1
30.3
15.2
18.6
25.4
22.1
25.4
22.6
24.4
15.8
22.3
23.2
19.7
14.0
13.6
25.4
17.7
24.7
24.3
25.8
28.3
29.8
26.5

Evap
2.4
4.4
5.0
4.1
6.0
5.9
4.8
3.0
7.1
1.6
3.8
5.0
2.6
8.0
5.4
4.2
4.4
2.2
5.7
1.2
3.5
5.5
4.8
4.8
3.2
5.1
3.3
4.9
7.4
3.3
1.1
3.5
5.1
2.0
5.7
4.7
5.8
5.8
7.8
5.1

You don't have to use the data directly if you prefer to use these
summary statistics I computed (x=Temperature, y=Evaporation):
  n = 40
  mean(x)=21.507  mean(y)=4.480
  std(x)=5.244    std(y)=1.699
  r=.690
But I recommend using the data directly.
----------------------------------------------------------------------
8.1 #1

Find s_x from n and the first sum.
Find s_y from n and the second sum.
Find r from n, s_x, s_y, and the third sum.
Then use these numbers, and the formulas in the notes or textbook, to
answer the questions.
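
The 8.1 #1 recipe can be tried out on the 8.1 #9 data above. What follows is only a minimal Python sketch, not the course's Calc method. It assumes the three sums in 8.1 #1 are the centered sums sum (x_i - xbar)^2, sum (y_i - ybar)^2, and sum (x_i - xbar)(y_i - ybar); check the problem statement, because that is a guess. The names summarize, sxx, syy, and sxy are made up for this illustration.

```python
import math

# 8.1 #9 data, copied from the parallel-column listing (x=Temp, y=Evap).
temp = [11.8, 21.5, 16.5, 23.6, 19.1, 21.6, 31.0, 18.9, 24.2, 19.1,
        11.8, 24.2, 15.8, 26.8, 24.8, 26.2, 14.2, 14.1, 30.3, 15.2,
        18.6, 25.4, 22.1, 25.4, 22.6, 24.4, 15.8, 22.3, 23.2, 19.7,
        14.0, 13.6, 25.4, 17.7, 24.7, 24.3, 25.8, 28.3, 29.8, 26.5]
evap = [2.4, 4.4, 5.0, 4.1, 6.0, 5.9, 4.8, 3.0, 7.1, 1.6,
        3.8, 5.0, 2.6, 8.0, 5.4, 4.2, 4.4, 2.2, 5.7, 1.2,
        3.5, 5.5, 4.8, 4.8, 3.2, 5.1, 3.3, 4.9, 7.4, 3.3,
        1.1, 3.5, 5.1, 2.0, 5.7, 4.7, 5.8, 5.8, 7.8, 5.1]


def summarize(x, y):
    """Return n, xbar, ybar, s_x, s_y, r computed from centered sums."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    # Assumed form of the three sums given in 8.1 #1.
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    s_x = math.sqrt(sxx / (n - 1))      # s_x from n and the first sum
    s_y = math.sqrt(syy / (n - 1))      # s_y from n and the second sum
    r = sxy / ((n - 1) * s_x * s_y)     # r from n, s_x, s_y, and the third sum
    return n, xbar, ybar, s_x, s_y, r


n, xbar, ybar, s_x, s_y, r = summarize(temp, evap)

# Least-squares line y = intercept + slope * x, written in terms of r.
slope = r * s_y / s_x
intercept = ybar - slope * xbar

print(f"n={n} mean(x)={xbar:.3f} mean(y)={ybar:.3f}")
print(f"std(x)={s_x:.3f} std(y)={s_y:.3f} r={r:.3f}")
print(f"slope={slope:.4f} intercept={intercept:.4f}")
```

The printed n, means, standard deviations, and r should agree with the summary statistics listed for 8.1 #9, and the slope and intercept should agree with the first row of the LINEST output if the spreadsheet is set up on the same data.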
----------------------------------------------------------------------
7.4 #14

Here are the data (for pasting into Calc):

Automobile  Front  Rear
1           32.8   41.2
2           26.6   35.2
3           35.6   46.1
4           36.4   46.0
5           29.2   39.9
6           40.9   51.7
7           40.9   51.6
8           34.8   46.1
9           36.6   47.3
----------------------------------------------------------------------
7.4 #7,8

In #7, there are 18 cars. Each car goes 1000 miles with one tread type
(A or B), and then another 1000 miles with the other tread type.
(Presumably the order of treads is chosen randomly for each car.)
In #8, there are 36 cars.

Here are the data for #7 (for pasting into Calc):

Car  Tread A  Tread B
1    24.1     20.3
2    22.3     19.7
3    24.5     22.5
4    26.1     23.2
5    22.6     20.4
6    23.3     23.5
7    22.4     21.9
8    19.9     18.6
9    27.1     25.8
10   23.5     21.4
11   25.4     20.6
12   24.9     23.4
13   23.7     20.3
14   23.9     22.5
15   24.6     23.5
16   26.4     24.5
17   21.5     22.4
18   24.6     24.9
----------------------------------------------------------------------
6.7 #8:

(a) The question is, "what's the power, given that the level is
alpha=.05, n=80, and the true mean is 1.6?"

One model for the solution is p. 255 Example 6.12. But note that here
we have
  H_0: mu = (mu_0 = 1.5)
  H_1: mu > 1.5
For the sake of power calculations, we use mu_1 = 1.6. This means that
a drawing would have the H_0 solid curve left of the H_1 dotted curve;
and we would reject H_0 for bar{X} > x_{critical}, so we care about
right tail probabilities.

(b) The question is, "what n is required for power=.99 and level=.05,
given that the true mean is 1.6?"

Two models for the solution are:
 - the last example in the 6.6and7.pdf lecture notes
 - p. 256 Example 6.13
----------------------------------------------------------------------
6.5 #2 (and others): feel free to make a spreadsheet like 6.5.5.xls,
which is posted in the 6.5 line of the syllabus, to ease the
calculation of X^2.
----------------------------------------------------------------------
6.4 #13a

Finding "StDev" requires using "SE Mean" in the next column.
"SE mean" refers to the "standard error of the mean". The "standard
error" of a statistic is its standard deviation, estimated from a
sample. So "SE mean" is the standard deviation of bar{X}, estimated
from a sample: s / sqrt(n). It's easy to solve this for s (which is
"StDev").
----------------------------------------------------------------------
6.1 #1 (and others)

The book uses null hypotheses of the form
  H_0: mu <= mu_0,
  H_0: mu  = mu_0, and
  H_0: mu >= mu_0
I prefer to use only the second form (H_0: mu = mu_0), because it's
what both the book and I use to do a probability calculation to
evaluate H_0. Therefore I recommend that you understand any of the
book's uses of the first or third form as a use of the second form.

In particular, for 6.1 #1, I understand the book's "H_0: mu >= 5.4" as
"H_0: mu = 5.4".
----------------------------------------------------------------------
5.3 #1

To match the book's answers for parts b and c, use the "plus-four"
interval defined on p. 190 (and in the 5.3 lecture notes).

Parts d and e ask for a sample size. The formulas in the book (in the
last paragraph on p. 191 and in the box on p. 192) have typos: the
author solved for "tilde{n} = ...", but wrote "n = ...". Since
tilde{n} = n+4, subtract 4 to get correct formulas for n. (The formula
is correct in the lecture notes.)
----------------------------------------------------------------------
4.7 #5, 6

Regarding constructing a probability plot, here's how to do it with
OpenOffice Calc.

Here's how to make a plot for #2, which is an example in the 4.7
lecture notes.

Copy the data (below) and paste in column A. Use the menu Data > Sort
if they're not already sorted. These are {x_i}.

Put "1" (no quotation marks) in cell D1 and "2" in D2. Select D1 and
D2 and drag the square in its bottom right corner to enclose all of
D1-D50. This puts "1 2 3 ... 50" in column D. These are indices {i}.

Put "=(D1-0.5)/50" in C1.
Copy this formula to C2:C50 by dragging the little square from C1 down
through C50. These are {c_i}.

Put "=NORMINV(C1; AVERAGE(A$1:A$50); STDEV(A$1:A$50))" in B1. Copy
this formula to B2:B50. These are {y_i}.
(If X ~ N(mu, sigma^2), then x=NORMINV(p; mu; sigma) sets x so that
P(X < x) = p. Note that this step combines the two steps at the top of
p. 2 of the 4.7 lecture notes, which calculated z_i and then y_i. Here
we calculate y_i directly, because the NORMINV() function works for
any N(mu, sigma^2), not just for N(0, 1).)

Select columns A and B by clicking on A and dragging to B.

Select the menu Insert > Chart ...
  1. Chart Type: choose "XY (Scatter)" and click Next
  2. Data Range: (it should already be "$Sheet1.$A$1:$B$50") click Next
  3. Data Series: click Next
  4. Chart Elements: Choose appropriate titles and labels, click Finish

Right-click on one of the data points on the chart and choose "Insert
trend line ..." and select "Linear" regression. Check "Show equation".
Click OK.

Here are the data for #2 (from lecture 4.7):

11.6
12.6
12.7
12.8
13.1
13.3
13.6
13.7
13.8
14.1
14.3
14.3
14.6
14.8
15.1
15.2
15.6
15.6
15.7
15.8
15.8
15.9
15.9
16.1
16.2
16.2
16.3
16.4
16.5
16.5
16.5
16.6
17.0
17.1
17.3
17.3
17.4
17.4
17.4
17.6
17.7
18.1
18.3
18.3
18.3
18.5
18.5
18.8
19.2
20.3

Here are data for #5 and #6 (from Table 1.2, p. 21):

7.59
6.28
6.07
5.23
5.54
3.46
2.44
3.01
13.63
13.02
23.38
9.24
3.22
2.06
4.04
17.11
12.26
19.91
8.50
7.81
7.18
6.95
18.64
7.10
6.04
5.66
8.86
4.40
3.57
4.35
3.84
2.37
3.81
5.32
5.84
2.89
4.68
1.85
9.14
8.67
9.52
2.68
10.14
9.20
7.31
2.09
6.32
6.53
6.32
2.01
5.91
5.60
5.61
1.50
6.46
5.29
5.64
2.07
1.11
3.32
1.83
7.56

For #6, if you have data {x_i} in column A, you fill column B with
{ln(x_i)} by putting "=LN(A1)" (no quotes) in cell B1. Copy this
formula to all of column B. Then make a plot as if your data are in
column B.

I've not looked up how to put two plots in the same spreadsheet.
For now, you can save your first spreadsheet with the name "4.7.5".
Then use File > Save As ... to save it a second time as "4.7.6", and
then put the second plot (only) in this second file.
----------------------------------------------------------------------
3.3 #9

Integration by parts works for (a). Start with
  mu_X = E(X) = INTEGRAL_0^infinity t * .1 e^{-.1t} dt
Let u = t and dv = .1 e^{-.1t} dt
=> du = dt and v = -e^{-.1t}
So
  mu_X = INTEGRAL u dv
       = uv - INTEGRAL v du
       = [-t e^{-.1t} |_0^infinity ] - INTEGRAL_0^infinity -e^{-.1t} dt
       = ...

For (b), it's probably easiest to start with the alternate form for
the variance, the book's equation 3.21 (also on the top of page 3 in
the handout). So we need
  sigma_X = sqrt(sigma_X^2)
          = sqrt( [INTEGRAL_0^infinity t^2 .1 e^{-.1t} dt] - mu_X^2 )
Integration by parts works again, this time letting u = t^2 and
dv = .1 e^{-.1t} dt.
----------------------------------------------------------------------
1.2 #14 (c)

This problem is easier if you consider the second version of the
standard deviation formula (p. 13 equation 1.5):
  s = sqrt{ [1/(n-1)] [ sum X_i^2 - n Xbar^2 ] }
Let C be the sum of the first nine terms in the sum:
  C = sum_{i=1}^{9} X_i^2
Then, writing the sum as the first nine terms plus the 10th, we have
  s = sqrt{ [1/(n-1)] [ C + X_{10}^2 - n Xbar^2 ] }
Use the original values of s, X_{10} = 100,000, and Xbar to solve for
C. Then use the new values of X_{10} = 1,000,000 and Xbar, along with
your value for C, to get the new s.
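
The steps for 1.2 #14 (c) can be sketched in Python as a sanity check on the algebra. This is only an illustration under stated assumptions: the first nine values below are hypothetical, not the textbook's data, and the variable names are made up. Only X_{10} = 100,000 changing to 1,000,000 comes from the hint above.

```python
import math
import statistics

# Hypothetical sample (NOT the textbook's data); only the 10th value changes.
old = [31000, 35000, 38000, 42000, 45000,
       47000, 52000, 58000, 61000, 100000]
n = len(old)
x10_old = old[-1]
s_old = statistics.stdev(old)      # original s (n-1 in the denominator)
xbar_old = statistics.mean(old)    # original Xbar

# Solve (n-1) s^2 = C + X_10^2 - n Xbar^2 for C, the sum of the first
# nine squared values.
C = (n - 1) * s_old ** 2 - x10_old ** 2 + n * xbar_old ** 2

# New 10th value and the resulting new mean.
x10_new = 1_000_000
xbar_new = xbar_old + (x10_new - x10_old) / n

# Reuse C with the new X_10 and the new Xbar to get the new s.
s_new = math.sqrt((C + x10_new ** 2 - n * xbar_new ** 2) / (n - 1))

# Direct recomputation from the modified sample, for comparison.
s_direct = statistics.stdev(old[:-1] + [x10_new])

print(f"new s via C:   {s_new:.2f}")
print(f"new s direct:  {s_direct:.2f}")
```

The two printed values should agree up to rounding, which is the point of the hint: C carries everything needed from the first nine observations, so they never have to be known individually.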