STATISTICS/ECON 206


Dr. Brian Goff/414 Grise Hall
Phone (270)745-3855/brian.goff@wku.edu
Last Modified: January 4, 2005
Western Kentucky University


COURSE POLICIES

CONTACT INFORMATION
Dr. Brian Goff Grise 414/745-3855
Email: brian.goff@wku.edu / Office Hours for Spring 2005: T 10-11:30; WTh  8:30-10:30; MTW 2-4 
(Appointments & drop-ins welcome at other times)

RESOURCES
Hyperstats (online textbook by David Lane): go to Atomic Dog Publishing and use Course ID (1516323605010)
to register and purchase (online access = $24.95; online + paperback = $45.00).  Excel (available on WKUnet) & SPSS
Website: www.wku.edu/~goffbl/e206.htm

OBJECTIVES FOR STUDENTS
To gain a basic and practical understanding of how to collect and analyze data with emphasis on business applications

GRADING POLICIES
Tests =              67% (3 -- including the final exam)
Assignments =   33% (3)  *****Assignment Grade Capped at 20% above Test Average*******
Mini-Assignments (Highly recommended but not graded)
Total = 100% (Also see classroom policies)
A>=90.0%; B=80.0-89.9%; C=70.0-79.9%; D=60.0-69.9%; F below 60.0%

Tests: You will need a scantron (in good condition), a pencil, and a calculator for each exam. NO early or makeup tests. If a test is missed due to activities officially sponsored by WKU, an illness, or other special circumstances, I will increase the weight on the final exam to compensate for the missed exam. Discuss these special situations with me in advance if possible. Missing a test without a sufficient reason will result in a zero for that test. If weather (etc.) postpones a test or assignment deadline date, the test or deadline will move to the next class meeting.  Material on tests will reflect class lectures, mini-assignments, and assignments.  Past tests are provided as aids but not as complete guides to current tests.

Assignments: We will have (3) out-of-class assignments involving application of methods and use of computers. To receive full credit, work must adhere precisely to the instructions and must be turned in by the stated deadline. Assignments may be turned in early. Assignment deadlines are 5 minutes after class begins on the stated date. A late assignment will have 10% deducted for each business day that it is late (1 day begins immediately after I collect the assignments).  The computer output will count for 20%, and the written answers will count for 80% of the grade.  After a preliminary grade is established, I will review the report for writing and appearance with 0%-10% deducted from the preliminary grade.   ******Semester assignment averages are capped at 20% above your semester test average ***********

Mini-Assignments:  These are highly recommended but ungraded short assignments, usually done using the computer (Excel, SPSS).    The practice gained on mini-assignments will be useful both on graded assignments and on tests.
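
The weights and the assignment cap described above can be sketched as a short calculation. This is only an illustration under stated assumptions: the three tests are weighted equally, the cap is read as "test average plus 20 percentage points," and the reweighting for a missed test is not modeled. The function names and the example scores are hypothetical.

```python
def course_percentage(test_scores, assignment_scores, cap_points=20.0):
    """Combine test and assignment averages using the 67%/33% weights."""
    test_avg = sum(test_scores) / len(test_scores)
    assignment_avg = sum(assignment_scores) / len(assignment_scores)
    # Assumption: the cap is the test average plus 20 percentage points.
    capped_avg = min(assignment_avg, test_avg + cap_points)
    return 0.67 * test_avg + 0.33 * capped_avg


def letter_grade(percentage):
    """Map a course percentage to the letter cutoffs listed above."""
    for cutoff, letter in ((90.0, "A"), (80.0, "B"), (70.0, "C"), (60.0, "D")):
        if percentage >= cutoff:
            return letter
    return "F"


# Hypothetical student: test average 70, assignment average 100 (capped at 90).
example = course_percentage([65.0, 70.0, 75.0], [100.0, 100.0, 100.0])
print(round(example, 1), letter_grade(example))
```

In this hypothetical case the cap matters: the assignment average enters the total as 90 rather than 100.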

MISCELLANEOUS & CLASSROOM POLICIES
The last day to drop with a "W" or change to audit is listed in the Course Bulletin. If you have an ADA-covered disability requiring special consideration, please register with the ADA Compliance Office, and then see me.
Classroom Policies: Orderly behavior and respect for others who are speaking (including me) are expected. No food or drink permitted. If late, enter with a minimum of disturbance and be seated in the nearest seat. Behavior that is inappropriate or distracting to other students or me is not permitted. Individuals involved in incidents that significantly violate these policies will receive a warning and then will be notified of a letter grade reduction per subsequent incident.



COURSE OUTLINE & CALENDAR (Check back for updates)
(Note:  HS refers to Hyperstats by David Lane)

Weeks 1-2: Learning to Measure Reality
Data & variables (HS 1-0, 1-3, 1-4, 1-5 + Excel/SPSS on transforming variables -- add notes to HS)
Parameters & statistics; Notation and terminology (HS 1-0, 1-1, 1-2)
Data collection methods: sampling methods, surveys, resampling and more (Sampling Methods incl. subsections)
Data entry and manipulation with Excel & SPSS (Mini-Assignment 1)

Weeks 3-4:  Describing Characteristics of a Single Variable
Graphical descriptions of data (HS 2-0, 2-1, 2-6b, 2-6c, 2-6d + Mini-Assignment 2 + Histogram Applet)
Numerical descriptions of data center (HS 2-2 all subsections except 2-2d)
Numerical measures of variability  (HS 2-3, 2-4)
Numerical measures of skew & kurtosis (HS 2-5 + Descriptive Statistics Applet)
Some Applications:  Statistical Process Control (Supplement 1)
Assignment 1  Deadline Wednesday, February 2 (beginning of class)

Week 5:  Test 1, Wednesday, February 9

Weeks 6-8: Measuring Relationships between Variables
Scatterplots (HS 3-0, 3-1)
Correlation analysis (HS 3-2, 3-3, 3-4 + Correlation Applet)
Regression analysis (HS 15-0, 15-2 + Regression Applet) Mini-Assignment 3
Contingency tables & qualitative variables (HS 16-3a, 16-3b) Mini-Assignment 4

Weeks 8-10: Probability Primer
Basic probability concepts (HS 4-0, 4-1, 4-3, 4-4, 4-5 + Intro to Probability; Basic Axioms)
Conditional probabilities (HS 4-2 + Let's Make a Deal PP; Let's Make a Deal Applet)
Probability distributions (See Wikipedia and particular PDs)
    Normal, Binomial, t-distribution (4-6, 5-0, 5-2 + Excel Mini-Assignment) Binomial Applet;  Normal Distribution Applet
Expected Value (Definition -- first 3 paragraphs)
Mini-Assignment 5
Law of Large Numbers (the tendency for the observed frequency of an outcome to approach its expected value as the number of trials of the experiment increases -- see Coin Flipping Applet)
Assignment 2  Deadline Wednesday, March 9 (beginning of class)

Week 11: Test 2, Wednesday, March 16

Week 12:  Spring Break, Week of March 20

Weeks 13-15: Evaluating Sampling Errors & Propositions
Parameters & Statistics one more time (HS 1-0, 1-1, 1-2)
Sampling Distributions & Central Limit Theorem (HS 6-0, 6-1, 6-2, 6-4 + Sampling Distribution Applet)
Estimating sampling error -- standard errors (HS 6-3)
Mini-Assignment 6
Confidence intervals in brief (HS 7-0, 7-1, 8-0, 8-1, 8-2, 8-10)
Evaluating hypotheses with p-values (probability values) (HS 9-0, 9-1, 9-2, 9-3, 9-4)

Weeks 15-17:  Additional Business Applications  +  Statistical "Literacy"
A few more examples of uses of statistics in business
Geometric mean (HS 2-2a);
Regression applications -- Stock market model (in-class); Cost analysis (in-class); Demand analysis (in-class);
Defining probability problems correctly (Birthday Problem Applet)
Translating conceptual ideas to operational measures & pitfalls of not doing so (in-class)
Evaluating Forecasters, Palm Readers, & Such (in-class); Looking out for Junk Science (American Council on Science & Health)
Assignment 3  Deadline Wednesday, April 20 (beginning of class)

Week 17: Review & Prep for Final Exam

Week 18:  Final Exam -- 8 AM, Thursday, May 5 


ECON 206 -- FAQ
Q: How can I prepare for the tests? How can I do better in this class?
A: Come to class and pay attention. Use the book to clarify topics.  Practice answering past test questions as we cover that material. Do the mini-assignments.  Ask questions of me (especially Friday question sessions).
Q: I had to work late last night, the computer system was down this morning, ... Will you take my assignment late?
A: Yes.  The 10% per day penalty will still apply unless a significant problem occurs well in advance of the deadline.  Assignments are made well in advance of deadlines.  You may turn them in early, so last-minute problems will make a difference only in the most significant situations (hospitalization, car accidents, ...)
Q: I will miss a test because of forensics, swimming, golf, .... When can I make it up?  May I take it early?
A: No make-ups are given but see me. Normally, I will weight your final exam to compensate.  See me in advance.
Q: I can't understand question x on the computer assignment. Can you help me?
A: Graded assignments are for you to show me what you know and can do.  I will explain ambiguities in words and phrases, but you must use your book, notes, and brain to determine how to complete the assignment. You may also seek assistance from other students (and, of course, your teammate if you have one), but you may not copy the data,  output, or answers of others.  I provide extensive assistance on the mini-assignments.  These mini-assignments help build some of the skills needed on the graded assignments.
Q: I forgot to staple or paper clip the pages of my assignment, I did not understand part of the assignment, ... will you count off for this?
A: Yes.
Q: I'm doing poorly on tests/assignments. Can I do work for extra credit?
A: No. Grades will be determined by the policies stated above.  There is a 1%-2% upward adjustment available.
Q: Do the past tests that are available online cover all material on current tests?
A:  No.  They are intended as one tool to use in preparing for current tests, but the tests change each semester with some material excluded, some included, and some changed a bit.  Warning:  occasionally a past test question is incorrect.
Q:  What do you mean by "behavior that is inappropriate or distracting"?
A:  This includes but is not necessarily limited to profanity, personal conversations, note passing, repeated cell phone interruptions, and other sorts of rude or disruptive activities.
Q:  My final grade is an 89.1, isn't that close enough to an A?  I need it to keep my scholarship.
A:  An A is 90.0 and above, a B is 80.0-89.9 and so on.  I will be glad to correct any error that I make in computing grades, but grades are not negotiable.  Achieving a certain grade for scholarships or other reasons is the student's responsibility.


ASSIGNMENT TIPS

One of the objectives of the assignments is to develop and encourage skills in the clear, accurate writing of data-oriented reports.  The writing in and appearance of your reports matter.  Here are some tips:
 
Questions:

3. a. Briefly describe the data set used in the assignment and the measurement of specific variables.
b. Based on your output, what would "typical" kilowatts used per day be? Is the data set symmetric, and what are the outlying values?
c. Do winter months have higher gas usage than other months? Conduct a test where the null hypothesis is that winter month gas usage is the same as other months.

PRETTY GOOD ANSWERS
3. a. The data set consisted of 5 variables related to the monthly electric and gas usage of residential utility customers from July 1990 to June 1998.  The data were obtained from an SPSS data file on WKUnet (m:spss/utility). The variables included were a month identifier, average kilowatt hours used per day, gas thermal units used, average daily temperature, and number of days in the month.

b. Based on the histogram and descriptive statistics, typical kilowatt hours used per day ranged from about 30 to 40 per month. The mean was about 34 and the median was 37 with a standard deviation of 3. The data were skewed to the right with a few large outliers well beyond 50 kilowatt hours per day.

c. The means for winter month gas usage (December, January, February) were about 30 percent higher than for other months. Based on a low p-value (0.005), a hypothesis that winter months and other months have the same mean could be rejected with strong confidence.

POOR ANSWERS
3. a. It had some variables about utility customers.
b. 34. A few big outliers.  s.d. was 3.
c. winter had higher   don't know what p-value means

Why are these poor answers?  Information provided is far too little.  No detail or sources about the variables used are provided in (a). Incomplete sentences used in (b) and (c).  Pronoun (it) used without antecedent in (a).   Abbreviation used for standard deviation in (b) without first defining it somewhere.  Improper capitalization and punctuation in (c).  Use of contractions such as "don't" in (c) is not appropriate in formal reports.



READING SUPPLEMENT 1 -- Statistical Process Control (SPC)

Empirical Rule: a summary of variation in outcomes for data that have a roughly bell-shaped (normal) distribution and a means to identify outliers. The Empirical Rule states: i) about 68% of the data will be within +/- 1 standard deviation of the mean, ii) about 95% of the data will be within +/- 2 standard deviations of the mean, and iii) about 99.7% of the data will be within +/- 3 standard deviations of the mean.
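
The percentages in the Empirical Rule can be checked against the normal distribution itself. The short sketch below uses Python's standard library rather than Excel or SPSS; the function name share_within is hypothetical.

```python
from statistics import NormalDist


def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    return z.cdf(k) - z.cdf(-k)


for k in (1, 2, 3):
    print(f"within +/- {k} standard deviations: {share_within(k):.1%}")
# prints 68.3%, 95.4%, and 99.7% for k = 1, 2, 3
```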

Chebyshev's Rule: a summary of variation in outcomes for data regardless of their shape and a means to identify outliers. The rule states: i) at least 75% of the data will be within +/- 2 standard deviations of the mean, ii) at least about 89% of the data will be within +/- 3 standard deviations of the mean, and iii) at least about 94% of the data will be within +/- 4 standard deviations of the mean.
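
The percentages in this rule come from the bound 1 - 1/k^2, where k is the number of standard deviations. The sketch below computes that bound and compares it with a small, deliberately skewed data set. The data values and function names are hypothetical, and the check uses the population standard deviation.

```python
from statistics import mean, pstdev


def chebyshev_bound(k):
    """Smallest possible share of data within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k**2


def observed_share(data, k):
    """Share of a data set lying within k population standard deviations."""
    m, s = mean(data), pstdev(data)
    return sum(abs(x - m) <= k * s for x in data) / len(data)


skewed = [1, 1, 1, 2, 2, 3, 4, 6, 10, 50]  # hypothetical, strongly skewed
for k in (2, 3, 4):
    print(k, round(chebyshev_bound(k), 4), observed_share(skewed, k))
```

Even with one extreme value, the observed share never falls below the bound, which is why the rule is useful when the data are not bell-shaped.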

Standardized Units (z-units, z-values, z-scores): the name given to a variable (X) that has been converted so that the mean is zero (0) and the standard deviation is one (1). This conversion is done by the formula Zi = (Xi - Mean)/Std. Deviation, where i refers to each individual item in the data set. This conversion eliminates whatever units were used to measure X, and it allows each data point to be easily evaluated in terms of how much it differs from the mean.
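
As a rough illustration of the conversion, the sketch below standardizes a few hypothetical prices. It assumes the sample standard deviation (the n - 1 version), which is an assumption about which version of the standard deviation your software reports.

```python
from statistics import mean, stdev

prices = [500.0, 450.0, 475.0, 520.0, 610.0]  # hypothetical prices in $

m = mean(prices)
s = stdev(prices)  # sample standard deviation (divides by n - 1)

# Zi = (Xi - Mean) / Std. Deviation for each item i
z_scores = [(x - m) / s for x in prices]

for x, z in zip(prices, z_scores):
    print(f"price {x:7.2f} -> z = {z:+.2f}")
```

After the conversion, the z-scores have a mean of 0 and a standard deviation of 1, and the dollar units are gone.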

Statistical Process Control (SPC): The phrase and acronym applied to systematic methods of analyzing repetitive production processes using charts that track variation of the process, usually with the goal of monitoring and improving quality. Chapter 18 provides more details.

Process: The forces (inputs) working together to generate the outcomes of a variable; in manufacturing or provision of services these include equipment, tools, materials, people, and "environmental" influences such as weather or other events that determine the characteristics of a good or service being produced; the same kinds of forces generate outcomes in personal settings also.

Process Variation: changes in the process that lead to quantitative or qualitative differences in the characteristics of a part, a good, or service being produced

Common Causes of (Process) Variation: sources of variation which are inherent (built-in) to the process design as it is currently configured; usual or normal process variation; sources of variation which have the potential to influence all process observations; variation only eliminated through redesign or improvement of design of the process; random variation

Assignable Causes of (Process) Variation: special or specific sources of variation which are not built into the design of the process; variation which does not influence all observations; variation which can be eliminated without altering the basic design

Control Charts: Graphs that record sample measurements of a process -- usually a repetitive process. X-Bar Charts are the simplest and monitor variation in sample means of the process. The chart is used to judge whether the variation in samples is likely due to assignable variation (high variability or patterns) or merely to common (expected) variation.
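
A minimal sketch of the idea behind an X-Bar Chart follows. It is not taken from the text: it assumes the common convention of control limits at +/- 3 standard errors around the average of the baseline sample means, with the process standard deviation estimated by pooling the within-sample variances. All data values and function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def xbar_limits(baseline_samples):
    """Lower limit, center line, and upper limit from equal-sized baseline samples."""
    n = len(baseline_samples[0])
    center = mean(mean(s) for s in baseline_samples)
    # Assumption: pool the within-sample variances to estimate process variation.
    sigma = sqrt(mean(stdev(s) ** 2 for s in baseline_samples))
    half_width = 3 * sigma / sqrt(n)
    return center - half_width, center, center + half_width


def flag_samples(new_samples, limits):
    """Positions of new samples whose means fall outside the control limits."""
    lower, _, upper = limits
    return [i for i, s in enumerate(new_samples) if not lower <= mean(s) <= upper]


baseline = [
    [10.1, 9.9, 10.0, 10.2],
    [9.8, 10.0, 10.1, 9.9],
    [10.0, 10.2, 9.9, 10.1],
    [10.1, 9.8, 10.0, 10.0],
]
new = [[10.0, 10.1, 9.9, 10.0], [11.5, 11.4, 11.6, 11.7]]

limits = xbar_limits(baseline)
print(limits, flag_samples(new, limits))
```

A sample mean inside the limits is treated as common variation; one outside the limits is a signal to look for an assignable cause.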



ASSIGNMENT 1
Objective: Practice in collecting and assessing key facts about univariate data
(You may work alone or in 2-person teams.  If you work as a team, turn in (1) report with both names on the front.
You or your team may discuss the assignment with others but you may not copy answers or use the same data.)

1. Select a variable that you would like to investigate.  It might be an item for sale such as gasoline prices at different locations or the price for an article of clothing from different online sources. If you are interested in sports you might use home run data or player point values in fantasy football.  Whatever the variable, it needs to be a standardized item (89 octane gas, Levi's 560 jeans, batting average, ...).  Collect at least 30 observations for the variable.  Record the value for each observation as well as an identifier for it.  (See below).  You may collect the data from online sources or you may use locations around town, but you may not use personal surveys.  You must document the source(s) of your data.  If it is an online source, provide the URL.   

2. i) Enter the values for the variable into SPSS along with another variable to identify the source of the price.  Remember to use labels for each variable.  For example, your data spreadsheet might look like

ID              Price   (ID and Price are the names of your variables; the data are entered in the columns below)
SonyCC      500
JVCBB        450
....                ....

ii) Create a statistical summary of your variable along with a histogram and boxplot.
iii) Edit your output.   Place the title, TABLE 1: STATISTICS ON (whatever your variable describes) at the top of your statistics table.   Place the title FIGURE 1:  HISTOGRAM ON (your variable) at the top of your histogram.  Place the title FIGURE 2:  BOXPLOT ON (your variable) at the top of your boxplot.
iv)  Print both your output and the data spreadsheet.

3. On a separate sheet of paper, print or type answers to these questions in complete sentences. Explain your answers by making specific use of the pertinent statistics and graphs that you generate. 
i) Describe the variable that you collected including the units of measure, your method of sampling, and the source (you do not have to provide a detailed list of multiple sources -- just summarize).
ii) Using statistics that describe the center and variability of your data, explain what would be typical and unusual prices.
iii) Using the skewness and kurtosis statistics as well as the histogram, explain other characteristics of your price data.
iv)  Describe the information provided by the boxplot.
v) Compute the standardized values for the first two observations in your data set.  Show your calculations.
 

Deadline = Wednesday, February 2 (beginning of class)
Remember: Place answers on a separate sheet at the back.  Staple or clip all sheets together.  Your output should include the specified results -- no more and no less with answers typed or printed very neatly.  Make sure your name is clearly printed or typed on a cover sheet.


ASSIGNMENT 2
Objective: Generating & reporting a regression analysis (you may work alone or in a 2-person team;  teams turn in one report with both names on the front)

1. Collect data on two quantitative variables that you think would be related to each other in some way.  You should have at least 30 observations for each variable.  These variables must be from a documented source (no personal surveys).
Enter values for the two variables that you have collected into Excel, making sure to provide names (labels) for each variable.
2. i) Create a linear regression analysis that includes the regression output tables (remember to check "labels"), the table of predicted and residual values (check "residual values"), and the graph showing the fitted regression line along with the actual scatterplot (check "line-fit plot").  (Refer to Mini-assignment #3 or Excel Help for assistance).  Move the graph so that it appears below the predicted/residual table (just click on the graph and keep holding down the mouse button as you move the mouse).
ii)  Edit your output in the following way.   Make sure to format the results so they fit within the columns properly.  Provide a Title for the graph such as FIGURE 1: Scatterplot and Regression Line for (your variables).  Change the regression table  title to  TABLE 1: Regression Results for (your variables).
iii)  Print your output. (Before printing, change the Page Setup under "File" to "Landscape")
3.  Answer the following questions on a separate sheet attached to the back of your output.
i)  Describe the objective of your study, the variables in your data set (what they are; what units are used), and specifically document their source.
ii) Neatly write out the regression results in equation form.
iii)  Explain the meaning of the coefficients in this equation and describe, in numerical terms, how well this equation accounts for different values of your dependent variable.
iv) Show how the predicted and residual values are computed for the observation in your data set (use the actual numbers).  Then, briefly explain the meaning of these computations.
v)  Explain the information presented on the graph (your Figure 1).

Deadline = Wednesday, March 9 (beginning of class;  remember to use a staple or clip;  print neatly or type your answers.  Where responses are in words, use complete sentences.)
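
For question 3(iii), "how well the equation accounts for different values of the dependent variable" is usually summarized by R-squared. The sketch below shows one way to compute it from actual and predicted values. The numbers are hypothetical and the function name r_squared is only for illustration; Excel reports the same quantity as "R Square" in its regression output.

```python
from statistics import mean


def r_squared(actual, predicted):
    """Share of the variation in the dependent variable explained by the line."""
    avg = mean(actual)
    unexplained = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    total = sum((a - avg) ** 2 for a in actual)
    return 1 - unexplained / total


# Hypothetical actual values and the values predicted by a fitted line.
actual = [120.0, 150.0, 210.0, 230.0, 290.0]
predicted = [116.0, 158.0, 200.0, 242.0, 284.0]

print(round(r_squared(actual, predicted), 2))
```

A value near 1 means the residuals are small relative to the overall spread of the dependent variable; a value near 0 means the line explains very little.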


ASSIGNMENT 3
Objective: Evaluating sampling error and testing for statistically significant differences (hypothesis testing).

1. Develop a proposition between two variables (measured in identical units) that you can test with data. The claim must be stated as an equality or inequality between the means or proportions of the two variables. These variables must be obtained from an observable source such as a local store, online, SPSS files, ... .  Do not use surveys of people (students, friends, ...).  Collect at least 20 observations for each variable.

Example: I could collect 20 fiction book prices and 20 non-fiction book prices from Amazon.com and test the proposition:  Average Price (in $) of Fiction Books = Average Price (in $) of Non-fiction Books;
(Note: This is the same thing as Avg. Price Fiction - Avg. Price Non-fiction = 0; The test must be between two variables that are measured the same way such as in $, miles, gallons, lbs., ....) Examples of other variables would be financial variables on companies, points or performance measures for sports teams or players, prices on products, ...

2. i) Enter your variables into two columns (with variable labels at the top) in an Excel spreadsheet.
ii) Obtain descriptive statistics for each of your variables using Tools/Data Analysis/Descriptive Statistics. Select "Summary Statistics," "Confidence Level for the Mean," and "Labels" and click "OK." In the output worksheet, make sure to use Format/Column/Autofit to adjust column widths and Format/Cells to adjust the number of decimal places to four or fewer.
iii) Select Tools/Data Analysis/t-test: two-sample assuming unequal variances and put your two variables in the input ranges, select "Labels," and select a value to use for the "hypothesized mean difference." (You select this value to fit the proposition that you are testing.  You should choose a value that makes the test "interesting."  For example, selecting 0 for the hypothesized difference between 93 octane and 89 octane gas is so obviously wrong as to be of little interest.  A value of 5 cents might be better.)
Click "OK."
iv) In the output worksheet, use Format/Column/Autofit to make the column width fit your table, and then print this output.  Also print your data sheet.

3. Print or type answers to the following questions in complete sentences on a separate sheet attached to your output with a paper clip or staple:
i) Explain the objective of your study.  Describe your variables, their units and the source(s) from which you obtained them.
ii)  Clearly write out the proposition that you are testing (null hypothesis) as an equation.  Briefly explain your sampling method.  What might be some possible sources of non-sampling error?
iii) Using the descriptive statistics that you generated in 2(ii), briefly explain some of the important characteristics of your two variables, including the estimates of the sampling errors for the sample means (sample proportions).
iv) Write out an equation showing how the (approximate) 95% confidence intervals are computed (use the actual numbers from your output -- your numbers may vary slightly from those reported by Excel because of rounding).  Write out an equation showing how the (approximate) 99% confidence intervals are computed.   (Again, use actual figures from your output).
v)  What is the actual difference in the sample means (or proportions) for your two variables?  Is this difference large enough to be considered (statistically) significantly different from the proposition that you are testing? (Explain -- be specific!).  If the means are not significantly different, how might your study be redesigned to increase the likelihood of finding a significant difference?

Deadline = Wednesday, April 20  (beginning of class)
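
The calculations behind steps 2(ii), 2(iii), and questions 3(iv) and 3(v) can be sketched as follows. This is an illustration under assumptions, not a replacement for the Excel output: the prices are hypothetical, the function names are made up for this sketch, and the confidence intervals use the normal approximation (about 1.96 and 2.58 standard errors), so Excel's figures may differ somewhat for small samples. The sketch stops at the t-statistic and degrees of freedom; the p-value still comes from Excel.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def standard_error(sample):
    """Estimated sampling error of the sample mean."""
    return stdev(sample) / sqrt(len(sample))


def approx_ci(sample, level=0.95):
    """Approximate confidence interval: mean +/- z * standard error."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * standard_error(sample)
    return mean(sample) - half_width, mean(sample) + half_width


def unequal_variance_t(x, y, hypothesized_diff=0.0):
    """t-statistic and degrees of freedom for a two-sample test, unequal variances."""
    vx, vy = standard_error(x) ** 2, standard_error(y) ** 2
    t = (mean(x) - mean(y) - hypothesized_diff) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx**2 / (len(x) - 1) + vy**2 / (len(y) - 1))
    return t, df


# Hypothetical prices in $ (a real assignment needs at least 20 of each).
fiction = [10.0, 12.0, 14.0, 16.0, 18.0]
nonfiction = [9.0, 11.0, 10.0, 12.0, 13.0]

print(approx_ci(fiction, 0.95), approx_ci(fiction, 0.99))
print(unequal_variance_t(fiction, nonfiction, hypothesized_diff=0.0))
```

A larger absolute t-statistic means the observed difference is further from the hypothesized difference, measured in standard errors.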


MINI-ASSIGNMENTS

Mini-Assignment 1

Objective: To gain familiarity with generating and interpreting univariate statistical measures & graphics using SPSS
1.  Access SPSS  (For help, see SPSS Help)
2.  Retrieve the file Employee data.sav (m:/spss10/employee data.sav  -- this should be the default directory of SPSS).  It provides data on 475 employees of a particular company.  If you have trouble finding the data, just use employeedata.xls and open it in SPSS -- remember to tell SPSS that it is an Excel file.
(Note:  you can see descriptions of the variables by pointing at the variable labels in the top row or by clicking on "Variable View" at the bottom of the spreadsheet).
3.  Use Transform>Compute to create a new variable (Totexp) that is the sum of Jobtime and Prevexp.
4.  Generate a histogram, boxplot, and stem-and-leaf plot for Current Salary (Salary) and Total Experience (Totexp) by clicking Analyze>Descriptive Statistics>Explore.  Place Salary and Totexp in the "Dependent List" box.  Under the "Display" selections, choose "Plots".  Next, click on the "Plots" button (between Statistics and Options) and select "Histogram" along with Boxplot and stem-and-leaf, which should already be selected.
5.  In the output window, change the title from  Explore to  Graphs for Salary and Experience and print your output.
(If you have trouble, come by my office).
6.  Answer the following questions:
    a.  What are the characteristics of the histograms for salary and experience (center, spread, skewed which way)?
    b.  For salary, what do the stem and leaf numbers under "frequency," "stem," and "leaf" mean?
    c.  For experience, what does the red box in the boxplot indicate?  What do the black lines above and below the red box mean?  What are the values outside of the black lines?
 


Mini-Assignment 2
Objective: To gain practice manipulating data and generating descriptive statistics in Excel

1.  Access Excel.  (Either on your own PC or by logging on to WKUnet and double-clicking the Excel icon on the standard student network desktop.)
2.  Retrieve the file by clicking on this link: s&p1980a.xls   The file contains monthly values for the Standard & Poor’s 500 Index from 1980 through March 2002.  ****** Note that you can open s&p1980a.xls in Excel by clicking on the link.  If the data analysis features of Excel will not work, save the file to a disk, close Excel, reopen Excel from the desktop icon, and then open the file in Excel. ************
3.  Create a new variable, labeled PCT, that will be percentage changes in the monthly S&P 500 Index.
(To do this, type PCT in Cell D1.  Click the cursor on Cell D3 and type the following formula to compute the percent change in the S&P 500 Index -- include the = sign and parentheses
 = ((c3 - c2)/c2)*100
Now, hit enter.  Go back and highlight Cell D3 again, right click the mouse, and click "Copy".  Click on Cell D4 but keep holding down the left mouse button and drag the cursor until Column D is highlighted all the way down to the bottom of the data. Release the left mouse button and right click the mouse and select "Paste."  You should now have a column of numbers showing monthly percentage changes.)
4.a. Compute descriptive measures for PCT. (Use Tools>Data Analysis>Descriptive Statistics: plug the appropriate column into the "Input Range," click "Labels", and click "Summary Statistics.")
b. Now, go back to the data worksheet and calculate standardized values for PCT.  To do this with Excel, you will need the mean and standard deviation from the Summary Statistics output.  Then, on the original worksheet, type a label (such as std. values) at the top of an empty column.  Highlight the cell in row 3 under the title and type the formula = (D3 - mean)/standard deviation where the mean and standard deviation are the numbers from your Summary Statistics output.
c.  Print the output (after you resize the output columns to fit properly and reformat the output column to present only 1 decimal place).
5.  Using the original data on the worksheet and the first 5 observations, calculate the mean, median, standard deviation, and standardized value (for the first observation).  How closely do your calculations match those generated by Excel for all observations (allowing for rounding differences)?
6.  What does each of the descriptive statistics mean?
7.  Try to draw a histogram (roughly) based on the descriptive statistics.
If you have trouble, save your file on a floppy disk or CD and see me.
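
The percentage-change formula = ((c3 - c2)/c2)*100 and the summary measures it feeds can also be sketched outside Excel. The index values below are hypothetical round numbers, not the S&P 500 data from the file, and the variable names are only for illustration.

```python
from statistics import mean, median, stdev

# Hypothetical monthly index values (stand-ins for column C of the worksheet).
index = [100.0, 110.0, 99.0, 108.9, 108.9]

# Same idea as = ((c3 - c2)/c2)*100, applied to each pair of adjacent months.
pct = [(current - previous) / previous * 100
       for previous, current in zip(index, index[1:])]

print("PCT:", [round(p, 1) for p in pct])
print("mean:", round(mean(pct), 2))
print("median:", round(median(pct), 2))
print("standard deviation:", round(stdev(pct), 2))  # sample version (n - 1)
```

Note that the first month has no percentage change, which is why the Excel formula starts in row 3 rather than row 2.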
 


Mini-Assignment 3
Objective:  Practice with scatterplots, correlation analysis, and regression analysis

1.  Go to the  Correlation Applet   and practice matching the correct correlation coefficient with the appropriate scatterplot.  Also, try taking a listed correlation coefficient and then drawing a scatterplot that would reflect the correlation coefficient.  See if your scatterplot matches the one provided by the applet for the coefficient.

2.  Open the file airfares.xls into Excel.  The file includes several variables regarding roundtrip air fares to several major cities based on a sample of 21 cities collected on a particular day for morning weekday departures with Saturday night stay-over.  The variables in the file are City, Fare (in $), Distance (to destination city in miles), Direct SWA (=1 if Southwest Airlines flies the route directly), and Fare per Mile (Fare divided by Distance).
3.  Generate a scatterplot between Fare and Distance.  Click on the chart icon button, select XY Scatter for chart type, and then use the chart wizard to generate a scatterplot that has distance on the x-axis and fare on the y-axis. Print this scatterplot.  Then:
    a) Draw a regression line in by hand that would best fit through the middle of the data;
    b) Write out the equation (Fare = intercept + slope*Distance) that would go with your regression line -- this does not have to be exact.
4. Go back to the data worksheet and click Tools>Data Analysis>Regression.  Use the pop-up window to generate a regression analysis with Fare as the dependent variable (y-variable) and Distance as the independent variable (x-variable).  Remember to click the "Labels" box and also click the "Residuals" box.
    a) Write out the regression output in equation form.  Does this equation match yours from (3b)?
    b) Calculate the first residual by hand.  Use the Excel-generated residuals to check your calculation (allowing for rounding differences)
    c)  What does a 1 mile increase in distance predict for Fare?  What about a 100 mile increase? What would Fare be if distance were 0?  Why is this really only a "hypothetical" value?
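
The arithmetic behind the regression output can be sketched with the usual least-squares formulas. The distances and fares below are hypothetical (they are not the airfares.xls data), and fit_line is a name chosen for this sketch.

```python
from statistics import mean


def fit_line(x, y):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    mean_x, mean_y = mean(x), mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


distance = [200.0, 400.0, 600.0, 800.0, 1000.0]  # hypothetical miles
fare = [120.0, 150.0, 210.0, 230.0, 290.0]       # hypothetical fares in $

intercept, slope = fit_line(distance, fare)
predicted = [intercept + slope * d for d in distance]
residuals = [f - p for f, p in zip(fare, predicted)]  # actual minus predicted

print(f"Fare = {intercept:.2f} + {slope:.2f} * Distance")
print("first residual:", round(residuals[0], 2))
```

With these made-up numbers the slope is the predicted change in Fare for a 1 mile increase in distance, and the intercept is the predicted Fare at a distance of 0.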


Mini-Assignment 4
 Objective: Generating and interpreting contingency tables (crosstabulations)

1.  Open the data file Accounts.xls . (It is an Excel data file).  Variables are defined in the file.
2.  Generate a crosstabulation with Account Supervisor as the row variable and Account Irregularity as the Column Variable.  .  To do so, you must create a Pivot Table, put these two variable in their proper place,  and then place the  "COUNT" variable in the body of the table (refer to in-class instructions).  Put the new table on a new worksheet.
4.  Print your Excel crosstab and the original data sheet.
5.  Compute expected frequencies for each cell in the table and then compute the Chi-Square statistic for the table.
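
The calculation in step 5 can be sketched in Python as follows.  The 2x2 table of counts is hypothetical and is not taken from Accounts.xls; the point is only to show the formulas: expected frequency = (row total x column total) / grand total, and Chi-Square = sum of (observed - expected)^2 / expected over all cells.

```python
# Hypothetical 2x2 table of counts (NOT the Accounts.xls data).
# Rows: two account supervisors; columns: irregularity = no / yes.
observed = [
    [30, 10],
    [20, 40],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count for each cell if rows and columns were unrelated.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Chi-square statistic: sum over cells of (observed - expected)^2 / expected.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Degrees of freedom = (rows - 1) * (columns - 1).
degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)

print("Expected frequencies:", expected)
print(f"Chi-square = {chi_square:.3f} with {degrees_of_freedom} degree(s) of freedom")
```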



Mini-Assignment 5
Objective: Practice computing and understanding probability distributions & calculations

1.  Open Excel and compute probabilities for the following situations:
a.  Suppose that accounting errors due to erroneous data entry have a 1% (0.01)  probability on any given entry.  Also, suppose that  data entry errors are a binomial variable (error or no error).  Compute the likelihood of zero errors given 150 entries.
(In Excel, highlight cell A1 and click the fx icon on the toolbar; select “Function Category” = Statistical and “Function name” = BINOMDIST; then fill in the following values in the blanks: Number_s = 0, Trials = 150, Probability_s = 0.01, Cumulative = True.)

b. Suppose that the average amount of time that customers are kept on hold by AOL customer support is 8 minutes with a standard deviation of 1.5 minutes.  If wait time is normally distributed, calculate the probability of waiting less than 10 minutes.
(In Excel, highlight cell A3 and click the fx icon on the toolbar; select “Function Category” = Statistical and “Function name” = NORMDIST; then fill in the following values in the blanks: X = 10, Mean = 8, Standard_Dev = 1.5, Cumulative = True.)
Print the Excel Spreadsheet.

2.  Solve the following problems:
a.  If the probability of customers A and B returning an item is 20% (0.20) for each one individually, what is the probability of both customers returning their items (assuming their actions are independent of each other)?
b.  Given the same information as in 2a, what is the probability of one of the two people returning their item?
c.   Suppose that return - no return is a binomial variable, with a probability of 0.10 of return for any 1 person.  What is the probability that either 0 or 1 people out of 6 people will return their items? (Try to use Excel to compute this answer).
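
For checking Excel results, the same kinds of probabilities can be computed with a short Python sketch.  The helper names (binom_pmf, binom_cdf, normal_cdf) are hypothetical and are meant only to mirror what BINOMDIST and NORMDIST compute with Cumulative = True; the input values come from the problems above.

```python
import math

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_cdf(k, n, p):
    """Probability of k or fewer successes (cumulative binomial)."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

def normal_cdf(x, mean, sd):
    """Probability that a normal variable is less than x."""
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))

# 1a: zero errors in 150 entries with a 1% error rate.
p_zero_errors = binom_cdf(0, 150, 0.01)

# 1b: waiting less than 10 minutes when mean = 8 and sd = 1.5.
p_wait_under_10 = normal_cdf(10, 8, 1.5)

# 2a: two independent events both occur -> multiply the probabilities.
p_both_return = 0.20 * 0.20

# 2c: 0 or 1 returns out of 6 people with a 10% return rate.
p_zero_or_one_of_six = binom_cdf(1, 6, 0.10)

for label, value in [
    ("P(0 errors in 150 entries)", p_zero_errors),
    ("P(wait < 10 minutes)", p_wait_under_10),
    ("P(both customers return)", p_both_return),
    ("P(0 or 1 returns out of 6)", p_zero_or_one_of_six),
]:
    print(f"{label} = {value:.4f}")
```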




Mini-Assignment 6
Objective: Understanding standard errors, interval estimates, and hypothesis tests

1.   Open the file airfares.xls into Excel.  The file includes several variables regarding roundtrip air fares to several major cities based on a sample of 21 cities collected on a particular day for morning weekday departures with Saturday night stay-over.  The variables in the file are City, Fare (in $), Distance (to destination city in miles), Direct SWA (=1 if Southwest Airlines flies the route directly), and Fare per Mile (Fare divided by Distance).

2. i) Create descriptive statistics, including a 95% confidence interval, for Fare per Mile by selecting Tools/Data Analysis/Descriptives and putting the column for Fare per Mile in the “input range.”
Also, select Labels, Descriptive Statistics, and Confidence Interval for Mean (95%).  Click OK.  Format the output using Format/Column/Autofit.
ii) Repeat the prior step, except change the value in the “confidence interval for mean” window from 95% to 99%.
iii) Repeat the prior step, except include only Fare per Mile for the first 10 cities.

3.  Create a test of the proposition (hypothesis) that the Fare per Mile for the first ten cities equals the Fare per Mile for the next eleven cities.  Use Tools>Data Analysis>t-test: assuming unequal variances.  Put the address for the first 10 observations for Fare per Mile in the Variable 1 range and the address for the next 11 observations in the Variable 2 range.  Make the "hypothesized mean difference" equal to zero.

4.   Print your output and try to do the following:
i) For the output from (2i), circle the standard deviation and standard error of the mean.  Draw a line from each out to the margin, and neatly write a brief explanation of each in the margin. Also, show how the standard error of the mean is computed.
ii) Using a simple number line illustration,  show the width of the 95% confidence interval for the mean versus the width of the 99% confidence interval for the mean.
iii) Provide equations showing how the 95% and 99% confidence intervals are computed. Why is the 99% interval wider?
iv)  What happened to the width of the interval when only 10 cities are used -- why?
v)   From  (3),  explain the meaning of the "two-tailed p-value";  what does it say about the hypothesis?
vi)  Why is there a t-statistic reported along with the p-value?
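
The quantities in this mini-assignment can be sketched in Python as a rough check on the Excel output.  Everything below is illustrative: the two small samples are made up (they are not the airfares.xls data), the helper names are hypothetical, and the t critical values for 4 degrees of freedom are copied from a standard t-table rather than computed.

```python
import math
import statistics

# Hypothetical fare-per-mile samples (dollars per mile), NOT airfares.xls.
group1 = [0.20, 0.25, 0.30, 0.35, 0.40]
group2 = [0.10, 0.15, 0.20, 0.25, 0.30]

def std_error(sample):
    """Standard error of the mean = sample standard deviation / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

def conf_interval(sample, t_crit):
    """Confidence interval = mean +/- (t critical value * standard error)."""
    center = statistics.mean(sample)
    half_width = t_crit * std_error(sample)
    return center - half_width, center + half_width

# Two-tailed t critical values for n - 1 = 4 degrees of freedom (t-table).
T_95_DF4 = 2.776
T_99_DF4 = 4.604

ci_95 = conf_interval(group1, T_95_DF4)
ci_99 = conf_interval(group1, T_99_DF4)

def welch_t(a, b):
    """t-statistic and degrees of freedom for a test assuming unequal variances."""
    va, vb = statistics.variance(a), statistics.variance(b)
    na, nb = len(a), len(b)
    se_diff = math.sqrt(va / na + vb / nb)
    t_stat = (statistics.mean(a) - statistics.mean(b)) / se_diff
    df = (va / na + vb / nb) ** 2 / (
        (va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1)
    )
    return t_stat, df

t_stat, df = welch_t(group1, group2)

print(f"Standard error of the mean: {std_error(group1):.4f}")
print(f"95% interval: {ci_95[0]:.3f} to {ci_95[1]:.3f}")
print(f"99% interval: {ci_99[0]:.3f} to {ci_99[1]:.3f}")
print(f"t-statistic = {t_stat:.3f}, degrees of freedom = {df:.1f}")
```

The 99% interval uses a larger critical value than the 95% interval, so it is wider; a smaller sample raises both the standard error and the critical value, which also widens the interval.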



ASSIGNMENT 1 (Student Answer -- Score = A+)

3a.  The data we collected was borrowed from the WKUNet SPSS file Cars.sav.  We used the horsepower measurements for the first thirty of the 406 cars in the file.  The data consisted of two variables:  the qualitative number assigned to each car and the quantitative measure of each car's horsepower.

b.  According to our data (Table 1), the typical horsepower or approximate center is about 160.  The mean of horsepower is 143.4 with the median of the data being 150 horsepower.  This tells us that the data is somewhat symmetrical because the two numbers are close together.  The standard deviation, or how different on average the data is from the mean, is approximately 48.  The maximum value is 225 horsepower and the minimum value is 46.

c.  The descriptives chart produced by SPSS (Table 1) gives a skewness of .053 and kurtosis of -.789.  Skewness measures how symmetric the data is; if it is between -1 and 1, the data is nearly symmetric.  Since this data falls within that range, the data for the cars' horsepower is nearly symmetric.  The kurtosis measures the outliers of the given data.  If the kurtosis is near zero, the graph has normal extreme values.  The horsepower measured, in our assignment, has normal extreme values.  Because of the large "clump" of data shown in the histogram (Figure 1), the horsepower data is unimodal.

d.  The first point on the box plot (Figure 2), or center line, is at 143.4.  The bottom line of the box, or first quartile, is defined at 90; the third quartile, or top line of the box, is at 170.  The boxplot shows the interquartile range, which is the third quartile minus the first quartile, or 80.

e.  Standardized values can be computed by the formula (variable value - mean)/standard deviation.  For the first observation, the formula gives
(130-143.4)/48 = -0.28.
For the second observation, the formula gives
(165-143.4)/48 = 0.45.
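
The standardizing formula in part (e) can be checked with a short Python sketch.  The mean (143.4), standard deviation (48), and the two observations (130 and 165) are taken from the answer above; the function name standardize is hypothetical.

```python
def standardize(value, mean, sd):
    """Standardized value (z-score) = (variable value - mean) / standard deviation."""
    return (value - mean) / sd

mean_hp = 143.4   # mean horsepower reported in the answer
sd_hp = 48        # approximate standard deviation reported in the answer

z_first = standardize(130, mean_hp, sd_hp)
z_second = standardize(165, mean_hp, sd_hp)

print(f"First observation:  z = {z_first:.2f}")
print(f"Second observation: z = {z_second:.2f}")
```

A negative standardized value means the observation is below the mean; a positive one means it is above the mean.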

SPSS Help
 

Accessing SPSS on WKUnet
After you turn on the computer, the log-in window will appear.  Fill in the blanks for your WKU Email address and your WKU Email Password.  BEFORE PROCEEDING, click the “Novell Account” button in the lower right-hand corner where it says, "The default Novell account is Student.  To Change click Novell Account."  After you click this, two additional blanks will appear for Novell Account and Novell Password.  Change the Novell Account from Student to SPSS.  Leave the Novell Password blank.  When the main desktop appears, double-click the SPSS folder in the left window.  Then click the SPSS 10 icon in the right window (not the SPSS production icon).  The SPSS menu and data spreadsheet should appear.

3.a.  Create a table showing the average pct change in the S&P 500 Index for each year by using Data>Pivot Table and Pivot Chart Report.  When the Pivot Table Wizard opens you will see
“Step 1 of 3” window:  make sure “Microsoft Excel list or database” is checked and click “Next.”
“Step 2 of 3” window:  click the Range icon, highlight all three columns of data (A, B, C), and then click “Next.”
“Step 3 of 3” window:  click "Layout", drag the “Year” button into the area that says “Row”, and then click and drag the “s&p500” value into the area that says “Data”.  Double-click on the button that now says “Count of s&p 500.”  In the new window, click “Options,” then choose Summarize by “Average”, select Show data as “% of”, and choose Base Item as “1980.”  Select OK, OK, select “New Worksheet” and click “Finish.”
b.  Once the pivot table is created, create a chart based on it by clicking on the chart icon in the pivot table dialog box next to the pivot table.  Print the chart and then print the Worksheet containing your new table (File>Print or click printer icon).  If you run into trouble that you cannot solve, save your file to a floppy disk, and come see me.
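
What the pivot table in 3a computes can be sketched in Python.  The rows below are made up and the layout (a year and a pct change per row) is an assumption, since the actual three-column data file is not shown here.  The sketch averages the pct change by year and then expresses each yearly average as a percentage of the 1980 average, which is roughly what Summarize by “Average” with Show data as “% of” and Base Item “1980” does.

```python
from collections import defaultdict

# Hypothetical rows of (year, pct change in the S&P 500); NOT the course data.
rows = [
    (1980, 2.0), (1980, 4.0),
    (1981, -1.0), (1981, 1.0), (1981, 3.0),
    (1982, 6.0),
]

# Group the pct changes by year (the "Row" field of the pivot table).
by_year = defaultdict(list)
for year, pct_change in rows:
    by_year[year].append(pct_change)

# Summarize by "Average": one average per year.
average_by_year = {
    year: sum(values) / len(values) for year, values in sorted(by_year.items())
}

# Show data as "% of" with base item 1980.
base = average_by_year[1980]
pct_of_base = {year: 100 * avg / base for year, avg in average_by_year.items()}

for year in average_by_year:
    print(f"{year}: average = {average_by_year[year]:.2f}, "
          f"% of 1980 = {pct_of_base[year]:.1f}")
```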