**Questions**

**Question 1**

The dataset bus.xlsx contains data for the entire Galway City bus fleet. For this question consider the variable referring to the number of kilometres travelled since the last maintenance (Kilometres) and answer the following questions:

(i) Organise the data on kilometres into a frequency distribution and show your results in the form of a table. Make sure to explain how you chose the number of classes, the class interval, and the class limits.

(ii) Construct a histogram of the class frequencies from part (i) in Excel and comment on the shape of the distribution.

(iii) By hand (i.e. not in Excel), draw/sketch a cumulative frequency polygon (including relative frequencies) and paste a picture of it into your Word document. Use it to answer the following:

a. Fifty percent of the buses were driven fewer than how many kilometres?

b. How many buses were driven fewer than 11,000 kilometres?

(iv) Now refer to the variables relating to bus manufacturer and bus capacity. Generate a pie chart for each variable in Excel and write a brief description of your results.
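
For part (i), one common textbook convention is the "2 to the k" rule. This is an assumption here, since the question does not prescribe a method. Choose the smallest number of classes k for which 2^k exceeds the number of observations n, then make the class interval i at least as wide as the range divided by k, rounded up to a convenient value. As a sketch:

```latex
% k = number of classes, n = number of buses, i = class interval
% x_max and x_min = largest and smallest values of Kilometres
\[
  2^{k} > n, \qquad i \ge \frac{x_{\max} - x_{\min}}{k}
\]
```

The lower limit of the first class is then usually set at a round value at or just below the smallest observation.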

**Question 2**

The dataset bus.dta contains data for the entire Galway City bus fleet. You can get information on the variables in this dataset by using the command describe. Then use Stata to answer the following:
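
As a minimal sketch of that first step, assuming bus.dta sits in Stata's current working directory (the file location is an assumption):

```stata
* Load the dataset, replacing any data currently in memory
use "bus.dta", clear

* List each variable's name, storage type, display format, and label
describe
```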

(i) Present a statistical and graphical analysis of annual bus maintenance costs. In doing so, make sure to address the following:

– Around what values do the data tend to cluster? Specifically, what was the mean maintenance cost last year? What is the median cost? Is one measure more representative of the typical cost than the other? What does the distribution look like?

– What is the range of maintenance costs? What is the standard deviation? What is the coefficient of skewness? About 95% of the maintenance costs are between what two values?

(ii) Again referring to the maintenance cost variable, construct a box plot (either in Stata or by hand). What are the first and third quartile values? Calculate the interquartile range. Are there any outliers? Is the distribution heavily skewed? Relate your conclusion to part (i).

(iii) Using the median maintenance cost, develop a contingency table with bus manufacturer as one variable and whether the maintenance cost was above or below the median as the other variable. What are your conclusions?

(iv) What is the correlation between maintenance costs and age of the bus? Construct a scatter plot with a line of best fit for these two variables and comment on the relationship you observe.
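
The quantities requested in parts (i) and (ii) can be written out as below. These are standard conventions, not anything specified in the question. The 95% interval assumes a roughly symmetric, bell-shaped distribution (the Empirical Rule). Pearson's formula is only one of several skewness coefficients, and Stata's summarize with the detail option reports a moment-based one instead. The 1.5 × IQR rule is the usual box-plot definition of an outlier.

```latex
% \bar{x} = mean, s = standard deviation, Q_1 and Q_3 = first and third quartiles
\begin{align*}
  \text{Range} &= x_{\max} - x_{\min} \\
  \text{Interval containing about 95\% of values} &= [\,\bar{x} - 2s,\ \bar{x} + 2s\,] \\
  sk &= \frac{3(\bar{x} - \text{Median})}{s} \\
  \text{IQR} &= Q_3 - Q_1 \\
  \text{Outlier if } x &< Q_1 - 1.5\,\text{IQR} \ \text{ or } \ x > Q_3 + 1.5\,\text{IQR}
\end{align*}
```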

**Question 3**

Using the dataset bus.dta and Stata, answer the following questions:

(i) Generate a 99% and a 95% confidence interval for the variable Capacity. Report the commands you use and the resulting confidence intervals.

(ii) Test the hypothesis that the population mean number of kilometres since last maintenance is equal to 11,370. Interpret the resulting p-values.

(iii) Test the hypothesis that the population mean number of kilometres since last maintenance is the same for petrol and diesel buses. Explain the results.

(iv) Test the hypothesis that the population mean capacity (number of passengers per bus) is greater for the manufacturer Bluebird than for the other two manufacturers (Keiser and Thompson). Explain the results.

