# statistics

Q1
The purpose of sampling is to select a set of individuals or cases that represents the population under study. Stratified random sampling, combined with multi-stage sampling, is appropriate in this case. Since 150 individuals are to be drawn from the 15 to 19 age group, demographic data should first be analysed to find the proportion of males and females in the Australian adolescent population aged 15 to 19 and, if possible, within each year of age. Males and females should then appear in the sample in the same proportions. Beyond gender, the sample should also reflect the population's distribution of alcohol consumption, occupation, smoking, body mass index and metabolic rate. A sample built this way represents the study population effectively and reduces the risk of selection bias.
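
The proportional allocation described above can be sketched in a few lines. This is only an illustration: the stratum names and population counts are hypothetical placeholders, not real Australian demographic figures, and the largest-remainder rounding is one reasonable choice among several.

```python
def proportional_allocation(stratum_counts, sample_size):
    """Split sample_size across strata in proportion to their population counts.

    Integer arithmetic plus the largest-remainder rule guarantees that the
    allocations add up to exactly sample_size.
    """
    total = sum(stratum_counts.values())
    allocation = {}
    remainders = {}
    for name, count in stratum_counts.items():
        # Whole units for this stratum, and the leftover fraction (scaled by total).
        allocation[name], remainders[name] = divmod(count * sample_size, total)
    leftover = sample_size - sum(allocation.values())
    # Give the unassigned units to the strata with the largest remainders.
    for name in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        allocation[name] += 1
    return allocation


# Hypothetical population counts per stratum (in thousands), for illustration only.
population = {
    "male_15_17": 312,
    "male_18_19": 198,
    "female_15_17": 300,
    "female_18_19": 190,
}
allocation = proportional_allocation(population, 150)
print(allocation)
```

Within each stratum, the allocated number of individuals would then be drawn at random.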
The association between one quantitative and one qualitative variable (metabolic rate and occupation, respectively) is best displayed with side-by-side box plots, one for each category. An ordinary least squares (OLS) regression, with the categories coded as dummy variables, can then be used to estimate the strength and direction of the association.
The association between two qualitative variables (smoking and occupation) is analysed through conditional probabilities, with the data summarised in a contingency table. An ordinary least squares regression can also be fitted to assess the association by coding the categories as dummy variables, that is, by assigning a numerical value (typically 0 or 1) to each qualitative trait.
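
As a minimal sketch of the conditional-probability approach, the snippet below computes P(smoking status | occupation) from a small contingency table. The occupation labels and counts are hypothetical and serve only to show the calculation.

```python
# Hypothetical contingency table: rows are occupations, columns are smoking status.
counts = {
    "manual": {"smoker": 30, "non_smoker": 20},
    "office": {"smoker": 15, "non_smoker": 35},
}


def conditional_probabilities(table):
    """Return P(column category | row category) for each row of the table."""
    result = {}
    for row, cells in table.items():
        row_total = sum(cells.values())
        result[row] = {col: n / row_total for col, n in cells.items()}
    return result


cond = conditional_probabilities(counts)
for occupation, probs in cond.items():
    print(occupation, probs)
```

If the conditional probabilities differ noticeably between rows, as they do in this made-up table, that points to an association between the two variables.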
The association between two quantitative variables (metabolic rate and lean body mass) can be displayed in a scatter plot. One variable is called the predictor variable and the other the criterion variable. The scatter plot is then used to fit a simple linear regression model.
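
A simple linear regression of this kind can be sketched directly from the least squares formulas. The lean body mass and metabolic rate values below are hypothetical and are chosen only to make the arithmetic easy to follow.

```python
def simple_linear_regression(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns the slope, the intercept and the Pearson correlation coefficient.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r


# Hypothetical data: predictor = lean body mass (kg), criterion = metabolic rate.
lean_body_mass = [40, 45, 50, 55, 60]
metabolic_rate = [1000, 1100, 1150, 1300, 1350]

slope, intercept, r = simple_linear_regression(lean_body_mass, metabolic_rate)
print(slope, intercept, round(r, 3))
```

The sign of the slope gives the direction of the association and the correlation coefficient gives its strength.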
——————————————————————————————————-
Q2
The 10th percentile can be found from the standard score formula
z = (percentile score - mean)/standard deviation (z is the standard score, not a probability)
The z-score marks the point below which a given proportion of the population lies. For the 10th percentile, z is approximately -1.2816, so the percentile score is 17.5 - 1.2816 × 0.5 = 16.859. This means that 10% of the workers need 16.859 hours or less to manufacture a car, while the remaining 90% need longer. Thus the 10th percentile time is 16.859 hours.
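
The percentile can be checked with Python's standard library. The sketch assumes the manufacturing times are normally distributed with a mean of 17.5 hours and a standard deviation of 0.5 hours, the values used in the worked calculations of this answer.

```python
from statistics import NormalDist

# Assumed model: manufacturing time ~ Normal(mean 17.5 h, standard deviation 0.5 h).
build_time = NormalDist(mu=17.5, sigma=0.5)

# Time below which 10% of the distribution lies.
p10 = build_time.inv_cdf(0.10)

# Corresponding standard score: (x - mean) / standard deviation.
z10 = (p10 - build_time.mean) / build_time.stdev

print(round(p10, 3), round(z10, 4))
```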
Applying the standard score formula to the two times of interest
z = (raw score - mean)/standard deviation
= (19 - 17.5)/0.5
= 1.5/0.5 = 3.0 (about 99.87% of times fall below 19 hours)
On the other hand
z = (raw score - mean)/standard deviation
= (18 - 17.5)/0.5
= 0.5/0.5 = 1.0 (about 84.13% of times fall below 18 hours)
Thus the probability that a car takes between 18 and 19 hours is 99.87% - 84.13% ≈ 15.7%.
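
As a cross-check, the probability of a time between 18 and 19 hours can be computed directly from the normal cumulative distribution function. The sketch assumes a normal distribution with mean 17.5 and standard deviation 0.5; the variable names are illustrative.

```python
from statistics import NormalDist

# Assumed model: manufacturing time ~ Normal(mean 17.5 h, standard deviation 0.5 h).
build_time = NormalDist(mu=17.5, sigma=0.5)

# Standard scores for the two bounds.
z_18 = (18 - build_time.mean) / build_time.stdev
z_19 = (19 - build_time.mean) / build_time.stdev

# P(18 < X < 19) is the difference of the cumulative probabilities.
p_between = build_time.cdf(19) - build_time.cdf(18)

print(z_18, z_19, round(p_between, 4))
```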
Substituting the raw score of 17.5 into the above equation gives a z-score of 0. Hence the probability of completing a car in less than 17.5 hours is 50%, provided the manufacturing times follow a normal distribution. In a normal distribution the mean, median and mode coincide, so half of the times lie below the mean. If the distribution is not normal, however, the mean and median need not coincide, so the probability of manufacturing a car in less than 17.5 hours cannot be found from the mean and standard deviation alone and may differ from 50%.

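Descriptive Statistics: Size

| Variable | Data | Total Count | Mean | StDev | Minimum | Q1 | Median | Q3 | Maximum | IQR |
|---|---|---|---|---|---|---|---|---|---|---|
| Size | Large | 11 | 3.327 | 2.739 | 1.100 | 1.600 | 2.000 | 3.600 | 9.100 | 2.000 |
| | Medium | 13 | 2.869 | 2.981 | 0.400 | 1.000 | 1.200 | 4.200 | 10.500 | 3.200 |
| | Small | 15 | 3.76 | 4.36 | 0.30 | 0.90 | 2.00 | 5.90 | 17.10 | 5.00 |
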
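
The IQR column in the summary above is simply Q3 minus Q1, which can be confirmed with a short check. The quartiles are copied from the table; the dictionary layout is just one convenient way to hold them.

```python
# (Q1, Q3, reported IQR) for each group, copied from the summary table.
summaries = {
    "Large": (1.6, 3.6, 2.0),
    "Medium": (1.0, 4.2, 3.2),
    "Small": (0.9, 5.9, 5.0),
}

# Recompute the interquartile range as Q3 - Q1, rounded to the table's precision.
iqr = {group: round(q3 - q1, 3) for group, (q1, q3, _) in summaries.items()}

for group, (q1, q3, reported) in summaries.items():
    print(group, iqr[group], "matches" if iqr[group] == reported else "differs")
```
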
Is the size of an acorn related to the size of the geographic area covered by its species? It has been suggested that the size of a plant's seed may affect the geographic range of the plant, since larger acorns may be carried away by larger animals, which may in turn have wider territorial ranges. Use summaries to comment on the research question.

The median is 2.0 for both the Small and the Large groups, so about 7 of the 15 observations in the Small group and 5 of the 11 in the Large group lie above it. On this basis it may be argued that smaller acorns are more widely distributed than larger acorns, which is contrary to the belief that larger acorns are carried away by larger animals with wider territorial ranges. Further, the definitions of large and small used in the experiment may need to be examined, because the group means (3.327 for Large and 3.76 for Small) differ little relative to the standard deviations (2.739 and 4.36).

Type in a comment on the association in this graph.

Among individuals who do not consume alcohol, significantly more are free of heart disease than suffer from it, so the chance of not having heart disease is greater than the chance of having it. Among individuals who consume alcohol, on the other hand, significantly more suffer from heart disease than do not.