Quantitative Zoology:
Revision Notes
Compiled as a final-year zoology student at the University of Edinburgh, based on the information given in lectures.
Types of distribution

Normal distribution ('bell curve') defined by mean μ and standard deviation σ; occurs when many factors contribute to a variable; used for continuous variables such as body size. Sample means are normally distributed (central limit theorem). Standard normal distribution (z-scores) has μ = 0 and σ = 1.
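A minimal Python sketch (my own illustration, not from the lectures) of both ideas above: converting a value to the standard normal scale, and the central limit theorem, where sample means of even a skewed distribution pile up around the population mean.

```python
import random
import statistics

def z_score(x, mu, sigma):
    """Convert a value to the standard normal scale (mu = 0, sigma = 1)."""
    return (x - mu) / sigma

# A body mass of 55 g in a population with mean 50 g and s.d. 5 g:
z = z_score(55, 50, 5)  # 1.0

# Central limit theorem: means of samples drawn from a skewed (exponential)
# distribution are themselves approximately normally distributed.
random.seed(1)
sample_means = [statistics.mean(random.expovariate(1.0) for _ in range(50))
                for _ in range(2000)]
# The mean of the sample means approaches the population mean (1.0 here).
grand_mean = statistics.mean(sample_means)
```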

Binomial distribution (the 'coin toss
distribution') is defined by p (probability of one outcome over the other) and sample
size n (the 'binomial denominator'); used for ratios/proportions.
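The binomial probability of k occurrences in n trials can be computed directly from p and n; a stdlib-only sketch (function name is mine):

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k of one outcome in n independent trials:
    C(n, k) * p^k * (1 - p)^(n - k)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

# Probability of exactly 5 heads in 10 fair coin tosses:
p5 = binomial_pmf(5, 10, 0.5)  # = 252/1024 ≈ 0.246
```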

Poisson distribution ('falling bombs') defined
entirely by mean; used for counts. In a Poisson process, probability of an
event is independent of probability of previous events. Can be used to work out
the number of unknown items.
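Since the Poisson is defined entirely by its mean, its probabilities follow from one parameter; a stdlib-only sketch (function name is mine):

```python
import math

def poisson_pmf(k, mean):
    """Probability of observing k events when events occur independently
    at a constant average rate (the Poisson's single parameter)."""
    return math.exp(-mean) * mean**k / math.factorial(k)

# If bombs fall at an average of 2 per grid square, the chance that a
# given square receives none is e^-2:
p0 = poisson_pmf(0, 2.0)  # ≈ 0.135
```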

Negative binomial distribution defined
by mean and index of clumpiness; used for clumped counts.
Types of test

Assumptions of parametric tests (in
decreasing order of importance): random sampling, independence, homogeneity of
variances, normality.

Assumptions of independence,
homogeneity of variance and normality apply to residuals.

If assumptions are met, no
trend should be obvious when residuals (or raw data) are plotted against order
of measurement, explanatory variables, or fitted/predicted values.

Heterogeneity of variance leads to
excessive type 1 errors (false positives), especially if there are abnormally
large variances in a group, or if sample sizes vary between groups.
Heterogeneity of variance is not a problem if it is just sensible variation, if
it is due to a single small variance, or if the effect is not significant
anyway.

Heterogeneity of variance tested for with: Bartlett's Test, Cochran's Test (preferred), or the Fowler and Cohen 'quick and dirty' ratio of largest : smallest variance (tested with F_{max} tables).
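The 'quick and dirty' variance ratio can be sketched as follows (stdlib only; function name and data are my own — the result would still be checked against F_{max} tables):

```python
import statistics

def fmax_ratio(*groups):
    """Ratio of largest to smallest sample variance across groups."""
    variances = [statistics.variance(g) for g in groups]
    return max(variances) / min(variances)

a = [4.1, 4.5, 3.9, 4.2]   # tight group
b = [6.0, 8.5, 3.1, 7.4]   # much more variable group
ratio = fmax_ratio(a, b)   # a large ratio hints at heterogeneity of variance
```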

Assumptions of non-parametric tests:
random sampling, independence, samples come from the same distribution.

Non-parametric tests are safer
with small samples, and work with ranks, but offer limited ability to control
for confounding variables and deal with multiple factors, and do not allow
calculation of confidence intervals etc.
Study design

Observational studies: main aim is to
avoid bias. Sampling can be systematic or random.

Stratified sampling (random
sampling within subgroups) aims to take out noise (so strata should be as
homogeneous as possible). To generalise, it is necessary to know how population
and sample are distributed across strata.

Experimental manipulations of X can
prove causality, but are not always feasible, need to be set in an
observational context, and usually involve artificial situations from which it
is hard to generalise.

Key features of good
experiments: proper controls, proper randomisation, proper
replication.

Measurements are the numbers recorded; units are the independent entities on which measurements are made; factors are the variables being manipulated (which often have levels).

Fully factorial design tests every
possible combination of factors: good for testing interactions, and makes good
statistical analysis easy, but not always feasible. Alternatives include split-plot (nested) designs or Latin squares.

A factor is nested when its levels are not
comparable across levels of another factor (e.g. individual 1 in treatment
block 1 is not related to individual 1 in other blocks).

Non-independent repeated
measures can be dealt with by calculating a summary measure for each
independent unit (e.g. difference score, mean result, final result, subsequent
test, rate of change, fit curves).

Randomised block design helps eliminate
noise that can't be excluded from experiment. Treatments are randomly allocated
within blocks; comparisons are made between treatments within blocks.

Interactions: the effect of one factor depends on the level of another factor. Indicated by non-parallel lines on plots.

Replicates should be
independent. Pseudoreplication
(non-independent replicates) can be due to: time, space, same stimulus, same
individual, genealogy or phylogeny. Can be spotted by ludicrously high n or
d.f.
Interpretation of results

Correlation can be due to:
causation, reverse causation, third variables, or chance.

Causation investigated by: first
principles, controlling for confounding variables, experimental manipulation.

Standard error of the mean (s/√n) is a measure of the reliability of an estimate of the mean.

95% confidence limits = sample mean ± 1.96 × standard error of mean. For small samples the normal approximation breaks down, and Student's t-distribution is used instead: 95% C.I. = sample mean ± t_{.05[d.f.]} × S.E.M.
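A sketch of the S.E.M. and confidence-limit calculation (data and function name are my own; 1.96 is the large-sample normal quantile — for a small sample the t value for n − 1 d.f. would be substituted):

```python
import math
import statistics

def confidence_interval(data, quantile=1.96):
    """Approximate 95% confidence limits: mean ± quantile × S.E.M."""
    mean = statistics.mean(data)
    sem = statistics.stdev(data) / math.sqrt(len(data))  # s / sqrt(n)
    return mean - quantile * sem, mean + quantile * sem

data = [9.8, 10.1, 10.4, 9.9, 10.2, 10.0]
low, high = confidence_interval(data)  # interval bracketing the sample mean
```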

A test is one-tailed if only deviations in one (predicted) direction count as significant; a 'significant' value in the other direction is non-significant. A one-tailed p-value is half the equivalent two-tailed value.

Degrees of freedom = number of measurements minus number of parameters estimated from data = n − p.

Statistical significance is not
the same as biological significance.
Transformation of data

Transformation of data may deal
with heterogeneity of variances and non-normality.

Standard error and confidence
intervals can be back-transformed after analysis if appropriate.

Arcsin transformation (angular transformation) is used on proportions (bounded at 0 and 1); it finds sin^{−1}√p.

Square root transformation (√X) used on counts close to 0, or Poisson variables.

Log transformation used when data are skewed to the right (e.g. growth, size), when variance increases with mean (e.g. large counts, Poisson variables), or for ratios (which range from 0 to infinity). Log(0) doesn't exist, so it may be necessary to use log(X+1). Log_{10} and log_{e} differ only by a constant factor, so they have identical effects on an analysis.

BoxCox transformation is a family of
curves providing a general catchall.

Quick and dirty transformations: 1/√X, √X, log(X) or 1/X for right skews; X^{2}, X^{3}, etc. for left skews.
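The transformations above can be sketched with the stdlib (function names are my own):

```python
import math

def arcsin_transform(p):
    """Angular transformation for proportions bounded at 0 and 1."""
    return math.asin(math.sqrt(p))

def log_plus_one(x):
    """log(X + 1): usable when counts include zeros, since log(0) is undefined."""
    return math.log(x + 1)

sqrt_counts = [math.sqrt(x) for x in [0, 1, 4, 9]]       # square root for counts
angles = [arcsin_transform(p) for p in [0.0, 0.25, 1.0]]  # proportions -> angles
logged = [log_plus_one(x) for x in [0, 9, 99]]            # right-skewed counts
```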
Testing if two samples differ
Parametric

t-tests can be used on small data sets (n < 30). Null hypothesis is that there is no difference.

2-sample t-test compares means of two groups: d.f. = n_{1} + n_{2} − 2, t = mean difference / S.E. of mean difference.
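A sketch of the 2-sample t statistic in its classical pooled-variance form (assuming equal variances; data and function name are my own):

```python
import math
import statistics

def two_sample_t(x, y):
    """t = mean difference / S.E. of difference, d.f. = n1 + n2 - 2."""
    n1, n2 = len(x), len(y)
    # Pooled variance weights each group's variance by its degrees of freedom.
    pooled = ((n1 - 1) * statistics.variance(x) +
              (n2 - 1) * statistics.variance(y)) / (n1 + n2 - 2)
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    t = (statistics.mean(x) - statistics.mean(y)) / se
    return t, n1 + n2 - 2

t, df = two_sample_t([5.1, 5.4, 4.9, 5.2], [4.2, 4.0, 4.5, 4.1])
```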

Paired t-test compares paired data, d.f. = n − 1.

1-sample t-test compares sample to predicted mean: t = (sample mean − μ) / S.E.M., d.f. = n − 1.
Non-parametric

Mann-Whitney U test converts data into ranks and compares medians of two groups: null hypothesis = no difference, test statistic is U (or W).

Wilcoxon Matched Pairs test converts
paired data into ranks and compares medians of pairs: test statistic is T.

Sign test looks at sign of differences (+ or −), null hypothesis = differences centred on zero.
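Under the sign test's null hypothesis, the number of positive signs follows a binomial with p = 0.5, so an exact p-value can be sketched as follows (function name and data are my own; zero differences are conventionally dropped):

```python
import math

def sign_test_p(differences):
    """Two-tailed p-value for the sign test, ignoring zero differences."""
    signs = [d for d in differences if d != 0]
    n = len(signs)
    k = sum(1 for d in signs if d > 0)
    # Probability of a split at least as extreme, in either direction.
    extreme = min(k, n - k)
    tail = sum(math.comb(n, i) * 0.5**n for i in range(extreme + 1))
    return min(1.0, 2 * tail)

p = sign_test_p([0.4, 1.1, 0.3, 0.9, 0.6, 1.2, 0.2, 0.8])  # all 8 positive
```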
Testing if two variables are related

Correlation – parametric (Pearson's) or non-parametric (Spearman's rank). Correlation coefficient ranges from r = +1 (perfect positive correlation) to r = −1 (perfect negative correlation); p-value gives probability that the observed correlation has arisen by chance (d.f. = n − 2).

Least squares regression – parametric; gives a line of best fit allowing the dependent variable to be predicted from the independent one; t-values and p-values tell us whether m (gradient of line) and c (intercept) differ significantly from 0.
Testing if two variables are independent

These tests are used for
analysing frequencies or counts, or comparing observed against expected results
('goodness of fit' tests).

Chi-squared test: χ^{2} = Σ( (O_{i} − E_{i})^{2} / E_{i} ), d.f. = (rows − 1) × (columns − 1); expected values should be > 5.
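The chi-squared statistic is a straightforward sum over cells; a sketch with invented example counts (a Mendelian 3:1 ratio in 100 offspring):

```python
def chi_squared(observed, expected):
    """Sum of (O - E)^2 / E over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Observed 70:30 against an expected 75:25 split; d.f. = categories - 1 = 1.
stat = chi_squared([70, 30], [75, 25])  # = 25/75 + 25/25 ≈ 1.33
```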

G-test: better, especially with small
expected values.

Fisher's exact test: better for expected values between 0 and 5.
Analysis of Variance (ANOVA)

Sum of squares (sum of squared
deviations from a mean) is partitioned into Sample SS (based on deviations of
group means from grand mean) and Error SS (based on deviations around separate
group means).

Total d.f.
= n  1 = sample d.f. + error d.f.

Sample d.f.
= number of groups  1.

Mean square (SS/d.f.) is independent of n. Mean squares are not additive (Sample MS + Error MS ≠ Total MS).

Test statistic: F-ratio = Sample MS / Error MS; d.f. = numerator d.f., denominator d.f.
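The one-way partition described above can be sketched with the stdlib (function name and data are my own): Total SS = Sample SS + Error SS, and F = Sample MS / Error MS.

```python
import statistics

def one_way_anova(groups):
    """Return (F-ratio, sample d.f., error d.f.) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_values)
    group_means = [statistics.mean(g) for g in groups]
    # Sample SS: deviations of group means from the grand mean, weighted by n.
    sample_ss = sum(len(g) * (gm - grand_mean) ** 2
                    for g, gm in zip(groups, group_means))
    # Error SS: deviations of observations around their own group mean.
    error_ss = sum((x - gm) ** 2
                   for g, gm in zip(groups, group_means) for x in g)
    sample_df = len(groups) - 1
    error_df = len(all_values) - len(groups)
    f_ratio = (sample_ss / sample_df) / (error_ss / error_df)
    return f_ratio, sample_df, error_df

f, df1, df2 = one_way_anova([[4, 5, 6], [7, 8, 9], [1, 2, 3]])
```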

Two-way ANOVA: Total SS = Sample_{1} SS + Sample_{2} SS + Interaction SS + Error SS.

If interaction is significant,
both factors must be important, even if main effects are not significant on
their own. If interaction is not significant, it may be dropped from model.
Interactions can only be tested with replication.

Adjusted SS, unlike sequential SS, does not depend upon order in which
terms are added to model. It is important when experimental design is not
orthogonal (fully cross-factored and balanced).

ANOVA is a parametric test: it
assumes random sampling, independence, homogeneity of variance, normality.

Directional heterogeneity tests have alternative hypothesis μ_{A} < μ_{B} < μ_{C} etc. rather than μ_{A} ≠ μ_{B} ≠ μ_{C}; based on the r_{s}P_{c} test statistic, where r_{s} is the rank correlation between observed and expected and P_{c} is the p-value from ANOVA etc.
Regression

Response variable is (in decreasing
order of importance): variable to be predicted, variable theory explains,
variable with most error (if correlation is all that's of interest).

Choice of response variable
does not affect statistical significance, correlation coefficient, or direction
of slope, but does affect slope and intercept of regression line.

Least squares regression minimises the sum of squared vertical distances of points from the regression line. It assumes variation is in the y axis only.
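The least squares fit has a closed form; a sketch (data and function name are my own): gradient m = S_xy / S_xx, intercept c = mean(y) − m × mean(x).

```python
import statistics

def least_squares(x, y):
    """Gradient and intercept minimising squared vertical distances."""
    mx, my = statistics.mean(x), statistics.mean(y)
    s_xy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    s_xx = sum((xi - mx) ** 2 for xi in x)
    m = s_xy / s_xx
    return m, my - m * mx

m, c = least_squares([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8])
```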

Analysis of covariance (ANCOVA) seeks to
explain a dependent variable in terms of a discrete and a continuous variable.
It fits separate lines to each group; non-parallel lines indicate an
interaction.
General Linear Models

ANOVAs are a particular case of
General Linear Model. Other GLMs can handle: unbalanced designs, more complex
designs, combinations of continuous and discrete
predictors, other error structures.

To construct a GLM: start with the maximal model (within reason), test assumptions, throw out non-significant highest-order interactions, re-analyse, throw out non-significant lower-order terms (if they aren't part of significant higher-order terms), and re-analyse until the minimal model has been reached. Re-test assumptions.
Errors

Type I error (false positive): null hypothesis wrongly rejected. Probability of Type I error = α, the significance level at which the test is run.

Type II error (false negative): null hypothesis wrongly accepted. Probability of Type II error = β, which can't be specified in advance. α and β trade off against each other.

Liberal tests have high Type I
error rates; conservative tests have high Type II error rates.

Power = 1 − β = probability of detecting an effect when it is actually there (not making a Type II error).

Power determined by: size of
the effect, variability in the data (due to natural variability or sampling
error).

Power calculations can be used
in planning, to determine sample size required for given level of power.

Confidence intervals can tell us how big an effect could be and still generate a null result.

Multiplicity is the problem that the chance of making a mistake is heightened when multiple tests are run: probability of at least one mistake = 1 − (1 − α)^{n}.

Multiplicity can be avoided by: asking focused questions, using large models where possible (e.g. ANOVA not multiple t-tests), avoiding fishing expeditions, distinguishing between a priori tests (inspired by theory) and post hoc tests (inspired by data), and using statistical procedures such as the Bonferroni adjustment (α = 0.05/n).
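The family-wise error formula and the Bonferroni adjustment can be sketched directly (function names are my own):

```python
def familywise_error(alpha, n_tests):
    """Probability of at least one Type I error across n independent tests."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni(alpha, n_tests):
    """Per-test threshold keeping the family-wise error rate near alpha."""
    return alpha / n_tests

risk = familywise_error(0.05, 10)   # ≈ 0.40 across 10 tests at alpha = 0.05
threshold = bonferroni(0.05, 10)    # 0.005 per test
```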
© Andrew Gray, 2005