Quantitative Zoology: Revision Notes

Compiled by a final-year zoology student at the University of Edinburgh, based on information given in lectures.

Types of distribution

Normal distribution ('bell curve'), defined by mean μ and standard deviation σ, arises when many factors contribute to a variable; used for continuous variables such as body size. Sample means are approximately normally distributed (central limit theorem). The standard normal distribution (z-scores) has μ = 0 and σ = 1.
Binomial distribution (the 'coin toss distribution') is defined by p (the probability of one of the two possible outcomes) and sample size n (the 'binomial denominator'); used for ratios/proportions.
Poisson distribution ('falling bombs') is defined entirely by its mean (the variance equals the mean); used for counts. In a Poisson process, the occurrence of an event is independent of previous events. Can be used to work out the number of unknown items.
Negative binomial distribution defined by mean and index of clumpiness; used for clumped counts.
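A minimal sketch of the binomial and Poisson probability functions, using only the Python standard library; the function names are illustrative, not taken from any package. The final lines check numerically that a Poisson distribution with mean 2.5 also has variance 2.5, which is why the mean alone defines it.

```python
import math

def binomial_pmf(k, n, p):
    """P(k outcomes of one type in n trials), each trial having probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, mean):
    """P(count = k) for a Poisson variable with the given mean."""
    return math.exp(-mean) * mean**k / math.factorial(k)

print(binomial_pmf(5, 10, 0.5))  # 5 'heads' in 10 coin tosses: 0.24609375

# Mean and variance of a Poisson(2.5), summed over counts 0..59
# (the probability of counts above 59 is negligible).
mean = 2.5
ks = range(60)
probs = [poisson_pmf(k, mean) for k in ks]
mu = sum(k * p for k, p in zip(ks, probs))
var = sum((k - mu) ** 2 * p for k, p in zip(ks, probs))
print(round(mu, 6), round(var, 6))  # both ≈ 2.5
```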

Types of test

Assumptions of parametric tests (in decreasing order of importance): random sampling, independence, homogeneity of variances, normality.
Assumptions of independence, homogeneity of variance and normality apply to residuals.
If assumptions are met, no trend should be obvious when residuals (or raw data) are plotted against order of measurement, explanatory variables, or fitted/predicted values.
Heterogeneity of variance leads to excessive Type I errors (false positives), especially if there are abnormally large variances in a group, or if sample sizes vary between groups. Heterogeneity of variance is not a problem if it is just sensible variation, if it is due to a single small variance, or if the effect is not significant anyway.
Heterogeneity of variance is tested for with: Bartlett's test, Cochran's test (preferred), or Fowler and Cohen's 'quick and dirty' ratio of largest : smallest variance (checked against F_max tables).
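The 'quick and dirty' ratio can be sketched in a few lines of Python. This assumes equal sample sizes per group (as F_max tables do); the function name and the body-mass data are hypothetical, and the result still has to be looked up in F_max tables by hand.

```python
import statistics

def fmax_ratio(*groups):
    """Largest / smallest sample variance across groups (F_max).
    Compare with F_max tables for k groups and n - 1 d.f. per group."""
    variances = [statistics.variance(g) for g in groups]  # n - 1 denominator
    return max(variances) / min(variances)

# Hypothetical body masses (g) from three sites, n = 5 each
site_a = [4.1, 5.0, 5.9, 4.6, 5.4]
site_b = [3.0, 6.2, 7.9, 2.5, 5.6]
site_c = [4.8, 5.1, 5.3, 4.9, 5.2]
print(fmax_ratio(site_a, site_b, site_c))
```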
Assumptions of non-parametric tests: random sampling, independence, samples come from the same distribution.
Non-parametric tests are safer with small samples, and work with ranks, but offer limited ability to control for confounding variables and deal with multiple factors, and do not allow calculation of confidence intervals etc.

Study design

Observational studies: main aim is to avoid bias. Sampling can be systematic or random.
Stratified sampling (random sampling within subgroups) aims to take out noise (so strata should be as homogeneous as possible). To generalise, it is necessary to know how population and sample are distributed across strata.
Experimental manipulations of X can prove causality, but are not always feasible, need to be set in an observational context, and usually involve artificial situations from which it is hard to generalise.
Key features of good experiments: proper controls, proper randomisation, proper replication.
Measurements are the numbers recorded; units are the independent entities being used; factors are the variables being manipulated (which often have levels).
Fully factorial design tests every possible combination of factors: good for testing interactions, and makes good statistical analysis easy, but not always feasible. Alternatives include split-plot (nested) designs or Latin squares.
A factor is nested when its levels are not comparable across levels of another factor (e.g. individual 1 in treatment block 1 is not related to individual 1 in other blocks).
Non-independent repeated measures can be dealt with by calculating a summary measure for each independent unit (e.g. difference score, mean result, final result, subsequent test, rate of change, fit curves).
Randomised block design helps eliminate noise that can't be excluded from experiment. Treatments are randomly allocated within blocks; comparisons are made between treatments within blocks.
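A sketch of randomised block allocation in Python, assuming each block holds exactly one experimental unit per treatment. The block and treatment labels are hypothetical; each block is shuffled independently, so every treatment appears once per block in a random order.

```python
import random

def randomised_block_allocation(blocks, treatments, seed=None):
    """Randomly allocate every treatment once within each block.

    blocks: block labels (e.g. field plots, days, litters).
    treatments: treatment labels; each block needs len(treatments) units.
    Returns {block: [treatment for unit 1, unit 2, ...]}.
    """
    rng = random.Random(seed)  # seeded generator so the plan can be reproduced
    layout = {}
    for block in blocks:
        order = list(treatments)  # copy so the caller's list is not shuffled
        rng.shuffle(order)        # independent randomisation within this block
        layout[block] = order
    return layout

plan = randomised_block_allocation(["block1", "block2", "block3"],
                                   ["control", "low", "high"], seed=1)
for block, order in plan.items():
    print(block, order)
```

Comparisons are then made between treatments within each block, so differences between blocks drop out of the treatment comparison.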
Interactions: the importance of one factor depending on another factor. Indicated by non-parallel lines on plots.
Replicates should be independent. Pseudo-replication (non-independent replicates) can be due to: time, space, same stimulus, same individual, genealogy or phylogeny. Can be spotted by ludicrously high n or d.f.

Interpretation of results

Correlation can be due to: causation, reverse causation, third variables, or chance.
Causation investigated by: first principles, controlling for confounding variables, experimental manipulation.
Standard error of the mean (s/√n) is a measure of the reliability of an estimate of the mean.
95% confidence limits = sample mean ± 1.96 × standard error of mean. For small samples the normal approximation breaks down, and Student's t-distribution is used instead: 95% C.I. = sample mean ± t_0.05[d.f.] × S.E.M., where d.f. = n - 1.
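A minimal Python sketch of the standard error and confidence limits above. The critical value is passed in by hand (1.96 by default, or a t value from tables), so no statistics package is assumed; the function name and wing-length data are hypothetical.

```python
import math
import statistics

def mean_ci(data, crit=1.96):
    """Return (mean, SEM, lower, upper) for the interval mean ± crit × SEM.

    crit = 1.96 gives the large-sample 95% interval; for small samples,
    pass the tabulated t value for n - 1 d.f. instead.
    """
    n = len(data)
    m = statistics.mean(data)
    sem = statistics.stdev(data) / math.sqrt(n)  # s / √n
    return m, sem, m - crit * sem, m + crit * sem

# Hypothetical wing lengths (mm): n = 10, so d.f. = 9 and t_0.05[9] = 2.262
wing_lengths = [61.2, 63.5, 60.8, 62.9, 64.1, 62.0, 61.7, 63.3, 62.4, 60.9]
print(mean_ci(wing_lengths, crit=2.262))
```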
A test is one-tailed when only a deviation in one predicted direction can count as significant (a deviation in the other direction is treated as non-significant); the two-tailed p-value is then halved.
Degrees of freedom = number of measurements minus number of parameters estimated from data = n - p
Statistical significance is not the same as biological significance.

Transformation of data

Transformation of data may deal with heterogeneity of variances and non-normality.
Standard error and confidence intervals can be back-transformed after analysis if appropriate.
Arcsine transformation (angular transformation) is used on proportions (bounded by 0 and 1); it calculates sin⁻¹√p.
Square root transformation (√X) used on counts close to 0, or Poisson variables.
Log transformation used when data are skewed to the right (e.g. growth, size), when variance increases with the mean (e.g. large counts, Poisson variables), and for ratios (range from 0 to infinity). Log(0) doesn't exist, so it may be necessary to use log(X+1). Log₁₀ and log_e have identical effects (they differ only by a constant factor).
Box-Cox transformation is a family of curves providing a general catch-all.
Quick and dirty transformations: 1/√X, √X, log(X) or 1/X for right skews; X², X³, etc. for left skews.
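A Python sketch of the arcsine and log(X+1) transformations, plus back-transformation of a mean calculated on the log scale (the square root transformation is simply `math.sqrt`). Function names and the parasite counts are hypothetical.

```python
import math
import statistics

def arcsine_sqrt(p):
    """Angular transformation of a proportion p (0 <= p <= 1): sin^-1(√p), in radians."""
    return math.asin(math.sqrt(p))

def log_x_plus_1(x):
    """log10(X + 1), usable when counts include zeros."""
    return math.log10(x + 1)

def back_log_x_plus_1(y):
    """Inverse of log_x_plus_1, returning a result to the original scale."""
    return 10 ** y - 1

# Hypothetical right-skewed counts of parasites per host
counts = [0, 3, 12, 45, 150]
mean_log = statistics.mean(log_x_plus_1(c) for c in counts)
back = back_log_x_plus_1(mean_log)
# The back-transformed mean is smaller than the arithmetic mean of the raw counts
print(back, statistics.mean(counts))
```

Note that the back-transformed value is not the arithmetic mean of the raw data; confidence limits are back-transformed the same way, giving limits that are asymmetric on the original scale.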

Testing if two samples differ

Parametric

t-tests can be used on small data sets (n < 30). The null hypothesis is that there is no difference.
2-sample t-test compares means of two groups: d.f. = n₁ + n₂ - 2, t = mean difference / S.E. of mean difference.
Paired t-test compares paired data: d.f. = n - 1, where n is the number of pairs.
1-sample t-test compares a sample to a predicted mean μ: t = (sample mean - μ) / S.E.M., d.f. = n - 1.
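The three t statistics above, sketched in Python with the standard library only. The two-sample version assumes equal variances (pooled variance), as the classic Student's test does; the paired test is written as a one-sample test of the differences against μ = 0. Function names and tarsus data are hypothetical, and p-values still come from t tables.

```python
import math
import statistics

def two_sample_t(x, y):
    """Student's two-sample t (pooled variance); returns (t, d.f.)."""
    n1, n2 = len(x), len(y)
    pooled_var = ((n1 - 1) * statistics.variance(x)
                  + (n2 - 1) * statistics.variance(y)) / (n1 + n2 - 2)
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))  # S.E. of mean difference
    return (statistics.mean(x) - statistics.mean(y)) / se_diff, n1 + n2 - 2

def one_sample_t(x, mu):
    """t = (sample mean - μ) / S.E.M., with n - 1 d.f."""
    n = len(x)
    return (statistics.mean(x) - mu) / (statistics.stdev(x) / math.sqrt(n)), n - 1

def paired_t(before, after):
    """One-sample t-test of the paired differences against 0; d.f. = pairs - 1."""
    diffs = [a - b for a, b in zip(after, before)]
    return one_sample_t(diffs, 0.0)

# Hypothetical tarsus lengths (mm) for two populations
pop1 = [18.2, 19.1, 18.7, 19.4, 18.9]
pop2 = [19.6, 20.1, 19.8, 20.4, 19.9]
print(two_sample_t(pop1, pop2))  # compare |t| with t tables at the returned d.f.
```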

Non-parametric

Mann-Whitney U test converts data into ranks and compares medians of two groups: null hypothesis = no difference, test statistic is U (or W).
Wilcoxon matched-pairs test ranks the differences between paired observations and tests whether their median is zero: test statistic is T.
Sign test looks only at the sign of each difference (+ or -); null hypothesis = differences are centred on zero.
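A standard-library Python sketch of two of these tests. U is computed by counting, over all pairs of observations (one from each group), how often the first group's value is larger, with ties counting one half; this gives the same U as the rank-sum formula U = R₁ - n₁(n₁ + 1)/2. The sign test uses exact binomial probabilities with p = 0.5, doubling the smaller tail for a two-tailed p. Function names and data are hypothetical.

```python
import math

def mann_whitney_u(x, y):
    """U for sample x: number of (x, y) pairs in which x is larger (ties count 0.5)."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1
            elif xi == yj:
                u += 0.5
    return u  # U for y is len(x) * len(y) - u; tables usually use the smaller

def sign_test_p(differences):
    """Exact two-tailed sign test; zero differences are dropped."""
    plus = sum(d > 0 for d in differences)
    minus = sum(d < 0 for d in differences)
    n = plus + minus
    k = min(plus, minus)
    # P(X <= k) under Binomial(n, 0.5), doubled for two tails, capped at 1
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical data
print(mann_whitney_u([12, 15, 11, 18], [20, 17, 22, 19]))
print(sign_test_p([0.4, 1.2, -0.3, 0.8, 0.5, 0.9, 1.1, 0.2]))
```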

Testing if two variables are related

Correlation - parametric (Pearson's) or non-parametric (Spearman's rank). Correlation coefficient ranges from r = 1 (perfect positive correlation) to r = -1 (perfect negative correlation); the p-value gives the probability of obtaining a correlation at least this strong by chance if there is no true correlation (d.f. = n - 2).
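A Python sketch of both coefficients: Spearman's rho is computed as Pearson's r on the ranks, with tied values given the mean of their ranks. The t statistic for testing r against zero, t = r√(n - 2)/√(1 - r²), has n - 2 d.f. All function names are hypothetical.

```python
import math

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Ranks starting at 1; tied values get the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        i = j + 1
    return r

def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))

def t_for_r(r, n):
    """t statistic for testing r against zero, with n - 2 d.f."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))
```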
Least squares regression - parametric; gives a line of best fit allowing dependent variable to be predicted from independent one; t-values and p-values tell us if m (gradient of line) and c (intercept) differ significantly from 0.
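A Python sketch of fitting y = mx + c by least squares, and of the t-value for testing whether the gradient m differs from 0: t = m / S.E.(m), where S.E.(m) = √(residual MS / Σ(x - x̄)²) and the residual MS has n - 2 d.f. Function names are hypothetical; the p-value would come from t tables.

```python
import math

def least_squares(x, y):
    """Fit y = m x + c by ordinary least squares; returns (m, c)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    m = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    c = my - m * mx
    return m, c

def slope_t(x, y):
    """t-value for H0: gradient = 0, with n - 2 d.f."""
    m, c = least_squares(x, y)
    n = len(x)
    mx = sum(x) / n
    ss_res = sum((b - (m * a + c)) ** 2 for a, b in zip(x, y))  # residual SS
    sxx = sum((a - mx) ** 2 for a in x)
    se_m = math.sqrt(ss_res / (n - 2) / sxx)
    return m / se_m, n - 2
```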

Testing if two variables are independent

These tests are used for analysing frequencies or counts, or comparing observed against expected results ('goodness of fit' tests).
Chi-squared test: χ² = Σ (O_i - E_i)²/E_i, d.f. = (rows - 1) × (columns - 1); expected values should be > 5.
G-test: better, especially with small expected values.
Fisher's exact test: better for expected values between 0 and 5.
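A Python sketch of the chi-squared and G statistics for a contingency table, with expected counts under independence calculated as row total × column total / grand total. G = 2 Σ O ln(O/E) is compared with the chi-squared distribution on the same d.f. Function names are hypothetical; Fisher's exact test is not sketched here.

```python
import math

def expected_counts(table):
    """Expected counts under independence: row total × column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]

def chi_squared(table):
    """Returns (χ², d.f.)."""
    exp = expected_counts(table)
    stat = sum((o - e) ** 2 / e
               for orow, erow in zip(table, exp) for o, e in zip(orow, erow))
    return stat, (len(table) - 1) * (len(table[0]) - 1)

def g_statistic(table):
    """Returns (G, d.f.); cells with O = 0 contribute nothing to G."""
    exp = expected_counts(table)
    g = 2 * sum(o * math.log(o / e)
                for orow, erow in zip(table, exp) for o, e in zip(orow, erow) if o > 0)
    return g, (len(table) - 1) * (len(table[0]) - 1)
```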

Analysis of Variance (ANOVA)

Sum of squares (sum of squared deviations from a mean) is partitioned into Sample SS (based on deviations of group means from grand mean) and Error SS (based on deviations around separate group means).
Total d.f. = n - 1 = sample d.f. + error d.f.
Sample d.f. = number of groups - 1.
Mean square (SS/d.f.) is independent of n. Mean squares are not additive (Sample MS + Error MS ≠ Total MS).
Test statistic: F-ratio = Sample MS / Error MS; d.f. = numerator, denominator.
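A Python sketch of the one-way ANOVA partition above: Total SS splits into Sample SS and Error SS, the mean squares are SS/d.f., and F = Sample MS / Error MS. The function name is hypothetical; the F-ratio is then compared with F tables at (numerator, denominator) d.f.

```python
def one_way_anova(*groups):
    """Partition SS for a one-way ANOVA and return the F-ratio with its d.f."""
    values = [v for g in groups for v in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    group_means = [sum(g) / len(g) for g in groups]
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    # Sample SS: deviations of group means from the grand mean, weighted by group size
    ss_sample = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Error SS: deviations of observations around their own group mean
    ss_error = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    df_sample, df_error = k - 1, n - k  # these add up to the total d.f., n - 1
    f_ratio = (ss_sample / df_sample) / (ss_error / df_error)
    return {"ss_total": ss_total, "ss_sample": ss_sample, "ss_error": ss_error,
            "df": (df_sample, df_error), "F": f_ratio}

print(one_way_anova([1, 2, 3], [4, 5, 6], [7, 8, 9]))
```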
Two-way ANOVA: Total SS = Sample₁ SS + Sample₂ SS + Interaction SS + Error SS.
If interaction is significant, both factors must be important, even if main effects are not significant on their own. If interaction is not significant, it may be dropped from model. Interactions can only be tested with replication.
Adjusted SS, unlike sequential SS, does not depend upon order in which terms are added to model. It is important when experimental design is not orthogonal (fully cross-factored and balanced).
ANOVA is a parametric test: it assumes random sampling, independence, homogeneity of variance, normality.
Directional heterogeneity tests have the alternative hypothesis μ_A < μ_B < μ_C etc. rather than μ_A ≠ μ_B ≠ μ_C; they are based on the test statistic r_s·P_c, where r_s is the rank correlation between the observed and expected orderings and P_c is the complement (1 - p) of the p-value from the ANOVA (or equivalent test).

Regression

Response variable is (in decreasing order of importance): variable to be predicted, variable theory explains, variable with most error (if correlation is all that's of interest).
Choice of response variable does not affect statistical significance, correlation coefficient, or direction of slope, but does affect slope and intercept of regression line.
Least squares regression minimises the sum of squared vertical distances (residuals) of points from the regression line. It assumes variation is in the y axis only.
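A short Python check of the point about choosing the response variable: regressing y on x and x on y gives different slopes, but the same correlation coefficient and the same sign of slope, and the product of the two slopes equals r². The data and function names are hypothetical.

```python
import math

def slope(x, y):
    """Least squares slope of y on x (minimises squared vertical distances)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

# Hypothetical paired measurements
mass = [1, 2, 3, 4]
wing = [2, 4, 5, 8]
b_wing_on_mass = slope(mass, wing)
b_mass_on_wing = slope(wing, mass)
r = pearson_r(mass, wing)
# Same r either way; the slopes differ, but their product equals r squared
print(b_wing_on_mass, b_mass_on_wing, r, b_wing_on_mass * b_mass_on_wing, r ** 2)
```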
Analysis of covariance (ANCOVA) seeks to explain a dependent variable in terms of a discrete and a continuous variable. It fits separate lines to each group; non-parallel lines indicate an interaction.

General Linear Models

ANOVA is a particular case of the General Linear Model. Other GLMs can handle: unbalanced designs, more complex designs, combinations of continuous and discrete predictors, other error structures.
To construct a GLM: start with maximal model (within reason), test assumptions, throw out non-significant highest-order interactions, reanalyse, throw out non-significant lower-order terms (if they aren't part of significant higher order terms), reanalyse, until minimal model has been reached. Re-test assumptions.

Errors

Type I error (false positive): null hypothesis wrongly rejected. Probability of Type I error = α, the significance level (the p-value threshold below which the null hypothesis is rejected).
Type II error (false negative): null hypothesis wrongly accepted (retained). Probability of Type II error = β, which can't be specified in advance. α and β trade off against each other.
Liberal tests have high Type I error rates; conservative tests have high Type II error rates.
Power = 1 - β = probability of detecting an effect when it is actually there (not making a Type II error).
Power is determined by: the size of the effect, variability in the data (due to natural variability or sampling error), sample size, and the significance level α.
Power calculations can be used in planning, to determine sample size required for given level of power.
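A planning sketch in Python for comparing two group means, using the normal approximation n per group = 2(z_(1-α/2) + z_(1-β))² σ²/δ², where δ is the smallest difference worth detecting and σ the standard deviation. Because it uses z rather than t, it slightly underestimates the sample size a t-test needs. Function names and the δ, σ values are hypothetical; `statistics.NormalDist` is in the Python standard library (3.8+).

```python
import math
from statistics import NormalDist

def n_per_group(delta, sd, alpha=0.05, power=0.8):
    """Approximate sample size per group for a two-tailed two-sample comparison."""
    z = NormalDist()  # standard normal
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)

def approx_power(delta, sd, n, alpha=0.05):
    """Approximate power with n per group (ignores the negligible opposite tail)."""
    z = NormalDist()
    se_diff = sd * math.sqrt(2 / n)
    return z.cdf(delta / se_diff - z.inv_cdf(1 - alpha / 2))

# Hypothetical: detect a 5 g difference in body mass when σ = 5 g
print(n_per_group(delta=5, sd=5), approx_power(delta=5, sd=5, n=16))
```

Halving the effect size roughly quadruples the sample size needed, which is why the size of the effect dominates power calculations.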
Confidence intervals can tell us how big an effect could be and still generate a null result.
Multiplicity is the problem that the chance of making a mistake is heightened when multiple tests are run: for n independent tests, probability of at least one mistake = 1 - (1-α)ⁿ.
Multiplicity can be avoided by: asking focused questions, using large models where possible (e.g. ANOVA not multiple t-tests), avoiding fishing expeditions, distinguishing between a priori tests (inspired by theory) and post hoc tests (inspired by data), using statistical procedures such as Bonferroni adjustment (α = 0.05/n).
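The multiplicity formula and the Bonferroni adjustment as a Python sketch (function names hypothetical). With 10 independent tests at α = 0.05, the chance of at least one false positive is about 0.40; testing each at 0.05/10 brings it back below 0.05.

```python
def familywise_error(alpha, n_tests):
    """P(at least one Type I error) across n independent tests: 1 - (1 - α)^n."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level under Bonferroni adjustment."""
    return alpha / n_tests

print(round(familywise_error(0.05, 10), 3))  # 0.401
per_test = bonferroni_alpha(0.05, 10)
print(per_test, round(familywise_error(per_test, 10), 4))
```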
