Statistical Information
Data sources and measures
Population data
Where populations (denominators) cannot be ascertained from the data, estimates are obtained from ABS Censuses for computing population-level indicators, and in some instances, for numerators (e.g. births and deaths). The Australian Census is carried out every five years, with the most recent performed in 2016.
Administrative data
Administrative data are routinely collected data that are captured whenever an individual comes into contact with a government agency or service. These data can be de-identified and made available for research purposes. There are some limitations that should be considered when interpreting indicators based on these data. Firstly, indicators derived from administrative data are restricted to the individuals who made contact with the agency or service, and so may not be representative of the general population. For example, individuals presenting to hospitals for treatment could be those with more severe disease, or could come from more disadvantaged population groups who are more likely to access hospitals for treatment (e.g., due to issues of affordability and access to primary care). Reliance on administrative data also means that some people who have a given condition will not be captured (e.g., those treated in the community or in the home).
Secondly, administrative data generally do not capture information about mediating or moderating mechanisms, such as social support networks, severity of illness, or cultural practices. Such variables can influence the incidence of a particular outcome and/or the likelihood of an individual making contact with a service. Many indicators derived from administrative data should therefore be viewed as underestimates of the true population incidence.
Lastly, there are limitations associated with parent-level indicators. Parent-child links are obtained from the Western Australian Family Connections Genealogy System, which uses information from birth registrations to pair children with parents. As such, only biological family connections can be made; no information is included on divorce, adoptions, step-families, grand-families, or other care arrangements. Family connection links also cannot be made for individuals born outside of Western Australia, or for individuals who have missing paternity information on their birth registration record.
Summary measures of population health
Measures of event or disease frequency represented by indicators used in the Child Development Atlas are summarised below:
| Measure | Definition | Numerator | Denominator | Units |
|---|---|---|---|---|
| Prevalence | The proportion of the population with disease at a specific point in time | Number of people with disease at a given point in time | Total number of people in the population | % or proportion (or per 1,000; 10,000; 100,000 etc.) |
| Cumulative Incidence | The proportion of people who develop disease during a specific period | Number who develop disease in a specific period | Number at risk of the disease at the start of the period | % or proportion (or per 1,000; 10,000; 100,000 etc.) |
| Standardised Mortality Ratio (SMR) | Compares incidence to a standard population using indirect standardisation | Observed number of cases | Expected number of cases based on a standard population | Ratio (sometimes a %) |
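As an illustration of the first two measures, the following is a minimal Python sketch. The function names and the example counts are hypothetical and are not taken from the Atlas; the code simply applies the numerator and denominator definitions given above.

```python
def prevalence(cases_at_time: int, population: int, per: int = 100) -> float:
    """People with the condition at a point in time, per `per` people in the population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases_at_time / population * per


def cumulative_incidence(new_cases: int, at_risk_at_start: int, per: int = 100) -> float:
    """People who develop the condition during a period, per `per` people at risk at its start."""
    if at_risk_at_start <= 0:
        raise ValueError("at_risk_at_start must be positive")
    return new_cases / at_risk_at_start * per


# Hypothetical counts, for illustration only.
print(f"Prevalence: {prevalence(150, 12_000, per=1_000):.1f} per 1,000")
print(f"Cumulative incidence: {cumulative_incidence(30, 6_000, per=10_000):.1f} per 10,000")
```

The `per` argument covers the choice of units in the final column: 100 gives a percentage, and 1,000, 10,000 or 100,000 give a rate per that many people.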
Crude rates
All indicators included in the Child Development Atlas are at the population level, so everyone is assumed to be at risk for the whole of the year (or years) of interest, rather than contributing individual person-time at risk. An event rate is therefore calculated by dividing the total number of new cases of an event in a specified period (usually one year) by the average number of people in the population during the same period. This is then usually multiplied by 10,000 and presented as a rate per 10,000 people per year.
Depending on the data source, population denominators are calculated as the average of the size of the population at the start and at the end of the period of interest, or estimated from Census data. These basic rates are called ‘crude’ rates because they describe the overall incidence in a population without taking any other features of the population into account (e.g. age structures).
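The crude rate calculation described above can be sketched as follows. This is an illustrative example only: the function name and the figures are hypothetical, and it assumes the denominator is taken as the average of the population at the start and end of the period.

```python
def crude_rate(new_cases: int, pop_start: float, pop_end: float, per: int = 10_000) -> float:
    """Crude event rate per `per` people for one period.

    The denominator is the mean of the start-of-period and end-of-period
    population, which approximates the average population over the period.
    """
    mean_population = (pop_start + pop_end) / 2
    if mean_population <= 0:
        raise ValueError("average population must be positive")
    return new_cases / mean_population * per


# Hypothetical area: 84 new cases in a year, population growing from 41,000 to 43,000.
rate = crude_rate(84, 41_000, 43_000)
print(f"Crude rate: {rate:.1f} per 10,000 people per year")
```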
Age-specific rates
A crude comparison may have little meaning if the groups that are being compared have very different age structures. A way to get around this problem is to calculate separate rates for different age groups (age-specific rates). The rate in a particular age group can then be compared between geographic areas. This process can be extended to calculate separate rates for other groups, for instance male and female (sex-specific rates), and for different racial or socioeconomic groups.
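The following sketch shows why crude comparisons can mislead. The two areas and all counts are invented for illustration: both areas have identical age-specific rates, yet their crude rates differ because one has a younger population.

```python
def age_specific_rates(events: dict, population: dict, per: int = 10_000) -> dict:
    """Event rate per `per` people, calculated separately for each age group."""
    return {age: events[age] / population[age] * per for age in population}


def crude_rate_from_groups(events: dict, population: dict, per: int = 10_000) -> float:
    """Overall rate per `per` people, ignoring age structure."""
    return sum(events.values()) / sum(population.values()) * per


# Hypothetical data: Area A is younger than Area B.
area_a_pop = {"0-4": 5_000, "5-9": 3_000, "10-14": 2_000}
area_a_events = {"0-4": 50, "5-9": 15, "10-14": 4}
area_b_pop = {"0-4": 2_000, "5-9": 3_000, "10-14": 5_000}
area_b_events = {"0-4": 20, "5-9": 15, "10-14": 10}

rates_a = age_specific_rates(area_a_events, area_a_pop)
rates_b = age_specific_rates(area_b_events, area_b_pop)
crude_a = crude_rate_from_groups(area_a_events, area_a_pop)
crude_b = crude_rate_from_groups(area_b_events, area_b_pop)

for age in area_a_pop:
    print(f"{age}: A = {rates_a[age]:.0f}, B = {rates_b[age]:.0f} per 10,000")
print(f"Crude: A = {crude_a:.0f}, B = {crude_b:.0f} per 10,000")
```

In this invented example, the difference in crude rates reflects only the different age structures, which is the situation age-specific rates are intended to expose.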
Standardised rates
Comparisons between rates may become difficult if age-specific rates are presented for a large number of different age-groups. An alternative is to summarise or combine these age-specific rates using the process of direct standardisation. This involves calculating the overall incidence or mortality rate that would be expected in a ‘standard’ population (i.e. population with a hypothetical age structure) if it had the same age-specific rates as the study population. Direct standardisation requires:
- The age-specific event rates in the study population and
- The age distribution of the standard population
When the age-specific rates for the population being studied are not known but the total number of events is known, indirect standardisation is commonly used. The indirect method is also often used for small populations, where fluctuations in age-specific rates can affect the reliability of rates calculated using the direct method. The two methods differ in several respects but will yield comparable results in most cases. It could be argued that the choice of a standard population is more important than the choice of the direct or indirect method. The standard population used in the Child Development Atlas for purposes of indicator comparisons is the Western Australian population.
Where the data allow, the direct method of age-standardisation is the method chosen for use in the Child Development Atlas because of its advantages over the indirect method when comparing Aboriginal and non-Aboriginal mortality rates, disease incidence and prevalence rates over time. Note also that indirect standardisation fixes the quantity of interest (i.e. age-specific rates) as the standard, and then compares the effect of differences in age structure in two or more populations. For this reason, it is less useful as a public health comparator than direct standardisation.
Formulae^{1}:
Direct method
$$SR = \frac{\sum_i r_i P_i}{\sum_i P_i}$$
Indirect method
$$SR = \frac{C}{\sum_i R_i p_i} \times R$$
Where:
$SR$ is the age-standardised rate for the population being studied
$r_i$ is the age-group specific rate for age group $i$ in the population being studied
$P_i$ is the population of age group $i$ in the standard population
$C$ is the observed number of events* in the population being studied
$\sum_i R_i p_i$ is the expected number of events in the population being studied
$R_i$ is the age-group specific rate for age group $i$ in the standard population
$p_i$ is the population for age group $i$ in the population being studied
$R$ is the crude rate in the standard population
* 'Events' can include deaths, incident or prevalent cases of disease or other conditions, or health care utilisation occurrences.
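The two formulae can be sketched in Python as below. This is a minimal illustration rather than the Atlas's actual implementation: the function names, age groups and all figures are hypothetical, and rates are held as proportions (events per person) before being scaled for display.

```python
def direct_standardised_rate(study_rates: dict, standard_pop: dict) -> float:
    """Direct method: SR = sum(r_i * P_i) / sum(P_i)."""
    weighted_events = sum(study_rates[age] * standard_pop[age] for age in standard_pop)
    return weighted_events / sum(standard_pop.values())


def indirect_standardised_rate(observed: float, standard_rates: dict,
                               study_pop: dict, standard_crude_rate: float) -> float:
    """Indirect method: SR = C / sum(R_i * p_i) * R."""
    expected = sum(standard_rates[age] * study_pop[age] for age in study_pop)
    return observed / expected * standard_crude_rate


# Hypothetical standard population (P_i) and its age-specific rates (R_i).
standard_pop = {"0-4": 30_000, "5-9": 30_000, "10-14": 40_000}
standard_rates = {"0-4": 0.008, "5-9": 0.004, "10-14": 0.002}
# Crude rate in the standard population (R).
standard_crude = (
    sum(standard_rates[a] * standard_pop[a] for a in standard_pop)
    / sum(standard_pop.values())
)

# Hypothetical study population (p_i), its age-specific rates (r_i) and events (C).
study_pop = {"0-4": 5_000, "5-9": 3_000, "10-14": 2_000}
study_rates = {"0-4": 0.010, "5-9": 0.005, "10-14": 0.002}
observed_events = 69

direct = direct_standardised_rate(study_rates, standard_pop)
indirect = indirect_standardised_rate(observed_events, standard_rates,
                                      study_pop, standard_crude)
print(f"Direct SR:   {direct * 10_000:.1f} per 10,000")
print(f"Indirect SR: {indirect * 10_000:.1f} per 10,000")
```

Note that the direct method needs the study population's age-specific rates, whereas the indirect method needs only its total number of events and its age distribution.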
Age groupings
As there is little difference in the resulting rate ratios and rate differences using five- or ten-year age groupings, we follow the usual convention of using five-year age groupings in the calculation of directly age-standardised rates. However, if the distribution of the data across age groups requires age groups to be collapsed to overcome small numbers, then ten-year age groupings may be used.
Also, because there is little or no difference in the rate differences produced using the 0-4 age group compared with using the <1 and 1-4 age groups in the estimation of age-standardised rates, we follow the usual practice of using the 0-4 age group as the youngest age group in the calculation of age-standardised rates. This only applies to the calculation of age-standardised rates, and does not preclude presenting age-specific rates and the distribution of events (e.g. deaths) for the <1 and 1-4 age groups. If these age groups are not used, the actual age groups are detailed in notes accompanying the age-standardised population rate information. Standardised rates are generally multiplied by 1,000 or 100,000 to avoid small decimal fractions. They are then called standardised rates per 1,000 or 100,000 population.
Standardised ratios
The indirect method is also used to calculate standardised mortality ratios (SMRs) and other standardised ratios, for example for health service utilisation and other events. These ratios express the overall experience of a comparison population in terms of the standard population. The SMR is calculated by dividing the observed number of deaths in the comparison population by the number expected if the standard population's age-specific rates applied. Sometimes the SMR is multiplied by 100 to express the ratio as a percentage, although this is not universally accepted. Not multiplying by 100 has the benefit of being able to say, for example, that the number of deaths was 2.3 times that expected, rather than 130% higher.
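A standardised ratio can be sketched as follows. The function name and all figures are hypothetical and chosen so that the result matches the 2.3 example above; expected events are obtained by applying the standard population's age-specific rates to the comparison population.

```python
def standardised_ratio(observed: float, standard_rates: dict, comparison_pop: dict) -> float:
    """Observed events divided by the events expected under the standard rates."""
    expected = sum(standard_rates[age] * comparison_pop[age] for age in comparison_pop)
    if expected <= 0:
        raise ValueError("expected number of events must be positive")
    return observed / expected


# Hypothetical standard rates (deaths per person per year) and comparison population.
standard_rates = {"0-4": 0.002, "5-9": 0.0005, "10-14": 0.0005}
comparison_pop = {"0-4": 4_000, "5-9": 2_000, "10-14": 2_000}
observed_deaths = 23

smr = standardised_ratio(observed_deaths, standard_rates, comparison_pop)
print(f"SMR: {smr:.2f} (observed deaths were {smr:.1f} times the expected number)")
print(f"As a percentage: {smr * 100:.0f}, i.e. {(smr - 1) * 100:.0f}% higher than expected")
```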
Association and Cause
Data presented in the Child Development Atlas are not based on information about specific individuals, but relate to the number of events (or deaths) in a population relative to the size of that population (often an estimate from the ABS Census).
When comparing the strength of the relation between indicators, caution should therefore be exercised in relating the occurrence of an event to potential causes. For example, a statistical association between two indicators in the Child Development Atlas may lead to an assumption of a real association. However, other possible explanations for such an association should be considered. Three important ‘alternative explanations’ for associations are:
- chance (random error)
- bias (systematic error) and
- confounding
Chance or random error
Random error is the divergence, by chance alone, of a measurement from the true value. There are three main sources of random error: biological variation (natural variation of measurement depending on an individual’s biology), measurement error (imprecision inherent in the measuring system being used), and sampling error (selection of sample from whole population).
It is impossible to completely remove random error that has resulted from chance. Therefore, when examining an association between two indicators in the CDA, it is important to consider how likely it is to be a real effect, or whether it could have arisen by chance. Whilst associations between indicators should not be ignored, any interpretation of such a relationship taken on its own should be cautious, and should acknowledge the possibility that it simply reflects the effect of chance.
Bias
Many potential sources of bias have been identified in epidemiological studies, but all fall into two main areas: bias with respect to who gets into the study (selection bias) and bias with respect to the information we collect from, or on, these people about their exposures and their diseases (measurement, information or observation bias). Bias, also known as systematic error, is potentially more problematic than random error because it’s much harder to know what effect it might have on an outcome. The most common systematic errors with administrative data involve underreporting of activity for a specific population, inaccurate re-coding of spatial information, or differences in data entry protocols^{4}.
Confounding
Confounding is where an apparent relationship between an exposure and an outcome is really due, in whole or in part, to a third factor that is associated with both the exposure and the outcome of interest. Confounding is a mixing of effects because the effect of the exposure we might be interested in is mixed up with the effect of some other factor. Age, sex and socioeconomic status (SES) are common confounders.
Linkage Quality
The Data Linkage Branch of the WA Department of Health maintains a series of Data Quality Statements containing information on the core datasets (http://www.datalinkage-wa.org.au/downloads/dlb_reports). These provide insight into the characteristics of many of the datasets that are used in the Child Development Atlas, and will help users of the Atlas to understand the variety of strategies and tools used to ensure that the linkage system contains the highest quality links.
References:
1. Australian Institute of Health and Welfare (AIHW). Age-standardised rates. AIHW (METeOR). Available from: http://meteor.aihw.gov.au/content/index.phtml/itemId/327276
2. Australian Institute of Health and Welfare (AIHW). National Healthcare Agreement: PI 07–Infant and young child mortality rate, 2017. AIHW (METeOR). Available from: http://meteor.aihw.gov.au/content/index.phtml/itemId/630004
3. Australian Institute of Health and Welfare (AIHW). 2011. Principles on the use of direct age-standardisation in administrative data collections: For measuring the gap between Indigenous and non-Indigenous Australians. Available from: https://www.aihw.gov.au/reports/indigenous-australians/principles-on-the-use-of-direct-age-standardisatio/contents/table-of-contents
4. Ardal S, & Ennis S. 2001. Data detectives: Uncovering systematic errors in administrative databases. In Proceedings: Symposium 2001, Achieving Data Quality in a Statistical Agency: A Methodological Perspective.
5. Australian Bureau of Statistics (ABS). 2013. Statistical Language. Available from: http://www.abs.gov.au/websitedbs/a3121120.nsf/home/statistical+language