
2007 International Conference on Quality Management of Official Statistics, 6-7 September 2007, Daejeon, Korea

Assessing the quality and coverage of Vitals administrative data - Australian Bureau of Statistics Experience

Caitlin Szigetvari <caitlin.szigetvari@abs.gov.au>

Introduction

1. The Australian Bureau of Statistics (ABS) is Australia's National Statistical Organisation. The ABS is responsible for providing all Australian governments (Federal, State/Territory and Local) and communities with a high quality, objective and responsive national statistical service. To achieve this, the ABS produces a wide range of official statistics to assist with informed decision-making, to help government agencies formulate policies and to encourage research and discussion.

2. ABS Vitals statistics are key statistics used to formulate health and demographic policies and decisions. ABS Vitals statistics refer to a suite of data collected in relation to births, deaths, marriages and divorces. These statistics provide an insight into the sustainability of current and future Australian societies.

3. Australian birth and death statistics are collated from administrative data provided by the State and Territory Registrars of Births, Deaths and Marriages. The primary use of birth and death statistics is to contribute to the production of quarterly estimates of natural population change in the calculation of resident population estimates. It is imperative that the quality of birth and death data is of the highest standard, as population estimates are used to inform government, business and community decisions, including the distribution of Federal funding to State/Territory and Local governments. These statistics also contribute to numerous social and economic indicators.

4. This paper provides the reader with background information on ABS birth and death statistics and the steps involved in processing them. It also discusses the issues impacting on ABS births and deaths statistics, and describes the framework used to measure their quality, the methodology explored to link ABS births and deaths data, and finally the dissemination of ABS Vitals statistics.

Source of ABS Birth and Death Statistics

5. The registration of a birth or death in Australia is considered a State responsibility rather than a Federal responsibility. Australia has eight States and Territories, each with its own legislative Act covering the birth and death registration collection process and defining the Registrar's role and responsibilities for the dissemination of the statistical information. The ABS receives the administrative births and deaths data from the Registrars and is responsible for the processing, compilation and dissemination of the resultant birth and death statistics.

6. Birth and death registrations are a very valuable source of administrative data on births and fertility, and deaths and mortality, respectively in Australia. From a statistical point of view, the ABS has spent a considerable amount of time working with the Registrars across Australia to ensure high quality data is being collected. This has included ensuring that wherever possible common data items adhere to standard data quality requirements, thus maintaining comparability between States and Territories. This assists in achieving not only the objectives of the ABS data quality framework but also the United Nations (UN) international standards.

Birth Registrations

7. The Registrars supply Birth Registration Statements to hospitals and birth clinics for distribution to parents. Parents have a legislative requirement to register a birth within 60 days, with the onus on the parents to submit a completed Birth Registration Statement to the Registrar in their State/Territory. The primary purpose of the Birth Registration Statement is to allow for the legal identification of persons, through the issue of a birth certificate. The Registrar is responsible for processing the form and forwarding data to the ABS for further coding (for example, geography, multiple births, marital status and birthplace), compilation, analysis and dissemination. The Birth Registration Statement is slightly different in each State and Territory. While many questions are the same, there are some questions that are not comparable across States/Territories.

8. In addition to the Birth Registration Statement, hospitals, birth clinics, midwives and doctors are responsible for notifying the Registrars of all occurrences of births (including still-births). As a very useful quality check, these records are later linked to the parent-completed Birth Registration Statement and used to follow up outstanding Birth Registration Statement forms.

Death Registrations

9. A death registration involves three separate components. Legislation requires that all components must be completed in order for the Registrar to consider a death registration finalised. The three components are:

  i. Death Notification Form - completed by a funeral director and then submitted to the Registrar. This form is based on information supplied by relatives/friends of the deceased.

  ii. Medical Certificate of Cause of Death - this is either completed by a doctor who attended the patient prior to death and forwarded to the Registrar or, if the death is suspicious or due to an unnatural cause (approximately 14% of all deaths in Australia), a Coroner will be responsible for investigating and determining the cause of death. Coroners pass this information on to the National Coronial Information System and to the State/Territory Registrar.

  iii. Certificate of Burial or Cremation - completed by the funeral director and then submitted to the State/Territory Registrar.

10. Like the Birth Registration Statement form, each component of the death registration is slightly different for each State and Territory. While many questions are the same, there are some questions that are not comparable across States/Territories.

11. The primary purpose of death registrations is to allow for the legal identification of persons who are deceased and to ascertain the cause of death. Death registrations are used in a number of areas, including for legal purposes in the finalisation and distribution of a deceased person's estate, and to ascertain whether the death is 'suspicious' and requires further legal investigation. Registrars pass information from death notifications and Medical Certificates to the ABS (even if the registration is not finalised, i.e. full cause of death details are not available). As with the births data, the ABS conducts further coding on the deaths data, including geography and the coding of causes of death to the International Classification of Diseases: ICD-10. The ABS then compiles and analyses the death registrations, and publishes annual statistics on deaths and causes of death across Australia.

Processing of ABS Births and Deaths Statistics

12. Birth and death records from each Registrar across Australia are received monthly by the ABS in a standard electronic format. Processing of this data is carried out by the Health and Vitals Statistics Unit, located in the Queensland State Office of the ABS. This process involves the derivation of data items, further demographic coding and editing to meet the quality standards of both the ABS and the UN prior to the release of data to the public.

13. The ABS Vitals Statistics System records all steps involved in these processes and provides a sound statistical platform for staff to intervene and resolve data quality errors as well as analyse the overall quality of the resulting dataset or a particular data item. A generic outline of the process for turning the birth and death registration data into statistics is given in Appendix 1.

Measuring the Quality of ABS Birth and Death Statistics

14. To ensure a global data quality standard is maintained, the ABS collects Vitals information in accordance with the following documents:

  - United Nations Statistics Division (UNSD), 2001, Principles and Recommendations for a Vital Statistics System, Revision 2
  - United Nations Statistics Division (UNSD), Handbook on Civil Registration and Vital Statistics Systems: Preparation of a Legal Framework
  - United Nations Statistics Division (UNSD), 2004, Handbook on the Collection of Fertility and Mortality Data.

15. The ABS also measures the quality of its Vitals administrative data against the six dimensions of the ABS data quality framework. The data quality framework used in the ABS is based on the framework developed by Statistics Canada, which has been published internationally by Brackstone G (1999). The six dimensions are Relevance, Accuracy, Timeliness, Accessibility, Interpretability and Coherence. See Appendix 2 for further detail. The ABS uses the data quality framework to ensure the administrative data provided by the Registrars adheres to all the aspects of quality. Measuring the quality of ABS Vitals administrative data against the six dimensions of quality allows any shortcoming in the data to be explicitly identified, considered and acted upon.

16. While quality measures and standards are used in an attempt to ensure high quality administrative data is compiled for birth and death statistics, complete coverage of the Vitals data is not always achieved. Even though there is a legislative requirement to register a birth or death within a specified time, this does not always happen. Registration and processing lags affect the coverage of the Vitals data the ABS receives. The ABS is usually able to take processing lags into account during compilation of the data, but registration lags are harder to estimate.

17. Data linking is one method the ABS is exploring to measure the quality and coverage of birth and death registrations. In particular, two types of exact matching methods are being investigated: deterministic matching and probabilistic matching. Analysis has been undertaken using these data matching methods to link 2005 and 2006 Australian birth and death administrative data, focussing on neo-natal and infant births and deaths. This analysis enables the ABS to undertake quality assessment of fertility estimates, improve the quality of preliminary resident population estimates and provide information to the Registrars across Australia about patterns in the characteristics of people who register births late or not at all.

Issues Impacting on the Collection of ABS Births and Deaths Statistics

18. The availability of high quality birth and death statistics, for the Australian population as a whole, and especially for the Indigenous population, is important to all levels of the Australian government. The quality of this data is fundamental to the quality of the ABS calculated population estimates and projections; mortality data for the measure of the health of the Australian population; and fertility data for the measure of Australia's sustainability.

19. Whilst there are numerous issues surrounding the production of birth and death statistics, key issues include: recognising the statistical role of the Civil Registration Process; standardisation of data items between States and Territories; and registration lags.

Recognising the statistical role of the Civil Registration Process

20. In 2005 the ABS commenced a program to develop better relationships with Registrars, with the aim of increasing the quality of birth and death statistics. This program has included a range of activities:

  - developing a national standard set of data items for collection of birth and death statistics
  - having more frequent and structured communication, at both a strategic and operational level, between the ABS and Registrars
  - providing ABS assistance to Registrars in areas such as form content and design, and providing technical assistance in developing appropriate formats for provision of data to the ABS
  - developing Memorandums of Understanding in order to clarify the type of data required by the ABS, security of the information and how the data will be used.

21. The program has had a number of successes, including delivery of data in standard file formats, regular and productive meetings and increased information sharing at an operational level. While Memorandums of Understanding have not yet been signed, negotiations are continuing positively and the ABS anticipates a number of Memorandums of Understanding to be completed during 2007.

Standardisation of Data Items

22. Since 1993 the ABS has endeavoured to work with Registrars on forms, systems and quality issues. The range of information currently collected by the Registrars varies significantly across States and Territories. As a result, the standardisation of forms across all States and Territories remains an important goal for the collection of births and deaths data. To date these projects have had varying degrees of success. During 2005 and 2006 the ABS undertook negotiations with all Registrars to develop a standard set of data items to be supplied from their various administrative systems, including provision of data to the ABS in a standard format.

23. The second stage of the standardisation process involved specifying definitions, questions and procedures for the Registrars in an attempt to achieve a more consistent initial phase. There are still a considerable number of data items for which the information collected is inconsistent. From a national point of view it would be preferable to collect identical information using identical wording and definitions for all birth and death registrations. Although significant progress has been made towards achieving consistency of the information collected, there remain a number of items for which collected data is not comparable across States and Territories.

Registration lags

24. Time lags in birth registrations are a significant issue affecting the production, quality and coverage of birth and fertility statistics. Under the Registration Act in each State and Territory, a parent is allowed up to 60 days to register a birth. The registration of a birth can then be further delayed by parents not forwarding the Birth Registration Statement to their State/Territory Registrar, documentation not being processed in a timely manner by the Registrars, or the provision of incomplete documentation. As a result of these delays in registration, some births occurring in one year will not be registered until the following year or even later. In 2005, only 89% of births registered in that year occurred in 2005, with the remaining 2005 registered births being from previous years.

25. On 12 November 2006, the Australian Government Minister for Families, Communities and Indigenous Affairs announced that parents would be required to register the child's birth in order to qualify for the Maternity Payment (Baby Bonus). The Baby Bonus is a Federal Government initiative which recognises the extra costs incurred at the time of a new birth or adoption of a baby. The Baby Bonus is a one-off financial payment of approximately $4000 paid to families following the birth (including stillborn babies) or adoption of a baby. As a condition of receiving the Baby Bonus for births on or after 1 July 2007, parents are required to formally register the birth of their child and apply for the Baby Bonus within 26 weeks from the child's date of birth.

The Government will also be making legislative changes to require all parents to formally register the birth of their child as a condition of receiving the Baby Bonus... An increasing number of parents are delaying registration of births, in some cases until many years later. Precise birth statistics are needed so governments can ensure they are accurately planning funding for future service delivery such as funding for schools and health. (The Hon Mal Brough MP)

26. This policy decision is expected to substantially alter the behaviour of parents with regard to registering a birth. Previously, the only incentive for a parent to register a birth was the requirement to present a child's birth certificate when first enrolling the child in a school. Generally a child doesn't start school until the age of five, which meant birth registrations were often affected by registration lags. With a financial incentive to register a birth and the short time frame associated with claiming the Baby Bonus, this is expected to flow on to improved quality, coverage and timeliness of birth statistics for population estimates.

27. Registration lags for deaths are not considered to be a major issue, although lags in processing by Coroners can affect the timeliness and accuracy of provision of data for the ABS Causes of Death Collection. Over the last 6 years, on average 96% of deaths were registered in the year in which the death occurred. The majority of the remaining deaths are registered in the following year. On occasion lags have occurred with the provision of medical certificates by Coroners and this has led to delays in the release of the cause of death statistics. In some cases there has been increased coding to non-specific codes, which affects interpretation of cause of death statistics, as the detailed cause of death information is not available at the time that the statistics are finalised.

Methodology Used to Link ABS Births and Deaths Data

28. There is a wide range of statistics available on health issues in Australia, collated from administrative data systems, censuses and sample surveys. In addition to the ABS collated Vitals data, there is Hospital Morbidity data, a Midwives Collection (Perinatal statistics), and administrative data from welfare systems. While these are all rich sources of information, linking such data would provide a greater insight into the health issues currently faced by Australian society.

29. Several investigations are occurring in Australia into the feasibility and statistical validity of linked data sets. Privacy and confidentiality are fundamental to ABS business practices. Both legislative protection and strict security processes enable the ABS to ensure a high level of confidentiality to respondents. To maintain such community protection and its reputation, it is not feasible for the ABS to participate in some projects.

30. An investigation currently being undertaken by the ABS is linking 2005 and 2006 birth and death administrative data with a focus on neo-natal and infant deaths. Given the data available for this investigation, the maximum age of an infant in the linked data is two years. Two linking methods have been proposed for this analysis: the deterministic exact matching method and the probabilistic exact matching method. For this project the deterministic exact matching method is used to assess the accuracy of the probabilistic exact matching method. This is particularly useful when analysing matching results when different variables are being used as part of the matching criteria, which in some cases changes the degree of uniqueness. Such analysis enables the quality and coverage of ABS Vitals administrative data to be measured. Results are provided below under the deterministic exact matching method. An outline of the procedure is provided for the probabilistic exact matching method; however, results are not currently available as this matching project is in its initial stages.

Deterministic Exact Matching Method

31. The ABS generally uses the deterministic exact matching method to link two datasets together. This method involves using a unique identifier or a set of variables that together form unique matching criteria. A successful match is declared when the matching criteria are exactly the same on both datasets. This type of matching can only take place if the datasets are unique, complete and error free.

32. For the birth and death data files the variables used to form the matching criteria were state of registration, location of the mother's usual residence stated as a 9 digit Australian Standard Geographical Classification code, sex and date of birth. The likelihood of these matching criteria being unique is considered to be satisfactory but not ideal. Multiple matches and incorrect matches can occur with these matching criteria, which results in a slightly higher match rate.

33. The following matching success was achieved using the deterministic exact matching method on the birth and death data.

Table 1: Counts and percentage of matches for 2005 and 2006 birth and death administrative data by State, based on the matching criteria of state of registration, location of the mother's usual residence, sex and date of birth.

State                          Matches   Non-matches   Total   Percentage
                                   no.           no.     no.    matched %
New South Wales                    691           276     967         71.5
Victoria                           507           192     699         72.5
Queensland                         400           259     659         60.7
South Australia                    116            71     187         62.0
Western Australia                  211            94     305         69.2
Tasmania                            37            14      51         72.5
Northern Territory                  43            35      78         55.1
Australian Capital Territory        45            21      66         68.2
Australia                         2050           962    3012         68.1

34. The matching criteria used to link births and deaths administrative data reveal that approximately 32% of neo-natal and infant child deaths (962 of 3012) don't have a corresponding birth registration. It is believed the 68% match rate achieved at the Australia level is quite high and is a result of the flexible matching criteria. The inability to match all the neo-natal and infant birth and death registrations is somewhat expected given the family stress experienced when the death of a young child occurs. The results from this analysis indicate that neo-natal and infant births are under-represented on the births administrative dataset.

35. The ABS is currently negotiating with each State/Territory Registrar to allow name of child and mother's address to be used as part of the matching criteria, as this would increase the likelihood of the matching criteria being unique. To comply with confidentiality and privacy legislative Acts, the ABS is required to seek consent from the Registrars for the inclusion of these variables in the matching criteria.

Probabilistic Exact Matching Method

36. An alternative matching procedure that has been explored in the ABS for data linking is a probabilistic exact matching method. This matching method involves evaluating all possible matches or links and giving each a score based on the likelihood of a match, thus allowing all possible matches to be ranked. A match is declared if the link score is higher than some predetermined cut-off. This type of matching is used when there is partial identifying information, but no unique, error free, identifying matching criteria. The ABS is investigating the validity of implementing this matching method under the model formulated by Fellegi and Sunter (1969), which is the basis of the following general framework adopted by the data linkage packages being considered for use by the ABS.

37. The following diagram provides a visual display of the general framework used by the ABS when implementing the probabilistic exact matching method for data linking, as discussed in Gu et al (2003). A brief description of each step follows.
Diagram 1: General framework of data linking used in the ABS.
[Diagram not reproduced. Labels, in the order extracted: Initial data; Records; Standardisation; Records; Blocking and Searching; Record pairs; Single Pass; Comparison; Comparison vectors; Decision Model; Record pairs match status; Multiple pass: unassigned pairs; Measurement.]

a. Standardisation
The variables listed in the datasets to be linked need to be standardised to allow comparisons to be made between the two different data sources. This may involve changing some variables from character/text format to numeric or removing inconsistencies. When linking the births and deaths administrative datasets, the date of birth variable requires standardisation, which involves changing the format from date/time format to ddmmyy format.

b. Blocking and Searching
Blocking splits the datasets into groups defined by one or more blocking variables, so that only records in the same group are compared. This reduces the number of comparisons required to assess the true pairs from the matching process. Jaro (1995) discussed the importance of choosing appropriate blocking strategies when linking two datasets. The choice of blocking variables is important as more stable, high quality variables can increase the probability of attaining true matching pairs. For example, when linking the 2005 and 2006 birth and death data for neo-natal and infant child deaths, if every single record was considered, there would be over 1.5 billion possible comparisons. By introducing a blocking strategy of state of registration, location of the mother's usual residence stated as a 9 digit Australian Standard Geographical Classification code and sex, the number of comparisons required is significantly reduced. It is possible that a true matching pair may not be identified when the blocking variables differ between the two records. This could be an issue for linking birth and death data if the mother's usual residence has changed between the time of the birth and the time of the child's death. However, changes in the mother's usual residence are generally minimal when comparing neo-natal and infant child birth and death registrations.

c. Comparison
Record pairs between two datasets are only compared when the blocking variables agree. Field weights are then applied to each linking variable depending on how well they agree between the datasets. The comparison options explored by the ABS have included:

  - Exact match. The variables either match or they don't.
  - Key difference tolerated. Allows the field weight to depend on the number of characters that are different, allowing for misspellings and poor handwriting.
  - Numerical difference. Allows the field weight to depend on how far apart values are.
  - Geographical difference. Can use spatial information to determine the distance between address variables.

d. Decision Model
Based on the results of the comparison options, a final record weight is calculated by summing the individual field weights. Record pairs are then sorted by their final weight from highest to lowest. Using a decision rule based on certain cut-offs, the sorted final weight is used to determine whether a record pair is a true link, a non-link or a possible link which requires clerical review. A typical decision model is illustrated below.

Diagram 2: Decision model considered for use in the ABS.
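[Diagram not reproduced. Record pairs are sorted by their final weight from highest to lowest. Pairs between the highest weight and the upper cutoff weight are links, pairs between the upper and lower cutoff weights go to clerical review, and pairs between the lower cutoff weight and the lowest weight are non-links.]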

e. Measurement This involves assessing the quality of linking pairs by determining how many matching record pairs have been missed, and how many matching pairs have been declared as a true link but are actually a non-link.
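Steps b to e can be illustrated with a small sketch. It is not the ABS implementation or any particular linkage package: the records, the 9 digit codes, the `name` field, the field weights and the two cut-offs are all invented for illustration. Field weights under Fellegi and Sunter (1969) would be estimated from agreement probabilities rather than fixed by hand as they are here. The final lines apply the match rate and match accuracy definitions from paragraphs 39 and 40 to an assumed set of true matches.

```python
from collections import defaultdict
from datetime import date

# Invented records: field names, geographic codes and names are hypothetical.
births = [
    {"id": "B1", "state": "QLD", "asgc": "305011143", "sex": "F", "dob": date(2005, 3, 14), "name": "EMMA"},
    {"id": "B2", "state": "QLD", "asgc": "305011143", "sex": "F", "dob": date(2005, 3, 16), "name": "ELLA"},
    {"id": "B3", "state": "NSW", "asgc": "105051100", "sex": "M", "dob": date(2006, 1, 2), "name": "JACK"},
]
deaths = [
    {"id": "D1", "state": "QLD", "asgc": "305011143", "sex": "F", "dob": date(2005, 3, 14), "name": "EMMA"},
    {"id": "D2", "state": "NSW", "asgc": "105051100", "sex": "M", "dob": date(2006, 1, 3), "name": "JACL"},
]

UPPER_CUTOFF = 8  # assumed value: weights above this are declared links
LOWER_CUTOFF = 2  # assumed value: weights below this are declared non-links


def block_key(record):
    """Blocking variables named in the text: state, 9 digit ASGC code and sex."""
    return (record["state"], record["asgc"], record["sex"])


def candidate_pairs(births, deaths):
    """Yield only the birth/death pairs whose blocking variables agree."""
    blocks = defaultdict(list)
    for birth in births:
        blocks[block_key(birth)].append(birth)
    for death in deaths:
        for birth in blocks.get(block_key(death), []):
            yield birth, death


def dob_weight(birth, death):
    """Numerical difference: the weight depends on how many days apart the dates are."""
    days_apart = abs((birth["dob"] - death["dob"]).days)
    if days_apart == 0:
        return 6
    return 2 if days_apart <= 2 else -4


def name_weight(birth, death):
    """Key difference tolerated: the weight depends on the number of differing characters."""
    a, b = birth["name"], death["name"]
    differences = sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))
    if differences == 0:
        return 4
    return 1 if differences == 1 else -3


def classify(weight):
    """Decision rule with an upper and a lower cut-off."""
    if weight > UPPER_CUTOFF:
        return "link"
    if weight < LOWER_CUTOFF:
        return "non-link"
    return "clerical review"


def link(births, deaths):
    """Score every candidate pair and sort by final weight, highest first."""
    scored = []
    for birth, death in candidate_pairs(births, deaths):
        weight = dob_weight(birth, death) + name_weight(birth, death)
        scored.append((birth["id"], death["id"], weight, classify(weight)))
    return sorted(scored, key=lambda pair: pair[2], reverse=True)


results = link(births, deaths)

# Measurement against an assumed set of true matches (normally unknown).
true_matches = {("B1", "D1"), ("B3", "D2")}
links = {(b, d) for b, d, _, status in results if status == "link"}
matches_linked = len(links & true_matches)
match_rate = matches_linked / len(true_matches)   # matches linked / total matches
match_accuracy = matches_linked / len(links)      # matches linked / total links
```

With these invented values, blocking reduces the six possible birth/death comparisons to three. One pair is declared a link, one is sent to clerical review and one is a non-link, giving a match rate of 0.5 and a match accuracy of 1.0 before any clerical review.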

38. There are several different quality measures used in the ABS to determine the quality of linked datasets. Two measures readily used in the ABS to investigate the quality of linked datasets are match rate and match accuracy.

39. Match rate is the proportion of total matches that are actually linked. A match rate of 1 indicates that a linked dataset contains all true matches. This quality measure is defined as:

  match rate = matches linked / total matches

40. Match accuracy is the proportion of links that are actually true matches. A match accuracy of 1 indicates that every record pair on a linked dataset is a true match. This quality measure is defined as:

  match accuracy = matches linked / total links

41. There is a trade-off between match rate and match accuracy. Under the probabilistic exact matching method, a linking strategy with a high final weight cut-off will have a low match rate and a high match accuracy. Conversely, a linking strategy with a low final weight cut-off will have a high match rate and possibly a low match accuracy.

42. Alternative measures of quality for linked datasets involve estimating match/non-match status, as this is generally unknown. Data quality measures that have been considered by the ABS for assessing linked datasets are detailed in Appendix 3.

Dissemination of ABS Vitals Statistics

43. ABS Vitals statistics are released on a Year of Registration basis. Release generally occurs within 12 months of the end of the reference period, e.g. 2006 data will be published in November/December 2007. A wide range of methods are used by the ABS to ensure that data is widely accessible by government and the community. These include release of standard statistical publications, release of special analytical articles on issues of particular interest, information papers, conference papers, releases of coded quality assured unit record level data (subject to stringent processes) and release of specialised data sets as specified by the user. The vast majority of ABS data is available free on the ABS website - <www.abs.gov.au>.

Appendix 1

A generic outline of the process for turning the registration data into statistics is outlined below.

1. Record is loaded.
2. Record has derivations applied.
3. Record passes through the duplicate identification process.
4. If a record is found to be a duplicate, it stays in the duplicate view until resolved. If the record has been found to not be a duplicate, it will either continue on through the process it was in prior to the identification or pass through derivation, coders and edits. Records found to be a duplicate are cancelled and play no further part in processing or analysis.
5. Record passes through a range of coding processes including Birthplace coder, Geographical coder, occupation coder, medical terminology coder etc.
6. Record passes through the edit process.
7. If a record fails an edit, it stays in the error processing view until clerically resolved. Once the edit has been resolved, the record passes through derivations and coders (and is assigned a code if applicable) and is then edited again. If a record is clean (no edits) then the record has passed through all demographic processes and been found to be of sufficient quality.
8. For the Deaths Collection, the record will then proceed through to the Automatic Cause of Death coder to be assigned an ICD-10 code.
9. ICD-10 coded records then pass through edits for quality assurance processes.
10. Records unable to be assigned a cause of death code automatically are sent to a query stage where additional information is sought from the certifier. On receipt of additional information, records proceed through the Cause of Death coder and edit process again.
11. If the record is clean (no edits) then the record has passed through all cause of death processes and been found to be of sufficient quality.


Appendix 2

ABS Data Quality Framework

The ABS data quality framework consists of six dimensions. This framework is based on the quality framework developed by Statistics Canada. The six dimensions of quality are described as follows.

  - Relevance
    The relevance of statistical information reflects the degree to which it meets the real needs of clients. Relevance is concerned with whether the available information sheds light on the issues most important to users. Relevance of data must be tested against the requirements of any particular user of the statistics. Users need to make a determination of the relevance of the dataset, taking into account all of the aspects of quality, for the purpose to which the statistics are being used.

  - Accuracy
    The accuracy of statistical information is the degree to which the information correctly describes what it was designed to measure. It may be described in terms of major sources of error that potentially cause inaccuracy. Quality standards used to achieve administrative goals can be different to quality standards needed to produce statistics, although the application of sound quality assurance principles by the administrative program will usually result in quality suitable for statistical use.

  - Timeliness
    The timeliness of statistical information refers to the delay between the end of the reference period to which the information pertains, and the date on which the information becomes available.

  - Accessibility
    The accessibility of statistical information refers to the ease with which it can be referenced by users. This includes the ease with which the existence of information can be determined, as well as the suitability of the form or medium through which the information is accessed.

  - Interpretability
    The interpretability of statistical information reflects the availability of the supplementary information (metadata) necessary to interpret and utilise it appropriately, including concepts, classifications and measures of accuracy. In addition, interpretability includes the appropriate presentation of data to aid in the correct interpretation of the data.

  - Coherence
    The coherence of statistical information reflects the degree to which it can be successfully brought together with other statistical information within a broad analytical framework and over time. Coherence encompasses the internal consistency of a collection as well as its comparability both over time and with other data sources. The use of standard concepts and classifications promotes coherence.


Appendix 3

Alternative measures of quality explored in the ABS for assessing linked datasets include:

• Estimating the expected number of matches: In most linking projects the expected total number of matches can be estimated. This can then be compared with the total number of linked pairs to see whether the number of links is approximately equal to the number of matches.
• Creating a benchmark or 'truth' file: Comparing linked records with a benchmark file can provide insight into the quality of the matching process. A benchmark file can be created by matching records manually, linking variables on different matching criteria or using alternative probabilistic linking strategies.
• Estimating totals by analysing a sample of records: A sample of record pairs can be checked, and the results from the sample used to estimate the success of the linking process across all record pairs.
• Using the Fellegi and Sunter quality measures: Fellegi and Sunter (1969) provide the theory underlying modern data linking. They propose two quality measures to determine the accuracy of linkage. The first is the probability that a record pair is incorrectly linked (the pair is linked but is not a true match). The second is the probability that a record pair is incorrectly assigned as a non-linking pair (the pair is a true match but is not linked).
• Belin and Rubin's curve fitting procedure (1995): Belin and Rubin (1995) suggest quantifying a false match rate as the ratio of the number of false links to total links. This rate is based on transformed normal distributions estimated using mixture models fitted to the record pairs.
• The Simrate approach (Winglee et al, 2005): The Simrate approach, suggested by Winglee et al (2005), is a method of calculating Fellegi and Sunter's probabilities using a simulation from a multinomial distribution to create a set of simulated record pair comparisons. This simulated distribution is then used to set appropriate cut-offs so that the two quality measures are satisfied.
• Calculating the probability of matching due to chance: This approach applies probability theory to estimate how likely it is that two records match by chance. A record that matches by chance is a false link. The quality measure is referenced from Karmel (2004) and Blakely (2002).
• The duplicate method: This measure involves counting the number of duplicate links after a matching process. A duplicate link occurs when a record from one file links with more than one record. Blakely et al (1999) discuss a method they describe as 'the duplicate method', which extends the chance matching methodology by using the count of duplicates.
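Several of the measures above reduce to simple counts once a benchmark file is available. The sketch below is illustrative only and is not the ABS implementation. It assumes links and benchmark matches are held as pairs of record identifiers (a hypothetical layout), and it reports empirical proportions measured against the benchmark rather than the model-based probabilities of Fellegi and Sunter or the fitted rates of Belin and Rubin.

```python
from collections import Counter


def linkage_quality(links, benchmark):
    """Compare linked pairs with a benchmark ('truth') set of matched pairs.

    Both arguments are iterables of (first_file_id, second_file_id) tuples.
    The identifier layout is an assumption made for this example.
    """
    links, benchmark = set(links), set(benchmark)
    if not links or not benchmark:
        raise ValueError("both links and benchmark must be non-empty")

    false_links = links - benchmark        # linked, but not a true match
    missed_matches = benchmark - links     # true match, but not linked
    links_per_record = Counter(first_id for first_id, _ in links)

    return {
        # total links against expected matches (close to 1 if they agree)
        "links_to_matches_ratio": len(links) / len(benchmark),
        # share of links that are false: false links / total links
        "false_link_rate": len(false_links) / len(links),
        # share of true matches that were not linked
        "missed_match_rate": len(missed_matches) / len(benchmark),
        # first-file records linked to more than one second-file record
        "records_with_duplicate_links": sum(
            1 for n in links_per_record.values() if n > 1
        ),
    }
```

Calling `linkage_quality(links, benchmark)` on a clerically reviewed sample of pairs, rather than on the full files, gives the sample-based estimate described in the third measure.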


References

Australian Bureau of Statistics (ABS) (2006) Births, Australia 2005, cat. no. 3301.0, Canberra.
Australian Bureau of Statistics (ABS) (2006) Deaths, Australia 2005, cat. no. 3302.0, Canberra.
Australian Bureau of Statistics (ABS) (2007) Information Paper: External Causes of Death, Data Quality 2005, cat. no. 3317.0.55.001, Canberra.
Belin TB and Rubin DB (1995) A Method for Calibrating False Match Rates in Record Linkage, Journal of the American Statistical Association, Vol 90, No 430, pp 694-707.
Bishop G and Khoo J (2007) Research Paper: Methodology of Evaluating the Quality of Probabilistic Linking, cat. no. 1351.0.55.018, ABS, Canberra.
Blakely T, Salmond C and Woodward A (1999) Anonymous record linkage of 1991 census records and 1991-94 mortality records: The New Zealand Census-Mortality Study (NZCMS Technical Report No. 1), Public Health Monograph Series No. 4, ISSN 1173-6844.
Brackstone G (1999) Managing Data Quality in a Statistical Agency, Survey Methodology, Vol 25, No 2, Statistics Canada. Accessible from <http://www.nso.go.kr/sqs2000/canada-attachment.pdf>.
Carson C (2001) Toward a Framework for Assessing Data Quality, IMF Working Paper, WP/01/25, International Monetary Fund, Washington.
Christen P and Goiser K (2006) Quality and Complexity Measures for Data Linkage and Deduplication, in F. Guillet and H. Hamilton (eds) (2006) Quality Measures in Data Mining, Studies in Computational Intelligence, Springer.
Conn L and Bishop G (2006) Research Paper: Exploring Methods for Creating a Longitudinal Census Dataset (Methodology Advisory Committee), cat. no. 1352.0.55.076, ABS, Canberra.
Fellegi IP and Sunter AB (1969) A Theory for Record Linkage, Journal of the American Statistical Association, Vol 64, No 328, pp 1183-1210.
Gu L, Baxter, Vickers, and Rainsford (2003) Record Linkage: Current Practice and Future Directions, CMIS Technical Report No. 03/83, CSIRO Mathematical and Information Sciences.
Jaro M (1989) Advances in Record-Linkage Methodology as Applied to Matching the 1985 Census of Tampa, Florida, Journal of the American Statistical Association, Vol 84, No 406, pp 414-420.
Jaro M (1995) Probabilistic linkage of large public health data files, Statistics in Medicine, Vol 14, pp 491-498.
Karmel R (2004) Linking Hospital morbidity and residential aged care data: examining matching due to chance, AIHW cat. no. AGE 40, AIHW, Canberra.
The Hon Mal Brough MP, Minister for Families, Community Service and Indigenous Affairs. Changes to Baby Bonus for under 18 Year Olds. Press release. Retrieved on 3 July 2007 from <http://www.facsia.gov.au/internet/Minister3.nsf/content/baby_bonus_12nov06.htm>.


National Coroners Information System (NCIS) (2007) Strategic Plan for the National Coroners Information System 2007-2012. Press release. NCIS, Melbourne. Retrieved on 20 July 2007 from <http://www3.fhs.usyd.edu.au/ncch/downloads/AMDIG/2006MortalityWorkshop/Daking.pdf>.
National Coroners Information System (NCIS) (2007) NCIS home page, viewed 6 March 2007, <http://www.ncis.org.au/index.htm>.
Peirce J and Daking L (2007) The National Coroners Information System: contributing to death and injury prevention, Health Information Management Journal 36(2).
Pritchard T (2007) Growing data needs and potential of administrative data sources for compiling MDG and other development indicators - Australian Bureau of Statistics experience. Paper presented at the Regional Workshop on RETA 6356 Improving Administrative Data Sources for the Monitoring of Millennium Development Indicators, 9-11 July 2007, AIT, Bangkok, Thailand.
Tepping B (1968) A Model for Optimum Linkage of Records, Journal of the American Statistical Association, Vol 63, No 324, pp 1321-1332.
United Nations Statistics Division (UNSD) (2001) Principles and Recommendations for a Vital Statistics System, Revision 2.
United Nations Statistics Division (UNSD) Handbook on Civil Registration and Vital Statistics Systems: Preparation of a Legal Framework.
United Nations Statistics Division (UNSD) (2004) Handbook on the Collection of Fertility and Mortality Data.
Winglee M, Valliant R and Scheuren F (2005) A Case Study in Record Linkage, Survey Methodology, Vol 31, No 1, pp 3-11.
Winkler WE (2006) Overview of Record Linkage and Current Research Directions, US Bureau of the Census Research Report, No 2006-02.
Wood M and Pritchard T (2007) Australian Mortality Coding History, Benefits and Future Directions, Health Information Management Journal 35(3).
World Health Organization (WHO) (2007) ICD-10 home page, viewed 6 March 2007, <http://www.who.int/classifications/icd/en/>.
World Health Organization (WHO) (1975) International Classification of Diseases, 9th Revision, Geneva.
World Health Organization (WHO) (1984) International Classification of Diseases and Health Related Problems, 10th Revision, Geneva.

