
Norwegian experiences with register-based household and dwelling statistics

Li-Chun Zhang (1, 2)

(1) University of Southampton (L.Zhang@soton.ac.uk)
(2) Statistisk sentralbyrå, Norway

Population, Household & Dwelling Registers
A brief history in Norway:
• First, there was census (till 1980)
[Central Pop. Register: 1964, used in census 1970]
[CPR with family relationship, used in census 1980]

• Then, there was virtual census [admin+survey, 1990]

• The ‘last’ census [Dwelling Register (DR), 2001]

• Household Register (HR) in 2006

• Register-based census-like statistics in 2011


[A telling tale of CPR from personal experience...]
Main problems with registered addresses in CPR

Complete address (CA) = address that identifies a dwelling

1. address (17 digit) [e.g. for villa, farm house, etc.]

2. address + DIN (dwelling id, 9 digit) [e.g. in apartment building]

CPR-household = persons with the same CA in CPR

• Not everyone is registered with a CA in CPR

[about 7% of the population had a missing CA in 2006]

• Not every registered CA in CPR is correct

Historically, address (not dwelling) registered in CPR

Main problems with registered addresses in CPR

What if CA-b household registered at a in CPR?


Presence of other households registered at (a, b):

(No a, No b)    (No a, Yes b)               (Yes a, No b)    (Yes a, Yes b)
Unaffected      Unaffected or Under-count   Under-count      Under-count

NB. Address registration errors always lead to an overall net under-count
of dwelling households, based on direct enumeration of address-households in CPR.

NB. Register-based living household is impossible

Main problems with registered addresses in CPR

Reality

Dwelling ID  Family ID  Household ID   Person ID  Name    Sex     Age  Income
H101         1          1              1          Astrid  Female  72   y1
H102         2          2              2          Geir    Male    35   y2
H102         2          2              3          Jenny   Female  34   y3
H102         2          2              4          Markus  Male    5    y4
H201         3          3              5          Knut    Male    29   y5
H201         4          3              6          Lena    Female  28   y6
H202         5          4              7          Ole     Male    28   y7

Household Register

Dwelling ID  Family ID  Household ID∗  Person ID  Name    Sex     Age  Income
H101         1          1              1          Astrid  Female  72   y1
H101         2          2              2          Geir    Male    35   y2
H101         2          2              3          Jenny   Female  34   y3
H101         2          2              4          Markus  Male    5    y4
H101         3          3              5          Knut    Male    29   y5
-            4          4              6          Lena    Female  28   y6
-            5          4              7          Ole     Male    28   y7
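
As a minimal sketch of what the two tables show, the Python snippet below groups persons by household ID in each table and compares the resulting households. The person and household IDs are copied from the tables; the variable and function names are illustrative assumptions and not part of the original slides.

```python
from collections import Counter, defaultdict

# person ID -> household ID, copied from the "Reality" and
# "Household Register" tables.
reality = {1: 1, 2: 2, 3: 2, 4: 2, 5: 3, 6: 3, 7: 4}
register = {1: 1, 2: 2, 3: 2, 4: 2, 5: 3, 6: 4, 7: 4}


def households(person_to_household):
    """Return the set of households, each as a frozenset of person IDs."""
    groups = defaultdict(set)
    for person, household in person_to_household.items():
        groups[household].add(person)
    return {frozenset(members) for members in groups.values()}


def size_distribution(hh):
    """Count how many households there are of each size."""
    return Counter(len(members) for members in hh)


true_hh = households(reality)
reg_hh = households(register)

# Number of households in each source.
print(len(true_hh), len(reg_hh))
# Whether the household size distributions agree.
print(size_distribution(true_hh) == size_distribution(reg_hh))
# Households that appear in only one of the two sources.
print(sorted(sorted(h) for h in true_hh ^ reg_hh))
```

In this toy example the register reproduces the number of households and their size distribution, yet persons 5 to 7 are grouped differently, so statistics that depend on household composition can still be affected.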

Key elements of HR 2006

Do not use survey data as gold-standard


[linked survey-register data for editing rules; indirect use]

Use of data from census 2001


• census household distribution for macro editing

• maximum use of census household micro data

Use of data in CPR


• family relationship: e.g. unmarried parents, kinships, etc.

• address-related information: e.g. date of moving, etc.

• ‘cohabitation tendency’

Key elements of HR 2006

Processing step                     Number (×1000)

Base Unit                                   1931.9
Merging kinship                              -23.2
Merging moving date                          -33.3
Merging cohabitation tendency                -33.7
Splitting implausible merging                 +6.0
Household with registered DIN               1847.7
Base Unit without registered DIN             162.8
Total                                       2010.5
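
The arithmetic behind the table can be checked with a short Python sketch. The figures are copied from the table; the variable names are illustrative assumptions.

```python
# Figures in thousands, copied from the table of processing steps.
base_units = 1931.9
adjustments = {
    "merging kinship": -23.2,
    "merging moving date": -33.3,
    "merging cohabitation tendency": -33.7,
    "splitting implausible merging": 6.0,
}
without_din = 162.8

# Households with a registered DIN after all merging and splitting steps.
with_din = base_units + sum(adjustments.values())
# Adding the base units without a registered DIN gives the total.
total = with_din + without_din

print(round(with_din, 1), round(total, 1))
```

The two printed values should agree with the rows "Household with registered DIN" and "Total".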

Key elements of HR 2006

[Figure not reproduced. NB. Denmark in parentheses]

How to assess statistical uncertainty?

I. Double mixed-effects modelling approach (Zhang, 2009)

• SPREE (Purcell & Kish, 1980) to GSPREE (Zhang & Chambers, 2004):
mixed-effects model relating the association structure of the
CA-household (target) to that of the CPR-family (auxiliary)

• differential DIN-missing rates by Municipality and household type:
random effects (Municipality by household type) of missing rate

II. A unit-error theory (Zhang, 2011)

• Introducing allocation matrix A for representation

• Uncertainty propagation by $\hat{f}(A \mid A^{*})$ given $A^{*}$ in HR

How to assess statistical uncertainty?
Example: To obtain household age composition for 4 age groups
(0-18, 18-30, 31-65, 66+), use the dummy-index value matrix X as follows

\[
X = \begin{pmatrix}
0 & 0 & 0 & 1 \\
0 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 \\
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 1 & 0 & 0
\end{pmatrix}
\quad \Rightarrow \quad
AX = \begin{pmatrix}
0 & 0 & 0 & 1 \\
1 & 0 & 2 & 0 \\
0 & 2 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0
\end{pmatrix}
\]
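
The example can be reproduced with a small Python sketch. It assumes that A is a 7 × 7 zero-one matrix whose rows are households and whose columns are persons, which is an inference from the shape of AX rather than something stated on the slide. Ages and household memberships are taken from the "Reality" table; the age-group boundary at 18 is a guess, since the slide lists both 0-18 and 18-30.

```python
# Ages of persons 1..7 and their true household IDs ("Reality" table).
ages = [72, 35, 34, 5, 29, 28, 28]
household_of = [1, 2, 2, 2, 3, 3, 4]


def age_group(age):
    # Assumed boundaries; nobody in the example is aged 18, so the
    # ambiguous treatment of age 18 does not affect the result.
    if age < 18:
        return 0
    if age <= 30:
        return 1
    if age <= 65:
        return 2
    return 3


n = len(ages)
# X[i][g] = 1 if person i+1 belongs to age group g.
X = [[1 if age_group(a) == g else 0 for g in range(4)] for a in ages]
# A[h][i] = 1 if person i+1 belongs to household h+1 (assumed layout);
# rows for unused household indices stay zero.
A = [[1 if household_of[i] == h + 1 else 0 for i in range(n)] for h in range(n)]
# Matrix product AX: row h is the age composition of household h+1.
AX = [[sum(A[h][i] * X[i][g] for i in range(n)) for g in range(4)]
      for h in range(n)]

for row in AX:
    print(row)
```

Replacing `household_of` with the household IDs from the Household Register would give the corresponding register-based composition, which is how an allocation error shows up in AX under this reading.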

How to assess statistical uncertainty?

Kongsvinger Household by size

                                           1     2     3     4     5    6+
Proxy Household Register                3050  2269  1061  1073   333    79
Census                                  3051  2319  1060  1080   310    77
Prediction Expectation                  3100  2314  1053  1063   317    81
RSEP including estimation uncertainty     38    20    10     8     6     5

Kongsvinger Household by type

                                           I    II   III    IV     V
Proxy Household Register                3050  1791  2124   671   229
Census                                  3051  1845  2166   699   136
Prediction Expectation                  3100  1797  2134   713   183
RSEP including estimation uncertainty     37    14    12    10    14

NB. (I) Single; (II) Couple without Children; (III) Couple with Children; (IV) Single Adult with Children; (V) Others
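
One way to read the household-by-size table is to express the gap between the census count and the prediction expectation in units of the RSEP. The sketch below does this under the assumption that the RSEP is on the same count scale as the other rows; that interpretation, and the variable names, are not taken from the slides.

```python
# Rows copied from the "Household by size" table.
sizes = ["1", "2", "3", "4", "5", "6+"]
census = [3051, 2319, 1060, 1080, 310, 77]
expectation = [3100, 2314, 1053, 1063, 317, 81]
rsep = [38, 20, 10, 8, 6, 5]

# Difference between census and prediction expectation, divided by RSEP
# (assumes RSEP is expressed in numbers of households).
ratios = [(c - e) / r for c, e, r in zip(census, expectation, rsep)]

for size, ratio in zip(sizes, ratios):
    print(f"size {size}: {ratio:+.2f}")
```

The same calculation applies to the household-by-type table by swapping in its rows.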

Integrating HR and DR

One-number household and dwelling statistics in 2011

• yearly register-based household statistics since 2005

• separate yearly register-based dwelling statistics

• each considered fit-for-use on its own

• separate uses of HR and DR

Ideal linkage by CA = Address + DIN

• CA in CPR: missing, erroneous or duplicated

• CA in DR: lacks rigorous quality assurance

Integrating HR and DR

CPR                                DR
Person    Household    Dwelling    Dwelling
1         1            1           1
2         1            1           1
3         2            2           N/A
4         3            2           N/A
..        ..           ..          ..
N         M            D_N         D_P
-         -            -           D_P + 1
..        ..           ..          ..
-         -            -           D

NB. M = 2.24 × 10^6 and D = 2.42 × 10^6 in 2011

Integrating HR and DR

HD = Perfectly matched household-dwelling set


• HR = HD ∪ HuD [i.e. set of households without matched dwelling]

• DR = HD ∪ DuH [i.e. set of dwellings without matched household]

Options and challenges:

• linkage between HuD and DuH: not directly linkable

• weighting of HD set for statistics: not “one-number”

• impute dwelling characteristics for HuD set: lack of
coherence with low-level dwelling statistics

Nearest neighbour linkage (NNL) of units
sets of units $A = \{1, 2, ..., n_A\}$ and $B = \{1, 2, ..., n_B\}$

vector of keys: $x = (x_1, x_2, ..., x_p)$ available to both A and B

dissimilarity or distance measure between $x$ and $x'$: $\|x - x'\|$

Method: for each $i \in A$

1. nearest neighbour (NN): $k = \arg\min_{j \in B} \|x_i - x_j\|$

2. nearest neighbour linkage (NNL): $i \leftrightarrow k$

NB. linkage noise if ties, i.e. multiple NN-matches

NB. NNL from B to A may not result in same k ↔ i

NB. Similar to nearest neighbour imputation (Chen & Shao, 2000)
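
The NNL method and the first two remarks can be illustrated with a minimal Python sketch. The data are made up, indices are 0-based, and the Manhattan distance is only one possible choice of dissimilarity; none of these details come from the slides.

```python
def nn_matches(x_i, keys_b, dist):
    """Return the indices of all units in B at minimum distance from x_i."""
    distances = [dist(x_i, x_j) for x_j in keys_b]
    d_min = min(distances)
    return [j for j, d in enumerate(distances) if d == d_min]


def manhattan(x, y):
    # One possible dissimilarity measure (an assumption for this sketch).
    return sum(abs(a - b) for a, b in zip(x, y))


# Hypothetical key vectors for the units in A and B.
keys_a = [(0, 0), (5, 5)]
keys_b = [(0, 1), (1, 0), (5, 4), (9, 9)]

# NNL from A to B, and from B to A, keeping all tied NN-matches.
links_a_to_b = {i: nn_matches(x, keys_b, manhattan) for i, x in enumerate(keys_a)}
links_b_to_a = {j: nn_matches(x, keys_a, manhattan) for j, x in enumerate(keys_b)}

print(links_a_to_b)
print(links_b_to_a)
```

Here unit 0 in A has two tied NN-matches in B, so choosing one of them introduces linkage noise. Unit 3 in B links to unit 1 in A, whereas unit 1 in A links to unit 2 in B, which shows that NNL is not symmetric.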


Nearest neighbour linkage (NNL) of units

DNNL from A to B: put $R = \{1, ..., N\}$

1. for $i \in \{1, ..., M_A\}$, find NN-match $j \in R$, based on keys $x_A$

2. for $j$ above, find NN-match $k \in \{1, ..., M_B\}$, based on keys $z_B$

3. DNNL $i \leftrightarrow k$, where $i \in A$ and $k \in B$ [NB. without common keys]
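
The three steps can be sketched in Python as follows. The sketch assumes that each unit in the reference set R carries both kinds of key, that keys are single numbers compared by absolute difference, and that all tied matches are kept; the data and function names are hypothetical.

```python
def nn_matches(key, candidates):
    """Indices of all candidates at minimum absolute distance from key."""
    distances = [abs(key - c) for c in candidates]
    d_min = min(distances)
    return [j for j, d in enumerate(distances) if d == d_min]


def dnnl(x_i, ref, z_b):
    """Potential DNNLs in B for a unit in A with key x_i.

    ref is a list of (x, z) pairs for the units in R (assumed to hold
    both keys); z_b lists the z keys of the units in B.
    """
    # Stage 1: NN-match(es) in R based on the x key.
    first_stage = nn_matches(x_i, [x for x, _ in ref])
    links = set()
    for j in first_stage:
        # Stage 2: NN-match(es) in B based on the z key of unit j in R.
        links.update(nn_matches(ref[j][1], z_b))
    return sorted(links)


# Hypothetical data: A has only x keys, B has only z keys.
x_a = [10, 40]
ref = [(12, 1), (38, 3), (25, 2)]
z_b = [1, 3, 5]

links = {i: dnnl(x, ref, z_b) for i, x in enumerate(x_a)}
print(links)
```

Units in A and B are linked although they share no key, because R bridges the two key sets.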


Linkage noise: random variation due to match ties

Mid-row units are NN-match ties at the 1st stage; 2nd-stage ties are
stacked downwards. The 1st-stage ties do not cause linkage noise

• if the 2nd-stage NN-match is unique (e.g. shadowed), or

• provided all 2nd-stage NN-matches have an identical interest value

Linkage noise for a given unit can be measured exactly by going
through all its potential DNNLs.
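
A minimal sketch of such an exact measurement is given below. It assumes that ties are broken uniformly at random at each stage and uses the variance of the linked interest value as the noise measure; both choices, as well as the data and names, are assumptions made for illustration.

```python
from fractions import Fraction


def nn_matches(key, candidates):
    """Indices of all candidates at minimum absolute distance from key."""
    distances = [abs(key - c) for c in candidates]
    d_min = min(distances)
    return [j for j, d in enumerate(distances) if d == d_min]


def dnnl_distribution(x_i, ref, z_b):
    """Probability of each potential DNNL, assuming uniform tie-breaking."""
    first_stage = nn_matches(x_i, [x for x, _ in ref])
    prob = {}
    for j in first_stage:
        second_stage = nn_matches(ref[j][1], z_b)
        weight = Fraction(1, len(first_stage) * len(second_stage))
        for k in second_stage:
            prob[k] = prob.get(k, Fraction(0)) + weight
    return prob


def noise_variance(prob, y_b):
    """Variance of the linked interest value over all potential DNNLs."""
    mean = sum(p * y_b[k] for k, p in prob.items())
    return sum(p * (y_b[k] - mean) ** 2 for k, p in prob.items())


# Hypothetical reference set (x, z), z keys in B and interest values in B.
ref = [(10, 1), (10, 4), (30, 9)]
z_b = [1, 4, 4]
y_b = [100, 200, 200]

noisy = dnnl_distribution(10, ref, z_b)   # ties at both stages
quiet = dnnl_distribution(30, ref, z_b)   # 2nd-stage tie, equal values
print(noisy, noise_variance(noisy, y_b))
print(quiet, noise_variance(quiet, y_b))
```

The second unit has tied 2nd-stage matches with identical interest values, so its noise variance is zero, in line with the second condition above.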
Implementation in 2011 (Zhang & Hendriks, 2012)

HD set    HuD set: DNNL by 2nd-stage blocking
          (Street) Address    Census Tract    Municipality
85%       15% × 47.6%         15% × 93.5%     15% × 97.8%
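
The products in the table can be evaluated with a short Python sketch. It assumes that 15% is the share of households in the HuD set and that the second factor is the proportion of that set linked when blocking at the given level; this reading is an interpretation and is not spelled out on the slide.

```python
# Share of households in the HuD set (from the table).
hud_share = 0.15
# Proportion of the HuD set linked at each 2nd-stage blocking level
# (assumed interpretation of the second factor in the table).
linked_within = {
    "address": 0.476,
    "census tract": 0.935,
    "municipality": 0.978,
}

# Linked HuD households as a share of all households.
overall = {level: hud_share * rate for level, rate in linked_within.items()}

for level, share in overall.items():
    print(f"{level}: {100 * share:.2f}% of all households")
```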

References

[1] Chen, J. and Shao, J. (2000). Nearest neighbor imputation for survey data.
Journal of Official Statistics, vol. 16, pp. 113-131.
[2] Purcell, N.J. and Kish, L. (1980). Postcensal estimates for local areas (or
domains). International Statistical Review, vol. 48, pp. 3-18.
[3] Zhang, L.-C. (2009). Estimates for small area compositions subjected to
informative missing data. Survey Methodology, vol. 35, pp. 191-201.
[4] Zhang, L.-C. (2011). A unit-error theory for register-based household
statistics. Journal of Official Statistics, vol. 27, pp. 415-432.
[5] Zhang, L.-C. and Chambers, R.L. (2004). Small area estimates for
cross-classifications. Journal of the Royal Statistical Society, Series B,
vol. 66, pp. 479-496.
[6] Zhang, L.-C. and Hendriks, C. (2012). Micro integration of register-based
census data for dwelling and household. UNECE: Work Session on Statistical
Data Editing, 2012.
