I present a simple R package called sampler. The package computes sample sizes and margins of error (MOE) for proportions, which is what is usually needed when designing public opinion surveys. In a previous post, I showed some functions that do mostly the same thing. This new package, though, includes some new features that can be useful when allocating a sample across strata.
Installation
# you have to install devtools first
devtools::install_github("sdaza/sampler")
library(sampler)
Functions
The package contains four functions:
ssize: computes sample size.
serr: computes MOE.
astrata: assigns sample sizes to strata.
serrst: computes MOE for stratified samples.
Define sample size: ssize
ssize(.05)
## [1] 384
# design effect (deff) and response rate (rr)
ssize(.05, deff = 1.2, rr = .90)
## [1] 512
# finite population correction
ssize(.05, deff = 1.2, rr = .90, N = 1000)
## [1] 370
# warning message
ssize(.05, deff = 1.2, rr = .90, N = 100)
## n is bigger than N in some rows: n = N
## [1] 100
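The numbers above are consistent with the textbook sample-size formula for a proportion, n0 = z² p(1 − p)/e², with the finite population correction applied to n0 and the result then inflated by deff/rr. The sketch below is not the package's source code: `ssize_sketch` is a hypothetical name, and p = 0.5 with 95% confidence are assumed defaults. It reproduces the four outputs shown.

```r
# Hypothetical re-implementation of the ssize() logic, for illustration only.
# Assumes 95% confidence and p = 0.5 unless stated otherwise.
ssize_sketch <- function(e, p = 0.5, deff = 1, rr = 1, N = Inf, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)   # about 1.96 for 95% confidence
  n0 <- z^2 * p * (1 - p) / e^2    # simple random sample size
  n0 <- n0 / (1 + (n0 - 1) / N)    # finite population correction (no-op when N = Inf)
  n <- round(n0 * deff / rr)       # inflate for design effect and response rate
  pmin(n, N)                       # never ask for more units than the population has
}

ssize_sketch(.05)                                  # 384
ssize_sketch(.05, deff = 1.2, rr = .90)            # 512
ssize_sketch(.05, deff = 1.2, rr = .90, N = 1000)  # 370
ssize_sketch(.05, deff = 1.2, rr = .90, N = 100)   # 100
```

Under this sketch the order of the adjustments matters: applying the correction after the deff/rr inflation would give about 339 for N = 1000 rather than 370.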
Define sampling error: serr
serr(384)
## [1] 0.05
serr(512, deff = 1.2, rr = .90)
## [1] 0.05
serr(370, deff = 1.2, rr = .90, N = 1000)
## [1] 0.05
# we still get an answer
serr(100, deff = 1.2, rr = .90, N = 100)
## [1] 0.0569
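Going the other way, the MOE is z·sqrt(p(1 − p)/n_eff) with an effective sample size n_eff = n·rr/deff, times a finite population correction. The sketch below (`serr_sketch` is a hypothetical name, not the package's code) uses (N − n_eff)/(N − 1) as the correction because that reproduces the 0.0569 above; the simpler 1 − n_eff/N would give 0.0566, so this is an inference from the output rather than a reading of the source.

```r
# Hypothetical inverse of the sample-size formula: MOE for a given n.
# Illustration only; assumes 95% confidence and p = 0.5 unless stated otherwise.
serr_sketch <- function(n, p = 0.5, deff = 1, rr = 1, N = Inf, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)
  n_eff <- n * rr / deff                                 # effective sample size
  fpc <- ifelse(is.finite(N), (N - n_eff) / (N - 1), 1)  # finite population correction
  round(z * sqrt(p * (1 - p) / n_eff * fpc), 4)
}

serr_sketch(384)                                  # 0.05
serr_sketch(512, deff = 1.2, rr = .90)            # 0.05
serr_sketch(370, deff = 1.2, rr = .90, N = 1000)  # 0.05
serr_sketch(100, deff = 1.2, rr = .90, N = 100)   # 0.0569
```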
Strata allocation: astrata
These examples show how to allocate a total sample size across strata. See ?astrata in R for the definitions of the available allocation procedures.
# I will use data.table
library(data.table)
chile <- data.table(chile)
chile
## reg pob pr
## 1: 1 328782 0.3
## 2: 2 613328 0.4
## 3: 3 308247 0.5
## 4: 4 759228 0.5
## 5: 5 1808300 0.5
## 6: 6 910577 0.6
## 7: 7 1035593 0.3
## 8: 8 2100494 0.1
## 9: 9 983499 0.2
## 10: 10 834714 0.5
## 11: 11 107334 0.5
## 12: 12 163748 0.4
## 13: 13 7228581 0.6
## 14: 14 401548 0.2
## 15: 15 235081 0.3
# proportional for a sample of 1000
chile[, aprop := astrata(1000, wp = 1, N = pob)]
# fixed (same number per stratum)
chile[, afixed := astrata(1000, wp = 0, N = pob)]
# 40% proportional, 60% fixed
chile[, a40 := astrata(1000, wp = .4, N = pob)]
# 60% proportional, 40% fixed
chile[, a60 := astrata(1000, wp = .6, N = pob)]
# square root
chile[, aroot := astrata(1000, method = "root", N = pob)]
# Neyman
chile[, aneyman := astrata(1000, method = "neyman", N = pob, p = pr)]
# standard deviation
chile[, astdev := astrata(1000, method = "stdev", N = pob, p = pr)]
# same MOE (11%) in every stratum; the total sample size follows from e
chile[, aerr := astrata(e = .11, method = "error", N = pob, p = pr)]
chile
## reg pob pr aprop afixed a40 a60 aroot aneyman astdev aerr
## 1: 1 328782 0.3 18 67 47 38 41 18 66 67
## 2: 2 613328 0.4 34 67 54 47 56 37 71 76
## 3: 3 308247 0.5 17 67 47 37 40 19 72 79
## 4: 4 759228 0.5 43 67 57 53 62 46 72 79
## 5: 5 1808300 0.5 101 67 81 87 96 110 72 79
## 6: 6 910577 0.6 51 67 61 57 68 54 71 76
## 7: 7 1035593 0.3 58 67 63 62 73 58 66 67
## 8: 8 2100494 0.1 118 67 87 98 104 77 43 29
## 9: 9 983499 0.2 55 67 62 60 71 48 58 51
## 10: 10 834714 0.5 47 67 59 55 65 51 72 79
## 11: 11 107334 0.5 6 67 43 30 23 7 72 79
## 12: 12 163748 0.4 9 67 44 32 29 10 71 76
## 13: 13 7228581 0.6 406 67 203 270 192 432 71 76
## 14: 14 401548 0.2 23 67 49 41 45 20 58 51
## 15: 15 235081 0.3 13 67 45 35 35 13 66 67
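The allocation rules can be written out in a few lines of R. The sketch below is not the package's source: the helper names (`alloc_mix`, `alloc_root`, `alloc_neyman`, `alloc_stdev`, `alloc_error`) are hypothetical, and the formulas are the standard ones, with n_h proportional to N_h, √N_h, N_h·S_h or S_h, where S_h = √(p_h(1 − p_h)). One detail is inferred from the output: mixing the already rounded proportional and fixed sizes reproduces the a40 and a60 columns (for region 13, 0.4 × 406 + 0.6 × 67 ≈ 203), whereas mixing the unrounded shares would give 202.

```r
# Population sizes and expected proportions copied from the chile table above.
pob <- c(328782, 613328, 308247, 759228, 1808300, 910577, 1035593, 2100494,
         983499, 834714, 107334, 163748, 7228581, 401548, 235081)
pr  <- c(0.3, 0.4, 0.5, 0.5, 0.5, 0.6, 0.3, 0.1, 0.2, 0.5, 0.5, 0.4, 0.6, 0.2, 0.3)

# Hypothetical helpers, for illustration only (not the package's code).
alloc_mix <- function(n, N, wp) {
  prop  <- round(n * N / sum(N))       # proportional to stratum size
  fixed <- round(n / length(N))        # same size in every stratum
  round(wp * prop + (1 - wp) * fixed)  # weighted mix of the two rounded allocations
}
alloc_root <- function(n, N) {
  round(n * sqrt(N) / sum(sqrt(N)))    # proportional to the square root of stratum size
}
alloc_neyman <- function(n, N, p) {
  w <- N * sqrt(p * (1 - p))           # stratum size times stratum SD
  round(n * w / sum(w))
}
alloc_stdev <- function(n, p) {
  s <- sqrt(p * (1 - p))               # stratum SD only, ignores stratum size
  round(n * s / sum(s))
}
alloc_error <- function(e, N, p, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)
  n0 <- z^2 * p * (1 - p) / e^2        # n giving MOE e within each stratum
  round(n0 / (1 + (n0 - 1) / N))       # finite population correction
}

alloc_mix(1000, pob, wp = 0.4)  # same as the a40 column
alloc_neyman(1000, pob, pr)     # same as the aneyman column
alloc_error(.11, pob, pr)       # same as the aerr column
```

Note that only the error method takes a target MOE instead of a total n, which is why its column does not add up to 1000.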
Getting sampling error from a stratified sample: serrst
# the second most efficient allocation
serrst(n = chile$aprop, N = chile$pob, p = chile$pr)
## [1] 0.0288
# the worst solution
serrst(n = chile$afixed, N = chile$pob, p = chile$pr)
## [1] 0.0518
serrst(n = chile$a40, N = chile$pob, p = chile$pr)
## [1] 0.0339
serrst(n = chile$a60, N = chile$pob, p = chile$pr)
## [1] 0.0311
serrst(n = chile$aroot, N = chile$pob, p = chile$pr)
## [1] 0.0339
# the most efficient allocation
serrst(n = chile$aneyman, N = chile$pob, p = chile$pr)
## [1] 0.0285
serrst(n = chile$astdev, N = chile$pob, p = chile$pr)
## [1] 0.0508
serrst(n = chile$aerr, N = chile$pob, p = chile$pr)
## [1] 0.0498
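For a stratified sample, the MOE of the overall proportion combines the stratum variances with weights W_h = N_h/N. The sketch below uses the standard stratified-variance formula, Var = Σ W_h² (1 − n_h/N_h) p_h(1 − p_h)/n_h; `serrst_sketch` is a hypothetical name and this is an assumption about what serrst computes, checked only against the printed results. With the proportional and fixed allocations from the table it returns 0.0288 and 0.0518, matching the outputs above.

```r
# Hypothetical stratified MOE (illustration only, not the package's code).
serrst_sketch <- function(n, N, p, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)
  W <- N / sum(N)                                # stratum weights
  v <- sum(W^2 * (1 - n / N) * p * (1 - p) / n)  # variance of the stratified proportion
  round(z * sqrt(v), 4)
}

# Data and allocations copied from the chile table above.
pob <- c(328782, 613328, 308247, 759228, 1808300, 910577, 1035593, 2100494,
         983499, 834714, 107334, 163748, 7228581, 401548, 235081)
pr  <- c(0.3, 0.4, 0.5, 0.5, 0.5, 0.6, 0.3, 0.1, 0.2, 0.5, 0.5, 0.4, 0.6, 0.2, 0.3)
aprop  <- c(18, 34, 17, 43, 101, 51, 58, 118, 55, 47, 6, 9, 406, 23, 13)
afixed <- rep(67, 15)

serrst_sketch(aprop, pob, pr)   # 0.0288
serrst_sketch(afixed, pob, pr)  # 0.0518
```

The squared weights explain the ranking: region 13 holds about 40% of the population, so giving it only 67 cases (fixed allocation) inflates the overall error far more than shrinking the small regions does.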
Combining criteria
# get the error of the 60% proportional / 40% fixed allocation in each stratum
chile[, error_a60 := serr(a60, p = pr)]
# assign sample sizes so that each stratum has a 13% error
chile[, serr13 := astrata(e = .13, method = "error", N = pob, p = pr)]
# total error, not that good!
serrst(n = chile$serr13, N = chile$pob, p = chile$pr)
## [1] 0.0586
chile[, .(reg, pob, pr, a60, error_a60, serr13)]
## reg pob pr a60 error_a60 serr13
## 1: 1 328782 0.3 38 0.1457 48
## 2: 2 613328 0.4 47 0.1401 55
## 3: 3 308247 0.5 37 0.1611 57
## 4: 4 759228 0.5 53 0.1346 57
## 5: 5 1808300 0.5 87 0.1051 57
## 6: 6 910577 0.6 57 0.1272 55
## 7: 7 1035593 0.3 62 0.1141 48
## 8: 8 2100494 0.1 98 0.0594 20
## 9: 9 983499 0.2 60 0.1012 36
## 10: 10 834714 0.5 55 0.1321 57
## 11: 11 107334 0.5 30 0.1789 57
## 12: 12 163748 0.4 32 0.1697 55
## 13: 13 7228581 0.6 270 0.0584 55
## 14: 14 401548 0.2 41 0.1224 36
## 15: 15 235081 0.3 35 0.1518 48
We can adjust a bit more:
# when error is higher than .13, use serr13
chile[, sfinal := ifelse(error_a60 > .13, serr13, a60)]
# new error by stratum
chile[, error_sfinal := serr(sfinal, p = pr)]
# total error, much better!
serrst(n = chile$sfinal, N = chile$pob, p = chile$pr)
## [1] 0.0309
# although the total sample size is now bigger
sum(chile$sfinal)
## [1] 1109
chile[, .(reg, pob, pr, sfinal, error_sfinal)]
## reg pob pr sfinal error_sfinal
## 1: 1 328782 0.3 48 0.1296
## 2: 2 613328 0.4 55 0.1295
## 3: 3 308247 0.5 57 0.1298
## 4: 4 759228 0.5 57 0.1298
## 5: 5 1808300 0.5 87 0.1051
## 6: 6 910577 0.6 57 0.1272
## 7: 7 1035593 0.3 62 0.1141
## 8: 8 2100494 0.1 98 0.0594
## 9: 9 983499 0.2 60 0.1012
## 10: 10 834714 0.5 57 0.1298
## 11: 11 107334 0.5 57 0.1298
## 12: 12 163748 0.4 55 0.1295
## 13: 13 7228581 0.6 270 0.0584
## 14: 14 401548 0.2 41 0.1224
## 15: 15 235081 0.3 48 0.1296
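The adjustment above is a simple rule: keep the mixed allocation where its stratum MOE is acceptable, and fall back to the equal-error allocation otherwise. Wrapping it in a function with the threshold as an argument makes it easy to try other cut-offs and compare the resulting total n and serrst error. The name `combine_alloc` and its arguments are hypothetical; the input vectors are the a60, error_a60 and serr13 columns printed above.

```r
# Hypothetical helper: use the fallback size wherever the base allocation's
# stratum MOE exceeds the threshold.
combine_alloc <- function(base, base_err, fallback, threshold) {
  ifelse(base_err > threshold, fallback, base)
}

# Columns copied from the a60 / error_a60 / serr13 table above.
a60       <- c(38, 47, 37, 53, 87, 57, 62, 98, 60, 55, 30, 32, 270, 41, 35)
error_a60 <- c(0.1457, 0.1401, 0.1611, 0.1346, 0.1051, 0.1272, 0.1141, 0.0594,
               0.1012, 0.1321, 0.1789, 0.1697, 0.0584, 0.1224, 0.1518)
serr13    <- c(48, 55, 57, 57, 57, 55, 48, 20, 36, 57, 57, 55, 55, 36, 48)

sfinal <- combine_alloc(a60, error_a60, serr13, threshold = .13)
sfinal       # matches the sfinal column above
sum(sfinal)  # 1109
```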
That’s it. A simple package to do simple calculations.