---
title: "README"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{README}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
warning=FALSE,
message=FALSE
)
```
### Acknowledgement
We would like to express our gratitude to an anonymous reviewer whose valuable suggestions on applying a brute force Monte Carlo method in the implementation significantly improved the computational efficiency in certain settings.
### Download
```{r setup}
library(RDSsamplesize)
library(microbenchmark)
library(ggplot2)
library(dplyr)
library(latex2exp)
```
### Example
* Design A
* s = 10: 10 seeds to initiate the sampling process
* c = 3: 3 coupons issued to each recruiter
* maxWave = 9: 9-wave planned field period, which does not include the initial round of recruiting seeds
* rr = 0.4: 40% recruitment rate constant across the entire field period. The recruitment rate represents the average coupon use, i.e., the ratio of the number of successful recruits brought into the study by their recruiters to the total number of coupons issued to those recruiters
* bruteMC: If TRUE then use a brute force Monte Carlo approach to obtain empirical data and estimate sample size distribution; If FALSE then compute the theoretical results of sample size distribution using an approximation algorithm.
* tol = 0.025: Accuracy loss limit control, which is set up for the approximation algorithm when bruteMC=FALSE, with default of 0.025. This parameter determines the acceptable level of accuracy loss in the approximate computation of the sample size distribution
```{r}
A1 <-calSize(s=10,c=3,maxWave=9,rr=0.4,bruteMC=FALSE,tol=0.025)
A2 <-calSize(s=10,c=3,maxWave=9,rr=0.4,bruteMC=TRUE)
```
* Compare the execution time of two approaches: compute the theoretical results (A1) vs use a brute force Monte Carlo approach to obtain empirical data (A2). Usually, a brute force Monte Carlo approach is more time-efficient when dealing with high recruitment rates and long field periods.
```{r}
microbenchmark(A1 =calSize(s=10,c=3,maxWave=9,rr=0.4,bruteMC=FALSE,tol=0.025),
A2 =calSize(s=10,c=3,maxWave=9,rr=0.4,bruteMC=TRUE),
times=10)
```
* Design B
* s = 10: 10 seeds
* c = 3: 3 coupons issued to each recruiter
* maxWave = 3: 3-wave planned field period
* rr = c(0.3,0.2,0.1): 30%, 20%, 10% recruitment rate at 1st, 2nd, and 3rd recruitment wave, respectively
```{r}
B <- calSize(s=10,c=3,maxWave=3,c(0.3,0.2,0.1),bruteMC=FALSE,tol=0.025)
```
### Probability of sampling at least $n$ individuals (including seeds) by each wave
* Design A (with theoretical results)
```{r}
round(nprobw(A1,n=210),3) # recruit 200 more individuals besides 10 seeds
```
* Design A (with empirical results)
```{r}
round(nprobw(A2,n=210),3) # recruit 200 more individuals besides 10 seeds
```
* Design B
```{r}
round(nprobw(B,n=15),3) # recruit 5 more individuals besides 10 seeds
round(nprobw(B,n=50),3) # recruit 40 more individuals besides 10 seeds
```
### Visualization
* Extinction probability
* describe the likelihood of the recruitment process dying out naturally (not recruiting any new participants) over time
* $Pr(\tau = w)$ represents the probability of the recruitment process stopping at the $w$th wave
* Survival probability
* $Pr(\tau > w)$: the complementary cumulative distribution function of the extinction time $\tau$, where $w$ is the wave time. That is, the recruitment process will likely continue recruiting new participants at wave $w$ with probability $Pr(\tau > w)$
```{r,fig.width=5, fig.height=3, fig.align='center'}
plot_survival_prob<-function(P_tau_list){
P_tau<-data.frame(Wave=1:length(P_tau_list),PMF=P_tau_list)
P_tau<-P_tau%>%mutate(CDF=cumsum(PMF),CCDF=1-CDF)
print(ggplot(P_tau,aes(x=Wave,y=CCDF))+
geom_point()+
geom_line(linetype=2)+
scale_y_continuous(breaks=seq(0,1,by=.1),limits = c(min(0.9,min(P_tau$CCDF)),1))+
xlab('Wave, w')+
ylab('')+
scale_x_continuous(breaks =1:length(P_tau_list),minor_breaks = NULL)+
theme_minimal()+
theme(panel.grid.minor.y = element_blank(),
legend.title = element_blank(),
plot.title = element_text(hjust = 0.5))+
ggtitle(TeX('$Pr(\\tau > w)$')))
}
### design A (with theoretical results)
plot_survival_prob(P_tau_list=A1[[1]])
### design A (with empirical results)
plot_survival_prob(P_tau_list=A2[[1]])
### design B
plot_survival_prob(P_tau_list=B[[1]])
```
* Distribution the sample size by each wave
```{r,fig.width=7, fig.height=5, fig.align='center'}
plot_sample_size<-function(x){
info<-data.frame(Wave=as.character(),Zk=as.integer(),PMF=as.double(),CCDF=as.double())
for (i in 2:length(x)){
info<-rbind(info,cbind(Wave=i-1,x[[i]]))
}
colnames(info)<-c('Wave','Size','PMF','CCDF')
info$Wave<-paste("Wave",info$Wave)
print(ggplot(info,aes(x=Size,y=CCDF))+
facet_wrap(~Wave,scales = "free")+
geom_point(size=0.5)+
xlab('n')+
ylab('')+
theme_minimal()+
theme(panel.grid.minor.y = element_blank(),
legend.title = element_blank(),
plot.title = element_text(hjust = 0.5))+
ggtitle(TeX('Pr( Accmulated sample size (including seeds) $\\geq n | k$-th wave)')))
}
### Design A (with theoretical results)
plot_sample_size(A1)
### Design A (with empirical results)
plot_sample_size(A2)
### Design B
plot_sample_size(B)
```