Title: | Convenience Functions, Moving Window Statistics, and Graphics |
---|---|
Description: | Collection of functions for running and summarizing statistical simulation studies, creating visualizations (e.g. CART Shiny app, histograms with fitted probability mass/density functions), calculating moving-window statistics efficiently, and performing common computations. |
Authors: | Dane R. Van Domelen |
Maintainer: | Dane R. Van Domelen <[email protected]> |
License: | GPL-3 |
Version: | 1.1.5 |
Built: | 2024-11-04 04:50:44 UTC |
Source: | https://github.com/vandomed/dvmisc |
Converts a continuous BMI variable into a 3-level factor variable: Normal
weight if [-Inf, 25)
, Overweight if [25, 30)
, and Obese if
[30, Inf)
.
bmi3(x, labels = TRUE)
bmi3(x, labels = TRUE)
x |
Numeric vector of BMI values. |
labels |
If |
Factor variable with 3 levels.
Converts a continuous BMI variable into a 4-level factor variable:
Underweight if [-Inf, 18.5)
, Normal weight if [18.5, 25)
,
Overweight if [25, 30)
, and Obese if [30, Inf)
.
bmi4(x, labels = TRUE)
bmi4(x, labels = TRUE)
x |
Numeric vector of BMI values. |
labels |
If |
Factor variable with 4 levels.
User selects response variable and predictors, app displays optimal tree in
terms of cross-validation error along with a summary table and importance
plot. Relies on rpart
for fitting and
fancyRpartPlot
for plotting.
cart_app(data)
cart_app(data)
data |
Data frame. |
Shiny app.
Formats a glm
object for printing to console or
inputting to kable
.
clean_glm( fit, columns = NULL, expand_factors = TRUE, variable_labels = NULL, prep_kable = FALSE, decimals = 2, formatp_list = NULL )
clean_glm( fit, columns = NULL, expand_factors = TRUE, variable_labels = NULL, prep_kable = FALSE, decimals = 2, formatp_list = NULL )
fit |
Object returned from |
columns |
Character vector specifying what columns to include. Choices
for each element are |
expand_factors |
Logical value for whether to include two blank rows for factor variables (name of variable and reference group). |
variable_labels |
Character vector in case you want labels other than the variable names. |
prep_kable |
Logical value for whether to prepare for
printing via |
decimals |
Numeric value of vector specifying number of decimal places for each column. |
formatp_list |
Arguments to pass to |
Data frame.
fit <- glm(mpg ~ wt + as.factor(cyl) + hp, data = mtcars) clean_glm(fit) fit %>% clean_glm(prep_kable = TRUE) %>% knitr::kable()
fit <- glm(mpg ~ wt + as.factor(cyl) + hp, data = mtcars) clean_glm(fit) fit %>% clean_glm(prep_kable = TRUE) %>% knitr::kable()
So you can stop guess-and-checking with cut
.
cleancut(x, breaks, labels = NULL)
cleancut(x, breaks, labels = NULL)
x |
Numeric vector. |
breaks |
Character string, e.g. |
labels |
Character vector. |
Factor or integer vector.
x <- rnorm(100) y <- cleancut(x, "(-Inf, -1), [-1, 1], (1, Inf)") tapply(x, y, range) y <- cleancut(x, "(-Inf, -1), [-1, 1], (1, Inf)", c("<-1", "-1 to 1", ">1")) tapply(x, y, range)
x <- rnorm(100) y <- cleancut(x, "(-Inf, -1), [-1, 1], (1, Inf)") tapply(x, y, range) y <- cleancut(x, "(-Inf, -1), [-1, 1], (1, Inf)", c("<-1", "-1 to 1", ">1")) tapply(x, y, range)
Combines quantile
and cut
into a
single function, with strata-specific quantiles possible. For example, you
could create sex-specific height tertiles with
create_qgroups(height, groups = 3, strata = sex)
. Compatible with
dplyr functions like mutate
and
transmute
.
create_qgroups( x, groups = 4, probs = seq(1/groups, 1 - 1/groups, 1/groups), strata = NULL, quantile_list = list(na.rm = TRUE), cut_list = list(include.lowest = TRUE) )
create_qgroups( x, groups = 4, probs = seq(1/groups, 1 - 1/groups, 1/groups), strata = NULL, quantile_list = list(na.rm = TRUE), cut_list = list(include.lowest = TRUE) )
x |
Numeric vector. |
groups |
Numeric value, e.g. 3 for tertiles, 4 for quartiles, etc. |
probs |
Numeric vector. |
strata |
Factor specifying subgroups to calculate quantiles within. For
multivariable subgroups, you can use |
quantile_list |
Arguments to pass to |
cut_list |
Arguments to pass to |
Factor variable.
# In mtcars dataset, create tertiles for mpg mtcars$mpg_tertiles <- create_qgroups(mtcars$mpg, groups = 3) table(mtcars$mpg_tertiles) # Define tertile cutpoints separately for 4-, 6-, and 8-cylinder vehicles mtcars$mpg_tertiles <- create_qgroups(mtcars$mpg, groups = 3, strata = mtcars$cyl) table(mtcars$mpg_tertiles) # Works with dplyr functions like mutate mtcars <- mtcars %>% dplyr::mutate(mpg_tertiles = create_qgroups(mpg, groups = 3, strata = cyl)) table(mtcars$mpg_tertiles) # Can embed in lm, glm, etc. summary(lm(mpg ~ create_qgroups(wt), data = mtcars))
# In mtcars dataset, create tertiles for mpg mtcars$mpg_tertiles <- create_qgroups(mtcars$mpg, groups = 3) table(mtcars$mpg_tertiles) # Define tertile cutpoints separately for 4-, 6-, and 8-cylinder vehicles mtcars$mpg_tertiles <- create_qgroups(mtcars$mpg, groups = 3, strata = mtcars$cyl) table(mtcars$mpg_tertiles) # Works with dplyr functions like mutate mtcars <- mtcars %>% dplyr::mutate(mpg_tertiles = create_qgroups(mpg, groups = 3, strata = cyl)) table(mtcars$mpg_tertiles) # Can embed in lm, glm, etc. summary(lm(mpg ~ create_qgroups(wt), data = mtcars))
Complex survey version of create_qgroups
. Relies heavily on the
survey package [1,2].
create_qgroups_svy( x, groups = 4, probs = seq(1/groups, 1 - 1/groups, 1/groups), strata = NULL, design, svyquantile_list = list(na.rm = TRUE), cut_list = list(include.lowest = TRUE) )
create_qgroups_svy( x, groups = 4, probs = seq(1/groups, 1 - 1/groups, 1/groups), strata = NULL, design, svyquantile_list = list(na.rm = TRUE), cut_list = list(include.lowest = TRUE) )
x |
Numeric vector. |
groups |
Numeric value, e.g. 3 for tertiles, 4 for quartiles, etc. |
probs |
Numeric vector. |
strata |
Factor specifying subgroups to calculate quantiles within. For
multivariable subgroups, you can use |
design |
Survey design object. |
svyquantile_list |
Arguments to pass to
|
cut_list |
Arguments to pass to |
Factor variable.
1. Therneau, T. (2015). A Package for Survival Analysis in S. R package version 2.38. https://cran.r-project.org/package=survival.
2. Therneau, T.M. and Grambsch, P.M. (2000). Modeling Survival Data: Extending the Cox Model. Springer, New York. ISBN 0-387-98784-3.
Convenience function to get decreasing factor levels from cut
.
Currently requires specifying breaks
as vector of cutpoints rather
than number of desired intervals.
cut_decreasing(x, breaks, include.lowest = FALSE, right = TRUE, ...)
cut_decreasing(x, breaks, include.lowest = FALSE, right = TRUE, ...)
x , breaks , include.lowest , right
|
See |
... |
Arguments to pass to |
Factor variable.
# In mtcars dataset, create 3 mpg groups table(cut(mtcars$mpg, breaks = c(-Inf, 15, 20, Inf))) # Repeat with cut_decreasing to get factor levels ordered from high to low. # To match cut here, need to specify right = FALSE table(cut_decreasing(mtcars$mpg, breaks = c(Inf, 20, 15, -Inf), right = FALSE)) # You can specify breaks from low to high, but then include.lowest and right # arguments get confusing table(cut_decreasing(mtcars$mpg, breaks = c(-Inf, 15, 20, Inf), right = TRUE))
# In mtcars dataset, create 3 mpg groups table(cut(mtcars$mpg, breaks = c(-Inf, 15, 20, Inf))) # Repeat with cut_decreasing to get factor levels ordered from high to low. # To match cut here, need to specify right = FALSE table(cut_decreasing(mtcars$mpg, breaks = c(Inf, 20, 15, -Inf), right = FALSE)) # You can specify breaks from low to high, but then include.lowest and right # arguments get confusing table(cut_decreasing(mtcars$mpg, breaks = c(-Inf, 15, 20, Inf), right = TRUE))
Creates plot showing user-specified points (e.g. means, medians, regression coefficients) along with user-specified error bars (e.g. standard deviations, min/max, 95% confidence intervals).
dots_bars( y = NULL, bars = NULL, bars.lower = y - bars, bars.upper = y + bars, truth = NULL, group.labels = NULL, group.dividers = TRUE, subgroup.spacing = 1, subgroup.labels = NULL, subgroup.pch = NULL, subgroup.col = NULL, points.list = NULL, arrows.list = NULL, xaxis.list = NULL, yaxis.list = xaxis.list, abline.dividers.list = NULL, abline.truth.list = NULL, legend.list = NULL, ... )
dots_bars( y = NULL, bars = NULL, bars.lower = y - bars, bars.upper = y + bars, truth = NULL, group.labels = NULL, group.dividers = TRUE, subgroup.spacing = 1, subgroup.labels = NULL, subgroup.pch = NULL, subgroup.col = NULL, points.list = NULL, arrows.list = NULL, xaxis.list = NULL, yaxis.list = xaxis.list, abline.dividers.list = NULL, abline.truth.list = NULL, legend.list = NULL, ... )
y |
Numeric vector of y-values for different groups, or numeric matrix where each column contains y-values for clustered subgroups within a group. |
bars |
Numeric vector or matrix (matching whichever type |
bars.lower |
Numeric vector or matrix (matching whichever type |
bars.upper |
Numeric vector or matrix (matching whichever type |
truth |
Numeric value specifying true value of parameter being estimated. If specified, a horizontal reference line is added to the plot. |
group.labels |
Character vector of labels for the groups. |
group.dividers |
Logical value for whether to add vertical lines distinguishing the groups. |
subgroup.spacing |
Numeric value controlling the amount of spacing between subgroups, with values > 1 corresponding to more spacing. |
subgroup.labels |
Character vector giving labels for the subgroups. |
subgroup.pch |
Plotting symbol for different subgroups within each group. |
subgroup.col |
Plotting color for different subgroups within each group. |
points.list |
Optional list of inputs to pass to
|
arrows.list |
Optional list of inputs to pass to
|
xaxis.list |
Optional list of inputs to pass to
|
yaxis.list |
Optional list of inputs to pass to
|
abline.dividers.list |
Optional list of inputs to pass to
|
abline.truth.list |
Optional list of inputs to pass to
|
legend.list |
Optional list of inputs to pass to
|
... |
Additional arguments to pass to |
Plot showing points +/- error bars across groups/subgroups.
# Generate 100 values from normal distributions with different means, and # graph mean +/- standard deviation across groups dat <- cbind(rnorm(100, 2), rnorm(100, 2.5), rnorm(100, 1.75)) means <- apply(dat, 2, mean) sds <- apply(dat, 2, sd) fig1 <- dots_bars(y = means, bars = sds, main = "Mean +/- SD by Group", ylab = "Mean +/- SD") # Simulate BMI values for males and females in 3 different age groups, and # graph mean +/- 95% CI sex <- as.factor(c(rep("Male", 300), rep("Female", 300))) age <- as.factor(rep(c("Young", "Middle", "Old"), 2)) bmi <- c(rnorm(100, 25, 4), rnorm(100, 26, 4.25), rnorm(100, 27, 4.5), rnorm(100, 26.5, 4.5), rnorm(100, 27.25, 4.75), rnorm(100, 28, 5)) dat <- data.frame(sex = sex, age = age, bmi = bmi) means <- tapply(dat$bmi, dat[, c("sex", "age")], mean) ci.lower <- tapply(dat$bmi, dat[, c("sex", "age")], function(x) t.test(x)$conf.int[1]) ci.upper <- tapply(dat$bmi, dat[, c("sex", "age")], function(x) t.test(x)$conf.int[2]) fig2 <- dots_bars(y = means, bars.lower = ci.lower, bars.upper = ci.upper, main = "BMI by Sex and Age", ylab = "BMI (mean +/- CI)", xlab = "Age group")
# Generate 100 values from normal distributions with different means, and # graph mean +/- standard deviation across groups dat <- cbind(rnorm(100, 2), rnorm(100, 2.5), rnorm(100, 1.75)) means <- apply(dat, 2, mean) sds <- apply(dat, 2, sd) fig1 <- dots_bars(y = means, bars = sds, main = "Mean +/- SD by Group", ylab = "Mean +/- SD") # Simulate BMI values for males and females in 3 different age groups, and # graph mean +/- 95% CI sex <- as.factor(c(rep("Male", 300), rep("Female", 300))) age <- as.factor(rep(c("Young", "Middle", "Old"), 2)) bmi <- c(rnorm(100, 25, 4), rnorm(100, 26, 4.25), rnorm(100, 27, 4.5), rnorm(100, 26.5, 4.5), rnorm(100, 27.25, 4.75), rnorm(100, 28, 5)) dat <- data.frame(sex = sex, age = age, bmi = bmi) means <- tapply(dat$bmi, dat[, c("sex", "age")], mean) ci.lower <- tapply(dat$bmi, dat[, c("sex", "age")], function(x) t.test(x)$conf.int[1]) ci.upper <- tapply(dat$bmi, dat[, c("sex", "age")], function(x) t.test(x)$conf.int[2]) fig2 <- dots_bars(y = means, bars.lower = ci.lower, bars.upper = ci.upper, main = "BMI by Sex and Age", ylab = "BMI (mean +/- CI)", xlab = "Age group")
Collection of functions for running and summarizing statistical simulation studies, creating visualizations (e.g. CART Shiny app, histograms with fitted probability mass/density functions), calculating moving-window statistics efficiently, and performing common computations.
Package: | dvmisc |
Type: | Package |
Version: | 1.1.5 |
Date: | 2020-09-27 |
License: | GPL-3 |
See CRAN documentation for full list of functions.
Dane R. Van Domelen
[email protected]
Eddelbuettel, D. and Francois, R. (2011) Rcpp: Seamless R and C++ Integration. Journal of Statistical Software, 40(8), 1-18. http://www.jstatsoft.org/v40/i08/.
Eddelbuettel, D. (2013) Seamless R and C++ Integration with Rcpp. Springer, New York. ISBN 978-1-4614-6867-7.
Eddelbuettel, D. and Balamuta, J.J. (2017). Extending R with C++: A Brief Introduction to Rcpp. PeerJ Preprints 5:e3188v1. https://doi.org/10.7287/peerj.preprints.3188v1.
Acknowledgment: This material is based upon work supported by the National Science Foundation Graduate Research Fellowship under Grant No. DGE-0940903.
Loops over the last argument, then the second-last, and so on. It should be
faster than expand.grid
.
expand_grid(..., together = NULL)
expand_grid(..., together = NULL)
... |
Vectors you want all combinations of. |
together |
Data frame of vectors, where each row is a set of parameter values that are always kept together. |
Data table.
# Simple example of expand.grid vs. expand_grid expand.grid(x = c("a", "b", "c"), y = c(1, 2), z = c(TRUE, FALSE)) expand_grid(x = c("a", "b", "c"), y = c(1, 2), z = c(TRUE, FALSE)) # How to keep certain variables together expand_grid(x = 1: 5, together = data.frame(y = c("1a", "2a"), z = c("1b", "2b")))
# Simple example of expand.grid vs. expand_grid expand.grid(x = c("a", "b", "c"), y = c(1, 2), z = c(TRUE, FALSE)) expand_grid(x = c("a", "b", "c"), y = c(1, 2), z = c(TRUE, FALSE)) # How to keep certain variables together expand_grid(x = 1: 5, together = data.frame(y = c("1a", "2a"), z = c("1b", "2b")))
Uses maximum likelihood to fit
Y|X ~ Gamma(exp(beta_0 + beta_x^T X), b), with the
shape-scale (as opposed to shape-rate) parameterization described in
GammaDist
. Y can be precisely measured or subject to
multiplicative mean-1 lognormal errors, in which case replicates can be
incorporated by specifying y
as a list.
gammareg( y, x = NULL, merror = FALSE, integrate_tol = 1e-08, integrate_tol_hessian = integrate_tol, estimate_var = TRUE, fix_posdef = FALSE, ... )
gammareg( y, x = NULL, merror = FALSE, integrate_tol = 1e-08, integrate_tol_hessian = integrate_tol, estimate_var = TRUE, fix_posdef = FALSE, ... )
y |
Numeric vector. |
x |
Numeric vector or matrix. If |
merror |
Logical value for whether to model multiplicative lognormal measurement errors in Y. |
integrate_tol |
Numeric value specifying the |
integrate_tol_hessian |
Same as |
estimate_var |
Logical value for whether to return Hessian-based variance-covariance matrix. |
fix_posdef |
Logical value for whether to repeatedly reduce
|
... |
Additional arguments to pass to |
List containing:
Numeric vector of parameter estimates.
Variance-covariance matrix (if estimate_var = TRUE
).
Returned nlminb
object from maximizing the
log-likelihood function.
Akaike information criterion (AIC).
The MSE, defined as the sum of the squared residuals divided by n-p
(n
= number of observations, p
= number of regression
coefficients), is an unbiased estimator for the error variance in a linear
regression model. This is a convenience function that extracts the MSE from
a fitted lm
or glm
object. The code is
rev(anova(model.fit)$"Mean Sq")[1]
if model.fit
is a
lm
object and
sum(model.fit$residuals^2) / model.fit$df.residual
if model.fit
is a glm
object.
get_mse(model.fit, var.estimate = FALSE)
get_mse(model.fit, var.estimate = FALSE)
model.fit |
|
var.estimate |
If |
If var.estimate = FALSE
, numeric value indicating the MSE; if
var.estimate = TRUE
, named numeric vector indicating both the MSE and
a variance estimate for the error variance.
# Generate 100 values: Y = 0.5 + 1.25 X + e, e ~ N(0, 1) set.seed(123) x <- rnorm(100) y <- 0.5 + 1.25 * x + rnorm(100, sd = 1) # Fit regression model using lm and using glm lm.fit <- lm(y ~ x) glm.fit <- glm(y ~ x) # Extract MSE from lm.fit and glm.fit get_mse(lm.fit) get_mse(glm.fit)
# Generate 100 values: Y = 0.5 + 1.25 X + e, e ~ N(0, 1) set.seed(123) x <- rnorm(100) y <- 0.5 + 1.25 * x + rnorm(100, sd = 1) # Fit regression model using lm and using glm lm.fit <- lm(y ~ x) glm.fit <- glm(y ~ x) # Extract MSE from lm.fit and glm.fit get_mse(lm.fit) get_mse(glm.fit)
Optionally resizes title and axis labels and saves a 1000 dpi png 5 inches
tall and ~9.5 inches wide via ggsave
. Values are
adjustable.
gg_facebook( filename = paste("fig1-", Sys.Date(), ".png", sep = ""), path = getwd(), plot = last_plot(), resize_labels = TRUE, title_size = 16, axistitle_size = 13, axistext_size = 9, height = 5, width = height/0.525, dpi = 1000 )
gg_facebook( filename = paste("fig1-", Sys.Date(), ".png", sep = ""), path = getwd(), plot = last_plot(), resize_labels = TRUE, title_size = 16, axistitle_size = 13, axistext_size = 9, height = 5, width = height/0.525, dpi = 1000 )
filename |
Character string |
path |
Character string |
plot |
|
resize_labels |
Logical value |
title_size |
Numeric value |
axistitle_size |
Numeric value |
axistext_size |
Numeric value |
height |
Numeric value |
width |
Numeric value |
dpi |
Numeric value |
Optionally resizes title and axis labels and saves a 1000 dpi png 6 inches
tall and 8 inches wide via ggsave
. Values are
adjustable.
gg_quicksave( filename = paste("fig1-", Sys.Date(), ".png", sep = ""), path = getwd(), plot = last_plot(), resize_labels = TRUE, title_size = 16, axistitle_size = 13, axistext_size = 9, height = 6, width = height/0.75, dpi = 1000 )
gg_quicksave( filename = paste("fig1-", Sys.Date(), ".png", sep = ""), path = getwd(), plot = last_plot(), resize_labels = TRUE, title_size = 16, axistitle_size = 13, axistext_size = 9, height = 6, width = height/0.75, dpi = 1000 )
filename |
Character string |
path |
Character string |
plot |
|
resize_labels |
Logical value |
title_size |
Numeric value |
axistitle_size |
Numeric value |
axistext_size |
Numeric value |
height |
Numeric value |
width |
Numeric value |
dpi |
Numeric value |
Optionally resizes title and axis labels and saves a 600 dpi png 5 inches
tall and ~8.9 inches wide via ggsave
. Values are
adjustable.
gg_twitter( filename = paste("fig1-", Sys.Date(), ".png", sep = ""), path = getwd(), plot = last_plot(), resize_labels = TRUE, title_size = 16, axistitle_size = 13, axistext_size = 9, height = 5, width = height/0.5625, dpi = 600 )
gg_twitter( filename = paste("fig1-", Sys.Date(), ".png", sep = ""), path = getwd(), plot = last_plot(), resize_labels = TRUE, title_size = 16, axistitle_size = 13, axistext_size = 9, height = 5, width = height/0.5625, dpi = 600 )
filename |
Character string |
path |
Character string |
plot |
|
resize_labels |
Logical value |
title_size |
Numeric value |
axistitle_size |
Numeric value |
axistext_size |
Numeric value |
height |
Numeric value |
width |
Numeric value |
dpi |
Numeric value |
Simply head
and tail
combined.
headtail(x, ...)
headtail(x, ...)
x |
Input object. |
... |
Same class as x
.
# Generate data from N(0, 1), sort, and look at smallest and largest 3 values x <- rnorm(1000) x.sorted <- sort(x) headtail(x.sorted, 3)
# Generate data from N(0, 1), sort, and look at smallest and largest 3 values x <- rnorm(1000) x.sorted <- sort(x) headtail(x.sorted, 3)
Similar to base R function hist
, but with two added
features: (1) Can overlay one or more fitted probability density/mass
functions (PDFs/PMFs) for any univariate distribution supported in R (see
Distributions
); and (2) Can generate more of a barplot
type histogram, where each possible value gets its own bin centered over its
value (useful for discrete variables with not too many possible values).
histo( x, dis = "none", dis_shift = NULL, integer_breaks = NULL, colors = rep("black", length(dis)), lty = 1:length(dis), legend_form = ifelse(length(dis) == 1, 0, 1), aic_decimals = 1, points_list = NULL, axis_list = NULL, legend_list = NULL, ... )
histo( x, dis = "none", dis_shift = NULL, integer_breaks = NULL, colors = rep("black", length(dis)), lty = 1:length(dis), legend_form = ifelse(length(dis) == 1, 0, 1), aic_decimals = 1, points_list = NULL, axis_list = NULL, legend_list = NULL, ... )
x |
Numeric vector of values. |
dis |
Character vector indicating which distributions should be used to
add fitted PDF/PMF to the histogram. If not
|
dis_shift |
Numeric value for shifting the fitted PDF/PMF along the x-axis of the histogram. |
integer_breaks |
If |
colors |
Character vector of colors for each PDF/PMF. |
lty |
Integer vector specifying line types for each curve. |
legend_form |
Integer value controlling what type of legend to include. Choices are 0 for no legend, 1 for legend naming each distribution, and 2 for legend naming each distribution and the corresponding AIC. |
aic_decimals |
Integer value for number of decimals for AIC. |
points_list |
Optional list of inputs to pass to
|
axis_list |
Optional list of inputs to pass to
|
legend_list |
Optional list of inputs to pass to
|
... |
May include arguments to pass to |
When x
takes on whole numbers, you typically want to set
dis_shift = -0.5
if right = TRUE
(hist
's default) and dis_shift = 0.5
if
right = FALSE
. The function will do this internally by default.
To illustrate, suppose a particular bin represents (7, 10]
. Its
midpoint will be at x = 8.5
on the graph. But if input values are
whole numbers, this bin really only includes values of 8, 9, and 10, which
have a mean of 9. So you really want f(9)
to appear at x = 8.5
.
This requires shifting the curve to the left 0.5 units, i.e. setting
dis_shift = -0.5
.
When x
takes on whole numbers with not too many unique values, you may
want the histogram to show one bin for each integer. You can do this by
setting integer_breaks = TRUE
. By default, the function sets
integer_breaks = TRUE
if x
contains whole numbers with 10 or
fewer unique values.
Histogram with fitted PDFs/PMFs if requested.
# Sample 10,000 Poisson(2) values and commpare default hist vs. histo set.seed(123) x <- rpois(n = 10000, lambda = 2) par(mfrow = c(1, 2)) hist(x, main = "hist function") histo(x, main = "histo function") # Sample 10,000 lognormal(0, 0.35) values. Create histogram with curves # showing fitted lognormal, normal, and Gamma PDFs. set.seed(123) x <- rlnorm(n = 10000, meanlog = 0, sdlog = 0.35) par(mfrow = c(1, 1)) histo(x, c("lnorm", "norm", "gamma"), main = "X ~ Lognormal(0, 0.35)") # Generate 10,000 Binomial(8, 0.25) values. Create histogram, specifying # size = 5, with blue line/points showing fitted PMF. set.seed(123) x <- rbinom(n = 10000, size = 5, prob = 0.25) par(mfrow = c(1, 1)) histo(x, dis = "binom", size = 5, colors = "blue", points_list = list(type = "b"))
# Sample 10,000 Poisson(2) values and commpare default hist vs. histo set.seed(123) x <- rpois(n = 10000, lambda = 2) par(mfrow = c(1, 2)) hist(x, main = "hist function") histo(x, main = "histo function") # Sample 10,000 lognormal(0, 0.35) values. Create histogram with curves # showing fitted lognormal, normal, and Gamma PDFs. set.seed(123) x <- rlnorm(n = 10000, meanlog = 0, sdlog = 0.35) par(mfrow = c(1, 1)) histo(x, c("lnorm", "norm", "gamma"), main = "X ~ Lognormal(0, 0.35)") # Generate 10,000 Binomial(8, 0.25) values. Create histogram, specifying # size = 5, with blue line/points showing fitted PMF. set.seed(123) x <- rbinom(n = 10000, size = 5, prob = 0.25) par(mfrow = c(1, 1)) histo(x, dis = "binom", size = 5, colors = "blue", points_list = list(type = "b"))
Returns TRUE
if x
falls inside range defined by ends
and FALSE
otherwise. Also works for multiple sets of values and/or
endpoints.
inside(x, ends, inclusive = TRUE)
inside(x, ends, inclusive = TRUE)
x |
Numeric value or vector of numeric values. |
ends |
Numeric vector of length 2 specifying the endpoints for the interval, or a 2-column numeric matrix where each row specifies a pair of endpoints. |
inclusive |
Logical value indicating whether endpoints should be included. |
Logical value or vector.
# Check whether 2 is inside [0, 2.5] inside(1, c(0, 2.5)) # Check whether 2 and 3 are inside (0, 3) inside(c(2, 3), c(0, 3), inclusive = FALSE) # Check whether 1 is inside [1, 2] and [3, 4] inside(1, rbind(c(1, 2), c(3, 4)))
# Check whether 2 is inside [0, 2.5] inside(1, c(0, 2.5)) # Check whether 2 and 3 are inside (0, 3) inside(c(2, 3), c(0, 3), inclusive = FALSE) # Check whether 1 is inside [1, 2] and [3, 4] inside(1, rbind(c(1, 2), c(3, 4)))
Splits a continuous variable into equal-width groups. Useful for assessing linearity in regression models.
interval_groups(x, groups = 5, ...)
interval_groups(x, groups = 5, ...)
x |
Numeric vector. |
groups |
Numeric value specifying number of groups to create. |
... |
Arguments to pass to |
Factor variable.
# Convert values from N(0, 1) into 6 equal-width groups x <- rnorm(1000) groups <- interval_groups(x, 6) table(groups) # Use interval_groups to detect non-linearity set.seed(123) x <- rnorm(1000) y <- 1.5 + 1.25 * x + 0.25 * x^2 + rnorm(1000) plot(tapply(y, interval_groups(x), mean))
# Convert values from N(0, 1) into 6 equal-width groups x <- rnorm(1000) groups <- interval_groups(x, 6) table(groups) # Use interval_groups to detect non-linearity set.seed(123) x <- rnorm(1000) y <- 1.5 + 1.25 * x + 0.25 * x^2 + rnorm(1000) plot(tapply(y, interval_groups(x), mean))
Same idea as purrr::pmap, but with some different
functionality. It can runs all combinations of vector-valued arguments in
...
or the 1st set, 2nd set, and so forth, and multiple trials can be
run for each scenario, which can be useful for simulations.
iterate( f, ..., all_combinations = TRUE, fix = NULL, trials = 1, varnames = NULL )
iterate( f, ..., all_combinations = TRUE, fix = NULL, trials = 1, varnames = NULL )
f |
A function. |
... |
Arguments to |
all_combinations |
Logical value for whether to iterate over all
combinations of arguments in |
fix |
List of arguments to |
trials |
Numeric value. |
varnames |
Character vector of names for values that |
Data frame.
# Define function to generate data from N(mu, sigsq) and perform t-test. f <- function(n = 100, mu = 0, sigsq = 1, alpha = 0.05) { x <- rnorm(n = n, mean = mu, sd = sqrt(sigsq)) fit <- t.test(x = x, alpha = alpha) return(list(t = fit$statistic, p = fit$p.value)) } # Call f once for various sample sizes and means f %>% iterate(n = c(100, 500), mu = c(0.1, 0.25)) # Run 100 trials for each scenario and calculate empirical power f %>% iterate(n = c(100, 500), mu = c(0.1, 0.25), trials = 100) %>% group_by(n, mu) %>% summarise(mean(p < 0.05))
# Define function to generate data from N(mu, sigsq) and perform t-test. f <- function(n = 100, mu = 0, sigsq = 1, alpha = 0.05) { x <- rnorm(n = n, mean = mu, sd = sqrt(sigsq)) fit <- t.test(x = x, alpha = alpha) return(list(t = fit$statistic, p = fit$p.value)) } # Call f once for various sample sizes and means f %>% iterate(n = c(100, 500), mu = c(0.1, 0.25)) # Run 100 trials for each scenario and calculate empirical power f %>% iterate(n = c(100, 500), mu = c(0.1, 0.25), trials = 100) %>% group_by(n, mu) %>% summarise(mean(p < 0.05))
Adds each element of list2
to list1
, overriding any elements of
the same name. Similar to modifyList
function in
utils package, but either list can be NULL
. Useful for
do.call
statements, when you want to combine a list of
default inputs with a list of user-specified inputs.
list_override(list1, list2)
list_override(list1, list2)
list1 |
Initial list that has some number of named elements. Can be
|
list2 |
List with named elements that will be added to |
List containing the named elements initially in list1
and not
in list2
, any additional named elements in list2
, and any named
elements in list1
that were replaced by elements of the same name in
list2
.
# Create list that has default inputs to the plot function list.defaults <- list(x = 1: 5, y = 1: 5, type = "l", lty = 1) # Create list of user-specified inputs to the plot function list.user <- list(main = "A Straight Line", lty = 2, lwd = 1.25) # Combine the two lists into one, giving priority to list.user list.combined <- list_override(list.defaults, list.user) # Plot data using do.call do.call(plot, list.combined)
# Create list that has default inputs to the plot function list.defaults <- list(x = 1: 5, y = 1: 5, type = "l", lty = 1) # Create list of user-specified inputs to the plot function list.user <- list(main = "A Straight Line", lty = 2, lwd = 1.25) # Combine the two lists into one, giving priority to list.user list.combined <- list_override(list.defaults, list.user) # Plot data using do.call do.call(plot, list.combined)
Defined as: exp_x <- exp(x); out <- exp_x / (1 + exp_x)
. This 2-step
approach is faster than exp(x) / (1 + exp(x))
because the exponentials
only have to be calculated once.
logit_prob(x)
logit_prob(x)
x |
Numeric vector. |
Numeric vector.
Uses maximum likelihood to fit
Y|X ~ Lognormal(beta_0 + beta_x^T X, sigsq). Y
can be precisely measured or subject to multiplicative mean-1 lognormal
errors, in which case replicates can be incorporated by specifying y
as a list).
lognormalreg( y, x = NULL, merror = FALSE, estimate_var = TRUE, fix_posdef = FALSE, ... )
lognormalreg( y, x = NULL, merror = FALSE, estimate_var = TRUE, fix_posdef = FALSE, ... )
y |
Numeric vector or list. |
x |
Numeric vector or matrix. If |
merror |
Logical value for whether to model multiplicative lognormal measurement errors in Y. |
estimate_var |
Logical value for whether to return Hessian-based variance-covariance matrix. |
fix_posdef |
Logical value for whether to repeatedly reduce
|
... |
Additional arguments to pass to |
List containing:
Numeric vector of parameter estimates.
Variance-covariance matrix (if estimate_var = TRUE
).
Returned nlminb
object from maximizing the
log-likelihood function.
Akaike information criterion (AIC).
Creates plot showing sample log-odds of binary Y variable across levels of a
grouping variable, with customizable error bars. Observations with missing
values for y
and/or group
are dropped.
logodds_graph( y, group, error.bars = "none", alpha = 0.05, p.legend = "chi", plot.list = NULL, lines.list = NULL, axis.list = NULL, legend.list = NULL, ... )
logodds_graph( y, group, error.bars = "none", alpha = 0.05, p.legend = "chi", plot.list = NULL, lines.list = NULL, axis.list = NULL, legend.list = NULL, ... )
y |
Vector of values for binary response variable. Must take on 2
values, but can be any type (e.g. numeric, character, factor, logical).
Function plots log-odds of second value returned by |
group |
Vector of values indicating what group each |
error.bars |
Character string indicating what the error bars should
represent. Possible values are |
alpha |
Numeric value indicating what alpha should be set to for
confidence intervals. Only used if |
p.legend |
Character string controlling what p-value is printed in a
legend. Possible values are |
plot.list |
Optional list of inputs to pass to
|
lines.list |
Optional list of inputs to pass to
|
axis.list |
Optional list of inputs to pass to
|
legend.list |
Optional list of inputs to pass to
|
... |
Additional arguments to pass to |
Plot showing log-odds of y
across levels of group
.
Written in C++, this function tends to run faster than max
for large
numeric vectors/matrices.
max_n(x)
max_n(x)
x |
Numeric vector. |
Numeric value.
# For large objects, max_n is faster than max x <- rnorm(100000) max(x) == max_n(x) benchmark(max(x), max_n(x), replications = 1000) # For smaller objects, max_n is slower than max x <- rnorm(100) max(x) == max_n(x) benchmark(max(x), max_n(x), replications = 1000)
# For large objects, max_n is faster than max x <- rnorm(100000) max(x) == max_n(x) benchmark(max(x), max_n(x), replications = 1000) # For smaller objects, max_n is slower than max x <- rnorm(100) max(x) == max_n(x) benchmark(max(x), max_n(x), replications = 1000)
Written in C++, this function runs faster than mean
for
large integer vectors/matrices.
mean_i(x)
mean_i(x)
x |
Integer vector or matrix. |
Numeric value.
# For very large integer objects, sum_i is faster than sum x <- rpois(100000, lambda = 5) mean(x) == mean_i(x) benchmark(mean(x), mean_i(x), replications = 1000) # For smaller integer objects, sum_i is slower than sum x <- rpois(1000, lambda = 5) mean(x) == mean_i(x) benchmark(mean(x), mean_i(x), replications = 1000)
# For very large integer objects, sum_i is faster than sum x <- rpois(100000, lambda = 5) mean(x) == mean_i(x) benchmark(mean(x), mean_i(x), replications = 1000) # For smaller integer objects, sum_i is slower than sum x <- rpois(1000, lambda = 5) mean(x) == mean_i(x) benchmark(mean(x), mean_i(x), replications = 1000)
Creates plot showing mean of Y variable across levels of a grouping variable,
with customizable error bars. Observations with missing values for y
and/or group
are dropped.
means_graph( y, group, error.bars = "t.ci", alpha = 0.05, p.legend = TRUE, plot.list = NULL, lines.list = NULL, axis.list = NULL, legend.list = NULL, ... )
means_graph( y, group, error.bars = "t.ci", alpha = 0.05, p.legend = TRUE, plot.list = NULL, lines.list = NULL, axis.list = NULL, legend.list = NULL, ... )
y |
Numeric vector of values for the continuous variable. |
group |
Vector of values indicating what group each |
error.bars |
Character string indicating what the error bars should
represent. Possible values are |
alpha |
Numeric value indicating what alpha should be set to for
confidence intervals. Only used if |
p.legend |
If |
plot.list |
Optional list of inputs to pass to
|
lines.list |
Optional list of inputs to pass to
|
axis.list |
Optional list of inputs to pass to
|
legend.list |
Optional list of inputs to pass to
|
... |
Plot showing mean of y
across levels of group
.
Written in C++, this function tends to run faster than min
for large
numeric vectors/matrices.
min_n(x)
min_n(x)
x |
Numeric vector. |
Numeric value.
# For large objects, min_n is faster than min x <- rnorm(100000) min(x) == min_n(x) benchmark(min(x), min_n(x), replications = 1000) # For smaller objects, min_n is slower than min x <- rnorm(100) min(x) == min_n(x) benchmark(min(x), min_n(x), replications = 20000)
# For large objects, min_n is faster than min x <- rnorm(100000) min(x) == min_n(x) benchmark(min(x), min_n(x), replications = 1000) # For smaller objects, min_n is slower than min x <- rnorm(100) min(x) == min_n(x) benchmark(min(x), min_n(x), replications = 20000)
Performs maximization via nlminb
. alpha and beta
correspond to the shape and scale (not shape and rate) parameters described
in GammaDist
.
mle_gamma(x, alpha = NULL, beta = NULL, estimate_var = FALSE, ...)
mle_gamma(x, alpha = NULL, beta = NULL, estimate_var = FALSE, ...)
x |
Numeric vector. |
alpha |
Numeric value specifying known alpha. |
beta |
Numeric value specifying known beta. |
estimate_var |
Logical value for whether to return Hessian-based variance-covariance matrix. |
... |
Additional arguments to pass to |
List containing:
Numeric vector of parameter estimates.
Variance-covariance matrix (if estimate_var = TRUE
).
Returned nlminb
object from maximizing the
log-likelihood function.
Akaike information criterion (AIC).
# Generate 1,000 values from Gamma(0.5, 1) and estimate alpha and beta set.seed(123) x <- rgamma(1000, shape = 0.5, scale = 1) mle_gamma(x)
# Generate 1,000 values from Gamma(0.5, 1) and estimate alpha and beta set.seed(123) x <- rgamma(1000, shape = 0.5, scale = 1) mle_gamma(x)
Each observation is assumed to be the product of a Gamma(alpha, beta) and
Lognormal(mu, sigsq) random variable. Performs maximization via
nlminb
. alpha and beta correspond to the shape and scale
(not shape and rate) parameters described in GammaDist
,
and mu and sigsq correspond to meanlog and sdlog^2 in
Lognormal
.
mle_gamma_lnorm( x, gamma_mean1 = FALSE, lnorm_mean1 = TRUE, integrate_tol = 1e-08, estimate_var = FALSE, ... )
mle_gamma_lnorm( x, gamma_mean1 = FALSE, lnorm_mean1 = TRUE, integrate_tol = 1e-08, estimate_var = FALSE, ... )
x |
Numeric vector. |
gamma_mean1 |
Whether to use restriction that the Gamma variable is mean-1. |
lnorm_mean1 |
Whether to use restriction that the lognormal variable is mean-1. |
integrate_tol |
Numeric value specifying the |
estimate_var |
Logical value for whether to return Hessian-based variance-covariance matrix. |
... |
Additional arguments to pass to |
List containing:
Numeric vector of parameter estimates.
Variance-covariance matrix (if estimate_var = TRUE
).
Returned nlminb
object from maximizing the
log-likelihood function.
Akaike information criterion (AIC).
# Generate 1,000 values from Gamma(0.5, 1) x Lognormal(-1.5/2, 1.5) and # estimate parameters ## Not run: set.seed(123) x <- rgamma(1000, 0.5, 1) * rlnorm(1000, -1.5/2, sqrt(1.5)) mle_gamma_lnorm(x, control = list(trace = 1)) ## End(Not run)
# Generate 1,000 values from Gamma(0.5, 1) x Lognormal(-1.5/2, 1.5) and # estimate parameters ## Not run: set.seed(123) x <- rgamma(1000, 0.5, 1) * rlnorm(1000, -1.5/2, sqrt(1.5)) mle_gamma_lnorm(x, control = list(trace = 1)) ## End(Not run)
Performs maximization via nlminb
. mu and sigsq
correspond to meanlog and sdlog^2 in Lognormal
.
mle_lnorm(x, mu = NULL, sigsq = NULL, estimate_var = FALSE, ...)
mle_lnorm(x, mu = NULL, sigsq = NULL, estimate_var = FALSE, ...)
x |
Numeric vector. |
mu |
Numeric value specifying known mu. |
sigsq |
Numeric value specifying known sigsq. |
estimate_var |
Logical value for whether to return Hessian-based variance-covariance matrix. |
... |
Additional arguments to pass to |
List containing:
Numeric vector of parameter estimates.
Variance-covariance matrix (if estimate_var = TRUE
).
Returned nlminb
object from maximizing the
log-likelihood function.
Akaike information criterion (AIC).
# Generate 1,000 values from Lognormal(0.5, 1) and estimate mu and sigsq set.seed(123) x <- rlnorm(1000, meanlog = 0.5, sdlog = sqrt(1)) mle_lnorm(x)
# Generate 1,000 values from Lognormal(0.5, 1) and estimate mu and sigsq set.seed(123) x <- rlnorm(1000, meanlog = 0.5, sdlog = sqrt(1)) mle_lnorm(x)
Each observation is assumed to be the product of a Lognormal(mu1, sigsq1) and
Lognormal(mu2, sigsq2) random variable, with mu2 and sigsq2 known. Performs
maximization via nlminb
. mu and sigsq correspond to
meanlog and sdlog^2 in Lognormal
.
mle_lnorm_lnorm(x, mu2 = NULL, sigsq2 = NULL, estimate_var = FALSE, ...)
mle_lnorm_lnorm(x, mu2 = NULL, sigsq2 = NULL, estimate_var = FALSE, ...)
x |
Numeric vector. |
mu2 |
Numeric value specifying known mu2. |
sigsq2 |
Numeric value specifying known sigsq2. |
estimate_var |
Logical value for whether to return Hessian-based variance-covariance matrix. |
... |
Additional arguments to pass to |
List containing:
Numeric vector of parameter estimates.
Variance-covariance matrix (if estimate_var = TRUE
).
Returned nlminb
object from maximizing the
log-likelihood function.
Akaike information criterion (AIC).
# Generate 1,000 values from Lognormal(0.5, 1) x Lognormal(0.75, 1.5) and # estimate parameters based on known mu and sigsq for one of them set.seed(123) x <- rlnorm(1000, 0.5, sqrt(1)) * rlnorm(1000, 0.75, sqrt(1.5)) mle_lnorm_lnorm(x, mu2 = 0.75, sigsq2 = 1.5)
# Generate 1,000 values from Lognormal(0.5, 1) x Lognormal(0.75, 1.5) and # estimate parameters based on known mu and sigsq for one of them set.seed(123) x <- rlnorm(1000, 0.5, sqrt(1)) * rlnorm(1000, 0.75, sqrt(1.5)) mle_lnorm_lnorm(x, mu2 = 0.75, sigsq2 = 1.5)
Calculates moving averages or maximum moving average. For optimal speed, use
integer = TRUE
if x
is an integer vector and
integer = FALSE
otherwise.
moving_mean(x, window, integer = FALSE, max = FALSE)
moving_mean(x, window, integer = FALSE, max = FALSE)
x |
Integer or numeric vector. |
window |
Integer value specifying window length. |
integer |
Logical value for whether |
max |
Logical value for whether to return maximum moving average (as opposed to vector of moving averages). |
Numeric value or vector depending on max
.
# 5-unit moving average for integer vector of length 10 x <- rpois(10, lambda = 3) moving_mean(x, 5)
# 5-unit moving average for integer vector of length 10 x <- rpois(10, lambda = 3) moving_mean(x, 5)
Same idea as power.t.test
. Less flexible, but faster.
n_2t_equal(d, sigsq, alpha = 0.05, beta = 0.2)
n_2t_equal(d, sigsq, alpha = 0.05, beta = 0.2)
d |
Numeric value specifying true difference in group means. |
sigsq |
Numeric value specifying the variance of observations. |
alpha |
Numeric value specifying type-1 error rate. |
beta |
Numeric value specifying type-2 error rate. |
Numeric value indicating per-group sample size, rounded up to the nearest whole number.
# Per-group sample size for 90% power to detect difference of 0.2 with # sigsq = 1 n_2t_equal(d = 0.2, sigsq = 1, beta = 0.1)
# Per-group sample size for 90% power to detect difference of 0.2 with # sigsq = 1 n_2t_equal(d = 0.2, sigsq = 1, beta = 0.1)
Unequal variance version of n_2t_equal
. Assumes an equal sample
size for both groups, which is actually not optimal.
n_2t_unequal(d, sigsq1, sigsq2, alpha = 0.05, beta = 0.2)
n_2t_unequal(d, sigsq1, sigsq2, alpha = 0.05, beta = 0.2)
d |
Numeric value specifying true difference in group means. |
sigsq1 , sigsq2
|
Numeric value specifying the variance of observations in each group. |
alpha |
Numeric value specifying type-1 error rate. |
beta |
Numeric value specifying type-2 error rate. |
Numeric value indicating per-group sample size, rounded up to the nearest whole number.
# Per-group sample size for 90% power to detect difference of 0.2 with # sigsq's of 1 and 1.25 n_2t_unequal(d = 0.2, sigsq1 = 1, sigsq2 = 1.25, beta = 0.1)
# Per-group sample size for 90% power to detect difference of 0.2 with # sigsq's of 1 and 1.25 n_2t_unequal(d = 0.2, sigsq1 = 1, sigsq2 = 1.25, beta = 0.1)
Defined simply as log(x / (x + 1))
.
odds_prob(x)
odds_prob(x)
x |
Numeric vector. |
Numeric vector.
Generates plot of log-likelihood vs. one parameter of interest while other parameters are held fixed at certain values (e.g. MLEs). This is not a profile likelihood, and is mainly intended for use with a Shiny app.
plot_ll( start, objective, lower = -Inf, upper = Inf, xaxis_param = 1, xaxis_range = NULL, param_values = NULL, mles = NULL, return_info = FALSE )
plot_ll( start, objective, lower = -Inf, upper = Inf, xaxis_param = 1, xaxis_range = NULL, param_values = NULL, mles = NULL, return_info = FALSE )
start |
See |
objective |
See |
lower |
See |
upper |
See |
xaxis_param |
Integer value specifying which parameter should be plotted on the x-axis. |
xaxis_range |
Numeric vector specifying x-axis range over which to vary the parameter of interest. Only values with likelihood ratio > 0.01 are ultimately plotted. |
param_values |
Numeric vector of values to use for other parameters in
model, in case you want an additional curve for log-likelihood function vs.
parameter of interest at certain non-MLE values for other parameters. For
example, if there are 3 parameters and |
mles |
Numeric vector of previously obtained maximum likelihood estimates. |
return_info |
Logical value for whether to return the estimated MLEs and 99.99% confidence intervals for parameters rather than create the plot. |
Note that objective
should be the negative log-likelihood function,
since internal optimization uses (nlminb
), which does
minimization.
Plot of log-likelihood vs. value of parameter of interest, generated
by ggplot
.
# Generate normal data, define log-likelihood function, and plot likelihood set.seed(123) x <- rnorm(100, mean = 0.5, sd = sqrt(0.25)) ll.f <- function(theta) { return(-sum(dnorm(x, log = TRUE, mean = theta[1], sd = sqrt(theta[2])))) } plot_ll(start = c(0, 1), objective = ll.f, lower = c(-Inf, 1e-6))
# Generate normal data, define log-likelihood function, and plot likelihood set.seed(123) x <- rnorm(100, mean = 0.5, sd = sqrt(0.25)) ll.f <- function(theta) { return(-sum(dnorm(x, log = TRUE, mean = theta[1], sd = sqrt(theta[2])))) } plot_ll(start = c(0, 1), objective = ll.f, lower = c(-Inf, 1e-6))
Calculates pooled sample variance used in equal variance two-sample t-test.
pooled_var(x, y, integer = FALSE)
pooled_var(x, y, integer = FALSE)
x , y
|
Integer or numeric vectors. |
integer |
Logical value for whether |
Numeric value.
Same idea as power.t.test
. Less flexible, but faster.
power_2t_equal(n = 100, d, sigsq, alpha = 0.05)
power_2t_equal(n = 100, d, sigsq, alpha = 0.05)
n |
Numeric value specifying per-group sample size. |
d |
Numeric value specifying true difference in group means. Should be positive. |
sigsq |
Numeric value specifying the variance of observations. |
alpha |
Numeric value specifying type-1 error rate. |
Numeric value.
# Power to detect difference of 0.2 with 100 subjects per group and sigsq = 1 power_2t_equal(n = 100, d = 0.2, sigsq = 1)
# Power to detect difference of 0.2 with 100 subjects per group and sigsq = 1 power_2t_equal(n = 100, d = 0.2, sigsq = 1)
Unequal variance version of power_2t_equal
. Assumes an equal
sample size for both groups, which is actually not optimal.
power_2t_unequal(n = 100, d, sigsq1, sigsq2, alpha = 0.05)
power_2t_unequal(n = 100, d, sigsq1, sigsq2, alpha = 0.05)
n |
Numeric value specifying per-group sample size. |
d |
Numeric value specifying true difference in group means. Should be positive. |
sigsq1 , sigsq2
|
Numeric value specifying the variance of observations in each group. |
alpha |
Numeric value specifying type-1 error rate. |
Numeric value.
# Power to detect difference of 0.2 with 100 subjects per group and sigsq's # of 1 and 1.25 power_2t_unequal(n = 100, d = 0.2, sigsq1 = 1, sigsq2 = 1.25)
# Power to detect difference of 0.2 with 100 subjects per group and sigsq's # of 1 and 1.25 power_2t_unequal(n = 100, d = 0.2, sigsq1 = 1, sigsq2 = 1.25)
Defined simply as log(x / (1 - x))
.
prob_logit(x)
prob_logit(x)
x |
Numeric vector. |
Numeric vector.
Defined simply as x / (1 - x)
.
prob_odds(x)
prob_odds(x)
x |
Numeric vector. |
Numeric vector.
Splits a continuous variable into quantiles groups. Basically combines
quantile
and cut
into a single
function. Note that create_qgroups
will likely supersede this
function in future versions of dvmisc.
quant_groups( x, groups = 4, probs = NULL, quantile.list = NULL, cut.list = NULL )
quant_groups( x, groups = 4, probs = NULL, quantile.list = NULL, cut.list = NULL )
x |
Numeric vector. |
groups |
Numeric value specifying number of quantile groups. |
probs |
Numeric vector specifying probabilities. |
quantile.list |
Arguments to pass to |
cut.list |
Arguments to pass to |
Factor variable.
# Convert values from N(0, 1) into quintiles (i.e. 5 groups) x <- rnorm(1000) groups <- quant_groups(x, 5) table(groups)
# Convert values from N(0, 1) into quintiles (i.e. 5 groups) x <- rnorm(1000) groups <- quant_groups(x, 5) table(groups)
Complex survey version of quant_groups
. Speeds up process of
creating quantile groups based on survey weighted percentiles.
quant_groups_svy(x, by = NULL, groups = 4, probs = NULL, design)
quant_groups_svy(x, by = NULL, groups = 4, probs = NULL, design)
x |
Formula, e.g. |
by |
Formula, e.g. |
groups |
Numeric value specifying number of quantile groups. |
probs |
Numeric vector. |
design |
A |
Factor variable.
Calls remove_all_labels
and then restores special
variable names.
ral(x)
ral(x)
x |
Vector or data frame. |
Vector or data frame.
Convenience function to get reversed factor levels from cut
.
Currently requires specifying breaks
as vector of cutpoints rather
than number of desired intervals.
reverse_cut(x, breaks, include.lowest = FALSE, right = TRUE, ...)
reverse_cut(x, breaks, include.lowest = FALSE, right = TRUE, ...)
x , breaks , include.lowest , right
|
See |
... |
Arguments to pass to |
Factor variable.
# In mtcars dataset, create 3 mpg groups table(cut(mtcars$mpg, breaks = c(-Inf, 15, 20, Inf))) # Repeat with reverse_cut to get factor levels ordered from high to low table(reverse_cut(mtcars$mpg, breaks = c(Inf, 20, 15, -Inf))) # You can specify breaks from low to high, but then include.lowest and right # arguments get confusing table(reverse_cut(mtcars$mpg, breaks = c(-Inf, 15, 20, Inf), right = TRUE))
# In mtcars dataset, create 3 mpg groups table(cut(mtcars$mpg, breaks = c(-Inf, 15, 20, Inf))) # Repeat with reverse_cut to get factor levels ordered from high to low table(reverse_cut(mtcars$mpg, breaks = c(Inf, 20, 15, -Inf))) # You can specify breaks from low to high, but then include.lowest and right # arguments get confusing table(reverse_cut(mtcars$mpg, breaks = c(-Inf, 15, 20, Inf), right = TRUE))
Uses C++ code for efficiency.
sliding_cor(short, long)
sliding_cor(short, long)
short |
Numeric vector. |
long |
Numeric vector. |
Numeric vector.
short <- rnorm(4) long <- rnorm(10) sliding_cor(short, long)
short <- rnorm(4) long <- rnorm(10) sliding_cor(short, long)
Uses C++ code for efficiency.
sliding_cov(short, long)
sliding_cov(short, long)
short |
Numeric vector. |
long |
Numeric vector. |
Numeric vector.
short <- rnorm(4) long <- rnorm(10) sliding_cov(short, long)
short <- rnorm(4) long <- rnorm(10) sliding_cov(short, long)
Calls kable
and kable_styling
as appropriate based on class of input object and creates object ready to
snip or copy/paste out of R. Currently works for data frames, matrices, and
tables.
snip(x)
snip(x)
x |
Object to snip |
# Print 4-cylinder cars data(mtcars) mtcars %>% dplyr::filter(cyl == 4) %>% snip() # Crosstab of gears and cylinder using mtcars dataset snip(table(mtcars$cyl, mtcars$gear))
# Print 4-cylinder cars data(mtcars) mtcars %>% dplyr::filter(cyl == 4) %>% snip() # Crosstab of gears and cylinder using mtcars dataset snip(table(mtcars$cyl, mtcars$gear))
Written in C++, this function runs faster than sum
for
large integer vectors/matrices.
sum_i(x)
sum_i(x)
x |
Integer vector or matrix. |
Numeric value.
# For very large integer objects, sum_i is faster than sum x <- rpois(100000, lambda = 5) sum(x) == sum_i(x) benchmark(sum(x), sum_i(x), replications = 1000) # For smaller integer objects, sum_i is slower than sum x <- rpois(1000, lambda = 5) sum(x) == sum_i(x) benchmark(sum(x), sum_i(x), replications = 1000)
# For very large integer objects, sum_i is faster than sum x <- rpois(100000, lambda = 5) sum(x) == sum_i(x) benchmark(sum(x), sum_i(x), replications = 1000) # For smaller integer objects, sum_i is slower than sum x <- rpois(1000, lambda = 5) sum(x) == sum_i(x) benchmark(sum(x), sum_i(x), replications = 1000)
Creates table summarizing results of statistical simulations, providing common metrics of performance like mean bias, standard deviation, mean standard error, mean squared error, and confidence interval coverage.
sumsim( estimates, ses = NULL, truth = NULL, theta_0 = 0, statistics = c("mean_bias", "sd", "mean_se", "mse", "coverage"), alpha = 0.05, digits = 3, listwise_deletion = TRUE )
sumsim( estimates, ses = NULL, truth = NULL, theta_0 = 0, statistics = c("mean_bias", "sd", "mean_se", "mse", "coverage"), alpha = 0.05, digits = 3, listwise_deletion = TRUE )
estimates |
Numeric matrix where each column gives the point estimates for a particular method across multiple trials. |
ses |
Numeric matrix where each column gives the standard errors for a particular method across multiple trials. |
truth |
Numeric value specifying the true value of the parameter being estimated. |
theta_0 |
Numeric value specifying null value for hypothesis test
|
statistics |
Numeric vector specifying which performance metrics should
be calculated. Possible values are |
alpha |
Numeric value specifying alpha for confidence interval. Set to
|
digits |
Numeric value or vector specifying the number of decimal places to include. |
listwise_deletion |
Logical value for whether to remove trials in which any of the estimators have missing values. |
Numeric matrix.
# For X ~ N(mu, sigma^2), the MLE for sigma^2 is the sample variance with n # in the denominator, but the unbiased version with (n - 1) is typically used # for its unbiasedness. Compare these estimators in 1,000 trials with n = 25. MLE <- c() Unbiased <- c() for (ii in 1: 1000) { x <- rnorm(n = 25) MLE[ii] <- sum((x - mean(x))^2) / 25 Unbiased[ii] <- sum((x - mean(x))^2) / 24 } sumsim(estimates = cbind(MLE, Unbiased), truth = 1)
# For X ~ N(mu, sigma^2), the MLE for sigma^2 is the sample variance with n # in the denominator, but the unbiased version with (n - 1) is typically used # for its unbiasedness. Compare these estimators in 1,000 trials with n = 25. MLE <- c() Unbiased <- c() for (ii in 1: 1000) { x <- rnorm(n = 25) MLE[ii] <- sum((x - mean(x))^2) / 25 Unbiased[ii] <- sum((x - mean(x))^2) / 24 } sumsim(estimates = cbind(MLE, Unbiased), truth = 1)
Returns input vector with tail values trimmed off of it. User can specify tail probability to trim or lower and upper cutpoints for values to retain.
trim(x, p = NULL, tails = "both", cutpoints = NULL, keep.edge = TRUE)
trim(x, p = NULL, tails = "both", cutpoints = NULL, keep.edge = TRUE)
x |
Numeric vector. |
p |
Numeric value giving tail probability to trim from |
tails |
Numeric value indicating which tail should be trimmed. Possible
values are |
cutpoints |
Numeric vector indicating what range of values should be
retained. For example, set to |
keep.edge |
Logical value indicating whether values in |
Numeric vector.
# Generate data from N(0, 1) and then trim the lower and upper 1\% x <- rnorm(1000) y <- trim(x, p = 0.01) # Generate data from N(0, 1) and then trim values outside of (-1.5, 1.5) x <- rnorm(100000) y <- trim(x, cutpoints = c(-1.5, 1.5))
# Generate data from N(0, 1) and then trim the lower and upper 1\% x <- rnorm(1000) y <- trim(x, p = 0.01) # Generate data from N(0, 1) and then trim values outside of (-1.5, 1.5) x <- rnorm(100000) y <- trim(x, cutpoints = c(-1.5, 1.5))
The base R function range
returns the minimum and maximum
of a vector, but the "range" is actually defined as the difference between
the minimum and maximum. This function calculates the actual range. It is
equivalent to the base R code diff(range(x))
, but a bit simpler and
much faster.
truerange(x, integer = FALSE)
truerange(x, integer = FALSE)
x |
Integer or numeric vector. |
integer |
Logical value for whether |
Integer or numeric value.
# truerange vs. diff(range()) for integer vector x <- rpois(1000, lambda = 5) all.equal(diff(range(x)), truerange(x, TRUE)) benchmark(diff(range(x)), truerange(x, TRUE), replications = 2000) # truerange vs. diff(range()) for numeric vector x <- rnorm(1000) all.equal(diff(range(x)), truerange(x)) benchmark(diff(range(x)), truerange(x), replications = 2000)
# truerange vs. diff(range()) for integer vector x <- rpois(1000, lambda = 5) all.equal(diff(range(x)), truerange(x, TRUE)) benchmark(diff(range(x)), truerange(x, TRUE), replications = 2000) # truerange vs. diff(range()) for numeric vector x <- rnorm(1000) all.equal(diff(range(x)), truerange(x)) benchmark(diff(range(x)), truerange(x), replications = 2000)
Written in C++, this function tends to run much faster than the equivalent
(if maximum is unique) base R solution
which(x == max(x), arr.ind = TRUE)
.
which_max_im(x)
which_max_im(x)
x |
Integer matrix. |
For optimal speed, choose the version of this function that matches the
class of your x
:
which_max_nv
for numeric vector. which_max_iv
for integer vector. which_max_nm
for numeric matrix. which_max_im
for integer matrix.
Integer vector.
# which_max_im is typically much faster than # which(x == max(x), arr.ind = TRUE) x <- matrix(rpois(100, lambda = 15), ncol = 10) all(which(x == max(x), arr.ind = TRUE) == which_max_im(x)) benchmark(which(x == max(x), arr.ind = TRUE), which_max_im(x), replications = 5000)
# which_max_im is typically much faster than # which(x == max(x), arr.ind = TRUE) x <- matrix(rpois(100, lambda = 15), ncol = 10) all(which(x == max(x), arr.ind = TRUE) == which_max_im(x)) benchmark(which(x == max(x), arr.ind = TRUE), which_max_im(x), replications = 5000)
Written in C++, this function tends to run faster than which.max
for
large integer vectors.
which_max_iv(x)
which_max_iv(x)
x |
Integer vector. |
For optimal speed, choose the version of this function that matches the
class of your x
:
which_max_nv
for numeric vector. which_max_iv
for integer vector. which_max_nm
for numeric matrix. which_max_im
for integer matrix.
Integer value.
# For long vectors, which_max_iv is faster than which.max x <- rpois(10000, lambda = 15) which.max(x) == which_max_iv(x) benchmark(which.max(x), which_max_iv(x), replications = 5000) # For shorter vectors, which_max_iv is slower than which.max x <- rpois(100, lambda = 15) which.max(x) == which_max_iv(x) benchmark(which.max(x), which_max_iv(x), replications = 20000)
# For long vectors, which_max_iv is faster than which.max x <- rpois(10000, lambda = 15) which.max(x) == which_max_iv(x) benchmark(which.max(x), which_max_iv(x), replications = 5000) # For shorter vectors, which_max_iv is slower than which.max x <- rpois(100, lambda = 15) which.max(x) == which_max_iv(x) benchmark(which.max(x), which_max_iv(x), replications = 20000)
Written in C++, this function tends to run much faster than the equivalent
(if maximum is unique) base R solution
which(x == max(x), arr.ind = TRUE)
.
which_max_nm(x)
which_max_nm(x)
x |
Numeric matrix. |
For optimal speed, choose the version of this function that matches the
class of your x
:
which_max_nv
for numeric vector. which_max_iv
for integer vector. which_max_nm
for numeric matrix. which_max_im
for integer matrix.
Integer vector.
# which_max_nm is typically much faster than # which(x == max(x), arr.ind = TRUE) x <- matrix(rnorm(100), ncol = 10) all(which(x == max(x), arr.ind = TRUE) == which_max_nm(x)) benchmark(which(x == max(x), arr.ind = TRUE), which_max_nm(x), replications = 5000)
# which_max_nm is typically much faster than # which(x == max(x), arr.ind = TRUE) x <- matrix(rnorm(100), ncol = 10) all(which(x == max(x), arr.ind = TRUE) == which_max_nm(x)) benchmark(which(x == max(x), arr.ind = TRUE), which_max_nm(x), replications = 5000)
Written in C++, this function tends to run faster than which.max
for
large numeric vectors.
which_max_nv(x)
which_max_nv(x)
x |
Numeric vector. |
For optimal speed, choose the version of this function that matches the
class of your x
:
which_max_nv
for numeric vector. which_max_iv
for integer vector. which_max_nm
for numeric matrix. which_max_im
for integer matrix.
Integer value.
# For long vectors, which_max_nv is faster than which.max x <- rnorm(100000) which.max(x) == which_max_nv(x) benchmark(which.max(x), which_max_nv(x), replications = 500) # For shorter vectors, which_max_nv is slower than which.max x <- rnorm(100) which.max(x) == which_max_nv(x) benchmark(which.max(x), which_max_nv(x), replications = 10000)
# For long vectors, which_max_nv is faster than which.max x <- rnorm(100000) which.max(x) == which_max_nv(x) benchmark(which.max(x), which_max_nv(x), replications = 500) # For shorter vectors, which_max_nv is slower than which.max x <- rnorm(100) which.max(x) == which_max_nv(x) benchmark(which.max(x), which_max_nv(x), replications = 10000)
Written in C++, this function tends to run much faster than the equivalent
(if minimum is unique) base R solution
which(x == min(x), arr.ind = TRUE)
.
which_min_im(x)
which_min_im(x)
x |
Integer matrix. |
For optimal speed, choose the version of this function that matches the
class of your x
:
which_min_nv
for numeric vector. which_min_iv
for integer vector. which_min_nm
for numeric matrix. which_min_im
for integer matrix.
Integer vector.
# which_min_im is typically much faster than # which(x == min(x), arr.ind = TRUE) x <- matrix(rpois(100, lambda = 10), ncol = 10) all(which(x == min(x), arr.ind = TRUE) == which_min_im(x)) benchmark(which(x == min(x), arr.ind = TRUE), which_min_im(x), replications = 5000)
# which_min_im is typically much faster than # which(x == min(x), arr.ind = TRUE) x <- matrix(rpois(100, lambda = 10), ncol = 10) all(which(x == min(x), arr.ind = TRUE) == which_min_im(x)) benchmark(which(x == min(x), arr.ind = TRUE), which_min_im(x), replications = 5000)
Written in C++, this function tends to run faster than
which.min
for large integer vectors.
which_min_iv(x)
which_min_iv(x)
x |
Integer vector. |
For optimal speed, choose the version of this function that matches the
class of your x
:
which_min_nv
for numeric vector. which_min_iv
for integer vector. which_min_nm
for numeric matrix. which_min_im
for integer matrix.
Integer value.
# For long vectors, which_min_iv is faster than which.min x <- rpois(10000, lambda = 15) which.min(x) == which_min_iv(x) benchmark(which.min(x), which_min_iv(x), replications = 5000) # For shorter vectors, which_min_iv is slower than which.min x <- rpois(100, lambda = 15) which.min(x) == which_min_iv(x) benchmark(which.min(x), which_min_iv(x), replications = 20000)
# For long vectors, which_min_iv is faster than which.min x <- rpois(10000, lambda = 15) which.min(x) == which_min_iv(x) benchmark(which.min(x), which_min_iv(x), replications = 5000) # For shorter vectors, which_min_iv is slower than which.min x <- rpois(100, lambda = 15) which.min(x) == which_min_iv(x) benchmark(which.min(x), which_min_iv(x), replications = 20000)
Written in C++, this function tends to run much faster than the equivalent
(if minimum is unique) base R solution
which(x == min(x), arr.ind = TRUE)
.
which_min_nm(x)
which_min_nm(x)
x |
Numeric matrix. |
For optimal speed, choose the version of this function that matches the
class of your x
:
which_min_nv
for numeric vector. which_min_iv
for integer vector. which_min_nm
for numeric matrix. which_min_im
for integer matrix.
Integer vector.
# which_min_nm is typically much faster than # which(x == min(x), arr.ind = TRUE) x <- matrix(rnorm(100), ncol = 10) all(which(x == min(x), arr.ind = TRUE) == which_min_nm(x)) benchmark(which(x == min(x), arr.ind = TRUE), which_min_nm(x), replications = 5000)
# which_min_nm is typically much faster than # which(x == min(x), arr.ind = TRUE) x <- matrix(rnorm(100), ncol = 10) all(which(x == min(x), arr.ind = TRUE) == which_min_nm(x)) benchmark(which(x == min(x), arr.ind = TRUE), which_min_nm(x), replications = 5000)
Written in C++, this function tends to run faster than
which.min
for large numeric vectors.
which_min_nv(x)
which_min_nv(x)
x |
Numeric vector. |
For optimal speed, choose the version of this function that matches the
class of your x
:
which_min_nv
for numeric vector. which_min_iv
for integer vector. which_min_nm
for numeric matrix. which_min_im
for integer matrix.
Integer value.
# For long vectors, which_min_nv is faster than which.min x <- rnorm(100000) which.min(x) == which_min_nv(x) benchmark(which.min(x), which_min_nv(x), replications = 1000) # For shorter vectors, which_min_nv is slower than which.min x <- rnorm(100) which.min(x) == which_min_nv(x) benchmark(which.min(x), which_min_nv(x), replications = 10000)
# For long vectors, which_min_nv is faster than which.min x <- rnorm(100000) which.min(x) == which_min_nv(x) benchmark(which.min(x), which_min_nv(x), replications = 1000) # For shorter vectors, which_min_nv is slower than which.min x <- rnorm(100) which.min(x) == which_min_nv(x) benchmark(which.min(x), which_min_nv(x), replications = 10000)
Returns index of maximum for vectors and index or (row, column) position for
matrices. For optimal speed, use integer = TRUE
if x
is an
integer vector/matrix and integer = FALSE
otherwise. Typically faster
than which.max
for matrices and for large
vectors.
which.max2(x, arr.ind = FALSE, integer = FALSE)
which.max2(x, arr.ind = FALSE, integer = FALSE)
x |
Integer or numeric vector/matrix. |
arr.ind |
Logical value for whether to return (row, col) position rather
than vector position, if |
integer |
Logical value for whether |
Numeric value.
# which.max2 vs. which.max for integer vector x <- rpois(10000, lambda = 5) all.equal(which.max(x), which.max2(x, integer = TRUE)) benchmark(which.max(x), which.max2(x, integer = TRUE), replications = 10000) # which.max2 vs. which.max for numeric vector x <- rnorm(10000) all.equal(which.max(x), which.max2(x)) benchmark(which.max(x), which.max2(x), replications = 10000)
# which.max2 vs. which.max for integer vector x <- rpois(10000, lambda = 5) all.equal(which.max(x), which.max2(x, integer = TRUE)) benchmark(which.max(x), which.max2(x, integer = TRUE), replications = 10000) # which.max2 vs. which.max for numeric vector x <- rnorm(10000) all.equal(which.max(x), which.max2(x)) benchmark(which.max(x), which.max2(x), replications = 10000)
Returns index of minimum for vectors and index or (row, column) position for
matrices. For optimal speed, use integer = TRUE
if x
is an
integer vector/matrix and integer = FALSE
otherwise. Typically faster
than which.min
for matrices and for large vectors.
which.min2(x, arr.ind = FALSE, integer = FALSE)
which.min2(x, arr.ind = FALSE, integer = FALSE)
x |
Integer or numeric vector/matrix. |
arr.ind |
Logical value for whether to return (row, col) position rather
than vector position, if |
integer |
Logical value for whether |
Numeric value.
# which.min2 vs. which.min for integer vector x <- rpois(10000, lambda = 10) all.equal(which.min(x), which.min2(x, integer = TRUE)) benchmark(which.min(x), which.min2(x, integer = TRUE), replications = 10000) # which.min2 vs. which.min for numeric vector x <- rnorm(10000) all.equal(which.min(x), which.min2(x)) benchmark(which.min(x), which.min2(x), replications = 10000)
# which.min2 vs. which.min for integer vector x <- rpois(10000, lambda = 10) all.equal(which.min(x), which.min2(x, integer = TRUE)) benchmark(which.min(x), which.min2(x, integer = TRUE), replications = 10000) # which.min2 vs. which.min for numeric vector x <- rnorm(10000) all.equal(which.min(x), which.min2(x)) benchmark(which.min(x), which.min2(x), replications = 10000)