Title: | Functions for Processing Accelerometer Data |
---|---|
Description: | A collection of functions that perform operations on time-series accelerometer data, such as identify non-wear time, flag minutes that are part of an activity bout, and find the maximum 10-minute average count value. The functions are generally very flexible, allowing for a variety of algorithms to be implemented. Most of the functions are written in C++ for efficiency. |
Authors: | Dane R. Van Domelen |
Maintainer: | Dane R. Van Domelen |
License: | GPL-3 |
Version: | 3.1.3 |
Built: | 2024-11-04 04:50:28 UTC |
Source: | https://github.com/vandomed/accelerometry |
Package: | accelerometry |
Type: | Package |
Version: | 3.1.3 |
Date: | 2018-12-01 |
License: | GPL-3 |
See CRAN documentation for full list of functions.
Dane R. Van Domelen
Centers for Disease Control and Prevention (CDC). National Center for Health Statistics (NCHS). National Health and Nutrition Examination Survey Data. Hyattsville, MD: US Department of Health and Human Services, Centers for Disease Control and Prevention, 2003-6. Available at: https://wwwn.cdc.gov/nchs/nhanes/Default.aspx. Accessed Aug. 19, 2018.
Eddelbuettel, D. and Francois, R. (2011) Rcpp: Seamless R and C++ Integration. Journal of Statistical Software, 40(8), 1-18. http://www.jstatsoft.org/v40/i08/.
Eddelbuettel, D. (2013) Seamless R and C++ Integration with Rcpp. Springer, New York. ISBN 978-1-4614-6867-7.
Eddelbuettel, D. and Balamuta, J.J. (2017). Extending R with C++: A Brief Introduction to Rcpp. PeerJ Preprints 5:e3188v1. https://doi.org/10.7287/peerj.preprints.3188v1.
National Cancer Institute. Risk factor monitoring and methods: SAS programs for analyzing NHANES 2003-2004 accelerometer data. Available at: http://riskfactor.cancer.gov/tools/nhanes_pam. Accessed Aug. 19, 2018.
Van Domelen, D.R., Pittard, W.S. and Harris, T.B. (2018) nhanesaccel: Process accelerometer data from NHANES 2003-2006. R package version 3.1.1. https://github.com/vandomed/accelerometry.
Acknowledgment: This material is based upon work supported by the National Science Foundation Graduate Research Fellowship under Grant No. DGE-0940903.
Implements the Bayesian approach developed by Dodd and used in the landmark Troiano et al. paper (MSSE 2008).
adherence_dodd(n, x, n.rec = 7, x.rec = 5, posterior = NULL)
n |
Number of monitoring days. |
x |
Number of active days. |
n.rec |
Denominator for recommendation. |
x.rec |
Numerator for recommendation. |
posterior |
Can be |
The approach aims to estimate a participant's probability of meeting guidelines of the form "at least x minutes per day for at least y days per week" based on observing X active days out of n monitoring days. We illustrate here with the "5+ active days per week" guideline that motivated the approach.
The prior assumption for the participant's daily adherence probability is:
p_d ~ Uni(0, 1)
Given p_d, the number of active days out of n monitoring days is distributed:
X|p_d ~ Bin(n, p_d)
It can be shown that the posterior for p_d is:
p_d|X ~ Beta(X + 1, n - X + 1)
Under a somewhat questionable independence assumption, the weekly adherence probability is p_w = P(Y >= 5) with Y ~ Bin(7, p_d). Dodd estimates p_w as:
p_w.hat = P(p_d >= 5/7 | X)
which can be calculated using pbeta.
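A minimal sketch of this calculation in base R, assuming the posterior Beta(X + 1, n - X + 1) stated above. The name dodd_sketch is hypothetical and is not the package's internal code.

```r
# Hypothetical helper: P(p_d >= x.rec / n.rec | X = x) under a Uniform(0, 1) prior
dodd_sketch <- function(n, x, n.rec = 7, x.rec = 5) {
  pbeta(q = x.rec / n.rec, shape1 = x + 1, shape2 = n - x + 1,
        lower.tail = FALSE)
}

# A participant who was active on 4 of 6 monitoring days
dodd_sketch(n = 6, x = 4)
```

Because pbeta is vectorized, the same call should work when n and x are vectors with one entry per participant.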
In my view, the quantity P(p_d >= 5/7 | X) is not a good estimator for p_w. Consider what would happen in a really long protocol. The Beta posterior for p_d would be very tightly centered around the true p_d, and p_w.hat = P(p_d >= 5/7 | X) would be very close to either 0 or 1 – not very close to what we're trying to estimate, p_w.
A solution is to define p_d.hat as the posterior mean, median, or mode, and map that estimate to p_w, i.e. p_w.hat = P(Y >= 5) with Y ~ Bin(7, p_d.hat). So there is an option for that.
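The long-protocol argument and the plug-in alternative can be illustrated with a short sketch. The helper plugin_sketch and its estimate argument are hypothetical names chosen for this illustration; they are not taken from the package.

```r
# Hypothetical helper: map a posterior summary of p_d to
# p_w.hat = P(Y >= x.rec) with Y ~ Bin(n.rec, p_d.hat)
plugin_sketch <- function(n, x, estimate = c("mean", "median", "mode"),
                          n.rec = 7, x.rec = 5) {
  estimate <- match.arg(estimate)
  a <- x + 1        # posterior shape parameters under the Uniform(0, 1) prior
  b <- n - x + 1
  p_d.hat <- switch(estimate,
    mean = a / (a + b),
    median = qbeta(0.5, shape1 = a, shape2 = b),
    mode = x / n    # equals (a - 1) / (a + b - 2); undefined when n = 0
  )
  pbinom(q = x.rec - 1, size = n.rec, prob = p_d.hat, lower.tail = FALSE)
}

# Long protocol: 600 active days out of 1000, so p_d is close to 0.6
dodd_long <- pbeta(5 / 7, shape1 = 601, shape2 = 401, lower.tail = FALSE)
plugin_long <- plugin_sketch(n = 1000, x = 600, estimate = "mean")
true_p_w <- pbinom(q = 4, size = 7, prob = 0.6, lower.tail = FALSE)
c(dodd = dodd_long, plugin = plugin_long, truth = true_p_w)
```

In this scenario the Dodd-style quantity is essentially 0, while the plug-in estimate stays close to the p_w implied by p_d = 0.6.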
Dodd, K. (2008). Estimation of the population prevalence of adherence to physical activity recommendations based on NHANES accelerometry measurements. Technical Report. Available at: https://epi.grants.cancer.gov/nhanes_pam/bayesian_adherence_estimation.pdf. Accessed Nov. 13, 2018.
Troiano, R.P., Berrigan, D., Dodd, K.W., Masse, L.C. and McDowell, M. (2008). Physical activity in the United States measured by accelerometer. Medicine & Science in Sports & Exercise 40(1): 181–188.
# Generate data from hypothetical study with 1000 subjects, valid days
# randomly sampled from 1-7, and p_d's drawn from Beta(0.5, 3).
set.seed(1)
n <- sample(1:7, size = 1000, replace = TRUE)
p_d <- rbeta(n = 1000, shape1 = 0.5, shape2 = 3)
x <- rbinom(n = 1000, size = n, prob = p_d)

# Estimate p_w's using Dodd's method
p_w.hat <- adherence_dodd(n = n, x = x)

# Note that the mean p_w.hat differs considerably from the true mean p_w,
# reflecting bias in the estimator.
mean(p_w.hat)
mean(pbinom(q = 4, size = 7, prob = p_d, lower.tail = FALSE))
Implements the Bayesian approach described by Garriguet (Statistics Canada 2016).
adherence_garriguet(n, x, alpha, beta, n.rec = 7, x.rec = 5)
n |
Number of monitoring days. |
x |
Number of exercise days. |
alpha |
Parameter in p_d ~ Beta(alpha, beta). Corresponds to
|
beta |
Parameter in p_d ~ Beta(alpha, beta). Corresponds to
|
n.rec |
Denominator for recommendation. |
x.rec |
Numerator for recommendation. |
The approach aims to estimate a participant's probability of meeting guidelines of the form "at least x minutes per day for at least y days per week" based on observing X active days out of n monitoring days.
The prior assumption for the participant's daily adherence probability is:
p_d ~ Beta(alpha, beta)
where alpha and beta are estimated via maximum likelihood using the observed sample proportions of active days for all study participants. This can be done separately via mles_beta.
Given p_d, the number of active days out of n monitoring days is distributed:
X|p_d ~ Bin(n, p_d)
It can be shown that the posterior for p_d is:
p_d|X ~ Beta(alpha2 = alpha + X, beta2 = beta + n - X)
Garriguet then uses the Beta-binomial distribution, which describes binomial data with success probability randomly drawn from Beta(alpha, beta). The weekly adherence estimator is defined as:
p_w.hat = P(Y >= 5) with Y ~ Betabin(7, alpha2, beta2)
which can be calculated using mles_beta.
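As a rough illustration of the estimator, the Beta-binomial upper tail can be computed directly from its probability mass function in base R. The helper garriguet_sketch is a hypothetical name, written for a single participant (scalar n and x); it is a sketch under the posterior stated above, not the package's code.

```r
# Hypothetical helper: P(Y >= x.rec) for Y ~ Betabin(n.rec, alpha2, beta2)
garriguet_sketch <- function(n, x, alpha, beta, n.rec = 7, x.rec = 5) {
  alpha2 <- alpha + x        # posterior shape parameters
  beta2 <- beta + n - x
  y <- x.rec:n.rec           # weekly outcomes that satisfy the guideline
  # Beta-binomial pmf on the log scale for numerical stability
  log_pmf <- lchoose(n.rec, y) + lbeta(y + alpha2, n.rec - y + beta2) -
    lbeta(alpha2, beta2)
  sum(exp(log_pmf))
}

# One participant with 4 active days out of 6, prior Beta(0.5, 3)
garriguet_sketch(n = 6, x = 4, alpha = 0.5, beta = 3)
```

For a vector of participants, the helper could be applied with mapply over n and x.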
Garriguet, D. (2016). Using a betabinomial distribution to estimate the prevalence of adherence to physical activity guidelines among children and youth. Statistics Canada, Catalogue no. 82-003-X. Health Reports 27(4): 3-9. Available at: https://www150.statcan.gc.ca/n1/pub/82-003-x/2016004/article/14489-eng.pdf.
# Generate data from hypothetical study with 1000 subjects, valid days
# randomly sampled from 1-7, and p_d's drawn from Beta(0.5, 3).
set.seed(1)
n <- sample(1:7, size = 1000, replace = TRUE)
p_d <- rbeta(n = 1000, shape1 = 0.5, shape2 = 3)
x <- rbinom(n = 1000, size = n, prob = p_d)

# First step: Estimate (alpha, beta) via maximum likelihood. Have to change
# 0's to 0.01 and 1's to 0.99 to avoid Inf's
p_d.hat <- x / n
p_d.hat[p_d.hat == 0] <- 0.01
p_d.hat[p_d.hat == 1] <- 0.99
mles <- mles_beta(x = p_d.hat)

# Estimate each subject's weekly adherence probability
p_w.hat <- adherence_garriguet(n = n, x = x, alpha = mles$par[1],
                               beta = mles$par[2])

# Note that the mean p_w.hat differs considerably from the true mean p_w,
# reflecting bias in the estimator.
mean(p_w.hat)
mean(pbinom(q = 4, size = 7, prob = p_d, lower.tail = FALSE))
Corrects abnormally high count values in accelerometer data by replacing such values with the average of neighboring count values. Returns integer vector despite the average calculation often producing a decimal; this follows the convention used in the NCI's SAS programs (http://riskfactor.cancer.gov/tools/nhanes_pam).
artifacts(counts, thresh, counts_classify = NULL)
counts |
Integer vector with accelerometer count values. |
thresh |
Integer value specifying the smallest count value that should be considered an artifact. |
counts_classify |
Integer vector with accelerometer count values to base artifact classification on, but not to adjust. Mainly included for triaxial data, where you might want to define artifacts based on vertical-axis counts but then actually adjust the triaxial sum or vector magnitude counts. |
Integer vector equivalent to counts except where artifacts were adjusted.
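The correction can be sketched in plain R as follows. This is only an illustration: artifacts_sketch is a hypothetical name, and two details are assumptions rather than documented behavior, namely that "neighboring" means the nearest non-artifact value on each side and that decimals are truncated to keep the result an integer.

```r
# Hypothetical helper: replace values >= thresh with the mean of the nearest
# non-artifact neighbors (a single neighbor is used at the edges)
artifacts_sketch <- function(counts, thresh) {
  bad <- counts >= thresh
  good_idx <- which(!bad)
  out <- counts
  for (i in which(bad)) {
    before <- good_idx[good_idx < i]
    after <- good_idx[good_idx > i]
    neighbors <- c(
      if (length(before) > 0) counts[max(before)],
      if (length(after) > 0) counts[min(after)]
    )
    # Assumption: truncate the average so the output stays integer
    if (length(neighbors) > 0) out[i] <- as.integer(mean(neighbors))
  }
  out
}

artifacts_sketch(counts = c(120L, 30000L, 80L, 95L), thresh = 25000L)
```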
National Cancer Institute. Risk factor monitoring and methods: SAS programs for analyzing NHANES 2003-2004 accelerometer data. Available at: http://riskfactor.cancer.gov/tools/nhanes_pam. Accessed Aug. 19, 2018.
# Load accelerometer data for first 5 participants in NHANES 2003-2004
data(unidata)

# Get data from ID number 21007
counts.part3 <- unidata[unidata[, "seqn"] == 21007, "paxinten"]

# Replace counts >= 10,000 with average of neighboring values
counts.part3.corrected <- artifacts(counts = counts.part3, thresh = 10000)
Calculates block averages (i.e. moving averages but for non-overlapping intervals) or maximum block average. For optimal speed, use integer = TRUE if x is an integer vector and integer = FALSE otherwise. If length(x) is not an exact multiple of window, the last partial segment is dropped.
blockaves(x, window, integer = FALSE, max = FALSE)
x |
Integer or numeric vector. |
window |
Integer value specifying window length. |
integer |
Logical value for whether x is an integer vector. |
max |
Logical value for whether to return maximum block average (as opposed to vector of block averages). |
Numeric value or vector depending on max.
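The block-average idea, including the dropped partial segment, can be sketched in base R by reshaping the vector into a matrix with one column per block. The name blockaves_sketch is hypothetical, and the sketch ignores the integer argument, which only affects speed.

```r
# Hypothetical helper: non-overlapping block averages
blockaves_sketch <- function(x, window, max = FALSE) {
  n_blocks <- length(x) %/% window                 # complete blocks only
  blocks <- matrix(x[seq_len(n_blocks * window)], nrow = window)
  aves <- colMeans(blocks)                         # one average per block
  if (max) base::max(aves) else aves
}

# Seven values with window = 3: the trailing seventh value is dropped
blockaves_sketch(x = 1:7, window = 3)
blockaves_sketch(x = 1:7, window = 3, max = TRUE)
```

Swapping colMeans for colSums would give the corresponding block sums.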
# Load accelerometer data for first 5 participants in NHANES 2003-2004
data(unidata)

# Get data from ID number 21005, Saturday only
counts.sat <- unidata[unidata[, "seqn"] == 21005 & unidata[, "paxday"] == 7,
                      "paxinten"]

# Calculate and plot hourly count averages
hourly.averages <- blockaves(x = counts.sat, window = 60, integer = TRUE)
plot(hourly.averages)
Calculates block sums (i.e. moving sums but for non-overlapping intervals) or maximum block sum. For optimal speed, use integer = TRUE if x is an integer vector and integer = FALSE otherwise. If length(x) is not an exact multiple of window, the last partial segment is dropped.
blocksums(x, window, integer = FALSE, max = FALSE)
x |
Integer or numeric vector. |
window |
Integer value specifying window length. |
integer |
Logical value for whether x is an integer vector. |
max |
Logical value for whether to return maximum block sum (as opposed to vector of block sums). |
Numeric value or vector depending on max.
# Load accelerometer data for first 5 participants in NHANES 2003-2004
data(unidata)

# Get data from ID number 21005, Saturday only
counts.sat <- unidata[unidata[, "seqn"] == 21005 & unidata[, "paxday"] == 7,
                      "paxinten"]

# Calculate and plot hourly count sums
hourly.sums <- blocksums(x = counts.sat, window = 60, integer = TRUE)
plot(hourly.sums)
Identify bouts of physical activity based on a vector of accelerometer count values.
bouts(counts, weartime = NULL, bout_length = 10L, thresh_lower = 0L, thresh_upper = 100000L, tol = 0L, tol_lower = 0L, tol_upper = 100000L, nci = FALSE, days_distinct = FALSE)
counts |
Integer vector with accelerometer count values. |
weartime |
Integer vector with 1's for wear time minutes and 0's for non-wear time minutes. |
bout_length |
Integer value specifying minimum length of an activity bout. |
thresh_lower |
Integer value specifying lower bound for count values to be included for the intensity level. |
thresh_upper |
Integer value specifying upper bound for count values to be included for the intensity level. |
tol |
Integer value specifying number of minutes with count values outside of [thresh_lower, thresh_upper] to allow during an activity bout. |
tol_lower |
Integer value specifying lower cut-off for count values outside of intensity range during an activity bout. |
tol_upper |
Integer value specifying upper cut-off for count values outside of intensity range during an activity bout. |
nci |
Logical value for whether to use algorithm from NCI's SAS programs. See Details. |
days_distinct |
Logical value for whether to treat each day of data as distinct, i.e. identify non-wear time and activity bouts for day 1, then day 2, etc. If |
If nci = FALSE, the algorithm uses a moving window to go through every possible interval of length bout_length in counts. Any interval in which all counts are >= tol_lower and <= tol_upper, and no more than tol counts are less than thresh_lower or greater than thresh_upper, is classified as an activity bout.
If nci = TRUE, activity bouts are classified according to the algorithm used in the NCI's SAS programs. Briefly, this algorithm defines an activity bout as an interval of length bout_length that starts with a count value in [thresh_lower, thresh_upper] and has no more than tol counts outside of that range. If these criteria are met, the bout continues until there are (tol + 1) consecutive minutes outside of [thresh_lower, thresh_upper]. The parameters tol_lower and tol_upper are not used.
If the user allows for a tolerance (e.g. tol = 2) and does not use the NCI algorithm (i.e. nci = FALSE), specifying a non-zero value for tol_lower is highly recommended. Otherwise the algorithm will tend to classify minutes immediately before and after an activity bout as being part of the bout.
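The moving-window rule for nci = FALSE, and the reason a non-zero tol_lower helps, can be sketched in plain R. The function bouts_sketch is a hypothetical re-implementation written from the description above; it ignores weartime and days_distinct and is not the package's C++ code.

```r
# Hypothetical helper implementing the moving-window rule (nci = FALSE)
bouts_sketch <- function(counts, bout_length = 10L, thresh_lower = 0L,
                         thresh_upper = 100000L, tol = 0L,
                         tol_lower = 0L, tol_upper = 100000L) {
  n <- length(counts)
  in_bout <- integer(n)
  if (n < bout_length) return(in_bout)
  for (start in seq_len(n - bout_length + 1)) {
    idx <- start:(start + bout_length - 1)
    window <- counts[idx]
    # All counts must lie within [tol_lower, tol_upper]
    within_tol_range <- all(window >= tol_lower & window <= tol_upper)
    # At most tol counts may fall outside [thresh_lower, thresh_upper]
    n_outside <- sum(window < thresh_lower | window > thresh_upper)
    if (within_tol_range && n_outside <= tol) in_bout[idx] <- 1L
  }
  in_bout
}

x <- c(0, 3000, 3000, 3000, 0)

# With a tolerance and tol_lower = 0, the zero-count minutes on either side
# are absorbed into the bout
bouts_sketch(x, bout_length = 3L, thresh_lower = 2020L, tol = 1L)

# A non-zero tol_lower keeps them out
bouts_sketch(x, bout_length = 3L, thresh_lower = 2020L, tol = 1L,
             tol_lower = 1000L)
```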
Specifying thresh_lower while using an arbitrarily large value for thresh_upper is generally recommended. Specifying both of these parameters can be overly restrictive in that the algorithm may miss bouts of activity in which counts are consistently high, but not exclusively in one intensity range.
Integer vector with 1's for minutes that are part of an activity bout and 0's for minutes that are not.
National Cancer Institute. Risk factor monitoring and methods: SAS programs for analyzing NHANES 2003-2004 accelerometer data. Available at: http://riskfactor.cancer.gov/tools/nhanes_pam. Accessed Aug. 19, 2018.
# Load accelerometer data for first 5 participants in NHANES 2003-2004
data(unidata)

# Get data from ID number 21005
counts.part1 <- unidata[unidata[, "seqn"] == 21005, "paxinten"]

# Identify periods of valid wear time
wear.part1 <- weartime(counts = counts.part1)

# Identify moderate-to-vigorous activity bouts
mvpa.bouts <- bouts(counts = counts.part1, weartime = wear.part1,
                    thresh_lower = 2020)
Given a vector of accelerometer count values, classifies each count value into intensity level 1, 2, 3, 4, or 5 (typically representing sedentary, light, lifestyle, moderate, and vigorous).
cut_counts(counts, int_cuts = as.integer(c(100, 760, 2020, 5999)))
counts |
Integer vector with accelerometer count values. |
int_cuts |
Numeric vector with four cutpoints from which five intensity ranges are derived. For example, |
Integer vector.
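A compact way to picture the classification is base R's findInterval. The sketch below assumes each cutpoint is the inclusive lower bound of the next intensity level (so 100 already counts as level 2 under the default cutpoints); that boundary convention is an assumption, and cut_counts_sketch is a hypothetical name.

```r
# Hypothetical helper: map counts to intensity levels 1-5
cut_counts_sketch <- function(counts, int_cuts = c(100, 760, 2020, 5999)) {
  # findInterval returns 0-4 for left-closed intervals; add 1 for levels 1-5
  findInterval(counts, int_cuts) + 1L
}

cut_counts_sketch(c(0, 99, 100, 759, 760, 2019, 2020, 5998, 5999))
```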
# Load accelerometer data for first 5 participants in NHANES 2003-2004
data(unidata)

# Get data from ID number 21005
counts.part1 <- unidata[unidata[, "seqn"] == 21005, "paxinten"]

# Cut into 5 intensity levels and plot
intensity.part1 <- cut_counts(counts = counts.part1)
plot(intensity.part1)
Given a vector of accelerometer count values, calculates time spent in 5 mutually exclusive user-defined intensity levels (typically representing sedentary, light, lifestyle, moderate, and vigorous) as well as the total counts accumulated in various intensities. Non-wear time should be removed from counts before calling intensities to avoid overestimating sedentary time.
intensities(counts, int_cuts = as.integer(c(100, 760, 2020, 5999)))
counts |
Integer vector with accelerometer count values. |
int_cuts |
Numeric vector with four cutpoints from which five intensity ranges are derived. For example, |
Integer vector of length 16 in which the first eight values are minutes in intensities 1, 2, 3, 4, 5, 2-3, 4-5, and 2-5, and the next eight are counts accumulated during time spent in each of those intensities.
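To make the layout of the 16 values concrete, here is a sketch that builds a vector with the same ordering in base R. It assumes each cutpoint is the inclusive lower bound of the next level, and intensities_sketch is a hypothetical name rather than the package's implementation.

```r
# Hypothetical helper: minutes and total counts for intensities
# 1, 2, 3, 4, 5, 2-3, 4-5, and 2-5 (in that order)
intensities_sketch <- function(counts, int_cuts = c(100, 760, 2020, 5999)) {
  level <- findInterval(counts, int_cuts) + 1L
  groups <- list(1, 2, 3, 4, 5, 2:3, 4:5, 2:5)
  minutes <- sapply(groups, function(g) sum(level %in% g))
  total_counts <- sapply(groups, function(g) sum(counts[level %in% g]))
  c(minutes, total_counts)
}

# One minute in each of the five intensity levels
intensities_sketch(c(50, 150, 800, 2500, 6000))
```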
# Load accelerometer data for first 5 participants in NHANES 2003-2004
data(unidata)

# Get data from ID number 21005
counts.part1 <- unidata[unidata[, "seqn"] == 21005, "paxinten"]

# Create vector of counts during valid wear time only
counts.part1.wear <- counts.part1[weartime(counts = counts.part1) == 1]

# Calculate physical activity intensity variables
intensity.variables <- intensities(counts = counts.part1.wear)
Re-constructs vector compressed by rle2.
inverse_rle2(x)
x |
Object returned by rle2. |
Integer or numeric vector.
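The underlying idea is ordinary run-length decoding. As an analogy only, the sketch below uses base R's rle and inverse.rle, whose output format may differ from that of rle2.

```r
x <- c(0, 0, 0, -1, -1, 10, 10, 4, 6, 6)

# Base R run-length encoding: a list with $lengths and $values
runs <- rle(x)

# Decoding repeats each value by its run length
x.rebuilt <- rep(runs$values, times = runs$lengths)

# Same result as the built-in decoder
identical(x.rebuilt, inverse.rle(runs))
```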
# Create dummy vector x
x <- c(0, 0, 0, -1, -1, 10, 10, 4, 6, 6)

# Summarize x using rle2
x.summarized <- rle2(x)

# Reconstruct x
x.reconstructed <- inverse_rle2(x.summarized)
Same idea as fitdistr function in MASS, but has default starting values and uses nlminb rather than optim.
mles_beta(x, start = c(0.5, 0.5))
x |
Observations assumed to be iid Beta(alpha, beta). |
start |
Starting values for alpha and beta. |
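A minimal sketch of maximum likelihood for a Beta sample with nlminb, to show the general shape of the computation. The name mles_beta_sketch and the small positive lower bounds are assumptions for this illustration; the package function may differ in its details.

```r
# Hypothetical helper: fit Beta(alpha, beta) by minimizing the negative
# log-likelihood with nlminb
mles_beta_sketch <- function(x, start = c(0.5, 0.5)) {
  negloglik <- function(theta) {
    -sum(dbeta(x, shape1 = theta[1], shape2 = theta[2], log = TRUE))
  }
  # Assumption: lower bounds keep both shape parameters strictly positive
  nlminb(start = start, objective = negloglik, lower = c(1e-8, 1e-8))
}

set.seed(1)
fit <- mles_beta_sketch(rbeta(n = 1000, shape1 = 1, shape2 = 2))
fit$par
```

Observations exactly equal to 0 or 1 generally give a non-finite log-density, which is why such values need to be nudged inward before fitting.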
# Generate data from Beta(1, 2) and get MLE's
set.seed(1)
x <- rbeta(n = 1000, shape1 = 1, shape2 = 2)
mles <- mles_beta(x)
mles$par
Calculates moving averages or maximum moving average. For optimal speed, use integer = TRUE if x is an integer vector and integer = FALSE otherwise.
movingaves(x, window, integer = FALSE, max = FALSE)
x |
Integer or numeric vector. |
window |
Integer value specifying window length. |
integer |
Logical value for whether x is an integer vector. |
max |
Logical value for whether to return maximum moving average (as opposed to vector of moving averages). |
Numeric value or vector depending on max.
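For intuition, overlapping moving averages can be computed in base R from cumulative sums. The sketch below assumes one average per full window, giving length(x) - window + 1 values; movingaves_sketch is a hypothetical name and the integer argument is ignored.

```r
# Hypothetical helper: moving averages over every full window of x
movingaves_sketch <- function(x, window, max = FALSE) {
  if (length(x) < window) stop("x must have at least 'window' values")
  cs <- c(0, cumsum(as.numeric(x)))   # as.numeric avoids integer overflow
  n_windows <- length(x) - window + 1
  # Each window sum is a difference of two cumulative sums
  aves <- (cs[(window + 1):length(cs)] - cs[seq_len(n_windows)]) / window
  if (max) base::max(aves) else aves
}

movingaves_sketch(x = 1:5, window = 2)
movingaves_sketch(x = 1:5, window = 2, max = TRUE)
```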
# Load accelerometer data for first 5 participants in NHANES 2003-2004
data(unidata)

# Get data from ID number 21005
id.part1 <- unidata[unidata[, "seqn"] == 21005, "seqn"]
counts.part1 <- unidata[unidata[, "seqn"] == 21005, "paxinten"]

# Create vector of all 10-minute moving averages
all.movingaves <- movingaves(x = counts.part1, window = 10, integer = TRUE)

# Calculate maximum 10-minute moving average
max.movingave <- movingaves(x = counts.part1, window = 10, integer = TRUE,
                            max = TRUE)
Not intended for direct use.
personvars(dayvars, rows, days, wk, we)
dayvars |
Numeric matrix with daily physical activity variables. |
rows |
Numeric value specifying number of rows in the matrix to be returned. |
days |
Integer value specifying minimum number of valid days a participant must have to be included. |
wk |
Integer value specifying minimum number of valid weekdays a participant must have to be included. |
we |
Integer value specifying minimum number of valid weekend days a participant must have to be included. |
Numeric matrix.
Calculates a variety of physical activity variables based on triaxial minute-to-minute accelerometer count values for individual participants. Assumes first 1440 minutes are day 1, next 1440 are day 2, and so on. If final day has less than 1440 minutes, it is excluded. A data dictionary for the variables created is available here: https://github.com/vandomed/accelerometry/blob/master/process_tri_dictionary.csv.
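The day-assignment convention described above (1440 minutes per day, with a partial final day excluded) can be sketched as follows. The helper day_index_sketch is a hypothetical name for illustration only.

```r
# Hypothetical helper: day label for each retained minute of a recording
day_index_sketch <- function(n_minutes) {
  n_days <- n_minutes %/% 1440          # complete days only
  rep(seq_len(n_days), each = 1440)     # minutes beyond the last full day are dropped
}

# A recording of 3000 minutes yields two complete days (2880 minutes)
table(day_index_sketch(3000))
```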
process_tri(counts, steps = NULL, nci_methods = FALSE, start_day = 1, start_date = NULL, id = NULL, brevity = 1, hourly_var = "cpm", hourly_wearmin = 0, hourly_normalize = FALSE, valid_days = 1, valid_wk_days = 0, valid_we_days = 0, int_axis = "vert", int_cuts = c(100, 760, 2020, 5999), cpm_nci = FALSE, days_distinct = FALSE, nonwear_axis = "vert", nonwear_window = 60, nonwear_tol = 0, nonwear_tol_upper = 99, nonwear_nci = FALSE, weartime_minimum = 600, weartime_maximum = 1440, active_bout_length = 10, active_bout_tol = 0, mvpa_bout_tol_lower = 0, vig_bout_tol_lower = 0, active_bout_nci = FALSE, sed_bout_tol = 0, sed_bout_tol_maximum = int_cuts[2] - 1, artifact_axis = "vert", artifact_thresh = 25000, artifact_action = 1, weekday_weekend = FALSE, return_form = "daily")
counts |
Integer matrix with three columns of count values, e.g. vertical-axis counts, anteroposterior (AP)-axis counts, and mediolateral (ML)-axis counts. |
steps |
Integer vector with steps. |
nci_methods |
Logical value for whether to set all arguments so as to replicate the data processing methods used in the NCI's SAS programs. More specifically:
If |
start_day |
Integer value specifying day of week for first day of monitoring, with 1 = Sunday, ..., 7 = Saturday. |
start_date |
Date for first day of monitoring, which function can use to figure out |
id |
Numeric value specifying ID number of participant. |
brevity |
Integer value controlling the number of physical activity variables generated. Choices are 1 for basic indicators of physical activity volume, 2 for additional indicators of activity intensities, activity bouts, sedentary behavior, and peak activity, and 3 for additional hourly count averages. |
hourly_var |
Character string specifying what hourly activity variable to record, if |
hourly_wearmin |
Integer value specifying minimum number of wear time minutes needed during a given hour to record a value for the hourly activity variable. |
hourly_normalize |
Logical value for whether to normalize hourly activity by number of wear time minutes. |
valid_days |
Integer value specifying minimum number of valid days to be considered valid for analysis. |
valid_wk_days |
Integer value specifying minimum number of valid weekdays to be considered valid for analysis. |
valid_we_days |
Integer value specifying minimum number of valid weekend days to be considered valid for analysis. |
int_axis |
Character string specifying which axis should be used to classify intensities. Choices are "vert", "ap", "ml", "sum" (for triaxial sum), and "vm" (for triaxial vector magnitude). |
int_cuts |
Numeric vector with four cutpoints from which five intensity ranges are derived. For example, |
cpm_nci |
Logical value for whether to calculate average counts per minute by dividing average daily counts by average daily wear time, as opposed to taking the average of each day's counts per minute value. Strongly recommend leave as |
days_distinct |
Logical value for whether to treat each day of data as distinct, as opposed to analyzing the entire monitoring period as one continuous segment. |
nonwear_axis |
Character string specifying which axis should be used to classify non-wear time. Choices are "vert", "ap", "ml", "sum" (for triaxial sum), and "vm" (for triaxial vector magnitude). |
nonwear_window |
Integer value specifying minimum length of a non-wear period. |
nonwear_tol |
Integer value specifying tolerance for non-wear algorithm, i.e. number of minutes with non-zero counts allowed during a non-wear interval. |
nonwear_tol_upper |
Integer value specifying maximum count value for a minute with non-zero counts during a non-wear interval. |
nonwear_nci |
Logical value for whether to use non-wear algorithm from NCI's SAS programs. |
weartime_minimum |
Integer value specifying minimum number of wear time minutes for a day to be considered valid. |
weartime_maximum |
Integer value specifying maximum number of wear time minutes for a day to be considered valid. The default is 1440, but you may want to use a lower value (e.g. 1200) if participants were instructed to remove devices for sleeping, but often did not. |
active_bout_length |
Integer value specifying minimum length of an active bout. |
active_bout_tol |
Integer value specifying number of minutes with counts outside the required range to allow during an active bout. If non-zero and |
mvpa_bout_tol_lower |
Integer value specifying lower cut-off for count values outside of required intensity range for an MVPA bout. |
vig_bout_tol_lower |
Integer value specifying lower cut-off for count values outside of required intensity range for a vigorous bout. |
active_bout_nci |
Logical value for whether to use algorithm from the NCI's SAS programs for classifying active bouts. |
sed_bout_tol |
Integer value specifying number of minutes with counts outside sedentary range to allow during a sedentary bout. |
sed_bout_tol_maximum |
Integer value specifying upper cut-off for count values outside sedentary range during a sedentary bout. |
artifact_axis |
Character string specifying which axis should be used to identify artifacts (impossibly high count values). Choices are "vert", "ap", "ml", "sum" (for triaxial sum), and "vm" (for triaxial vector magnitude). |
artifact_thresh |
Integer value specifying the smallest count value that should be considered an artifact. |
artifact_action |
Integer value controlling method of correcting artifacts. Choices are 1 to exclude days with one or more artifacts, 2 to lump artifacts into non-wear time, 3 to replace artifacts with the average of neighboring count values, and 4 to take no action. |
weekday_weekend |
Logical value for whether to calculate averages for weekdays and weekend days separately (in addition to all valid days). |
return_form |
Character string controlling how variables are returned. Choices are "daily" for per-day summaries, "averages" for averages across all valid days, and "both" for a list containing both. |
Numeric matrix or list of two numeric matrices, depending on return_form.
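The two ways of averaging counts per minute that cpm_nci switches between can give noticeably different answers. The numbers below are made up purely for illustration.

```r
# Hypothetical daily summaries for three valid days
daily_counts <- c(300000, 150000, 450000)   # total counts per day
daily_wear <- c(900, 600, 1000)             # wear-time minutes per day

# cpm_nci = TRUE style: average daily counts divided by average daily wear time
cpm_ratio_of_means <- mean(daily_counts) / mean(daily_wear)

# cpm_nci = FALSE style: average of each day's counts per minute
cpm_mean_of_ratios <- mean(daily_counts / daily_wear)

c(ratio_of_means = cpm_ratio_of_means, mean_of_ratios = cpm_mean_of_ratios)
```

The ratio-of-means version weights days with more wear time more heavily, which is why the two summaries diverge when wear time varies across days.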
National Cancer Institute. Risk factor monitoring and methods: SAS programs for analyzing NHANES 2003-2004 accelerometer data. Available at: http://riskfactor.cancer.gov/tools/nhanes_pam. Accessed Aug. 19, 2018.
# Note that the 'tridata' dataset contains 7 days of fake triaxial
# accelerometer data

# Process data using default parameters and request per-day variables
accel.days <- process_tri(
  counts = tridata,
  return_form = "daily"
)

# Repeat, but request averages across all valid days
accel.averages <- process_tri(
  counts = tridata,
  return_form = "averages"
)

# Create per-day summary again, but with many more variables
accel.days2 <- process_tri(
  counts = tridata,
  brevity = 2,
  return_form = "daily"
)
names(accel.days2)
Calculates a variety of physical activity variables based on uniaxial minute-to-minute accelerometer count values for individual participants. Assumes first 1440 minutes are day 1, next 1440 are day 2, and so on. If final day has less than 1440 minutes, it is excluded. A data dictionary for the variables created is available here: https://github.com/vandomed/accelerometry/blob/master/process_uni_dictionary.csv.
process_uni(counts, steps = NULL, nci_methods = FALSE, start_day = 1, start_date = NULL, id = NULL, brevity = 1, hourly_var = "cpm", hourly_wearmin = 0, hourly_normalize = FALSE, valid_days = 1, valid_wk_days = 0, valid_we_days = 0, int_cuts = c(100, 760, 2020, 5999), cpm_nci = FALSE, days_distinct = FALSE, nonwear_window = 60, nonwear_tol = 0, nonwear_tol_upper = 99, nonwear_nci = FALSE, weartime_minimum = 600, weartime_maximum = 1440, active_bout_length = 10, active_bout_tol = 0, mvpa_bout_tol_lower = 0, vig_bout_tol_lower = 0, active_bout_nci = FALSE, sed_bout_tol = 0, sed_bout_tol_maximum = int_cuts[2] - 1, artifact_thresh = 25000, artifact_action = 1, weekday_weekend = FALSE, return_form = "averages")
counts |
Integer vector with accelerometer count values. |
steps |
Integer vector with steps. |
nci_methods |
Logical value for whether to set all arguments so as to replicate the data processing methods used in the NCI's SAS programs. More specifically:
If |
start_day |
Integer value specifying day of week for first day of monitoring, with 1 = Sunday, ..., 7 = Saturday. |
start_date |
Date for first day of monitoring, which function can use to
figure out |
id |
Numeric value specifying ID number of participant. |
brevity |
Integer value controlling the number of physical activity variables generated. Choices are 1 for basic indicators of physical activity volume, 2 for additional indicators of activity intensities, activity bouts, sedentary behavior, and peak activity, and 3 for additional hourly count averages. |
hourly_var |
Character string specifying what hourly activity variable
to record, if |
hourly_wearmin |
Integer value specifying minimum number of wear time minutes needed during a given hour to record a value for the hourly activity variable. |
hourly_normalize |
Logical value for whether to normalize hourly activity by number of wear time minutes. |
valid_days |
Integer value specifying minimum number of valid days to be considered valid for analysis. |
valid_wk_days |
Integer value specifying minimum number of valid weekdays to be considered valid for analysis. |
valid_we_days |
Integer value specifying minimum number of valid weekend days to be considered valid for analysis. |
int_cuts |
Numeric vector with four cutpoints from which five intensity
ranges are derived. For example, |
cpm_nci |
Logical value for whether to calculate average counts per
minute by dividing average daily counts by average daily wear time, as
opposed to taking the average of each day's counts per minute value. Strongly
recommend leave as |
days_distinct |
Logical value for whether to treat each day of data as distinct, as opposed to analyzing the entire monitoring period as one continuous segment. |
nonwear_window |
Integer value specifying minimum length of a non-wear period. |
nonwear_tol |
Integer value specifying tolerance for non-wear algorithm, i.e. number of minutes with non-zero counts allowed during a non-wear interval. |
nonwear_tol_upper |
Integer value specifying maximum count value for a minute with non-zero counts during a non-wear interval. |
nonwear_nci |
Logical value for whether to use non-wear algorithm from NCI's SAS programs. |
weartime_minimum |
Integer value specifying minimum number of wear time minutes for a day to be considered valid. |
weartime_maximum |
Integer value specifying maximum number of wear time minutes for a day to be considered valid. The default is 1440, but you may want to use a lower value (e.g. 1200) if participants were instructed to remove devices for sleeping, but often did not. |
active_bout_length |
Integer value specifying minimum length of an active bout. |
active_bout_tol |
Integer value specifying number of minutes with counts
outside the required range to allow during an active bout. If non-zero and
|
mvpa_bout_tol_lower |
Integer value specifying lower cut-off for count values outside of required intensity range for an MVPA bout. |
vig_bout_tol_lower |
Integer value specifying lower cut-off for count values outside of required intensity range for a vigorous bout. |
active_bout_nci |
Logical value for whether to use algorithm from the NCI's SAS programs for classifying active bouts. |
sed_bout_tol |
Integer value specifying number of minutes with counts outside sedentary range to allow during a sedentary bout. |
sed_bout_tol_maximum |
Integer value specifying upper cut-off for count values outside sedentary range during a sedentary bout. |
artifact_thresh |
Integer value specifying the smallest count value that should be considered an artifact. |
artifact_action |
Integer value controlling method of correcting artifacts. Choices are 1 to exclude days with one or more artifacts, 2 to lump artifacts into non-wear time, 3 to replace artifacts with the average of neighboring count values, and 4 to take no action. |
weekday_weekend |
Logical value for whether to calculate averages for weekdays and weekend days separately (in addition to all valid days). |
return_form |
Character string controlling how variables are returned. Choices are "daily" for per-day summaries, "averages" for averages across all valid days, and "both" for a list containing both. |
Numeric matrix or list of two numeric matrices, depending on return_form.
National Cancer Institute. Risk factor monitoring and methods: SAS programs for analyzing NHANES 2003-2004 accelerometer data. Available at: http://riskfactor.cancer.gov/tools/nhanes_pam. Accessed Aug. 19, 2018.
# Note that the 'unidata' dataset contains accelerometer data for first 5
# subjects in NHANES 2003-2004

# Get data from ID number 21005
id.part1 <- unidata[unidata[, "seqn"] == 21005, "seqn"]
counts.part1 <- unidata[unidata[, "seqn"] == 21005, "paxinten"]

# Process data from ID 21005 and request per-day variables
accel.days <- process_uni(
  counts = counts.part1,
  id = id.part1,
  return_form = "daily"
)

# Repeat, but request averages across all valid days
accel.averages <- process_uni(
  counts = counts.part1,
  id = id.part1,
  return_form = "averages"
)

# Process data according to methods used in NCI's SAS programs
accel.nci1 <- process_uni(
  counts = counts.part1,
  id = id.part1,
  brevity = 2,
  valid_days = 4,
  cpm_nci = TRUE,
  days_distinct = TRUE,
  nonwear_tol = 2,
  nonwear_tol_upper = 100,
  nonwear_nci = TRUE,
  weartime_maximum = 1440,
  active_bout_tol = 2,
  active_bout_nci = TRUE,
  artifact_thresh = 32767,
  artifact_action = 3,
  return_form = "averages"
)

# Repeat, but use nci_methods input for convenience
accel.nci2 <- process_uni(
  counts = counts.part1,
  id = id.part1,
  nci_methods = TRUE,
  brevity = 2,
  return_form = "averages"
)

# Results are identical
all.equal(accel.nci1, accel.nci2)
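The day-splitting assumption stated in the description of process_uni (consecutive blocks of 1440 minutes, with an incomplete final day excluded) can be illustrated in base R. This is only a sketch of the idea, not the package's internal code; split_into_days and its units_day argument are hypothetical names introduced for illustration.

```r
# Hypothetical helper (not part of accelerometry): arrange a count vector
# into one row per complete day and drop a trailing partial day.
split_into_days <- function(counts, units_day = 1440L) {
  n_days <- length(counts) %/% units_day
  if (n_days == 0L) {
    # No complete day available: return an empty matrix with the right width
    return(matrix(counts[0], nrow = 0L, ncol = units_day))
  }
  matrix(counts[seq_len(n_days * units_day)],
         nrow = n_days, ncol = units_day, byrow = TRUE)
}

# Two complete days followed by 100 leftover minutes
fake.counts <- seq_len(2L * 1440L + 100L)
fake.days <- split_into_days(fake.counts)
dim(fake.days)
```

Each row of the result holds one day of minute-to-minute counts, so per-day summaries can be computed with rowSums or apply.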
Summarizes vector containing runs of repeated values. Very similar to rle, but sometimes much faster, and with an option to return the start/end indices for each run.
rle2(x, class = NULL, indices = FALSE)
x |
Vector (see |
class |
Character string specifying class of |
indices |
Logical value for whether to record start/stop positions in addition to values and lengths for each run. |
Integer or numeric matrix.
# Create dummy vector x
x <- c(0, 0, 0, -1, -1, 10, 10, 4, 6, 6)

# Summarize x using rle2
x.summarized <- rle2(x)

# Repeat, but also record start/stop indices for each run
x.summarized <- rle2(x = x, indices = TRUE)
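For comparison, start/stop positions for each run can also be derived from base R's rle with a cumulative sum over the run lengths. The sketch below shows the idea only; runs_with_indices is a hypothetical helper, and its column names and order are chosen for illustration and are not claimed to match the output of rle2.

```r
# Hypothetical helper (not part of accelerometry): summarize runs of a
# numeric vector with base rle() and add start/stop positions.
runs_with_indices <- function(x) {
  r <- rle(x)
  stops <- cumsum(r$lengths)           # last position of each run
  starts <- stops - r$lengths + 1L     # first position of each run
  cbind(value = r$values, start = starts, stop = stops, length = r$lengths)
}

x <- c(0, 0, 0, -1, -1, 10, 10, 4, 6, 6)
x.runs <- runs_with_indices(x)
x.runs
```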
Identifies sedentary breaks in accelerometer count data.
sedbreaks(counts, weartime = NULL, thresh = 100, flags = FALSE)
counts |
Integer vector with accelerometer count values. |
weartime |
Integer vector with 1's for wear time minutes and 0's for non-wear time minutes. |
thresh |
Integer value specifying minimum count value to consider a break from sedentary time. |
flags |
Logical value for whether to return a vector of 1's and 0's flagging the sedentary breaks (as opposed to the total number of sedentary breaks). |
Integer value or vector depending on flags.
# Load accelerometer data for first 5 participants in NHANES 2003-2004
data(unidata)

# Get data from ID number 21005
id.part1 <- unidata[unidata[, "seqn"] == 21005, "seqn"]
counts.part1 <- unidata[unidata[, "seqn"] == 21005, "paxinten"]

# Identify periods of valid wear time
wear.part1 <- weartime(counts = counts.part1)

# Count number of sedentary breaks (over full week)
n.sedbreaks <- sedbreaks(counts = counts.part1, weartime = wear.part1)

# Flag sedentary breaks
sedbreaks.flagged <- sedbreaks(counts = counts.part1, weartime = wear.part1,
                               flags = TRUE)
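The documentation above does not spell out exactly what counts as a sedentary break. One plausible reading, used in the base-R sketch below, is a minute with counts at or above thresh that immediately follows a minute with counts below thresh, with both minutes falling in wear time. That definition, and the helper name sedbreaks_sketch, are assumptions made for illustration; the package's own rule may differ.

```r
# Hypothetical helper (not part of accelerometry): flag or count transitions
# from a sedentary minute (< thresh) to a non-sedentary minute (>= thresh).
# Assumption: both minutes must be wear time for the transition to count.
sedbreaks_sketch <- function(counts, weartime = NULL, thresh = 100,
                             flags = FALSE) {
  n <- length(counts)
  if (is.null(weartime)) weartime <- rep(1L, n)
  brk <- integer(n)
  if (n > 1L) {
    idx <- 2:n
    brk[idx] <- as.integer(
      weartime[idx] == 1 & weartime[idx - 1L] == 1 &
        counts[idx] >= thresh & counts[idx - 1L] < thresh
    )
  }
  if (flags) brk else sum(brk)
}

fake.counts <- c(0, 50, 150, 200, 20, 300, 0, 120)
fake.wear <- c(1, 1, 1, 1, 1, 1, 0, 1)
sedbreaks_sketch(fake.counts)                       # all minutes treated as wear time
sedbreaks_sketch(fake.counts, weartime = fake.wear) # minute 7 is non-wear
sedbreaks_sketch(fake.counts, flags = TRUE)
```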
Toy dataset with triaxial minute-to-minute counts generated from a trivariate normal distribution. Does not closely resemble real accelerometer data.
Accelerometer data for the first 5 participants in the National Health and Nutrition Examination Survey (NHANES) 2003-2004 dataset.
https://wwwn.cdc.gov/nchs/nhanes/search/datapage.aspx?Component=Examination&CycleBeginYear=2003
Classifies wear time vs. non-wear time based on a vector of accelerometer count values.
weartime(counts, window = 60L, tol = 0L, tol_upper = 99L, nci = FALSE, days_distinct = FALSE, units_day = 1440L)
counts |
Integer vector with accelerometer count values. |
window |
Integer value specifying minimum length of a non-wear period. |
tol |
Integer value specifying tolerance for non-wear algorithm, i.e. number of seconds/minutes with non-zero counts allowed during a non-wear interval. |
tol_upper |
Integer value specifying maximum count value for a second/minute with non-zero counts during a non-wear interval. |
nci |
Logical value for whether to use algorithm from NCI's SAS programs. See Details. |
days_distinct |
Logical value for whether to treat each day of data as
distinct, as opposed to analyzing the entire monitoring period as one
continuous segment. For minute-to-minute counts, strongly recommend setting
to |
units_day |
Integer value specifying how many data points are in a day. Typically either 1440 or 86400 depending on whether count values are minute-to-minute or second-to-second. |
If nci = FALSE, the algorithm uses a moving window to go through every possible interval of length window in counts. Any interval in which no more than tol counts are non-zero, and those are still < tol_upper, is classified as non-wear time.
If nci = TRUE, non-wear time is classified according to the algorithm used in the NCI's SAS programs. Briefly, this algorithm defines a non-wear period as an interval of length window that starts with a count value of 0, does not contain any periods with (tol + 1) consecutive non-zero count values, and does not contain any counts > tol_upper. If these criteria are met, the non-wear period continues until there are (tol + 1) consecutive non-zero count values or a single count value > tol_upper.
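The moving-window rule for nci = FALSE can be sketched in base R as follows. This is an illustrative reimplementation, not the package's C++ code; weartime_sketch is a hypothetical name. It treats counts at or below tol_upper as tolerable, following the argument description ("maximum count value"); whether that boundary is inclusive in the package itself is an assumption.

```r
# Hypothetical helper (not part of accelerometry): moving-window non-wear
# classification. Returns 1 for wear time and 0 for non-wear time.
weartime_sketch <- function(counts, window = 60L, tol = 0L, tol_upper = 99L) {
  n <- length(counts)
  wear <- rep(1L, n)
  if (n < window) return(wear)

  # Cumulative sums let each window be checked without rescanning it
  cs.nonzero <- c(0, cumsum(counts != 0))
  cs.high <- c(0, cumsum(counts > tol_upper))

  for (start in seq_len(n - window + 1L)) {
    end <- start + window - 1L
    n.nonzero <- cs.nonzero[end + 1L] - cs.nonzero[start]
    n.high <- cs.high[end + 1L] - cs.high[start]
    # Window qualifies: few enough non-zero counts, none above tol_upper
    if (n.nonzero <= tol && n.high == 0) {
      wear[start:end] <- 0L
    }
  }
  wear
}

# 5 active minutes, 60 minutes of zeros, 5 more active minutes
fake.counts <- c(rep(50L, 5), rep(0L, 60), rep(200L, 5))
fake.wear <- weartime_sketch(fake.counts)
table(fake.wear)
```

Because overlapping windows can each qualify, a non-wear period longer than window emerges naturally as the union of all qualifying windows.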
Integer vector with 1's for valid wear time and 0's for non-wear time.
National Cancer Institute. Risk factor monitoring and methods: SAS programs for analyzing NHANES 2003-2004 accelerometer data. Available at: http://riskfactor.cancer.gov/tools/nhanes_pam. Accessed Aug. 19, 2018.
Acknowledgment: This material is based upon work supported by the National Science Foundation Graduate Research Fellowship under Grant No. DGE-0940903.
# Load accelerometer data for first 5 participants in NHANES 2003-2004
data(unidata)

# Get data from ID number 21005
counts.part1 <- unidata[unidata[, "seqn"] == 21005, "paxinten"]

# Identify periods of valid wear time
weartime.flag <- weartime(counts = counts.part1)