In Texas, Everything is Bigger: In the context of data collection—is bigger better?

The traditional research assumption that big data equates to statistical significance can eclipse the importance of understanding how effect size, power, and sample size interrelate, and how that relationship translates into both practical and statistical significance. In Texas, everything is bigger, and everything a Texan does is bigger, but in the context of data collection, is bigger better? Big data now presents opportunities across the science and technology communities, and the health community is facing a tsunami of health- and healthcare-related content generated from numerous patient care points of contact, sophisticated medical instruments, and web-based health communities (Chen, Chiang & Storey, 2012).

Two primary sources of health big data are payer–provider big data (electronic health records, insurance records, pharmacy prescriptions, patient feedback and responses) and data from my favorite field, genomics. I cannot help but imagine how many interesting research studies I could do with genomics-driven big data (genotyping, gene expression, sequencing data). Extracting knowledge from health big data poses significant research and practical challenges, especially considering the HIPAA (Health Insurance Portability and Accountability Act) and IRB (Institutional Review Board) requirements for building a privacy-preserving and trustworthy health infrastructure and for conducting ethical health-related research (Gelfand, 2011). Setting aside these challenges, can big data provide both practical and statistical significance? Just think about the terabytes of raw sequencing data expected from studies that associate sequence variants with variation in two common, highly heritable measures of obesity: weight and body mass index (BMI). For this discussion, let me broach the 2012 study by Hutchinson and Wilson on improving nutrition and physical activity in the workplace.
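
The interplay of effect size, power, and sample size raised above can be made concrete with a small calculation. The sketch below is only an illustration under stated assumptions: it uses a large-sample normal (z) approximation for a two-group comparison with equal group sizes, and the function names and the effect size of 0.02 are hypothetical choices for the example, not values from any study cited here.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def two_sample_p_value(d, n_per_group):
    """Two-sided p-value for a standardized mean difference d between two
    equal-sized groups, using a large-sample normal (z) approximation."""
    z = abs(d) * sqrt(n_per_group / 2)
    return 2 * (1 - STD_NORMAL.cdf(z))


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of the same two-sided test (the far tail is ignored)."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf(abs(d) * sqrt(n_per_group / 2) - z_crit)


def n_per_group_for_power(d, power=0.80, alpha=0.05):
    """Approximate per-group sample size needed to detect d with the given power."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = STD_NORMAL.inv_cdf(power)
    return 2 * ((z_crit + z_power) / d) ** 2


if __name__ == "__main__":
    tiny_effect = 0.02  # hypothetical, practically negligible effect size
    for n in (100, 10_000, 1_000_000):
        p = two_sample_p_value(tiny_effect, n)
        pw = approx_power(tiny_effect, n)
        print(f"n per group = {n:>9,}  p = {p:.4g}  power = {pw:.3f}")
```

With the effect size held fixed, the p-value shrinks as the sample grows, so a difference too small to matter in practice can still come out statistically significant once the data set is big enough. That is the sense in which bigger is not automatically better.
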

The meta-analysis of Hutchinson & Wilson (2012) pooled the extant results of 29 intervention studies, published between 1999 and March 2009, that examined physical activity or nutrition interventions in the workplace. The results were synthesized to assess the effectiveness of workplace health promotion programs and to resolve inconsistent findings. Because extant results are sometimes discordant, Hutchinson & Wilson (2012) took into consideration the methodological limitations of some of the reviewed studies, which demonstrated only modest success in achieving long-term change. The study also considered which interventions were associated with successful outcomes, including behavior maintenance and generalization. Weighted Cohen’s d effect sizes, percentage overlap statistics, confidence intervals and fail-safe Ns were calculated.

The increased prevalence of obesity and its association with increased risk for chronic diseases, including cancer, diabetes and cardiovascular disease, warrant the need for innovative and efficient interventions. Green (1988) stated that the workplace is a valuable intervention site for a number of reasons, including the amount of time people spend at work, access to populations that may be difficult to engage in other settings, and the opportunity to utilize peer networks and employer incentives. These reasons justify the practical significance of the study. Moreover, statistical significance was supported by the methodology of Hutchinson & Wilson (2012), who developed inclusion criteria for the 29 identified studies. The inclusion criteria were: a published study of a workplace intervention; a control group not receiving the intervention; health, and in particular diet, nutrition or physical activity, as outcome measures; and statistical information for the calculation of effect sizes (e.g. means and standard deviations, or the results of t-tests or one-way F tests).

Change-over-time data (mean and standard deviation) were required to calculate effect sizes for interventions. For studies that did not provide these data, the means and standard deviations of the control and intervention groups at the end of the intervention were compared. Cohen’s d was used to calculate effect sizes for the difference between the intervention and control groups on each outcome measure (diet measures: fruit, vegetables, fat; physical activity measures: activity, fitness; health measures: weight, cholesterol, blood pressure, heart rate or glucose). Effect sizes were aggregated by outcome measure and by form of intervention. Mean effect size, standard deviation and 95% confidence interval were calculated for each grouping (Zakzanis, 2001). Fail-safe Ns (Nfs) were calculated to address the potential bias that arises because studies with statistically significant results are more likely to be published. In terms of study design, this 2012 meta-analysis concluded that randomized controlled trials were associated with larger effects. It also concluded that long-term maintenance of changes should be evaluated in order to determine the extent to which workplace interventions can make sustainable changes to individuals’ health.
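
The effect-size statistics described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' actual procedure: the sample-size weighting, the normal-approximation confidence interval, and the use of Orwin's fail-safe N with a criterion of d = 0.2 are all assumptions made for the example, and every function name is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(mean_tx, sd_tx, n_tx, mean_ctl, sd_ctl, n_ctl):
    """Cohen's d: intervention mean minus control mean, divided by the pooled SD."""
    pooled_var = ((n_tx - 1) * sd_tx ** 2 + (n_ctl - 1) * sd_ctl ** 2) / (n_tx + n_ctl - 2)
    return (mean_tx - mean_ctl) / sqrt(pooled_var)


def weighted_mean_d(effects, sample_sizes):
    """Mean effect size weighted by each study's sample size (assumed weighting scheme)."""
    return sum(d * n for d, n in zip(effects, sample_sizes)) / sum(sample_sizes)


def mean_d_with_ci(effects, z=1.96):
    """Unweighted mean d, its standard deviation, and a normal-approximation 95% CI."""
    m, s = mean(effects), stdev(effects)
    half_width = z * s / sqrt(len(effects))
    return m, s, (m - half_width, m + half_width)


def orwin_failsafe_n(effects, criterion_d=0.2):
    """Orwin's fail-safe N: how many unpublished studies with d = 0 would be
    needed to pull the mean effect size down to criterion_d."""
    k = len(effects)
    return k * (mean(effects) - criterion_d) / criterion_d


if __name__ == "__main__":
    # Hypothetical effect sizes and sample sizes, not data from the meta-analysis.
    effects = [0.4, 0.6, 0.5]
    sizes = [120, 80, 200]
    print("weighted mean d:", round(weighted_mean_d(effects, sizes), 3))
    print("mean, sd, 95% CI:", mean_d_with_ci(effects))
    print("fail-safe N:", round(orwin_failsafe_n(effects), 1))
```

A large fail-safe N relative to the number of included studies suggests the pooled effect is unlikely to be an artifact of unpublished null results, which is why it is reported alongside the mean effect size and its confidence interval.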


Chen, H., Chiang, R. H., & Storey, V. C. (2012). Business Intelligence and Analytics: From Big Data to Big Impact. MIS Quarterly, 36(4), 1165-1188.

Cohen, J. (2013). Statistical power analysis for the behavioral sciences. Academic Press.

Ellis, P. D. (2010). The essential guide to effect sizes: Statistical power, meta-analysis, and the interpretation of research results. Cambridge University Press.

Forthofer, R. N., Lee, E. S., & Hernandez, M. (2006). Biostatistics: A Guide to Design, Analysis and Discovery. 2nd Edition [Vital Source Bookshelf version]. Retrieved from

Gelfand, A. (2011). Privacy and biomedical research: building a trust infrastructure: an exploration of data-driven and process-driven approaches to data privacy. Biomed Comput Rev, 2012, 23-28.

Green, K. L. (1988). Issues of control and responsibility in workers’ health. Health Education & Behavior, 15(4), 473-486.

Hutchinson, A. D., & Wilson, C. (2012). Improving nutrition and physical activity in the workplace: a meta-analysis of intervention studies. Health Promotion International, 27(2), 238-249.

Labilles, U. (2015). Big Data: Does it matter? Can it give a practical significance? Is bigger better? (Unpublished, Advanced Biostatistics (PUBH – 8500 – 1), 2015 Spring Qtr. Wk2DiscLabillesU) Walden University, Minneapolis.

Thorleifsson, G., Walters, G. B., Gudbjartsson, D. F., Steinthorsdottir, V., Sulem, P., Helgadottir, A., … & Stefansson, K. (2009). Genome-wide association yields new sequence variants at seven loci that associate with measures of obesity. Nature Genetics, 41(1), 18-24.

Zakzanis, K. K. (2001). Statistics to tell the truth, the whole truth, and nothing but the truth: formulae, illustrative numerical examples, and heuristic interpretation of effect size analyses for neuropsychological researchers. Archives of Clinical Neuropsychology, 16(7), 653-667.
