Abstracts

Patrick J. Cantwell,   Chief of the Decennial Statistical Studies Division, U.S. Census Bureau

Title: Measuring Coverage of the 2020 U.S. Census: Methods and Results

Abstract: Every ten years the Federal government takes a census of the people and housing units in the United States. Among other purposes, census totals are used to divide the 435 seats in the U.S. House of Representatives among the 50 states, construct Congressional and local districts, and distribute over a trillion dollars in Federal funds each year. Following each census, we try to measure its coverage: How many people (and housing units) did we miss? How many did we count in error, e.g., more than once? Does coverage vary across races, age groups, and states? The 2020 Post-Enumeration Survey (PES) was conducted to measure the coverage of the 2020 Census. Based on the simple, longstanding principle of "capture-recapture," the PES sampled about 160,000 U.S. housing units, matched the people to actual census enumerations in the same geographic blocks, and constructed estimates of the net coverage of the census. Further, we produced estimates of the components of coverage: correct and erroneous enumerations, whole-person imputations, and omissions. In this presentation, we briefly discuss the statistical methods behind the 2020 PES, the challenges encountered during the recent pandemic, and the results: numerical estimates of coverage.
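
To fix ideas, the basic capture-recapture (dual-system) estimate takes the form below. This is a minimal illustration with simplified notation, not the PES's production estimator, which incorporates sampling weights and adjustments for matching error:

\[
\hat{N} = \frac{N_1 \, N_2}{M},
\]

where \(N_1\) is the number of people enumerated in the census, \(N_2\) the number found by the PES in the same area, and \(M\) the number matched to both. For example, if the census counts 900 people in an area, the PES finds 800, and 720 match, then \(\hat{N} = 900 \times 800 / 720 = 1000\), suggesting a net undercount of roughly 10 percent.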

Marcos Prates,   Professor, Federal University of Minas Gerais and President of the Brazilian Statistics Association

Title: Is augmentation effective for improving prediction in unbalanced datasets?

Abstract: Many real-life text datasets are imbalanced, meaning some classes have many more observations than others; for example, a product may have many more positive reviews than negative ones. Classifying such datasets is an essential topic in machine learning across data types: in image classification, for instance, images containing a particular object may be scarce; in text classification, imbalance arises in spam filtering and sentiment analysis. In this paper, we provide theoretical and empirical evidence that, contrary to common belief, data augmentation does not improve a model's prediction capacity on an unbalanced dataset. Indeed, an appropriate selection of the cut-off point for the predicted probability is sufficient to maximize the prediction capacity for a given dataset. The authors would like to thank FAPEMIG, CNPq and CAPES for partial financial support.
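
As a rough sketch of the cut-off idea (not the authors' code; the model, metric, and simulated data below are illustrative assumptions), one can sweep thresholds over the predicted probability and keep the one that maximizes a chosen criterion on held-out data:

```python
# Illustrative sketch: tuning the decision cut-off on an imbalanced
# dataset instead of augmenting it. Model, metric, and data are
# assumptions for illustration, not the paper's setup.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Simulate a 95/5 imbalanced binary classification problem.
X, y = make_classification(n_samples=5000, weights=[0.95, 0.05],
                           random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, stratify=y,
                                            random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
probs = model.predict_proba(X_val)[:, 1]

# Sweep cut-offs on the validation set; keep the one maximizing F1.
cutoffs = np.linspace(0.05, 0.95, 19)
scores = [f1_score(y_val, probs >= c) for c in cutoffs]
best = cutoffs[int(np.argmax(scores))]
print(f"F1 at default 0.5 cut-off: {f1_score(y_val, probs >= 0.5):.3f}")
print(f"F1 at tuned cut-off {best:.2f}: {max(scores):.3f}")
```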

Brien Aronov,   Lead Data Scientist, Business Insurance Analytics & Research at Travelers Insurance

Title: The Importance of Statistical Fundamentals in Insurance

Abstract: This talk focuses on the educational aspects of my work: how I apply what I learned at UConn to my day-to-day job.

Steve Leeds,   VP Business Analytics, Ironwood Pharmaceuticals

Title: Using Bernstein-Bezier Curves to fit product uptake and event impacts while achieving specific short- and long-range performance expectations

Abstract: As part of the brand sales forecast building process within the corporate environment, two dynamics typically come into play. The individuals building the forecast are trying to assess future monthly, quarterly, and annual trends using historic information, statistical techniques, and product knowledge. Concurrently, the individuals responsible for the brand are balancing the information presented from the results of that exercise with the brand goals and expectations for which they are responsible. These goals and expectations are typically communicated at a higher level (e.g., we need x% growth in the upcoming year and y% the following year). The outcome is some blending of forecast and goal, where the goals are numerically specific. The expectation for this monthly (or even weekly) sales forecast deliverable is a two-, five-, or ten-year monthly curve that achieves these numeric annual goals precisely. Additionally, the curve itself must appear visually correct for brand uptakes, as well as for specific events expected to occur at certain time periods. Using Bernstein-Bezier curves and their associated control points, equations can be solved immediately (in Excel, for example), generating good candidate curves that precisely satisfy all constraints while remaining extremely flexible as those constraints change.
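
A minimal sketch of the mechanics (in Python rather than Excel; the constraint values are invented for illustration): with the endpoints of a cubic Bernstein-Bezier curve pinned to short- and long-range targets, the interior control points can be solved in closed form so the curve also passes through an intermediate goal.

```python
# Hypothetical sketch of the Bernstein-Bezier idea: a cubic curve whose
# control points are solved so the curve hits required values. The
# targets and shape constraint are illustrative, not the speaker's model.
import numpy as np
from math import comb

def bezier(t, ctrl):
    """Evaluate a Bezier curve with control points `ctrl` at t in [0,1]
    using the Bernstein basis B_{i,n}(t) = C(n,i) t^i (1-t)^(n-i)."""
    n = len(ctrl) - 1
    basis = np.array([comb(n, i) * t**i * (1 - t)**(n - i)
                      for i in range(n + 1)])
    return basis @ np.asarray(ctrl)

# Constraints: start at 0, end at 100 (the long-range goal), and pass
# through 30 at t = 0.5 (a mid-horizon target). At t = 0.5 the cubic
# Bernstein weights are (1/8, 3/8, 3/8, 1/8), so
#   P0/8 + 3*P1/8 + 3*P2/8 + P3/8 = 30.
# One equation, two unknowns; impose a shape constraint, e.g. P2 = P1 + 20.
P0, P3, target_mid = 0.0, 100.0, 30.0
P1 = ((target_mid - (P0 + P3) / 8) * 8 / 3 - 20) / 2
P2 = P1 + 20

ts = np.linspace(0, 1, 13)  # e.g. monthly points over a year
curve = [bezier(t, [P0, P1, P2, P3]) for t in ts]
print(f"P1={P1:.2f}, P2={P2:.2f}, "
      f"value at t=0.5: {bezier(0.5, [P0, P1, P2, P3]):.2f}")
```

Because the Bernstein basis is linear in the control points, every added constraint is just another linear equation, which is why candidate curves can be re-solved instantly as goals change.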

Chun Wang,   Data Science Director, Liberty Mutual Insurance

Title: My Journey of Applying Ph.D. Research to an Industrial Data Science Job

Abstract: I would like to introduce the data science job at Liberty Mutual Insurance and share how my Ph.D. research has been applied in industry and has potentially generated value for our policyholders.

Xia Wang,   Professor, Department of Mathematical Sciences, University of Cincinnati

Title: Joint hierarchical Gaussian process model with applications in cystic fibrosis studies

Abstract: Our joint hierarchical Gaussian process model (JHGP) was motivated by the need to characterize the association between lung-function decline and onset of pulmonary exacerbation (PE) over time in patients with cystic fibrosis (CF). The clinical course of this lethal autosomal disease is marked by progressive loss of lung function and eventual respiratory failure. Joint modeling of these longitudinal measures and PE outcomes provides more accurate inference and dynamic prediction of disease progression. In the proposed JHGP model, a two-level Gaussian process (GP) is used to estimate the nonlinear longitudinal trajectories and a flexible link function is introduced for a more accurate depiction of the binary process on the event outcome. Bayesian model assessment is used to evaluate each component of the joint model. The model is applied to medical monitoring datasets from the United States Cystic Fibrosis Foundation Patient Registry at a CF center. The proposed model is particularly advantageous in personalized prediction. This talk is based on joint work with Dr. Rhonda D. Szczesniak, Dr. Leo L. Duan, and Dr. Weiji Su.
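
Schematically, and with notation assumed for illustration rather than taken from the paper, a joint model of this kind pairs a two-level GP for the longitudinal marker with a linked binary submodel for the event:

\[
y_{ij} = f(t_{ij}) + g_i(t_{ij}) + \epsilon_{ij}, \qquad
f \sim \mathrm{GP}(\mu, K_1), \quad g_i \sim \mathrm{GP}(0, K_2), \quad \epsilon_{ij} \sim N(0, \sigma^2),
\]
\[
\Pr(E_{ij} = 1) = H\big(\alpha + \beta\, g_i(t_{ij})\big),
\]

where \(y_{ij}\) is patient \(i\)'s lung function at time \(t_{ij}\), \(f\) captures the population-level trajectory, \(g_i\) the patient-level deviation, \(E_{ij}\) indicates a pulmonary exacerbation, and \(H\) is the flexible link function mentioned above.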

Zoe Hua,   Director of Biostatistics, Servier Pharmaceuticals

Title: Data science techniques applied to real-world clinical trial problems: Hierarchical Semi-parametric Bayesian Modeling for Dynamic Prediction of Patient Screening and Enrollment in Multicenter Clinical Trials

Abstract: Reliable prediction of the number of patients screened and the time needed to reach the target enrollment is important for supporting budget and resource planning and timeline forecasting in clinical trials. Trials can face challenges in the screening and enrollment process, including screening enough all-comers for biomarker-driven trials and enrolling a sufficient number of patients in critical disease subtypes when those subtypes are rare. A Bayesian semiparametric mixture model was developed for the screening and enrollment processes to accommodate heterogeneity across disease subtypes or biomarker-defined subtypes. Model performance will be illustrated via simulation results.
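
One common way to formalize such a model (a sketch under assumed notation, not necessarily the speaker's exact specification) is to let each center's enrollment follow a Poisson process whose rate is drawn from a mixture over subtypes:

\[
N_c(t) \sim \mathrm{Poisson}(\lambda_c\, t), \qquad
\lambda_c \sim \sum_{k=1}^{K} \pi_k \,\mathrm{Gamma}(\alpha_k, \beta_k),
\]

where \(N_c(t)\) counts patients enrolled at center \(c\) by time \(t\), and the \(K\)-component mixture absorbs heterogeneity across disease or biomarker-defined subtypes; the time to reach the enrollment target is then predicted from the posterior of the rates \(\lambda_c\).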

Nathan Lally,   AVP Data Science at HSB (Munich Re Group)

Title: Causal Inference with IoT Time Series Data in Support of Insurance Loss Mitigation--A Collaboration between HSB and UConn

Abstract: Through its internet of things (IoT) program, Hartford Steam Boiler (HSB) has deployed wireless sensor technology in the commercial properties of insureds (through our direct insurance carrier clients) with the goal of monitoring temperature in areas at risk of pipe freeze, burst, and subsequent water damage and insurance loss. If conditions conducive to pipe freeze arise, HSB sends an alert to the insured via mobile app, SMS, or phone call. The insured is then expected to take action (turn up the temperature, fix the heating system, etc.) to mitigate the risk condition. HSB can measure the efficacy of its program, and subsequently its value to our clients, by our ability to motivate corrective actions among insureds as well as long-term behavioral change. Unfortunately, not all insureds respond promptly or entirely truthfully to HSB's alerts and outreach, making the efficacy of the program hard to measure accurately. For this reason, HSB has partnered with the University of Connecticut's Statistics and Computer Science departments to develop causal ML methods that infer from the IoT sensor data streams alone whether insureds took action.

Timothy Moore,   Director of Statistical Consulting Services, University of Connecticut

Title: Statistical consulting at UConn

Abstract: Statistical consulting at UConn goes as far back as the Department of Statistics itself. The Statistical Consulting Services (previously the Center of Applied Statistics) has been assisting clients from the UConn community and beyond for more than three decades. This relatively small facility, staffed by Statistics graduate students, has had a massive impact on research at the University. In this talk, we will sketch the past, present, and future of the SCS, and highlight two recent projects that illustrate the impact of this vital service.

Eric Baron,   Graduate Consultant, University of Connecticut

Title: Bayesian detection of bias in peremptory challenges using historical strike data

Abstract: US law prohibits using peremptory strikes during jury selection on the basis of a prospective juror's race, ethnicity, gender, or membership in other 'cognizable' classes. In this talk, I present our proposed Bayesian approach for detecting such illegal bias. We develop a novel use of the power prior to adjust the weight of historical trial information in the analysis of an attorney's strike pattern in a current case. Through this collaboration, our client has developed an R Shiny app, and the accompanying paper is under revision at The American Statistician.
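
For context, the power prior (Ibrahim and Chen) discounts a historical likelihood by a weight \(a_0\); the talk's specific adaptation to strike data may differ:

\[
\pi(\theta \mid D_0, a_0) \;\propto\; L(\theta \mid D_0)^{a_0}\, \pi_0(\theta), \qquad 0 \le a_0 \le 1,
\]

where \(D_0\) is the historical strike data, \(L\) the likelihood, and \(\pi_0\) the initial prior; \(a_0 = 0\) discards the history entirely, while \(a_0 = 1\) pools it at full weight.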

Jung Wun Lee,   Graduate Consultant, University of Connecticut

Title: A latent class analysis among substance-involved families in child welfare

Abstract: This project aimed to identify latent groupings of families involved with the US child welfare system as a result of parental substance use disorder. I will discuss the implementation of the bootstrapped likelihood ratio test for determining the number of latent classes and the development of its R function. Finally, I will discuss the contribution of our work to the client, and how collaboration with the SCS has contributed to research practice with real data.
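
To sketch the resampling logic of the bootstrapped likelihood ratio test (the project's implementation is an R function for latent class models of categorical data; the Gaussian mixture below is a stand-in used only to show the procedure):

```python
# Illustrative sketch of a bootstrapped likelihood-ratio test (BLRT)
# for choosing the number of latent classes, using a Gaussian mixture
# as a stand-in for a latent class model.
import numpy as np
from sklearn.mixture import GaussianMixture

def lrt_stat(X, k):
    """2 * (loglik of (k+1)-class model - loglik of k-class model)."""
    small = GaussianMixture(k, random_state=0).fit(X)
    big = GaussianMixture(k + 1, random_state=0).fit(X)
    n = len(X)
    # .score() returns mean log-likelihood per observation.
    return 2 * n * (big.score(X) - small.score(X)), small

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (150, 2)), rng.normal(3, 1, (100, 2))])

k = 1
observed, null_model = lrt_stat(X, k)

# Parametric bootstrap: simulate from the fitted k-class (null) model,
# recompute the statistic, and compare with the observed value.
boot = []
for _ in range(99):
    Xb, _ = null_model.sample(len(X))
    stat, _ = lrt_stat(Xb, k)
    boot.append(stat)

p_value = (1 + sum(b >= observed for b in boot)) / (1 + len(boot))
print(f"LRT stat = {observed:.1f}, bootstrap p = {p_value:.3f}")
```

A small p-value indicates the (k+1)-class model fits significantly better, so the procedure is repeated with increasing k until the test no longer rejects.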