Abstract: The conclusions of inference available from a Bayesian approach are easy to understand, as they are probability statements about unknown parameters. In the literature on foundations, this is called an “epistemic probability”, reflecting uncertainty in our knowledge. A drawback of the Bayesian approach is the need to provide a prior probability distribution for the unknown parameters. There have been many attempts to find methods that provide probabilistic conclusions without invoking prior distributions: “making the Bayesian omelette without breaking the Bayesian eggs”. A series of conferences on foundational aspects of inference, held regularly since 2013 under the title “Bayes, Frequentist, Fiducial (BFF)”, has developed several of these methods. This talk will give a high-level overview of these, with a view to understanding how they relate to current concerns about reproducibility of statistical results.
Nancy Reid is University Professor in the Department of Statistical Sciences at the University of Toronto. Her research interests include statistical theory, likelihood inference, design of studies, and statistical science in public policy. She has held many professional leadership roles in statistical science, in Canada and abroad. Dr. Reid studied at the University of Waterloo (B.Math. 1974), the University of British Columbia (M.Sc. 1976), Stanford University (PhD 1979) and Imperial College, London (PDF 1980). She joined the University of Toronto in 1986 from the University of British Columbia. Dr. Reid has served as President of the Institute of Mathematical Statistics and the Statistical Society of Canada. She is co-editor of the Annual Review of Statistics and Its Application. She won the COPSS Presidents' Award in 1992, the Krieger–Nelson Prize in 1995, the Statistical Society of Canada Gold Medal and the Florence Nightingale David Award in 2009, and the Statistical Society of Canada Distinguished Service Award in 2013. She is a Fellow of the Royal Society, the Royal Society of Canada, the American Association for the Advancement of Science, and a Foreign Associate of the National Academy of Sciences of the United States. In 2014 she was appointed Officer of the Order of Canada.
Abstract: Causal inference from observational data is a vital problem, but it comes with strong assumptions. Most methods require that we observe all confounders, variables that affect both the causal variables and the outcome variables. But whether we have observed all confounders is a famously untestable assumption. In this talk I will describe the deconfounder, a way to do causal inference under different assumptions from those that classical methods require.
How does the deconfounder work? While traditional causal methods measure the effect of a single cause on an outcome, many modern scientific studies involve multiple causes, different variables whose effects are simultaneously of interest. The deconfounder uses the correlation among multiple causes as evidence for unmeasured confounders, combining unsupervised machine learning and predictive model checking to perform causal inference.
In this talk I will describe the deconfounder methodology and discuss the theoretical requirements for the deconfounder to provide unbiased causal estimates. I will touch on some of the academic debates surrounding the deconfounder, and demonstrate the deconfounder on real-world data and simulation studies.
This is joint work with Yixin Wang.
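The intuition in the abstract above, that correlation among multiple causes carries evidence about an unmeasured confounder, can be illustrated with a small simulation. This is only a sketch under stated assumptions and not the deconfounder itself: the data-generating process is hypothetical, and the row mean of the causes is a crude stand-in for the fitted factor model and predictive checks that the method relies on. All variable names are illustrative.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


rng = random.Random(0)
n_units, n_causes = 2000, 10

# Hypothetical data-generating process: one hidden confounder drives every cause.
hidden = [rng.gauss(0.0, 1.0) for _ in range(n_units)]
causes = [[z + rng.gauss(0.0, 1.0) for _ in range(n_causes)] for z in hidden]

# The shared hidden variable shows up as correlation between any two causes.
first = [row[0] for row in causes]
second = [row[1] for row in causes]
cause_corr = pearson(first, second)

# Crude one-factor summary (the row mean), standing in for a fitted factor model.
substitute = [sum(row) / n_causes for row in causes]
recovery_corr = pearson(substitute, hidden)

print(f"correlation between two causes: {cause_corr:.2f}")
print(f"correlation of substitute with hidden confounder: {recovery_corr:.2f}")
```

Under this toy setup the population correlation between two causes is 0.5, and the row mean tracks the hidden variable more closely as the number of causes grows. In real data the hidden variable is never observed, so the second correlation cannot be computed; it is shown here only because the simulation knows the truth.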
David Blei is a Professor of Statistics and Computer Science at Columbia University, and a member of the Columbia Data Science Institute. He studies probabilistic machine learning, including its theory, algorithms, and application. David has received several awards for his research, including a Sloan Fellowship (2010), Office of Naval Research Young Investigator Award (2011), Presidential Early Career Award for Scientists and Engineers (2011), Blavatnik Faculty Award (2013), ACM-Infosys Foundation Award (2013), a Guggenheim fellowship (2017), and a Simons Investigator Award (2019). He is the co-editor-in-chief of the Journal of Machine Learning Research. He is a fellow of the ACM and the IMS.
Background: Hopes were high for rapid advances in biomedicine when the human genome was decoded at the start of the 21st century. Studies of the role of single nucleotide polymorphisms (SNPs) have assembled huge samples and spent a great deal of money but have explained little of the heritability of most heritable complex diseases. Gene-by-gene interaction, known as “epistasis”, may play an important role, but research has been hindered because identification of SNP-SNP interactions requires exploration of immense search spaces. Our focus has been on young-onset conditions, such as birth defects, autism and schizophrenia, for which parents of cases are usually available to be genotyped. Current approaches using nuclear families to study epistasis can accommodate at most several hundred candidate SNPs. Environmental factors may often be involved and may interact with multiple genetic variants. Sub-phenotypes of the condition could arise through distinct sets of variants, and for some conditions with young onset there could be maternally-mediated genetic effects or interactions between the maternal and the fetal genomes that influence the fetus prenatally.
New methods: We developed GADGETS (Genetic algorithm for detecting genetic epistasis using triads or siblings), which finds epistatic SNP-sets by applying a “genetic” algorithm to case-parent or case-sibling data. The method is inspired by how Darwinian evolution selects for improved fitness. To best enable detection of multiple epistatic sets, we let many independent “island populations” of random SNP-sets evolve separately under selection that is based on evident joint relevance to disease risk. After relatively “fit” sets have evolved, the software evaluates the identified SNP-sets via permutation testing and provides graphical visualization. GADGETS correctly identified epistatic SNP-sets in realistically simulated case-parent triads with 10,000 candidate SNPs, far more than competitors can handle, and it outperformed competitors in simulations with many fewer SNPs. Applying GADGETS to family-based data for the birth defect oral clefting, downloaded from dbGaP (the database of Genotypes and Phenotypes), we identified SNP-sets with possible epistatic effects on risk. An extension to maternally-mediated genetic effects or maternal/fetal interactions is straightforward, and simulations show equally good performance. We are currently also extending our method and simulations to allow for epistasis that can depend on the outcome phenotype, on another factor like being male or female, or on an exposure.
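The island-population idea described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the GADGETS implementation or the epistasisGA API: the planted SNP set, the `fitness` score, and every function name are hypothetical, and the real method scores SNP-sets from family genotype data and follows up with permutation tests.

```python
import random

N_SNPS = 50                       # toy number of candidate SNPs
CAUSAL = frozenset({3, 17, 42})   # hypothetical interacting set planted for the toy


def fitness(snp_set):
    """Toy stand-in score: how many planted SNPs the set contains."""
    return len(CAUSAL & snp_set)


def mutate(snp_set, rng):
    """Swap one SNP in the set for a random SNP not already in it."""
    dropped = rng.choice(sorted(snp_set))
    outside = [s for s in range(N_SNPS) if s not in snp_set]
    added = rng.choice(outside)
    return frozenset((set(snp_set) - {dropped}) | {added})


def evolve_island(seed, set_size=3, pop_size=20, generations=100):
    """Evolve one isolated population of SNP-sets and return its fittest set."""
    rng = random.Random(seed)
    population = [
        frozenset(rng.sample(range(N_SNPS), set_size)) for _ in range(pop_size)
    ]
    for _ in range(generations):
        # Selection: the fitter half survives unchanged.
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]
        # Reproduction: refill the population with mutated copies of survivors.
        children = [
            mutate(rng.choice(survivors), rng)
            for _ in range(pop_size - len(survivors))
        ]
        population = survivors + children
    return max(population, key=fitness)


def run_islands(n_islands=4, base_seed=0):
    """Run several islands independently, each with its own random stream."""
    return [evolve_island(base_seed + i) for i in range(n_islands)]


best_sets = run_islands()
for island, snp_set in enumerate(best_sets):
    print(island, sorted(snp_set), fitness(snp_set))
```

Because the islands never exchange members in this sketch, each can settle on a different SNP-set, which is the property the abstract highlights for detecting multiple epistatic sets.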
Availability: GADGETS is part of the epistasisGA package at https://github.com/mnodzenski/epistasisGA.
Title: TBA
Dr. Clarice Weinberg is a biostatistician with substantial experience in epidemiology, currently a tenured Senior Investigator at the National Institute of Environmental Health Sciences, National Institutes of Health (NIH/NIEHS), in the Biostatistics and Computational Biology Branch. Her research has focused on devising improved methods for design and analysis of epidemiologic studies, and applying those methods to epidemiologic research. Much of her work has concerned methods and applications specific to reproductive, radiation, genetic, environmental and, most recently, cancer epidemiology. Examples include devising study designs based on pooling of bio-specimens prior to assay, new methods of analysis for biomarkers when a high fraction of determinations fall below the assay limit of detection, and statistical genetics methods for family-based inference. The impact of these methods is extended by providing free software to the public and advising users in applying the methods. She has mentored 24 UNC doctoral students (serving as primary for 6), 1 currently, and 8 postdoctoral fellows at NIEHS. She serves as co-PI for the NIEHS prospective Sister Study, for which she has assembled a cohort of 50,884 women who had never had breast cancer themselves at enrollment but were each the sister of a woman with breast cancer. As a biostatistician, she must secure outside funding for scientific projects that she leads, and she was funded by Susan G. Komen for the Cure to launch her own companion case-control family study, based on families with daughters discordant for young-onset breast cancer (and their parents). This “Two Sister Study” yielded extensive genetic and environmental data
Abstract: All models are not wrong; in fact, some of them could be correct, at least locally! Moreover, they are useful! Based on this principle, I will propose Bayesian local models using partitioning. The Bayesian partition model constructs arbitrarily complex models by splitting the covariate space into an unknown number of disjoint regions. Within each region, the data are assumed to be generated by a simpler model. The partition can be created using Voronoi tessellations or trees. The main challenge is to determine the local regions (partitions) adaptively. I will discuss local models for density regression, survival analysis and spatial prediction, along with some theoretical properties of the models. I will show simulations and applications to real data in which the proposed method successfully identifies the partition structure and estimates the local model parameters.
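The partitioning idea above, a Voronoi tessellation of the covariate space with a simpler model inside each region, can be sketched as follows. This is a minimal illustration under stated assumptions, not the speaker's method: the centers are fixed by hand rather than given a prior and sampled, the local model is just a constant mean, and all function names and data are hypothetical.

```python
import math


def nearest_center(x, centers):
    """Index of the Voronoi cell containing x, i.e. the closest center."""
    return min(range(len(centers)), key=lambda k: math.dist(x, centers[k]))


def fit_local_means(points, responses, centers):
    """Fit a constant (mean) model inside each Voronoi cell."""
    sums = [0.0] * len(centers)
    counts = [0] * len(centers)
    for x, y in zip(points, responses):
        k = nearest_center(x, centers)
        sums[k] += y
        counts[k] += 1
    # Empty cells get None rather than a made-up estimate.
    return [s / c if c else None for s, c in zip(sums, counts)]


def predict(x, centers, local_means):
    """Predict with the local model of the cell that x falls into."""
    return local_means[nearest_center(x, centers)]


# Toy data: two well-separated clusters in a two-dimensional covariate space.
centers = [(0.0, 0.0), (10.0, 10.0)]
points = [(0.5, 0.2), (1.0, -0.3), (9.0, 10.5), (11.0, 9.5)]
responses = [1.0, 3.0, 10.0, 20.0]

local_means = fit_local_means(points, responses, centers)
print(local_means)
print(predict((8.0, 8.0), centers, local_means))
```

A Bayesian version would treat the number and location of the centers as unknown and average predictions over their posterior, which is the adaptive step the abstract identifies as the main challenge.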
Dr. Bani K. Mallick is a Distinguished Professor and Susan M. Arseven '75 Chair in Data Science and Computational Statistics in the Department of Statistics at Texas A&M University in College Station. He is the Director of the NSF TRIPODS Institute of Data Science and the Center for Statistical Bioinformatics. Dr. Mallick is well known for his contributions to the theory and practice of Bayesian semiparametric methods and uncertainty quantification. He is an elected fellow of the American Association for the Advancement of Science, the American Statistical Association, the Institute of Mathematical Statistics, the International Statistical Institute and the Royal Statistical Society. He received the Distinguished Research Awards from Texas A&M University and the Young Researcher Award from the International Indian Statistical Association. Mallick’s areas of research include semiparametric classification and regression, hierarchical spatial modeling, inverse problems, uncertainty quantification and bioinformatics. He has coauthored or co-edited six books and more than 200 research publications.