http://www.math.ucsd.edu/~rxu/math284/slect1.pdf

Time/place: Tuesday, March 10, 2020, 14:15-15, Lunchroom, sentralbygg 2, Gløshaugen
Speaker: Kjell Doksum, Wisconsin/Berkeley, http://pages.stat.wisc.edu/~doksum/
Title: Regression Quantile Differences and High Dimensional Data
Abstract: Statistical methods that give detailed comparisons of responses from two populations are given for studies that include confounding covariates. Let X and Y denote the responses from the two populations after adjustment for the confounding covariates, that is, after subtracting the linear regression of the centered responses on the standardized covariates. The two populations are compared in terms of the differences between the X and Y quantiles at different quantile levels a in (0,1). This difference can be represented by a shift function D(x) with the property that X+D(X) has the same distribution as Y. That is, if X is an adjusted control response and Y is an adjusted treatment response, then the model allows the treatment effect to differ across levels of X. For instance, a medication for high blood pressure may have different effects for people with different levels of blood pressure. The usual linear model for this type of regression experiment assumes that D(x) is a constant equal to the difference of the Y and X means. The statistical methods developed are based on simple simultaneous confidence bands for the shift function D(x), computed from independent multivariate samples from the two populations. The shift function D(x) is nonparametric, while the within-population models are linear, making this a partially linear model. The methods are shown to be applicable to high-dimensional data where the number of variables p is larger than the X and Y sample sizes m and n. This is joint work with Summer Yang at New York University.

Time/place: Monday, March 9, 2020, 17-18, Lunchroom, sentralbygg 2, Gløshaugen
Speaker: Thore Egeland, NMBU, https://www.nmbu.no/ans/thore.egeland
Title: Statistical methods in forensic genetics exemplified by ‘The missing grandchildren of Argentina’
Abstract: I start by briefly reviewing the required methods of forensic genetics before the main case is presented: The missing grandchildren of Argentina. This is a well-known collection of missing person cases. From 1976 to 1983, Argentina suffered a civic-military dictatorship. It is estimated that 30,000 people were kidnapped, sent to clandestine centers, tortured and murdered. Many women were pregnant at the time of abduction. Their children were handed over to families belonging to or connected with the military forces, and their identities were forged. In most cases their biological parents were murdered and their bodies remain missing. The objective is to decide whether a person of interest (POI), potentially a child born in captivity, is identical to the missing person in a family, based on the DNA profile of the POI and available family members. We evaluate the statistical power of families from the DNA reference databank (Banco Nacional de Datos Genéticos). As a result we show that several of the families have poor statistical power and require additional genetic data to enable a positive identification.

Time/place: Monday, March 9, 2020, 14.15-15.00, S4, sentralbygg 2, Gløshaugen
Speaker: Martin Jullum, NR, https://www.nr.no/?q=publicationprofile&query=jullum
Title: How to open the black box – individual prediction explanation
Abstract: Why was your loan application, of all applications, rejected? Why is the price of your car insurance higher than that of your neighbor? More and more such decisions are made by complex statistical/machine learning models based on relevant data. Such (regression) models are often referred to as "black boxes" because it is difficult to understand how they work and why they produce different predictions. As these methods become increasingly important for individuals in our society, there is a clear need for methods that can help us understand their predictions, that is, "open the black box". In this talk, I will motivate why this is useful and important. I will further discuss how Shapley values from game theory can be used as an explanation framework. To explain the predictions correctly, it is crucial to model the dependence between the covariates. I will exemplify this by showing that even a simple linear regression model is difficult to explain when the covariates are highly dependent. Finally, I will lay out recent work and methodology for modeling such dependence and show how it leads to more accurate explanations through the Shapley value framework.

In exceptional cases one is able to compute $E\{\phi(X)|T=t\}$ analytically.
However, typically this is not possible, which leads to the need for approximations or simulation algorithms. Several papers deal with problems of this kind, often in connection with inference in specific models [Cheng (1984, 1985), Langsrud (2005), Diaconis and Sturmfels (1998)]. Engen and Lilleg\aa rd (1997) considered the general problem of Monte Carlo computation of conditional expectations given a sufficient statistic. Their ideas have been further developed and generalized in Lindqvist and Taraldsen (2005) [referred to as LT (2005)] and in the technical report Lindqvist and Taraldsen (2001), where a more detailed measure-theoretic approach is employed.

The present paper reviews basic ideas and results from LT (2005). The main purpose is to complement LT (2005) regarding computational aspects, examples and theoretical results. In particular we consider some new examples from lifetime data analysis with connections to work by Kjell Doksum [Bickel and Doksum (1969), exponential distributions; Doksum and H\o yland (1992), inverse Gaussian distributions].

\section{Setup and basic algorithm.}

Following LT (2005) we assume that a random vector $U$ with a known distribution is given, such that $(X,T)$ for given $\theta$ can be simulated by means of $U$. More precisely, we assume the existence of functions $\chi$ and $\tau$ such that, for each $\theta$, the joint distribution of $(\chi (U, \theta), \tau (U, \theta))$ equals the joint distribution of $(X,T)$ under~$\theta$. For convenience we assume in the following that the distribution of $U$ is given by the density function $f(u)$.
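
To make the roles of $U$, $\chi$ and $\tau$ concrete, the following is a minimal sketch in Python under assumptions that are not part of the text above: $X=(X_1,\ldots,X_n)$ is taken to be i.i.d. exponential with rate $\theta$, $T=\sum_i X_i$, and $U$ consists of $n$ i.i.d. standard exponential variables, so that $f(u)=\exp(-\sum_i u_i)$ for $u_i>0$. The function names `chi`, `tau` and `simulate_xt` and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


def chi(u, theta):
    """X = chi(U, theta): rescale standard exponentials to rate theta (assumed model)."""
    return u / theta


def tau(u, theta):
    """T = tau(U, theta): the sum of the components of chi(U, theta)."""
    return np.sum(u, axis=-1) / theta


def simulate_xt(theta, n, size, rng):
    """Draw `size` independent copies of (X, T) under theta by way of U."""
    # U has the known density f(u) = exp(-sum(u)) on the positive orthant.
    u = rng.exponential(scale=1.0, size=(size, n))
    return chi(u, theta), tau(u, theta)


# Illustrative values only.
rng = np.random.default_rng(0)
theta, n = 2.0, 5
x, t = simulate_xt(theta, n, size=200_000, rng=rng)

# Compare empirical moments with their values under the exponential model:
# E(X_i) = 1/theta, E(T) = n/theta, Var(T) = n/theta**2.
print("mean of X_i:", x.mean(), "target:", 1 / theta)
print("mean of T:  ", t.mean(), "target:", n / theta)
print("var of T:   ", t.var(), "target:", n / theta**2)
```

The sketch covers only the representation of $(X,T)$ through $U$; it does not implement any conditional sampling given $T=t$.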