
Danyu Lin


DOI: 10.1080/01621459.1989.10478874
1989
Cited 2,050 times
The Robust Inference for the Cox Proportional Hazards Model
Abstract We derive the asymptotic distribution of the maximum partial likelihood estimator β̂ for the vector of regression coefficients β under a possibly misspecified Cox proportional hazards model. As in the parametric setting, this estimator β̂ converges to a well-defined constant vector β*. In addition, the random vector n^(1/2)(β̂ − β*) is asymptotically normal with mean 0 and with a covariance matrix that can be consistently estimated. The newly proposed robust covariance matrix estimator is similar to the so-called “sandwich” variance estimators that have been extensively studied for parametric cases. For many misspecified Cox models, the asymptotic limit β* or part of it can be interpreted meaningfully. In those circumstances, valid statistical inferences about the corresponding covariate effects can be drawn based on the aforementioned asymptotic theory of β̂ and the related results for the score statistics. Extensive studies demonstrate that the proposed robust tests and interval estimation procedures are appropriate for practical use. In particular, the robust score tests perform quite well even for small samples. In contrast, the conventional model-based inference procedures often lead to tests with supranominal size and confidence intervals with rather poor coverage probability.
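The “sandwich” construction mentioned above is the same for any smooth estimating equation, not just the Cox partial likelihood. A minimal numpy sketch, using an ordinary least-squares fit with heteroscedastic errors as a stand-in (illustrative data, not the paper's Cox setting):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data with heteroscedastic errors (hypothetical example; the
# paper treats the Cox model, but the form inv(A) @ B @ inv(A) is generic).
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n) * (1 + np.abs(X[:, 1]))

beta = np.linalg.solve(X.T @ X, X.T @ y)        # root of the estimating equation
resid = y - X @ beta

A = X.T @ X                                     # "bread": minus the score derivative
B = (X * resid[:, None] ** 2).T @ X             # "meat": empirical score variance
sandwich = np.linalg.inv(A) @ B @ np.linalg.inv(A)

model_based = np.linalg.inv(A) * resid.var(ddof=2)  # naive, assumes a correct model
```

The robust estimator stays valid when the working model's variance assumption fails, which is the analogue of the paper's protection against Cox model misspecification.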
DOI: 10.1080/01621459.1989.10478873
1989
Cited 1,533 times
Regression Analysis of Multivariate Incomplete Failure Time Data by Modeling Marginal Distributions
Abstract Many survival studies record the times to two or more distinct failures on each subject. The failures may be events of different natures or may be repetitions of the same kind of event. In this article, we consider the regression analysis of such multivariate failure time observations. Each marginal distribution of the failure times is formulated by a Cox proportional hazards model. No specific structure of dependence among the distinct failure times on each subject is imposed. The regression parameters in the Cox models are estimated by maximizing the failure-specific partial likelihoods. The resulting estimators are shown to be asymptotically jointly normal with a covariance matrix that can be consistently estimated. Simultaneous inferential procedures are then proposed. Extensive Monte Carlo studies indicate that the normal approximation is adequate for practical use. The new methods allow time-dependent covariates, missing observations, and arbitrary patterns of censorship. They are illustrated with two real-life examples. For recurrent failure time data, various regression methods have been proposed in the literature. These methods, however, generally assume stringent structures of dependence among the recurrences of each subject. Moreover, as shown in the present article, they are rather sensitive to model misspecification.
DOI: 10.1038/ng.943
2011
Cited 1,298 times
Large-scale genome-wide association analysis of bipolar disorder identifies a new susceptibility locus near ODZ4
We conducted a combined genome-wide association study (GWAS) of 7,481 individuals with bipolar disorder (cases) and 9,250 controls as part of the Psychiatric GWAS Consortium. Our replication study tested 34 SNPs in 4,496 independent cases with bipolar disorder and 42,422 independent controls and found that 18 of 34 SNPs had P < 0.05, with 31 of 34 SNPs having signals with the same direction of effect (P = 3.8 × 10⁻⁷). An analysis of all 11,974 bipolar disorder cases and 51,792 controls confirmed genome-wide significant evidence of association for CACNA1C and identified a new intronic variant in ODZ4. We identified a pathway comprised of subunits of calcium channels enriched in bipolar disorder association intervals. Finally, a combined GWAS analysis of schizophrenia and bipolar disorder yielded strong association evidence for SNPs in CACNA1C and in the region of NEK4-ITIH1-ITIH3-ITIH4. Our replication results imply that increasing sample sizes in bipolar disorder will confirm many additional loci.
DOI: 10.1093/biomet/80.3.557
1993
Cited 1,255 times
Checking the Cox model with cumulative sums of martingale-based residuals
This paper presents a new class of graphical and numerical methods for checking the adequacy of the Cox regression model. The procedures are derived from cumulative sums of martingale-based residuals over follow-up time and/or covariate values. The distributions of these stochastic processes under the assumed model can be approximated by zero-mean Gaussian processes. Each observed process can then be compared, both visually and analytically, with a number of simulated realizations from the approximate null distribution. These comparisons enable the data analyst to assess objectively how unusual the observed residual patterns are. Special attention is given to checking the functional form of a covariate, the form of the link function, and the validity of the proportional hazards assumption. An omnibus test, consistent against any model misspecification, is also studied. The proposed techniques are illustrated with two real data sets.
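The comparison of an observed residual process with simulated null realizations can be sketched generically: take the supremum of the observed cumulative-residual process and compare it against realizations in which each residual is multiplied by an independent standard normal. A hedged numpy sketch with stand-in residuals (not actual martingale residuals from a fitted Cox model):

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in residuals, assumed already ordered by a covariate of interest.
resid = rng.normal(size=200)
observed = np.abs(np.cumsum(resid)).max()       # sup of the observed process

# Multiplier realizations mimic the null distribution of the process.
sups = np.array([
    np.abs(np.cumsum(resid * rng.normal(size=resid.size))).max()
    for _ in range(1000)
])
p_value = (sups >= observed).mean()             # omnibus-style p-value
```

An unusually large observed supremum relative to the simulated ones flags model misspecification, which is the visual-plus-analytical comparison the paper formalizes.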
DOI: 10.1038/mp.2012.21
2012
Cited 989 times
A mega-analysis of genome-wide association studies for major depressive disorder
Prior genome-wide association studies (GWAS) of major depressive disorder (MDD) have met with limited success. We sought to increase statistical power to detect disease loci by conducting a GWAS mega-analysis for MDD. In the MDD discovery phase, we analyzed more than 1.2 million autosomal and X chromosome single-nucleotide polymorphisms (SNPs) in 18 759 independent and unrelated subjects of recent European ancestry (9240 MDD cases and 9519 controls). In the MDD replication phase, we evaluated 554 SNPs in independent samples (6783 MDD cases and 50 695 controls). We also conducted a cross-disorder meta-analysis using 819 autosomal SNPs with P<0.0001 for either MDD or the Psychiatric GWAS Consortium bipolar disorder (BIP) mega-analysis (9238 MDD cases/8039 controls and 6998 BIP cases/7775 controls). No SNPs achieved genome-wide significance in the MDD discovery phase, the MDD replication phase or in pre-planned secondary analyses (by sex, recurrent MDD, recurrent early-onset MDD, age of onset, pre-pubertal onset MDD or typical-like MDD from a latent class analysis of the MDD criteria). In the MDD-bipolar cross-disorder analysis, 15 SNPs exceeded genome-wide significance (P<5 × 10⁻⁸), and all were in a 248 kb interval of high LD on 3p21.1 (chr3:52 425 083-53 822 102, minimum P=5.9 × 10⁻⁹ at rs2535629). Although this is the largest genome-wide analysis of MDD yet conducted, its high prevalence means that the sample is still underpowered to detect genetic effects typical for complex traits. Therefore, we were unable to identify robust and replicable findings. We discuss what this means for genetic research for MDD. The 3p21.1 MDD-BIP finding should be interpreted with caution as the most significant SNP did not replicate in MDD samples, and genotyping in independent samples will be needed to resolve its status.
DOI: 10.1146/annurev.publhealth.20.1.145
1999
Cited 824 times
Time-Dependent Covariates in the Cox Proportional-Hazards Regression Model
The Cox proportional-hazards regression model has achieved widespread use in the analysis of time-to-event data with censoring and covariates. The covariates may change their values over time. This article discusses the use of such time-dependent covariates, which offer additional opportunities but must be used with caution. The interrelationships between the outcome and variable over time can lead to bias unless the relationships are well understood. The form of a time-dependent covariate is much more complex than in Cox models with fixed (non-time-dependent) covariates. It involves constructing a function of time. Further, the model does not have some of the properties of the fixed-covariate model; it cannot usually be used to predict the survival (time-to-event) curve over time. The estimated probability of an event over time is not related to the hazard function in the usual fashion. An appendix summarizes the mathematics of time-dependent covariates.
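In practice, a time-dependent covariate is coded in counting-process (start, stop] rows, one row per period over which the covariate is constant. A hypothetical layout for a single subject, followed to t=10 with an event at t=10 and covariate changes at t=3 and t=7:

```python
# Hypothetical subject record in start-stop format. Each row covers a
# period during which the covariate value is constant; the event flag
# is set only on the row in which the event occurs.
subject = [
    # (start, stop, event, covariate)
    (0, 3, 0, 0.0),
    (3, 7, 0, 1.5),
    (7, 10, 1, 2.0),   # event at the end of the final interval
]

# Sanity checks: intervals partition follow-up with no gaps or overlaps,
# and a subject contributes at most one event.
assert all(a[1] == b[0] for a, b in zip(subject, subject[1:]))
assert sum(row[2] for row in subject) <= 1
```

Constructing these rows correctly (no looking ahead in time) is where the article's warnings about bias come in.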
DOI: 10.1111/1467-9868.00259
2000
Cited 754 times
Semiparametric Regression for the Mean and Rate Functions of Recurrent Events
Summary The counting process with the Cox-type intensity function has been commonly used to analyse recurrent event data. This model essentially assumes that the underlying counting process is a time-transformed Poisson process and that the covariates have multiplicative effects on the mean and rate function of the counting process. Recently, Pepe and Cai, and Lawless and co-workers have proposed semiparametric procedures for making inferences about the mean and rate function of the counting process without the Poisson-type assumption. In this paper, we provide a rigorous justification of such robust procedures through modern empirical process theory. Furthermore, we present an approach to constructing simultaneous confidence bands for the mean function and describe a class of graphical and numerical techniques for checking the adequacy of the fitted mean–rate model. The advantages of the robust procedures are demonstrated through simulation studies. An illustration with multiple-infection data taken from a clinical study on chronic granulomatous disease is also provided.
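The mean-function estimator that avoids the Poisson assumption accumulates, at each event time, the number of events divided by the number of subjects still under observation. A small numpy sketch on hypothetical recurrent-event data:

```python
import numpy as np

# Hypothetical data: each subject has a censoring (follow-up) time and a
# list of recurrent event times within follow-up.
followup = np.array([8.0, 10.0, 6.0, 9.0])
events = [[1.0, 4.0, 7.5], [2.0, 5.0], [3.0], [1.5, 6.5, 8.5]]

times = np.sort(np.concatenate([np.asarray(e) for e in events]))
at_risk = np.array([(followup >= t).sum() for t in times])
mean_fn = np.cumsum(1.0 / at_risk)   # estimated mean number of events by each time

# The estimated mean function is nondecreasing by construction.
assert np.all(np.diff(mean_fn) >= 0)
```

This is the robust Nelson-Aalen-type estimator the paper studies; no structure is imposed on the dependence among a subject's recurrences.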
DOI: 10.2307/2533848
1998
Cited 675 times
Assessing the Sensitivity of Regression Results to Unmeasured Confounders in Observational Studies
This paper presents a general approach for assessing the sensitivity of the point and interval estimates of the primary exposure effect in an observational study to the residual confounding effects of unmeasured variable after adjusting for measured covariates. The proposed method assumes that the true exposure effect can be represented in a regression model that includes the exposure indicator as well as the measured and unmeasured confounders. One can use the corresponding reduced model that omits the unmeasured confounder to make statistical inferences about the true exposure effect by specifying the distributions of the unmeasured confounder in the exposed and unexposed groups along with the effects of the unmeasured confounder on the outcome variable. Under certain conditions, there exists a simple algebraic relationship between the true exposure effect in the full model and the apparent exposure effect in the reduced model. One can then estimate the true exposure effect by making a simple adjustment to the point and interval estimates of the apparent exposure effect obtained from standard software or published reports. The proposed method handles both binary response and censored survival time data, accommodates any study design, and allows the unmeasured confounder to be discrete or normally distributed. We describe applications on two major medical studies.
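Under the paper's conditions, the apparent and true exposure effects are linked multiplicatively through a bias factor determined by the binary confounder's prevalences in the exposed and unexposed groups and its effect on the outcome, so the true effect is recovered by dividing out the bias. A hedged sketch with made-up numbers (the function name and inputs are illustrative, not the paper's notation):

```python
# Illustrative sensitivity adjustment for a binary unmeasured confounder:
# p1 = confounder prevalence among the exposed, p0 = among the unexposed,
# gamma = confounder's (hazard or risk) ratio for the outcome.
def adjusted_effect(apparent_hr, p1, p0, gamma):
    bias = (p1 * gamma + 1 - p1) / (p0 * gamma + 1 - p0)
    return apparent_hr / bias

# Hypothetical numbers: an apparent hazard ratio of 1.57 shrinks once a
# confounder twice as prevalent among the exposed is accounted for.
hr_true = adjusted_effect(apparent_hr=1.57, p1=0.4, p0=0.2, gamma=3.0)
```

Varying p1, p0, and gamma over plausible ranges is the sensitivity analysis: how strong would the unmeasured confounding have to be to explain away the observed association?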
DOI: 10.1146/annurev.publhealth.20.1.125
1999
Cited 614 times
Methods for Analyzing Health Care Utilization and Costs
Important questions about health care are often addressed by studying health care utilization. Utilization data have several characteristics that make them a challenge to analyze. In this paper we discuss sources of information, the statistical properties of utilization data, common analytic methods including the two-part model, and some newly available statistical methods including the generalized linear model. We also address issues of study design and new methods for dealing with censored data. Examples are presented.
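The two-part model mentioned above combines a model for whether any utilization occurs with a model for the level of use among users; the overall mean is the product of the two. A minimal moment-based sketch on hypothetical cost data (real analyses would use logistic and generalized linear regressions with covariates):

```python
import numpy as np

# Hypothetical annual costs; zeros are subjects with no utilization.
costs = np.array([0.0, 0.0, 120.0, 0.0, 450.0, 80.0, 0.0, 2300.0])

p_any = (costs > 0).mean()              # part 1: probability of any cost
mean_if_any = costs[costs > 0].mean()   # part 2: mean cost among users
overall_mean = p_any * mean_if_any

# The two parts recompose the overall mean exactly.
assert np.isclose(overall_mean, costs.mean())
```

Splitting the mean this way handles the large point mass at zero and the right skew that make utilization data awkward for ordinary regression.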
DOI: 10.1093/biomet/81.1.61
1994
Cited 549 times
Semiparametric analysis of the additive risk model
In contrast to the proportional hazards model, the additive risk model specifies that the hazard function associated with a set of possibly time-varying covariates is the sum of, rather than the product of, the baseline hazard function and the regression function of covariates. This formulation describes a different aspect of the association between covariates and the failure time than the proportional hazards model, and is more plausible than the latter for many applications. In the present paper, simple procedures with high efficiencies are developed for making inference about the regression parameters under the additive risk model with an unspecified baseline hazard function. The subject-specific survival estimation is also studied. The proposed techniques resemble the partial-likelihood-based methods for the proportional hazards model. A real example is provided.
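The contrast between the two formulations can be made concrete with illustrative numbers (not from the paper): the additive risk model adds the covariate effect to the baseline hazard, while the proportional hazards model multiplies it.

```python
import numpy as np

# Hypothetical values at some time t.
lam0 = 0.02                 # baseline hazard lambda0(t)
beta, z = 0.01, 2.0         # regression coefficient and covariate value

additive = lam0 + beta * z                  # additive risk: lambda0(t) + beta * z
proportional = lam0 * np.exp(beta * z)      # proportional hazards, for contrast
```

Under the additive model, beta has an excess-risk interpretation (extra events per unit time per unit covariate), rather than a rate-ratio interpretation.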
DOI: 10.1002/sim.4780132105
1994
Cited 534 times
Cox regression analysis of multivariate failure time data: The marginal approach
Abstract Multivariate failure time data are commonly encountered in scientific investigations because each study subject may experience multiple events or because there exists clustering of subjects such that failure times within the same cluster are correlated. In this paper, I present a general methodology for analysing such data, which is analogous to that of Liang and Zeger for longitudinal data analysis. This approach formulates the marginal distributions of multivariate failure times with the familiar Cox proportional hazards models while leaving the nature of dependence among related failure times completely unspecified. The baseline hazard functions for the marginal models may be identical or different. Simple estimating equations for the regression parameters are developed which yield consistent and asymptotically normal estimators, and robust variance-covariance estimators are constructed to account for the intra-class correlation. Simulation results demonstrate that the large-sample approximations are adequate for practical use and that ignoring the intra-class correlation could yield rather misleading variance estimators. The proposed methodology has been fully implemented in a simple computer program which also incorporates several alternative approaches. Detailed illustrations with data from four clinical or epidemiologic studies are provided.
DOI: 10.1002/(sici)1097-0258(19970715)16:13<1515::aid-sim572>3.0.co;2-1
1997
Cited 443 times
Estimating the Proportion of Treatment Effect Explained by a Surrogate Marker
In this paper, we measure the extent to which a biological marker is a surrogate endpoint for a clinical event by the proportional reduction in the regression coefficient for the treatment indicator due to the inclusion of the marker in the Cox regression model. We estimate this proportion by applying the partial likelihood function to two Cox models postulated on the same failure time variable. We show that the resultant estimator is asymptotically normal with a simple variance estimator. One can construct confidence intervals for the proportion by using the direct normal approximation to the point estimator or by using Fieller's theorem. Extensive simulation studies demonstrate that the proposed methods are appropriate for practical use. We provide applications to HIV/AIDS clinical trials. © 1997 John Wiley & Sons, Ltd.
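The proportion of treatment effect explained reduces to one minus the ratio of the marker-adjusted treatment coefficient to the unadjusted one, taken from two Cox fits on the same failure time. A sketch with hypothetical coefficients:

```python
# Hypothetical log-hazard-ratio coefficients for the treatment indicator
# from two Cox models fitted to the same failure time data.
beta_unadjusted = -0.80   # model without the surrogate marker
beta_adjusted = -0.20     # model including the surrogate marker

# Proportion of the treatment effect explained by the marker.
pte = 1.0 - beta_adjusted / beta_unadjusted
```

The paper's contribution is the joint asymptotic theory for the two coefficients, which yields a variance estimator and confidence intervals (direct normal approximation or Fieller's theorem) for this ratio-based quantity.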
DOI: 10.1002/(sici)1097-0258(19970430)16:8<901::aid-sim543>3.0.co;2-m
1997
Cited 430 times
Non-Parametric Inference for Cumulative Incidence Functions in Competing Risks Studies
In the competing risks problem, a useful quantity is the cumulative incidence function, which is the probability of occurrence by time t for a particular type of failure in the presence of other risks. The estimator of this function as given by Kalbfleisch and Prentice is consistent, and, properly normalized, converges weakly to a zero-mean Gaussian process with a covariance function for which a consistent estimator is provided. A resampling technique is developed to approximate the distribution of this process, which enables one to construct confidence bands for the cumulative incidence curve over the entire time span of interest and to perform Kolmogorov–Smirnov type tests for comparing two such curves. An AIDS example is provided. © 1997 by John Wiley & Sons, Ltd. Stat. Med., Vol. 16, 901–910 (1997).
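The Kalbfleisch-Prentice estimator accumulates, at each event time, the all-cause Kaplan-Meier survivor function just before that time multiplied by the fraction of at-risk subjects failing from the cause of interest. A numpy sketch on hypothetical competing-risks data (cause 0 denotes censoring):

```python
import numpy as np

# Hypothetical data: failure/censoring times with cause labels
# (1 = cause of interest, 2 = competing cause, 0 = censored).
time = np.array([2.0, 3.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
cause = np.array([1,   2,   0,   1,   0,   2,   1,   0])

cif = 0.0
surv = 1.0
for t in np.unique(time):
    at = time == t
    n = (time >= t).sum()                 # at risk just before t
    d1 = (at & (cause == 1)).sum()        # cause-1 events at t
    d_all = (at & (cause != 0)).sum()     # events of any cause at t
    cif += surv * d1 / n                  # increment cumulative incidence
    surv *= 1 - d_all / n                 # update all-cause survivor
```

Unlike one minus the cause-specific Kaplan-Meier curve, this estimator correctly treats competing failures as removing subjects from risk rather than censoring them.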
DOI: 10.2307/2533947
1997
Cited 422 times
Estimating Medical Costs from Incomplete Follow-Up Data
Estimation of the average total cost for treating patients with a particular disease is often complicated by the fact that the survival times are censored on some study subjects and their subsequent costs are unknown. The naive sample average of the observed costs from all study subjects or from the uncensored cases only can be severely biased, and the standard survival analysis techniques are not applicable. To minimize the bias induced by censoring, we partition the entire time period of interest into a number of small intervals and estimate the average total cost either by the sum of the Kaplan-Meier estimator for the probability of dying in each interval multiplied by the sample mean of the total costs from the observed deaths in that interval or by the sum of the Kaplan-Meier estimator for the probability of being alive at the start of each interval multiplied by an appropriate estimator for the average cost over the interval conditional on surviving to the start of the interval. The resultant estimators are consistent if censoring occurs solely at the boundaries of the intervals. In addition, the estimators are asymptotically normal with easily estimated variances. Extensive numerical studies show that the asymptotic approximations are adequate for practical use and the biases of the proposed estimators are small even when censoring may occur in the interiors of the intervals. An ovarian cancer study is provided.
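The first of the two partitioned estimators can be sketched directly: the Kaplan-Meier probability of dying in each interval times the mean observed total cost among deaths in that interval. Hypothetical data with coarse intervals for illustration; real analyses would partition follow-up more finely:

```python
import numpy as np

# Hypothetical data: observed times, death indicator (0 = censored),
# and total cost, known only for observed deaths.
death_time = np.array([1.5, 2.5, 4.0, 5.5, 7.0, 9.0])
died = np.array([1, 1, 0, 1, 1, 0])
total_cost = np.array([10.0, 14.0, np.nan, 30.0, 45.0, np.nan])

def km(t):
    """All-cause Kaplan-Meier survivor function at time t."""
    s = 1.0
    for u in np.sort(death_time[died == 1]):
        if u <= t:
            n = (death_time >= u).sum()
            s *= 1 - 1 / n
    return s

bounds = [0.0, 4.0, 8.0]                  # two intervals: (0, 4], (4, 8]
est = 0.0
for a, b in zip(bounds, bounds[1:]):
    p_die = km(a) - km(b)                 # KM probability of dying in (a, b]
    in_int = (death_time > a) & (death_time <= b) & (died == 1)
    if in_int.any():
        est += p_die * total_cost[in_int].mean()
```

Weighting interval-specific cost means by Kaplan-Meier death probabilities is what removes the censoring bias that contaminates the naive sample average.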
DOI: 10.1056/nejmoa1405386
2014
Cited 400 times
Inactivating Mutations in NPC1L1 and Protection from Coronary Heart Disease
Ezetimibe lowers plasma levels of low-density lipoprotein (LDL) cholesterol by inhibiting the activity of the Niemann-Pick C1-like 1 (NPC1L1) protein. However, whether such inhibition reduces the risk of coronary heart disease is not known. Human mutations that inactivate a gene encoding a drug target can mimic the action of an inhibitory drug and thus can be used to infer potential effects of that drug. We sequenced the exons of NPC1L1 in 7364 patients with coronary heart disease and in 14,728 controls without such disease who were of European, African, or South Asian ancestry. We identified carriers of inactivating mutations (nonsense, splice-site, or frameshift mutations). In addition, we genotyped a specific inactivating mutation (p.Arg406X) in 22,590 patients with coronary heart disease and in 68,412 controls. We tested the association between the presence of an inactivating mutation and both plasma lipid levels and the risk of coronary heart disease. With sequencing, we identified 15 distinct NPC1L1 inactivating mutations; approximately 1 in every 650 persons was a heterozygous carrier for 1 of these mutations. Heterozygous carriers of NPC1L1 inactivating mutations had a mean LDL cholesterol level that was 12 mg per deciliter (0.31 mmol per liter) lower than that in noncarriers (P=0.04). Carrier status was associated with a relative reduction of 53% in the risk of coronary heart disease (odds ratio for carriers, 0.47; 95% confidence interval, 0.25 to 0.87; P=0.008). In total, only 11 of 29,954 patients with coronary heart disease had an inactivating mutation (carrier frequency, 0.04%) in contrast to 71 of 83,140 controls (carrier frequency, 0.09%). Naturally occurring mutations that disrupt NPC1L1 function were found to be associated with reduced plasma LDL cholesterol levels and a reduced risk of coronary heart disease. (Funded by the National Institutes of Health and others.).
DOI: 10.1038/mp.2008.125
2008
Cited 367 times
Genome-wide association for major depressive disorder: a possible role for the presynaptic protein piccolo
Major depressive disorder (MDD) is a common complex trait with enormous public health significance. As part of the Genetic Association Information Network initiative of the US Foundation for the National Institutes of Health, we conducted a genome-wide association study of 435 291 single nucleotide polymorphisms (SNPs) genotyped in 1738 MDD cases and 1802 controls selected to be at low liability for MDD. Of the top 200 signals, 11 localized to a 167 kb region overlapping the gene piccolo (PCLO, whose protein product localizes to the cytomatrix of the presynaptic active zone and is important in monoaminergic neurotransmission in the brain) with P-values of 7.7 × 10⁻⁷ for rs2715148 and 1.2 × 10⁻⁶ for rs2522833. We undertook replication of SNPs in this region in five independent samples (6079 MDD independent cases and 5893 controls) but no SNP exceeded the replication significance threshold when all replication samples were analyzed together. However, there was heterogeneity in the replication samples, and secondary analysis of the original sample with the sample of greatest similarity yielded P=6.4 × 10⁻⁸ for the nonsynonymous SNP rs2522833 that gives rise to a serine to alanine substitution near a C2 calcium-binding domain of the PCLO protein. With the integrated replication effort, we present a specific hypothesis for further studies.
DOI: 10.1093/biomet/90.2.341
2003
Cited 359 times
Rank-based inference for the accelerated failure time model
A broad class of rank‐based monotone estimating functions is developed for the semiparametric accelerated failure time model with censored observations. The corresponding estimators can be obtained via linear programming, and are shown to be consistent and asymptotically normal. The limiting covariance matrices can be estimated by a resampling technique, which does not involve nonparametric density estimation or numerical derivatives. The new estimators represent consistent roots of the non‐monotone estimating equations based on the familiar weighted log‐rank statistics. Simulation studies demonstrate that the proposed methods perform well in practical settings. Two real examples are provided.
DOI: 10.1038/mp.2008.25
2008
Cited 341 times
Genomewide association for schizophrenia in the CATIE study: results of stage 1
Little is known for certain about the genetics of schizophrenia. The advent of genomewide association has been widely anticipated as a promising means to identify reproducible DNA sequence variation associated with this important and debilitating disorder. A total of 738 cases with DSM-IV schizophrenia (all participants in the CATIE study) and 733 group-matched controls were genotyped for 492 900 single-nucleotide polymorphisms (SNPs) using the Affymetrix 500K two-chip genotyping platform plus a custom 164K fill-in chip. Following multiple quality control steps for both subjects and SNPs, logistic regression analyses were used to assess the evidence for association of all SNPs with schizophrenia. We identified a number of promising SNPs for follow-up studies, although no SNP or multimarker combination of SNPs achieved genomewide statistical significance. Although a few signals coincided with genomic regions previously implicated in schizophrenia, chance could not be excluded. These data do not provide evidence for the involvement of any genomic region with schizophrenia detectable with moderate sample size. However, a planned genomewide association study for response phenotypes and inclusion of individual phenotype and genotype data from this study in meta-analyses hold promise for eventual identification of susceptibility and protective variants.
DOI: 10.1111/j.1369-7412.2007.00606.x
2007
Cited 272 times
Maximum Likelihood Estimation in Semiparametric Regression Models with Censored Data
Summary Semiparametric regression models play a central role in formulating the effects of covariates on potentially censored failure times and in the joint modelling of incomplete repeated measures and failure times in longitudinal studies. The presence of infinite dimensional parameters poses considerable theoretical and computational challenges in the statistical analysis of such models. We present several classes of semiparametric regression models, which extend the existing models in important directions. We construct appropriate likelihood functions involving both finite dimensional and infinite dimensional parameters. The maximum likelihood estimators are consistent and asymptotically normal with efficient variances. We develop simple and stable numerical techniques to implement the corresponding inference procedures. Extensive simulation experiments demonstrate that the inferential and computational methods proposed perform well in practical settings. Applications to three medical studies yield important new insights. We conclude that there is no reason, theoretical or numerical, not to use maximum likelihood estimation for semiparametric regression models. We discuss several areas that need further research.
DOI: 10.1056/nejmoa2117128
2022
Cited 254 times
Effectiveness of Covid-19 Vaccines over a 9-Month Period in North Carolina
The duration of protection afforded by coronavirus disease 2019 (Covid-19) vaccines in the United States is unclear. Whether the increase in postvaccination infections during the summer of 2021 was caused by declining immunity over time, the emergence of the B.1.617.2 (delta) variant, or both is unknown. We extracted data regarding Covid-19-related vaccination and outcomes during a 9-month period (December 11, 2020, to September 8, 2021) for approximately 10.6 million North Carolina residents by linking data from the North Carolina Covid-19 Surveillance System and the Covid-19 Vaccine Management System. We used a Cox regression model to estimate the effectiveness of the BNT162b2 (Pfizer-BioNTech), mRNA-1273 (Moderna), and Ad26.COV2.S (Johnson & Johnson-Janssen) vaccines in reducing the current risks of Covid-19, hospitalization, and death, as a function of time elapsed since vaccination. For the two-dose regimens of messenger RNA (mRNA) vaccines BNT162b2 (30 μg per dose) and mRNA-1273 (100 μg per dose), vaccine effectiveness against Covid-19 was 94.5% (95% confidence interval [CI], 94.1 to 94.9) and 95.9% (95% CI, 95.5 to 96.2), respectively, at 2 months after the first dose and decreased to 66.6% (95% CI, 65.2 to 67.8) and 80.3% (95% CI, 79.3 to 81.2), respectively, at 7 months. Among early recipients of BNT162b2 and mRNA-1273, effectiveness decreased by approximately 15 and 10 percentage points, respectively, from mid-June to mid-July, when the delta variant became dominant. For the one-dose regimen of Ad26.COV2.S (5 × 10¹⁰ viral particles), effectiveness against Covid-19 was 74.8% (95% CI, 72.5 to 76.9) at 1 month and decreased to 59.4% (95% CI, 57.2 to 61.5) at 5 months.
All three vaccines maintained better effectiveness in preventing hospitalization and death than in preventing infection over time, although the two mRNA vaccines provided higher levels of protection than Ad26.COV2.S. All three Covid-19 vaccines had durable effectiveness in reducing the risks of hospitalization and death. Waning protection against infection over time was due to both declining immunity and the emergence of the delta variant. (Funded by a Dennis Gillings Distinguished Professorship and the National Institutes of Health.).
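Vaccine effectiveness as reported in this study is one minus the estimated hazard ratio, expressed as a percentage; a trivial sketch of the conversion:

```python
# Effectiveness (%) from a hazard ratio; e.g. a hazard ratio of 0.055
# corresponds to 94.5% effectiveness, matching the scale used above.
def effectiveness(hazard_ratio):
    return (1.0 - hazard_ratio) * 100.0

ve = effectiveness(0.055)
```

Because the hazard ratio is allowed to vary with time since vaccination, effectiveness traced out as a function of elapsed time is what reveals the waning.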
DOI: 10.1038/s41588-021-00997-7
2022
Cited 163 times
Assessing the contribution of rare variants to complex trait heritability from whole-genome sequence data
Analyses of data from genome-wide association studies on unrelated individuals have shown that, for human traits and diseases, approximately one-third to two-thirds of heritability is captured by common SNPs. However, it is not known whether the remaining heritability is due to the imperfect tagging of causal variants by common SNPs, in particular whether the causal variants are rare, or whether it is overestimated due to bias in inference from pedigree data. Here we estimated heritability for height and body mass index (BMI) from whole-genome sequence data on 25,465 unrelated individuals of European ancestry. The estimated heritability was 0.68 (standard error 0.10) for height and 0.30 (standard error 0.10) for body mass index. Low minor allele frequency variants in low linkage disequilibrium (LD) with neighboring variants were enriched for heritability, to a greater extent for protein-altering variants, consistent with negative selection. Our results imply that rare variants, in particular those in regions of low linkage disequilibrium, are a major source of the still missing heritability of complex traits and disease.
DOI: 10.1056/nejmc2215471
2023
Cited 137 times
Effectiveness of Bivalent Boosters against Severe Omicron Infection
DOI: 10.1001/jama.2022.17876
2022
Cited 84 times
Association of Primary and Booster Vaccination and Prior Infection With SARS-CoV-2 Infection and Severe COVID-19 Outcomes
Data about the association of COVID-19 vaccination and prior SARS-CoV-2 infection with risk of SARS-CoV-2 infection and severe COVID-19 outcomes may guide prevention strategies. To estimate the time-varying association of primary and booster COVID-19 vaccination and prior SARS-CoV-2 infection with subsequent SARS-CoV-2 infection, hospitalization, and death. Cohort study of 10.6 million residents in North Carolina from March 2, 2020, through June 3, 2022. COVID-19 primary vaccine series and boosters and prior SARS-CoV-2 infection. Rate ratio (RR) of SARS-CoV-2 infection and hazard ratio (HR) of COVID-19-related hospitalization and death. The median age among the 10.6 million participants was 39 years; 51.3% were female, 71.5% were White, and 9.9% were Hispanic. As of June 3, 2022, 67% of participants had been vaccinated. There were 2 771 364 SARS-CoV-2 infections, with a hospitalization rate of 6.3% and mortality rate of 1.4%. The adjusted RR of the primary vaccine series compared with being unvaccinated against infection became 0.53 (95% CI, 0.52-0.53) for BNT162b2, 0.52 (95% CI, 0.51-0.53) for mRNA-1273, and 0.51 (95% CI, 0.50-0.53) for Ad26.COV2.S 10 months after the first dose, but the adjusted HR for hospitalization remained at 0.29 (95% CI, 0.24-0.35) for BNT162b2, 0.27 (95% CI, 0.23-0.32) for mRNA-1273, and 0.35 (95% CI, 0.29-0.42) for Ad26.COV2.S and the adjusted HR of death remained at 0.23 (95% CI, 0.17-0.29) for BNT162b2, 0.15 (95% CI, 0.11-0.20) for mRNA-1273, and 0.24 (95% CI, 0.19-0.31) for Ad26.COV2.S. For the BNT162b2 primary series, boosting in December 2021 with BNT162b2 had the adjusted RR relative to primary series of 0.39 (95% CI, 0.38-0.40) and boosting with mRNA-1273 had the adjusted RR of 0.32 (95% CI, 0.30-0.34) against infection after 1 month and boosting with BNT162b2 had the adjusted RR of 0.84 (95% CI, 0.82-0.86) and boosting with mRNA-1273 had the adjusted RR of 0.60 (95% CI, 0.57-0.62) after 3 months.
Among all participants, the adjusted RR of Omicron infection compared with no prior infection was estimated at 0.23 (95% CI, 0.22-0.24) against infection, and the adjusted HRs were 0.10 (95% CI, 0.07-0.14) against hospitalization and 0.11 (95% CI, 0.08-0.15) against death after 4 months. Receipt of primary COVID-19 vaccine series compared with being unvaccinated, receipt of boosters compared with primary vaccination, and prior infection compared with no prior infection were all significantly associated with lower risk of SARS-CoV-2 infection (including Omicron) and resulting hospitalization and death. The associated protection waned over time, especially against infection.
DOI: 10.1056/nejmc2302462
2023
Cited 56 times
Durability of Bivalent Boosters against Omicron Subvariants
DOI: 10.1111/j.1532-5415.1999.tb01603.x
1999
Cited 300 times
Assessment and Control for Confounding by Indication in Observational Studies
In the evaluation of pharmacologic therapies, the controlled clinical trial is the preferred design. When clinical trial results are not available, the alternative designs are observational epidemiologic studies. A traditional concern about the validity of findings from epidemiologic studies is the possibility of bias from uncontrolled confounding. In studies of pharmacologic therapies, confounding by indication may arise when a drug treatment serves as a marker for a clinical characteristic or medical condition that triggers the use of the treatment and that, at the same time, increases the risk of the outcome under study. Confounding by indication is not conceptually different from confounding by other factors, and the approaches to detect and control for confounding — matching, stratification, restriction, and multivariate adjustment — are the same. Even after adjustment for known risk factors, residual confounding may occur because of measurement error or unmeasured or unknown risk factors. Although residual confounding is difficult to exclude in observational studies, there are limits to what this “unknown” confounding can explain. The degree of confounding depends on the prevalence of the putative confounding factor, the level of its association with the disease, and the level of its association with the exposure. For example, a confounding factor with a prevalence of 20% would have to increase the relative odds of both outcome and exposure by factors of 4 to 5 before the relative risk of 1.57 would be reduced to 1.00. Observational studies have provided important scientific evidence about the risks associated with several risk factors, including drug therapies, and they are often the only option for assessing safety. Understanding the methods to detect and control for confounding makes it possible to assess the plausibility of claims that confounding is an alternative explanation for the findings of particular studies. J Am Geriatr Soc 47:749–754, 1999.
DOI: 10.1001/jama.296.22.2703
2006
Cited 248 times
Association of Polymorphisms in the CRP Gene With Circulating C-Reactive Protein Levels and Cardiovascular Events
C-reactive protein (CRP) is an inflammation protein that may play a role in the pathogenesis of cardiovascular disease (CVD). The objective was to assess whether polymorphisms in the CRP gene are associated with plasma CRP, carotid intima-media thickness (CIMT), and CVD events. In the prospective, population-based Cardiovascular Health Study, 4 tag single-nucleotide polymorphisms (SNPs) (1919A/T, 2667G/C, 3872G/A, 5237A/G) were genotyped in 3941 white (European American) participants and 5 tag SNPs (addition of 790A/T) were genotyped in 700 black (African American) participants, aged 65 years or older, all of whom were without myocardial infarction (MI) or stroke before study entry. Median follow-up was 13 years (1989-2003). Main outcome measures were baseline CIMT and the occurrence of MI, stroke, and CVD mortality during follow-up. In white participants, 461 incident MIs, 491 incident strokes, and 490 CVD-related deaths occurred; in black participants, 67 incident MIs, 78 incident strokes, and 75 CVD-related deaths occurred. The 1919T and 790T alleles were associated with higher CRP levels in white and black participants, respectively. The 3872A allele was associated with lower CRP levels in both populations, and the 2667C allele was associated with lower CRP levels in white participants only. There was no association between CIMT and any CRP gene polymorphism in either population. In white participants, the 1919T allele was associated with increased risk of stroke for TT vs AA (hazard ratio [HR], 1.40; 95% confidence interval [CI], 1.06-1.87) and for CVD mortality (HR, 1.40; 95% CI, 1.10-1.90). In black participants, homozygosity for the 790T allele was associated with a 4-fold increased risk of MI compared with homozygosity for the 790A allele (95% CI, 1.58-10.53).
The minor alleles of the 2 SNPs associated with lower plasma CRP concentration in white participants (2667C and 3872A) were associated with decreased risk of CVD mortality. Genetic variation in the CRP gene is associated with plasma CRP levels and CVD risk in older adults.
DOI: 10.2307/2290085
1989
Cited 243 times
The Robust Inference for the Cox Proportional Hazards Model
Abstract We derive the asymptotic distribution of the maximum partial likelihood estimator β for the vector of regression coefficients β under a possibly misspecified Cox proportional hazards model. As in the parametric setting, this estimator β converges to a well-defined constant vector β*. In addition, the random vector n 1/2(β – β*) is asymptotically normal with mean 0 and with a covariance matrix that can be consistently estimated. The newly proposed robust covariance matrix estimator is similar to the so-called “sandwich” variance estimators that have been extensively studied for parametric cases. For many misspecified Cox models, the asymptotic limit β* or part of it can be interpreted meaningfully. In those circumstances, valid statistical inferences about the corresponding covariate effects can be drawn based on the aforementioned asymptotic theory of β and the related results for the score statistics. Extensive studies demonstrate that the proposed robust tests and interval estimation procedures are appropriate for practical use. In particular, the robust score tests perform quite well even for small samples. In contrast, the conventional model-based inference procedures often lead to tests with supranominal size and confidence intervals with rather poor coverage probability.
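The "sandwich" form described above — a model-based "bread" matrix wrapped around an empirical "meat" matrix — can be illustrated outside the Cox setting. Below is a minimal numpy sketch of the same principle for ordinary least squares with heteroskedastic errors, where the model-based variance is wrong but the Huber-White sandwich remains valid; this illustrates the general idea, not the paper's Cox-specific estimator.

```python
import numpy as np

# Sandwich (robust) covariance sketch: A^{-1} B A^{-1}, where A is the
# model-based information ("bread") and B is an empirical score
# variance ("meat"). Illustrated with OLS under heteroskedasticity.
rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([1.0, 2.0])
# error variance depends on the covariate, so model-based SEs are wrong
y = X @ beta_true + rng.normal(scale=1 + np.abs(X[:, 1]), size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat

bread = np.linalg.inv(X.T @ X)              # A^{-1}
meat = X.T @ (X * resid[:, None] ** 2)      # B = sum_i e_i^2 x_i x_i'
robust_cov = bread @ meat @ bread           # sandwich covariance estimate
se_robust = np.sqrt(np.diag(robust_cov))
```

In the Cox case the bread is the observed partial-likelihood information and the meat is built from score residuals, but the algebraic shape is the same.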
DOI: 10.1093/biomet/77.4.845
1990
Cited 233 times
Linear regression analysis of censored survival data based on rank tests
Recently linear rank statistics with censored data have been used as the estimating functions for the regression parameters in the linear model with an unspecified error distribution. The resulting rank estimators are consistent and asymptotically normal. However, the asymptotic variances of these estimators are complicated and are difficult to estimate well with censored data. In this paper, we propose some simple methods for making inference about a subset of the regression coefficients while regarding others as nuisance parameters. A lack-of-fit test for the linear model is also presented. The proposed procedures are illustrated with an example.
DOI: 10.1198/016214501750333018
2001
Cited 232 times
Semiparametric and Nonparametric Regression Analysis of Longitudinal Data
This article deals with the regression analysis of repeated measurements taken at irregular and possibly subject-specific time points. The proposed semiparametric and nonparametric models postulate that the marginal distribution for the repeatedly measured response variable Y at time t is related to the vector of possibly time-varying covariates X through the equations E{Y(t) | X(t)} = α0(t) + β′0X(t) and E{Y(t) | X(t)} = α0(t) + β′0(t)X(t), where α0(t) is an arbitrary function of t, β0 is a vector of constant regression coefficients, and β0(t) is a vector of time-varying regression coefficients. The stochastic structure of the process Y(·) is completely unspecified. We develop a class of least squares type estimators for β0, which is proven to be n½-consistent and asymptotically normal with simple variance estimators. Furthermore, we develop a closed-form estimator for a cumulative function of β0(t), which is shown to be n½-consistent and, on proper normalization, converges weakly to a zero-mean Gaussian process with an easily estimated covariance function. Extensive simulation studies demonstrate that the asymptotic approximations are accurate for moderate sample sizes and that the efficiencies of the proposed semiparametric estimators are high relative to their parametric counterparts. An illustration with longitudinal CD4 cell count data taken from an HIV/AIDS clinical trial is provided.
DOI: 10.1111/j.0006-341x.2000.00554.x
2000
Cited 225 times
Nonparametric Analysis of Recurrent Events and Death
This article is concerned with the analysis of recurrent events in the presence of a terminal event such as death. We consider the mean frequency function, defined as the marginal mean of the cumulative number of recurrent events over time. A simple nonparametric estimator for this quantity is presented. It is shown that the estimator, properly normalized, converges weakly to a zero-mean Gaussian process with an easily estimable covariance function. Nonparametric statistics for comparing two mean frequency functions and for combining data on recurrent events and death are also developed. The asymptotic null distributions of these statistics, together with consistent variance estimators, are derived. The small-sample properties of the proposed estimators and test statistics are examined through simulation studies. An application to a cancer clinical trial is provided.
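The mean frequency function lends itself to a short numerical sketch. The following numpy fragment computes a simplified Nelson-Aalen-type estimate — at each event time u it adds dN(u)/Y(u), the number of recurrent events at u over the number of subjects still under follow-up. Note the paper's estimator additionally adjusts for death as a terminal event; this sketch assumes censoring only, and the toy data are invented for illustration.

```python
import numpy as np

def mean_frequency(subjects, grid):
    """Simplified estimate of the mean cumulative number of recurrent
    events per subject (censoring only, no terminal-event adjustment).

    subjects: list of (event_times, end_of_followup) pairs, one per subject.
    """
    all_events = np.sort(np.concatenate(
        [np.asarray(ev, dtype=float) for ev, _ in subjects]))
    ends = np.array([end for _, end in subjects], dtype=float)
    estimates = []
    for t in grid:
        total = 0.0
        for u in np.unique(all_events[all_events <= t]):
            dN = np.sum(all_events == u)  # recurrent events observed at u
            Y = np.sum(ends >= u)         # subjects still observed at u
            total += dN / Y
        estimates.append(total)
    return np.array(estimates)

# toy data: three subjects with 2, 1, and 0 observed events
subjects = [([0.5, 1.2], 2.0), ([0.8], 1.5), ([], 3.0)]
mu = mean_frequency(subjects, grid=[1.0, 2.0, 3.0])
```

The estimate is a nondecreasing step function, as a cumulative mean must be.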
DOI: 10.1080/01621459.1993.10476416
1993
Cited 209 times
Cox Regression with Incomplete Covariate Measurements
Abstract This article provides a general solution to the problem of missing covariate data under the Cox regression model. The estimating function for the vector of regression parameters is an approximation to the partial likelihood score function with full covariate measurements and reduces to the pseudolikelihood score function of Self and Prentice in the special setting of case-cohort designs. The resulting parameter estimator is consistent and asymptotically normal with a covariance matrix for which a simple and consistent estimator is provided. Extensive simulation studies show that the large-sample approximations are adequate for practical use. The proposed approach tends to be more efficient than the complete-case analysis, especially for large cohorts with infrequent failures. For case-cohort designs, the new methodology offers a variance-covariance estimator that is much easier to calculate than the existing ones and allows multiple subcohort augmentations to improve efficiency. Real data taken from clinical and epidemiologic studies are analyzed.
DOI: 10.1093/biostatistics/1.1.35
2000
Cited 209 times
Linear regression analysis of censored medical costs
This paper deals with the problem of linear regression for medical cost data when some study subjects are not followed for the full duration of interest so that their total costs are unknown. Standard survival analysis techniques are ill-suited to this type of censoring. The familiar normal equations for the least-squares estimation are modified in several ways to properly account for the incompleteness of the data. The resulting estimators are shown to be consistent and asymptotically normal with easily estimated variance-covariance matrices. The proposed methodology can be used when the cost database contains only the total costs for those with complete follow-up. More efficient estimators are available when the cost data are recorded in multiple time intervals. A study on the medical cost for ovarian cancer is presented.
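One standard device for the censoring problem described above is inverse-probability-of-censoring weighting: censored subjects receive weight zero and complete cases weight 1/K(T), where K is the censoring survival function. The sketch below solves the resulting weighted normal equations on simulated data; for clarity it plugs in the known (simulated) censoring distribution, whereas in practice K would be estimated (e.g. by Kaplan-Meier), and the paper's modification of the normal equations is more refined than this simple weighting.

```python
import numpy as np

# IPW least squares for censored total costs (illustrative sketch only)
rng = np.random.default_rng(3)
n = 400
x = rng.uniform(size=n)
X = np.column_stack([np.ones(n), x])
cost = 10 + 5 * x + rng.normal(size=n)        # true total cost
accrual_end = rng.uniform(0.0, 1.8, size=n)   # time cost accrual stops
followup = rng.uniform(0.5, 2.0, size=n)      # censoring time
complete = accrual_end <= followup            # cost fully observed?

# K(t) = P(followup >= t) for followup ~ Uniform(0.5, 2.0)
K = np.clip((2.0 - accrual_end) / 1.5, None, 1.0)
w = complete / K                              # weight 0 for censored cases

# weighted normal equations: (X' W X) beta = X' W cost
beta = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (w * cost))
```

Because E[complete | accrual_end] = K(accrual_end), the weights have conditional mean one and the weighted estimator remains consistent for the regression coefficients despite the censoring.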
DOI: 10.2307/2290084
1989
Cited 185 times
Regression Analysis of Multivariate Incomplete Failure Time Data by Modeling Marginal Distributions
DOI: 10.1093/biomet/86.1.59
1999
Cited 182 times
Nonparametric estimation of the gap time distribution for serial events with censored data
In many follow-up studies, each subject can potentially experience a series of events, which may be repetitions of essentially the same event or may be events of entirely different natures. This paper provides a simple nonparametric estimator for the multivariate distribution function of the gap times between successive events when the follow-up time is subject to right censoring. The estimator is consistent and, upon proper normalisation, converges weakly to a zero-mean Gaussian process with an easily estimated covariance function. Numerical studies demonstrate that both the distribution function estimator and its covariance function estimator perform well for practical sample sizes. An application to a colon cancer study is presented. Keywords: Bivariate distribution; Correlated failure times; Dependent censoring; Kaplan-Meier estimator; Multiple events; Multivariate failure time; Recurrent events.
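For the first gap time (before dependent censoring of later gaps becomes an issue), nonparametric estimation reduces to the familiar Kaplan-Meier estimator. A minimal self-contained sketch, with invented toy data:

```python
import numpy as np

def kaplan_meier(times, observed):
    """Return (event_time, survival) steps of the Kaplan-Meier estimate.

    observed = 1 if the gap time was seen, 0 if right-censored.
    """
    times = np.asarray(times, dtype=float)
    observed = np.asarray(observed, dtype=int)
    surv, steps = 1.0, []
    for u in np.unique(times[observed == 1]):
        d = np.sum((times == u) & (observed == 1))  # events at u
        at_risk = np.sum(times >= u)                # at risk just before u
        surv *= 1.0 - d / at_risk
        steps.append((float(u), surv))
    return steps

# one censored observation at time 2.0
steps = kaplan_meier([1.0, 2.0, 3.0, 4.0], [1, 0, 1, 1])
# → [(1.0, 0.75), (3.0, 0.375), (4.0, 0.0)]
```

The paper's contribution is the extension of this idea to the joint distribution of successive gap times, where naive Kaplan-Meier fails because later gaps are dependently censored.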
DOI: 10.1001/jama.282.8.786
1999
Cited 179 times
Surrogate End Points, Health Outcomes, and the Drug-Approval Process for the Treatment of Risk Factors for Cardiovascular Disease
DOI: 10.1080/01621459.1998.10473777
1998
Cited 175 times
Marginal Regression Models for Multivariate Failure Time Data
Abstract In this article we propose a general Cox-type regression model to formulate the marginal distributions of multivariate failure time data. This model has a nested structure in that it allows different baseline hazard functions among distinct failure types and imposes a common baseline hazard function on the failure times of the same type. We prove that the maximum "quasi-partial-likelihood" estimator for the vector of regression parameters under the independence working assumption is consistent and asymptotically normal with a covariance matrix for which a consistent estimator is provided. Furthermore, we establish the uniform consistency and joint weak convergence of the Aalen-Breslow type estimators for the cumulative baseline hazard functions, and develop a resampling technique to approximate the joint distribution of these processes, which enables one to make simultaneous inference about the survival functions over the time axis and across failure types. Finally, we assess the small-sample properties of the proposed methods through Monte Carlo simulation, and present an application to a real dental study.
DOI: 10.1006/enrs.1993.1123
1993
Cited 172 times
Pulmonary Function Changes in Children Associated with Fine Particulate Matter
During winter months many neighborhoods in the Seattle metropolitan area are heavily affected by particulate matter from residential wood burning. A study was conducted to investigate the relationship between fine particulate matter and pulmonary function in young children. The subjects were 326 elementary school children, including 24 asthmatics, who lived in an area with high particulate concentrations predominantly from residential wood burning. FEV1 and FVC were measured before, during and after the 1988-1989 and 1989-1990 winter heating seasons. Fine particulate matter was assessed using a light-scattering instrument. Analysis of the relationship between light scattering and lung function indicated that an increase in particulate air pollution was associated with a decline in asthmatic children's pulmonary function. FEV1 and FVC in the asthmatic children dropped an average of 34 and 37 ml respectively for each 10(-4) m-1 increase in sigma sp. This sigma sp increase corresponds to an increase in PM2.5 of 20 micrograms/m3. It is concluded that fine particulate matter from wood burning is significantly associated with acute respiratory irritation in young asthmatic children.
DOI: 10.1093/biomet/85.2.289
1998
Cited 171 times
Additive hazards regression with current status data
D. Y. Lin (Department of Biostatistics, University of Washington, Seattle), David Oakes (Department of Biostatistics, University of Rochester Medical Center), and Zhiliang Ying (Department of Statistics, Rutgers University). Biometrika, Volume 85, Issue 2, June 1998, Pages 289–298. Received 1 September 1996; revision received 1 September 1997; published 1 June 1998.
DOI: 10.1093/bioinformatics/bti053
2004
Cited 168 times
An efficient Monte Carlo approach to assessing statistical significance in genomic studies
Abstract Motivation: Multiple hypothesis testing is a common problem in genome research, particularly in microarray experiments and genomewide association studies. Failure to account for the effects of multiple comparisons would result in an abundance of false positive results. The Bonferroni correction and Holm's step-down procedure are overly conservative, whereas the permutation test is time-consuming and is restricted to simple problems. Results: We developed an efficient Monte Carlo approach to approximating the joint distribution of the test statistics along the genome. We then used the Monte Carlo distribution to evaluate the commonly used criteria for error control, such as familywise error rates and positive false discovery rates. This approach is applicable to any data structures and test statistics. Applications to simulated and real data demonstrate that the proposed approach provides accurate error control, and can be substantially more powerful than the Bonferroni and Holm methods, especially when the test statistics are highly correlated. Contact: lin@bios.unc.edu
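The core idea — simulate from the joint null distribution of the test statistics and read the familywise critical value off the distribution of the maximum — can be sketched in a few lines. Here equicorrelated normals stand in for the genome-wide statistics (the paper's construction is more general), showing why exploiting correlation beats Bonferroni-style corrections:

```python
import numpy as np

def mc_critical_value(rho, m=100, nsim=20000, alpha=0.05, seed=1):
    """Monte Carlo familywise critical value for m equicorrelated
    standard-normal test statistics with pairwise correlation rho."""
    rng = np.random.default_rng(seed)
    cov = rho * np.ones((m, m)) + (1.0 - rho) * np.eye(m)
    z = rng.multivariate_normal(np.zeros(m), cov, size=nsim)
    # (1 - alpha) quantile of the maximum absolute statistic
    return np.quantile(np.abs(z).max(axis=1), 1.0 - alpha)

c_corr = mc_critical_value(rho=0.8)   # strongly correlated tests
c_indep = mc_critical_value(rho=0.0)  # independent tests (Bonferroni-like)
```

Correlation lowers the familywise cutoff (c_corr < c_indep), so tests that account for the joint distribution reject more often at the same error rate.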
DOI: 10.1093/biomet/asq006
2010
Cited 152 times
On the relative efficiency of using summary statistics versus individual-level data in meta-analysis
Meta-analysis is widely used to synthesize the results of multiple studies. Although meta-analysis is traditionally carried out by combining the summary statistics of relevant studies, advances in technologies and communications have made it increasingly feasible to access the original data on individual participants. In the present paper, we investigate the relative efficiency of analyzing original data versus combining summary statistics. We show that, for all commonly used parametric and semiparametric models, there is no asymptotic efficiency gain by analyzing original data if the parameter of main interest has a common value across studies, the nuisance parameters have distinct values among studies, and the summary statistics are based on maximum likelihood. We also assess the relative efficiency of the two methods when the parameter of main interest has different values among studies or when there are common nuisance parameters across studies. We conduct simulation studies to confirm the theoretical results and provide empirical comparisons from a genetic association study.
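The simplest instance of the no-efficiency-loss result is estimating a common mean across studies: the inverse-variance-weighted combination of study-level summaries is asymptotically as efficient as pooling the individual-level data. A small simulated sketch (study sizes and the common mean are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
# three studies sharing a common mean 0.3, with different sample sizes
studies = [rng.normal(loc=0.3, scale=1.0, size=n) for n in (50, 80, 120)]

# summary-statistics route: inverse-variance weighting of study means
means = np.array([s.mean() for s in studies])
ses2 = np.array([s.var(ddof=1) / len(s) for s in studies])
w = 1.0 / ses2
meta_est = np.sum(w * means) / np.sum(w)
meta_var = 1.0 / np.sum(w)

# individual-level route: pool all observations
pooled_est = np.concatenate(studies).mean()
```

With a common mean and study-specific nuisance variances, the two estimates essentially coincide, mirroring the paper's general conclusion for likelihood-based summaries.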
DOI: 10.1093/bioinformatics/btq600
2010
Cited 127 times
A variable selection method for genome-wide association studies
Abstract Motivation: Genome-wide association studies (GWAS) involving half a million or more single nucleotide polymorphisms (SNPs) allow genetic dissection of complex diseases in a holistic manner. The common practice of analyzing one SNP at a time does not fully realize the potential of GWAS to identify multiple causal variants and to predict risk of disease. Existing methods for joint analysis of GWAS data tend to miss causal SNPs that are marginally uncorrelated with disease and have high false discovery rates (FDRs). Results: We introduce GWASelect, a statistically powerful and computationally efficient variable selection method designed to tackle the unique challenges of GWAS data. This method searches iteratively over the potential SNPs conditional on previously selected SNPs and is thus capable of capturing causal SNPs that are marginally correlated with disease as well as those that are marginally uncorrelated with disease. A special resampling mechanism is built into the method to reduce false positive findings. Simulation studies demonstrate that the GWASelect performs well under a wide spectrum of linkage disequilibrium patterns and can be substantially more powerful than existing methods in capturing causal variants while having a lower FDR. In addition, the regression models based on the GWASelect tend to yield more accurate prediction of disease risk than existing methods. The advantages of the GWASelect are illustrated with the Wellcome Trust Case-Control Consortium (WTCCC) data. Availability: The software implementing GWASelect is available at http://www.bios.unc.edu/~lin. Access to WTCCC data: http://www.wtccc.org.uk/ Contact: lin@bios.unc.edu Supplementary information: Supplementary data are available at Bioinformatics Online.
DOI: 10.1093/biomet/asw013
2016
Cited 119 times
Maximum likelihood estimation for semiparametric transformation models with interval-censored data
Interval censoring arises frequently in clinical, epidemiological, financial and sociological studies, where the event or failure of interest is known only to occur within an interval induced by periodic monitoring. We formulate the effects of potentially time-dependent covariates on the interval-censored failure time through a broad class of semiparametric transformation models that encompasses proportional hazards and proportional odds models. We consider nonparametric maximum likelihood estimation for this class of models with an arbitrary number of monitoring times for each subject. We devise an EM-type algorithm that converges stably, even in the presence of time-dependent covariates, and show that the estimators for the regression parameters are consistent, asymptotically normal, and asymptotically efficient with an easily estimated covariance matrix. Finally, we demonstrate the performance of our procedures through simulation studies and application to an HIV/AIDS study conducted in Thailand.
DOI: 10.1016/j.neuron.2015.05.034
2015
Cited 106 times
Genetic Differences in the Immediate Transcriptome Response to Stress Predict Risk-Related Brain Function and Psychiatric Disorders
Depression risk is exacerbated by genetic factors and stress exposure; however, the biological mechanisms through which these factors interact to confer depression risk are poorly understood. One putative biological mechanism implicates variability in the ability of cortisol, released in response to stress, to trigger a cascade of adaptive genomic and non-genomic processes through glucocorticoid receptor (GR) activation. Here, we demonstrate that common genetic variants in long-range enhancer elements modulate the immediate transcriptional response to GR activation in human blood cells. These functional genetic variants increase risk for depression and co-heritable psychiatric disorders. Moreover, these risk variants are associated with inappropriate amygdala reactivity, a transdiagnostic psychiatric endophenotype and an important stress hormone response trigger. Network modeling and animal experiments suggest that these genetic differences in GR-induced transcriptional activation may mediate the risk for depression and other psychiatric disorders by altering a network of functionally related stress-sensitive genes in blood and brain.
DOI: 10.1016/j.cels.2020.03.005
2020
Cited 57 times
SCOPE: A Normalization and Copy-Number Estimation Method for Single-Cell DNA Sequencing
Whole-genome single-cell DNA sequencing (scDNA-seq) enables characterization of copy-number profiles at the cellular level. We propose SCOPE, a normalization and copy-number estimation method for the noisy scDNA-seq data. SCOPE’s main features include the following: (1) a Poisson latent factor model for normalization, which borrows information across cells and regions to estimate bias, using in silico identified negative control cells; (2) an expectation-maximization algorithm embedded in the normalization step, which accounts for the aberrant copy-number changes and allows direct ploidy estimation without the need for post hoc adjustment; and (3) a cross-sample segmentation procedure to identify breakpoints that are shared across cells with the same genetic background. We evaluate SCOPE on a diverse set of scDNA-seq data in cancer genomics and show that SCOPE offers accurate copy-number estimates and successfully reconstructs subclonal structure. A record of this paper’s transparent peer review process is included in the Supplemental Information.
DOI: 10.1056/nejmc2209371
2022
Cited 38 times
Effects of Vaccination and Previous Infection on Omicron Infections in Children
DOI: 10.1016/s1473-3099(23)00272-4
2023
Cited 12 times
Effects of COVID-19 vaccination and previous SARS-CoV-2 infection on omicron infection and severe outcomes in children under 12 years of age in the USA: an observational cohort study
Background: Data on the protection conferred by COVID-19 vaccination and previous SARS-CoV-2 infection against omicron (B.1.1.529) infection in young children are scarce. We aimed to estimate the time-varying effects of primary and booster COVID-19 vaccination and previous SARS-CoV-2 infection on subsequent omicron infection and severe illness (hospital admission or death) in children younger than 12 years of age. Methods: In this observational cohort study, we obtained individual-level records on vaccination with the BNT162b2 and mRNA-1273 vaccines and clinical outcomes from the North Carolina COVID-19 Surveillance System and the COVID-19 Vaccine Management System for 1 368 721 North Carolina residents aged 11 years or younger from Oct 29, 2021 (Oct 29, 2021 for children aged 5–11 years and June 17, 2022 for children aged 0–4 years), to Jan 6, 2023. We used Cox regression to estimate the time-varying effects of primary and booster vaccination and previous infection on the risks of omicron infection, hospital admission, and death. Findings: For children 5–11 years of age, the effectiveness of primary vaccination against infection, compared with being unvaccinated, was 59·9% (95% CI 58·5–61·2) at 1 month, 33·7% (32·6–34·8) at 4 months, and 14·9% (95% CI 12·3–17·5) at 10 months after the first dose. Compared with primary vaccination only, the effectiveness of a monovalent booster dose after 1 month was 24·4% (14·4–33·2) and that of a bivalent booster dose was 76·7% (45·7–90·0). The effectiveness of omicron infection against reinfection was 79·9% (78·8–80·9) after 3 months and 53·9% (52·3–55·5) after 6 months. For children 0–4 years of age, the effectiveness of primary vaccination against infection, compared with being unvaccinated, was 63·8% (57·0–69·5) at 2 months and 58·1% (48·3–66·1) at 5 months after the first dose, and the effectiveness of omicron infection against reinfection was 77·3% (75·9–78·6) after 3 months and 64·7% (63·3–66·1) after 6 months.
For both age groups, vaccination and previous infection had better effectiveness against severe illness, as measured by hospital admission or death as a composite endpoint, than against infection. Interpretation: The BNT162b2 and mRNA-1273 vaccines were effective against omicron infection and severe outcomes in children younger than 12 years, although the effectiveness decreased over time. Bivalent boosters were more effective than monovalent boosters. Immunity acquired via omicron infection was high and waned gradually over time. These findings can be used to develop effective prevention strategies against COVID-19 in children younger than 12 years. Funding: US National Institutes of Health.
DOI: 10.1001/jamanetworkopen.2023.35077
2023
Cited 11 times
Nirmatrelvir or Molnupiravir Use and Severe Outcomes From Omicron Infections
Ritonavir-boosted nirmatrelvir and molnupiravir are currently used in the US and in other countries to treat nonhospitalized patients who have mild-to-moderate COVID-19 and who are at high risk for progression to severe disease. The associations of these 2 oral antiviral drugs with hospitalization and death resulting from infection with new SARS-CoV-2 Omicron subvariants, particularly BQ.1.1 and XBB.1.5, are unknown. The objective was to assess the association of nirmatrelvir or molnupiravir use with the risks of hospitalization and death among patients infected with new Omicron subvariants. This was a cohort study of patients who received a diagnosis of COVID-19 at Cleveland Clinic from April 1, 2022, to February 20, 2023 (during which the Omicron variant evolved from BA.2 to BA.4/BA.5, then to BQ.1/BQ.1.1, and finally to XBB/XBB.1.5) and who were at high risk of progressing to severe disease, with follow-up through 90 days after diagnosis. The final date for follow-up data collection was February 27, 2023. The exposure was treatment with ritonavir-boosted nirmatrelvir or molnupiravir. The primary outcome was time to death; the secondary outcome was time to either hospitalization or death. The association of either nirmatrelvir or molnupiravir use with each outcome was measured by the hazard ratio (HR) adjusted for demographic factors, socioeconomic status, date of COVID-19 diagnosis, coexisting medical conditions, COVID-19 vaccination status, and previous SARS-CoV-2 infection. There were 68 867 patients (29 386 [42.7%] aged ≥65 years; 26 755 [38.9%] male patients; 51 452 [74.7%] non-Hispanic White patients). Thirty of 22 594 patients treated with nirmatrelvir, 27 of 5311 patients treated with molnupiravir, and 588 of 40 962 patients who received no treatment died within 90 days of Omicron infection. The adjusted HRs of death were 0.16 (95% CI, 0.11-0.23) for nirmatrelvir and 0.23 (95% CI, 0.16-0.34) for molnupiravir.
The adjusted HRs of hospitalization or death were 0.63 (95% CI, 0.59-0.68) for nirmatrelvir and 0.59 (95% CI, 0.53-0.66) for molnupiravir. The associations of both drugs with both outcomes were observed across subgroups defined by age, race and ethnicity, date of COVID-19 diagnosis, vaccination status, previous infection status, and coexisting conditions. These findings suggest that the use of either nirmatrelvir or molnupiravir is associated with reductions in mortality and hospitalization in patients infected with Omicron, regardless of age, race and ethnicity, virus strain, vaccination status, previous infection status, or coexisting conditions. Both drugs can, therefore, be used to treat nonhospitalized patients who are at high risk of progressing to severe COVID-19.
DOI: 10.1093/biomet/85.3.605
1998
Cited 148 times
Accelerated failure time models for counting processes
We present a natural extension of the conventional accelerated failure time model for survival data to formulate the effects of covariates on the mean function of the counting process for recurrent events. A class of consistent and asymptotically normal rank estimators is developed for estimating the regression parameters of the proposed model. In addition, a Nelson-Aalen-type estimator for the mean function of the counting process is constructed, which is consistent and, properly normalised, converges weakly to a zero-mean Gaussian process. We assess the finite-sample properties of the proposed estimators and the associated inference procedures through Monte Carlo simulation and provide an application to a well-known bladder cancer study.
DOI: 10.1093/aje/152.7.674
2000
Cited 138 times
Influenza Vaccination and the Risk of Primary Cardiac Arrest
Influenza epidemics are associated with an excess of mortality not only from respiratory diseases but also from other causes, and cardiovascular mortality increases abruptly during influenza epidemics, with little evidence of a lag period. In a population-based case-control study, the authors examined whether influenza vaccination was associated with a reduced risk of out-of-hospital primary cardiac arrest (PCA), a major contributor to cardiovascular mortality in the community. Cases of PCA (n = 342) without prior heart disease or life-threatening comorbidity that occurred in King County, Washington, were identified from paramedic incident reports from October 1988 to July 1994. Demographically similar controls (n = 549) were identified from the community by using random digit dialing. Spouses of subjects were interviewed to assess treatment with influenza vaccine during the previous year and other risk factors. After adjustment for demographic, clinical, and behavioral risk factors, influenza vaccination was associated with a reduced risk of PCA (odds ratio = 0.51, 95 percent confidence interval: 0.33, 0.79). The authors suggest that while the association of influenza vaccination with a reduced risk of PCA is consistent with cohort studies of influenza vaccination and total mortality, further studies are needed to determine whether the observed association reflects protection or selection.
DOI: 10.1093/biomet/93.1.147
2006
Cited 131 times
On least-squares regression with censored data
The semiparametric accelerated failure time model relates the logarithm of the failure time linearly to the covariates while leaving the error distribution unspecified. The present paper describes simple and reliable inference procedures based on the least-squares principle for this model with right-censored data. The proposed estimator of the vector-valued regression parameter is an iterative solution to the Buckley–James estimating equation with a preliminary consistent estimator as the starting value. The estimator is shown to be consistent and asymptotically normal. A novel resampling procedure is developed for the estimation of the limiting covariance matrix. Extensions to marginal models for multivariate failure time data are considered. The performance of the new inference procedures is assessed through simulation studies. Illustrations with medical studies are provided.
DOI: 10.1214/aos/1176324320
1995
Cited 130 times
Semiparametric Analysis of General Additive-Multiplicative Hazard Models for Counting Processes
The additive-multiplicative hazard model specifies that the hazard function for the counting process associated with a multidimensional covariate process $Z = (W^T, X^T)^T$ takes the form of $\lambda(t\mid Z) = g\{\beta^T_0 W(t)\} + \lambda_0(t)h\{\gamma^T_0X(t)\}$, where $\theta_0 = (\beta^T_0, \gamma^T_0)^T$ is a vector of unknown regression parameters, $g$ and $h$ are known link functions and $\lambda_0$ is an unspecified "baseline hazard function." In this paper, we develop a class of simple estimating functions for $\theta_0$, which contains the partial likelihood score function in the special case of proportional hazards models. The resulting estimators are shown to be consistent and asymptotically normal under appropriate regularity conditions. Weak convergence of the Aalen-Breslow type estimators for the cumulative baseline hazard function $\Lambda_0(t) = \int^t_0\lambda_0(u) du$ is also established. Furthermore, we construct adaptive estimators for $\theta_0$ and $\Lambda_0$ that achieve the (semiparametric) information bounds. Finally, a real example is provided along with some simulation results.
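The model formula above can be evaluated directly. The sketch below assumes exponential link functions for both g and h purely for illustration (the model only requires them to be known), and the function name and constant baseline are hypothetical.

```python
import math

def additive_multiplicative_hazard(t, W, X, beta, gamma, baseline,
                                   g=math.exp, h=math.exp):
    """Evaluate lambda(t|Z) = g(beta'W(t)) + lambda0(t) * h(gamma'X(t)).

    W, X, beta, gamma are plain lists of floats; baseline maps t to
    lambda0(t). The exp links are illustrative choices only.
    """
    bw = sum(b * w for b, w in zip(beta, W))
    gx = sum(c * x for c, x in zip(gamma, X))
    return g(bw) + baseline(t) * h(gx)

# With beta = gamma = 0 and a constant baseline of 2:
# g(0) + 2.0 * h(0) = 1 + 2 = 3
rate = additive_multiplicative_hazard(
    t=1.0, W=[0.5], X=[1.0], beta=[0.0], gamma=[0.0],
    baseline=lambda t: 2.0)
```

Setting g identically zero recovers the Cox proportional hazards form, which is why the estimating functions contain the partial likelihood score as a special case.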
DOI: 10.1182/blood.v89.10.3880
1997
Cited 125 times
Cyclosporine or Cyclosporine Plus Methylprednisolone for Prophylaxis of Graft-Versus-Host Disease: A Prospective, Randomized Trial
Abstract Patients with a lymphohematopoietic malignancy considered to be at high risk for posttransplant relapse were enrolled in a study to compare the use of cyclosporine (CSP) as a single agent with a combination of methylprednisolone (MP) and CSP for graft-versus-host disease (GVHD) prophylaxis after marrow transplantation from an HLA-identical sibling donor. Sixty patients were randomized to receive CSP only and 62 were randomized to receive CSP plus MP. Daily CSP was started on day −1 (5 mg/kg/d intravenously) and administered at gradually reduced doses until day 180. MP was started on day 7 at 0.5 mg/kg/d, increased to 1.0 mg/kg/d on day 15, started on a taper schedule on day 29, and discontinued on day 72. All 104 evaluable patients (surviving ≥28 days) had sustained engraftment. The incidence rates of grades II-IV acute GVHD were 73% and 60% for patients receiving CSP and CSP plus MP, respectively (P = .01). No difference was seen for grades III-IV GVHD. However, chronic GVHD occurred somewhat more frequently in patients receiving CSP plus MP (44%) than in patients receiving only CSP (21%; P = .02). The incidence of de novo chronic GVHD was marginally higher in patients receiving CSP plus MP (P = .08). No significant differences in the risk of infections were observed. There was a suggestion that the risk of relapse was lower in patients receiving CSP plus MP (P = .10) and, although the overall survival in the two groups was not different (P = .44), there was a slight advantage in favor of CSP plus MP-treated patients for relapse-free survival (P = .07). These results suggest that prophylactic MP, when combined with CSP, has only limited efficacy in acute GVHD prevention and may increase the probability of chronic GVHD.
DOI: 10.1198/016214504000000584
2004
Cited 125 times
Improving the Efficiency of Relative-Risk Estimation in Case-Cohort Studies
The case-cohort design is a common means of reducing the cost of covariate measurements in large failure-time studies. Under this design, complete covariate data are collected only on the cases (i.e., the subjects whose failure times are uncensored) and on a subcohort randomly selected from the whole cohort. In many applications, certain covariates are readily measured on all cohort members, and surrogate measurements of the expensive covariates may also be available. The existing relative-risk estimators for the case-cohort design disregard the covariate data collected outside the case-cohort sample and thus incur loss of efficiency. To make better use of the available data, we develop a class of weighted estimators with general time-varying weights that are related to a class of estimators proposed by Robins, Rotnitzky, and Zhao. The estimators are shown to be consistent and asymptotically normal under appropriate conditions. We identify the estimator within this class that maximizes efficiency; numerical studies demonstrate that the efficiency gains of the proposed estimator over the existing ones can be substantial in realistic settings. We also study the estimation of the cumulative hazard function. An illustration with data taken from Wilms' tumor studies is provided.
DOI: 10.1093/biomet/80.3.573
1993
Cited 124 times
A simple nonparametric estimator of the bivariate survival function under univariate censoring
In the presence of univariate censoring, the bivariate survival function of paired failure times can be expressed as the ratio of the bivariate at-risk probability to the survival function of the censoring time. The use of this natural representation yields a very simple nonparametric estimator for the bivariate survival curve. The estimator is strongly consistent and, upon proper normalization, converges weakly to a zero-mean Gaussian process with an easily estimated covariance function. Numerical studies demonstrate that both the survival curve estimator and its covariance function estimator perform markedly well for practical sample sizes. Applications to the correlation problem and to the interval estimation of the difference in median survival times are also studied.
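The natural representation above — the bivariate survival function as the ratio of the bivariate at-risk probability to the censoring survival function — can be sketched in a few lines. For simplicity the sketch assumes the common censoring time is known for every subject, so the censoring survivor function is the empirical one; in the paper it is estimated by Kaplan–Meier. The function name is hypothetical.

```python
def bivariate_survival(pairs, censor_times, s, t):
    """Estimate S(s,t) = P(T1 > s, T2 > t) under univariate censoring as
       [empirical P(X1 > s, X2 > t)] / G_hat(max(s, t)),
    where X_j = min(T_j, C) with the same C per subject and G_hat is the
    survivor function of C (empirical here; Kaplan-Meier in practice)."""
    n = len(pairs)
    m = max(s, t)
    at_risk = sum(1 for x1, x2 in pairs if x1 > s and x2 > t) / n
    g_hat = sum(1 for c in censor_times if c > m) / len(censor_times)
    if g_hat == 0:
        raise ValueError("no censoring mass beyond max(s, t)")
    return at_risk / g_hat

# 2 of 4 pairs have both components beyond (s, t) = (1, 2), and no
# subject is censored before time 2, so the estimate is 0.5.
s_hat = bivariate_survival([(2, 3), (1, 4), (5, 5), (0.5, 2)],
                           [10, 10, 10, 10], s=1, t=2)
```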
DOI: 10.1093/infdis/175.2.237
1997
Cited 121 times
Perspective: Validating Surrogate Markers--Are We Being Naive?
Because of the difficulties in conducting studies of clinical efficacy of new therapies for human immunodeficiency virus infection and other diseases, there is increasing interest in using measures of biologic activity as surrogates for clinical end points. A widely used criterion for evaluating whether such measures are reliable as surrogates requires that the putative surrogate fully captures the "net effect"-the effect aggregated over all mechanisms of action-of the treatment on the clinical end point. The variety of proposed metrics for evaluating the degree to which this criterion is met are subject to misinterpretation because of the multiplicity of mechanisms by which drugs operate. Without detailed understanding of these mechanisms, metrics of "surrogacy" are not directly interpretable. Even when all of the mechanisms are understood, these metrics are associated with a high degree of uncertainty unless either treatment effects are large in moderate-size studies or sample sizes are large in studies of moderately effective treatments.
DOI: 10.1002/gepi.20377
2008
Cited 117 times
Proper analysis of secondary phenotype data in case‐control association studies
Abstract Case‐control association studies often collect extensive information on secondary phenotypes, which are quantitative or qualitative traits other than the case‐control status. Exploring secondary phenotypes can yield valuable insights into biological pathways and identify genetic variants influencing phenotypes of direct interest. All publications on secondary phenotypes have used standard statistical methods, such as least‐squares regression for quantitative traits. Because of unequal selection probabilities between cases and controls, the case‐control sample is not a random sample from the general population. As a result, standard statistical analysis of secondary phenotype data can be extremely misleading. Although one may avoid the sampling bias by analyzing cases and controls separately or by including the case‐control status as a covariate in the model, the associations between a secondary phenotype and a genetic variant in the case and control groups can be quite different from the association in the general population. In this article, we present novel statistical methods that properly reflect the case‐control sampling in the analysis of secondary phenotype data. The new methods provide unbiased estimation of genetic effects and accurate control of false‐positive rates while maximizing statistical power. We demonstrate the pitfalls of the standard methods and the advantages of the new methods both analytically and numerically. The relevant software is available at our website. Genet. Epidemiol . 2009. © 2008 Wiley‐Liss, Inc.
DOI: 10.1002/gepi.20098
2005
Cited 117 times
Maximum likelihood estimation of haplotype effects and haplotype-environment interactions in association studies
The associations between haplotypes and disease phenotypes offer valuable clues about the genetic determinants of complex diseases. It is highly challenging to make statistical inferences about these associations because of the unknown gametic phase in genotype data. We describe a general likelihood-based approach to inferring haplotype-disease associations in studies of unrelated individuals. We consider all possible phenotypes (including disease indicator, quantitative trait, and potentially censored age at onset of disease) and all commonly used study designs (including cross-sectional, case-control, cohort, nested case-control, and case-cohort). The effects of haplotypes on phenotype are characterized by appropriate regression models, which allow various genetic mechanisms and gene-environment interactions. We present the likelihood functions for all study designs and disease phenotypes under Hardy-Weinberg disequilibrium. The corresponding maximum likelihood estimators are approximately unbiased, normally distributed, and statistically efficient. We provide simple and efficient numerical algorithms to calculate the maximum likelihood estimators and their variances, and implement these algorithms in a freely available computer program. Extensive simulation studies demonstrate that the proposed methods perform well in realistic situations. An application to the Carolina Breast Cancer Study reveals significant haplotype effects and haplotype-smoking interactions in the development of breast cancer. Genet. Epidemiol. 2005. © 2005 Wiley-Liss, Inc.
DOI: 10.1016/j.ajhg.2007.11.004
2008
Cited 89 times
Simple and Efficient Analysis of Disease Association with Missing Genotype Data
Missing genotype data arise in association studies when the single-nucleotide polymorphisms (SNPs) on the genotyping platform are not assayed successfully, when the SNPs of interest are not on the platform, or when total sequence variation is determined only on a small fraction of individuals. We present a simple and flexible likelihood framework to study SNP-disease associations with such missing genotype data. Our likelihood makes full use of all available data in case-control studies and reference panels (e.g., the HapMap), and it properly accounts for the biased nature of the case-control sampling as well as the uncertainty in inferring unknown variants. The corresponding maximum-likelihood estimators for genetic effects and gene-environment interactions are unbiased and statistically efficient. We developed fast and stable numerical algorithms to calculate the maximum-likelihood estimators and their variances, and we implemented these algorithms in a freely available computer program. Simulation studies demonstrated that the new approach is more powerful than existing methods while providing accurate control of the type I error. An application to a case-control study on rheumatoid arthritis revealed several loci that deserve further investigation.
DOI: 10.1038/mp.2015.50
2015
Cited 64 times
The association between lower educational attainment and depression owing to shared genetic effects? Results in ~25 000 subjects
An association between lower educational attainment (EA) and an increased risk for depression has been confirmed in various western countries. This study examines whether pleiotropic genetic effects contribute to this association. Data were analyzed from a total of 9,662 major depressive disorder (MDD) cases and 14,949 controls (with no lifetime MDD diagnosis) from the Psychiatric Genomics Consortium with additional Dutch and Estonian data. The association of EA and MDD was assessed with logistic regression in 15,138 individuals, indicating a significant negative association in our sample with an odds ratio for MDD of 0.78 (0.75-0.82) per standard deviation increase in EA. With data on 884,105 autosomal common single-nucleotide polymorphisms (SNPs), three methods were applied to test for pleiotropy between MDD and EA: (i) genetic profile risk scores (GPRS) derived from training data for EA (an independent meta-analysis of ~120,000 subjects) and MDD (using a 10-fold leave-one-out procedure in the current sample), (ii) bivariate genomic-relationship-matrix restricted maximum likelihood (GREML), and (iii) SNP effect concordance analysis (SECA). With these methods, we found (i) that the EA-GPRS did not predict MDD status and the MDD-GPRS did not predict EA, (ii) a weak negative genetic correlation with bivariate GREML analyses, although this correlation was not consistently significant, and (iii) no evidence for concordance of MDD and EA SNP effects with SECA analysis. In conclusion, our study confirms an association between lower EA and MDD risk, but this association was not attributable to measurable pleiotropic genetic effects, which suggests that environmental factors, for example socioeconomic status, could be involved.
DOI: 10.1016/j.ajhg.2013.06.011
2013
Cited 64 times
Meta-analysis of Gene-Level Associations for Rare Variants Based on Single-Variant Statistics
Meta-analysis of genome-wide association studies (GWASs) has led to the discoveries of many common variants associated with complex human diseases. There is a growing recognition that identifying "causal" rare variants also requires large-scale meta-analysis. The fact that association tests with rare variants are performed at the gene level rather than at the variant level poses unprecedented challenges in the meta-analysis. First, different studies may adopt different gene-level tests, so the results are not compatible. Second, gene-level tests require multivariate statistics (i.e., components of the test statistic and their covariance matrix), which are difficult to obtain. To overcome these challenges, we propose to perform gene-level tests for rare variants by combining the results of single-variant analysis (i.e., p values of association tests and effect estimates) from participating studies. This simple strategy is possible because of an insight that multivariate statistics can be recovered from single-variant statistics, together with the correlation matrix of the single-variant test statistics, which can be estimated from one of the participating studies or from a publicly available database. We show both theoretically and numerically that the proposed meta-analysis approach provides accurate control of the type I error and is as powerful as joint analysis of individual participant data. This approach accommodates any disease phenotype and any study design and produces all commonly used gene-level tests. An application to the GWAS summary results of the Genetic Investigation of ANthropometric Traits (GIANT) consortium reveals rare and low-frequency variants associated with human height. The relevant software is freely available.
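The key insight above — that a gene-level statistic can be recovered from single-variant statistics plus their correlation matrix — can be illustrated with the simplest case, a weighted burden test. The function name and unit weights are hypothetical; the paper's framework covers all commonly used gene-level tests, not just this one.

```python
def burden_z_from_single_variant(z, w, R):
    """Combine per-variant z-statistics into a gene-level burden
    statistic using the between-variant correlation matrix R:
        Z_gene = sum_i w_i z_i / sqrt(w' R w),
    which is standard normal under the null. R can be estimated from one
    participating study or a public reference panel."""
    p = len(w)
    num = sum(wi * zi for wi, zi in zip(w, z))
    var = sum(w[i] * R[i][j] * w[j] for i in range(p) for j in range(p))
    return num / var ** 0.5

# Two variants each with z = 1: independent variants reinforce each
# other (2 / sqrt(2)), while perfectly correlated variants carry no
# extra information (2 / sqrt(4) = 1).
z_ind = burden_z_from_single_variant([1.0, 1.0], [1.0, 1.0],
                                     [[1.0, 0.0], [0.0, 1.0]])
z_cor = burden_z_from_single_variant([1.0, 1.0], [1.0, 1.0],
                                     [[1.0, 1.0], [1.0, 1.0]])
```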
DOI: 10.1016/j.ajhg.2019.09.001
2019
Cited 47 times
Adipose Tissue Gene Expression Associations Reveal Hundreds of Candidate Genes for Cardiometabolic Traits
Genome-wide association studies (GWASs) have identified thousands of genetic loci associated with cardiometabolic traits including type 2 diabetes (T2D), lipid levels, body fat distribution, and adiposity, although most causal genes remain unknown. We used subcutaneous adipose tissue RNA-seq data from 434 Finnish men from the METSIM study to identify 9,687 primary and 2,785 secondary cis-expression quantitative trait loci (eQTL; <1 Mb from TSS, FDR < 1%). Compared to primary eQTL signals, secondary eQTL signals were located further from transcription start sites, had smaller effect sizes, and were less enriched in adipose tissue regulatory elements. Among 2,843 cardiometabolic GWAS signals, 262 colocalized by LD and conditional analysis with 318 transcripts as primary and conditionally distinct secondary cis-eQTLs, including some across ancestries. Of the cardiometabolic traits examined for adipose tissue eQTL colocalizations, waist-hip ratio (WHR) and circulating lipid traits had the highest percentages of colocalized eQTLs (15% and 14%, respectively). Among alleles associated with increased cardiometabolic GWAS risk, approximately half (53%) were associated with decreased gene expression level. Mediation analyses of colocalized genes and cardiometabolic traits within the 434 individuals provided further evidence that gene expression influences variant-trait associations. These results identify hundreds of candidate genes that may act in adipose tissue to influence cardiometabolic traits.
DOI: 10.1001/archinte.159.7.686
1999
Cited 109 times
Leisure-Time Physical Activity and the Risk of Primary Cardiac Arrest
Because the risks of sudden cardiac death and myocardial infarction are transiently increased during acute bouts of high-intensity activity, it is an important public health question whether regular participation in moderate-intensity activity confers overall protection from sudden cardiac death. We used data from a population-based case-control study to assess the associations of regular high-intensity and moderate-intensity leisure-time physical activity with primary cardiac arrest (PCA). Cases were patients with primary cardiac arrest, aged 25 to 74 years, attended by paramedics between 1988 and 1994 in King County, Washington (n = 333). Controls were randomly identified from the same community (n = 503), matched for age and sex. All case patients and controls were free of prior clinical heart disease, major comorbidity, and self-reported poor health. Spouses of case patients and controls were interviewed to assess participation in 15 high-intensity and 6 moderate-intensity physical activities during the prior year. Compared with subjects who performed none of the activities, the odds ratio for primary cardiac arrest from matched analyses was 0.34 (95% confidence interval, 0.13-0.89) among subjects who performed only gardening activities for more than 60 minutes per week; 0.27 (95% confidence interval, 0.11-0.67) among subjects who walked for exercise for more than 60 minutes per week; and 0.34 (95% confidence interval, 0.16-0.75) among subjects who engaged in any high-intensity activities, after adjustment for age, smoking, education, diabetes, hypertension, and health status. The results suggest that regular participation in moderate-intensity activities, such as walking and gardening, is associated with a reduced risk of PCA, and they support current exercise recommendations.
DOI: 10.1016/s0167-6296(98)00056-3
1999
Cited 106 times
On the use of survival analysis techniques to estimate medical care costs
Measurement of treatment costs is important in the evaluation of medical interventions. Accurate cost estimation is problematic when cost records are incomplete. Methods from the survival analysis literature have been proposed for estimating costs using the available data. In this article, we clarify the assumptions necessary for the validity of these techniques. We demonstrate how the assumptions needed for valid survival analysis may be violated when these methods are applied to cost estimation. Our observations are confirmed through simulations and empirical data analysis. We conclude that survival analysis approaches are not generally appropriate for the analysis of medical costs, and we review several valid alternatives.
DOI: 10.1002/sim.1377
2003
Cited 104 times
Regression analysis of incomplete medical cost data
The accumulation of medical cost over time for each subject is an increasing stochastic process defined up to the instant of death. The stochastic structure of this process is complex. In most applications, the process can only be observed at a limited number of time points. Furthermore, the process is subject to right censoring so that it is unobservable after the censoring time. These special features of the medical cost data, especially the presence of death and censoring, pose major challenges in the construction of plausible statistical models and the development of the corresponding inference procedures. In this paper, we propose several classes of regression models which formulate the effects of possibly time-dependent covariates on the marginal mean of cost accumulation in the presence of death or on the conditional means of cost accumulation given specific survival patterns. We then develop estimating equations for these models by combining the approach of generalized estimating equations for longitudinal data with the inverse probability of censoring weighting technique. The resultant estimators are shown to be consistent and asymptotically normal with simple variance estimators. Simulation studies indicate that the proposed inference procedures behave well in practical situations. An application to data taken from a large cancer study reveals that the Medicare enrollees who are diagnosed with less aggressive ovarian cancer tend to accumulate medical cost at lower rates than those with more aggressive disease, but tend to have higher lifetime costs because they live longer.
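The inverse-probability-of-censoring weighting idea used above can be sketched in its simplest form, a weighted mean of lifetime costs. The sketch assumes the censoring survivor function G is supplied as a known function (in practice it is estimated by Kaplan–Meier), and the function name is hypothetical; the paper's full estimating equations also handle covariates and longitudinal cost histories.

```python
def ipcw_mean_cost(costs, times, death_observed, G):
    """Inverse-probability-of-censoring-weighted mean lifetime cost:
       n^{-1} * sum over observed deaths of M_i / G(T_i),
    where M_i is the accumulated cost, T_i the death time, and
    G(t) = P(censoring time > t). Censored subjects contribute zero but
    are counted in n; upweighting the observed deaths compensates."""
    n = len(costs)
    total = 0.0
    for m, t, d in zip(costs, times, death_observed):
        if d:
            total += m / G(t)
    return total / n

# With no censoring (G = 1 and every death observed) the estimator
# reduces to the ordinary sample mean, here 200.0.
mean_cost = ipcw_mean_cost([100.0, 200.0, 300.0], [1.0, 2.0, 3.0],
                           [1, 1, 1], lambda t: 1.0)
```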
DOI: 10.1111/j.0006-341x.2000.0971.x
2000
Cited 100 times
Survival Analysis in Clinical Trials: Past Developments and Future Directions
The field of survival analysis emerged in the 20th century and experienced tremendous growth during the latter half of the century. The developments in this field that have had the most profound impact on clinical trials are the Kaplan-Meier (1958, Journal of the American Statistical Association 53, 457-481) method for estimating the survival function, the log-rank statistic (Mantel, 1966, Cancer Chemotherapy Reports 50, 163-170) for comparing two survival distributions, and the Cox (1972, Journal of the Royal Statistical Society, Series B 34, 187-220) proportional hazards model for quantifying the effects of covariates on the survival time. The counting-process martingale theory pioneered by Aalen (1975, Statistical inference for a family of counting processes, Ph.D. dissertation, University of California, Berkeley) provides a unified framework for studying the small- and large-sample properties of survival analysis statistics. Significant progress has been achieved and further developments are expected in many other areas, including the accelerated failure time model, multivariate failure time data, interval-censored data, dependent censoring, dynamic treatment regimes and causal inference, joint modeling of failure time and longitudinal data, and Bayesian methods.
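The Kaplan-Meier method cited above is short enough to sketch directly. This is a minimal illustration assuming no tied event times; the function name is hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimator: at each distinct event time, multiply the
    running survival probability by (1 - 1/number at risk). Censored
    observations (event = 0) leave the curve unchanged but shrink the
    risk set. Returns [(event_time, S(event_time)), ...]."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    n, surv, curve = len(times), 1.0, []
    for rank, i in enumerate(order):
        if events[i]:
            at_risk = n - rank
            surv *= 1.0 - 1.0 / at_risk
            curve.append((times[i], surv))
    return curve

# Deaths at t = 1, 2, 4 and one censoring at t = 3: the curve drops to
# 0.75, then 0.5, then 0.0 (the censored subject only thins the risk set).
curve = kaplan_meier([1, 2, 3, 4], [1, 1, 0, 1])
```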
DOI: 10.1002/sim.789
2001
Cited 97 times
Incremental net benefit in randomized clinical trials
There are three approaches to health economic evaluation for comparing two therapies. These are (i) cost minimization, in which one assumes or observes no difference in effectiveness, (ii) incremental cost-effectiveness, and (iii) incremental net benefit. The latter can be expressed either in units of effectiveness or costs. When analysing data from a clinical trial, expressing incremental net benefit in units of cost allows the investigator to examine all three approaches in a single graph, complete with the corresponding statistical inferences. Furthermore, if costs and effectiveness are not censored, this can be achieved using common two-sample statistical procedures. The above will be illustrated using two examples, one with censoring and one without.
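The incremental net benefit in units of cost can be computed directly from per-patient effectiveness and cost data. The sketch below covers only the uncensored point estimate; the function name and the willingness-to-pay symbol `wtp` are illustrative, and, as the abstract notes, standard two-sample procedures then supply the inference.

```python
def incremental_net_benefit(eff_t, cost_t, eff_c, cost_c, wtp):
    """Incremental net benefit in cost units:
       INB = wtp * (mean effect, treatment - control)
                 - (mean cost, treatment - control).
    wtp is the willingness to pay per unit of effectiveness; INB > 0
    favors the new therapy. Uncensored data assumed."""
    mean = lambda v: sum(v) / len(v)
    d_eff = mean(eff_t) - mean(eff_c)
    d_cost = mean(cost_t) - mean(cost_c)
    return wtp * d_eff - d_cost

# One extra unit of effectiveness valued at 100 costs an extra 50:
# INB = 100 * 1 - 50 = 50, so the therapy is worthwhile at this wtp.
inb = incremental_net_benefit([2.0, 2.0], [100.0, 100.0],
                              [1.0, 1.0], [50.0, 50.0], wtp=100.0)
```

Plotting INB against wtp recovers the other two approaches on one graph: the intercept at wtp = 0 is cost minimization, and the wtp at which INB crosses zero is the incremental cost-effectiveness ratio.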
DOI: 10.1093/biomet/83.2.381
1996
Cited 96 times
Comparing two failure time distributions in the presence of dependent censoring
D. Y. Lin (Department of Biostatistics, Box 357232, University of Washington, Seattle, Washington 98195, USA), J. M. Robins (Department of Epidemiology, Harvard University, Boston, Massachusetts 02115, USA), and L. J. Wei (Department of Biostatistics, Harvard University, Boston, Massachusetts 02115, USA). Biometrika, Volume 83, Issue 2, June 1996, Pages 381-393. Received 1 May 1994; revision received 1 July 1995; published 1 June 1996.
DOI: 10.1080/01621459.1991.10475101
1991
Cited 96 times
Goodness-of-Fit Analysis for the Cox Regression Model Based on a Class of Parameter Estimators
Abstract In this article we propose a class of estimation functions for the vector of regression parameters in the Cox proportional hazards model with possibly time-dependent covariates by incorporating the weight functions commonly used in weighted log-rank tests into the partial likelihood score function. The resulting estimators behave much like the conventional maximum partial likelihood estimator in that they are consistent and asymptotically normal. When the Cox model is inappropriate, however, the estimators with different weight functions generally converge to nonidentical constant vectors. For example, the magnitude of the parameter estimator using the Kaplan–Meier survival estimator as the weight function will be stochastically larger than that of the maximum partial likelihood estimator if covariate effects diminish over time. Such facts motivate us to develop goodness-of-fit methods for the Cox regression model by comparing parameter estimators with different weight functions. Under the assumed model, the normalized difference between the maximum partial likelihood estimator and a weighted parameter estimator is shown to converge weakly to a multivariate normal with mean zero and with a covariance matrix for which a consistent estimator is proposed. The asymptotic properties of the weighted parameter estimators and those of the related goodness-of-fit tests under misspecified Cox models are also investigated. In particular, it is demonstrated that a goodness-of-fit test with a monotone weight function is consistent against monotone departures from the proportional hazards assumption. Versatile testing procedures with broad sensitivities can be developed based on simultaneous use of several weight functions. Three examples using real data are presented.
DOI: 10.1001/archinte.1997.00440330066007
1997
Cited 94 times
Duration of Estrogen Replacement Therapy in Relation to the Risk of Incident Myocardial Infarction in Postmenopausal Women
There is little information about whether an increasing duration of estrogen replacement therapy is associated with a declining risk for myocardial infarction in postmenopausal women. Our objective was to conduct a population-based, case-control study among enrollees of the Group Health Cooperative (GHC) of Puget Sound, Seattle, Wash. Case subjects were all postmenopausal women who were enrolled in the GHC with an incident fatal or nonfatal myocardial infarction from July 1986 through December 1993. Control subjects were a stratified random sample of postmenopausal women who were enrolled in the GHC without myocardial infarction and matched to case subjects by age and calendar year. We reviewed the medical records of the 850 case subjects and 1974 control subjects and conducted telephone interviews with consenting survivors. Use of estrogen or estrogen and progestin was assessed using GHC's computerized pharmacy database. Among women who were currently using estrogen, a longer duration of use was inversely associated with the risk for myocardial infarction after adjustment for age, year of identification, diabetes mellitus, angina, and smoking. For categories of increasing duration of estrogen use (never, >0-<1.8 years, 1.8-<4.2 years, 4.2-<8.2 years, and ≥8.2 years), the odds ratios for myocardial infarction were 1.00 (reference), 0.91, 0.70, 0.65, and 0.55 (for trend among current users, P = .05). Among women who had used estrogen in the past, there was no evidence of decreasing risk with increasing duration of use. In this study, a long duration of hormone replacement therapy among women currently using estrogen was associated with a reduced risk for a first myocardial infarction.
DOI: 10.1093/biomet/87.1.73
2000
Cited 91 times
Additive hazards regression for case-cohort studies
The case-cohort design is a common means of reducing cost in large epidemiological cohort studies. Under this design, covariates are measured only on the cases and a subcohort randomly selected from the entire cohort. In this paper, we demonstrate how to use the case-cohort data to estimate the regression parameter of the additive hazards model, which specifies that the conditional hazard function given a set of covariates is the sum of an arbitrary baseline hazard function and a regression function of the covariates. The proposed estimator is shown to be consistent and asymptotically normal with an easily estimated variance. The subcohort may be selected by independent Bernoulli sampling with arbitrary selection probabilities or by stratified simple random sampling. The efficiencies of various sampling schemes are investigated both analytically and by simulation. A real example is provided.
DOI: 10.1002/sim.4780120904
1993
Cited 83 times
Evaluating the role of CD4‐lymphocyte counts as surrogate endpoints in human immunodeficiency virus clinical trials
Abstract In human immunodeficiency virus clinical trials, the CD4‐lymphocyte count has been regarded as a promising surrogate endpoint for clinical efficacy measures such as time to opportunistic infection and survival time. In the present paper, we test this hypothesis according to a criterion proposed by Prentice. This criterion requires the surrogate variable to capture the entire effect of treatment on the clinical endpoint, and it is satisfied if the hazard rate of the clinical endpoint is not affected by treatment among patients with the same preceding history of the surrogate variable. We analyse data from two completed zidovudine trials using the Cox regression model with the CD4‐lymphocyte count as a time‐varying covariate. The results indicate that the CD4‐lymphocyte count captures part of the relationship between zidovudine and time to a first critical event but does not fulfil the Prentice criterion.
DOI: 10.1111/j.1541-0420.2008.01126.x
2009
Cited 77 times
Semiparametric Transformation Models with Random Effects for Joint Analysis of Recurrent and Terminal Events
Summary We propose a broad class of semiparametric transformation models with random effects for the joint analysis of recurrent events and a terminal event. The transformation models include proportional hazards/intensity and proportional odds models. We estimate the model parameters by the nonparametric maximum likelihood approach. The estimators are shown to be consistent, asymptotically normal, and asymptotically efficient. Simple and stable numerical algorithms are provided to calculate the parameter estimators and to estimate their variances. Extensive simulation studies demonstrate that the proposed inference procedures perform well in realistic settings. Applications to two HIV/AIDS studies are presented.
DOI: 10.1080/01621459.2013.842172
2014
Cited 56 times
Efficient Estimation of Semiparametric Transformation Models for Two-Phase Cohort Studies
Under two-phase cohort designs, such as case-cohort and nested case-control sampling, information on observed event times, event indicators, and inexpensive covariates is collected in the first phase, and the first-phase information is used to select subjects for measurements of expensive covariates in the second phase; inexpensive covariates are also used in the data analysis to control for confounding and to evaluate interactions. This paper provides efficient estimation of semiparametric transformation models for such designs, accommodating both discrete and continuous covariates and allowing inexpensive and expensive covariates to be correlated. The estimation is based on the maximization of a modified nonparametric likelihood function through a generalization of the expectation-maximization algorithm. The resulting estimators are shown to be consistent, asymptotically normal and asymptotically efficient with easily estimated variances. Simulation studies demonstrate that the asymptotic approximations are accurate in practical situations. Empirical data from Wilms' tumor studies and the Atherosclerosis Risk in Communities (ARIC) study are presented.
DOI: 10.1161/circgenetics.111.960096
2011
Cited 56 times
Association of Genetic Variants and Incident Coronary Heart Disease in Multiethnic Cohorts
Background— Genome-wide association studies have identified several single nucleotide polymorphisms (SNPs) associated with prevalent coronary heart disease (CHD), but less is known about associations with incident CHD. The association of 13 published CHD SNPs was examined in 5 ancestry groups of 4 large US prospective cohorts. Methods and Results— The analyses included incident coronary events over an average of 9.1 to 15.7 years of follow-up in up to 26 617 white individuals (6626 events), 8018 black individuals (914 events), 1903 Hispanic individuals (113 events), 3669 American Indian individuals (595 events), and 885 Asian/Pacific Islander individuals (66 events). We used Cox proportional hazards models (with additive mode of inheritance) adjusted for age, sex, and ancestry (as needed). Nine loci were statistically associated with incident CHD events in white participants: 9p21 (rs10757278; P = 4.7×10^-41), 16q23.1 (rs2549513; P = 0.0004), 6p24.1 (rs499818; P = 0.0002), 2q36.3 (rs2943634; P = 6.7×10^-6), MTHFD1L (rs6922269; P = 5.1×10^-10), APOE (rs429358; P = 2.7×10^-18), ZNF627 (rs4804611; P = 5.0×10^-8), CXCL12 (rs501120; P = 1.4×10^-6), and LPL (rs268; P = 2.7×10^-17). The 9p21 region showed significant between-study heterogeneity, with larger effects in individuals age 55 years or younger and in women. Inclusion of coronary revascularization procedures among the incident CHD events introduced heterogeneity. The SNPs were not associated with CHD in black participants, and associations varied in other US minorities. Conclusions— Prospective analyses of white participants replicated several reported cross-sectional CHD-SNP associations.
DOI: 10.1007/s10549-011-1517-z
2011
Cited 54 times
Common genetic variation in adiponectin, leptin, and leptin receptor and association with breast cancer subtypes
Adipocytokines are produced by visceral fat, and levels may be associated with breast cancer risk. We investigated whether single nucleotide polymorphisms (SNPs) in adipocytokine genes adiponectin (ADIPOQ), leptin (LEP), and the leptin receptor (LEPR) were associated with basal-like or luminal A breast cancer subtypes. 104 candidate and tag SNPs were genotyped in 1776 of 2022 controls and 1972 (200 basal-like, 679 luminal A) of 2311 cases from the Carolina Breast Cancer Study (CBCS), a population-based case–control study of whites and African Americans. Breast cancer molecular subtypes were determined by immunohistochemistry. Genotype odds ratios (ORs) and 95% confidence intervals (CIs) were estimated using unconditional logistic regression. Haplotype ORs and 95% CIs were estimated using Hapstat. Interactions with waist-hip ratio were evaluated using a multiplicative interaction term. Ancestry was estimated from 144 ancestry informative markers (AIMs), and included in models to control for population stratification. Candidate SNPs LEPR K109R (rs1137100) and LEPR Q223R (rs1137101) were positively associated with luminal A breast cancer, whereas ADIPOQ +45 T/G (rs2241766), ADIPOQ +276 G/T (rs1501299), and LEPR K656N (rs8129183) were not associated with either subtype. Few patterns were observed among tag SNPs, with the exception of 3 LEPR SNPs (rs17412175, rs9436746, and rs9436748) that were in moderate LD and inversely associated with basal-like breast cancer. However, no SNP associations were statistically significant after adjustment for multiple comparisons. Haplotypes in LEP and LEPR were associated with both basal-like and luminal A subtypes. There was no evidence of interaction with waist-hip ratio. Data suggest associations between LEPR candidate SNPs and luminal A breast cancer in the CBCS and LEPR intron 2 tag SNPs and basal-like breast cancer. 
Replication in additional studies where breast cancer subtypes have been defined is necessary to confirm these potential associations.
DOI: 10.1093/ije/dyv136
2015
Cited 53 times
New data and an old puzzle: the negative association between schizophrenia and rheumatoid arthritis
Background: A long-standing epidemiological puzzle is the reduced rate of rheumatoid arthritis (RA) in those with schizophrenia (SZ) and vice versa. Traditional epidemiological approaches to determine if this negative association is underpinned by genetic factors would test for reduced rates of one disorder in relatives of the other, but sufficiently powered data sets are difficult to achieve. The genomics era presents an alternative paradigm for investigating the genetic relationship between two uncommon disorders. Methods: We use genome-wide common single nucleotide polymorphism (SNP) data from independently collected SZ and RA case-control cohorts to estimate the SNP correlation between the disorders. We test a genotype X environment (GxE) hypothesis for SZ with environment defined as winter- vs summer-born. Results: We estimate a small but significant negative SNP-genetic correlation between SZ and RA (−0.046, s.e. 0.026, P = 0.036). The negative correlation was stronger for the SNP set attributed to coding or regulatory regions (−0.174, s.e. 0.071, P = 0.0075). Our analyses led us to hypothesize a gene-environment interaction for SZ in the form of immune challenge. We used month of birth as a proxy for environmental immune challenge and estimated the genetic correlation between winter-born and non-winter born SZ to be significantly less than 1 for coding/regulatory region SNPs (0.56, s.e. 0.14, P = 0.00090). Conclusions: Our results are consistent with epidemiological observations of a negative relationship between SZ and RA reflecting, at least in part, genetic factors. Results of the month of birth analysis are consistent with pleiotropic effects of genetic variants dependent on environmental context.
DOI: 10.1093/biomet/asx029
2017
Cited 43 times
Maximum likelihood estimation for semiparametric regression models with multivariate interval-censored data
Interval-censored multivariate failure time data arise when there are multiple types of failure or there is clustering of study subjects and each failure time is known only to lie in a certain interval. We investigate the effects of possibly time-dependent covariates on multivariate failure times by considering a broad class of semiparametric transformation models with random effects, and we study nonparametric maximum likelihood estimation under general interval-censoring schemes. We show that the proposed estimators for the finite-dimensional parameters are consistent and asymptotically normal, with a limiting covariance matrix that attains the semiparametric efficiency bound and can be consistently estimated through profile likelihood. In addition, we develop an EM algorithm that converges stably for arbitrary datasets. Finally, we assess the performance of the proposed methods in extensive simulation studies and illustrate their application using data derived from the Atherosclerosis Risk in Communities Study.
DOI: 10.1016/s0002-9149(01)01720-9
2001
Cited 88 times
Body mass index and the risk of recurrent coronary events following acute myocardial infarction
Although excess adiposity appears to increase the risk of coronary heart disease in the general population, its importance in patients with established coronary disease is less defined. We evaluated a population-based inception cohort of survivors to hospital discharge following first acute myocardial infarction (AMI) (n = 2,541) to assess the association between body mass index (BMI) and the risk of recurrent coronary events and to explore the mechanisms for this relation. Using Cox proportional-hazards regression, we assessed the risk of recurrent coronary events associated with levels of adiposity as defined by BMI and then investigated potential mechanisms through which adiposity conferred risk by examining how adjustment for diabetes mellitus, systemic hypertension, and dyslipidemia affected the association. Forty-one percent of the cohort were overweight (BMI 25 to 29.9), and 27.8% were obese (BMI ≥30). After adjustment for other risk factors, the risk of recurrent coronary events (n = 418) increased as BMI increased, especially among those who were obese. Using a BMI of 16 to 24.9 as the reference group, for mildly overweight patients (BMI 25 to 27.4), the relative risk (RR) was 0.93 (95% confidence interval [CI] 0.70 to 1.24); it was 1.16 for more severe overweight patients (BMI 27.5 to 29.9; 95% CI 0.87 to 1.55). For patients with class I obesity (BMI 30 to 34.9), the RR was 1.49 (95% CI 1.12 to 1.98), and for class II to III obesity (BMI ≥35), the RR was 1.80 (95% CI 1.30 to 2.48). We estimated that clinical measurements of diabetes, hypertension, and dyslipidemia explained approximately 43% of this risk. Thus, excess adiposity as measured by BMI was associated with an increased risk of recurrent coronary events following AMI, particularly among those who were obese.
DOI: 10.1001/archinte.161.14.1709
2001
Cited 88 times
Risk of Recurrent Coronary Events in Relation to Use and Recent Initiation of Postmenopausal Hormone Therapy
Background: The finding from the Heart and Estrogen/Progestin Replacement Study (HERS) of increased coronary risk restricted to the first year after starting postmenopausal hormone therapy raises new questions about the role of hormone therapy in women with coronary heart disease. We assessed the risk of recurrent myocardial infarction or coronary heart disease death associated with the use and recent initiation of hormone therapy in women who survived a first myocardial infarction. Methods: The setting for this population-based inception cohort study was Group Health Cooperative, a health maintenance organization. We studied 981 postmenopausal women who survived to hospital discharge after their first myocardial infarction between July 1, 1986, and December 31, 1996. We obtained information on hormone use from the Group Health Cooperative computerized pharmacy database and identified recurrent coronary events by medical record review. Results: During median follow-up of 3.5 years, there were 186 recurrent coronary events. There was no difference in the risk of recurrent coronary events between current users of hormone therapy and other women (adjusted relative hazard [RH], 0.96; 95% confidence interval [CI], 0.62-1.50). Relative to the risk in women not currently using hormones, there was a suggestion of increased risk during the first 60 days after starting hormone therapy (RH, 2.16; 95% CI, 0.94-4.95) and reduced risk with current hormone use for longer than 1 year (RH, 0.76; 95% CI, 0.42-1.36). Conclusion: These results are consistent with the findings from the HERS, suggesting a transitory increase in coronary risk after starting hormone therapy in women with established coronary heart disease and a decreased risk thereafter.
DOI: 10.1111/j.0006-341x.2003.00102.x
2003
Cited 77 times
Semiparametric Analysis of Recurrent Events Data in the Presence of Dependent Censoring
Dependent censoring occurs in longitudinal studies of recurrent events when the censoring time depends on the potentially unobserved recurrent event times. To perform regression analysis in this setting, we propose a semiparametric joint model that formulates the marginal distributions of the recurrent event process and dependent censoring time through scale-change models, while leaving the distributional form and dependence structure unspecified. We derive consistent and asymptotically normal estimators for the regression parameters. We also develop graphical and numerical methods for assessing the adequacy of the proposed model. The finite-sample behavior of the new inference procedures is evaluated through simulation studies. An application to recurrent hospitalization data taken from a study of intravenous drug users is provided.
DOI: 10.1111/j.0006-341x.2000.00775.x
2000
Cited 74 times
Proportional Means Regression for Censored Medical Costs
The semiparametric proportional means model specifies that the mean function for the cumulative medical cost over time conditional on a set of covariates is equal to an arbitrary baseline mean function multiplied by an exponential regression function. We demonstrate how to estimate the vector-valued regression parameter using possibly censored lifetime costs. The estimator is consistent and asymptotically normal with an easily estimable covariance matrix. Simulation studies show that the proposed methodology is appropriate for practical use. An application to AIDS is provided.
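The proportional means structure described above, E[Y(t) | Z] = μ0(t) exp(β'Z), implies that the ratio of mean cumulative costs between two covariate values is constant over time. A minimal sketch of that implication (the baseline mean function and parameter value below are hypothetical, not from the paper):

```python
import numpy as np

# Proportional means model: E[Y(t) | Z] = mu0(t) * exp(beta' Z).
beta = 0.4
mu0 = lambda t: 100.0 * np.sqrt(t)   # hypothetical baseline cost accrual

def mean_cost(t, z):
    """Model-implied mean cumulative cost at time t for covariate z."""
    return mu0(t) * np.exp(beta * z)

# The multiplicative form implies a cost ratio of exp(beta) between
# z = 1 and z = 0 at every time point, whatever mu0 looks like.
times = np.array([0.5, 1.0, 2.0, 5.0])
ratios = mean_cost(times, 1) / mean_cost(times, 0)
print(ratios)
```

This is the cost analogue of the proportional hazards assumption: the baseline function μ0 is left arbitrary, and only the covariate effect β is parametric.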
DOI: 10.1080/01621459.1990.10475319
1990
Cited 70 times
Statistical Inference with Data-Dependent Treatment Allocation Rules
Abstract In comparing two treatments with dichotomous responses, the randomized play-the-winner rule (Wei and Durham 1978) tends to assign more study subjects to the better treatment. For ethical reasons, this property is desirable for studies on human subjects. The randomized play-the-winner rule, which is a modification of Zelen's play-the-winner rule (Zelen 1969), is not deterministic and is less vulnerable to experimental bias than other adaptive designs. Recently, this design has been used in a trial to evaluate extracorporeal membrane oxygenation (ECMO) for treating newborns with respiratory failures at the University of Michigan. In this article, exact conditional, exact unconditional, and approximate confidence intervals for the treatment difference are studied from a frequentist point of view with the randomized play-the-winner rule. For small and moderate-sized trials, the exact unconditional procedures perform much better than the conditional ones because of the adaptive nature of the designs. Furthermore, we find that the design used for the trial should not be ignored in the analysis. The large-sample unconditional confidence intervals based on likelihood ratio statistics are not very sensitive to the design and perform well for moderate-sized trials. On the other hand, the intervals derived from the maximum likelihood estimates behave poorly under the adaptive design. All of the procedures are illustrated with the Michigan ECMO data.
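The randomized play-the-winner rule discussed above can be described as an urn scheme: draw a treatment with probability proportional to its balls, then add a ball of the same type on success or of the other type on failure. A minimal simulation sketch (the success probabilities and trial size are made-up; this is not the paper's inference procedure):

```python
import numpy as np

# Randomized play-the-winner urn (Wei and Durham 1978), toy simulation.
rng = np.random.default_rng(1)

p = {"A": 0.7, "B": 0.3}      # assumed true success probabilities
urn = {"A": 1, "B": 1}        # start with one ball per treatment
assigned = {"A": 0, "B": 0}

for _ in range(2000):
    # Assign a treatment with probability proportional to its balls.
    total = urn["A"] + urn["B"]
    trt = "A" if rng.random() < urn["A"] / total else "B"
    assigned[trt] += 1
    # Success adds a ball of the same type; failure adds one of the
    # other type, tilting future allocation toward the better arm.
    if rng.random() < p[trt]:
        urn[trt] += 1
    else:
        urn["A" if trt == "B" else "B"] += 1

share_A = assigned["A"] / 2000
print(round(share_A, 2))
```

Because allocation probabilities depend on accumulating outcomes, the design is data-dependent, which is exactly why the paper argues the design cannot be ignored in the analysis.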
DOI: 10.1198/016214506000001239
2007
Cited 66 times
Semiparametric Transformation Models With Random Effects for Recurrent Events
In this article we study a class of semiparametric transformation models with random effects for the intensity function of the counting process. These models provide considerable flexibility in formulating the effects of possibly time-dependent covariates on the developments of recurrent events while accounting for the dependence of the recurrent event times within the same subject. We show that the nonparametric maximum likelihood estimators (NPMLEs) for the parameters of these models are consistent and asymptotically normal. The limiting covariance matrices for the estimators of the regression parameters achieve the semiparametric efficiency bounds and can be consistently estimated. The limiting covariance function for the estimator of any smooth functional of the cumulative intensity function also can be consistently estimated. We develop a simple and stable EM algorithm to compute the NPMLEs as well as the variance and covariance estimators. Simulation studies demonstrate that the proposed methods perform well in practical situations. Two medical studies are provided for illustration. Key words: Box–Cox transformation; counting process; EM algorithm; intensity function; nonparametric likelihood; semiparametric efficiency.
DOI: 10.1086/500812
2006
Cited 63 times
Evaluating Statistical Significance in Two-Stage Genomewide Association Studies
Genomewide association studies are being conducted to unravel the genetic etiology of complex human diseases. Because of cost constraints, these studies typically employ a two-stage design, under which a large panel of markers is examined in a subsample of subjects, and the most-promising markers are then examined in all subjects. This report describes a simple and efficient method to evaluate statistical significance for such genome studies. The proposed method, which properly accounts for the correlated nature of polymorphism data, provides accurate control of the overall false-positive rate and is substantially more powerful than the standard Bonferroni correction, especially when the markers are in strong linkage disequilibrium.
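The gain over Bonferroni described above comes from using the joint null distribution of the correlated test statistics rather than treating markers as independent. A Monte Carlo sketch of that idea under an assumed equicorrelated normal null (this illustrates the principle, not the paper's exact procedure for two-stage data):

```python
import numpy as np
from statistics import NormalDist

# Under linkage disequilibrium, test statistics at nearby markers are
# correlated, so the alpha-level cutoff for max_j |Z_j| under the joint
# null is less stringent than the Bonferroni cutoff.
rng = np.random.default_rng(2)
m, rho, alpha = 200, 0.8, 0.05

# Equicorrelated null statistics: Z_j = sqrt(rho)*W + sqrt(1-rho)*E_j.
W = rng.standard_normal((20000, 1))
E = rng.standard_normal((20000, m))
Z = np.sqrt(rho) * W + np.sqrt(1 - rho) * E

# Threshold c with P(max_j |Z_j| > c) = alpha under the joint null.
mc_thresh = np.quantile(np.abs(Z).max(axis=1), 1 - alpha)
bonf_thresh = NormalDist().inv_cdf(1 - alpha / (2 * m))
print(mc_thresh < bonf_thresh)
```

The stronger the correlation among markers, the larger the gap between the two thresholds, which is why the method is most powerful when markers are in strong linkage disequilibrium.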
DOI: 10.1093/biostatistics/kxm034
2007
Cited 60 times
Efficient resampling methods for nonsmooth estimating functions
We propose a simple and general resampling strategy to estimate variances for parameter estimators derived from nonsmooth estimating functions. This approach applies to a wide variety of semiparametric and nonparametric problems in biostatistics. It does not require solving estimating equations and is thus much faster than the existing resampling procedures. Its usefulness is illustrated with heteroscedastic quantile regression and censored data rank regression. Numerical results based on simulated and real data are provided.
DOI: 10.1093/biomet/asv011
2015
Cited 41 times
On random-effects meta-analysis
Meta-analysis is widely used to compare and combine the results of multiple independent studies. To account for between-study heterogeneity, investigators often employ random-effects models, under which the effect sizes of interest are assumed to follow a normal distribution. It is common to estimate the mean effect size by a weighted linear combination of study-specific estimators, with the weight for each study being inversely proportional to the sum of the variance of the effect-size estimator and the estimated variance component of the random-effects distribution. Because the estimator of the variance component involved in the weights is random and correlated with study-specific effect-size estimators, the commonly adopted asymptotic normal approximation to the meta-analysis estimator is grossly inaccurate unless the number of studies is large. When individual participant data are available, one can also estimate the mean effect size by maximizing the joint likelihood. We establish the asymptotic properties of the meta-analysis estimator and the joint maximum likelihood estimator when the number of studies is either fixed or increases at a slower rate than the study sizes and we discover a surprising result: the former estimator is always at least as efficient as the latter. We also develop a novel resampling technique that improves the accuracy of statistical inference. We demonstrate the benefits of the proposed inference procedures using simulated and empirical data.
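The weighted estimator the abstract describes, with weights inversely proportional to the within-study variance plus the estimated variance component, can be sketched with the usual moment (DerSimonian–Laird style) estimate of the between-study variance. The numbers below are made up, and this sketch does not include the paper's resampling correction:

```python
import numpy as np

# Random-effects meta-analysis: inverse-variance weighting with a
# moment estimate of the between-study variance tau^2 (toy data).
beta = np.array([0.2, 0.5, 0.8])     # study-specific effect estimates
var = np.array([0.04, 0.04, 0.04])   # their within-study variances

w = 1.0 / var                                  # fixed-effect weights
beta_fe = np.sum(w * beta) / np.sum(w)
Q = np.sum(w * (beta - beta_fe) ** 2)          # heterogeneity statistic
C = np.sum(w) - np.sum(w ** 2) / np.sum(w)
tau2 = max(0.0, (Q - (len(beta) - 1)) / C)     # between-study variance

# Random-effects weights: inverse of (within-study variance + tau^2).
w_re = 1.0 / (var + tau2)
beta_re = np.sum(w_re * beta) / np.sum(w_re)
print(round(tau2, 3), round(beta_re, 3))
```

The paper's point is that because tau2 is random and correlated with the study-specific estimates, the naive normal approximation to the distribution of beta_re can be grossly inaccurate when the number of studies is small.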
DOI: 10.1002/gepi.21759
2013
Cited 41 times
A General Framework for Association Tests With Multivariate Traits in Large‐Scale Genomics Studies
ABSTRACT Genetic association studies often collect data on multiple traits that are correlated. Discovery of genetic variants influencing multiple traits can lead to better understanding of the etiology of complex human diseases. Conventional univariate association tests may miss variants that have weak or moderate effects on individual traits. We propose several multivariate test statistics to complement univariate tests. Our framework covers both studies of unrelated individuals and family studies and allows any type/mixture of traits. We relate the marginal distributions of multivariate traits to genetic variants and covariates through generalized linear models without modeling the dependence among the traits or family members. We construct score‐type statistics, which are computationally fast and numerically stable even in the presence of covariates and which can be combined efficiently across studies with different designs and arbitrary patterns of missing data. We compare the power of the test statistics both theoretically and empirically. We provide a strategy to determine genome‐wide significance that properly accounts for the linkage disequilibrium (LD) of genetic variants. The application of the new methods to the meta‐analysis of five major cardiovascular cohort studies identifies a new locus ( HSCB ) that is pleiotropic for the four traits analyzed.
DOI: 10.3389/fgene.2019.00494
2019
Cited 31 times
Genetics of Chronic Kidney Disease Stages Across Ancestries: The PAGE Study
Chronic kidney disease (CKD) is common and disproportionately burdens United States ethnic minorities. Its genetic determinants may differ by disease severity and clinical stages. To uncover genetic factors associated with CKD severity among high-risk ethnic groups, we performed genome-wide association studies (GWAS) in diverse populations within the Population Architecture using Genomics and Epidemiology (PAGE) study. We assembled multi-ethnic genome-wide imputed data on non-overlapping CKD cases [4,150 mild to moderate CKD, 1,105 end-stage kidney disease (ESKD)] and non-CKD controls for up to 41,041 PAGE participants (African Americans, Hispanics/Latinos, East Asians, Native Hawaiians, and American Indians). We implemented a generalized estimating equation approach for GWAS using ancestry-combined data while adjusting for age, sex, principal components, study, and ethnicity. The GWAS identified a novel genome-wide associated locus for mild to moderate CKD near NMT2 (rs10906850, p = 3.7 × 10^-8) that replicated in the United Kingdom Biobank white British sample (p = 0.008). Several variants at the APOL1 locus were associated with ESKD, including the APOL1 G1 variant rs73885319 (p = 1.2 × 10^-9). There was no overlap among associated loci for the CKD and ESKD traits, even at the previously reported APOL1 locus (p = 0.76 for CKD). Several additional loci were associated with CKD or ESKD at p-values below the genome-wide threshold. These loci were often driven by variants more common in non-European ancestry. Our genetic study identified a novel association at NMT2 for CKD and showed for the first time strong associations of the APOL1 variants with ESKD across multi-ethnic populations. Our findings suggest differences in genetic effects across CKD severity and provide information for the design of genetic studies of CKD in diverse populations.
DOI: 10.1080/01621459.2019.1671200
2019
Cited 31 times
Optimal Designs of Two-Phase Studies
Abstract The two-phase design is a cost-effective sampling strategy to evaluate the effects of covariates on an outcome when certain covariates are too expensive to be measured on all study subjects. Under such a design, the outcome and inexpensive covariates are measured on all subjects in the first phase and the first-phase information is used to select subjects for measurements of expensive covariates in the second phase. Previous research on two-phase studies has focused largely on the inference procedures rather than the design aspects. We investigate the design efficiency of the two-phase study, as measured by the semiparametric efficiency bound for estimating the regression coefficients of expensive covariates. We consider general two-phase studies, where the outcome variable can be continuous, discrete, or censored, and the second-phase sampling can depend on the first-phase data in any manner. We develop optimal or approximately optimal two-phase designs, which can be substantially more efficient than the existing designs. We demonstrate the improvements of the new designs over the existing ones through extensive simulation studies and two large medical studies. Supplementary materials for this article are available online.
DOI: 10.1101/2021.10.25.21265304
2021
Cited 20 times
Effectiveness of Covid-19 Vaccines in the United States Over 9 Months: Surveillance Data from the State of North Carolina
ABSTRACT Background The duration of protection afforded by Covid-19 vaccines in the United States is unclear. Whether the recent increase of breakthrough infections was caused by waning immunity to the primary vaccination or by emergence of new variants that are more highly transmissible is also unknown. Methods We extracted data on vaccination histories and clinical outcomes (Covid-19, hospitalization, death) for the period from December 13, 2020 through September 8, 2021 by linking data from the North Carolina COVID-19 Surveillance System and COVID-19 Vaccine Management System covering ∼10.6 million residents statewide. We used the Kaplan-Meier method to estimate the effectiveness of the BNT162b2 (Pfizer–BioNTech), mRNA-1273 (Moderna), and Ad26.COV2.S (Janssen) vaccines in reducing the incidence of Covid-19 over successive post-vaccination time periods, producing separate estimates for individuals vaccinated during different calendar periods. In addition, we used Cox regression with time-dependent vaccination status and time-varying hazard ratios to estimate the effectiveness of the three vaccines in reducing the hazard rates or current risks of Covid-19, hospitalization, and death, as a function of time elapsed since the first dose. Results For the Pfizer two-dose regimen, vaccine effectiveness in reducing the current risk of Covid-19 ramps to a peak level of 94.9% (95% confidence interval [CI], 94.5 to 95.2) at 2 months (post the first dose) and drops to 70.1% (95% CI, 68.9 to 71.2) after 7 months; effectiveness in reducing the current risk of hospitalization ramps to a peak level of 96.4% (95% CI, 94.7 to 97.5) at 2 months and remains at 87.7% (95% CI, 84.3 to 90.4) at 7 months; effectiveness in reducing the current risk of death ramps to 95.9% (95% CI, 92.9 to 97.6) at 2 months and is maintained at 88.4% (95% CI, 83.0 to 92.1) at 7 months. 
For the Moderna two-dose regimen, vaccine effectiveness in reducing the current risk of Covid-19 ramps to a peak level of 96.0% (95% CI, 95.6 to 96.4) at 2 months and drops to 81.9% (95% CI, 81.0 to 82.7) after 7 months; effectiveness in reducing the current risk of hospitalization ramps to a peak level of 97.5% (95% CI, 96.3 to 98.3) at 2 months and remains at 92.3% (95% CI, 89.7 to 94.3) at 7 months; effectiveness in reducing the current risk of death ramps to 96.0% (95% CI, 91.9 to 98.0) at 3 months and remains at 93.7% (95% CI, 90.2 to 95.9) at 7 months. For the Janssen one-dose regimen, effectiveness in reducing the current risk of Covid-19 ramps to a peak level of 79.0% (95% CI, 77.1 to 80.7) at 1 month and drops to 64.3% (95% CI, 62.3 to 66.1) after 5 months; effectiveness in reducing the current risk of hospitalization ramps to a peak level of 89.8% (95% CI, 78.8 to 95.1) at 2 months and stays above 80% through 5 months; effectiveness in reducing the current risk of death ramps to 89.4% (95% CI, 52.3 to 97.6) at 3 months and stays above 80% through 5 months. For all three vaccines, the ramping and waning patterns are similar for individuals who were vaccinated at different dates, and across various demographic subgroups (age, sex, race/ethnicity, geographic region, county-level vaccination rate). Conclusions The two mRNA vaccines are remarkably effective and durable in reducing the risks of hospitalization and death. The Janssen vaccine also offers a high level of protection against hospitalization and death. The Moderna vaccine is significantly more durable than the Pfizer vaccine in reducing the risk of Covid-19. Waning vaccine effectiveness is caused primarily by declining immunity rather than emergence of new variants. It would be worthwhile to investigate the effectiveness of the Janssen vaccine as a two-dose regimen, with the second dose given approximately 1-2 months after the first dose.
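The vaccine effectiveness quantities reported above are of the form one minus a ratio of risks (or hazards) between vaccinated and unvaccinated groups. A toy sketch of that definition with invented counts (the paper itself estimates VE from surveillance data via Kaplan–Meier curves and Cox models with time-varying hazard ratios):

```python
# Vaccine effectiveness as one minus a risk ratio (toy numbers only).
cases_vax, n_vax = 300, 100_000        # vaccinated: cases / group size
cases_unvax, n_unvax = 2_000, 100_000  # unvaccinated: cases / group size

risk_vax = cases_vax / n_vax
risk_unvax = cases_unvax / n_unvax
ve = 1.0 - risk_vax / risk_unvax       # VE = 1 - risk ratio
print(f"VE = {ve:.1%}")
```

Estimating how this quantity ramps and wanes over time since vaccination, as in the abstract, requires the survival-analysis machinery (time-dependent vaccination status, time-varying hazard ratios) rather than a single risk ratio.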
DOI: 10.1016/0378-3758(94)00039-x
1995
Cited 68 times
Semiparametric inference for the accelerated life model with time-dependent covariates
The accelerated life model assumes that the failure time associated with a multi-dimensional covariate process is contracted or expanded relative to that of the zero-valued covariate process. In the present paper, the rate of contraction/expansion is formulated by a parametric function of the covariate process while the baseline failure time distribution is unspecified. Estimating functions for the vector of regression parameters are motivated by likelihood score functions and take the form of log rank statistics with time-dependent covariates. The resulting estimators are proven to be strongly consistent and asymptotically normal under suitable regularity conditions. Simple methods are derived for making inference about a subset of regression parameters while regarding others as nuisance quantities. Finite-sample properties of the estimation and testing procedures are investigated through Monte Carlo simulations. An illustration with the well-known Stanford heart transplant data is provided.
DOI: 10.17615/pt0g-y207
2002
Cited 63 times
Marginal regression models for recurrent and terminal events
A major complication in the analysis of recurrent event data from medical studies is the presence of death. We consider the marginal mean function for the cumulative number of recurrent events over time, acknowledging the fact that death precludes further recurrences. We specify that covariates have multiplicative effects on an arbitrary baseline mean function while leaving the stochastic structure of the recurrent event process completely unspecified. We then propose estimators for the regression parameters and the baseline mean function under this semiparametric model. The asymptotic properties of these estimators are established. Joint inferences about recurrent events and death are also discussed. The finite-sample behavior of the proposed inference procedures is assessed through simulation studies. An application to a well-known bladder tumor study is provided.
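The marginal mean function described above can be estimated nonparametrically by accumulating recurrence increments weighted by the Kaplan-Meier survivor function for death. A minimal sketch without covariates (the paper's semiparametric model adds multiplicative covariate effects on top of this):

```python
def marginal_mean_function(follow_up, death, recurrence_times, t):
    """Nonparametric estimate of E[number of recurrences by t] with death as
    a terminal event: mu(t) = sum over event times u <= t of S(u-) * dR(u),
    where S is the Kaplan-Meier survivor function for death and dR(u) is the
    number of recurrences at u divided by the number still under observation.
    follow_up[i]: death or censoring time of subject i; death[i]: 1 if died;
    recurrence_times[i]: list of that subject's recurrence times."""
    event_times = sorted({u for r in recurrence_times for u in r} |
                         {u for u, d in zip(follow_up, death) if d})
    surv, mu = 1.0, 0.0
    for u in event_times:
        if u > t:
            break
        at_risk = sum(1 for f in follow_up if f >= u)
        if at_risk == 0:
            break
        recs = sum(1 for r in recurrence_times for v in r if v == u)
        mu += surv * recs / at_risk          # S(u-) dR(u)
        deaths = sum(1 for f, d in zip(follow_up, death) if d and f == u)
        surv *= 1.0 - deaths / at_risk       # Kaplan-Meier step for death
    return mu
```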
DOI: 10.1093/biomet/78.1.123
1991
Cited 54 times
Nonparametric sequential testing in clinical trials with incomplete multivariate observations
This paper addresses sequential testing in randomized clinical trials with multiple endpoints. Patients enter treatments serially and are subject to random loss to follow-up. The endpoints of interest may be time-to-event variables or other quantitative measurements. The proposed test statistic at a given look is a weighted sum of the linear rank statistics with respect to the marginal distributions of the multiple endpoints. The weights can be chosen to maximize asymptotic power against certain local alternatives. Stopping boundaries are obtained from the asymptotic joint distribution of the proposed test statistics calculated at different looks. This new approach preserves a single preset overall significance level and can lead to quicker termination of the trial than sequential procedures based on single endpoints. An example taken from an AIDS clinical trial is presented.
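Given the marginal rank statistics and their joint covariance at a particular look, the weighted combination is standardized in the usual way. A minimal sketch (inputs illustrative):

```python
import math

def combined_test_statistic(stats, weights, cov):
    """Standardized weighted sum of marginal (rank) statistics:
    (w'T) / sqrt(w' Sigma w). stats and weights are length-k sequences;
    cov is the k x k covariance matrix of the statistics."""
    k = len(stats)
    num = sum(weights[i] * stats[i] for i in range(k))
    var = sum(weights[i] * cov[i][j] * weights[j]
              for i in range(k) for j in range(k))
    return num / math.sqrt(var)
```

With independent unit-variance margins and equal weights, two marginal statistics of 2.0 combine to 2 / sqrt(0.5), illustrating the power gain from pooling endpoints.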
DOI: 10.1093/biostatistics/kxr017
2011
Cited 37 times
Checking semiparametric transformation models with censored data
Semiparametric transformation models provide a very general framework for studying the effects of (possibly time-dependent) covariates on survival time and recurrent event times. Assessing the adequacy of these models is an important task because model misspecification affects the validity of inference and the accuracy of prediction. In this paper, we introduce appropriate time-dependent residuals for these models and consider the cumulative sums of the residuals. Under the assumed model, the cumulative sum processes converge weakly to zero-mean Gaussian processes whose distributions can be approximated through Monte Carlo simulation. These results enable one to assess, both graphically and numerically, how unusual the observed residual patterns are in reference to their null distributions. The residual patterns can also be used to determine the nature of model misspecification. Extensive simulation studies demonstrate that the proposed methods perform well in practical situations. Three medical studies are provided for illustrations.
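The Monte Carlo approximation described above can be sketched with multiplier perturbations of the residuals; this is a simplified stand-in for the paper's Gaussian-process approximation, with illustrative function names:

```python
import random

def cumsum_sup(residuals):
    """Supremum of the absolute cumulative-sum process of the residuals,
    the kind of statistic used for numerical model checking."""
    s, sup = 0.0, 0.0
    for r in residuals:
        s += r
        sup = max(sup, abs(s))
    return sup

def monte_carlo_pvalue(residuals, n_sim=2000, seed=0):
    """Approximate the null distribution of the sup statistic by perturbing
    each residual with independent standard normal multipliers, then compare
    the observed statistic against the simulated ones."""
    rng = random.Random(seed)
    observed = cumsum_sup(residuals)
    hits = 0
    for _ in range(n_sim):
        perturbed = [r * rng.gauss(0.0, 1.0) for r in residuals]
        if cumsum_sup(perturbed) >= observed:
            hits += 1
    return hits / n_sim
```

A large p-value indicates the observed residual pattern is unremarkable under the assumed model; a small one flags misspecification.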
DOI: 10.1001/jamacardio.2017.0652
2017
Cited 31 times
Heterogeneity in Blood Pressure Transitions Over the Life Course
<h3>Importance</h3> Many studies have assessed racial/ethnic and sex disparities in the prevalence of elevated blood pressure (BP) from childhood to adulthood, yet few have examined differences in age-specific transitions between categories of BP over the life course in contemporary, multiracial/multiethnic populations. <h3>Objective</h3> To estimate age, racial/ethnic, and sex–specific annual net transition probabilities between categories of BP using Markov modeling of cross-sectional data from the National Health and Nutrition Examination Survey. <h3>Design, Setting, and Participants</h3> National probability sample (National Health and Nutrition Examination Survey in 2007-2008, 2009-2010, and 2011-2012) of 17 747 African American, white American, and Mexican American participants aged 8 to 80 years. The data were analyzed from September 2014 to November 2015. <h3>Main Outcomes and Measures</h3> Age-specific American Heart Association–defined BP categories. <h3>Results</h3> Three National Health and Nutrition Examination Survey cross-sectional samples were used to characterize the ages at which self-reported African American (n = 4973), white American (n = 8886), and Mexican American (n = 3888) populations transitioned between ideal BP, prehypertension, and hypertension across the life course. At age 8 years, disparities in the prevalence of ideal BP were observed, with the prevalence being lower among boys (86.6%-88.8%) compared with girls (93.0%-96.3%). From ages 8 to 30 years, annual net transition probabilities from ideal to prehypertension among male individuals were more than 2 times the net transition probabilities of their female counterparts. The largest net transition probabilities for ages 8 to 30 years occurred in African American young men, among whom a net 2.9% (95% CI, 2.3%-3.4%) of those with ideal BP transitioned to prehypertension 1 year later. 
Mexican American young women aged 8 to 30 years experienced the lowest ideal to prehypertension net transition probabilities (0.6%; 95% CI, 0.3%-0.8%). After age 40 years, ideal to prehypertension net transition probabilities stabilized or decreased (range, 3.0%-4.5%) for men, whereas net transition probabilities for women increased rapidly (range, 2.6%-13.0%). Mexican American women exhibited the largest ideal to prehypertension net transition probabilities after age 60 years. The largest prehypertension to hypertension net transition probabilities occurred at young ages in boys of white race/ethnicity and African Americans, approximately age 8 years and age 25 years, respectively, while net transition probabilities for white women and Mexican Americans increased over the life course. <h3>Conclusions and Relevance</h3> Heterogeneity in net transition probabilities from ideal BP emerges during childhood, with associated rapid declines in ideal BP observed in boys and African Americans, thus introducing disparities. Primordial prevention beginning in childhood and into early adulthood is necessary to preempt the development of prehypertension and hypertension, as well as associated racial/ethnic and sex disparities.
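Under a strongly simplified one-way Markov step (no net re-entry into the category), an annual net transition probability of the kind reported above relates two adjacent cross-sectional prevalences as follows; this is a sketch of the idea, not the paper's Markov estimator:

```python
def net_transition_probability(prev_age_a, prev_age_a1):
    """Annual net transition probability out of a BP category, assuming a
    one-way Markov step: the net fraction of those in the category at age a
    who have left it by age a + 1, given the two prevalences."""
    return (prev_age_a - prev_age_a1) / prev_age_a
```

For example, if 90.0% of a population has ideal BP at one age and 87.3% a year later, the implied annual net transition probability out of ideal BP is about 3%.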
DOI: 10.1093/biostatistics/kxv050
2015
Cited 28 times
Semiparametric regression for the weighted composite endpoint of recurrent and terminal events
Recurrent event data are commonly encountered in clinical and epidemiological studies. A major complication arises when recurrent events are terminated by death. To assess the overall effects of covariates on the two types of events, we define a weighted composite endpoint as the cumulative number of recurrent and terminal events properly weighted by the relative severity of each event. We propose a semiparametric proportional rates model which specifies that the (possibly time-varying) covariates have multiplicative effects on the rate function of the weighted composite endpoint while leaving the form of the rate function and the dependence among recurrent and terminal events completely unspecified. We construct appropriate estimators for the regression parameters and the cumulative frequency function. We show that the estimators are consistent and asymptotically normal with variances that can be consistently estimated. We also develop graphical and numerical procedures for checking the adequacy of the model. We then demonstrate the usefulness of the proposed methods in simulation studies. Finally, we provide an application to a major cardiovascular clinical trial.
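For a single subject, the weighted composite endpoint defined above is simply the severity-weighted count of events observed by a given time. A minimal sketch (weights illustrative; the paper models the rate function of this process semiparametrically):

```python
def weighted_composite_count(event_times, severity_weights, t):
    """Cumulative weighted composite endpoint for one subject: the sum of
    severity weights over all recurrent and terminal events by time t.
    A terminal event would typically carry the largest weight."""
    return sum(w for u, w in zip(event_times, severity_weights) if u <= t)
```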
DOI: 10.1093/cid/ciab226
2021
Cited 18 times
Evaluating the Long-term Efficacy of Coronavirus Disease 2019 (COVID-19) Vaccines
Abstract Large-scale deployment of safe and durably effective vaccines can curtail the coronavirus disease 2019 (COVID-19) pandemic. However, the high vaccine efficacy (VE) reported by ongoing phase 3 placebo-controlled clinical trials is based on a median follow-up time of only about 2 months, and thus does not pertain to long-term efficacy. To evaluate the duration of protection while allowing trial participants timely access to efficacious vaccine, investigators can sequentially cross participants over from the placebo arm to the vaccine arm. Here, we show how to estimate potentially time-varying placebo-controlled VE in this type of staggered vaccination of participants. In addition, we compare the performance of blinded and unblinded crossover designs in estimating long-term VE.
DOI: 10.1016/j.ebiom.2020.103157
2021
Cited 17 times
Whole genome sequence analyses of eGFR in 23,732 people representing multiple ancestries in the NHLBI trans-omics for precision medicine (TOPMed) consortium
<h2>Abstract</h2><h3>Background</h3> Genetic factors that influence kidney traits have been understudied for low frequency and ancestry-specific variants. <h3>Methods</h3> We combined whole genome sequencing (WGS) data from 23,732 participants from 10 NHLBI Trans-Omics for Precision Medicine (TOPMed) Program multi-ethnic studies to identify novel loci for estimated glomerular filtration rate (eGFR). Participants included European, African, East Asian, and Hispanic ancestries. We applied linear mixed models using a genetic relationship matrix estimated from the WGS data and adjusted for age, sex, study, and ethnicity. <h3>Findings</h3> When testing single variants, we identified three novel loci driven by low frequency variants more commonly observed in non-European ancestry (<i>PRKAA2</i>, rs180996919, minor allele frequency [MAF] 0.04%, <i>P</i> = 6.1 × 10<sup>−11</sup>; <i>METTL8</i>, rs116951054, MAF 0.09%, <i>P</i> = 4.5 × 10<sup>−9</sup>; and <i>MATK</i>, rs539182790, MAF 0.05%, <i>P</i> = 3.4 × 10<sup>−9</sup>). We also replicated two known loci for common variants (rs2461702, MAF=0.49, <i>P</i> = 1.2 × 10<sup>−9</sup>, nearest gene <i>GATM</i>, and rs71147340, MAF=0.34, <i>P</i> = 3.3 × 10<sup>−9</sup>, <i>CDK12</i>). Testing aggregated variants within a gene identified the <i>MAF</i> gene. A statistical approach based on local ancestry helped to identify replication samples for ancestry-specific variants. <h3>Interpretation</h3> This study highlights challenges in studying variants influencing kidney traits that are low frequency in populations and more common in non-European ancestry.
DOI: 10.2307/2532778
1994
Cited 54 times
Regression Analysis of Multivariate Grouped Survival Data
Multivariate failure time data arise when each study subject may experience several types of event or when there are clusterings of observational units such that failure times within the same cluster are correlated. The failure times are often subject to interval grouping or have truly discrete measurements. In this paper, the marginal distribution for each discrete failure time variable is formulated by a grouped-data version of the proportional hazards model while the dependence structure is unspecified. Generalized estimating equations in the spirit of Liang and Zeger (1986, Biometrika 73, 13-22) are proposed to estimate the regression parameters and survival probabilities. The resulting estimators are consistent and asymptotically normal. Robust estimators for the limiting covariance matrices are constructed. Simulation studies demonstrate that the asymptotic approximations are adequate for practical use and that ignoring the intracluster dependence in the variance-covariance estimation would lead to invalid statistical inference. A psychological experiment is provided for illustration.
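The grouped-data version of the proportional hazards model links each interval's discrete hazard to covariates through a complementary log-log transform, which preserves the proportional hazards structure under grouping. A one-covariate sketch (a simplification of the paper's multivariate setting; parameter names are illustrative):

```python
import math

def interval_survival_probability(gammas, beta, z, k):
    """Probability of surviving through the first k intervals under the
    grouped-data proportional hazards model, where the discrete hazard h_j
    in interval j satisfies cloglog(h_j) = gamma_j + beta * z
    (one covariate for simplicity)."""
    surv = 1.0
    for j in range(k):
        hazard = 1.0 - math.exp(-math.exp(gammas[j] + beta * z))
        surv *= 1.0 - hazard
    return surv
```

Raising the baseline survival probability to the power exp(beta * z) reproduces the covariate-specific value, which is exactly the proportional hazards property the grouping preserves.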
DOI: 10.1002/hep.1840220311
1995
Cited 54 times
A randomized, double-blind, placebo-controlled trial of ursodeoxycholic acid in primary biliary cirrhosis
One hundred fifty-one patients with primary biliary cirrhosis (PBC) grouped into four strata based on entry serum bilirubin (<2 mg/dL vs. 2 mg/dL or greater) and liver histology (stages I, II vs. stages III, IV–Ludwig criteria) were randomized within each stratum to ursodiol or placebo given in a single dose of 10 to 12 mg/kg at bedtime for 2 years. Placebo- (n = 74) and ursodiol-treated (n = 77) patients were well matched at baseline for demographic and prognostic factors. Ursodiol induced major improvements in biochemical tests of the liver in strata 1 and 2 (entry bilirubin <2), but had less effect on laboratory tests in patients with entry serum bilirubin of ⩾2 (strata 3 and 4). Histology was favorably affected by ursodiol in patients in strata 1 and 2 but not in strata 3 and 4. Ursodiol enrichment in fasting bile obtained at the conclusion of the trial was approximately 40% and comparable in all strata. Thus, differences in ursodiol enrichment of the bile acid pool do not explain better responses of laboratory tests and histology found in patients with less advanced PBC. Patients treated with ursodiol tended to develop a treatment failure less frequently than those who received placebo, particularly in strata 1 and 2 (ursodiol 42%, placebo 60%, P = .078). Development of severe symptoms (fatigue/pruritus) and doubling of serum bilirubin were reduced significantly in ursodiol-treated patients. Major complications of liver disease, progression to liver transplantation or death, occurred in 10.5% and 76.6%, respectively, in patients who had an entry serum bilirubin of <2 or ⩾2 mg/dL. The incidence of these complications was comparable in ursodiol- and placebo-treated patients. Treatment failure occurred sooner in placebo than in ursodiol-treated patients in strata 1 and 2 but at the same rate in similarly treated patients in strata 3 and 4. Patients with advanced disease are unlikely to benefit from ursodiol. 
Trials longer than 2 years will likely be needed to determine whether ursodiol reduces major complications of liver disease in patients with milder disease. (Hepatology 1995;22:759–766.)