Assistant Professor of Statistics, Stanford University
Snail mail: 390 Serra Mall, Department of Statistics, Stanford University, Stanford, CA 94305.
Email: [email protected]
Fax: 650-725-8977
My research is data-driven, with the data coming primarily from current biological applications. Today's data sets are rich in structure and high in dimension, motivating new statistical models as well as new perspectives on classic statistical concepts. I am currently focusing on the detection of genomic variation from high-density SNP chips and next-generation sequencing experiments. These applications motivate new methods in change-point detection, scan statistics, and model and variable selection.
Siegmund, D., Yakir, B. and Zhang, N.R., 2011, The false discovery rate for scan statistics, Biometrika, in press.
Zhang, N.R. and Siegmund, D., 2011, Model Selection for High Dimensional, Multi-sequence Change-point Problems, Statistica Sinica, in press.
Muralidharan, O., Natsoulis, G., Bell, J., Newburger, D., Xu, H., Keta, I., Ji, H. and Zhang, N., 2011, A Cross-Sample Statistical Model for SNP Detection in Short-Read Sequencing Data, Nucleic Acids Research, in press.
Efron, B. and Zhang, N.R., 2011, False Discovery Rates and Copy Number Variation, Biometrika, 98, 251-271.
Download pdf, Supplementary Materials.
Siegmund, D.O., Yakir, B. and Zhang, N.R., 2011, Detecting simultaneous variant intervals in aligned sequences. Annals of Applied Statistics, 5, 645-668.
Chan, H.P.*, Zhang, N.R.*, and Chen, Louis H.S., 2010, Importance sampling of word patterns in DNA and protein sequences. Journal of Computational Biology, 17, 1697-1709.
Chen, H., Xing, H. and Zhang, N.R., 2011, Stochastic segmentation models for allele-specific copy number estimation with SNP-array data, PLoS Computational Biology, 7, e1001060.
Download pdf, software, user guide, example code.
Bickel, P., Boley, N., Brown, B., Huang, H. and Zhang, N.R., 2010, Non-parametric methods for genomic inference. Annals of Applied Statistics, 4, 1660-1697.
Li, F and Zhang, NR, 2010, Bayesian Variable Selection in Structured High-Dimensional Covariate Spaces with Applications in Genomics. JASA Theory and Methods, 105, 1202-1214.
Siegmund, D.O., Yakir, B. and Zhang, N.R., 2010, Tail approximations for maxima of random fields by likelihood ratio transformations. Sequential Analysis, 29, 245-262.
Zhang, N.R., Siegmund, D.O., Ji, H., and Li, J. 2010, Detecting simultaneous change-points in multiple sequences. Biometrika, 97, 631-645.
Download pdf, supplementary, software.
Zhang, N.R., 2010, DNA copy number profiling in normal and tumor genomes. Frontiers in Computational and Systems Biology, ed. Jianfeng Feng, Wenjiang Fu and Fengzhu Sun, pp. 259-281, Springer-Verlag: London.
Zhang, NR, Senbabaoglu, Y and Li, J, 2010, Joint Estimation of DNA Copy Number from Multiple Platforms. Bioinformatics, 26, 153-160.
Chan, HP, Tu, IP and Zhang, NR, 2008, Boundary Crossing Probability Computations in the Analysis of Scan Statistics, in Scan Statistics: Methods and Applications, ed. Glaz, J., Pozdnyakov, V. and Wallenstein, S., 89-105 (Boston: Birkhauser).
Lai, TL, Xing, H and Zhang, NR, 2008, Stochastic segmentation models for array-based comparative genomic hybridization data analysis. Biostatistics 9, 290-307.
Zhang, NR, Wildermuth, MC, and Speed, TP, 2008, Transcription factor binding site prediction with multivariate gene expression data. Annals of Applied Statistics 2, 332-365.
Download pdf, software (begin by reading file analysis_README).
The ENCODE Consortium, 2007, Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 447, 799-816.
Chan, HP and Zhang, NR, 2007, Scan statistics with weighted observations. JASA Theory and Methods 102, 595-602.
Download pdf, Matlab code for analysis in paper.
Zhang, NR and Siegmund, DO, 2007, A Modified Bayes Information Criterion with Applications to the Analysis of Comparative Genomic Hybridization Data. Biometrics 63, 22-32.
A Flexible Approach for Targeted Human Genome Resequencing and Variant Discovery. (with Georges Natsoulis et al.)
Change-point model on non-homogeneous Poisson processes with application in copy number profiling by next-generation DNA sequencing. (with Jeremy Shen.)
Detecting mutations in mixed sample sequencing data using empirical Bayes. (with Omkar Muralidharan et al.)
Multiple hypothesis testing, adjusting for latent variables. (with Yunting Sun and Art Owen.)
Sloan Fellowship (2011)
New World Silver Medal for Best PhD Thesis in the Mathematical Sciences (2007)
NSF DMS Grant 0906394 “Change-point Problems in Genomic Profiling” (2009)
NSF DMS Grant 1043204 “Statistical Methods for Threat Detection” (2010)
NIH R01 HG006137-01 “Statistical Models for Genome Sequencing and Association” (2011)
Statistics 191 Applied statistics.
Statistics 203 Introduction to regression models and analysis of variance.
Statistics 205 Nonparametric statistics.
Statistics 215 Stochastic processes in Biology.
Statistics 345 Special topics course on computational biology. (Spring 2008)
Statistics 345/Genetics 245 Computational algorithms for statistical genetics. (Spring 2009)
Statistics 366 Statistical Models in Biology.