Six courses are required of trainees: three statistics/computing courses and three biology courses.

Neuroscience-focused trainees will fulfill the biology requirement using neuroscience courses. Genomics-focused trainees will fulfill the biology requirement using genomics courses.

Trainees will be expected to participate in journal clubs & seminars and also receive training in reproducible research and team science.

**One course in mathematical statistics:**

All students are required to take a course in mathematical statistics. They can choose between the less technical STAT 509 and the more technical STAT 512:

STAT 509 (Introduction to Mathematical Statistics) Examines methods, tools, and theory of mathematical statistics. Covers probability densities, transformations, moment-generating functions, conditional expectation. Additional topics include Bayesian analysis with conjugate priors, hypothesis tests, the Neyman-Pearson Lemma, likelihood ratio tests, confidence intervals, maximum likelihood estimation, Central Limit Theorem, Slutsky's Theorem, and the delta method.

STAT 512 (Statistical Inference) Random variables, transformations, conditional expectation, moment generating functions, convergence, limit theorems, estimation, Cramer-Rao lower bound, maximum likelihood estimation, sufficiency, ancillarity, completeness, Rao-Blackwell theorem, hypothesis testing: Neyman-Pearson lemma, monotone likelihood ratio, likelihood-ratio tests, large-sample theory.

**One course in statistical machine learning:**

In addition, each student is required to take a course in statistical machine learning. A student can choose between a machine learning course under development by the Department of Biostatistics and a more technical option cross-listed between Statistics and CSE.

BIOST 546 (Machine Learning for Biomedical and Public Health Big Data)

CSE 546 / STAT 535 (Foundational Machine Learning) Methods for identifying valid, novel, useful, and understandable patterns in data. Induction of predictive models from data: classification, regression, and probability estimation. Discovery of clusters and association rules.

**One course in data science:**

Each trainee is required to take a course on "data science". A student can choose between a data science course under development by the Department of Biostatistics or a data visualization or data management course taught in CSE.

BIOST 544 Introduction to Biomedical Data Science (formerly BIOST 578 B, Introduction to Data Science)

CSE 512 (Data Visualization) Covers techniques and algorithms for creating effective visualizations based on principles from graphic design, visual art, perceptual psychology, and cognitive science. Topics include data and image models; visual encoding; graphical perception; color; animation; interaction techniques; graph layout; and automated design.

CSE 544 (Data Management) Data models and query languages (SQL, datalog, OQL). Relational databases, enforcement of integrity constraints. Object-oriented databases and object-relational databases. Principles of data storage and indexing. Query-execution methods and query optimization algorithms. Static analysis of queries and rewriting of queries using views. Data integration, data mining, and principles of transaction processing.

STAT 548 / CSE 547 (Machine Learning for Big Data) Covers machine learning and statistical techniques for analyzing datasets of massive size and dimensionality. Representations include regularized linear models, graphical models, matrix factorization, sparsity, clustering, and latent factor models. Algorithms include sketching, random projections, hashing, fast nearest-neighbors, large-scale online learning, and parallel learning (Map-Reduce, GraphLab).

Each trainee will take either three quarter-long courses in Neuroscience (one computational and two basic neuroscience) or three quarters' worth of courses in Genome Sciences (one quarter-long computational course and four half-quarter courses in genetics, genomics, or proteomics).

**Genomics coursework:**

All trainees specializing in genomics will take the following course:

GENOME 541 (Introduction to Computational Molecular Biology) This course provides a survey of topics within the field of computational molecular biology. The course is divided into five two-week blocks, each devoted to a single topic. The topics include areas such as genomics, metagenomics, epigenomics, phylogenetics, proteomics, and protein structure. The course involves weekly, substantial programming assignments.

Furthermore, all trainees specializing in genomics will take four of the following courses. Each of these courses lasts 5 weeks (one-half of one academic quarter). Therefore, four of these courses will amount to two quarters of coursework.

GENOME 551 (Principles of Gene Regulation) A detailed examination of the mechanisms of transcription and translation as determined by experimental genetics, molecular biology, and biochemistry.

GENOME 552 (Technologies for Genome Analysis) Discussion of current and newly-emerging technologies in genome analysis with regard to applications in biology and medicine and to potential advantages and limitations.

GENOME 553 (Advanced Genetic Analysis) Classical genetic analysis is a powerful approach to dissect complex biological processes. Selective removal, addition, or alteration of specific proteins to identify and order genes in a pathway, define protein function, determine tissue and temporal requirements for gene function, and distinguish among competing hypotheses to explain biological phenomena.

GENOME 555 (Protein Technology) Focuses on current and emerging technologies and approaches in protein analysis, and considers applications of these technologies in biology, biotechnology, and medicine.

GENOME 561 (Molecular Population Genetics and Evolution) Surveys recent literature to gain an understanding of the basic principles of molecular population genetics and evolution as applied to analysis of genome data.

**Neuroscience coursework:**

All trainees specializing in neuroscience will take the following course:

CSE 528 (Computational Neuroscience) Introduction to computational methods for understanding nervous systems and the principles governing their operation. Topics include representation of information by spiking neurons, information processing in neural circuits, and algorithms for adaptation and learning.

Furthermore, all trainees specializing in neuroscience will take two of the following three courses:

NEURO 501 (Introduction to Neurobiology: Molecular & Cellular Neurobiology) Concepts and techniques of molecular and cell biology as applied to understanding development and function of the nervous system.

NEURO 502 (Introduction to Neurobiology: Sensory & Motor Systems) Introduction to neuroanatomy and modules on sensory and motor systems, examination of macroscopic and microscopic neural tissues.

NEURO 503 (Cognitive and Integrative Neuroscience) A discussion of higher neural processes like learning, memory, and neuroendocrinology. Lecture and laboratory discussion of original literature, observation of demonstrations and simulations.

BDGN trainees will receive instruction in best practices for reproducible research by enrolling in the intensive 2.5-day course on reproducible research offered at the University of Washington each July as part of the **Summer Institute for Statistics of Big Data** (SISBID).

Trainees must also complete at least one of the following three options to meet the responsible conduct of research requirement: (1) BIOST 532: Ethical Issues for Biostatisticians, (2) GENOME 580: Ethics, or (3) the Biomedical Research Integrity Series.

Each year, trainees will be required to register for four course credits of seminars or journal clubs related to genomics, neuroscience, statistics, or computing. **These four course credits must include:** 1) a computing seminar, 2) a statistics seminar, and 3) either a genomics or a neuroscience seminar.

For example, a trainee might register for two credits of the Biostatistics Department Seminar, one credit of the Machine Learning Lunch, and one credit of the Genome Sciences Seminar.

A list of acceptable seminars and journal clubs includes:

**Statistics**

Biostatistics Department Seminar

Statistics Department Seminar

Statistical Genetics Seminar

Boeing Distinguished Colloquia (Applied Math)

Chem E 599, Topics in Data Science

**Computing**

Machine Learning Lunch

Computer Science & Engineering Colloquia

Electrical Engineering Research Colloquium

Trends in Optimization Seminar

Reduced Order Modeling Seminar

Chem E 591, Robotics and Control Systems Colloquium

**Genomics**

Genome Sciences Seminar

Reading and Research in Computational Biology

Combi Seminar

Medical Genetics Journal Club (Genome 525)

**Neuroscience**

Theoretical Neuroscience Journal Club

Computational Neuroscience Seminar

Physiology and Biophysics Seminar

Biology 581: Topics in Physiology (neuroinformatics working group)

Students who join the BDGN TG beginning in their first year of graduate study will perform three quarter-long rotations, including one rotation in genomics or neuroscience and one in statistics or computing. During each rotation, each BDGN trainee will be paired with a “rotation collaborator”: a senior PhD student or post-doc in the lab in which he or she is rotating, who will help guide the trainee over the course of the rotation.

Students who have already selected a thesis lab at the time they join the BDGN TG are exempt from the rotation requirement. They are encouraged, however, to have supervisory committees including faculty with expertise in both genomics/neuroscience and statistics/computing.

Trainees are encouraged to spend the summer after the first year of their BDGN traineeships performing external internships in order to enhance their training in biomedical big data. The BDGN Steering Committee members will use their relationships with local companies to help place trainees in summer internships.