Data Science Options
Genome Sciences offers both a Data Science option and an Advanced Data Science option. The two options have very similar structures. However, the Advanced Data Science option, as the name implies, is designed for students with considerable background in computer science, whereas the courses associated with the Data Science option are less demanding. Advanced Data Science is oriented toward tool builders, whereas Data Science is oriented toward tool users. Both are official UW degree options that will appear on the student’s transcript.
Introduction:
The Data Science options aim to educate the next generation of thought leaders who will both build and apply new methods for data science. These options will help to educate and recognize PhD students whose thesis work focuses on building and using data science tools. The goal of these options is not to educate all students in the foundations of data science but rather to provide advanced education to the students who will push the state of the art in data science methods in their respective domains.
Students enrolled in either option can expect to interact with students enrolled in similar Data Science PhD options in Computer Science, Statistics, Oceanography, Chemical Engineering and Astronomy. In addition, the options are designed to complement the activities of the eScience Institute and to leverage ongoing activities associated with the Moore/Sloan Foundation Data Driven Discovery Initiative, involving the University of Washington, New York University and the University of California, Berkeley.
Advising:
Students with an interest in the Advanced Data Science option but only limited experience in this area should take preparatory coursework before attempting the ADS courses. Please contact Bill Noble for suggested courses.
Admission:
Genome Sciences students who choose to enroll in either Data Science option must have the approval of their thesis advisor and should then let Brian Giebel (bgiebel [ a t ] uw.edu) know which option they are planning to follow. There is no additional admission procedure. Once you have completed all requirements for your chosen option, please contact Brian so that he may have the option added to your transcript.
Faculty:
Any Genome Sciences faculty member may serve as advisor to students enrolled in either Data Science option, although the student’s committee must include at least one of the following faculty members: Brian Beliveau, Jesse Bloom, Phil Green, Kelley Harris, Gail Jarvik, Su-In Lee, Bill Noble, Larry Ruzzo, or Cole Trapnell.
Training Grant:
For information about the new Big Data in Genomics and Neuroscience Training Grant, please see the BDGN website.
Course Sequence:
Data Science option:
The structure of the Data Science option is similar to that of the Advanced Data Science option, except that students can select from a wider variety of courses, including introductory courses in each topic area with no prerequisites. Note that all courses that count toward the Advanced Data Science option may instead be applied to the Data Science option. Also, only two quarters of the eScience Community Seminar are required, rather than four.
Requirements:

1. One statistics course (or pair of courses) from the list below. Note that several of the courses are two-quarter series, which cover topics similar to those in GENOME 560 but in greater depth. Students who opt to take one of these series must complete both quarters to satisfy the Data Science option requirement.
2. Genome 540: Computational Molecular Biology I
3. One course from each of two of the following three areas, selected from the table below:
   Data Management
   Machine Learning
   Data Visualization
4. At least two quarters in the weekly eScience Community Seminar.

| Area | Course number | Course title | Prerequisites | Adv |
| --- | --- | --- | --- | --- |
| Statistics | GENOME 560 | Introduction to statistical genomics | None | |
| Statistics | BIOSTAT 511-512 | Medical biometry I & II | None | |
| Statistics | BIOSTAT 517-518 | Applied biostatistics I & II | None | |
| Statistics | STAT 509 | Introduction to mathematical statistics | STAT 311 and (MATH 308 or 309) | X |
| Statistics | STAT 512-513 | Statistical inference | STAT 395 and (STAT 421, 423, 504, or BIOST 512) | X |
| Data management | BIOSTAT 544 | Introduction to data science | BIOSTAT 511 or equivalent | |
| Data management | CSE 583 | Software development for data scientists | None | |
| Data management | CHEME 546 | Software engineering for molecular data scientists | None | |
| Data management | BIOSTAT 545 | Biostatistical methods for big omics data | BIOST 511-12 or equivalent | |
| Data management | CSE 414 | Introduction to database systems | CSE 143 or CSE 163 | |
| Data management | CSE 544 | Principles of database management systems | None | |
| Machine learning | BIOSTAT 546 | Machine learning for biomedical and public health data | BIOST 511-12 or equivalent | |
| Machine learning | CSE 416 / STAT 416 | Introduction to machine learning | (CSE 143 or CSE 160) and (STAT 311 or STAT 390) | |
| Machine learning | STAT 435 | Introduction to statistical machine learning | STAT 341, 390, or 391 | X |
| Machine learning | CSE 546 | Machine learning | CSE 312, STAT 341, or STAT 391 | X |
| Visualization | CSE 442 | Data visualization | CSE 332 | |
| Visualization | CSE 412 | Introduction to data visualization | CSE 143 or CSE 163 | |
| Visualization | CSE 512 | Data visualization | None | X |
| Visualization | IMT 561 | Visualization design | None | |
| Visualization | IMT 562 | Interactive information visualization | None | |
| Visualization | HCDE 511 | Information visualization / data visualization and exploratory analytics | None | |
| Visualization | HCDE 411 | Information visualization | HCDE 308 and 310 | |

Advanced Data Science option:
Students who choose to follow the Advanced Data Science option of the Genome Sciences Ph.D. program should follow the regular Genome Sciences course sequence but also include the following course requirements:
1. Instead of Genome 560: Statistics for Genome Sciences (typically offered Spring Quarter), students enrolled in the Advanced Data Science option should take Statistics 509: Introduction to Mathematical Statistics. Statistics 509 was most recently offered during Autumn Quarter, but you should check the Department of Statistics website or the UW Time Schedule to see when it will next be offered. Please note that this course requires significant use of calculus. If you have not taken calculus in several years, consider taking a refresher course beforehand, and be sure to look at the review resources: https://www.stat.washington.edu/tsr/509review/
Alternatively, for a more advanced approach, students may choose to take Statistics 512: Statistical Inference. In this case, students may wish to consider also taking Statistics 513, the second course in this sequence.
2. Genome 540: Computational Molecular Biology (typically offered Winter Quarter each year)
3. Electives:
Students must take one course in each of two of the following three areas:
Data Management: CSE 544
Machine Learning: CSE 546 or STAT 535
Data Visualization: CSE 512
4. Additionally, to further expand students’ education and create a campus-wide community, students will register for at least four quarters of the weekly eScience Community Seminar.
Please check the UW Time Schedule or the Department of Statistics and Department of Computer Science & Engineering websites for information on when these electives are offered.
Frequently Asked Questions:
Do I need to complete this coursework during my first year?
No. You are welcome to enroll in and complete the course sequence at any time during your graduate studies. A good time to enroll might be at the end of year one, once you have selected a thesis lab, although you may end up completing some of the required courses (for example, Genome 540) during your first year.
How do I apply?
Simply obtain your thesis advisor's permission and then contact Brian Giebel (bgiebel [ a t ] uw.edu) to let him know you are planning to follow this option. Once you have completed all coursework, contact Brian to let him know which courses you have taken to fulfill requirements, so that he may get this option added to your transcript.
Which is the right option for me – Data Science or Advanced Data Science?
Please contact Bill Noble for advice on which option might be best for you.
Which courses should I take as prereqs in preparation for enrolling in this program?
Please contact Bill Noble for suggested courses.