Data Science Options
Genome Sciences offers both a Data Science option and an Advanced Data Science option. The two options have very similar structures. However, the Advanced Data Science option, as the name implies, is designed for students with considerable background in computer science, whereas the courses associated with the Data Science option are less demanding. Advanced Data Science is oriented toward tool builders, whereas Data Science is oriented toward tool users. Both are official UW degree options that appear on the student's transcript.
Introduction:
The Data Science options aim to educate the next generation of thought leaders who will both build and apply new methods for data science. These options will help to educate and recognize PhD students whose thesis work focuses on building and using data science tools. The goal of these options is not to educate all students in the foundations of data science, but rather to provide advanced education to the students who will advance the state of the art in data science methods in their respective domains.
Students enrolled in either option can expect to interact with students enrolled in similar Data Science PhD options in Computer Science, Statistics, Oceanography, Chemical Engineering and Astronomy. In addition, the options are designed to complement the activities of the eScience Institute and to leverage ongoing activities associated with the Moore/Sloan Foundation Data Driven Discovery Initiative, involving the University of Washington, New York University and the University of California, Berkeley.
Advising:
Students with an interest in the Advanced Data Science option but only limited experience in this area should take preparatory coursework before attempting the ADS courses. Please contact Bill Noble for suggested courses.
Admission:
Genome Sciences students who choose to enroll in either Data Science option must have approval of their thesis advisor and should then let Brian Giebel (bgiebel [ a t ] uw.edu) know they are planning to follow this option. There is no additional admission procedure. Once you have completed all requirements for either option, please contact Brian so that he may have this option added to your transcript.
Faculty:
Any Genome Sciences faculty member may serve as advisor to students enrolled in either Data Science option, although the student’s committee must include at least one of the following faculty members: David Baker, Trevor Bedford, Brian Beliveau, Jesse Bloom, Gavin Ha, Kelley Harris, Gail Jarvik, Su-In Lee, Erick Matsen, Sara Mostafavi, Bill Noble, or Cole Trapnell.
Course Sequence:
Data Science option:
The structure of the Data Science option is similar to that of the Advanced Data Science option, except that students can select from a wider variety of courses, including introductory courses in each topic area with no prerequisites. Note that all courses that count toward the Advanced Data Science option may instead be applied to the Data Science option. Also, only two quarters of the eScience Community Seminar are required, rather than four.
Requirements:
- One statistics course (or pair of courses) from the table below. Note that several of the listed courses are two-quarter series that cover topics similar to those in GENOME 560, but in greater depth. Students who opt for one of these series must complete both quarters to satisfy the Data Science option requirement.
- Genome 540: Computational Molecular Biology I
- One course from each of two of the following three areas, selected from the table below:
  - Data Management
  - Machine Learning
  - Data Visualization
- At least two quarters in the weekly UW Data Science Seminar series.
| Area | Course number | Course title | Prerequisites | Advanced |
|---|---|---|---|---|
| Statistics | GENOME 560 | Introduction to statistical genomics | None | |
| Statistics | BIOST 511-512 | Medical biometry I & II | None | |
| Statistics | BIOST 517-518 | Applied biostatistics I & II | None | |
| Statistics | STAT 509 | Introduction to mathematical statistics | STAT 311 and (MATH 308 or 309) | X |
| Statistics | STAT 512-513 | Statistical inference | STAT 395 and (STAT 421, 423, 504, or BIOST 512) | X |
| Data management | BIOST 544 | Introduction to data science | BIOST 511 or equivalent | |
| Data management | CSE 583 | Software development for data scientists | None | |
| Data management | CHEME 546 | Software engineering for molecular data scientists | None | |
| Data management | BIOST 545 | Biostatistical methods for big omics data | BIOST 511-512 or equivalent | |
| Data management | CSE 414 | Introduction to database systems | CSE 143 or CSE 163 | |
| Data management | CSE 544 | Principles of database management systems | None | |
| Data management | GENOME 569 | Bioinformatics workflows for high-throughput sequencing experiments | None | |
| Machine learning | BIOST 546 | Machine learning for biomedical and public health data | BIOST 511-512 or equivalent | |
| Machine learning | CSE 416 / STAT 416 | Introduction to machine learning | (CSE 143 or CSE 160) and (STAT 311 or STAT 390) | |
| Machine learning | STAT 435 | Introduction to statistical machine learning | STAT 341, 390, or 391 | X |
| Machine learning | CSE 546 | Machine learning | CSE 312, STAT 341, or STAT 391 | X |
| Visualization | CSE 442 | Data visualization | CSE 332 | |
| Visualization | CSE 412 | Introduction to data visualization | CSE 143 or CSE 163 | |
| Visualization | CSE 512 | Data visualization | None | X |
| Visualization | IMT 561 | Visualization design | None | |
| Visualization | IMT 562 | Interactive information visualization | None | |
| Visualization | HCDE 511 | Information visualization / data visualization and exploratory analytics | None | |
| Visualization | HCDE 411 | Information visualization | HCDE 308 and 310 | |
Advanced Data Science option:
Students who choose to follow the Advanced Data Science option of the Genome Sciences PhD program should follow the regular Genome Sciences course sequence, but also include the following course requirements:
1. Instead of Genome 560: Statistics for Genome Sciences (typically offered Spring Quarter), students enrolled in the Advanced Data Science option should take Statistics 509: Introduction to Mathematical Statistics. Statistics 509 was most recently offered during Autumn Quarter, but check the Department of Statistics website or the UW Time Schedule to see when it will next be offered. Please note that this course makes substantial use of calculus. If you have not taken calculus in several years, you may want to take a refresher course beforehand, and you should review the resources at https://www.stat.washington.edu/tsr/509review/
Alternatively, for a more advanced approach, students may take Statistics 512: Statistical Inference. In this case, students may also wish to take Statistics 513, the second course in this sequence.
2. Genome 540: Computational Molecular Biology (typically offered Winter Quarter each year)
3. Electives:
Students must take one course in each of two of the following three areas:
- Data Management: CSE 544
- Machine Learning: CSE 546 or STAT 535
- Data Visualization: CSE 512
4. Additionally, to further broaden students' education and build a campus-wide community, students must register for at least four quarters of the weekly eScience Community Seminar.
Please check the UW Time Schedule or the Department of Statistics and Department of Computer Science & Engineering websites for information on when these electives are offered.
Frequently Asked Questions:
Do I need to complete this coursework during my first year?
No. You are welcome to enroll in and complete the course sequence at any time during your graduate studies. A good time to enroll might be at the end of year one, once you have selected a thesis lab, although you may end up completing some of the required courses (for example, Genome 540) during your first year.
How do I apply?
Simply obtain your thesis advisor's permission and then contact Brian Giebel (bgiebel [ a t ] uw.edu) to let him know you are planning to follow this option. Once you have completed all coursework, contact Brian to let him know which courses you have taken to fulfill requirements, so that he may get this option added to your transcript.
Which is the right option for me – Data Science or Advanced Data Science?
Please contact Bill Noble for advice on which option might be best for you.
Which courses should I take as prerequisites in preparation for enrolling in these options?
Please contact Bill Noble for suggested courses.