Data Science Options

Genome Sciences offers both a Data Science option and an Advanced Data Science option. The two options have very similar structures. However, the Advanced Data Science option, as the name implies, is designed for students with considerable background in computer science, whereas the courses associated with the Data Science option are less demanding. Advanced Data Science is oriented toward tool builders, whereas Data Science is oriented toward tool users. Both are official UW degree options that appear on the student’s transcript.

Introduction:

The Data Science options aim to educate the next generation of thought leaders who will both build and apply new methods for data science. These options will help to educate and recognize PhD students whose thesis work focuses on building and using data science tools. The goal of these options is not to educate all students in the foundations of data science but rather to provide advanced education to the students who will push the state of the art in data science methods in their respective domains.

Students enrolled in either option can expect to interact with students enrolled in similar Data Science PhD options in Computer Science, Statistics, Oceanography, Chemical Engineering and Astronomy. In addition, the options are designed to complement the activities of the eScience Institute and to leverage ongoing activities associated with the Moore/Sloan Foundation Data Driven Discovery Initiative, involving the University of Washington, New York University and the University of California, Berkeley.

Advising:

Students with an interest in the Advanced Data Science (ADS) option but only limited experience in this area should take preparatory coursework before attempting the ADS courses. Please contact Bill Noble for suggested courses.

Admission:

Genome Sciences students who choose to enroll in either Data Science option must have the approval of their thesis advisor and should then let Brian Giebel (bgiebel [ a t ] uw.edu) know which option they plan to follow. There is no additional admission procedure. Once you have completed all requirements for your option, please contact Brian so that he can have it added to your transcript.

Faculty:

Any Genome Sciences faculty member may serve as advisor to students enrolled in either Data Science option, although the student’s committee must include at least one of the following faculty members: Tony Blau, Jesse Bloom, Elhanan Borenstein, Phil Green, Kelley Harris, Gail Jarvik, Su-In Lee, Bill Noble, or Larry Ruzzo.

Training Grant:

For information about the new Big Data in Genomics and Neuroscience Training Grant, please see the BDGN website.

Course Sequence:

Data Science option:

The structure of the Data Science option is similar to that of the Advanced Data Science option, except that students can select from a wider variety of courses, including introductory courses in each topic area with no prerequisites. Note that all courses that count toward the Advanced Data Science option may instead be applied to the Data Science option. Also, only two quarters of the eScience Community Seminar are required, rather than four.

  1. One statistics course (or pair of courses) from the list below. Note that several of the courses are two-quarter series, which cover topics similar to those in GENOME 560 but in greater depth. Students who opt to take one of these series must complete both quarters to satisfy the Data Science option requirement.

  2. Genome 540: Computational Molecular Biology I

  3. One course from each of two of the following three areas, selected from the table below:

    • Data Management

    • Machine Learning

    • Data Visualization

  4. At least two quarters of the weekly eScience Community Seminar.

 

| Area | Course number | Course title | Prerequisites | Adv |
|---|---|---|---|---|
| Statistics | GENOME 560 | Introduction to statistical genomics | None | |
| Statistics | BIOSTAT 511-512 | Medical biometry I & II | None | |
| Statistics | BIOSTAT 517-518 | Applied biostatistics I & II | None | |
| Statistics | STAT 509 | Introduction to mathematical statistics | STAT 311 and (MATH 308 or 309) | X |
| Statistics | STAT 512-513 | Statistical inference | STAT 395 and (STAT 421, 423, 504, or BIOST 512) | X |
| Data management | BIOSTAT 544 | Introduction to data science | BIOSTAT 511 or equivalent | |
| Data management | CSE 583 | Software development for data scientists | None | |
| Data management | CHEME 546 | Software engineering for molecular data scientists | None | |
| Data management | BIOSTAT 545 | Biostatistical methods for big omics data | BIOST 511-12 or equivalent | |
| Data management | CSE 414 | Introduction to database systems | CSE 143 or CSE 163 | |
| Data management | CSE 544 | Principles of database management systems | None | |
| Machine learning | BIOSTAT 546 | Machine learning for biomedical and public health data | BIOST 511-12 or equivalent | |
| Machine learning | CSE 416 / STAT 416 | Introduction to machine learning | (CSE 143 or CSE 160) and (STAT 311 or STAT 390) | |
| Machine learning | STAT 435 | Introduction to statistical machine learning | STAT 341, 390, or 391 | X |
| Machine learning | CSE 546 | Machine learning | CSE 312, STAT 341, or STAT 391 | X |
| Visualization | CSE 442 | Data visualization | CSE 332 | |
| Visualization | CSE 412 | Introduction to data visualization | CSE 143 or CSE 163 | |
| Visualization | CSE 512 | Data visualization | None | X |
| Visualization | INFX 561 | Visualization design | None | |
| Visualization | INFX 562 | Interactive information visualization | None | |
| Visualization | HCDE 511 | Information visualization / data visualization and exploratory analytics | None | |
| Visualization | HCDE 411 | Information visualization | HCDE 308 and 310 | |

 

Advanced Data Science option:

Students who choose to follow the Advanced Data Science option of the Genome Sciences Ph.D. program should follow the regular Genome Sciences course sequence but also include the following course requirements:

1. Instead of Genome 560: Statistics for Genome Sciences (typically offered Spring Quarter), students enrolled in the Advanced Data Science option should take Statistics 509: Introduction to Mathematical Statistics. Statistics 509 was most recently offered during Autumn Quarter, but you should check the Department of Statistics website or the UW Time Schedule to see when it will next be offered. Please note that this course requires significant use of calculus. If you have not taken calculus in several years, consider taking a refresher course beforehand.

Alternatively, for a more advanced approach, students may choose to take Statistics 512: Statistical Inference. In this case, students may wish to consider also taking Statistics 513, the second course in this sequence.

2. Genome 540: Computational Molecular Biology (typically offered Winter Quarter each year)

3. Electives:
Students must take one course in each of two of the following three areas:

Data Management: CSE 544.
Machine Learning: CSE 546 or STAT 535.
Data Visualization: CSE 512.

4. Additionally, to further expand students’ education and create a campus-wide community, students must register for at least four quarters of the weekly eScience Community Seminar.

Please check the UW Time Schedule or the Department of Statistics and Department of Computer Science & Engineering websites for information on when these electives are offered.
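For students who want to track their own progress, the four Advanced Data Science requirements above can be written as a short checklist. The following is only an illustrative sketch, not an official degree audit: the course codes are copied from the list above, while the function name, the data layout, and the wording of the messages are hypothetical.

```python
# Illustrative sketch only; not an official audit tool.
# Course codes come from the requirement list above. All names are hypothetical.
ADS_STATISTICS = {"STAT 509", "STAT 512"}
ADS_CORE = "GENOME 540"
ADS_ELECTIVE_AREAS = {
    "Data Management": {"CSE 544"},
    "Machine Learning": {"CSE 546", "STAT 535"},
    "Data Visualization": {"CSE 512"},
}
ADS_ELECTIVE_AREAS_REQUIRED = 2
ADS_SEMINAR_QUARTERS = 4


def missing_ads_requirements(completed_courses, seminar_quarters):
    """Return a list describing each unmet Advanced Data Science requirement."""
    completed = set(completed_courses)
    missing = []

    # Requirement 1: STAT 509, or STAT 512 as the more advanced alternative.
    if not completed & ADS_STATISTICS:
        missing.append("statistics: STAT 509 or STAT 512")

    # Requirement 2: Genome 540.
    if ADS_CORE not in completed:
        missing.append("core: GENOME 540")

    # Requirement 3: two courses in the same area cover that area only once.
    covered = [area for area, courses in ADS_ELECTIVE_AREAS.items()
               if completed & courses]
    shortfall = ADS_ELECTIVE_AREAS_REQUIRED - len(covered)
    if shortfall > 0:
        missing.append(f"electives: {shortfall} more area(s)")

    # Requirement 4: quarters of the eScience Community Seminar.
    if seminar_quarters < ADS_SEMINAR_QUARTERS:
        remaining = ADS_SEMINAR_QUARTERS - seminar_quarters
        missing.append(f"seminar: {remaining} more quarter(s)")

    return missing
```

An empty list means every requirement in this sketch is met. Your thesis advisor's approval and the transcript request to Brian Giebel are still handled separately.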

 

Frequently Asked Questions:

Do I need to complete this coursework during my first year?

No. You are welcome to enroll and complete the course sequence at any time during your graduate studies. A good time to enroll might be at the end of year one, once you have selected a thesis lab, although you may end up completing some of the required courses (for example, Genome 540) during your first year.

How do I apply?

Simply obtain your thesis advisor's permission and then contact Brian Giebel (bgiebel [ a t ] uw.edu) to let him know you are planning to follow this option.

Which is the right option for me – Data Science or Advanced Data Science?

Please contact Bill Noble for advice on which option might be best for you.

Which courses should I take as prerequisites in preparation for enrolling in this program?

Please contact Bill Noble for suggested courses.