
Data Science


All courses, faculty listings, and curricular and degree requirements described herein are subject to change or deletion without notice.

Courses

For course descriptions not found in the UC San Diego General Catalog 2023–24, please contact the department for more information.

Lower Division

DSC 10. Principles of Data Science (4)

This first course in data science introduces students to data exploration, statistical inference, and prediction. It introduces the Python programming language as a tool for tabular data manipulation, visualization, and simulation. Through homework assignments and projects, students are given an opportunity to develop their analytical skills while working with real-world datasets from a variety of domains. Prerequisites: none.

DSC 20. Programming and Basic Data Structures for Data Science (4)

Provides an understanding of the structures that underlie the programs, algorithms, and languages used in data science by expanding the repertoire of computational concepts introduced in DSC 10 and exposing students to techniques of abstraction. Course will be taught in Python and will cover topics including recursion, higher-order functions, function composition, object-oriented programming, interpreters, classes, and simple data structures such as arrays, lists, and linked lists. Prerequisites: DSC 10. Restricted to students within the DS25 major. All other students will be allowed as space permits.

DSC 30. Data Structures and Algorithms for Data Science (4)

Builds on topics covered in DSC 20 and provides practical experience in composing larger computational systems through several significant programming projects using Java. Students will study advanced programming techniques including encapsulation, abstract data types, interfaces, algorithms and complexity, and data structures such as stacks, queues, priority queues, heaps, linked lists, binary trees, binary search trees, and hash tables. Prerequisites: DSC 20. Restricted to students within the DS25 major. All other students will be allowed as space permits.

DSC 40A. Theoretical Foundations of Data Science I (4)

The sequence DSC 40A-B introduces the theoretical foundations of data science. DSC 40A, the first course in the sequence, exposes students to the mathematical theory underlying fundamental topics in machine learning. Topics include empirical risk minimization, optimization, regression, classification, and discrete probability. Students practice creative problem-solving while learning how to rigorously justify and communicate mathematical ideas. Prerequisites: DSC 10, MATH 20C or MATH 31BH, and MATH 18 or MATH 20F or MATH 31AH. Restricted to students within the DS25 major. All other students will be allowed as space permits.

DSC 40B. Theoretical Foundations of Data Science II (4)

The sequence DSC 40A-B introduces the theoretical foundations of data science. DSC 40B, the second course in the sequence, covers the fundamentals of computer science with applications to data science. Topics include time complexity analysis, the analysis of recursive algorithms, graph theory, and graph search algorithms. Whereas other courses in the curriculum, such as DSC 20 and DSC 30, may touch on these topics briefly, this course aims to develop a deeper, theoretical understanding. DSC 40A-B connect to DSC 10, 20, and 30 by providing the theoretical foundation for the methods that underlie data science. Prerequisites: DSC 20 and 40A. Restricted to students within the DS25 major. All other students will be allowed as space permits.

DSC 80. The Practice and Application of Data Science (4)

The marriage of data, computation, and inferential thinking, or “data science,” is redefining how people and organizations solve challenging problems and understand the world. This course bridges lower- and upper-division data science courses as well as methods courses in other fields. Students master the data science life-cycle and learn many of the fundamental principles and techniques of data science spanning algorithms, statistics, machine learning, visualization, and data systems. Prerequisites: DSC 30 and DSC 40A. Restricted to students within the DS25 major. All other students will be allowed as space permits.

DSC 90. Seminar in Data Science (2)

Students will learn about a variety of topics in data science through interactive presentations from faculty and industry professionals. May be taken for credit up to four times. Prerequisites: DSC 10. Restricted to students within the DS25 major. All other students will be allowed as space permits.

DSC 95. Tutor Apprenticeship in Data Science (2)

Students will receive training in skills and techniques necessary to be effective tutors for data science courses. Students will also gain practical experience in tutoring students on data science topics. Prerequisites: DSC 10. Open only to students who have applied for and been accepted as first-time tutors for a DSC course; these students are required to enroll in DSC 95.

DSC 96. Workshop in Data Science (2)

Students will explore topics and tools relevant to the practice of data science in a workshop format. The instructor works with students on guided projects to help students acquire knowledge and skills to complement their course work in the core data science classes. Topics vary from quarter to quarter. Students are strongly encouraged to enroll concurrently in either DSC 10 or DSC 20. Prerequisites: none. Restricted to students within the DS25 major. All other students will be allowed as space permits.

DSC 97. Internship in Data Science (2 or 4)

Individual research on a topic related to data science, by special arrangement with and under the direction of a UC San Diego faculty member, in connection with an internship at an organization. It is the students’ responsibility to find an internship related to data science prior to enrolling in this course. Internship work will inform but not necessarily define the research topic. The research topic is expected to promote the study of the principles and techniques involved in the internship work. May be taken for credit up to two times. Prerequisites: MATH 20C or MATH 31BH and MATH 18 or MATH 31AH and DSC 20 and DSC 40A. Completion of thirty units at UC San Diego with a university GPA of 3.0. Special Studies form, consent of the instructor, and approval of the department required. Priority enrollment is given to data science (DS25) majors. Restricted to first-year and sophomore level students.

DSC 98. Directed Group Study in Data Science (2 or 4)

Students will investigate a topic in data science through directed reading, discussion, and project work under the supervision of a faculty member. May be taken for credit up to two times. Prerequisites: MATH 20C or MATH 31BH and MATH 18 or MATH 31AH and DSC 20 and DSC 40A. Completion of thirty units at UC San Diego with a university GPA of 3.0. Special Studies form, consent of the instructor, and approval of the department required. Priority enrollment is given to data science (DS25) majors. Restricted to first-year and sophomore level students.

DSC 99. Independent Study in Data Science (2 or 4)

Students will participate in independent study or research in data science under the direction of a UC San Diego faculty member. May be taken for credit up to two times. Prerequisites: MATH 20C or MATH 31BH and MATH 18 or MATH 31AH and DSC 20 and DSC 40A. Completion of thirty units at UC San Diego with a university GPA of 3.0. Special Studies form, consent of the instructor, and approval of the department required. Priority enrollment is given to data science (DS25) majors. Restricted to first-year and sophomore level students.

Upper Division

DSC 100. Introduction to Data Management (4)

This course is an introduction to storage and management of large-scale data using classical relational (SQL) systems, with an eye toward applications in data science. The course covers topics including the SQL data model and query language, relational data modeling and schema design, elements of cost-based query optimization, relational database architecture, and database-backed applications. Prerequisites: DSC 40B and DSC 80. Restricted to students with upper-division standing. Restricted to students within the DS25 major. All other students will be allowed as space permits.

DSC 102. Systems for Scalable Analytics (4)

This course introduces the principles of computing systems and infrastructure for scaling analytics to large datasets. Topics include memory hierarchy, distributed systems, model selection, heterogeneous datasets, and deployment at scale. The course will also discuss the design of systems such as MapReduce/Hadoop and Spark, in conjunction with their implementation. Students will also learn how dataflow operations can be used to perform data preparation, cleaning, and feature engineering. Prerequisites: DSC 100. Restricted to students with upper-division standing. Restricted to students within the DS25 major. All other students will be allowed as space permits.

DSC 104. Beyond Relational Data Management (4)

The course will introduce a variety of NoSQL data formats, data models, high-level query languages, and programming abstractions representative of the needs of modern data analytic tasks. Topics include hierarchical graph database systems, unrestricted graph database systems, array databases, comparison of the expressive power of the data models, and parallel programming abstractions, including MapReduce and its descendants. Prerequisites: DSC 100. Restricted to students with upper-division standing. Restricted to students within the DS25 major. All other students will be allowed as space permits.

DSC 106. Introduction to Data Visualization (4)

Data visualization helps explore and interpret data through interaction. This course introduces the principles, techniques, and algorithms for creating effective visualizations. The course draws on the knowledge from several disciplines including computer graphics, human-computer interaction, cognitive psychology, design, and statistical graphics and synthesizes relevant ideas. Students will design visualization systems using D3 or other web-based software and evaluate their effectiveness. Prerequisites: DSC 80. Restricted to students with upper-division standing. Restricted to students within the DS25 major. All other students will be allowed as space permits.

DSC 120. Signal Processing for Data Analysis (4)

This course will focus on ideas from classical and modern signal processing, with the main themes of sampling continuous data and building informative representations using orthonormal bases, frames, and data-dependent operators. Topics include sampling theory, Fourier analysis, lossy transformations and compression, time and spatial filters, and random Fourier features and connections to kernel methods. Sources of data include time series and streaming signals and various imaging modalities. Prerequisites: MATH 18 or MATH 31AH and MATH 20C and DSC 40B. Restricted to students with upper-division standing. Restricted to students within the DS25 major. All other students will be allowed as space permits.

DSC 140A. Probabilistic Modeling and Machine Learning (4)

The course covers learning and using probabilistic models for knowledge representation and decision-making. Topics covered include graphical models, temporal models, and online learning, as well as applications to natural language processing, adversarial learning, computational biology, and robotics. Prior completion of MATH 181A is strongly recommended. Prerequisites: DSC 80 and ECE 109 or ECON 120A or MAE 108 or MATH 180A or MATH 183 or MATH 186. Restricted to students with upper-division standing. Restricted to students within the DS25 major. All other students will be allowed as space permits.

DSC 140B. Representation Learning (4)

This course is an introduction to machine learning that explores techniques for learning suitable representations from data. Topics include clustering, dimensionality reduction, manifold learning, principal component analysis, spectral embeddings, multilayer perceptrons, autoencoders, convolutional and recurrent neural networks, and other aspects of deep learning. The course focuses on the underlying mathematical principles, but some attention is also given to implementation. Prerequisites: DSC 80 and ECE 109 or ECON 120A or MAE 108 or MATH 180A or MATH 183 or MATH 186. Restricted to students within the DS25 major. All other students will be allowed as space permits.

DSC 148. Introduction to Data Mining (4)

This course introduces current methods and models that are useful in analyzing and mining real-world data. It will cover frequent pattern mining, regression and classification, clustering, and representation learning. All participants should be comfortable with programming and with basic optimization and linear algebra. Prerequisites: (DSC 40B or CSE 12) and (DSC 80 or CSE 15L) and (MATH 180A or MATH 181A or MATH 183 or CSE 103 or ECE 109 or ECON 120A). Students may not receive credit for DSC 148 and either CSE 158 or CSE 158R. Restricted to students within the DS25 major. All other students will be allowed as space permits.

DSC 155. Hidden Data in Random Matrices (4)

Rigorous treatment of principal component analysis, one of the most effective methods in finding signals amidst the noise of large data arrays. Topics include singular value decomposition for matrices, maximum likelihood estimation, least squares methods, unbiased estimators, random matrices, Wigner’s semicircle law, Marchenko-Pastur laws, universality of eigenvalue statistics, outliers, the BBP transition, and applications to community detection and the stochastic block model. Prerequisites: MATH 180A and (MATH 18 or MATH 31AH). Students will not receive credit for both MATH 182 and DSC 155.

DSC 160. Data Science and the Arts (4)

This course addresses the intersection of data science and contemporary arts and culture, exploring four main themes of authorship, representation, visualization, and data provenance. The course is not solely an introduction to data science techniques nor merely an arts practice course, but explores significant new possibilities for both fields arising from their intersection. Students will examine problems from complementary perspectives of artist-researchers and data scientists. Prerequisites: DSC 80.

DSC 161. Text as Data (4)

This class explores statistical and computational methods to enable students to use text as a data source in the social sciences. Hands-on examples will equip students to work with text data in final projects. Prerequisites: ECON 5 or POLI 170A or POLI 171 or POLI 30 or POLI 30D or POLI 5 or POLI 5D. Students will not receive credit for both POLI 176 and DSC 161. Restricted to junior and senior level students. Restricted to students with upper-division standing.

DSC 167. Fairness and Algorithmic Decision-Making (4)

This course examines the broader context in which the practice of data science exists and explores concrete ways these issues surface in technical work. Students learn frameworks for understanding how individuals relate to social institutions, use these frameworks to identify how issues of fairness arise in the “life of a data scientist,” and apply them to propose and critique potential solutions. Prerequisites: DSC 80.

DSC 170. Spatial Data Science and Applications (4)

Spatial data science is a set of concepts and methods that deal with accessing, managing, visualizing, analyzing, and reasoning about spatial data in applications where location, shape and size of objects, and their mutual arrangement are important. This upper-division course explores advanced data science concepts for spatial data, introducing students to principles and techniques of spatial data analysis, including geographic information systems, spatial big data management, and geostatistics. Prerequisites: DSC 80. Restricted to students with upper-division standing.

DSC 180A. Data Science Project I (4)

In this two-course sequence students will investigate a topic and design a system to produce statistically informed output. The investigation will span the entire lifecycle, including assessing the problem, learning domain knowledge, collecting/cleaning data, creating a model, addressing ethical issues, designing the system, analyzing the output, and presenting the results. 180A deals with research, methodology, and system design. Students will produce a research summary and a project proposal. Prerequisites: DSC 102 and MATH 189 and CSE 151A or COGS 188 or CSE 158 or CSE 158R or DSC 148 and DSC 106. Restricted to students with upper-division standing. Restricted to students within the DS25 major.

DSC 180B. Data Science Project II (4)

In this two-course sequence students will investigate a topic and design a system to produce statistically informed output. The investigation will span the entire lifecycle, including assessing the problem, learning domain knowledge, collecting/cleaning data, creating a model, addressing ethical issues, designing the system, analyzing the output, and presenting the results. 180B will consist of implementing the project while studying the best practices for evaluation. Prerequisites: DSC 180A. Students may enroll only in the discussion section that corresponds to their DSC 180A discussion section. Restricted to students with upper-division standing. Restricted to students within the DS25 major.

DSC 190. Topics in Data Science (4)

Topics of special interest in data science. Topics vary from quarter to quarter. May be taken for credit up to four times when topic varies. Prerequisites: department and instructor approval required to monitor enrollment and to ensure that students have sufficient educational background for a given topic. Restricted to students with upper-division standing.

DSC 191. Seminar in Data Science (1 or 2)

A seminar course on topics of current interest in data science. Topics may vary from quarter to quarter. May be taken for credit up to three times. Prerequisites: restricted to students with upper-division standing. Department and instructor approval is required to monitor enrollment and to ensure that students have sufficient educational background for a given topic.

DSC 197. Data Science Internship (1–4)

Directed study and research at laboratories/institutions off campus. It is the students’ responsibility to find an internship related to data science prior to enrolling in this course. Prerequisites: restricted to students with upper-division standing. Consent of the instructor and approval of the department. An application for Special Studies must be filed with the Registrar’s Office after approval from the instructor and the department chair.

DSC 198. Directed Group Study in Data Science (2 or 4)

Data science topics whose study involves reading and discussion by a small group of students under supervision of a faculty member. May be taken for credit up to two times. Prerequisites: restricted to students with upper-division standing. Consent of the instructor and approval of the department. Department stamp required. An application for Special Studies must be filed with the Registrar’s Office after approval from the instructor and the department chair.

DSC 199. Independent Study for Data Science Undergraduates (2 or 4)

Independent reading or research on a topic related to data science by special arrangement with a faculty member. May be taken for credit up to two times. Prerequisites: restricted to students with upper-division standing. Consent of the instructor and approval of the department. An application for Special Studies must be filed with the Registrar’s Office after approval from the instructor and the department chair.

DSC 200. Data Science Programming (4)

Computing structures and programming concepts such as object orientation; data structures such as queues, heaps, lists, search trees, and hash tables. Laboratory skills include Jupyter notebooks, RESTful interfaces, and various software development kits (SDKs). Prerequisites: none. Restricted to students within the DS75 and DS76 majors.

DSC 202. Data Management for Data Science (4)

Principles of data management, relational data model, relational algebra, SQL for data science, NoSQL databases (document, key-value, graph, column-family), multidimensional data management (data warehousing, OLAP queries, OLAP cubes, visualizing multidimensional data). Prerequisites: none. Restricted to students within the DS75 and DS76 majors.

DSC 203. Data Visualization and Scalable Visual Analytics (4)

Commonly used algorithms and techniques in data visualization. Interactive reasoning and exploratory analysis through visual interfaces. Application of data visualization in various domains including science, engineering, and medicine. Scalable interactive methods for exploring and visualizing big data. Techniques to evaluate the effectiveness and interpretability of analytical products for diverse users to obtain insights in support of assessment, planning, and decision-making. Prerequisites: DSC 202. Restricted to students within the DS75 and DS76 majors.

DSC 204A. Scalable Data Systems (4)

Storage/memory hierarchy, distributed scalable computing (cluster, cloud, edge) principles. Big data storage, management, and processing at scale. Dataflow programming systems and programming models (MapReduce/Hadoop and Spark). Prerequisites: DSC 202. Restricted to students within the DS75 and DS76 majors.

DSC 205. Geometry of Data (4)

This course will cover graph-based data modeling, analysis, and representation. Topics include spectral graph theory, spectral clustering, kernel-based manifold learning, dimensionality reduction and visualization, multiway data analysis, graph signal processing, and graph neural networks. Prerequisites: DSC 210 or ECE 269 and DSC 212 and DSC 240. Restricted to students within the DS75 and DS76 majors.

DSC 206. Algorithms for Data Science (4)

This course studies the mathematical foundations of massive data processing, including the development and analysis of algorithms. We explore methods for sampling, sketching, and distributed processing of large-scale databases, clustering, dimensionality reduction, and methods of optimization for the purpose of scalable statistical description, querying, pattern mining, and learning from data. Prerequisites: DSC 212. Restricted to students within the DS75 and DS76 majors.

DSC 207R. Python for Data Science (4)

Essential tools for data science including the basic process of data science; Python and Jupyter notebooks; finding answers within large datasets by using Python to import data, explore it, analyze it, learn from it, visualize it, and generate easily shareable reports. An applied understanding of how to manipulate and analyze uncurated datasets; basic statistical analysis and machine learning methods; and how to effectively visualize results. This is a distance education course. Prerequisites: none. Restricted to major code DS77. All other students with graduate standing may be considered as space permits.

DSC 208R. Data Management for Analytics (4)

Principles, techniques, and tools for organizing, storing, querying, transforming, and using data for analytics and machine learning computations at scale; including the relational data model, relational algebra, database system features for faster querying, and non-relational data systems. Introduction to data quality issues, data cleansing, cluster and cloud computing, and transformation for feature engineering. Evaluation of analytics results including reasoning about bias and fairness. This is a distance education course. Prerequisites: DSC 207R. Restricted to major code DS77. All other students with graduate standing may be considered as space permits.

DSC 210. Numerical Linear Algebra (4)

Linear algebraic systems, least squares problems and regularization, orthogonalization methods, ill-conditioned problems, eigenvalue and singular value decomposition, principal component analysis, structured matrix factorization and fast algorithms, randomized linear algebra, the Johnson-Lindenstrauss (JL) lemma, and sparse approximations. Prerequisites: none. Restricted to students within the DS75 and DS76 majors.

DSC 211. Introduction to Optimization (4)

Continuity and differentiability of a function of several variables, gradient vector, Hessian matrices, Taylor approximation, fundamentals of optimization, Lagrange multipliers, convexity, gradient descent. Prerequisites: DSC 210. Restricted to students within the DS75 and DS76 majors.

DSC 212. Probability and Statistics for Data Science (4)

Probability, random variables, distributions, central limit theorem, maximum likelihood estimation, method of moments, confidence intervals, hypothesis testing, Bayesian estimation, introduction to simulation and the bootstrap. Prerequisites: none. Restricted to students within the DS75 and DS76 majors.

DSC 213. Statistics on Manifolds (4)

This is a graduate topics course covering statistics with manifold constraints. Topics include Fréchet means and variances, principal geodesic analysis, directional statistics, random fields on manifolds, statistical distances between distributions, transport problems, and information geometry. Manifold constraints will be considered on simplexes, spheres, Stiefel manifolds, stratified manifolds, the cone of positive definite matrices, trees, compositional data, and other relevant manifolds. Prerequisites: DSC 210 and DSC 212. Restricted to students within the DS75 and DS76 majors.

DSC 214. Topological Data Analysis (4)

Topology provides a powerful way to describe essential features of functions and spaces. In recent years, topological methods have attracted considerable attention for analyzing complex data. Fundamental developments have been made, and the resulting methods have been applied in many fields, e.g., graphics, visualization, neuroscience, and materials science. This course introduces the basic concepts and topological structures behind these developments, algorithms for them, and examples of applications. Prerequisites: none. Restricted to students within the DS75 and DS76 majors.

DSC 215. Statistical Thinking and Experimental Design (4)

The goal of this course is to enable students to evaluate any paper in data science, regardless of application area. Topics include experimental design; claims, evidence, and statistical significance; the replication crisis; falsifiability; philosophy of science; and the history of probability and statistics. This class will take the form of an open discussion based on provided reading materials. Prerequisites: none. Restricted to students within the DS75 and DS76 majors.

DSC 215R. Foundations of Probability and Statistics in Data Science (4)

Foundations of probability and statistics needed for data science, including mathematical theory and hands-on experience applying this theory to actual data using Jupyter notebooks. Random variables, dependence, correlation, regression, PCA, entropy, and minimum description length (MDL). This course is a distance education course. Prerequisites: none. Restricted to major code DS77. All other students with graduate standing may be considered as space permits.

DSC 231. Embedded Sensing, IOT Data Models, and Methods (4)

Sensory data and control are mediated by devices near the edge of sensor networks, referred to as IOT (Internet of Things) devices. Components of IOT platforms: signal processing, communications/networking, control, real-time operating systems. Interfaces to the cloud computing stack, publish-subscribe protocols such as MQTT, embedded software/middleware components, metadata schema, metadata normalization methods, and selected CPS (cyber-physical system) applications. Prerequisites: none. The class is designed for electronics enthusiasts who can quickly learn new embedded and sensor network devices. Restricted to students within the DS75 and DS76 majors.

DSC 232R. Big Data Analytics Using Spark (4)

This course covers techniques for achieving scalability in data analysis, using tools such as MapReduce, Hadoop, and Spark. Topics include programming Spark using PySpark; identifying the computational tradeoffs in a Spark application; performing data loading and cleaning using Spark and Parquet; modeling data through statistical and machine learning methods; and mitigating bottlenecks that arise in massively parallel computations by using the Spark framework. This is a distance education course. Prerequisites: DSC 255R. Restricted to major code DS77. All other students with graduate standing may be considered as space permits.

DSC 240. Machine Learning (4)

A graduate-level course in machine learning algorithms: decision trees, principal component analysis, k-means, clustering, logistic regression, random forests, boosting, neural networks, kernel methods, deep learning. Prerequisites: DSC 210 and DSC 212. Restricted to students within the DS75 and DS76 majors.

DSC 241. Statistical Models (4)

Linear/nonlinear models, generalized linear models, model fitting and model selection (cross-validation, knockoffs, etc.), regularization and penalization (ridge regression, lasso, etc.), robust methods, nonparametric regression, conformal prediction, causal inference. Prerequisites: DSC 210 and DSC 212. Restricted to students within the DS75 and DS76 majors.

DSC 242. High-Dimensional Probability and Statistics (4)

Concentration inequalities, Markov processes and ergodicity, martingale inequalities, empirical processes, sparse linear models in high dimensions, principal component analysis in high dimensions, estimation of large covariance matrices. Prerequisites: none. Restricted to DS75 and DS76 students.

DSC 243. Advanced Optimization (4)

Linear/quadratic programming, optimization under constraints, gradient descent (deterministic and stochastic), convergence rate of gradient descent, acceleration phenomena in convex optimization, stochastic optimization with large data sets, complexity lower bounds for convex optimization. Prerequisites: DSC 211 and DSC 212. Restricted to DS75 and DS76 students.

DSC 244. Large-Scale Statistical Analysis (4)

Exploratory data analysis, diagnostics, bootstrap, large-scale (multiple) hypothesis testing, false discovery rate, empirical Bayes methods. Prerequisites: DSC 210 and DSC 212 and DSC 241. Restricted to DS75 and DS76 students.

DSC 245. Introduction to Causal Inference (4)

Causal versus predictive inference, potential outcomes and randomized experiments (A/B testing), structural causal models (interventions, counterfactuals, causal diagram, do-operator, d-separation), causal structure learning, identification of causal effect, estimation of causal effect, causal discovery and inference in presence of distribution shifts, selection bias, hidden confounders, cycles, nonlinear causal mechanisms, missing values, causal representation learning. Prerequisites: DSC 212 and DSC 240. Restricted to DS75 and DS76 students.

DSC 250. Advanced Data Mining (4)

Graph mining and basic text analysis (including keyphrase extraction and generation), set expansion and taxonomy construction, graph representation learning, graph convolutional neural networks, heterogeneous information networks, label propagation, and truth finding. Prerequisites: none. Restricted to DS75 and DS76 students.

DSC 251. Machine Learning in Control (4)

Estimation of stability and uncertainty, optimal control, and sequential decision making. Prerequisites: DSC 211 and DSC 240. Restricted to DS75 and DS76 students.

DSC 252. Statistical Natural Language Processing (4)

An in-depth study of the classical NLP pipeline: tokenization, stemming, lemmatization, part-of-speech tagging, named entity recognition, parsing, and machine translation. Finite-state transducers, context-free grammars, hidden Markov models (HMMs), and conditional random fields (CRFs) will be covered in detail. Prerequisites: none. Restricted to DS75 and DS76 students.

DSC 253. Advanced Data-Driven Text Mining (4)

Unsupervised, weakly supervised, and distantly supervised methods for text mining problems, including information retrieval, open-domain information extraction, text summarization (both extractive and generative), and knowledge graph construction. Bootstrapping, comparative analysis, learning from seed words and existing knowledge bases will be the key methodologies. Prerequisites: none. Restricted to DS75 and DS76 students.

DSC 254. Statistical Signal and Image Analysis (4)

A graduate-level course on signal and image analysis spanning three main themes. Statistical signal processing: random processes, stochasticity, stationarity, Wiener filter, Kalman filter, matched filter; signal processing: time-frequency representations, wavelets, signal processing with sparse representations (dictionary learning); image processing: registration, image degradation and restoration (noise models and denoising), image pyramids, random fields. Prerequisites: DSC 210 or ECE 269 and DSC 212. Background in signal processing from undergraduate or other graduate courses must be verified. Restricted to DS75 and DS76 students.

DSC 255R. Machine Learning Fundamentals (4)

Supervised and unsupervised learning algorithms, and the theory behind those algorithms. Topics, covered through case studies, include classification, regression, and conditional probability estimation; generative and discriminative models; linear models and their nonlinear extensions using kernel methods; ensemble methods: boosting, bagging, random forests; representation learning: clustering, dimensionality reduction, auto-encoders, deep neural networks. This is a distance education course. Prerequisites: DSC 215R. Restricted to major code DS77. All other students with graduate standing may be considered as space permits.

DSC 256R. Data Mining on the Web (4)

Application of machine learning and data mining techniques, including recommender systems and predictive analytics. Building models to understand data in order to gain insights and make predictions. Covered topics include regression; classification; unsupervised learning and dimensionality reduction; recommender systems; text mining; social network analysis; data visualization; and online advertising. This is a distance education course. Prerequisites: DSC 215R. Restricted to major code DS77. All other students with graduate standing may be considered as space permits.

DSC 257R. Unsupervised Learning (4)

Broad view of the field of unsupervised learning, in particular its most common methods and use cases. Topics include descriptive statistics; clustering; projection, singular value decomposition, and spectral embedding; common probability distributions; density estimation; graphical models and latent variable modeling; sparse coding and dictionary learning; autoencoders, shallow and deep; and self-supervised learning. Prerequisites: DSC 255R. Restricted to major code DS77. All other students with graduate standing may be considered as space permits.

DSC 258R. Natural Language Processing (4)

Introduction to classical methods that are useful in analyzing and mining real-world text data, including tokenization, stemming and lemmatization, bag-of-words classification, word embedding, language models, sentiment analysis, part-of-speech tagging, named entity recognition, and sequence-to-sequence models. Consideration of possible biases, privacy, and societal implications in these models. This is a distance education course. Prerequisites: DSC 255R. Restricted to major code DS77. All other students with graduate standing may be considered as space permits.

DSC 260. Data Science, Ethics, and Society (4)

This course will consider foundational concepts including power, justice, bias, privacy, and explainability; societal practices including delegation, organizational incentives, and accountability; and governance mechanisms including law, regulation, and norms. Prerequisites: none. Restricted to DS75 and DS76 students.

DSC 261. Responsible Data Science (4)

Data science lifecycle, data cleaning and quality management, data profiling, causal inference, algorithmic fairness (fairness definitions, impossibility results, causal fairness, building fair ML models, fairness beyond classification), algorithmic transparency (interpretability versus explainability, auditing black-box algorithms, algorithmic recourse). Prerequisites: DSC 210 and DSC 240. Restricted to DS75 and DS76 students.

DSC 267R. Data Fairness and Ethics (4)

Examination of the inevitable ethical questions and issues that arise in all stages of data science, including issues of privacy, bias, trust, and more. Conceptual and mathematical tools that can be used both to recognize and address these ethical issues as they arise in real-world practice. This is a distance education course. Prerequisites: DSC 208R and DSC 255R. Restricted to major code DS77. All other students with graduate standing may be considered as space permits.

DSC 290. Seminar in Data Science (1 or 2)

A graduate seminar course in which topics of special interest in data science will be presented by faculty or by graduate students under faculty direction. Restricted to graduate level students. May be repeated for credit twenty-four times as topics vary. Prerequisites: none.

DSC 291. Topics in Data Science (4)

Topics of special interest in data science. Topics may vary quarter to quarter. Restricted to graduate level students. Prerequisites: none.

DSC 293. Faculty Research Seminar (1)

Weekly faculty research seminar. Individual HDSI colloquia and distinguished lecturers may be included at the discretion of the instructor. May be taken for credit up to twenty-four times. Prerequisites: none.

DSC 294. Research Rotation (4)

Research rotations provide the opportunity for first-year PhD students to obtain research experience in data analysis under the guidance of HDSI-affiliated faculty members. Through the rotations, students can identify a faculty member under whose supervision their dissertation research will be completed. Each research rotation lasts one quarter and is supervised by an HDSI faculty member; students are required to complete three separate rotations. Each student is required to take the course three times with at least two different instructors. Prerequisites: none. Only first-year PhD students are eligible.

DSC 295. Academia Survival Skills (1)

Basic skills necessary to succeed as a researcher in data science including scripting, cloud computing skills, fellowship proposal preparation, CV preparation, writing reviews, preparing posters, etc. Prerequisites: none.

DSC 298R. Graduate Capstone in Data Science (4)

Following the life cycle of a data science project, students apply advanced data science knowledge and techniques to a specific domain: cleaning and structuring data for hypothesis generation and data analysis; creating a scalable big data pipeline from data ingestion and exploration to modeling and evaluation; building machine learning models and evaluating insights; communicating findings through visualizations and reports; and fully considering ethical implications throughout. Prerequisites: DSC 208R, DSC 255R, and DSC 256R. Restricted to students within the DS77 degree program. All other students with graduate standing may be considered as space permits.

DSC 299. Graduate Research (1–16)

Graduate research. May be taken for credit up to twenty-four times. Prerequisites: none.

DSC 500. Teaching Assistantship (2 or 4)

A course in which teaching assistants are aided in learning proper teaching methods by means of supervision of their work by the faculty: handling of discussions, preparation and grading of examinations and other written exercises, and student relations. Number of units for credit depends on number of hours devoted to class or section assistance. May be taken for credit up to seventy-two units. Prerequisites: DSC 500 is for selected teaching assistants only; therefore, consent of instructor is required.

DSC 599. Teaching Methods in Data Science (2)

Training in teaching methods in the field of data science. This course examines theoretical and practical communication and teaching techniques particularly appropriate to data science. Prerequisites: Consent of faculty required. Only graduate students who are TAing for the first time in the data science program are eligible to enroll.