Graduate

Data Science Courses 2017-18

A list of Data Science online courses offered in current/upcoming semesters is available through the tabs below.

Follow this link to register for classes.

Note: Unless otherwise specified, all courses listed are worth 3 credit hours.

Computer Science

Applied Algorithms
Class: CSCI B505
Section: 33026 (online), 30571
Syllabus: View document
Instructor: Funda Ergun
Synopsis: The course studies the design, implementation, and analysis of algorithms and data structures as applied to real world problems. Topics include divide-and-conquer, optimization, and randomized algorithms applied to problems such as sorting, searching, and graph analysis. Students will learn about trees, hash tables, heaps, and graphs.

Elements of Artificial Intelligence
Class: CSCI B551
Section: 33013 (online), 12190, 1664
Instructor: David Crandall
Synopsis: Introduction to major issues and approaches in artificial intelligence. Principles of reactive, goal-based, and utility-based agents. Problem-solving and search. Knowledge representation and design of representational vocabularies. Inference and theorem proving, reasoning under uncertainty, and planning. Overview of machine learning.



Information and Library Science

Social Media Mining
Class: ILS Z639
Section: 32775 (online), 12144
Instructor: Vincent Malic (online), A. Riddell
Synopsis: This course provides a graduate-level introduction to social media mining and methods. It offers hands-on experience mining social data for social meaning extraction (focusing on sentiment analysis) using automated methods and machine learning technologies. We will read, discuss, and critique claims and findings from contemporary research related to SMM.



Statistics

Introduction to Statistics
Class: STAT S520
Section: 36267 (online), 9845, 14505, 31275
Instructor: Jianyu Wang (online), Arturo Valdivia, Brad Luen, Jaime Ramos
Synopsis: TBA



Informatics

Security for Networked Systems
Class: INFO I520
Section: 14127 (online), 11366
Instructor: Raquel Hill
Synopsis: This course is an extensive survey of system and network security. Course materials cover the threats to information confidentiality, integrity and availability, and the defense mechanisms that control such threats. It provides the foundation for more advanced security courses and hands-on experiences through course projects.

Big Data Applications and Analytics
Class: INFO I523
Section: 13310 (online), 13307
Instructor: Gregor Von Laszewski
Synopsis: The Big Data Applications & Analytics course is an overview course in Data Science and covers the applications and technologies (data analytics and clouds) needed to process the application data. It is organized around this rallying cry: Use Clouds running Data Analytics Collaboratively processing Big Data to solve problems in XInformatics.

Organizational Informatics & Economics of Security
Class: INFO I525
Section: 33034 (online), 32915
Instructor: Jean Camp
Synopsis: Security technologies make explicit organizational choices that allocate power. Security implementations allocate risk, determine authority, reify or alter relationships, and determine trust extended to organizational participants. The course begins with an introduction to relevant definitions (security, privacy, trust) and then moves to a series of timely case studies of security technologies.

Management, Access, and Use of Big and Complex Data
Class: INFO I535
Section: 33630 (online), 33628
Instructor: Inna Kouper
Synopsis: Data is abundant, offering potential for new discovery along with economic and social gain. But data has its difficulties. It can be noisy and inadequately contextualized. The gap from data to knowledge can be too large, or limits in technology or policy can make data hard to combine with other data. This course will examine the underlying principles and technologies needed to capture data, as well as clean, contextualize, store, access, and trust it for a repurposed use. Specifically, we will cover 1) distributed systems and database concepts underlying NoSQL and graph databases, 2) best practices in data pipelines, 3) foundational concepts in metadata and provenance plus examples, and 4) developing theory in data trust and its role in reuse.

Applied Data Mining
Class: INFO I590
Section: 33631
Instructor: Mehmet Dalkilic
Synopsis: The learning objective for this course is to broadly familiarize students with the elements of data mining. Students are expected to be proficient in algebra, as well as have familiarity with probability and calculus. While proficiency in R is necessary, the first week will be a refresher. Additionally, since there will be a fair amount of writing, knowledge of LaTeX, a typesetting language, is required. Although many free LaTeX distributions are available, MiKTeX is strongly suggested for its ease of use.
There are five major learning outcomes of equal importance:

· Knowledge Area
· Overall Data Mining Process
· Elements of the Process
· Machine Learning Algorithms
· Interpretation of Data Mining

More plainly, the student should be able to assess a potential data mining problem, employ the process (which includes the appropriate algorithm), interpret the results, and suggest an outcome clearly and succinctly.

Applied Data Science
Class: INFO I590
Section: 33632 (online), 33627
Syllabus: View document
Instructor: Joanne Luciano
Synopsis: The aim of the Applied Data Science course is to provide the skills needed to apply data science principles on real world applications at every stage in the data science workflow. The course is organized around each stage covering the algorithms, best practices, and evaluation criteria. Both good and bad application examples will be discussed to help the student develop an intuition and deeper understanding of the choice of algorithm for the data, and the development of the best practices and methods for evaluating results of different approaches. Students will learn Tableau and use it to visually analyze and report data.

Data Science for Drug Discovery
Class: INFO I590
Section: 33633
Syllabus: View document
Instructor: Joanne Luciano
Synopsis: With exploding healthcare costs, greater longevity and the widespread health challenges of diabetes, obesity, cancer and cardiovascular disease, today's medicine and healthcare will be a primary scientific and economic focus for the remainder of this century. Informatics and big data promise an understanding of health, disease and treatment on a scale never before imagined. This course will address the big data techniques that are being used in the drug discovery, healthcare and translational medicine domains. Some specific topics covered will include large-scale, integrated molecular datasets; cheminformatics and bioinformatics in a big data domain; storing and data mining of electronic medical records; visualization and mapping of diseases; bridging the clinical and molecular; smart devices for smart health; and data mining for healthcare economics.

Data Science On-Ramp
Class: INFO I590
Section: 33634
Syllabus: View document
Instructor: Ying Ding
Credit Hours: 1 - 3
Synopsis: A course of self-paced modules to build and strengthen core competencies necessary for the Data Science curriculum. Individual lessons vary from beginner to intermediate and will cover C++, MongoDB, R, Java, Python, Tableau, SQL, Hadoop/MapReduce, Spark, Scala, GitHub, Web Scraping, and Text Mining (NLP). If you would like descriptions of each lesson and how these will be mapped to credit, please consult Professor Ying Ding for more information.

Data Semantics
Class: INFO I590
Section: 13972
Instructor: Ying Ding
Synopsis: The class explores the technologies of the Semantic Web by examining the application of technologies to WWW information delivery and the principles of formal logic and computation guiding their development.

Data Visualization
Class: INFO I590
Section: 14120 (online), 33510, 9363
Instructor: Yong-Yeol Ahn
Synopsis: From dashboards in a car to cutting-edge scientific papers, we extensively use visual representation of data. As our world becomes increasingly connected and digitized and as more decisions are being driven by data, data visualization is becoming a critical skill for every knowledge worker. In this course we will learn fundamentals of data visualization and create visualizations that can provide insights into complex datasets.

Python
Class: INFO I590
Section: 33636
Instructor: Vel Melbasa
Synopsis: This course provides a gentle yet intense introduction to programming with Python for students who have little or no prior experience in programming. Python, an open-source language that allows rapid application development of both large and small software systems, is object-oriented by design and provides an excellent platform for learning the basics of language programming. The course will focus on planning and organizing programs, and developing high quality working software that solves real problems.

Graduate Internship
Class: INFO I591
Section: 14072, 14073
Instructor: David Wild
Section: 7839, 7840
Instructor: Steven Myers
Credit Hours: 0 - 6
Synopsis: Students gain professional work experience in an industry or research organization setting, using skills and knowledge acquired in Informatics course work. May be repeated for a maximum of 6 credit hours.

Independent Study
Class: INFO I699
Section: 14074, 14075
Instructor: David Wild
Section: 6736
Instructor: Martin Siegel
Credit Hours: 1 - 3
Synopsis: Independent readings and research for MS students under the direction of a faculty member, culminating in a written report.



School of Public and Environmental Affairs

Statistical Analysis for Effective Decision-Making
Class: SPEA V506
Section: 9717
Syllabus: View document
Instructor: Kand McQueen
Synopsis: An introduction to statistics. Nature of statistical data. Ordering and manipulation of data. Measures of central tendency and dispersion. Elementary probability. Concepts of statistical inference decision: estimation and hypothesis testing. Special topics discussed may include regression and correlation, analysis of variance, nonparametric methods. This course will provide an introduction to the analysis of quantitative data via statistical analyses. Topics covered include, but are not limited to, descriptive statistics, z-scores, probability, z-tests, t-tests, correlation, regression. The focus is on the practical interpretation and application of statistics.



Computer Science

Elements of Artificial Intelligence
Class: CSCI B551
Section: 13963
Instructor: David Crandall
Synopsis: Introduction to major issues and approaches in artificial intelligence. Principles of reactive, goal-based, and utility-based agents. Problem-solving and search. Knowledge representation and design of representational vocabularies. Inference and theorem proving, reasoning under uncertainty, and planning. Overview of machine learning.



Data Science

Data Science in Practice
Class: DSCI D590
Section: 33205
Instructor: David Wild, Kyle Stirling
Synopsis: This course connects interested data science project sponsors with data science students, so that together both can accomplish something neither could achieve alone. The overarching goal for the course is for the students to experience the real-world work of Data Science and to complete short consulting/technical projects in small teams. This course is for anyone who applies their expertise to the demands of data-driven decision making and analysis. This is a “learning by doing” course on the practice of delivering data science expertise.

Graduate Internship
Class: DSCI D591
Section: 33206, 33207
Credits: 0-3
Instructor: David Wild
Synopsis: Students gain professional work experience in an industry or research organization setting, using skills and knowledge acquired in Informatics course work. May be repeated for a maximum of 6 credit hours.

Independent Study
Class: DSCI D699
Section: 33208, 33209
Credits: 1-3
Instructor: David Wild
Synopsis: Independent readings and research for M.S. students under the direction of a faculty member, culminating in a written report.



Engineering

Cloud Computing
Class: ENGR E516
Section:
Instructor: Gregor Von Laszewski
Synopsis: The course covers all aspects of the cloud architecture stack, from Software as a Service (large-scale biology and graphics applications), Platform as a Service (MapReduce (Hadoop), Iterative MapReduce (Twister) and NoSQL (HBase)), to Infrastructure as a Service (low-level virtualization technologies). At the end of this course, you will have learned key concepts in cloud computing and enough programming to be able to solve data analysis problems on your own.

Intro to High Performance Computing
Class: ENGR E517
Section: 14273
Instructor: Thomas Sterling
Synopsis: Students will learn about the development, operation, and application of HPC systems, making them prepared to address future challenges demanding capability and expertise. The course combines critical elements from hardware technology and architecture, system software and tools, and programming models and application algorithms with the cross-cutting theme of performance management and measurement.

Information Visualization
Class: ENGR E583
Section:
Instructor: Katy Börner, Michael Ginda
Synopsis: Introduces information visualization, highlighting processes which produce effective visualizations. Topics include perceptual basis of information visualization, data analysis to extract relationships, and interaction techniques.



Information and Library Science

Search
Class: ILS Z534
Section: 33038
Instructor: Zheng Gao
Synopsis: The success of commercial search engines shows that information retrieval is key to helping users find the information they seek. This course provides an introduction to information retrieval theories and concepts underlying all search applications. We investigate techniques used in modern search engines and demonstrate their significance via experiment.

Social Media Mining
Class: ILS Z639
Section: 12908
Instructor: Vincent Malic
Synopsis: Those taking this course will receive a graduate-level introduction to social media mining and methods, as well as hands-on experience mining social data for social meaning extraction (focus on sentiment analysis) using automated methods and machine learning technologies. We will read, discuss, and critique claims and findings from contemporary research related to SMM.



Informatics

Big Data Software and Projects
Class: INFO I524
Section: 13054
Instructor: Gregor Von Laszewski
Synopsis: This course studies software HPC-ABDS used in either High Performance Computing or open source commercial Big Data cloud computing. The student builds analysis systems using this software on clouds and then uses it in a project either chosen by the student or selected from a list given by the instructor. Credit given for only one of INFO-I424 or I524.

Applied Machine Learning
Class: INFO I526
Section:
Instructor: James Shanahan
Synopsis: The main aim of the course is to provide skills to apply machine learning algorithms on real applications. We will devote less time to learning algorithms and math/theory, and instead spend more time with hands-on skills required for algorithms to work on a variety of datasets.

Systems & Protocol Security & Info Assurance
Class: INFO I533
Section: 13926
Instructor: Steve Myers
Synopsis: This course looks at systems and protocols, how to design threat models for them and how to use a large number of current security technologies and concepts to block specific vulnerabilities. Students will use numerous systems and programming security tools in the laboratories.

Data Science for Drug Discovery, Health and Translational Medicine
Class: INFO I590
Section: 11892
Instructor: Joanne Luciano
Synopsis: With exploding healthcare costs, greater longevity and the widespread health challenges of diabetes, obesity, cancer and cardiovascular disease, today's medicine and healthcare will be a primary scientific and economic focus for the remainder of this century. Informatics and big data promise an understanding of health, disease and treatment on a scale never before imagined. This course will address the big data techniques that are being used in the drug discovery, healthcare and translational medicine domains. Some specific topics covered will include large-scale, integrated molecular datasets; cheminformatics and bioinformatics in a big data domain; storing and data mining of electronic medical records; visualization and mapping of diseases; bridging the clinical and molecular; smart devices for smart health; and data mining for healthcare economics.

Data Science On-Ramp
Class: INFO I590
Section: 14150
Credits: 1-3
Instructor: Ying Ding
Synopsis: Self-paced modules to build and strengthen core competencies necessary for the Data Science curriculum. Individual lessons vary from beginner to intermediate and will cover C++, MongoDB, R, Java, Python, Tableau, SQL, Hadoop/MapReduce, Spark, Scala, GitHub, Web Scraping, and Text Mining (NLP). If you would like descriptions of each lesson and how these will be mapped to credit, please consult Professor Ying Ding for more information.

Intro to Business Analytics Modeling
Class: INFO I590
Section: 14152
Instructor: Doug Blocher, Rex Cutshall
Synopsis: In this course, we develop analytical models using simulation and optimization to analyze and recommend sound solutions to complex business problems. Models are discussed to solve sophisticated problems using various tools on spreadsheets, including Excel solver for linear, integer and genetic programming problems, probabilistic simulations, and risk analysis including statistical analysis of simulation models.

Network Science
Class: INFO I590
Section: 14149
Instructor: Yong-Yeol Ahn
Synopsis: Networks are everywhere. We can easily find network structure in many complex systems around us: our cells, brains, society, etc. The inherent generality of the network approach has allowed wide applications of network theory to flourish across diverse fields including biology, sociology, and epidemiology. The questions that we will address in the class are the following: Why do networks matter? What are the fundamental theories to understand the structure and dynamics of networks? How has network theory been applied to other fields? What are the frontiers of the research? We will explore key papers ranging from the fundamental theory to the various applications of network theory. This course will focus more on round-table discussion between students than presentation. Students will work on research projects in groups and finish a paper at the end of the class.

Python
Class: INFO I590
Section: 14262
Instructor: Vel Melbasa
Synopsis: This course provides a gentle, yet intense, introduction to programming using Python for students with little or no prior experience in programming. Python, an open-source language that allows rapid application development of both large and small software systems, is object-oriented by design and provides an excellent platform for learning the basics of language programming. The course will focus on planning and organizing programs, and developing high quality, working software that solves real problems.

Real World Data Science
Class: INFO I590
Section: 14530
Instructor: Joanne Luciano
Synopsis: The purpose of this course is to provide Data Science graduate students with practical experience applying their data science skillsets to real-world datasets. The first offering of this course in 2017 used a deidentified clinical trials dataset provided by Eli Lilly, but subsequent offerings could include public data or data provided by other industry partners. Students will be led through the full data analysis process of data preparation, model planning, model building, analysis, and communication of results. Students will meet (virtually or physically) daily to devise a plan.

SQL and NoSQL
Class: INFO I590
Section: 14149
Instructor: Ying Ding
Synopsis: A database is the central focus in data science to store and manage data. Relational databases have empowered major industries for decades and are still widely adopted. In our new era of Big Data, the database landscape is undergoing significant change. Many non-relational databases have become an important part of the enterprise data architecture of companies. Relational databases were developed long before the Internet and the Web to tackle the issues of central-controlled data storage and management. NoSQL databases emerged with the rise of Internet and Web applications to connect companies with customers (i.e., online or mobile) and to develop the agility to adapt faster. The new challenges of being agile and being able to accommodate data variability/data integration drove enterprises to turn to NoSQL database technology. It is important for every data scientist to master the skills of current databases and know about the future of databases in a world of NoSQL. This course aims to provide a basic overview of the current database landscape, starting with relational databases and SQL, and moving to several different NoSQL databases, such as XML databases and MongoDB.



School of Public and Environmental Affairs

Data Analysis and Modeling in Public Affairs
Class: SPEA P507
Section: 13838
Instructor: Barry Rubin
Synopsis: P507 provides students of public and environmental affairs and related disciplines with a detailed, intermediate-level perspective on statistical concepts and techniques for analyzing and modeling complex systems. The course content includes estimating the parameters of such models based on existing data, testing hypotheses about these systems, and forecasting. The context of the course is the application of these techniques to problems and policies in public and environmental affairs. Multivariate regression analysis is one of the primary tools for statistical modeling for purposes of policy analysis, program evaluation, simulation of systems, and general forecasting. Thus, most of the course is devoted to single equation regression models and the extension of these models to a variety of situations. A prerequisite for the class is a graduate-level, introductory statistics course that includes coverage of the simple (two-variable) regression model and an introduction to multivariate regression.

Statistical Analysis for Effective Decision-Making
Class: SPEA V506
Section: 13839
Instructor: Kand McQueen
Synopsis: An introduction to statistics. Nature of statistical data. Ordering and manipulation of data. Measures of central tendency and dispersion. Elementary probability. Concepts of statistical inference decision: estimation and hypothesis testing. Special topics discussed may include regression and correlation, analysis of variance, nonparametric methods. This course will provide an introduction to the analysis of quantitative data via statistical analyses. Topics covered include, but are not limited to, descriptive statistics, z-scores, probability, z-tests, t-tests, correlation, regression. The focus is on the practical interpretation and application of statistics.



School of Public Health

Semiparametric Regression with R
Class: SPH Q650
Section:
Instructor: Jaroslaw Harezlak
Synopsis: Semiparametric regression methods build on parametric regression models by allowing more flexible relationships between the predictors and the response variables. Examples of semiparametric regression include generalized additive models, additive mixed models and spatial smoothing. Our goal is to provide an easy-to-follow applied course on semiparametric regression methods using R. There is a vast body of literature on the semiparametric regression methods. However, most of it is geared towards researchers with advanced knowledge of statistical methods. This course explains the techniques and benefits of semiparametric regression in a concise and modular fashion. Spline functions, linear mixed models and hierarchical models are shown to play an important role in semiparametric regression. There will be a strong emphasis on implementation in R with a lot of computing exercises. This course is based on the book ‘Semiparametric Regression with R’ by J. Harezlak, D. Ruppert, and M.P. Wand (Springer).



Statistics

Introduction to Statistics
Class: STAT S520
Section: 13452
Instructor: Jianyu Wang
Synopsis: This course introduces the basic concepts of statistical inference through a careful study of several important procedures. Topics include 1- and 2-sample location problems, the one-way analysis of variance, and simple linear regression. Most assignments involve applying probability models and/or statistical methods to practical situations and/or actual datasets. S320 is the basic version of this course, intended for undergraduates. It is the gateway to more advanced courses offered by the Department of Statistics. S520 is an expanded version of S320 that covers additional material. S520 serves two constituencies: Graduate students in quantitative disciplines who are looking for a solid introduction to statistics and who may want to take additional courses in statistics, and graduate students pursuing an M.S. in Applied Statistics who desire a more gentle introduction to the fundamental principles of statistical inference than is provided in the more theoretical STAT S620.



Data Science

Graduate Internship
Class: DSCI D591
Section:
Credits: 0-3
Instructor: David Wild
Synopsis: Students gain professional work experience in an industry or research organization setting, using skills and knowledge acquired in Informatics course work. May be repeated for a maximum of 6 credit hours.

Independent Study
Class: DSCI D699
Section:
Credits: 1-3
Instructor: David Wild
Synopsis: Independent readings and research for MS students under the direction of a faculty member, culminating in a written report.



Informatics

Applied Data Science
Class: INFO I590
Section: 8128
Instructor: Joanne Luciano
Synopsis: The aim of the Applied Data Science course is to provide the skills needed to apply data science principles on real world applications at every stage in the data science workflow. The course is organized around each stage covering the algorithms, best practices, and evaluation criteria. Both good and bad application examples will be discussed to help the student develop an intuition and deeper understanding of the choice of algorithm for the data, and the development of the best practices and methods for evaluating results of different approaches. Students will learn Tableau and use it to visually analyze and report data.

Data Science On-Ramp
Class: INFO I590
Section: 8430
Credits: 1-3
Instructor: Ying Ding
Synopsis: Self-paced modules to build and strengthen core competencies necessary for the Data Science curriculum. Individual lessons vary from beginner to intermediate and will cover C++, MongoDB, R, Java, Python, Tableau, SQL, Hadoop/MapReduce, Spark, Scala, GitHub, Web Scraping, and Text Mining (NLP). If you would like descriptions of each lesson and how these will be mapped to credit, please consult Professor Ying Ding for more information.

Network Science
Class: INFO I590
Section: 8433
Instructor: Yong-Yeol Ahn
Synopsis: Networks are everywhere. We can easily find network structure in many complex systems around us: our cells, brains, society, etc.

Python
Class: INFO I590
Section: 8431
Instructor: Vel Malbasa
Synopsis: This course provides a gentle, yet intense, introduction to programming using Python for students with little or no prior experience in programming. Python, an open-source language that allows rapid application development of both large and small software systems, is object-oriented by design and provides an excellent platform for learning the basics of language programming. The course will focus on planning and organizing programs, and developing high quality, working software that solves real problems.

SQL and NoSQL
Class: INFO I590
Section: 8432
Instructor: Ying Ding
Synopsis: A database is the central focus in data science to store and manage data. Relational databases have empowered major industries for decades and are still widely adopted. In the new era of big data, the database landscape is undergoing significant change. Many non-relational databases have become an important part of the enterprise data architecture of companies. Relational databases were developed long before the Internet and the Web to tackle the issues of central-controlled data storage and management. NoSQL databases emerged with the rise of Internet and Web applications to connect companies with customers (i.e., online or mobile) and to develop the agility to adapt to faster changes. The new challenges of being agile and being able to accommodate data variability/data integration drive enterprises to turn to NoSQL database technology. It is important for every data scientist to master the skills of current databases and know about the future of databases in a world of NoSQL. This course aims to provide a basic overview of the current database landscape, starting with relational databases and SQL, and moving to several different NoSQL databases, such as XML databases and MongoDB.



Statistics

Introduction to Statistics
Class: STAT S520
Section:
Instructor: Jianyu Wang
Synopsis: This course introduces the basic concepts of statistical inference through a careful study of several important procedures. Topics include 1- and 2-sample location problems, the one-way analysis of variance, and simple linear regression. Most assignments involve applying probability models and/or statistical methods to practical situations and/or actual data sets. S320 is the basic version of this course, intended for undergraduates. It is the gateway to more advanced courses offered by the Department of Statistics. S520 is an expanded version of S320 that covers additional material (see syllabus below). S520 serves two constituencies: Graduate students in quantitative disciplines who are looking for a solid introduction to statistics and who may want to take additional courses in statistics, and graduate students pursuing an M.S. in Applied Statistics who desire a more gentle introduction to the fundamental principles of statistical inference than is provided in the more theoretical STAT S620.