On this page:
A Data-Centric Introduction to Computing
Monday, August 11th, 2025 12:21:41pm

A Data-Centric Introduction to Computing

Kathi Fisler, Shriram Krishnamurthi, Benjamin S. Lerner, Joe Gibbs Politz

    I Introduction

      1 Overview

        1.1 What This Book is About

        1.2 The Values That Drive This Book

        1.3 Our Perspective on Data

        1.4 What Makes This Book Unique

        1.5 Who This Book is For

        1.6 The Structure of This Book

        1.7 Organization of the Material

        1.8 Our Programming Language Choice

        1.9 Sending Us Feedback, Errors, and Comments

        1.10 Staying Up-To-Date

      2 Acknowledgments

    II Introduction to Programming

      3 Basic Data

        3.1 Getting Started

          3.1.1 Motivating Example: Flags

          3.1.2 Numbers

          3.1.3 Expressions

          3.1.4 Terminology

          3.1.5 Strings

          3.1.6 Images

            3.1.6.1 Combining Images

            3.1.6.2 Making a Flag

          3.1.7 Stepping Back: Types, Errors, and Documentation

            3.1.7.1 Types and Contracts

            3.1.7.2 Format and Notation Errors

            3.1.7.3 Finding Other Functions: Documentation

        3.2 Naming Values

          3.2.1 The Definitions Pane

          3.2.2 Naming Values

            3.2.2.1 Names Versus Strings

            3.2.2.2 Expressions versus Statements

          3.2.3 The Program Directory

            3.2.3.1 Understanding the Run Button

          3.2.4 Using Names to Streamline Building Images

        3.3 From Repeated Expressions to Functions

          3.3.1 Example: Similar Flags

          3.3.2 Defining Functions

            3.3.2.1 How Functions Evaluate

            3.3.2.2 Type Annotations

            3.3.2.3 Documentation

          3.3.3 Functions Practice: Moon Weight

          3.3.4 Documenting Functions with Examples

          3.3.5 Functions Practice: Cost of pens

          3.3.6 Recap: Defining Functions

        3.4 Conditionals and Booleans

          3.4.1 Motivating Example: Shipping Costs

          3.4.2 Conditionals: Computations with Decisions

          3.4.3 Booleans

            3.4.3.1 Other Boolean Operations

            3.4.3.2 Combining Booleans

          3.4.4 Asking Multiple Questions

          3.4.5 Evaluating by Reducing Expressions

          3.4.6 Composing Functions

            3.4.6.1 How Function Compositions Evaluate

            3.4.6.2 Function Composition and the Directory

          3.4.7 Nested Conditionals

          3.4.8 Recap: Booleans and Conditionals

      4 Tabular Data

        4.1 Introduction to Tabular Data

          4.1.1 Creating Tabular Data

          4.1.2 Extracting Rows and Cell Values

          4.1.3 Functions over Rows

          4.1.4 Processing Rows

            4.1.4.1 Finding Rows

            4.1.4.2 Ordering Rows

            4.1.4.3 Adding New Columns

            4.1.4.4 Calculating New Column Values

          4.1.5 Examples for Table-Producing Functions

          4.1.6 Lambda: Anonymous Functions

        4.2 Processing Tables

          4.2.1 Cleaning Data Tables

            4.2.1.1 Loading Data Tables

            4.2.1.2 Dealing with Missing Entries

            4.2.1.3 Normalizing Data

            4.2.1.4 Normalization, Systematically

            4.2.1.5 Using Programs to Detect Data Errors

          4.2.2 Task Plans

          4.2.3 Preparing Data Tables

            4.2.3.1 Creating bins

            4.2.3.2 Splitting Columns

          4.2.4 Managing and Naming Data Tables

          4.2.5 Visualizations and Plots

          4.2.6 Summary: Managing a Data Analysis

      5 Lists

        5.1 From Tables to Lists

          5.1.1 Basic Statistical Questions

          5.1.2 Extracting a Column from a Table

          5.1.3 Understanding Lists

            5.1.3.1 Lists as Anonymous Data

            5.1.3.2 Creating Literal Lists

          5.1.4 Operating on Lists

            5.1.4.1 Built-In Operations on Lists of Numbers

            5.1.4.2 Built-In Operations on Lists in General

            5.1.4.3 An Aside on Naming Conventions

            5.1.4.4 Getting Elements By Position

            5.1.4.5 Transforming Lists

            5.1.4.6 Recap: Summary of List Operations

          5.1.5 Lambda: Anonymous Functions

          5.1.6 Combining Lists and Tables

        5.2 Processing Lists

          5.2.1 Making Lists and Taking Them Apart

          5.2.2 Some Example Exercises

          5.2.3 Structural Problems with Scalar Answers

            5.2.3.1 my-len: Examples

            5.2.3.2 my-sum: Examples

            5.2.3.3 From Examples to Code

          5.2.4 Structural Problems that Transform Lists

            5.2.4.1 my-doubles: Examples and Code

            5.2.4.2 my-str-len: Examples and Code

          5.2.5 Structural Problems that Select from Lists

            5.2.5.1 my-pos-nums: Examples and Code

            5.2.5.2 my-alternating: Examples and Code

          5.2.6 Structural Problems Over Relaxed Domains

            5.2.6.1 my-max: Examples

            5.2.6.2 my-max: From Examples to Code

          5.2.7 More Structural Problems with Scalar Answers

            5.2.7.1 my-avg: Examples

          5.2.8 Structural Problems with Accumulators

            5.2.8.1 my-running-sum: First Attempt

            5.2.8.2 my-running-sum: Examples and Code

            5.2.8.3 my-alternating: Examples and Code

          5.2.9 Dealing with Multiple Answers

            5.2.9.1 uniq: Problem Setup

            5.2.9.2 uniq: Examples

            5.2.9.3 uniq: Code

            5.2.9.4 uniq: Reducing Computation

            5.2.9.5 uniq: Example and Code Variations

            5.2.9.6 uniq: Why Produce a List?

          5.2.10 Monomorphic Lists and Polymorphic Types

        5.3 Recursive Data

          5.3.1 Functions to Process Recursive Data

          5.3.2 A Template for Processing Recursive Data

          5.3.3 The Design Recipe

      6 Structured Data

        6.1 Introduction to Structured Data

          6.1.1 Understanding the Kinds of Compound Data

            6.1.1.1 A First Peek at Structured Data

            6.1.1.2 A First Peek at Conditional Data

          6.1.2 Defining and Creating Structured and Conditional Data

            6.1.2.1 Defining and Creating Structured Data

            6.1.2.2 Annotations for Structured Data

            6.1.2.3 Defining and Creating Conditional Data

          6.1.3 Programming with Structured and Conditional Data

            6.1.3.1 Extracting Fields from Structured Data

            6.1.3.2 Telling Apart Variants of Conditional Data

            6.1.3.3 Processing Fields of Variants

        6.2 Collections of Structured Data

          6.2.1 Lists as Collective Data

          6.2.2 Sets as Collective Data

            6.2.2.1 Picking Elements from Sets

            6.2.2.2 Computing with Sets

          6.2.3 Combining Structured and Collective Data

          6.2.4 Data Design Problem: Representing Quizzes

      7 Trees

        7.1 Trees

          7.1.1 Data Design Problem – Ancestry Data

            7.1.1.1 Computing Genetic Parents from an Ancestry Table

            7.1.1.2 Computing Grandparents from an Ancestry Table

            7.1.1.3 Creating a Datatype for Ancestor Trees

          7.1.2 Programs to Process Ancestor Trees

          7.1.3 Summarizing How to Approach Tree Problems

          7.1.4 Study Questions

      8 Foundations: Bonus Materials

        8.1 Functions as Data

          8.1.1 A Little Calculus

          8.1.2 A Helpful Shorthand for Anonymous Functions

          8.1.3 Streams From Functions

          8.1.4 Combining Forces: Streams of Derivatives

        8.2 Queues from Lists

          8.2.1 Using a Wrapper Datatype

          8.2.2 Combining Answers

          8.2.3 Using a Picker

          8.2.4 Using Tuples

          8.2.5 A Picker Method

        8.3 Examples, Testing, and Program Checking

          8.3.1 From Examples to Tests

          8.3.2 More Refined Comparisons

          8.3.3 When Tests Fail

          8.3.4 Oracles for Testing

    III From Pyret to Python

      9 From Pyret to Python

        9.1 From Pyret to Python

          9.1.1 Expressions, Functions, and Types

          9.1.2 Returning Values from Functions

          9.1.3 Examples and Test Cases

          9.1.4 An Aside on Numbers

          9.1.5 Conditionals

          9.1.6 Creating and Processing Lists

            9.1.6.1 Filters, Maps, and Friends

          9.1.7 Data with Components

            9.1.7.1 Accessing Fields within Dataclasses

          9.1.8 Traversing Lists

            9.1.8.1 Introducing For Loops

            9.1.8.2 An Aside on Order of Processing List Elements

            9.1.8.3 Using For Loops in Functions that Produce Lists

            9.1.8.4 Summary: The List-Processing Template for Python

            9.1.8.5 for each loops in Pyret

              9.1.8.5.1 Variables that can change

              9.1.8.5.2 block notation

              9.1.8.5.3 How for each works

              9.1.8.5.4 Testing and variables that can change

        9.2 Dictionaries

          9.2.1 Creating and Using a Dictionary

          9.2.2 Searching Through the Values in a Dictionary

          9.2.3 Dictionaries with More Complex Values

          9.2.4 Dictionaries versus Dataclasses

            Summary

        9.3 Arrays

          9.3.1 Two Memory Layouts for Ordered Items

          9.3.2 Iterating Partly through an Ordered Datum

      10 Tables in Python via Pandas

        10.1 Introduction to Pandas

          10.1.1 Pandas Table Basics

            10.1.1.1 Core Datatypes: DataFrame and Series

            10.1.1.2 Creating and Loading DataFrames

            10.1.1.3 Using Labels and Indices to Access Cells

          10.1.2 Filtering Rows

          10.1.3 Cleaning and Normalizing Data

            10.1.3.1 Clearing out unknown values

            10.1.3.2 Repairing Values and Column Types

          10.1.4 Computing New Columns

          10.1.5 Aggregating and Grouping Columns

          10.1.6 Wide Versus Tall Data

            Converting Between Wide and Tall Data

          10.1.7 Plotting Data

          10.1.8 Takeaways

        10.2 Reshaping Tables

          10.2.1 Binning Rows

          10.2.2 Wide versus Tall Datasets

      11 File Input and Output in Python

        11.1 File Input and Output in Python

          11.1.1 Basic File Operations

          11.1.2 Reading CSV Files Step by Step

          11.1.3 Processing and Filtering Data

          11.1.4 Writing CSV Files

    IV Programming With State

      12 Programming with State (in Both Pyret and Python)

        12.1 State, Change, and Testing

          12.1.1 Example: Bank Accounts

          12.1.2 Testing Functions that Mutate Data

          12.1.3 Aliasing

          12.1.4 Data Mutation and the Directory

            12.1.4.1 Introducing the Heap

            12.1.4.2 Basic Data and the Heap

        12.2 Understanding Equality

          12.2.1 Equality of Data

          12.2.2 Different Equality Operations

            12.2.2.1 Equality in Python

            12.2.2.2 Equality in Pyret

        12.3 Arrays and Lists in Memory

        12.4 Cyclic Data

          12.4.1 Creating Cyclic Data

          12.4.2 Testing Cyclic Data

          12.4.3 Cycles in Practice

        12.5 Modifying Variables

          12.5.1 Modifying Variables in Memory

          12.5.2 Variable Updates and Aliasing

          12.5.3 Updating Variables versus Updating Data Fields

          12.5.4 Updating Parameters in Function Calls

          12.5.5 Updating Top-Level Variables within Function Calls

          12.5.6 The Many Roles of Variables

      13 More Programming with State: Python

        13.1 Mutable Lists

    V Algorithm Analysis

      14 Predicting Growth

        14.1 A Little (True) Story

        14.2 The Analytical Idea

        14.3 A Cost Model for Pyret Running Time

        14.4 The Size of the Input

        14.5 The Tabular Method for Singly-Structurally-Recursive Functions

        14.6 Creating Recurrences

        14.7 A Notation for Functions

        14.8 Comparing Functions

        14.9 Combining Big-Oh Without Woe

        14.10 Solving Recurrences

      15 Halloween Analysis

        15.1 A First Example

        15.2 The New Form of Analysis

        15.3 An Example: Queues from Lists

          15.3.1 List Representations

          15.3.2 A First Analysis

          15.3.3 More Liberal Sequences of Operations

          15.3.4 A Second Analysis

          15.3.5 Amortization Versus Individual Operations

        15.4 Reading More

    VI Data Structures with Analysis

      16 Directed Acyclic Graphs

        16.1 Sharing and Equality

          16.1.1 Re-Examining Equality

          16.1.2 The Cost of Evaluating References

          16.1.3 Notations for Equality

          16.1.4 On the Internet, Nobody Knows You’re a DAG

          16.1.5 It’s Always Been a DAG

          16.1.6 From Acyclicity to Cycles

        16.2 The Size of a DAG

          16.2.1 Stage 1

          16.2.2 Stage 2

          16.2.3 Stage 3

          16.2.4 Stage 4

          16.2.5 Stage 5

          16.2.6 What We’ve Learned

          16.2.7 More on Value Printing: An Aside from Racket

      17 Graphs

        17.1 Introducing Graphs

          17.1.1 Understanding Graphs

          17.1.2 Representations

            17.1.2.1 Links by Name

            17.1.2.2 Links by Indices

            17.1.2.3 A List of Edges

            17.1.2.4 Abstracting Representations

          17.1.3 Measuring Complexity for Graphs

        17.2 Basic Graph Traversals

          17.2.1 Reachability

            17.2.1.1 Simple Recursion

            17.2.1.2 Cleaning up the Loop

            17.2.1.3 Traversal with Memory

            17.2.1.4 A Better Interface

          17.2.2 Depth- and Breadth-First Traversals

        17.3 Graphs With Weighted Edges

        17.4 Shortest (or Lightest) Paths

        17.5 Moravian Spanning Trees

          17.5.1 The Problem

          17.5.2 A Greedy Solution

          17.5.3 Another Greedy Solution

          17.5.4 A Third Solution

          17.5.5 Checking Component Connectedness

      18 Several Variations on Sets

        18.1 Representing Sets as Lists

          18.1.1 Representation Choices

          18.1.2 Time Complexity

          18.1.3 Choosing Between Representations

          18.1.4 Other Operations

        18.2 Making Sets Grow on Trees

          18.2.1 Using Binary Trees

          18.2.2 Checking the Complexity

          18.2.3 A Fine Balance: Tree Surgery

            18.2.3.1 Left-Left Case

            18.2.3.2 Left-Right Case

            18.2.3.3 Any Other Cases?

        18.3 Union-Find

          18.3.1 Implementing with State

          18.3.2 Optimizations

          18.3.3 Analysis

        18.4 Hashes, Sets, and Key-Values

          18.4.1 A Hash Function for Strings

          18.4.2 Sets from Hashing

          18.4.3 Arrays

          18.4.4 Sets from Hashing and Arrays

          18.4.5 Collisions

          18.4.6 Resolving Collisions

          18.4.7 Complexity

          18.4.8 Bloom Filters

          18.4.9 Generalizing from Sets to Key-Values

        18.5 Equality, Ordering, and Hashing

          18.5.1 Converting Values to Ordered Values

          18.5.2 Hashing in Practice

          18.5.3 Equality and Ordering

        18.6 Sets as a Case Study

          18.6.1 Nature of the Data

          18.6.2 Nature of the Operations

          18.6.3 Nature of the Guarantee

    VII Advanced Topics

      19 State and Equality

        19.1 Boxes: A Canonical Mutable Structure

        19.2 Mutation and Types

        19.3 Mutation and Equality

        19.4 Another Equality Predicate

        19.5 A Hierarchy of Equality

        19.6 Space and Time Complexity

        19.7 What it Means to be Identical

        19.8 Comparing Functions

      20 Recursion and Cycles from Mutation

        20.1 Partial Definitions

        20.2 Recursive Functions

        20.3 Premature Evaluation

        20.4 Cyclic Lists Versus Streams

      21 Detecting Cycles

        21.1 A Running Example

        21.2 Types

        21.3 A First Checker

        21.4 Complexity

        21.5 A Fabulous Improvement

        21.6 Testing

      22 Avoiding Recomputation by Remembering Answers

        22.1 An Interesting Numeric Sequence

          22.1.1 Using State to Remember Past Answers

          22.1.2 From a Tree of Computation to a DAG

          22.1.3 The Complexity of Numbers

          22.1.4 Abstracting Memoization

        22.2 Edit-Distance for Spelling Correction

        22.3 Nature as a Fat-Fingered Typist

        22.4 Dynamic Programming

          22.4.1 Catalan Numbers with Dynamic Programming

          22.4.2 Levenshtein Distance and Dynamic Programming

        22.5 Contrasting Memoization and Dynamic Programming

      23 Partial Domains

        23.1 A Non-Solution

        23.2 Exceptions

        23.3 The Option Type

        23.4 Total Domains, Dynamically

        23.5 Total Domains, Statically

        23.6 Summary

        23.7 A Note on Notation

      24 Staging

        24.1 Problem Definition

        24.2 Initial Solution

        24.3 Refactoring

        24.4 Separating Parameters

        24.5 Context

      25 Factoring Numbers

      26 Deconstructing Loops

        26.1 Setup: Two Functions

        26.2 Abstracting a Loop

        26.3 Is It Really a Loop?

        26.4 Re-Examining for

        26.5 Rewriting Pollard-Rho

        26.6 Nested Loops

        26.7 Loops, Values, and Customization

    VIII Interactive Programs

      27 Interactive Games as Reactive Systems

        27.1 About Reactive Animations

        27.2 Preliminaries

        27.3 Version: Airplane Moving Across the Screen

          27.3.1 Updating the World State

          27.3.2 Displaying the World State

          27.3.3 Observing Time (and Combining the Pieces)

        27.4 Version: Wrapping Around

        27.5 Version: Descending

          27.5.1 Moving the Airplane

          27.5.2 Drawing the Scene

          27.5.3 Finishing Touches

        27.6 Version: Responding to Keystrokes

        27.7 Version: Landing

        27.8 Version: A Fixed Balloon

        27.9 Version: Keep Your Eye on the Tank

        27.10 Version: The Balloon Moves, Too

        27.11 Version: One, Two, ..., Ninety-Nine Luftballons!

    IX Appendices

      28 Pyret for Racketeers and Schemers

        28.1 Numbers, Strings, and Booleans

        28.2 Infix Expressions

        28.3 Function Definition and Application

        28.4 Tests

        28.5 Variable Names

        28.6 Data Definitions

        28.7 Conditionals

        28.8 Lists

        28.9 First-Class Functions

        28.10 Annotations

        28.11 What Else?

      29 Pyret vs. Python

      30 Comparing This Book to HtDP

      31 Release Notes

        31.1 Version 2025-02-09

        31.2 Version 2024-09-03

        31.3 Version 2023-02-21

        31.4 Version 2022-08-28

        31.5 Version 2022-01-25

        31.6 Version 2021-08-21

      32 Glossary