Advanced Data Science · Midterm Prep · Every Homework Question
How data is classified determines which statistical methods are appropriate to use.
| Type | Ordered? | Arithmetic? | Examples |
|---|---|---|---|
| Ordinal Categorical | Yes — meaningful | No | S/M/L, months, survey ratings |
| Nominal Categorical | No | No | Colors, flavors, pet species |
| Discrete Numeric | Yes | Yes | Counts, die rolls, number of heads |
| Continuous Numeric | Yes | Yes | Height, temperature, time |
The foundation of probability — understanding how uncertainty arises and is formalized.
A mathematical bridge that maps experimental outcomes to numbers on the real line.
Any subset of the sample space we care about assigning a probability to.
How probability is spread across the possible values of a random variable.
Measures that summarize the center and spread of a dataset.
How spread out the data is — and how robust each measure is to outliers.
| Measure | Robustness to Outliers | Why |
|---|---|---|
| IQR | Most robust | Uses only middle 50%; ignores tails |
| Median Abs Dev (MAD) | Very robust | Uses median twice |
| Mean Abs Dev | Moderate | Uses the mean, which outliers pull |
| Sample Std Dev (s) | Sensitive | Squared deviations amplify outliers |
| Variance (s²) | Most sensitive | Squaring makes one outlier dominant |
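
The ranking in the table can be checked on a small dataset. The sketch below is illustrative only: the numbers are hypothetical, the helper names (`iqr`, `median_abs_dev`, `mean_abs_dev`, `spread_summary`) are made up for this example, and the quartiles use the "inclusive" method of Python's `statistics.quantiles`, which is one of several quartile conventions.

```python
import statistics


def iqr(values):
    """Interquartile range: Q3 - Q1 (inclusive quartile method)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


def median_abs_dev(values):
    """Median of the absolute deviations from the median."""
    med = statistics.median(values)
    return statistics.median(abs(v - med) for v in values)


def mean_abs_dev(values):
    """Mean of the absolute deviations from the mean."""
    mu = statistics.fmean(values)
    return statistics.fmean(abs(v - mu) for v in values)


def spread_summary(values):
    """Compute every spread measure from the table for one dataset."""
    return {
        "IQR": iqr(values),
        "Median abs dev": median_abs_dev(values),
        "Mean abs dev": mean_abs_dev(values),
        "Std dev (s)": statistics.stdev(values),
        "Variance (s^2)": statistics.variance(values),
    }


# Hypothetical data: the same nine values, with the largest replaced by an outlier.
clean = [10, 11, 12, 13, 14, 15, 16, 17, 18]
with_outlier = clean[:-1] + [180]

before = spread_summary(clean)
after = spread_summary(with_outlier)
for name in before:
    ratio = after[name] / before[name]
    print(f"{name:15s} before={before[name]:9.2f} after={after[name]:9.2f} ratio={ratio:.1f}")
```

On this data the IQR and the median absolute deviation do not change at all, while the mean absolute deviation, the standard deviation, and the variance grow by increasingly large factors, in the same order as the table.
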
How extreme values distort statistics and reveal the shape of a distribution.
| Statistic | Outlier Effect | Robust? |
|---|---|---|
| Mean | Strongly pulled toward outlier | No |
| Median | Little to no effect | Yes |
| Trimmed Mean (10%) | Little to none: the outlier is dropped if it falls in the trimmed tail | Yes |
| Variance / Std Dev | Heavily inflated (squared deviations) | No |
| IQR | No effect (outlier is in the tail) | Yes |
| Extreme %iles (90th+) | Heavily distorted by interpolation | No |
| Median percentile (50th) | Same as median — robust | Yes |
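
A short sketch of the effect on measures of center. The scores are hypothetical, and `trimmed_mean` is a helper written for this example; it assumes "10% trimmed" means dropping 10% of the points from each tail, which is one common convention.

```python
import statistics


def trimmed_mean(values, proportion=0.10):
    """Mean after dropping `proportion` of the points from EACH tail."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)      # number of points dropped per tail
    kept = ordered[k:len(ordered) - k]
    return statistics.fmean(kept)


# Hypothetical exam scores, then the same scores with a data-entry error (85 -> 850).
scores = [70, 72, 74, 75, 76, 78, 79, 80, 81, 85]
scores_outlier = scores[:-1] + [850]

for label, data in (("clean", scores), ("with outlier", scores_outlier)):
    print(
        f"{label:13s} mean={statistics.fmean(data):7.2f} "
        f"median={statistics.median(data):6.2f} "
        f"trimmed={trimmed_mean(data):6.2f}"
    )
```

The mean roughly doubles because of the single bad value, while the median and the trimmed mean are identical in both runs.
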
The critical difference in formulas — and why Bessel's correction exists.
| Statistic | Population (size N) | Sample (size n; variance ÷ n−1) |
|---|---|---|
| Mean | \(\mu = \frac{1}{N}\sum X_i\) | \(\bar{x} = \frac{1}{n}\sum X_i\) |
| Variance | \(\sigma^2 = \frac{\sum(X_i-\mu)^2}{N}\) | \(s^2 = \frac{\sum(X_i-\bar{x})^2}{n-1}\) |
| Std Dev | \(\sigma = \sqrt{\sigma^2}\) | \(s = \sqrt{s^2}\) |
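
One way to see why the sample formula divides by n−1 is a small simulation. The sketch below assumes a fair six-sided die as the population (true variance 35/12) and draws many samples of size 5; the seed, sample size, and trial count are arbitrary choices for illustration.

```python
import random
import statistics

rng = random.Random(0)        # fixed seed so the run is repeatable
TRUE_VAR = 35 / 12            # population variance of a fair six-sided die
n, trials = 5, 20_000

biased, unbiased = [], []
for _ in range(trials):
    sample = [rng.randint(1, 6) for _ in range(n)]
    biased.append(statistics.pvariance(sample))    # divides by n
    unbiased.append(statistics.variance(sample))   # divides by n - 1

avg_biased = statistics.fmean(biased)
avg_unbiased = statistics.fmean(unbiased)

print(f"true variance        : {TRUE_VAR:.3f}")
print(f"average of /n        : {avg_biased:.3f}")
print(f"average of /(n - 1)  : {avg_unbiased:.3f}")
```

Dividing by n underestimates the population variance on average (by a factor of (n−1)/n), because deviations are measured from the sample mean rather than the true mean. Dividing by n−1 removes that bias.
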
Computing probabilities under any Gaussian by standardizing to N(0,1).
| Z | Φ(Z) ≈ | Meaning |
|---|---|---|
| 0 | 0.500 | 50% below mean |
| 1.0 | 0.841 | ±1σ covers 68.3% of data |
| 1.2 | 0.885 | John's exam example |
| 1.645 | 0.950 | 95th percentile of N(0,1); 90% CI boundary (two-sided) |
| 1.96 | 0.975 | 95% CI boundary (two-sided) |
| 2.0 | 0.977 | ±2σ covers 95.4% of data |
| 3.0 | 0.9987 | ±3σ covers 99.7% of data |
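
The table values can be reproduced with `statistics.NormalDist` from the Python standard library. The exam figures below (mean 70, standard deviation 10, score 82) are hypothetical, chosen only so that the standardized score comes out to Z = 1.2; `prob_below` is a helper name invented for this sketch.

```python
from statistics import NormalDist

standard = NormalDist()   # the standard normal, N(0, 1)


def prob_below(x, mu, sigma):
    """Standardize x under N(mu, sigma^2) and return (z, P(X <= x))."""
    z = (x - mu) / sigma
    return z, standard.cdf(z)


# Hypothetical exam: mean 70, standard deviation 10, observed score 82.
z, p = prob_below(82, mu=70, sigma=10)
print(f"z = {z:.2f}, P(X <= 82) = {p:.3f}")

# Coverage of +/- k standard deviations around the mean.
for k in (1, 2, 3):
    coverage = standard.cdf(k) - standard.cdf(-k)
    print(f"within {k} sd: {coverage:.4f}")

# Going the other way: which z leaves 95% (or 97.5%) of the area below it?
print(f"95th percentile   : {standard.inv_cdf(0.95):.3f}")
print(f"97.5th percentile : {standard.inv_cdf(0.975):.3f}")
```
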
Measuring how two variables move together — direction and strength.
| r value | Interpretation |
|---|---|
| +1 | Perfect positive linear relationship |
| 0.7 to 1 | Strong positive correlation |
| 0.3 to 0.7 | Moderate positive correlation |
| 0 to 0.3 | Weak positive correlation |
| ≈ 0 | No linear relationship (may have nonlinear!) |
| −1 to 0 | Negative correlation |
| −1 | Perfect negative linear relationship |
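
A minimal sketch of the Pearson correlation coefficient, written from its definition so the arithmetic is visible. The function name `pearson_r` and the datasets are made up for illustration; the parabola shows the "r ≈ 0 but clearly related" case from the table.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


xs = [-3, -2, -1, 0, 1, 2, 3]
rising = [2 * x + 1 for x in xs]      # perfect positive linear relationship
falling = [-x for x in xs]            # perfect negative linear relationship
parabola = [x * x for x in xs]        # strong relationship, but not linear

print("rising  :", round(pearson_r(xs, rising), 3))
print("falling :", round(pearson_r(xs, falling), 3))
print("parabola:", round(pearson_r(xs, parabola), 3))
```

For the parabola, y is completely determined by x, yet r is zero because the positive and negative halves cancel. A value of r near 0 rules out a linear relationship only.
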
Understanding systematic and random sources of error in data collection.
| Bias Type | Who is biased? | Key Signal |
|---|---|---|
| Measurement | The instrument | Consistent systematic offset in all readings |
| Observer | The researcher | Prior expectations alter observations |
| Selection | The sampling process | Some population groups excluded |
| Non-response | The respondents | Low response rate; non-responders differ |
| Self-reporting | The participant | Inaccurate recall or reporting |
| Social desirability | The participant | Reporting what sounds good, not true |
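
The difference between random error and systematic bias can be sketched with a simulated scale. Everything here is hypothetical: the true weight, the 0.5-unit calibration offset, and the noise level are invented for illustration, and `readings` is a helper name for this example only.

```python
import random
import statistics

rng = random.Random(7)    # fixed seed so the run is repeatable
TRUE_WEIGHT = 100.0       # hypothetical true value being measured
OFFSET = 0.5              # hypothetical miscalibration (measurement bias)
NOISE_SD = 2.0            # random error of a single reading


def readings(n, offset):
    """Simulate n scale readings with random noise plus a fixed offset."""
    return [TRUE_WEIGHT + offset + rng.gauss(0, NOISE_SD) for _ in range(n)]


unbiased_error = abs(statistics.fmean(readings(10_000, 0.0)) - TRUE_WEIGHT)
biased_error = abs(statistics.fmean(readings(10_000, OFFSET)) - TRUE_WEIGHT)

print(f"error of the mean, calibrated scale   : {unbiased_error:.3f}")
print(f"error of the mean, miscalibrated scale: {biased_error:.3f}")
```

Averaging many readings shrinks the random error, but the systematic offset stays in the mean no matter how many readings are taken. Collecting more data does not fix bias.
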
Why sample means behave predictably — even when individual data doesn't.
| | Standard Deviation (s) | Standard Error (SE) |
|---|---|---|
| Measures | Spread of individual data points | Spread of sample means across experiments |
| Formula | \(s = \sqrt{\frac{\sum(x_i-\bar{x})^2}{n-1}}\) | \(\sigma/\sqrt{n}\), estimated by \(s/\sqrt{n}\) |
| As n → ∞ | Stays the same (individual variation is real) | → 0 (means converge to μ) |
| Use when | Describing how variable individuals are | Describing precision of the mean estimate |
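
The contrast in the table can be sketched by repeating an experiment many times and measuring how much the sample means vary. The population parameters (mean 50, standard deviation 10), the seed, and the number of repetitions are arbitrary assumptions for illustration.

```python
import math
import random
import statistics

rng = random.Random(42)     # fixed seed so the run is repeatable
MU, SIGMA = 50.0, 10.0      # hypothetical population mean and standard deviation


def sample_means(n, experiments=2000):
    """Run `experiments` experiments, each returning the mean of n draws."""
    return [
        statistics.fmean(rng.gauss(MU, SIGMA) for _ in range(n))
        for _ in range(experiments)
    ]


results = {}
for n in (4, 25, 100):
    observed_se = statistics.stdev(sample_means(n))   # spread of the means
    predicted_se = SIGMA / math.sqrt(n)               # sigma / sqrt(n)
    results[n] = (observed_se, predicted_se)
    print(f"n={n:3d}  observed SE={observed_se:.2f}  sigma/sqrt(n)={predicted_se:.2f}")
```

The spread of individual values stays at about 10 regardless of n, while the spread of the sample means shrinks in proportion to 1/√n.
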
Distributed computing for datasets too large for a single machine.
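spark = SparkSession.builder.getOrCreate()
Use the SparkSession to read data, execute SQL, and create DataFrames.

Transformations (lazy: they build a query plan and run only when an action is called):

| Method | Purpose |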
|---|---|
| df.select("col1","col2") | Choose specific columns |
| df.filter("condition") / df.where(...) | Filter rows by condition |
| df.groupBy("col").agg(...) | Group rows and aggregate |
| df.withColumn("new", expr) | Add or replace a column |
| df.orderBy("col") | Sort rows |
| df.join(other, on, how) | Join two DataFrames |

Actions (trigger execution of the plan):

| Method | Purpose |
|---|---|
| df.show(n) | Print first n rows to screen |
| df.count() | Count total rows |
| df.collect() | Return all rows to driver as Python list |
| df.first() | Return first row |
| df.write.*() | Write results to storage |
| Library | Best For | Limitation |
|---|---|---|
| NumPy | Fast math on arrays; SIMD-vectorized; C backend | Single machine only |
| Pandas | Tabular data manipulation on one machine; rich API | Single machine; limited by RAM |
| Spark | Datasets too big for one machine; cluster-scale | Overhead: scheduling, network, serialization — overkill for small data |
from pyspark.sql import SparkSession
from pyspark.sql.functions import col
spark = SparkSession.builder.getOrCreate()
# Read a CSV file (header row gives column names; inferSchema guesses column types)
df = spark.read.csv("/path/file.csv", header=True, inferSchema=True)
# Select specific columns
df.select("PassengerId", "Survived", "Pclass").show(5)
# Filter rows (two equivalent ways)
df.filter((col("Survived") == 1) & (col("Pclass") == 1)).show()
df.filter("Survived == 1").filter("Pclass == 1").show()
# Add a new column
df.withColumn("FamilySize", col("SibSp") + col("Parch")).show()
# Group and aggregate
df.groupBy("Pclass").count().orderBy("count", ascending=False).show()
A regular object class in Java typically has four kinds of methods: a constructor, accessors, mutators, and toString.
Constructor: same name as the class, no return type. Initializes the object's fields.
public Person(String name, int age){ this.name = name; this.age = age; }
Accessor (getter): named getFieldName().
public String getName(){ return name; }
public int getAge(){ return age; }
Mutator (setter): named setFieldName(type param).
public void setName(String name){ this.name = name; }
public void setAge(int age){ this.age = age; }
toString: returns a String. No parameters. Called automatically when you print an object.
public String toString(){
    return "Person: " + name + ", Age: " + age;
}

| Method Type | Return Type | Parameters | Purpose |
|---|---|---|---|
| Constructor | none (not void) | 0 or more | Initialize object |
| Accessor | matches field type | none | Get field value |
| Mutator | void | 1 (matching field) | Set field value |
| toString | String | none | Text representation |
this Keyword & Shadowing
Shadowing is one of the most common bugs in object classes.
// BUG — shadowing! name = name assigns param to itself
public void setName(String name){ name = name; }
// FIX — use this.name
public void setName(String name){ this.name = name; }
Fix 1: use the this. keyword to explicitly refer to the instance field.
Fix 2: give the parameter a different name (field: name, param: n).
this keyword
this.fieldName always means the instance field, not a local variable. It is also used to call another constructor: this(args).
Overloading: same name, different parameters. Java picks the right version automatically, based on the arguments in the call.
public void print(int x){ ... } // version 1
public void print(String s){ ... } // version 2 — different param type
public void print(int x, int y){ ... } // version 3: different param count
return keyword
A non-void method must finish with a return statement. The returned value must match the return type in the method signature. void methods do not return values (but can use a bare return; to exit early).
public int add(int a, int b){ return a + b; } // returns int
public String greet(){ return "Hello"; } // returns String
public void print(){ System.out.println("Hi"); } // void, no return value
static Keyword
Static members belong to the class, not any particular instance.
A static field has one copy shared by all objects and is accessed as ClassName.fieldName.
public class Counter {
public static int count = 0; // one copy, shared
public Counter(){ count++; } // every new object increments the same count
}
public static int square(int n){ return n * n; }
// called as: ClassName.square(5);
Common uses: constants (static final), counters shared across all objects, utility/helper methods like Math.max().
Static methods cannot use this: there is no object instance in scope.
Aggregation: one class uses another class as a field, modeling real-world ownership.
University has a Person array.
public class University {
private String name;
private Person[] faculty; // aggregation: University HAS-A Person[]
private int count;
...
}
public void addPerson(String name, int age){
faculty[count] = new Person(name, age);
count++;
}
Arrays: a fixed-size, ordered collection of elements of the same type.
type[] name = new type[size]; allocates a fixed block of memory. Size cannot change after creation.
int[] scores = new int[5]; // size 5, all zeros
String[] names = new String[3]; // size 3, all null
Physical size is arr.length: total slots allocated. Logical size is the number of slots actually in use, tracked in a separate counter.
int[] data = new int[10]; // physical = 10
int count = 0; // logical starts at 0
data[count] = 42;
count++; // logical becomes 1
// Print all elements (full array):
for(int i = 0; i < arr.length; i++){
System.out.println(arr[i]);
}
// Enhanced for-loop:
for(int val : arr){ System.out.println(val); }
ArrayList: a dynamic array that grows and shrinks automatically. From java.util.ArrayList.
import java.util.ArrayList;
ArrayList<String> list = new ArrayList<String>();
| Method | Purpose |
|---|---|
| list.add(element) | Append element to end |
| list.add(index, element) | Insert at position |
| list.get(index) | Get element at index |
| list.set(index, element) | Replace element at index |
| list.remove(index) | Remove element at index |
| list.size() | Number of elements (logical size) |
| list.contains(obj) | Check if element exists |

Arrays use .length. ArrayList uses .size().
Use ArrayList<Integer> not ArrayList<int>. Autoboxing converts automatically.
Wrapper classes: object versions of Java primitives. Required for collections like ArrayList.
| Primitive | Wrapper Class |
|---|---|
| int | Integer |
| double | Double |
| char | Character |
| boolean | Boolean |
| long | Long |
| float | Float |
Integer.parseInt("42") // String → int
Integer.toString(42) // int → String
Double.parseDouble("3.14") // String → double
Character.isLetter('a') // true
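Character.toUpperCase('a') // 'A'
Autoboxing and unboxing: Java converts between a primitive and its wrapper automatically (e.g. adding an int to ArrayList<Integer>).
ArrayList<Integer> nums = new ArrayList<>();
nums.add(5); // autoboxing: int 5 → Integer(5)
int x = nums.get(0); // unboxing: Integer(5) → int 5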
Strings are objects in Java — immutable and full of useful methods.
| Method | Returns | Description |
|---|---|---|
| str.length() | int | Number of characters |
| str.charAt(i) | char | Character at index i |
| str.substring(a,b) | String | Chars from index a up to (not including) b |
| str.toUpperCase() | String | All uppercase |
| str.toLowerCase() | String | All lowercase |
| str.trim() | String | Remove leading/trailing whitespace |
| str.equals(other) | boolean | Content equality (use instead of ==) |
| str.equalsIgnoreCase(other) | boolean | Case-insensitive equality |
| str.contains(s) | boolean | True if s is a substring |
| str.indexOf(s) | int | First index of s, or -1 |
| str.replace(old,new) | String | Replace all occurrences |
| str.split(regex) | String[] | Split into array by delimiter |
| String.valueOf(x) | String | Convert any type to String |

Use + to join strings. "Hello" + " " + "World" → "Hello World". Any type concatenated with a String is automatically converted.
Use .equals() not == to compare String content. == compares references (memory addresses), not values.
Output formatting: three ways to control decimal precision and field width (printf, DecimalFormat, and String.format).
| Specifier | Meaning | Example |
|---|---|---|
| %d | Integer | printf("%d", 42) → 42 |
| %f | Floating point | printf("%f", 3.14) → 3.140000 |
| %.2f | 2 decimal places | printf("%.2f", 3.14159) → 3.14 |
| %8d | Width 8, right-align | pads with spaces on left |
| %-8d | Width 8, left-align | pads with spaces on right |
| %s | String | printf("%s", "hi") → hi |
| %n | Newline | platform-independent newline |

DecimalFormat: from java.text.DecimalFormat. Formats numbers using a pattern string.
import java.text.DecimalFormat;
DecimalFormat df = new DecimalFormat("0.00");
System.out.println(df.format(3.14159)); // → 3.14
DecimalFormat df2 = new DecimalFormat("#,##0.00");
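System.out.println(df2.format(1234567.8)); // → 1,234,567.80
String.format: takes the same specifiers as printf, but returns the String instead of printing it.
String s = String.format("%.2f", 3.14159); // s = "3.14"
String t = String.format("%-10s|", "Hi"); // t = "Hi        |" (Hi left-aligned in a field of width 10)
Pass by value: understanding how Java passes data to methods.
Primitive types: byte · short · int · long · float · double · boolean · char. A method receives a copy of the value.
void addTen(int x){ x += 10; } // only changes the copy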
int n = 5;
addTen(n);
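System.out.println(n); // still 5!
Objects: the method receives a copy of the reference, so it can change the state of the original object.
void rename(Person p){ p.setName("Bob"); }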
Person person = new Person("Alice", 20);
rename(person);
System.out.println(person.getName()); // "Bob": the object was changed!
Random numbers and nested loops: generating random numbers and using nested iteration.
import java.util.Random;
Random rng = new Random();
rng.nextInt(n)       // random int: 0 to n-1 (inclusive)
rng.nextInt(10)      // 0–9
rng.nextInt(10) + 1  // 1–10 (shift range up)
rng.nextInt(6) + 1   // 1–6 (simulate die roll)
// General range [min, max] inclusive:
rng.nextInt(max - min + 1) + min
Random int in the range [a, b]: rng.nextInt(b - a + 1) + a
for(int i = 1; i <= 3; i++){
for(int j = 1; j <= 3; j++){
System.out.print(i * j + "\t");
}
System.out.println();
}
// Output: multiplication table 1..3
Unit testing with JUnit: testing individual methods in isolation.
import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.*;
public class PersonTest {
@Test
public void testGetName(){
Person p = new Person("Alice", 20);
assertEquals("Alice", p.getName());
}
}

| Assert | What it checks |
|---|---|
| assertEquals(expected, actual) | Two values are equal |
| assertNotEquals(a, b) | Two values are not equal |
| assertTrue(condition) | Condition is true |
| assertFalse(condition) | Condition is false |
| assertNull(obj) | Object is null |
| assertNotNull(obj) | Object is not null |

Each test method is annotated with @Test. If an assertion fails, the test stops immediately and reports failure. One assert per test is good practice.