Statistics in STEM Study Sheet
Module 1 - Descriptive Statistics and Probability
1.1 What is data?
- Definition: Raw facts and figures representing quantities, characters, or symbols.
- Types:
  - Qualitative (descriptive, non-numeric)
  - Quantitative (numeric)
1.2 Data Visualization
- Definition: Graphical representation of information and data to understand trends, outliers, and patterns.
- Tools: Charts, graphs, maps.
1.3 Python for Data Visualization
- Libraries: matplotlib, seaborn, plotly.
- Purpose: Create static, animated, and interactive visualizations.
1.4 Data Frames
- Definition: Two-dimensional, size-mutable, heterogeneous tabular data with labeled axes.
- Use: Essential in data analysis, especially with Python's pandas library.
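A minimal sketch of the properties listed above, assuming pandas is installed. The column names and values are made up for illustration.

```python
import pandas as pd

# Hypothetical data: one qualitative and one quantitative column.
df = pd.DataFrame({
    "student": ["Ana", "Ben", "Cal"],  # qualitative (labels)
    "score": [82, 91, 77],             # quantitative (numbers)
})

print(df.shape)    # (number of rows, number of columns)
print(df.dtypes)   # heterogeneous: each column keeps its own type

# Labeled axes: a column is selected by name, then summarized.
mean_score = df["score"].mean()

# Size-mutable: a new column can be added after creation.
df["passed"] = df["score"] >= 80
print(df)
```

Calling df.describe() on the same frame prints a summary (count, mean, standard deviation, quartiles) of the numeric columns.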
1.5 Bar Charts
- Represents: Categorical data.
- Features: Rectangular bars with lengths proportional to the values they represent.
1.6 Pie Charts
- Represents: Numerical proportion.
- Features: Circular charts divided into slices or segments; each slice represents a percentage of the whole.
1.7 Scatter Plots
- Represents: Values of two different variables.
- Features: Dots plotted on x and y axes to observe relationships.
1.8 Line Charts
- Represents: Information over intervals or time.
- Features: Data points connected by straight line segments.
1.9 Box Plots (Whisker Plots)
- Represents: Data through quartiles.
- Features: Shows median, interquartile range, and potential outliers.
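The numbers a box plot draws can be computed directly. The sketch below uses Python's statistics module on made-up values. The 1.5 × IQR fence for flagging outliers is a common convention assumed here, not something this sheet prescribes, and quartile values can differ slightly between calculation methods.

```python
import statistics

data = [1, 3, 4, 5, 7, 8, 9, 11, 30]  # made-up values

# n=4 splits the data into quarters and returns the three cut points.
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1  # interquartile range: the width of the box

# Assumed convention: points beyond 1.5 * IQR from the box are outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(q1, median, q3, iqr, outliers)
```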
1.10 Histograms
- Represents: Frequency of data points in specified ranges.
- Features: Adjacent bars touch because the ranges (bins) cover a continuous numeric scale, unlike the separated bars of a bar chart.
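Behind every histogram is a count of how many values fall into each range. A minimal sketch with made-up values and an assumed bin width of 3, printed as a text histogram:

```python
from collections import Counter

values = [1, 2, 2, 3, 5, 6, 6, 7, 9]  # made-up values
bin_width = 3  # assumed width; bins are [0, 3), [3, 6), [6, 9), [9, 12)

# Map each value to the left edge of its bin, then count per bin.
counts = Counter((v // bin_width) * bin_width for v in values)

for start in sorted(counts):
    print(f"[{start}, {start + bin_width}): {'#' * counts[start]}")
```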
1.11 Survey Sampling
- Definition: Selecting a subset from a population to estimate characteristics of the whole.
- Techniques: Random sampling, stratified sampling, cluster sampling.
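The three techniques can be sketched with the standard random module. The population, group labels, and sample sizes below are hypothetical, and the seed is fixed only so the sketch is reproducible.

```python
import random

random.seed(0)  # fixed seed for reproducibility only

# Hypothetical population of 100 members: 60 in group "A", 40 in group "B".
population = [("A", i) for i in range(60)] + [("B", i) for i in range(40)]

# Random sampling: every member has the same chance of being chosen.
simple_sample = random.sample(population, 10)

# Stratified sampling: sample each group in proportion to its size.
stratified_sample = []
for group in ("A", "B"):
    stratum = [p for p in population if p[0] == group]
    k = round(10 * len(stratum) / len(population))
    stratified_sample.extend(random.sample(stratum, k))

# Cluster sampling: split into clusters, then keep whole clusters at random.
clusters = [population[i:i + 10] for i in range(0, len(population), 10)]
cluster_sample = [p for cluster in random.sample(clusters, 2) for p in cluster]
```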
1.12 Measures of Center
- Definition: Values describing the center point of a dataset.
- Types:
  - Mean (average)
  - Median (middle value)
  - Mode (most frequent value)
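All three measures are available in Python's statistics module. A short sketch on a made-up dataset:

```python
import statistics

data = [2, 3, 3, 5, 7, 10]  # made-up values, already sorted

mean = statistics.mean(data)      # sum of values / number of values = 30 / 6
median = statistics.median(data)  # even count: average of the two middle values
mode = statistics.mode(data)      # the value that appears most often

print(mean, median, mode)
```

Here the mean (5) is larger than the median (4) because the single large value 10 pulls the average upward.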
1.13 Measures of Variability
- Definition: Quantify the spread or dispersion of data.
- Key Measures:
  - Range
  - Variance
  - Standard Deviation
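A short sketch of the three measures on a made-up dataset. It uses the population versions (pvariance, pstdev), which divide by n. The sample versions (variance, stdev) divide by n - 1 and give slightly larger values.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values with mean 5

data_range = max(data) - min(data)             # largest value minus smallest
population_var = statistics.pvariance(data)    # mean of squared deviations
population_std = statistics.pstdev(data)       # square root of the variance
sample_var = statistics.variance(data)         # divides by n - 1 instead of n

print(data_range, population_var, population_std, sample_var)
```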
1.14 Probability
- Definition: Measures the likelihood of an event occurring.
- Scale: 0 (impossibility) to 1 (certainty).
1.15 Addition Rule and Complements
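- Addition Rule: Finds the probability of the union of two events: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$. If the events cannot occur together, the last term is 0.
- Complement: The event that A does not occur, denoted A′, with $P(A') = 1 - P(A)$.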
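Both rules can be checked by counting outcomes for one roll of a fair six-sided die. The events A and B below are chosen for illustration only.

```python
from fractions import Fraction

sample_space = set(range(1, 7))  # one roll of a fair six-sided die
A = {2, 4, 6}  # the roll is even
B = {4, 5, 6}  # the roll is greater than 3

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))

# Addition rule: subtract the overlap so it is not counted twice.
p_union = prob(A) + prob(B) - prob(A & B)

# Complement rule: probability that A does not occur.
p_not_a = 1 - prob(A)

print(p_union, p_not_a)
```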
1.16 Multiplication Rule and Independence
- Multiplication Rule: Computes the probability of both events occurring: $P(A \cap B) = P(A)\,P(B \mid A)$.
- Independence: Two events are independent if the occurrence of one doesn't change the probability of the other; in that case $P(A \cap B) = P(A)\,P(B)$.
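Independence can be tested by comparing the probability of both events occurring with the product of the individual probabilities. The sketch below enumerates two fair dice. The three events are illustrative choices.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event given as a predicate on (first, second)."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def first_even(o):
    return o[0] % 2 == 0

def second_six(o):
    return o[1] == 6

def high_sum(o):
    return o[0] + o[1] >= 10

# The two dice do not influence each other: joint equals the product.
p_joint_indep = prob(lambda o: first_even(o) and second_six(o))
is_independent = p_joint_indep == prob(first_even) * prob(second_six)

# The sum depends on the first die: joint differs from the product.
p_joint_dep = prob(lambda o: first_even(o) and high_sum(o))
is_dependent = p_joint_dep != prob(first_even) * prob(high_sum)

print(is_independent, is_dependent)
```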
1.17 Conditional Probability
- Definition: Probability of an event given that another event has occurred: $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$, for $P(B) > 0$.
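A conditional probability can be computed as the probability that both events occur divided by the probability of the given event. The sketch below does this for two fair dice, with events chosen for illustration: a sum of at least 10, given that the first die shows 6.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event given as a predicate on (first, second)."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def high_sum(o):        # event A: the sum is at least 10
    return o[0] + o[1] >= 10

def first_is_six(o):    # event B: the first die shows 6
    return o[0] == 6

p_a = prob(high_sum)
p_b = prob(first_is_six)
p_a_and_b = prob(lambda o: high_sum(o) and first_is_six(o))

# P(A | B) = P(A and B) / P(B)
p_a_given_b = p_a_and_b / p_b

print(p_a, p_a_given_b)
```

Knowing that the first die shows 6 raises the probability of a high sum from 1/6 to 1/2.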
1.18 Bayes' Theorem
- Purpose: Revise predictions based on new evidence.
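- Formula: $P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$. It relates the posterior (updated) probability $P(A \mid B)$ to the prior probability $P(A)$ using the likelihood $P(B \mid A)$ and the normalizing constant $P(B)$.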
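A worked sketch of Bayes' theorem for a hypothetical diagnostic test. All three input rates are invented for illustration, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

# Hypothetical inputs (not real medical data).
prior = Fraction(1, 100)                 # P(condition)
sensitivity = Fraction(95, 100)          # P(positive | condition): the likelihood
false_positive_rate = Fraction(5, 100)   # P(positive | no condition)

# Normalizing constant: total probability of a positive result.
p_positive = sensitivity * prior + false_positive_rate * (1 - prior)

# Posterior: P(condition | positive) = likelihood * prior / normalizing constant
posterior = sensitivity * prior / p_positive

print(posterior, float(posterior))
```

Even with a positive result, the posterior stays near 16% under these assumed rates, because the condition is rare to begin with.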
1.19 Combinations and Permutations
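- Combinations: Selections where order doesn't matter. The number of ways to choose r items from n is $\binom{n}{r} = \frac{n!}{r!\,(n-r)!}$.
- Permutations: Arrangements where order matters. The number of ways to arrange r items chosen from n is $\frac{n!}{(n-r)!}$.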
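The difference shows up when counting and when listing. The sketch below uses math.comb and math.perm (available in Python 3.8 and later) and itertools, with made-up items.

```python
import math
from itertools import combinations, permutations

# Counting: choose 2 items out of 5.
n_combinations = math.comb(5, 2)  # order ignored
n_permutations = math.perm(5, 2)  # order counted

# Listing: the same idea on three made-up items.
items = ["A", "B", "C"]
pairs_unordered = list(combinations(items, 2))  # ('A', 'B') appears once
pairs_ordered = list(permutations(items, 2))    # ('A', 'B') and ('B', 'A') both appear

print(n_combinations, n_permutations)
print(pairs_unordered)
print(pairs_ordered)
```

Each unordered pair corresponds to 2! = 2 ordered pairs, which is why the permutation counts are twice the combination counts here.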
Key Terms & Formulas
- Describe: Summarizing main characteristics of a dataset.
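- Mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
- Median: Middle value in an ordered dataset.
- Variance: $\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$ (population form; the sample variance $s^2$ divides by $n - 1$ instead of $n$)
- Standard Deviation: $\sigma = \sqrt{\sigma^2}$, the square root of the variance.
- Distribution: How data points are spread. Types include uniform, normal, skewed.
- Measure of Central Tendency: Mean, median, mode.
- Deviation: $d_i = x_i - \bar{x}$, the difference between a data point and the mean.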
Examples:
- Mean of [2, 4, 6] = (2 + 4 + 6) / 3 = 12 / 3 = 4
- Standard Deviation: If variance is 9, then standard deviation is 3.
Remember to practice with real datasets to solidify these concepts and always refer back to this sheet for quick reference!