Prompt Detail:
Certainly! With a tabular dataset containing 100K rows and 3 columns (Age, Gender, and Income), there are several insights and statistical analyses you can extract using Python's Pandas library. Here's a step-by-step guide:
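The snippets in this guide assume a DataFrame named df is already loaded. If you want to try them before pointing them at the real data, here is a minimal sketch of a synthetic stand-in. The value ranges, the two Gender labels, and the income distribution are all made-up assumptions, not properties of your dataset.

```python
import numpy as np
import pandas as pd

# Synthetic placeholder data: every range and label below is an assumption.
rng = np.random.default_rng(42)
n_rows = 100_000

df = pd.DataFrame({
    "Age": rng.integers(18, 80, size=n_rows),               # ages 18..79
    "Gender": rng.choice(["Female", "Male"], size=n_rows),  # hypothetical labels
    # Roughly bell-shaped incomes, floored at zero and rounded to cents.
    "Income": np.clip(rng.normal(50_000, 15_000, size=n_rows), 0, None).round(2),
})

print(df.shape)
print(df.head())
```

Replace this with your own loading step (for example pd.read_csv) once the real file is available.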
Summary Statistics:
Descriptive Statistics: Use the describe() function to compute summary statistics for numeric columns like Age and Income, including count, mean, standard deviation, minimum, 25th percentile, median (50th percentile), 75th percentile, and maximum.

```python
import pandas as pd

# Assuming your DataFrame is named df
summary_stats = df.describe()
```
Group-wise Summary Statistics: Calculate summary statistics based on Gender groups. This will give insights into differences in Age and Income between different genders.
```python
group_stats = df.groupby('Gender').describe()
```
Meaningful Graphs:
Histograms by Age and Income: Create histograms to visualize the distribution of Age and Income separately.
```python
import matplotlib.pyplot as plt

# Histogram for Age
df['Age'].plot(kind='hist', bins=20, color='skyblue', edgecolor='black')
plt.title('Age Distribution')
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.show()

# Histogram for Income
df['Income'].plot(kind='hist', bins=20, color='lightgreen', edgecolor='black')
plt.title('Income Distribution')
plt.xlabel('Income')
plt.ylabel('Frequency')
plt.show()
```
Box Plots by Gender: Generate box plots to compare the distribution of Age and Income between different genders.
```python
# Box plot for Age by Gender
df.boxplot(column='Age', by='Gender')
plt.title('Age Distribution by Gender')
plt.ylabel('Age')
plt.show()

# Box plot for Income by Gender
df.boxplot(column='Income', by='Gender')
plt.title('Income Distribution by Gender')
plt.ylabel('Income')
plt.show()
```
Scatter Plot of Income vs. Age: Plot Income against Age to explore any potential relationships between the two variables.
```python
df.plot.scatter(x='Age', y='Income', color='teal')
plt.title('Income vs. Age')
plt.xlabel('Age')
plt.ylabel('Income')
plt.show()
```
Additional Analysis:
Correlation: Calculate the correlation coefficient between Age and Income to quantify the strength and direction of the relationship (if any).
```python
correlation = df['Age'].corr(df['Income'])
print("Correlation between Age and Income:", correlation)
```
Group-wise Mean Income: Compute the mean Income for each Gender group to compare the average income between different genders.
```python
mean_income_by_gender = df.groupby('Gender')['Income'].mean()
```
Age Distribution by Income Bracket: Divide the dataset into income brackets and analyze the distribution of Age within each bracket.
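One way to sketch the income-bracket analysis is with pd.qcut, which splits Income into equal-sized quantile groups. The choice of four brackets, the bracket labels, and the IncomeBracket column name are illustrative assumptions, and the small synthetic DataFrame is only there so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (made-up values); use your own df instead.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Age": rng.integers(18, 80, size=1_000),
    "Gender": rng.choice(["Female", "Male"], size=1_000),
    "Income": np.clip(rng.normal(50_000, 15_000, size=1_000), 0, None).round(2),
})

# Split Income into four quantile-based brackets.
# The labels are illustrative, not a standard convention.
labels = ["Low", "Lower-middle", "Upper-middle", "High"]
df["IncomeBracket"] = pd.qcut(df["Income"], q=4, labels=labels)

# Summary statistics of Age within each income bracket.
age_by_bracket = df.groupby("IncomeBracket", observed=True)["Age"].describe()
print(age_by_bracket)
```

If you would rather use fixed income thresholds than quantiles, pd.cut with an explicit list of bin edges is the usual alternative.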
These analyses and visualizations provide insights into the distribution, relationships, and characteristics of the Age and Income data in your dataset, helping you to better understand the underlying patterns and trends.