Certainly! Here's an R program that takes a data frame of continuous variables, tests each one for normality, and replaces normally distributed variables with their mean and skewed variables with their median:
# Function to check if a variable is normally distributed
is_normal <- function(x) {
  shapiro_test <- shapiro.test(x)
  return(shapiro_test$p.value > 0.05) # TRUE if the p-value is greater than 0.05
}
# Function to replace normally distributed variables with mean and skewed variables with median
replace_with_mean_or_median <- function(data) {
  replaced_data <- data
  for (col in names(data)) {
    if (is.numeric(data[[col]])) {
      if (is_normal(data[[col]])) {
        replaced_data[[col]] <- mean(data[[col]], na.rm = TRUE)
      } else {
        replaced_data[[col]] <- median(data[[col]], na.rm = TRUE)
      }
    }
  }
  return(replaced_data)
}
# Example data (replace this with your own data)
data <- data.frame(
  var1 = rnorm(100),     # Normally distributed
  var2 = rpois(100, 5),  # Skewed (Poisson distribution)
  var3 = runif(100),     # Uniform: symmetric, but not normally distributed
  var4 = rgamma(100, 2)  # Skewed (Gamma distribution)
)
# Replace normally distributed variables with mean and skewed variables with median
replaced_data <- replace_with_mean_or_median(data)
# Print the original and replaced data
cat("Original Data:\n")
print(data)
cat("\nReplaced Data:\n")
print(replaced_data)
In this program:
We define a function is_normal that checks whether a variable is normally distributed using the Shapiro-Wilk test. A p-value above 0.05 means the test does not reject normality at the 5% level, and the variable is then treated as normal.
We define a function replace_with_mean_or_median that iterates through each numeric column of the input data frame and overwrites every value in it with the column mean if the column is treated as normal, or with the column median otherwise.
We create an example data frame (data) with four numeric variables: var1 is drawn from a normal distribution, var2 and var4 are skewed, and var3 is uniform, so it is symmetric but not normal and will usually be replaced with its median.
We apply the replace_with_mean_or_median function to the example data and print both the original and replaced data frames.
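One practical caveat with the Shapiro-Wilk check: shapiro.test() stops with an error when it receives fewer than 3 or more than 5000 non-missing values, or when all values are identical. Below is a minimal sketch of a more defensive check. The name is_normal_safe, the alpha parameter, and the choice to treat untestable columns as non-normal (so they fall back to the median) are assumptions made for illustration, not part of the program above.

```r
# Hypothetical helper (not in the original program): a defensive Shapiro-Wilk check.
# Assumption: a column that cannot be tested is treated as non-normal,
# so the caller would fall back to the median.
is_normal_safe <- function(x, alpha = 0.05) {
  x <- x[!is.na(x)]                          # work on observed values only
  n <- length(x)
  if (n < 3 || n > 5000) return(FALSE)       # shapiro.test() accepts 3 to 5000 values
  if (length(unique(x)) == 1) return(FALSE)  # constant column: the test would error
  shapiro.test(x)$p.value > alpha            # TRUE if normality is not rejected
}

is_normal_safe(c(1, 2))      # FALSE: too few values to test
is_normal_safe(rep(5, 10))   # FALSE: all values identical
```

Swapping this in for is_normal inside replace_with_mean_or_median would keep the loop from stopping on very short, very long, or constant columns.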
You can replace the data variable with your own dataset to perform the same operation on your data.
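Real datasets often contain missing values, and a common reason to choose between mean and median is to fill those gaps rather than overwrite every observation. The sketch below assumes that interpretation. The function name impute_mean_or_median and the alpha parameter are hypothetical, and columns that shapiro.test() cannot handle (fewer than 3 or more than 5000 observed values, or constant values) are assumed to use the median.

```r
# Hypothetical variant (not in the original program): fill only the missing
# values of each numeric column, leaving observed values unchanged.
impute_mean_or_median <- function(data, alpha = 0.05) {
  for (col in names(data)) {
    x <- data[[col]]
    if (!is.numeric(x) || !anyNA(x)) next    # skip non-numeric or complete columns
    observed <- x[!is.na(x)]
    if (length(observed) == 0) next          # nothing to estimate a centre from
    # Same rule as the program above: mean if Shapiro-Wilk does not reject
    # normality, median otherwise. Untestable columns use the median.
    testable <- length(observed) >= 3 && length(observed) <= 5000 &&
      length(unique(observed)) > 1
    use_mean <- testable && shapiro.test(observed)$p.value > alpha
    fill <- if (use_mean) mean(observed) else median(observed)
    x[is.na(x)] <- fill
    data[[col]] <- x
  }
  data
}

# Example with a few missing values
set.seed(42)
df <- data.frame(a = rnorm(50), b = rexp(50), label = letters[rep(1:5, 10)])
df$a[c(3, 10)] <- NA
df$b[c(7, 20)] <- NA
filled <- impute_mean_or_median(df)
sum(is.na(filled))   # 0: every gap has been filled
```

Only the rows that were NA change, so the spread of the observed data is preserved apart from the imputed entries.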