Feature Engineering
Feature engineering is the process of using domain knowledge to turn raw data into features a model can learn from. Well-chosen features can significantly improve the performance of machine learning models.
Types of Feature Engineering
- Feature Creation: Generating new features from existing ones.
- Feature Transformation: Changing the scale, distribution, or encoding of features.
- Feature Selection: Choosing the most relevant features for your model.
Feature Creation Techniques
Combining Features
import pandas as pd

# Creating a BMI feature; assumes weight is in kilograms and height in centimetres
df['BMI'] = df['weight'] / (df['height'] / 100) ** 2
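# Binning age into categories (right-closed intervals; include_lowest keeps age 0, and the open upper edge keeps ages over 100)
df['age_group'] = pd.cut(df['age'], bins=[0, 18, 35, 50, 65, float('inf')], labels=['0-18', '19-35', '36-50', '51-65', '66+'], include_lowest=True)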
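To make the two feature-creation steps concrete, here is a minimal runnable sketch. The toy DataFrame and its values are hypothetical, and the sketch assumes height in centimetres, weight in kilograms, and ages recorded as whole years.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: height in centimetres, weight in kilograms.
df = pd.DataFrame({
    "height": [160.0, 175.0, 182.0],
    "weight": [55.0, 70.0, 95.0],
    "age": [0, 35, 70],
})

# BMI = weight (kg) / height (m)^2; dividing by 100 converts cm to m.
df["BMI"] = df["weight"] / (df["height"] / 100) ** 2

# Right-closed bins: include_lowest=True keeps age 0 in the first bin,
# and np.inf leaves the last bin open-ended.
df["age_group"] = pd.cut(
    df["age"],
    bins=[0, 18, 35, 50, 65, np.inf],
    labels=["0-18", "19-35", "36-50", "51-65", "66+"],
    include_lowest=True,
)

print(df[["BMI", "age_group"]])
```

Without `include_lowest=True`, an age of exactly 0 would fall outside the first interval and be binned as missing.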
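# Scaling: standardize to zero mean and unit variance
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
# fit_transform expects 2D input and returns an (n, 1) array; ravel() flattens it into a single column
df['scaled_feature'] = scaler.fit_transform(df[['feature']]).ravel()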
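In practice the scaler is usually fitted on the training rows only, so that statistics from held-out data do not leak into the model. The sketch below illustrates one way to do that. The toy data, the 70/30 split, and the `random_state` value are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical single-column dataset used only for illustration.
df = pd.DataFrame({"feature": np.arange(10, dtype=float)})

train, test = train_test_split(df, test_size=0.3, random_state=0)
train, test = train.copy(), test.copy()

scaler = StandardScaler()
# Learn the mean and standard deviation from the training rows only...
train["scaled_feature"] = scaler.fit_transform(train[["feature"]]).ravel()
# ...then reuse those statistics on the held-out rows.
test["scaled_feature"] = scaler.transform(test[["feature"]]).ravel()

print(train["scaled_feature"].mean(), train["scaled_feature"].std(ddof=0))
```

The scaled training column has mean 0 and population standard deviation 1 (up to floating-point error). The test column generally does not, because it is scaled with the training statistics.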
# One-hot encoding
df = pd.get_dummies(df, columns=['categorical_column'])
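As a small illustration of what one-hot encoding produces, the sketch below encodes a hypothetical three-row DataFrame. The column values are made up, and the `drop_first=True` variant is shown only as an option that some linear models benefit from.

```python
import pandas as pd

# Hypothetical data: one categorical column and one numeric column.
df = pd.DataFrame({
    "categorical_column": ["red", "green", "red"],
    "value": [1, 2, 3],
})

# Each category becomes its own indicator column, named <column>_<category>.
encoded = pd.get_dummies(df, columns=["categorical_column"])
print(encoded.columns.tolist())

# drop_first=True omits the first category, which avoids a redundant column.
reduced = pd.get_dummies(df, columns=["categorical_column"], drop_first=True)
print(reduced.columns.tolist())
```

The original `categorical_column` is replaced by the indicator columns, while untouched columns such as `value` are kept as they are.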
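# Feature Selection: keep features whose random-forest importance is at least the mean importance (the default threshold)
from sklearn.feature_selection import SelectFromModel
from sklearn.ensemble import RandomForestClassifier
selector = SelectFromModel(RandomForestClassifier(n_estimators=100))
# X is the feature matrix and y the target labels, both defined elsewhere
X_new = selector.fit_transform(X, y)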
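The selection snippet leaves `X` and `y` undefined, so here is a self-contained sketch that runs it end to end. The synthetic dataset from `make_classification` and the fixed `random_state` values are assumptions made for reproducibility, not part of the original example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Synthetic data: 10 features, of which only 3 carry signal.
X, y = make_classification(
    n_samples=200, n_features=10, n_informative=3, n_redundant=0, random_state=0
)

# With the default threshold, features whose importance is at least
# the mean importance across all features are kept.
selector = SelectFromModel(RandomForestClassifier(n_estimators=100, random_state=0))
X_new = selector.fit_transform(X, y)

# get_support() returns a boolean mask over the original columns.
mask = selector.get_support()
print(X.shape, "->", X_new.shape)
print("kept columns:", mask.nonzero()[0])
```

The mask from `get_support()` can be used to map the reduced matrix back to the original column names when `X` comes from a DataFrame.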