SHAP

1. Principle

https://github.com/shap/shap

output = base rate + shap(AGE) + shap(SEX) + shap(BP) + shap(BMI)

0.4 = 0.1 + 0.4 + (-0.3) + 0.1 + 0.1

A SHAP value of 0 means the feature contributed nothing to this prediction; a positive value pushes the prediction up and a negative value pushes it down, and the larger the absolute value, the stronger the feature's influence on the prediction.
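The additive decomposition above can be checked in plain Python; the numbers are the illustrative ones from the worked example, not output of a real model:

```python
# Illustrative numbers from the example above, not real model output
base_rate = 0.1
shap_values = {"AGE": 0.4, "SEX": -0.3, "BP": 0.1, "BMI": 0.1}

# SHAP's additivity property: the model output equals the base rate
# plus the sum of the per-feature SHAP values
output = base_rate + sum(shap_values.values())
print(round(output, 6))  # 0.4
```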

2. Usage

1. Single sample: the SHAP value of each feature

2. Single feature: relationship between the feature value and its SHAP value

3. Two features: relationship between the feature values and the SHAP values

3. Issues

Feature engineering

https://zhuanlan.zhihu.com/p/111296130

1. Feature preprocessing

0. Whether to deduplicate

1. Missing values

Mean imputation
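Mean imputation can be sketched with NumPy, filling each missing entry with the mean of its column over the observed values (the matrix is made-up toy data):

```python
import numpy as np

# Toy feature matrix with missing values encoded as NaN (made-up data)
X = np.array([[1.0, 2.0],
              [np.nan, 4.0],
              [3.0, np.nan]])

# Column-wise mean over the observed values only
col_means = np.nanmean(X, axis=0)  # [2.0, 3.0]

# Replace each NaN with its column mean (broadcasts across rows)
X_imputed = np.where(np.isnan(X), col_means, X)
print(X_imputed)
```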

2. Outliers

Detecting outliers

Value-range checks

Sigma rule (typically 3σ)

KNN

Box plot (IQR)
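Two of the detection rules above, the sigma rule and the box-plot (IQR) rule, can be sketched with NumPy; the data and the conventional 3σ / 1.5×IQR thresholds are illustrative assumptions:

```python
import numpy as np

# Toy 1-D feature with one obvious outlier (made-up data)
x = np.array([10.0, 11.0, 9.0, 10.5, 9.5, 10.2, 100.0])

# 3-sigma rule: flag points more than 3 standard deviations from the mean
mu, sigma = x.mean(), x.std()
sigma_outliers = x[np.abs(x - mu) > 3 * sigma]

# Box-plot (IQR) rule: flag points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
q1, q3 = np.percentile(x, [25, 75])
iqr = q3 - q1
iqr_outliers = x[(x < q1 - 1.5 * iqr) | (x > q3 + 1.5 * iqr)]
print(iqr_outliers)  # only the 100.0 point
```

Note that on this small sample the single extreme point inflates the standard deviation enough that the 3-sigma rule misses it, while the IQR rule still flags it; that robustness is one reason the box-plot rule is often preferred on small or skewed samples.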

Handling outliers

Removal

Mean imputation

2. Feature representation

Feature types: numerical, text, and categorical features

1. Numerical features

1. Use the raw value directly

2. Discretization

Bucketing (binning)
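Bucketing can be sketched with `np.digitize`; the bin edges here are arbitrary illustrative choices:

```python
import numpy as np

# Toy ages to discretize (made-up data)
ages = np.array([3, 17, 25, 42, 68])

# Arbitrary illustrative bin edges: [0,18), [18,40), [40,65), [65,inf)
edges = np.array([18, 40, 65])

# digitize returns the bucket index for each value
buckets = np.digitize(ages, edges)
print(buckets)  # [0 0 1 2 3]
```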

2. Categorical features

1. One-hot encoding

2. Embedding

3. Other

CatBoost
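One-hot encoding from the list above can be sketched without any ML library (the categories are made up):

```python
import numpy as np

# Toy categorical feature (made-up data)
colors = ["red", "green", "blue", "green"]

# Build a stable category -> column index mapping
categories = sorted(set(colors))  # ['blue', 'green', 'red']
index = {c: i for i, c in enumerate(categories)}

# One row per sample, one column per category, a single 1 per row
one_hot = np.zeros((len(colors), len(categories)), dtype=int)
for row, c in enumerate(colors):
    one_hot[row, index[c]] = 1
print(one_hot)
```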

3. Feature selection

https://blog.csdn.net/Datawhale/article/details/120582526

Roughly three categories: filter, wrapper, and embedded methods
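A minimal filter-style example scores each feature independently of any model, here by variance; the data and the 0.1 threshold are arbitrary assumptions:

```python
import numpy as np

# Toy feature matrix: column 1 is nearly constant (made-up data)
X = np.array([[1.0, 0.0, 3.0],
              [2.0, 0.0, 1.0],
              [3.0, 0.1, 2.0],
              [4.0, 0.0, 4.0]])

# Filter method: score each feature without fitting a model,
# here by variance, and drop low-variance (uninformative) columns
variances = X.var(axis=0)
keep = variances > 0.1  # arbitrary threshold (assumption)
X_selected = X[:, keep]
print(keep)  # the near-constant column is dropped
```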

Sparse features

What are sparse features?

Features with sparse data are features that have mostly zero values. This is different from features with missing data.

Why is machine learning difficult with sparse features?

Common problems with sparse features include:

  1. If the model has many sparse features, it will increase the space and time complexity of models. Linear regression models will fit more coefficients, and tree-based models will have greater depth to account for all features.
  2. Model algorithms and diagnostic measures might behave in unknown ways if the features have sparse data. Kuss [2002] shows that goodness-of-fit tests are flawed when the data is sparse.
  3. If there are too many features, models fit the noise in the training data. This is called overfitting. When models overfit, they are unable to generalize to newer data when they are put in production. This negatively impacts the predictive power of models.
  4. Some models may underestimate the importance of sparse features and give preference to denser features even though the sparse features may have predictive power. Tree-based models are notorious for behaving like this. For example, random forests overpredict the importance of features that have more categories than those that have fewer categories.

Methods for dealing with sparse features

  1. Removing features from the model

  2. Making the features dense

  3. Using models that are robust to sparse features
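Method 1, removing sparse features, can be sketched by measuring per-column sparsity; the toy matrix and the 80% cutoff are assumptions:

```python
import numpy as np

# Toy matrix where column 0 is mostly zeros (made-up data)
X = np.array([[0, 1.0],
              [0, 2.0],
              [0, 3.0],
              [5, 4.0],
              [0, 5.0]])

# Fraction of zero entries per feature: sparsity, not missingness
sparsity = (X == 0).mean(axis=0)  # [0.8, 0.0]

# Drop features that are zero in 80% or more of the rows (assumption)
X_dense = X[:, sparsity < 0.8]
print(sparsity)
```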

References

https://www.kdnuggets.com/2021/01/sparse-features-machine-learning-models.html

