特征稀疏

What are sparse features?

Features with sparse data are features that have mostly zero values. This is different from features with missing data.

Why is machine learning difficult with sparse features?

Common problems with sparse features include:

  1. If the model has many sparse features, it will increase the space and time complexity of models. Linear regression models will fit more coefficients, and tree-based models will have greater depth to account for all features.
  2. Model algorithms and diagnostic measures might behave in unknown ways if the features have sparse data. Kuss [2002] shows that goodness-of-fit tests are flawed when the data is sparse.
  3. If there are too many features, models fit the noise in the training data. This is called overfitting. When models overfit, they are unable to generalize to newer data when they are put in production. This negatively impacts the predictive power of models.
  4. Some models may underestimate the importance of sparse features and given preference to denser features even though the sparse features may have predictive power. Tree-based models are notorious for behaving like this. For example, random forests overpredict the importance of features that have more categories than those features that have fewer categories.

Methods for dealing with sparse features

  1. Removing features from the model

  2. Make the features dense

  3. Using models that are robust to sparse features

参考

https://www.kdnuggets.com/2021/01/sparse-features-machine-learning-models.html#:~:text=%20Methods%20for%20dealing%20with%20sparse%20features%20,that%20are%20robust%20to%20sparse%20features%20More%20

Author

Lavine Hu

Posted on

2021-09-30

Updated on

2022-04-12

Licensed under

# Related Post
  1.特征重要性
  2.shap
  3.feature scale
  4.特征向量化
  5.特征工程
Comments

:D 一言句子获取中...