Aspiring Data Science
373 subscribers
425 photos
11 videos
10 files
1.87K links
Заметки экономиста о программировании, прогнозировании и принятии решений, научном методе познания.
Контакт: @fingoldo

I call myself a data scientist because I know just enough math, economics & programming to be dangerous.
Download Telegram
#masters #featureselection #redundancy #multicollinearity

"In any application involving a large number of variables, it’s nice to be able to identify sets of variables that have significant redundancy. Of course, we may be unlucky and have a situation in which the small differences between largely redundant variables contain the useful information. However, this is the exception. In most applications,it is the redundant information that is most important; if some type of effect impacts multiple variables, it’s probably important. Because dealing with fewer variables is always better, if we can identify groups of variables that have great intra-group redundancy, we may be able to eliminate many variables from consideration, focusing on a weighted average of representatives from each group, or perhaps focusing on a single factor that is highly correlated with a redundant group. Or we might just be interested in the fact of redundancy, garnering useful insight from it."

О таком же подходе, когда несколько коллинеарных факторов заменяются одним, говорил и Эрни Чан. Тим для поиска групп связанных факторов использует PCA.