October 29, 2019 at 7:26 pm #149070
I am a newbie in regression, especially with categorical features. In Python we can encode them with pd.get_dummies(), which creates 1/0 dummy variables.
This makes sense when the number of categories is small, like Male/Female for Gender.
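For reference, a minimal sketch of what I mean by dummy variables, assuming a made-up Gender column (toy data, not the actual dataset):

```python
import pandas as pd

# Toy data, made up purely for illustration
df = pd.DataFrame({"Gender": ["Male", "Female", "Female", "Male"]})

# One column per category; dtype=int gives 1/0 instead of True/False
dummies = pd.get_dummies(df["Gender"], prefix="Gender", dtype=int)
print(dummies)
```

With only two categories this adds just two columns (Gender_Female and Gender_Male), which is why it is manageable here.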
But here we have a lot of categories. For example, Title would be unique for each book, so the “Title” column would have as many categories as there are rows. And it will not generalize to the test set, which would contain a completely different set of book titles. Or should we treat Title like an ID column which can't be used in the modelling?
And similarly with the other categorical variables like Author, Edition, etc.
And can reviews be treated as a numeric column?
And could someone explain the Ratings column? Is it the number of reviews?
And how can we use the Synopsis column?
I guess the Genre column can be treated with one-hot encoding. But do all categories of Genre occur in the test set? And I have the same question for Book Category as well.
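A minimal sketch of how one could check this, assuming hypothetical train and test DataFrames with a Genre column (the values below are made up, not from the actual dataset). It lists the test categories that never appear in training, then aligns the test dummies to the training columns so both have the same shape:

```python
import pandas as pd

# Hypothetical train/test data, made up for illustration
train = pd.DataFrame({"Genre": ["Fiction", "Science", "History", "Fiction"]})
test = pd.DataFrame({"Genre": ["Science", "Poetry"]})

# Categories present in test but never seen in train
unseen = set(test["Genre"]) - set(train["Genre"])
print("Unseen in train:", unseen)

# Encode each set separately
train_d = pd.get_dummies(train["Genre"], prefix="Genre", dtype=int)
test_d = pd.get_dummies(test["Genre"], prefix="Genre", dtype=int)

# Align test to the training columns: categories missing from test are
# filled with 0, and categories unseen in train are dropped
test_d = test_d.reindex(columns=train_d.columns, fill_value=0)
print(test_d)
```

In this sketch a row with an unseen genre simply ends up with all zeros, which is one possible way to handle it, not necessarily the best one.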