Introduction to Tidymodels in R
Basic Introduction to Tidymodels using R
In this blog post, we will explore Tidymodels, a collection of packages for modeling and machine learning in R. This is part of what I learnt in the R for Data Science Online Learning Community.
What is Tidymodels?
Tidymodels is a suite of packages that provides a consistent and flexible approach to modeling in R. It follows the design principles of the tidyverse, an ecosystem of R packages designed for data science, and works alongside tidyverse packages such as dplyr and ggplot2.
Installing Tidymodels
To install Tidymodels, you can use the install.packages() function in R.
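A minimal sketch of that installation step, assuming a CRAN mirror is configured. The requireNamespace() guard is an addition for illustration: it skips the download when the package is already installed.

```r
# Install the tidymodels meta-package from CRAN only if it is missing.
# The guard is an illustrative addition, not part of the original post.
if (!requireNamespace("tidymodels", quietly = TRUE)) {
  install.packages("tidymodels")
}
```

Installing the meta-package also pulls in the core modeling packages it depends on, such as rsample, parsnip, and yardstick.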
Basic Usage of Tidymodels
Let’s go through a simple example of using Tidymodels for linear regression.
Loading the necessary libraries
library(tidymodels)
tidymodels_prefer()
Preparing the data
For this example, we’ll use the mtcars dataset that comes with R. Let’s split this data into a training set and a testing set.
data(mtcars)
set.seed(123) # For reproducibility
car_split <- initial_split(mtcars, prop = 0.75)
car_train <- training(car_split)
car_test <- testing(car_split)
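Before modeling, it can help to confirm that the split behaves as expected. The sketch below repeats the split from the post and checks the size of each partition; the overlap check on row names is an extra sanity test added here for illustration.

```r
library(tidymodels)

data(mtcars)
set.seed(123)  # same seed as in the post
car_split <- initial_split(mtcars, prop = 0.75)
car_train <- training(car_split)
car_test  <- testing(car_split)

# Number of rows in each partition
n_train <- nrow(car_train)
n_test  <- nrow(car_test)
n_train
n_test

# Rows that appear in both sets (there should be none)
shared_rows <- intersect(rownames(car_train), rownames(car_test))
length(shared_rows)
```

With prop = 0.75, roughly three quarters of the 32 cars go to training and the remainder are held out for testing.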
Building the model
We’ll try to predict miles per gallon (mpg) based on the other variables in the dataset. First, let’s specify our model:
lm_spec <- linear_reg() %>%
  set_engine("lm") %>%
  set_mode("regression")
Next, let’s fit our model to the training data:
lm_fit <- lm_spec %>%
  fit(mpg ~ ., data = car_train)
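Once the model is fitted, one way to look at the estimated coefficients is tidy(), which returns them as a tibble. This is a sketch that rebuilds the objects from the post so it runs on its own; the name coefs is chosen here for illustration.

```r
library(tidymodels)

data(mtcars)
set.seed(123)
car_split <- initial_split(mtcars, prop = 0.75)
car_train <- training(car_split)

lm_fit <- linear_reg() %>%
  set_engine("lm") %>%
  set_mode("regression") %>%
  fit(mpg ~ ., data = car_train)

# One row per model term: estimate, standard error, t statistic, p-value
coefs <- tidy(lm_fit)
coefs
```

Because the formula mpg ~ . uses every other column as a predictor, the table has one row for the intercept plus one row for each of the ten remaining variables.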
We can now use this model to make predictions on the test data:
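# predict() returns a tibble with a single .pred column
predictions <- lm_fit %>%
  predict(new_data = car_test)
# Put the observed mpg values next to the predictions for comparison
predictions <- car_test %>%
  select(mpg) %>%
  bind_cols(predict(lm_fit, car_test))
predictions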
                     mpg    .pred
Mazda RX4 Wag       21.0 21.71246
Valiant             18.1 20.64933
Merc 450SE          16.4 12.94019
Merc 450SL          17.3 14.67981
Lincoln Continental 10.4 10.79525
Toyota Corona       21.5 25.01139
Camaro Z28          13.3 13.08460
Pontiac Firebird    19.2 16.27870
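Comparing rows by eye only goes so far, so a natural follow-up is to summarize the test-set error with yardstick, which is loaded as part of Tidymodels. The sketch below is one possible way to do this and is not part of the original workflow; it rebuilds the objects from the post and the name test_metrics is chosen for illustration.

```r
library(tidymodels)

data(mtcars)
set.seed(123)
car_split <- initial_split(mtcars, prop = 0.75)
car_train <- training(car_split)
car_test  <- testing(car_split)

lm_fit <- linear_reg() %>%
  set_engine("lm") %>%
  set_mode("regression") %>%
  fit(mpg ~ ., data = car_train)

predictions <- car_test %>%
  dplyr::select(mpg) %>%
  bind_cols(predict(lm_fit, car_test))

# Default regression metrics: RMSE, R-squared, and MAE on the test set
test_metrics <- metrics(predictions, truth = mpg, estimate = .pred)
test_metrics
```

With only eight test rows these numbers are noisy, so they are best read as a rough indication of fit rather than a reliable estimate.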
Conclusion
In this post, we have introduced Tidymodels, a powerful tool for modeling in R. We have seen how to install it, how to split data, fit a linear regression, and make predictions, and how it works together with the tidyverse ecosystem. With Tidymodels, you can streamline your modeling workflow and make it more consistent and reproducible. In my next blog post, I will explain how to tune hyperparameters and how to perform cross-validation.