reading-notes

reading notes for code fellows


Project maintained by dLeigh01 Hosted on GitHub Pages — Theme by mattgraham

Linear Regression

Linear regression will be important going forwards in analyzing and predicting data from data sets.

How to Run Linear Regression in Python

Scikit-learn is a Python module used for machine learning that contains functions for regression, classification, clustering, model selection, and dimensionality reduction. For linear regression, you may want to use the linear_model module, which contains methods intended for regression in which the target value is expected to be a linear combination of the input values. The first step is always to import your data, but once you’ve done that and set it up as a pandas dataframe you can import LinearRegression from the linear_model module and create a variable for it. The most important functions within this for what we’re doing are

With the coefficients and intercept, you can create a useful dataframe to see their correlation and put the corresponding data into a .scatter() plot. Using .predict(x)[range], you can plot your predictions against the actual data. Since this often produces some errors, you can use .mean((original data - prediction) ** 2) to get the mean squared error.

Generally, when actually implementing linear regression on real data, you won’t use the entire data set, it will instead be split into training and test data sets. To properly randomize the data split, scikit has a function called train_test_split to do it for you. To test whether you still have errors in your data, you can use a residual plot, which should leave your data points randomly scattered around the zero line without structure.

Things I’d Like to Know More About

I’m still not 100% clear on where some of the commands in the demo were coming from, and I will be excited to go a bit more in depth into all of the steps here.

Discussion

Linear regression within coding seems a lot like linear regression in mathematics in that I think the hardest piece of it is just going to be figuring out all the values rather than figuring out how to code it.

[< table of contents]

[< home]