Linear regression, in essence, is about computing the line of best fit given some data points. We can use NumPy's polyfit(~) method to find this line of best fit easily.

Here's a toy dataset, which we will visualize using matplotlib:

import matplotlib.pyplot as plt

x = [1,2,4,5]
y = [1,4,5,6]
plt.scatter(x, y)
plt.show()

This produces the following:

Our goal is to fit a straight line through the data points. We do this using NumPy's polyfit(~) method:

import numpy as np

fitted_coeff = np.polyfit(x, y, deg=1)
print(fitted_coeff)

[1.1 0.7]

Here, deg=1 means that we want to fit a degree-1 polynomial, that is, the line y=mx+b. The returned values are the coefficients of the line of best fit, ordered from the highest degree to the lowest, so m=1.1 and b=0.7.
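As a small sketch of how the coefficient ordering can be used, the snippet below unpacks the slope and intercept and evaluates the fitted line at the original x values with np.polyval, which expects coefficients in the same highest-degree-first order. The names m, b and predicted are chosen here for illustration only.

```python
import numpy as np

x = [1, 2, 4, 5]
y = [1, 4, 5, 6]

# For deg=1 the coefficients come back as [m, b], highest degree first.
m, b = np.polyfit(x, y, deg=1)

# np.polyval evaluates m*x + b at each x using the same ordering.
predicted = np.polyval([m, b], x)

print(m, b)
print(predicted)
```

Unpacking into named variables tends to read more clearly than indexing fitted_coeff[0] and fitted_coeff[1] later on.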

Let's graph the line of best fit to see how good it is:

x = [1,2,4,5]
y = [1,4,5,6]
plt.scatter(x, y)

# Evaluate the fitted line y = m*x + b at 100 points between x=1 and x=5
line_x = np.linspace(1, 5, 100)
plt.plot(line_x, line_x * fitted_coeff[0] + fitted_coeff[1])

plt.show()

This produces the following:

This looks like a solid fit.
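To put a rough number on how good the fit is, one option is the coefficient of determination (R squared), computed by hand from the fitted coefficients. This is only a sketch: the variable names are illustrative, and R squared is just one of several ways to judge a fit. For this dataset it comes out to about 0.864.

```python
import numpy as np

x = np.array([1, 2, 4, 5])
y = np.array([1, 4, 5, 6])

m, b = np.polyfit(x, y, deg=1)
predicted = m * x + b

# R^2 = 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum((y - predicted) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print(round(r_squared, 3))
```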

WARNING

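NumPy's polyfit(~) method is just for computing the line of best fit

By default, NumPy's polyfit(~) method returns only the coefficients. Passing full=True additionally returns the sum of squared residuals, and cov=True returns the covariance matrix of the coefficients, but the method does not compute statistical measures such as the correlation coefficient or p-values. Use it when you just need the coefficients and little else.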

To perform linear regression at a more comprehensive level, use scipy.stats.linregress.
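As a minimal sketch of what that looks like on the same toy dataset (assuming SciPy is installed), scipy.stats.linregress returns the slope and intercept together with the correlation coefficient, a p-value for the slope, and the standard error of the slope:

```python
from scipy import stats

x = [1, 2, 4, 5]
y = [1, 4, 5, 6]

result = stats.linregress(x, y)

print(result.slope, result.intercept)  # same line as np.polyfit with deg=1
print(result.rvalue ** 2)              # coefficient of determination
print(result.pvalue)                   # p-value for the null hypothesis that the slope is zero
print(result.stderr)                   # standard error of the estimated slope
```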

Published by Isshin Inada
