import pandas as pd

df = pd.DataFrame({"A":[2,4,6],"B":[3,4,5]})
df

   A  B
0  2  3
1  4  4
2  6  5

To compute the pairwise covariance of the columns:

df.cov()

     A    B
A  4.0  2.0
B  2.0  1.0

The result is the sample covariance matrix of the columns:

The off-diagonal entries give the sample covariance of columns A and B, which is 2.0.
The diagonal entries give the sample variance of each column: 4.0 for column A and 1.0 for column B.
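As a sanity check, these values can be reproduced by hand. The sketch below assumes the sample covariance with an n - 1 denominator (the default behaviour of df.cov()), and also uses Series.cov to get the covariance of a single pair of columns directly:

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 4, 6], "B": [3, 4, 5]})
n = len(df)

# Deviations of each value from its column mean (both means are 4.0)
dev_a = df["A"] - df["A"].mean()   # -2, 0, 2
dev_b = df["B"] - df["B"].mean()   # -1, 0, 1

# Sample (co)variance: sum of products of deviations divided by n - 1
cov_ab = (dev_a * dev_b).sum() / (n - 1)   # 4 / 2
var_a = (dev_a ** 2).sum() / (n - 1)       # 8 / 2
var_b = (dev_b ** 2).sum() / (n - 1)       # 2 / 2

print(cov_ab, var_a, var_b)  # 2.0 4.0 1.0

# Series.cov returns the covariance of just one pair of columns
print(df["A"].cov(df["B"]))  # 2.0
```

The diagonal of df.cov() is simply each column's sample variance, which is why it matches var_a and var_b.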

Specifying min_periods

Consider the following DataFrame with some missing values:


import numpy as np
import pandas as pd

df = pd.DataFrame({"A":[3,np.nan,4],"B":[5,6,7]})
df

     A  B
0  3.0  5
1  NaN  6
2  4.0  7

Setting min_periods=3 yields:


df.cov(min_periods=3)

     A    B
A  NaN  NaN
B  NaN  1.0

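Here, we get NaN because the method excludes missing values, so column A has only 2 non-missing values, and the pair (A, B) has only 2 rows in which both values are present. Since we've set the minimum number of observations required to compute a covariance to 3, every entry involving column A is NaN. Column B has all 3 values, so its variance (1.0) is still computed.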
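One way to see which entries survive a given min_periods is to count, for each pair of columns, the rows in which both values are present. The sketch below does this with a matrix product of the non-missing indicator; pair_counts is just an illustrative name, and the assumption is that df.cov(min_periods=k) returns NaN wherever this count is below k:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [3, np.nan, 4], "B": [5, 6, 7]})

# 1 where a value is present, 0 where it is NaN
present = df.notna().astype(int)

# Entry (i, j) counts the rows in which both column i and column j are non-missing
pair_counts = present.T @ present
print(pair_counts)
#    A  B
# A  2  2
# B  2  3

# Only pairs with at least min_periods observations get a covariance
print(df.cov(min_periods=3))
print(df.cov(min_periods=2))
```

With min_periods=2, every pair has enough observations, so all four entries of the covariance matrix are filled in.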

Published by Isshin Inada
