Pandas DataFrame | var method
Pandas' `DataFrame.var(~)` method computes the variance of each row or column of the source DataFrame. The (unbiased) variance is computed using the following formula:

$$\mathrm{Var}(x) = \frac{1}{N-1}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2$$
where:
- $N$ is the number of values in the row or column
- $x_i$ is the $i$-th value in the row or column
- $\bar{x}$ is the mean of the values in the row or column.
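As a quick sanity check of the formula, the sketch below computes the unbiased variance by hand (the sum of squared deviations divided by $N-1$) and compares it with `df.var()` on a small, made-up DataFrame. The helper `unbiased_var` is a hypothetical name used only for illustration.

```python
import pandas as pd

# Made-up data chosen so the arithmetic is easy to follow
df = pd.DataFrame({"A": [3, 5, 7], "B": [2, 5, 8]})

def unbiased_var(values):
    """Hypothetical helper: sum of squared deviations divided by N - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

# Apply the hand-written formula to every column
manual = {col: unbiased_var(df[col].tolist()) for col in df.columns}
print(manual)               # {'A': 4.0, 'B': 9.0}
print(df.var().to_dict())   # {'A': 4.0, 'B': 9.0}
```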
The `var(~)` method can also compute the population variance, which divides by $N$ instead of $N-1$. To do so, set `ddof=0`.
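A minimal sketch comparing the default sample variance with the population variance on a small, made-up DataFrame. With `ddof=0` the same sum of squared deviations is divided by $N$ rather than $N-1$, so the two results differ only by the factor $(N-1)/N$.

```python
import pandas as pd

df = pd.DataFrame({"A": [3, 5, 7], "B": [2, 5, 8]})
n = len(df)  # N: number of values in each column

sample_var = df.var()            # ddof=1 (default): divide by N - 1
population_var = df.var(ddof=0)  # ddof=0: divide by N instead

# Rescaling the sample variance by (N - 1) / N recovers the population variance
rescaled = sample_var * (n - 1) / n
print(population_var.round(3).to_dict())  # {'A': 2.667, 'B': 6.0}
```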
Parameters
1. `axis` | `int` or `string` | optional
Whether to compute the variance column-wise or row-wise:
| Axis | Description |
|---|---|
| `0` or `"index"` | Variance is computed for each column. |
| `1` or `"columns"` | Variance is computed for each row. |
By default, `axis=0`.
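Besides the integers `0` and `1`, pandas also accepts the string aliases `"index"` and `"columns"` for `axis`. The short sketch below, using made-up data, checks that the two spellings give the same result and shows which labels each result is indexed by.

```python
import pandas as pd

df = pd.DataFrame({"A": [3, 5, 7], "B": [2, 5, 8]})

# axis="index" (same as axis=0): one variance per column
col_var = df.var(axis="index")
# axis="columns" (same as axis=1): one variance per row
row_var = df.var(axis="columns")

print(col_var.index.tolist())  # ['A', 'B']
print(row_var.index.tolist())  # [0, 1, 2]
```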
2. `skipna` | `boolean` | optional
Whether or not to skip `NaN` values. Skipped `NaN` values do not count towards the total size ($N$). By default, `skipna=True`.
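To illustrate `skipna`, here is a small sketch with one missing value in a made-up column. With the default, the `NaN` is dropped and $N=2$; with `skipna=False`, the result itself becomes `NaN`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [2.0, 4.0, np.nan]})

# skipna=True (default): NaN is ignored, so N = 2 and the mean is 3:
# ((2 - 3)**2 + (4 - 3)**2) / (2 - 1) = 2.0
print(df.var()["A"])              # 2.0

# skipna=False: a NaN in the column propagates to the result
print(df.var(skipna=False)["A"])  # nan
```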
3. `level` | `string` or `int` | optional
The name or the integer position of the level to consider. This is needed only if your DataFrame has a `MultiIndex`. Note that this parameter was deprecated in pandas 1.3 and removed in pandas 2.0.
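An alternative that does not rely on the `level` parameter is to group by the index level with `groupby(level=...)` and then call `var()`. The sketch below assumes a small hypothetical two-level index whose levels are named `group` and `item`; the data is made up.

```python
import pandas as pd

# Hypothetical MultiIndex: level "group" (x, y) and level "item" (1, 2)
index = pd.MultiIndex.from_tuples(
    [("x", 1), ("x", 2), ("y", 1), ("y", 2)], names=["group", "item"]
)
df = pd.DataFrame({"A": [1, 3, 10, 14]}, index=index)

# Variance of column A within each value of the "group" level (ddof=1)
by_group = df.groupby(level="group").var()
print(by_group["A"].to_dict())  # {'x': 2.0, 'y': 8.0}
```

Grouping returns a `DataFrame` indexed by the chosen level, with one variance per column.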
4. `ddof` | `int` | optional
The delta degrees of freedom, which modifies the denominator of the formula:

$$\mathrm{Var}(x) = \frac{1}{N-\mathrm{ddof}}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2$$

By default, `ddof=1`.
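The effect of `ddof` on the denominator can be checked directly. The sketch below, using made-up data, computes the sum of squared deviations once and divides it by $N - \mathrm{ddof}$ for a few values of `ddof`, comparing each result with `df.var(ddof=...)`.

```python
import pandas as pd

df = pd.DataFrame({"A": [3, 5, 7]})
values = df["A"]
n = len(values)

# Sum of squared deviations: (3-5)**2 + (5-5)**2 + (7-5)**2 = 8
ss = ((values - values.mean()) ** 2).sum()

for ddof in (0, 1, 2):
    expected = ss / (n - ddof)        # denominator is N - ddof
    actual = df.var(ddof=ddof)["A"]
    print(ddof, round(actual, 3), round(expected, 3))
```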
5. `numeric_only` | `None` or `boolean` | optional
The allowed values are as follows:
| Value | Description |
|---|---|
| `True` | Only numeric rows/columns will be considered (e.g. integers and floats). |
| `False` | Attempt computation with all types (e.g. strings and dates), and throw an error whenever the variance cannot be computed. |
| `None` | Attempt computation with all types, and silently ignore all rows/columns whose variance cannot be computed. |
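Note that the variance can only be computed when the `+` operator is well-defined between the types.
By default, `numeric_only=None`. This describes pandas versions before 2.0; since pandas 2.0, the default is `numeric_only=False`.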
Return Value
If the `level` parameter is specified, then a `DataFrame` will be returned. Otherwise, a `Series` will be returned.
Examples
Consider the following DataFrame:
```python
import pandas as pd
df = pd.DataFrame({"A": [3, 5, 7], "B": [2, 5, 8]})
df
```
```
   A  B
0  3  2
1  5  5
2  7  8
```
Column-wise variance
To compute the variance for each column:
```python
df.var()  # axis=0
```
```
A    4.0
B    9.0
dtype: float64
```
Row-wise variance
To compute the variance for each row:
```python
df.var(axis=1)
```
```
0    0.5
1    0.0
2    0.5
dtype: float64
```
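For comparison, NumPy's `np.var` divides by $N$ by default (`ddof=0`), so reproducing pandas' row-wise result requires passing `ddof=1` explicitly. A minimal sketch using the same DataFrame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [3, 5, 7], "B": [2, 5, 8]})

# NumPy defaults to ddof=0; pass ddof=1 to match pandas' default
row_var_np = np.var(df.to_numpy(), axis=1, ddof=1)
row_var_pd = df.var(axis=1).to_numpy()

print(row_var_np.tolist())  # [0.5, 0.0, 0.5]
```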
Specifying numeric_only
Consider the following DataFrame:
```python
df = pd.DataFrame({"A": [3, 5], "B": [True, 5], "C": ["x", 7]})
df
```
```
   A     B  C
0  3  True  x
1  5     5  7
```
Here, columns `B` and `C` are of mixed type.
None
By default, `numeric_only=None`, which means that rows/columns with mixed types will also be considered:
```python
df.var()  # numeric_only=None (the default before pandas 2.0)
```
```
A    2.0
B    8.0
dtype: float64
```
The variance is still computable for column `B` because `True` is internally represented as `1` in pandas. In contrast, the variance for column `C` cannot be computed since `"x" + 7` is undefined.
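A small sketch of why column `B` works: casting its values to floats turns `True` into `1.0`, and the variance of `[1.0, 5.0]` is $\left((1-3)^2 + (5-3)^2\right)/(2-1) = 8$. This mirrors the arithmetic involved, but is not necessarily the exact code path pandas uses internally.

```python
import pandas as pd

# Column B from the example: an object column holding True and 5
b = pd.Series([True, 5], dtype=object)

# Casting to float turns True into 1.0
as_float = b.astype(float)
print(as_float.tolist())  # [1.0, 5.0]
print(as_float.var())     # 8.0
```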
False
`numeric_only=False` means that the rows/columns of mixed type will also be considered, but an error will be raised if the variance is not computable:
```python
df.var(numeric_only=False)
```
```
TypeError: could not convert string to float: 'x'
```
True
To compute the variance of numeric rows/columns only:
```python
df.var(numeric_only=True)
```
```
A    2.0
dtype: float64
```
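If you prefer to make the column selection explicit, for instance because the default for `numeric_only` differs between pandas versions, one option is to keep only numeric-dtype columns with `select_dtypes` before calling `var()`. In this sketch with the same DataFrame, column `B` is dropped because its dtype is `object`, even though its values could be added together.

```python
import pandas as pd

df = pd.DataFrame({"A": [3, 5], "B": [True, 5], "C": ["x", 7]})

# Keep only columns with a numeric dtype (B and C are object columns)
numeric_var = df.select_dtypes(include="number").var()
print(numeric_var.to_dict())  # {'A': 2.0}
```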