NumPy | var method
NumPy's var(~) method computes the variance of values in the input array. The variance is computed using the following formula:
$$\mathrm{var}(x)=\frac{1}{N}\sum_{i=0}^{N-1}(x_i-\bar{x})^2$$
Where:
$N$ is the size of the given array (i.e. the sample size)
$x_i$ is the value at the $i$th index in the NumPy array
$\bar{x}$ is the sample mean
The var(~) method can also compute the unbiased estimate of the variance (the sample variance). We do this by setting ddof=1 in the parameters, as we shall see later in the examples.
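As a rough illustration of the definition, the sketch below computes the variance by hand (mean, squared deviations, division by $N$) and compares it with np.var. The variable names are only illustrative.

```python
import numpy as np

x = np.array([1, 2, 3, 4], dtype=np.float64)

N = x.size                                   # number of values
mean = x.sum() / N                           # x-bar
manual_var = ((x - mean) ** 2).sum() / N     # average squared deviation

numpy_var = np.var(x)                        # default ddof=0
print(manual_var, numpy_var)
```

Both values should agree, since np.var divides by $N$ unless ddof is set.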
Parameters
1. a | array-like
The array on which to perform the method.
2. axis | int or tuple | optional
The axis along which we compute the variance. For 2D arrays, the allowed values are as follows:
| Axis | Meaning |
|---|---|
| 0 | Variance will be computed column-wise |
| 1 | Variance will be computed row-wise |
| None | Variance will be computed on a flattened array |
By default, axis=None.
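Since axis also accepts a tuple, here is a small sketch of what that does for an array with more than two dimensions. The 3D array and variable names are made up for illustration: passing axis=(0, 1) pools the first two axes and leaves one variance per position along the last axis.

```python
import numpy as np

# Hypothetical 3D array of shape (2, 3, 4)
x = np.arange(24).reshape(2, 3, 4)

# Variance over axes 0 and 1 together: one value per index of the last axis
var_tuple = np.var(x, axis=(0, 1))

# Same computation after merging the first two axes into one
var_reshaped = np.var(x.reshape(-1, 4), axis=0)

print(var_tuple.shape)
print(np.allclose(var_tuple, var_reshaped))
```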
3. dtype | string or type | optional
The type used to compute the variance. If the input array is of an integer type, then float64 will be used. If the input array is of a floating-point type, then its own type will be used.
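A quick way to check which type np.var uses is to inspect the dtype of the result. The sketch below does this for an integer array and a float32 array; the array contents are arbitrary.

```python
import numpy as np

int_arr = np.array([1, 2, 3, 4], dtype=np.int32)
f32_arr = np.array([1, 2, 3, 4], dtype=np.float32)

var_int = np.var(int_arr)                        # integer input
var_f32 = np.var(f32_arr)                        # float32 input
var_f32_as_64 = np.var(f32_arr, dtype=np.float64)  # dtype overridden

print(var_int.dtype, var_f32.dtype, var_f32_as_64.dtype)
```

In NumPy, integer input is computed in float64, while float32 input stays float32 unless dtype is passed.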
4. ddof | int | optional
The delta degrees of freedom. This modifies the denominator of the leading fraction in the variance formula:
$$\frac{1}{N-\mathrm{ddof}}\sum_{i=0}^{N-1}(x_i-\bar{x})^2$$
By default, ddof=0.
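To make the effect of ddof concrete, the following sketch divides the sum of squared deviations by N - ddof by hand and prints it next to the value from np.var. Variable names are illustrative only.

```python
import numpy as np

x = np.array([1, 2, 3, 4], dtype=np.float64)
N = x.size

# Sum of squared deviations from the mean (the numerator)
sum_sq = ((x - x.mean()) ** 2).sum()

for ddof in (0, 1):
    manual = sum_sq / (N - ddof)             # denominator shrinks by ddof
    print(ddof, manual, np.var(x, ddof=ddof))
```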
Return value
If axis=None, then a single float representing the variance of all the values in the array is returned. Otherwise, a NumPy array is returned.
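The difference in return types can be checked directly. This is a minimal sketch using an arbitrary 3x2 input:

```python
import numpy as np

data = [[1, 4], [2, 6], [3, 8]]

overall = np.var(data)             # axis=None: one number for all values
per_column = np.var(data, axis=0)  # axis given: one variance per column

print(type(overall).__name__, type(per_column).__name__)
print(per_column.shape)
```

The scalar is a NumPy float64, which also behaves as a regular Python float.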
Examples
Variance of a 1D array
import numpy as np
np.var([1,2,3,4])
1.25
Computing sample variance
To compute the sample variance, set ddof=1:
np.var([1,2,3,4], ddof=1)
1.6666666666666667
Computing population variance
To compute the population variance, leave out the ddof parameter or explicitly set ddof=0:
np.var([1,2,3,4]) # By default, ddof=0
1.25
Variance of a 2D array
Entire array
Without the axis parameter, NumPy treats the input as a flattened array.
np.var([[1,2],[3,4]])
1.25
This code is equivalent to np.var([1,2,3,4]).
Column-wise
To compute the variance column-wise, specify axis=0 in the parameters:
np.var([[1,4],[2,6], [3,8]], axis=0)
array([0.66666667, 2.66666667])
Here, we're computing the variance of [1,2,3] (i.e. the first column) as well as [4,6,8] (i.e. the second column).
Row-wise
To compute the variance row-wise, specify axis=1 in the parameters:
np.var([[1,4],[2,6], [3,8]], axis=1)
array([2.25, 4. , 6.25])
Here, we're computing three variances: first row (i.e. [1,4]), second row (i.e. [2,6]) and third row (i.e. [3,8]).
When the input array is of type float32, the variance is also computed in float32, which may not be accurate enough for your needs. If your application requires more accurate numbers, then pass dtype=np.float64 as an argument. This will take up more memory, but will provide a more accurate result.
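The sketch below illustrates the precision difference on a large float32 array. The array shape and values are chosen only for illustration: half the entries are 1.0 and half are 0.1, so the exact variance is close to 0.2025. The float32 result may differ slightly in the trailing digits depending on the NumPy version and platform.

```python
import numpy as np

# Hypothetical large float32 array: one row of 1.0, one row of 0.1
a = np.zeros((2, 512 * 512), dtype=np.float32)
a[0, :] = 1.0
a[1, :] = 0.1

var32 = np.var(a)                    # computed in float32
var64 = np.var(a, dtype=np.float64)  # computed in float64

print(var32.dtype, var32)
print(var64.dtype, var64)
```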