Difference between a group's count and size in Pandas
The difference between a group's count() and size() is the following:
count() returns the number of non-NaN values in each column of the group. If there is more than one column, a DataFrame is returned.
size() returns the length of the group, that is, its number of rows. This method does not differentiate between NaN and non-NaN values.
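As a quick sketch of the two definitions, the snippet below groups a small Series by its index labels (the Series `s` and its labels are made up for illustration). Because there is only a single column of values, count() returns a Series here, and it differs from size() only in the group that contains a NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the index labels define the groups, and group "x" holds one NaN
s = pd.Series([1.0, np.nan, 3.0], index=["x", "x", "y"])

grouped = s.groupby(level=0)

counts = grouped.count()  # non-NaN values per group
sizes = grouped.size()    # rows per group, NaN included

print(counts.to_dict())  # {'x': 1, 'y': 1}
print(sizes.to_dict())   # {'x': 2, 'y': 1}
```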
Example
Consider the following DataFrame about some products:
import numpy as np
import pandas as pd

df = pd.DataFrame({"price": [500, 300, 700, 200, np.nan],
                   "brand": ["apple", "google", "apple", "google", "apple"],
                   "device": ["phone", "phone", "computer", "phone", "phone"]},
                  index=["a", "b", "c", "d", "e"])
df
   price   brand    device
a  500.0   apple     phone
b  300.0  google     phone
c  700.0   apple  computer
d  200.0  google     phone
e    NaN   apple     phone
Notice that the last product (row e) has a missing price (NaN).
Here's the count() of each brand group:
df.groupby("brand").count()
        price  device
brand
apple       2       3
google      2       2
Note the following:
The return type is DataFrame.
The count for apple's price is 2 rather than 3, since only non-NaN values are counted.
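If only one column is of interest, selecting it before calling count() gives a single-column result instead. The sketch below rebuilds the same DataFrame and selects the price column after grouping; in this case count() returns a Series rather than a DataFrame.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"price": [500, 300, 700, 200, np.nan],
                   "brand": ["apple", "google", "apple", "google", "apple"],
                   "device": ["phone", "phone", "computer", "phone", "phone"]},
                  index=["a", "b", "c", "d", "e"])

# Selecting one column after grouping yields a SeriesGroupBy, so count() returns a Series
price_counts = df.groupby("brand")["price"].count()

print(type(price_counts).__name__)  # Series
print(price_counts.to_dict())       # {'apple': 2, 'google': 2}
```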
Now, consider the size() of each brand group:
df.groupby("brand").size()
brand
apple     3
google    2
dtype: int64
Note the following:
The return type is Series.
The size of the apple group is 3, since size() simply counts the rows of each group, including the row whose price is NaN.
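Since size() counts every row and count() skips NaN, subtracting one from the other gives the number of missing values per column in each group. The sketch below is one way to do this, using `DataFrame.rsub` with `axis=0` so that the size Series is aligned with the group labels of the count DataFrame; it then cross-checks the result against a direct `isna()` count.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"price": [500, 300, 700, 200, np.nan],
                   "brand": ["apple", "google", "apple", "google", "apple"],
                   "device": ["phone", "phone", "computer", "phone", "phone"]},
                  index=["a", "b", "c", "d", "e"])

grouped = df.groupby("brand")
counts = grouped.count()  # DataFrame: non-NaN values per column and group
sizes = grouped.size()    # Series: number of rows per group

# sizes - counts, with the Series aligned on the group labels (rows)
missing = counts.rsub(sizes, axis=0)
print(missing.to_dict())
# {'price': {'apple': 1, 'google': 0}, 'device': {'apple': 0, 'google': 0}}

# Cross-check: count the NaN values directly per group
expected = df[["price", "device"]].isna().groupby(df["brand"]).sum()
print((missing == expected).all().all())  # True
```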