Pandas DataFrame | memory_usage method
Pandas DataFrame.memory_usage(~) returns the amount of memory each column occupies in bytes.
Parameters
1. index | boolean | optional
Whether to include the memory usage of the index (row labels) as well. By default, index=True.
2. deep | boolean | optional
Whether to inspect the actual memory usage of object types. For DataFrames that contain object types (e.g. strings), the reported memory usage is not accurate unless deep=True, because by default the method counts only a fixed-size reference for each object rather than the object itself. By default, deep=False.
Return Value
A Series that holds the memory usage of each column of the source DataFrame in bytes.
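As a minimal sketch of working with the returned Series (the two-column DataFrame and the variable names here are illustrative assumptions, not part of the original example), individual entries can be looked up by label and the entries can be summed to get an overall figure:

```python
import pandas as pd

# Illustrative DataFrame with one int64 and one float64 column
df = pd.DataFrame({"A": [4, 5, 6], "D": [4.0, 5.0, 6.0]}, index=[10, 11, 12])

usage = df.memory_usage()        # Series labelled "Index" plus one entry per column
bytes_for_a = usage["A"]         # look up a single column by its label
total_bytes = int(usage.sum())   # index and all columns combined

print(type(usage).__name__)
print(list(usage.index))
print(bytes_for_a, total_bytes)
```

Summing the Series is a convenient way to get a single number for the whole DataFrame, although it inherits the same limitations for object columns unless deep=True is passed.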
Examples
Basic usage
Consider the following DataFrame:
import pandas as pd

df = pd.DataFrame({"A":[4,5,6],"B":["K","KK","KKK"], "C": [True,False,True], "D":[4.0,5.0,6.0]}, index=[10,11,12])
df
    A    B      C    D
10  4    K   True  4.0
11  5   KK  False  5.0
12  6  KKK   True  6.0
Here's the breakdown of the data-types:
df.dtypes
A      int64
B     object
C       bool
D    float64
dtype: object
Computing memory usage of each column:
df.memory_usage()
Index    24
A        24
B        24
C         3
D        24
dtype: int64
Columns A and D use types int64 and float64 respectively. 64 bits is equal to 8 bytes, and since we have 3 values in each column, we have a total memory usage of 8*3=24 bytes for each of columns A and D. The index holds 3 integer labels of type int64, which is why it also uses 24 bytes.
Next, let's tackle the boolean column. Each boolean occupies 1 byte, so column C uses 1*3=3 bytes in total.
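The arithmetic above can be checked with a short sketch. It assumes that, for fixed-width dtypes such as int64, float64 and bool, the reported usage is the dtype's item size multiplied by the number of rows; the loop and variable names are illustrative:

```python
import pandas as pd

# Only the fixed-width columns from the example above
df = pd.DataFrame(
    {"A": [4, 5, 6], "C": [True, False, True], "D": [4.0, 5.0, 6.0]},
    index=[10, 11, 12],
)

usage = df.memory_usage(index=False)
for name in df.columns:
    itemsize = df[name].dtype.itemsize   # bytes per value for this dtype
    expected = itemsize * len(df)        # bytes per value times number of rows
    print(name, itemsize, expected, usage[name])
```

For each column, the computed figure should line up with the value that memory_usage reports.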
Finally, let's look at column B, which holds strings. Here Pandas stores the strings under the object dtype, as the dtypes output above shows. By default, memory_usage counts only the reference to each object (8 bytes on a 64-bit build) without inspecting the object itself, which gives 8*3=24 bytes. However, the actual memory consumed varies depending on the internals of the object (e.g. a long string occupies more space than a short one). We can get a more accurate representation of the memory usage by setting deep=True.
Specifying deep=True
To get a more accurate representation of the memory consumption of object types:
df.memory_usage(deep=True)
Index     24
A         24
B        185
C          3
D         24
dtype: int64
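We see that column B actually takes up 185 bytes here. The exact figure can differ between Python and Pandas versions, since it depends on how large each string object is in memory.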
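To see where the extra bytes come from, here is a rough sketch that compares the shallow and deep figures for the string column alone. It sets dtype=object explicitly (an assumption made so the strings are stored as Python objects regardless of the Pandas version) and uses sys.getsizeof as an approximate measure of each string's size:

```python
import sys
import pandas as pd

# dtype=object is set explicitly so the values are stored as Python str objects
b = pd.Series(["K", "KK", "KKK"], dtype=object, name="B")

shallow = b.memory_usage(index=False)             # counts only the references
deep = b.memory_usage(index=False, deep=True)     # also counts the objects themselves
string_bytes = sum(sys.getsizeof(s) for s in b)   # interpreter-reported size of each str

print(shallow, deep, string_bytes)
```

On CPython, the gap between the deep and shallow figures is expected to be close to string_bytes, which is why even very short strings cost far more than the shallow estimate suggests.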
Specifying index=False
To exclude the memory usage of the index (row labels):
df.memory_usage(index=False)
A    24
B    24
C     3
D    24
dtype: int64
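As a small illustrative sketch (the two-column DataFrame and variable names are assumptions for the example), the only difference between the two calls is the entry labelled Index, so the index's share can be recovered by subtracting the totals:

```python
import pandas as pd

df = pd.DataFrame({"A": [4, 5, 6], "D": [4.0, 5.0, 6.0]}, index=[10, 11, 12])

with_index = df.memory_usage()
without_index = df.memory_usage(index=False)

print("Index" in with_index.index)      # True
print("Index" in without_index.index)   # False

# Bytes attributed to the index alone
index_bytes = int(with_index.sum() - without_index.sum())
print(index_bytes)
```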