Reducing DataFrame memory size in Pandas
There are two main ways to reduce DataFrame memory size in Pandas without necessarily compromising the information contained within the DataFrame:
Use smaller numeric types
Convert object columns to categorical columns
Examples
Consider the following DataFrame:
df
       A  B
    1  7  A
    2  8  B
    3  9  A
    4  10 B
    5  11 A
    6  12 B
To check the memory usage of the DataFrame, for example with df.info(memory_usage="deep"):
    <class 'pandas.core.frame.DataFrame'>
    Int64Index: 6 entries, 1 to 6
    Data columns (total 2 columns):
     #   Column  Non-Null Count  Dtype
    ---  ------  --------------  -----
     0   A       6 non-null      int64
     1   B       6 non-null      object
    dtypes: int64(1), object(1)
    memory usage: 444.0 bytes
Note here that:
The memory usage of the DataFrame is 444 bytes
The datatype of column A is int64
The datatype of column B is object
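The setup above can be reproduced with a short sketch. The construction of df is an assumption inferred from the printed output (index 1 to 6, an integer column A and a string column B), and the dtypes are pinned explicitly so the example behaves the same across platforms and pandas versions. The exact byte count depends on the pandas and Python versions, so it may differ from the 444 bytes shown above.

```python
import pandas as pd

# Assumed construction of the example DataFrame (inferred from the output above).
df = pd.DataFrame(
    {"A": [7, 8, 9, 10, 11, 12], "B": ["A", "B", "A", "B", "A", "B"]},
    index=[1, 2, 3, 4, 5, 6],
)
# Pin the dtypes so they match the article regardless of platform defaults.
df = df.astype({"A": "int64", "B": "object"})

# memory_usage="deep" also counts the Python string objects held by object columns.
df.info(memory_usage="deep")

# The same figure is available programmatically, per column and in total.
per_column = df.memory_usage(deep=True)
total_bytes = int(per_column.sum())
print(per_column)
print(total_bytes)
```

Without memory_usage="deep", info() only counts the 8-byte references held by an object column and prints the total with a trailing "+" to signal an underestimate.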
Smaller numeric types
To reduce the memory usage, we can convert column A to int8:
    <class 'pandas.core.frame.DataFrame'>
    Int64Index: 6 entries, 1 to 6
    Data columns (total 2 columns):
     #   Column  Non-Null Count  Dtype
    ---  ------  --------------  -----
     0   A       6 non-null      int8
     1   B       6 non-null      object
    dtypes: int8(1), object(1)
    memory usage: 402.0 bytes
Note that:
Column A has been converted to int8
The memory usage of the DataFrame has decreased from 444 bytes to 402 bytes
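A minimal sketch of this conversion, assuming the same example DataFrame as above (its construction is inferred from the printed output). The saving comes entirely from column A: six values shrink from 8 bytes each to 1 byte each.

```python
import pandas as pd

# Assumed construction of the example DataFrame, with dtypes pinned.
df = pd.DataFrame(
    {"A": [7, 8, 9, 10, 11, 12], "B": ["A", "B", "A", "B", "A", "B"]},
    index=[1, 2, 3, 4, 5, 6],
).astype({"A": "int64", "B": "object"})

before = int(df.memory_usage(deep=True).sum())

# Convert column A from int64 (8 bytes per value) to int8 (1 byte per value).
df["A"] = df["A"].astype("int8")

after = int(df.memory_usage(deep=True).sum())

print(df["A"].dtype)   # int8
print(before - after)  # 42, i.e. 6 values * (8 - 1) bytes
```

The difference of 42 bytes matches the drop from 444 to 402 bytes reported above.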
Always check the minimum and maximum values of a column before converting it to a smaller numeric type. A smaller type reduces memory usage, but values that fall outside its range cannot be represented: integers can overflow and floating-point values can lose precision, which may matter depending on the analysis you are performing. Below is a reference for the range of values supported by each integer datatype:
| Datatype | Integer range supported |
|---|---|
| int8 | -128 to 127 |
| int16 | -32768 to 32767 |
| int64 | -9223372036854775808 to 9223372036854775807 |
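The range check described above can be sketched as follows. The helper name fits is hypothetical and introduced only for illustration. np.iinfo reports the limits of an integer type, and pd.to_numeric with downcast="integer" picks the smallest signed integer type that can hold the data.

```python
import numpy as np
import pandas as pd

s = pd.Series([7, 8, 9, 10, 11, 12], dtype="int64")

def fits(series, dtype):
    """Hypothetical helper: True if every value lies within the range of `dtype`."""
    info = np.iinfo(np.dtype(dtype))
    return bool(info.min <= series.min() and series.max() <= info.max)

print(np.iinfo(np.dtype("int8")).min, np.iinfo(np.dtype("int8")).max)  # -128 127
print(fits(s, "int8"))                                   # True
print(fits(pd.Series([7, 300], dtype="int64"), "int8"))  # False, 300 > 127

# Let pandas choose the smallest signed integer type that holds every value.
small = pd.to_numeric(s, downcast="integer")
print(small.dtype)  # int8
```

Checking the range first avoids silently corrupting values that do not fit in the target type.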
Categorical columns
Here is the DataFrame we are working with again:
df
       A  B
    1  7  A
    2  8  B
    3  9  A
    4  10 B
    5  11 A
    6  12 B
To reduce the memory usage, we can convert the datatype of column B from object to category:
    <class 'pandas.core.frame.DataFrame'>
    Int64Index: 6 entries, 1 to 6
    Data columns (total 2 columns):
     #   Column  Non-Null Count  Dtype
    ---  ------  --------------  -----
     0   A       6 non-null      int64
     1   B       6 non-null      category
    dtypes: category(1), int64(1)
    memory usage: 326.0 bytes
Note here that:
Column B has been converted from object to category
The memory usage of the DataFrame has decreased from 444 bytes to 326 bytes
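A minimal sketch of this conversion, again assuming the example DataFrame is built as shown (its construction is inferred from the printed output). The exact byte counts depend on the pandas and Python versions, so only the direction of the change is relied on here.

```python
import pandas as pd

# Assumed construction of the example DataFrame, with dtypes pinned.
df = pd.DataFrame(
    {"A": [7, 8, 9, 10, 11, 12], "B": ["A", "B", "A", "B", "A", "B"]},
    index=[1, 2, 3, 4, 5, 6],
).astype({"A": "int64", "B": "object"})

before = int(df.memory_usage(deep=True).sum())

# Convert column B from object to category.
df["B"] = df["B"].astype("category")

after = int(df.memory_usage(deep=True).sum())

print(df["B"].dtype)  # category
print(before, after)  # the second figure is expected to be smaller
```

The values of column B still read as "A" and "B" after the conversion; only the underlying storage changes.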
For object columns, each value in the column is stored as a Python string in memory. Even if the same value appears multiple times in the column, a separate string is counted for each occurrence. After converting to a categorical column, each distinct string is stored only once, no matter how many times it appears in the column, and the rows hold small integer codes that refer to it. This is what reduces the memory usage.
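The storage layout described above can be inspected through the .cat accessor. This is a small sketch using the same six values as column B: the distinct strings live in cat.categories, and each row holds only an integer code pointing into that list.

```python
import pandas as pd

s = pd.Series(["A", "B", "A", "B", "A", "B"], dtype="object").astype("category")

# Each distinct string is stored once.
print(list(s.cat.categories))  # ['A', 'B']

# Each row holds a small integer code instead of its own string.
print(s.cat.codes.tolist())    # [0, 1, 0, 1, 0, 1]
print(s.cat.codes.dtype)       # int8
```

With only two categories, the codes fit in int8, so each row costs one byte plus the one-off cost of storing the categories.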
Categorical columns are suited for columns that only take on a fixed number of possible values. Examples include blood type, marital status, etc.
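One way to apply this guideline is to convert only those object columns whose share of distinct values is low. The sketch below is an illustration, not a pandas feature: the helper categorize_low_cardinality, its max_unique_ratio threshold of 0.5, and the sample people DataFrame are all hypothetical.

```python
import pandas as pd

def categorize_low_cardinality(df, max_unique_ratio=0.5):
    """Hypothetical helper: return a copy in which object columns are converted
    to category when distinct values / rows is at most `max_unique_ratio`."""
    out = df.copy()
    for col in out.select_dtypes(include="object").columns:
        ratio = out[col].nunique() / max(len(out), 1)
        if ratio <= max_unique_ratio:
            out[col] = out[col].astype("category")
    return out

# Hypothetical sample data: names are all distinct, blood types repeat.
people = pd.DataFrame(
    {
        "name": ["Ann", "Bob", "Cid", "Dee", "Eve", "Fay"],
        "blood_type": ["A", "B", "A", "O", "A", "B"],
    }
).astype("object")

result = categorize_low_cardinality(people)
print(result.dtypes)
```

Here blood_type (3 distinct values in 6 rows) is converted, while name (6 distinct values in 6 rows) stays as object, since a column of mostly unique strings gains little from categorical storage.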