import pandas as pd

df = pd.DataFrame({"A": [7, 8, 9, 10, 11, 12], "B": ["A", "B", "A", "B", "A", "B"]}, index=[1, 2, 3, 4, 5, 6])
df

    A  B
1   7  A
2   8  B
3   9  A
4  10  B
5  11  A
6  12  B

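To check the memory usage of the DataFrame, call df.info() with memory_usage="deep". This also measures the Python objects (here, strings) stored in object columns, rather than counting only the pointers to them:

df.info(memory_usage="deep")

<class 'pandas.core.frame.DataFrame'>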
Int64Index: 6 entries, 1 to 6
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   A       6 non-null      int64 
 1   B       6 non-null      object
dtypes: int64(1), object(1)
memory usage: 444.0 bytes

Note here that:

The memory usage of the DataFrame is 444 bytes
Datatype of column A is int64
Datatype of column B is object
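To see which column accounts for most of those bytes, DataFrame.memory_usage(deep=True) reports the bytes used by the index and by each column. The sketch below rebuilds the same DataFrame. The exact figure for column B depends on the Python and pandas versions, because it includes the size of each string object.

```python
import pandas as pd

df = pd.DataFrame({"A": [7, 8, 9, 10, 11, 12], "B": ["A", "B", "A", "B", "A", "B"]},
                  index=[1, 2, 3, 4, 5, 6])

# Bytes used by the index ("Index" entry) and by each column;
# deep=True also counts the string objects held in column B
per_column = df.memory_usage(deep=True)
print(per_column)
print("total:", per_column.sum())
```

Column A uses 6 × 8 = 48 bytes (six int64 values), so most of the total comes from the strings in column B.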

Smaller numeric types

To reduce the memory usage, we can convert column A to int8:

df["A"] = df["A"].astype("int8")
df.info(memory_usage="deep")

<class 'pandas.core.frame.DataFrame'>
Int64Index: 6 entries, 1 to 6
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   A       6 non-null      int8  
 1   B       6 non-null      object
dtypes: int8(1), object(1)
memory usage: 402.0 bytes

Note that:

Column A has been converted to int8
The memory usage of the DataFrame has decreased from 444 bytes to 402 bytes
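The 42-byte saving matches the dtype sizes: an int64 value occupies 8 bytes and an int8 value occupies 1 byte, so six values shrink from 6 × 8 = 48 bytes to 6 × 1 = 6 bytes. A quick check with Series.memory_usage, excluding the index so that only the column's values are counted:

```python
import pandas as pd

a64 = pd.Series([7, 8, 9, 10, 11, 12], dtype="int64")
a8 = a64.astype("int8")

# index=False: count only the values, not the index
before = a64.memory_usage(index=False)
after = a8.memory_usage(index=False)
print(before, after, before - after)  # 48 6 42
```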

Before converting a column to a smaller numeric type, always check the column's minimum and maximum values. A smaller type reduces memory usage, but it can only represent a limited range of integers. Values outside that range cannot be stored correctly and may silently overflow (wrap around) to different numbers, which can invalidate the analysis you are trying to perform. For floating-point columns, downcasting (for example from float64 to float32) also reduces precision. Below is a reference for the range of integers supported by each datatype:

Datatype	Integer range supported
`int8`	-128 to 127
`int16`	-32768 to 32767
`int32`	-2147483648 to 2147483647
`int64`	-9223372036854775808 to 9223372036854775807
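Here is a minimal sketch of that range check, assuming NumPy is available alongside pandas. np.iinfo returns the limits of each integer type, which reproduces the table of ranges. The hypothetical helper fits_in (not part of pandas) tests a column against those limits before converting it. As an alternative, pd.to_numeric(..., downcast="integer") lets pandas pick the smallest integer type that can hold all the values.

```python
import numpy as np
import pandas as pd

def fits_in(series: pd.Series, dtype) -> bool:
    """Hypothetical helper: True if every value of an integer Series lies within dtype's range."""
    info = np.iinfo(dtype)
    return bool(info.min <= series.min() and series.max() <= info.max)

# Integer ranges, as listed in the reference table
for dt in (np.int8, np.int16, np.int32, np.int64):
    print(np.dtype(dt).name, np.iinfo(dt).min, np.iinfo(dt).max)

a = pd.Series([7, 8, 9, 10, 11, 12])
if fits_in(a, np.int8):
    a = a.astype("int8")  # safe: 7..12 lies within -128..127

# Let pandas choose the smallest integer dtype that holds the values (300 does not fit in int8)
auto = pd.to_numeric(pd.Series([7, 8, 9, 300]), downcast="integer")
print(a.dtype, auto.dtype)  # int8 int16
```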

Categorical columns

Here is the DataFrame we are working with again:

df = pd.DataFrame({"A": [7, 8, 9, 10, 11, 12], "B": ["A", "B", "A", "B", "A", "B"]}, index=[1, 2, 3, 4, 5, 6])
df

    A  B
1   7  A
2   8  B
3   9  A
4  10  B
5  11  A
6  12  B

To reduce the memory usage, we can convert the datatype of column B from object to category:

df["B"] = df["B"].astype("category")
df.info(memory_usage="deep")

<class 'pandas.core.frame.DataFrame'>
Int64Index: 6 entries, 1 to 6
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype   
---  ------  --------------  -----   
 0   A       6 non-null      int64   
 1   B       6 non-null      category
dtypes: category(1), int64(1)
memory usage: 326.0 bytes

Note here that:

Column B has been converted from object to category
The memory usage of the DataFrame has decreased from 444 bytes to 326 bytes

In an object column, each value is stored as a separate Python string object, and the column holds a pointer to each of them. Even if the same value appears many times, it is generally stored again for every occurrence. A categorical column instead stores each distinct value only once, in its categories, and represents every row with a small integer code that refers to one of those categories. When there are only a few distinct values, as in column B, this saves memory.
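The two parts of a categorical column can be inspected through the .cat accessor: .cat.categories holds each distinct value once, and .cat.codes holds one integer per row. A short sketch using the values of column B:

```python
import pandas as pd

b = pd.Series(["A", "B", "A", "B", "A", "B"], index=[1, 2, 3, 4, 5, 6]).astype("category")

# Each distinct value is stored once...
print(list(b.cat.categories))  # ['A', 'B']
# ...and every row is an integer code pointing into the categories
print(b.cat.codes.tolist())    # [0, 1, 0, 1, 0, 1]
print(b.cat.codes.dtype)       # int8
```

With fewer than 128 categories, pandas stores the codes as int8, so each row costs a single byte plus the shared categories.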

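WARNING

Categorical columns are suited to columns that take on only a small, fixed set of possible values, such as blood type or marital status. If most values in a column are distinct, converting it to category saves little or nothing and can even increase memory usage, because pandas must store every distinct value in the categories as well as an integer code for each row.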
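A sketch comparing both cases, with a hypothetical helper maybe_categorize that converts a column only when the share of distinct values is small. The 0.5 threshold is an arbitrary assumption for illustration, not a pandas rule. The columns are created with dtype=object explicitly so that the comparison is against an object column, as in the example above.

```python
import pandas as pd

def maybe_categorize(s: pd.Series, max_unique_ratio: float = 0.5) -> pd.Series:
    """Hypothetical helper: convert to category only if few values are distinct.

    The default threshold of 0.5 is an assumption chosen for illustration.
    """
    if s.nunique() / len(s) <= max_unique_ratio:
        return s.astype("category")
    return s

ids = pd.Series([f"id_{i}" for i in range(1000)], dtype=object)   # every value unique
blood = pd.Series(["A", "B", "AB", "O"] * 250, dtype=object)      # only 4 distinct values

for name, s in [("ids", ids), ("blood", blood)]:
    obj_bytes = s.memory_usage(deep=True, index=False)
    cat_bytes = s.astype("category").memory_usage(deep=True, index=False)
    print(name, "object:", obj_bytes, "category:", cat_bytes)
```

For the all-unique ids column, the categorical version needs more memory than the object version, since it keeps every string in its categories and adds a code per row. For the blood type column it needs far less.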

Published by Arthur Yanagisawa
