Difference between copy and view in Pandas
Main differences
If object A is a copy of object B, then each object is allocated its own memory block for its data. This means that modifying the copy will not mutate the original data and vice versa.
If object A is a view of object B, then they both share a single memory block for their data. This means that modifying the view will mutate the original data and vice versa.
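The distinction can be seen directly with NumPy arrays, which back most Pandas objects. The following is a minimal sketch rather than anything Pandas-specific: basic slicing of an array yields a view, copy() yields a copy, and np.shares_memory reports whether two arrays overlap in memory. The variable names are illustrative.

```python
import numpy as np

original = np.array([3, 4, 5])

view = original[:2]          # basic slicing returns a view
copy = original[:2].copy()   # copy() allocates a new memory block

view[0] = 9                  # writes through to `original`
copy[1] = 0                  # only changes the copy

print(original)                          # [9 4 5]
print(np.shares_memory(original, view))  # True
print(np.shares_memory(original, copy))  # False
```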
In the context of Pandas, when you access values of a Series or a DataFrame, what is returned can be either a copy or a view. This distinction is important for two reasons:
if you don't know whether the return value is a copy or view, then you will not know what happens when you modify the return value - will it mutate the original data or not?
if you're dealing with large datasets, then you might not want the return value to be a copy since copies take up more memory.
Accessing values of a Series/DataFrame
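Unfortunately, when you access values from a Series or a DataFrame, the rule that decides whether a copy or view is returned is quite complicated. It also depends on the Pandas version: the behavior described here applies when Copy-on-Write is disabled, which was the default before Pandas 3.0. With Copy-on-Write enabled, every returned object behaves like a copy.
Here is a general rule of thumb: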
if you access a single column, then a view is returned (e.g. df["A"]).
if you access multiple columns, then a copy is returned (e.g. df[["A","B"]]).
Here's a quick demo - consider the following DataFrame:
import pandas as pd

df = pd.DataFrame({"A":[3,4],"B":[6,7]})
df

   A  B
0  3  6
1  4  7
Getting a view
To illustrate that accessing a single column returns a view:
col_A = df["A"]   # col_A is a view
col_A[0] = 9
df

   A  B
0  9  6
1  4  7
Notice how modifying col_A mutated the original df. Note that modifying df will also mutate col_A.
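Whether this write-through happens depends on the Pandas version and configuration. With Copy-on-Write enabled (opt-in in earlier releases, the default from Pandas 3.0), the Series returned by df["A"] behaves like a copy instead. A hedged way to find out what your installation does is to try it on a throwaway DataFrame; the name mutated below is only illustrative:

```python
import pandas as pd

df = pd.DataFrame({"A": [3, 4], "B": [6, 7]})

col_A = df["A"]
col_A.iloc[0] = 9   # positional write on the returned Series

# True  -> col_A acted as a view (the write reached df)
# False -> col_A acted as a copy (df is untouched)
mutated = bool(df["A"].iloc[0] == 9)
print(pd.__version__, mutated)
```

The printed flag is deliberately not shown here, since it differs between installations.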
Getting a copy
To illustrate that accessing multiple columns returns a copy:
df = pd.DataFrame({"A":[3,4],"B":[6,7]})   # start again from a fresh df
cols_A_B = df[["A","B"]]   # cols_A_B is a copy
cols_A_B.iloc[0,0] = 9
df

   A  B
0  3  6
1  4  7
Notice how df did not get mutated.
Other cases
For other cases, whether a copy or view is returned depends on the situation. Whenever in doubt, it is good practice to use the _is_view property (a private attribute, so it may change between versions) to verify:
df = pd.DataFrame({"A":[3,4],"B":[6,7]})
df[["A","B"]]._is_view

False
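Because _is_view is a private attribute, one alternative built only on public APIs is to compare the underlying buffers with np.shares_memory. The helper shares_data below is hypothetical, and it is a sketch under one assumption: both objects hold a single NumPy dtype, since otherwise to_numpy() may have to build a new array and the check would report False even for related objects.

```python
import numpy as np
import pandas as pd

def shares_data(left, right) -> bool:
    """Hypothetical helper: do two pandas objects overlap in memory?"""
    return bool(np.shares_memory(left.to_numpy(), right.to_numpy()))

s = pd.Series([3, 4])

print(shares_data(s, s))          # True: same underlying buffer
print(shares_data(s, s.copy()))   # False: copy() allocated new memory
```

Keep in mind that under Copy-on-Write, shared memory does not mean that writes propagate: Pandas copies the data lazily as soon as one of the objects is modified.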
Methods that can return copies
Many Pandas methods allow you to specify whether you want a view or a copy. For instance, consider the method to_numpy(~), which returns a NumPy array representation of a DataFrame.
By default, copy=False, which means that the returned NumPy array may be a view of the DataFrame's data. When it is a view, modifying the array mutates the original DataFrame:
numpy_view = df.to_numpy()   # copy=False
numpy_view[0,0] = 9          # modify top-left entry
df

   A  B
0  9  6
1  4  7
Note that modifying df would also affect numpy_view.
To get a NumPy array that is independent of the original DataFrame, set copy=True like so:
df = pd.DataFrame({"A":[3,4],"B":[6,7]})   # start again from a fresh df
numpy_copy = df.to_numpy(copy=True)
numpy_copy[0,0] = 9
df

   A  B
0  3  6
1  4  7
Notice how modifying the returned value does not mutate the original df.
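Note that copy=False is a request, not a guarantee. When the columns have different dtypes, Pandas generally has to build a new array with a common dtype, so the result is independent of the DataFrame either way. A small sketch of this, using an illustrative mixed-dtype DataFrame named df_mixed that is not part of the examples above:

```python
import numpy as np
import pandas as pd

# One integer column and one float column: no single shared buffer exists.
df_mixed = pd.DataFrame({"A": [3, 4], "B": [0.5, 1.5]})

arr = df_mixed.to_numpy()   # copy=False, but a new float64 array is built
print(arr.dtype)            # float64

# The array does not overlap with either column's data.
print(np.shares_memory(arr, df_mixed["A"].to_numpy()))  # False
print(np.shares_memory(arr, df_mixed["B"].to_numpy()))  # False
```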
Copying a Pandas object
Pandas has the method copy(~) that makes a copy of a Pandas object:
df = pd.DataFrame({"A":[3,4],"B":[6,7]}, index=["a","b"])
df_copy = df.copy()
df_copy.iloc[0,0] = 10
df

   A  B
a  3  6
b  4  7
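The example above modifies the copy and leaves the original intact. The independence also holds in the other direction, as the following sketch suggests; it relies on copy() defaulting to deep=True, which duplicates the data rather than just the container:

```python
import pandas as pd

df = pd.DataFrame({"A": [3, 4], "B": [6, 7]}, index=["a", "b"])
df_copy = df.copy()           # deep=True is the default

df.loc["a", "A"] = 10         # modify the original this time

print(df.loc["a", "A"])       # 10
print(df_copy.loc["a", "A"])  # 3: the copy kept the old value
```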