As of August 2025, I find the Pandas 2.3 documentation on Copy-on-Write and views versus copies very confusing. I guess I am not alone. This note covers:
pandas.DataFrame and pandas.Series are the two core objects. A Series contains a one-dimensional array, behaving like a numpy.ndarray, plus a pandas.Index object. Just like built-in container objects, think of a Series as just a bunch of references to Python objects that serve as the underlying data. These data could be anything, including mutable or immutable objects.
Similarly, the DataFrame is best thought of (for copy semantics purposes) as a two-dimensional numpy.ndarray containing data, along with two Index objects, which label the rows and columns.
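To make this mental model concrete, here is a minimal sketch. The variable names and sample values are made up for illustration; the point is only that a Series is values plus an Index, a DataFrame is a 2-D block of values plus two Index objects, and an object-dtype Series holds references to whatever Python objects you put in it.

```python
import numpy as np
import pandas as pd

# A Series: one-dimensional values plus an Index of labels.
s = pd.Series([10, 20, 30], index=["a", "b", "c"])
values = s.to_numpy()   # numpy.ndarray with ndim == 1
labels = s.index        # pandas.Index

# A DataFrame: two-dimensional values plus row and column labels.
df = pd.DataFrame({"x": [1, 2], "y": [3, 4]}, index=["r0", "r1"])
grid = df.to_numpy()    # shape (2, 2)

# With object dtype, each cell is a reference to a Python object.
lst = [1, 2]
objs = pd.Series([lst, "text"])
same_object = objs.iloc[0] is lst            # the Series points at lst itself
still_same = objs.copy().iloc[0] is lst      # copy() does not copy lst recursively
```

The last two lines are why mutable objects stored in a Series deserve extra care: copying the Series copies the references, not the objects they point to.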
If a DataFrame or Series object b is created from a, it may be:
The same object: a is b is true, and the name a references the same place in memory as b.
A view: b is a distinct object that shares its underlying data with a. If any of these data are modified, both a and b show the modification.
A copy: the underlying data of b are copied to a new place in memory. Modifying the underlying data in one object now leaves the other untouched. (However, if mutable objects are stored in DataFrames or Series, we could access one of the objects and use its methods to make modifications, which would be reflected in both copies.)
When Copy-on-Write is turned off in global options, any operation like
newdf = df['column1']
newdf = df.loc[:10, 'column1']
newdf = df[:10]
newdf = df.iloc[0]
is, according to the (confusingly written) docs [1], ambiguous about whether newdf is a view or copy of df. If we now run newdf['column1'] = newdata, we don't know whether df is modified too. Similarly, "chained assignment" operations like df['column1'][:10] = newdata obviously try to modify df but do not actually guarantee that df is modified. However, any operation like
df['column1'] = newdata
df.loc[:10, 'column1'] = newdata
df[:10] = newdata
df.iloc[0] = newdata
gets translated into a call to __setitem__, which modifies df in-place.
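As a small sketch of the safe pattern, the following uses a made-up four-row DataFrame (the column names follow the examples above, the values are placeholders). Each assignment is a single indexing call on df itself, so df is what gets modified.

```python
import pandas as pd

df = pd.DataFrame({"column1": [1, 2, 3, 4], "column2": [5, 6, 7, 8]})

# One indexing call on df: a single __setitem__, so df is updated.
# Note that .loc slices by label and includes the end label.
df.loc[:1, "column1"] = [10, 20]

# Whole-column assignment also goes through __setitem__ on df.
df["column2"] = [0, 0, 0, 0]

# Instead of the chained form df["column1"][2:] = -1, pick the rows and the
# column in one .loc call, so no intermediate object sits in between.
df.loc[2:3, "column1"] = -1

print(df)
```

The last assignment is the usual replacement for chained assignment: one call that names both the rows and the column.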
Again, I find the docs confusing [2]. In the "Description" section the docs say:
CoW means that any DataFrame or Series derived from another in any way always behaves as a copy. As a consequence, we can only change the values of an object through modifying the object itself. CoW disallows updating a DataFrame or a Series that shares data with another DataFrame or Series object inplace.
I would rewrite this as:
With CoW enabled, a new DataFrame or Series that is derived from another in any way (e.g. by subsetting, appending, updating another) will unambiguously behave as a copy of the old one:
Of course, things like passing a DataFrame to a function, or writing df1 = df2 will continue to just create new references to the same DataFrame, not create a new copy of it.
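Here is a minimal sketch of both halves of that statement. It assumes pandas 2.x, where CoW is opt-in through the mode.copy_on_write option; the frame, the alias name and the zero_out helper are invented for the example.

```python
import pandas as pd

# Assumption: pandas 2.x, where CoW is switched on through this option.
with pd.option_context("mode.copy_on_write", True):
    df = pd.DataFrame({"column1": [1, 2, 3], "column2": [4, 5, 6]})

    # A derived object behaves like a copy: writing to it leaves df alone.
    newdf = df["column1"]
    newdf.iloc[0] = 100

    # Plain assignment only adds a second name for the same object.
    alias = df
    alias.loc[0, "column2"] = -4

    # The same holds for function arguments (zero_out is a hypothetical helper).
    def zero_out(frame):
        frame.loc[1, "column2"] = 0  # modifies the caller's DataFrame

    zero_out(df)

print(df)
print(newdf)
```

After this runs, df keeps its original column1, while the writes made through alias and inside zero_out are visible in df, because both names refer to the one object.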
Users frequently get warnings that look like
SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
This is a warning that you have created some new DataFrame derived from an existing one, and are now modifying the new one. Once CoW is turned on, the behavior of your code may change. It also may mean the code gives unexpected behavior with CoW turned off; as noted above, it is typically ambiguous (without CoW) whether the new DataFrame is a view or copy of the existing one.
For example, the documentation has this code:
df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
df2 = df.reset_index(drop=True)
df2.iloc[0, 0] = 100
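With CoW, df2 behaves like a copy, and only df2 is modified. The two DataFrames share data until df2 is written to, at which point df2 gets its own copy.
Without CoW, reset_index returns an eager copy, so df is also left untouched here and no warning is raised. The warning shows up when the new DataFrame comes from indexing, for example df2 = df[df["foo"] > 1] followed by df2["bar"] = 0. Pandas warns you about modifying df2 because it is derived from df and may be either a view or a copy of it.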
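The following is a runnable version of the documentation snippet with CoW switched on, as a sketch rather than a definitive test. It assumes pandas 2.x (where CoW is enabled through the mode.copy_on_write option); the np.shares_memory checks and the shared_before and shared_after names are additions for illustration.

```python
import numpy as np
import pandas as pd

# Assumption: pandas 2.x, where CoW is switched on through this option.
with pd.option_context("mode.copy_on_write", True):
    df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
    df2 = df.reset_index(drop=True)

    # Before any write, both frames can point at the same buffer.
    shared_before = np.shares_memory(df["foo"].to_numpy(), df2["foo"].to_numpy())

    df2.iloc[0, 0] = 100  # the write makes df2 take its own copy of the data

    shared_after = np.shares_memory(df["foo"].to_numpy(), df2["foo"].to_numpy())

print(df.iloc[0, 0], df2.iloc[0, 0], shared_before, shared_after)
```

With CoW on, the write to df2 does not reach df, and the memory check shows that the copy is deferred until that write happens.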
To address the warning, you can either refactor so this pattern doesn't occur, turn on CoW if CoW behavior is desirable, or explicitly copy() the old DataFrame when creating the new one.
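Two of these options can be sketched as follows. The frames and the filter condition are invented for the example; which option is right depends on whether you meant to change the original DataFrame or only the derived one.

```python
import pandas as pd

# Option 1: explicit copy. subset owns its data, so modifying it is
# unambiguous and does not touch df, whatever the CoW setting.
df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
subset = df[df["foo"] > 1].copy()
subset["bar"] = 0

# Option 2: refactor. If the goal was to change the original, do it in a
# single .loc call on the original instead of going through a derived object.
df_b = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
df_b.loc[df_b["foo"] > 1, "bar"] = 0

print(df)
print(subset)
print(df_b)
```

The explicit copy() costs memory up front, but it states the intent in the code and behaves the same with CoW on or off.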
[1] Unfortunately the Pandas documentation page jumps straight to chained indexing, which is confusing, but the key line is in parentheses:
See that __getitem__ in there? Outside of simple cases, it’s very hard to predict whether it will return a view or a copy (it depends on the memory layout of the array, about which pandas makes no guarantees)...
[2] Part of the problem is the docs start with a description of old behavior, then jump straight to how to migrate, without defining what this thing even means. Also, the statement "CoW disallows updating a DataFrame or a Series that shares data with another DataFrame or Series object inplace" seems simply wrong as written, because it is perfectly fine to update any object in-place. It's just that users sometimes try updating an object by first constructing a view of it and then modifying the view, which will no longer work.