Pandas copy on write

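As of August 2025, I find the Pandas 2.3 documentation on Copy-on-Write and views versus copies very confusing, and I suspect I am not alone. This note covers how Pandas stores data, what "view" and "copy" mean, when subsetting returns a view versus a copy, what Copy-on-Write is, and what the related warnings are for.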

How does Pandas store data?

pandas.DataFrame and pandas.Series are the two core objects. A Series contains a one-dimensional array, which behaves like a numpy.ndarray, plus a pandas.Index object. As with built-in container objects, think of a Series as a collection of references to the Python objects that serve as the underlying data. These objects could be anything, mutable or immutable.

Similarly, the DataFrame is best thought of (for copy semantics purposes) as a two-dimensional numpy.ndarray containing data, along with two Index objects, which label the rows and columns.
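
To make this mental model concrete, here is a minimal sketch that pulls the pieces apart. The variable names (s, df, values) are just for illustration; to_numpy(), index and columns are standard pandas attributes.

```python
import numpy as np
import pandas as pd

s = pd.Series([10, 20, 30], index=["a", "b", "c"])
df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})

# A Series is a one-dimensional array of values plus an Index of labels.
values = s.to_numpy()
print(type(values), values.shape)
print(s.index)

# A DataFrame has two Index objects: one labels the rows, one the columns.
print(df.index)
print(df.columns)
print(df.to_numpy().shape)  # two-dimensional: (rows, columns)
```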

What do we mean by "view" and "copy"?

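If a DataFrame or Series object b is created from a, it may be a view, which shares its underlying data with a so that modifying one also modifies the other, or a copy, which has its own data so that modifying one does not affect the other.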
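
To make the two cases concrete, here is a small sketch. The view is shown with a plain NumPy slice, where view behavior is guaranteed, and the copy with an explicit DataFrame.copy(). The names a, view, df_a and df_b are made up for this example.

```python
import numpy as np
import pandas as pd

# View: a basic NumPy slice shares memory with the original array,
# so writing through the view also changes the original.
a = np.arange(5)
view = a[:3]
view[0] = 99
print(a[0])  # 99

# Copy: DataFrame.copy() gives the new object its own data,
# so writing to the copy leaves the original untouched.
df_a = pd.DataFrame({"foo": [1, 2, 3]})
df_b = df_a.copy()
df_b.loc[0, "foo"] = 100
print(df_a.loc[0, "foo"])  # 1
```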

Returning a view versus a copy (when subsetting in Pandas)

When Copy-on-Write is turned off in global options, any operation like

newdf = df['column1']
newdf = df.loc[:10, 'column1']
newdf = df[:10]
newdf = df.iloc[0]

is, according to the (confusingly written) docs [1], ambiguous about whether newdf is a view or a copy of df. If we then assign into newdf, we don't know whether df is modified too. Similarly, "chained assignment" operations like df['column1'][:10] = newdata clearly intend to modify df but are not guaranteed to do so. However, any operation like

df['column1'] = newdata
df.loc[:10, 'column1'] = newdata
df[:10] = newdata
df.iloc[0] = newdata

gets translated into a call to __setitem__, which modifies df in-place.
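
As a rough illustration of the second group, the sketch below uses a single indexing call on the left-hand side, so the write goes into df itself. The column names and values are invented for the example.

```python
import pandas as pd

df = pd.DataFrame({"column1": [1, 2, 3, 4], "column2": [5, 6, 7, 8]})

# One indexing call on the left-hand side: this writes into df itself.
# Note that .loc slices by label and includes the end label.
df.loc[:1, "column1"] = 0
print(df["column1"].tolist())  # [0, 0, 3, 4]

# Whole-column assignment also goes through df.__setitem__.
df["column2"] = [50, 60, 70, 80]
print(df["column2"].tolist())  # [50, 60, 70, 80]
```

This is also the usual replacement for a chained assignment such as df['column1'][:2] = 0: fold both indexing steps into one .loc call on df.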

What is Copy-on-Write?

Again, I find the docs confusing [2]. In the "Description" section, the docs say:

CoW means that any DataFrame or Series derived from another in any way always behaves as a copy. As a consequence, we can only change the values of an object through modifying the object itself. CoW disallows updating a DataFrame or a Series that shares data with another DataFrame or Series object inplace.

I would rewrite this as:

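With CoW enabled, a new DataFrame or Series that is derived from another in any way (e.g. by subsetting, appending, updating another) will unambiguously behave as a copy of the old one: modifying the new object never changes the old one, and modifying the old one never changes the new one.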

Of course, things like passing a DataFrame to a function, or writing df1 = df2 will continue to just create new references to the same DataFrame, not create a new copy of it.
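
Here is a minimal sketch of both points. It assumes a pandas version in which the "mode.copy_on_write" option exists (it does in 2.x); the names subset and alias are just for illustration.

```python
import pandas as pd

# Assumption: the "mode.copy_on_write" option is available in this pandas version.
with pd.option_context("mode.copy_on_write", True):
    df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})

    # A derived object behaves as a copy: the write stays in subset.
    subset = df["foo"]
    subset.iloc[0] = 100

    # Plain assignment only creates another name for the same object,
    # so a write through alias is a write to df.
    alias = df
    alias.loc[0, "bar"] = 400

print(df.loc[0, "foo"])  # 1
print(df.loc[0, "bar"])  # 400
```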

What is the purpose of Copy-on-Write warnings?

Users frequently get warnings that look like

SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.

This is a warning that you have created some new DataFrame derived from an existing one, and are now modifying the new one. Once CoW is turned on, the behavior of your code may change. It also may mean the code gives unexpected behavior with CoW turned off; as noted above, it is typically ambiguous (without CoW) whether the new DataFrame is a view or copy of the existing one.

For example, the documentation has this code:

df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
df2 = df.reset_index(drop=True)
df2.iloc[0, 0] = 100

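With CoW, df2 behaves like a copy, and only df2 is modified. reset_index returns a lazy copy that shares data with df until the write to df2 triggers an actual copy.

Without CoW, reset_index copies the data eagerly, so in this particular example df2 is again independent of df and only df2 is modified. The cases that trigger the warning are subsetting operations such as df2 = df[:2], where df2 may be a view of df. Writing to df2 may then modify df as well, and Pandas warns you because df2 is derived from df.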
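
The CoW side of this example can be checked directly. The sketch below assumes the "mode.copy_on_write" option exists in the installed pandas, and uses np.shares_memory as one way to see whether two objects currently share a buffer; shared_before and shared_after are names invented here.

```python
import numpy as np
import pandas as pd

# Assumption: the "mode.copy_on_write" option is available in this pandas version.
with pd.option_context("mode.copy_on_write", True):
    df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
    df2 = df.reset_index(drop=True)

    # Under CoW in pandas 2.x this is expected to report shared memory,
    # because the copy has not actually been made yet.
    shared_before = np.shares_memory(df["foo"].to_numpy(), df2["foo"].to_numpy())

    # The write forces df2 to take its own copy of the data first.
    df2.iloc[0, 0] = 100
    shared_after = np.shares_memory(df["foo"].to_numpy(), df2["foo"].to_numpy())

print(shared_before, shared_after)
print(df.iloc[0, 0], df2.iloc[0, 0])  # 1 100
```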

To address the warning, you can either refactor so this pattern doesn't occur, turn on CoW if CoW behavior is desirable, or explicitly copy() the old DataFrame when creating the new one.
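
Two of these options can be sketched as follows, under the assumption that the subset is selected with a boolean mask (a common way to run into the warning). The variable bar_after_copy_write is only there to record the state of df between the two steps.

```python
import pandas as pd

df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})

# Option 1: explicit copy. df2 owns its data, so writing to it is
# unambiguous and df is left alone.
df2 = df[df["foo"] > 1].copy()
df2["bar"] = 0
bar_after_copy_write = df["bar"].tolist()
print(bar_after_copy_write)  # [4, 5, 6]

# Option 2: refactor. If the intent was to modify df, do it with a
# single .loc call on df instead of going through an intermediate object.
df.loc[df["foo"] > 1, "bar"] = -1
print(df["bar"].tolist())  # [4, -1, -1]
```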


  1. Unfortunately the Pandas documentation page jumps straight to chained indexing, which is confusing, but the key line is in parentheses:

    See that __getitem__ in there? Outside of simple cases, it’s very hard to predict whether it will return a view or a copy (it depends on the memory layout of the array, about which pandas makes no guarantees)...

  2. Part of the problem is the docs start with a description of old behavior, then jump straight to how to migrate, without defining what this thing even means. Also, the statement "CoW disallows updating a DataFrame or a Series that shares data with another DataFrame or Series object inplace" seems simply wrong as written, because it is perfectly fine to update any object in-place. It's just that users sometimes try updating an object by first constructing a view of it and then modifying the view, which will no longer work.