*✍Tips and Tricks in Python*

# What is the difference between NaN, None, pd.nan and np.nan?

**Warning**: *There is no magical formula or Holy Grail here, though a new world might open the door for you.*

**TL;DR:**

- First of all, there is no `pd.nan`, but there is `np.nan`.
- If data is missing and shows NaN, be careful with `NaN == np.nan`: `np.nan` is not directly comparable to `np.nan`.

```python
>>> np.nan == np.nan
False
```

NaN is used as a placeholder for missing data *consistently* in pandas, and that consistency is good. I usually read/translate NaN as **“missing”**. *Also see the **‘working with missing data’** section in the docs.*

Wes writes in the docs, under ‘choice of NA-representation’:

> After years of production use [NaN] has proven, at least in my opinion, to be the best decision given the state of affairs in NumPy and Python in general. The special value NaN (Not-A-Number) is used everywhere as the NA value, and there are API functions `isnull` and `notnull` which can be used across the dtypes to detect NA values.
>
> ...
>
> Thus, I have chosen the Pythonic “practicality beats purity” approach and traded integer NA capability for a much simpler approach of using a special value in float and object arrays to denote NA, and promoting integer arrays to floating when NAs must be introduced.

*Note the **“gotcha” that integer Series containing missing data are upcast to floats**.*
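A minimal sketch of that gotcha, using nothing beyond stock pandas/NumPy: introducing a single missing value into an integer Series silently promotes the whole thing to float64.

```python
import numpy as np
import pandas as pd

s_int = pd.Series([1, 2, 3])
print(s_int.dtype)                # int64

# one NaN is enough to upcast every element to float
s_na = pd.Series([1, 2, np.nan])
print(s_na.dtype)                 # float64
```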

In my opinion the main reason to use NaN (over None) is that it can be stored with numpy’s float64 dtype, rather than the less efficient object dtype, see **NA type promotions**.

```python
# without forcing dtype it changes None to NaN!
s_bad = pd.Series([1, None], dtype=object)
s_good = pd.Series([1, np.nan])

In [13]: s_bad.dtype
Out[13]: dtype('O')

In [14]: s_good.dtype
Out[14]: dtype('float64')
```

Jeff comments (below) on this:

> `np.nan` allows for vectorized operations; it’s a float value, while `None`, by definition, forces object type, which basically disables all efficiency in numpy.

So repeat 3 times fast: object == bad, float == good.

That said, many operations may still work just as well with None as with NaN (though they may be unsupported, i.e. they may sometimes give surprising results):

```python
In [15]: s_bad.sum()
Out[15]: 1

In [16]: s_good.sum()
Out[16]: 1.0
```
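The efficiency cost is not just theoretical. A sketch of one place the object dtype bites, assuming only stock NumPy/pandas: NumPy's `isnan` ufunc works on the float64 Series but raises a `TypeError` on the object one.

```python
import numpy as np
import pandas as pd

s_bad = pd.Series([1, None], dtype=object)
s_good = pd.Series([1, np.nan])

print(np.isnan(s_good.values))    # [False  True]

try:
    np.isnan(s_bad.values)        # object array: the float ufunc is unsupported
except TypeError as exc:
    print("TypeError:", exc)
```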

To answer the second question:

You should be using `pd.isnull` and `pd.notnull` to test for missing data (NaN).

`np.nan` is not directly comparable to `np.nan`:

```python
>>> np.nan == np.nan
False
```

So yes, if data is missing and shows NaN, be careful about testing it with `NaN == np.nan`.

While

```python
>>> np.isnan(np.nan)
True
```

You could also do

```python
>>> pd.isnull(np.nan)
True
```
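A nice property of `pd.isnull` (a small sketch, stock pandas only) is that it recognizes the various missing markers uniformly:

```python
import numpy as np
import pandas as pd

print(pd.isnull(np.nan))   # True
print(pd.isnull(None))     # True
print(pd.isnull(pd.NaT))   # True  (missing datetime)
print(pd.isnull(1.0))      # False
```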

*examples*

This filters nothing, because nothing is equal to `np.nan`:

```python
s = pd.Series([1., np.nan, 2.])

s[s != np.nan]
0    1.0
1    NaN
2    2.0
dtype: float64
```

This filters out the null:

```python
s = pd.Series([1., np.nan, 2.])

s[s.notnull()]
0    1.0
2    2.0
dtype: float64
```

Or use the odd comparison behavior to get what we want anyway: since `np.nan != np.nan` is `True`, then

```python
s = pd.Series([1., np.nan, 2.])

s[s == s]
0    1.0
2    2.0
dtype: float64
```

Or just `dropna`:

```python
s = pd.Series([1., np.nan, 2.])

s.dropna()
0    1.0
2    2.0
dtype: float64
```
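And if you want to keep the Series’ length rather than drop rows, `fillna` is the usual complement to `dropna` — a minimal sketch:

```python
import numpy as np
import pandas as pd

s = pd.Series([1., np.nan, 2.])
print(s.fillna(0.0).tolist())   # [1.0, 0.0, 2.0]
```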

If you want to replace NaN with None, it's worth noting that you can do this natively in pandas with **where**:

`df1 = df.where(pd.notnull(df), None)`

Note: this changes the dtype of **all columns** to `object`.

Example:

```python
In [1]: df = pd.DataFrame([1, np.nan])

In [2]: df
Out[2]:
     0
0    1
1  NaN

In [3]: df1 = df.where(pd.notnull(df), None)

In [4]: df1
Out[4]:
      0
0     1
1  None
```

Note: what you cannot do is recast the DataFrame's `dtype` to allow all data types using **astype**, and then use the DataFrame **fillna** method:

`df1 = df.astype(object).replace(np.nan, 'None')`

*Unfortunately neither this, nor using **replace**, works with **None**; see **this (closed) issue**.*

As an aside, it’s worth noting that for most use cases you don’t need to replace NaN with None; see this question about **the difference between NaN and None in pandas**.

However, in this specific case it seems you do.