✍Tips and Tricks in Python

Data Structure Conversion

The conversion between list, dictionary, ndarray, Series and DataFrame

Ke Gui
Published in The Startup
4 min read · Oct 28, 2020

Photo by Dave Gandy under the Public Domain Dedication License

Note: These are my learning notes on common data structure conversions between list, dictionary, ndarray, and pandas’ Series and DataFrame. Nothing fancy, but handy.

Data analysis constantly calls for converting between different data containers.

List, dictionary, ndarray, Series and DataFrame are the data structures used most often in data analysis, so being familiar with the conversions between them is very helpful in daily data science work.

From List to Dictionary

Let’s create a sample list first:

data_list = [*range(2, 30, 2)]
data_list

We can convert the list to a dict, using the list indexes as the keys:

data_dict = {i:k for i,k in enumerate(data_list)}
data_dict
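As a small aside, the same mapping can be built without a comprehension, and `zip` pairs the values with any keys you like. This is only a sketch; the `labels` list and its `item_N` names are made up for illustration.

```python
data_list = [*range(2, 30, 2)]  # 14 even numbers: 2, 4, ..., 28

# Same result as the comprehension: positions become keys
data_dict = dict(enumerate(data_list))

# Custom keys: pair each value with a (hypothetical) label
labels = [f"item_{i}" for i in range(len(data_list))]
labeled_dict = dict(zip(labels, data_list))

print(data_dict[0], data_dict[13])  # 2 28
print(labeled_dict["item_3"])       # 8
```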

And we can convert the dictionary back to a list by unpacking its values with an asterisk:

data_list_dict = [*data_dict.values()]
data_list_dict
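For completeness, here is a quick sketch of the other views a dict offers. Keys, values and key-value pairs can each be turned into a list the same way, and `list(...)` is equivalent to the asterisk form.

```python
data_dict = dict(enumerate(range(2, 30, 2)))

keys = list(data_dict)             # [0, 1, ..., 13]
values = list(data_dict.values())  # same as [*data_dict.values()]
pairs = list(data_dict.items())    # [(0, 2), (1, 4), ...]

# Dicts preserve insertion order (Python 3.7+), so the values come back
# in the order they were inserted
print(values == list(range(2, 30, 2)))  # True
print(pairs[:3])                        # [(0, 2), (1, 4), (2, 6)]
```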

From List, Dictionary to Series and DataFrame

Pandas’ Series and DataFrame are flexible enough that converting them to and from other data structures is neat and concise.

A pandas Series is a one-dimensional ndarray with axis labels (including time series).

Labels need not be unique but must be a hashable type. The object
supports both integer- and label-based indexing and provides a host of
methods for performing operations involving the index. Statistical
methods from ndarray have been overridden to automatically exclude
missing data (currently represented as NaN).

Operations between Series (+, -, /, *, **) align values based on their
associated index values — they need not be the same length. The resulting
index will be the sorted union of the two indexes.
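The alignment rule is easier to see with a tiny example. The sketch below uses two made-up Series that share only the label `y`; every label that appears in just one of them ends up as NaN in the result.

```python
import pandas as pd

# Hypothetical data: only the label "y" appears in both Series
a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20], index=["w", "y"])

total = a + b   # values are matched by label, not by position
print(total)

print(total["y"])          # 22.0 (2 from a plus 20 from b)
print(total.isna().sum())  # 3 (w, x and z have no partner)
```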

The conversion between list, dict and Series is straightforward.

import pandas as pd

data_Series_list = pd.Series(data_list)
data_Series_dict = pd.Series(data_dict)

Both give the same result: a Series indexed 0 to 13 that holds the even numbers from 2 to 28.
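A quick way to check this, and to see what happens when the dict keys are not simple positions, is sketched below. The `prices` data is made up for illustration.

```python
import pandas as pd

data_list = [*range(2, 30, 2)]
data_dict = dict(enumerate(data_list))

from_list = pd.Series(data_list)
from_dict = pd.Series(data_dict)

# Same values and same index labels (0..13)
print(from_list.tolist() == from_dict.tolist())        # True
print(list(from_list.index) == list(from_dict.index))  # True

# With non-positional keys, the dict keys become the index labels
prices = pd.Series({"apple": 3, "pear": 5})  # hypothetical data
print(prices["pear"])  # 5

# And back again: Series -> list, Series -> dict
print(from_dict.tolist() == data_list)   # True
print(from_dict.to_dict() == data_dict)  # True
```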

There is a small difference for DataFrame, though. A pandas DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). Arithmetic operations align on both row and column labels. It can be thought of as a dict-like container for Series objects.

data_df_list = pd.DataFrame(data_list, columns=['items'])
data_df_list.shape

A list gives a single-column DataFrame, as expected.
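A minimal sketch of the list round trip follows. The two-column `rows` example and its column names are hypothetical, added only to show that a list of rows maps to several columns.

```python
import pandas as pd

data_list = [*range(2, 30, 2)]
data_df_list = pd.DataFrame(data_list, columns=["items"])
print(data_df_list.shape)  # (14, 1)

# Back to a plain list
round_trip = data_df_list["items"].tolist()
print(round_trip == data_list)  # True

# A list of rows (hypothetical data) maps to several columns
rows = [[1, "a"], [2, "b"]]
df_rows = pd.DataFrame(rows, columns=["id", "label"])
print(df_rows.shape)  # (2, 2)
```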

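A dict is another story: each key becomes a column, so an index has to be supplied explicitly, and every scalar value is repeated down its column.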

# Each key becomes a column; its scalar value is repeated for every index label
data_df_dict = pd.DataFrame(data_dict, index=data_dict.keys())
data_df_dict
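To make the surprise concrete, the sketch below checks the shape of that frame and shows one possible alternative that yields a row per key by going through the dict's items. The column name `key` is an assumption for illustration.

```python
import pandas as pd

data_dict = dict(enumerate(range(2, 30, 2)))

# 14 keys -> 14 columns, and 14 index labels -> 14 rows
wide = pd.DataFrame(data_dict, index=list(data_dict))
print(wide.shape)             # (14, 14)
print((wide[0] == 2).all())   # True: column 0 holds the value 2 in every row

# One row per key: build the frame from the (key, value) pairs instead
tall = pd.DataFrame(list(data_dict.items()), columns=["key", "items"])
print(tall.shape)             # (14, 2)
```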

But there is always another way in pandas: from_dict with orient='index' turns each key into a row.

data_df_dict = pd.DataFrame.from_dict(data_dict,
                                      columns=['items'],
                                      orient='index')
data_df_dict
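As a rough check of what `from_dict` returns here, the sketch below inspects the shape, builds the same column by going through a Series first, and converts the column back to a dict.

```python
import pandas as pd

data_dict = dict(enumerate(range(2, 30, 2)))

data_df_dict = pd.DataFrame.from_dict(data_dict, orient="index", columns=["items"])
print(data_df_dict.shape)  # (14, 1)

# Going through a Series gives the same single column
via_series = pd.Series(data_dict).to_frame(name="items")
print(via_series["items"].tolist() == data_df_dict["items"].tolist())  # True

# Back to a dict: the keys come from the index
back = data_df_dict["items"].to_dict()
print(back == data_dict)  # True
```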

From Series, DataFrame to ndarray

In machine learning work, this conversion usually happens just before feeding the data to a model. Just as a dict’s values convert to a list, a Series converts to an ndarray through its values attribute.

data_ndarray = data_Series_list.values
data_ndarray

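But be aware of the shape difference: a Series gives a one-dimensional array, while a DataFrame gives a two-dimensional one, even when it has a single column.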

data_Series_list.values.shape, data_df_list.values.shape
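The sketch below spells out the two shapes and one way to move between them. It uses `.to_numpy()`, which the pandas documentation recommends over `.values`; for this data both return the same array.

```python
import pandas as pd

data_list = [*range(2, 30, 2)]
s = pd.Series(data_list)
df = pd.DataFrame(data_list, columns=["items"])

print(s.to_numpy().shape)   # (14,)   one-dimensional
print(df.to_numpy().shape)  # (14, 1) two-dimensional

# Convert between the two shapes when a model expects one or the other
column = s.to_numpy().reshape(-1, 1)  # (14,)   -> (14, 1)
flat = df.to_numpy().ravel()          # (14, 1) -> (14,)
print(column.shape, flat.shape)       # (14, 1) (14,)
```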

From Series to DataFrame

There are many ways to do this. The following one works well in method chains.

data_Series_list.to_frame(name='items')

And back to a Series:

data_df_list.squeeze()
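Here is a short sketch of the round trip inside a chain, plus one caveat worth knowing about squeeze. The `doubled` column is a made-up example of a chained step.

```python
import pandas as pd

s = pd.Series([*range(2, 30, 2)])

# to_frame returns a new DataFrame, so it slots into a method chain
df = s.to_frame(name="items").assign(doubled=lambda d: d["items"] * 2)
print(df.shape)  # (14, 2)

# squeeze collapses a single-column frame back into a Series
back = df[["items"]].squeeze()
print(type(back).__name__)  # Series

# Caveat: a 1x1 frame squeezes all the way down to a scalar;
# squeeze("columns") keeps it a Series
one_cell = pd.DataFrame({"items": [2]})
print(type(one_cell.squeeze("columns")).__name__)  # Series
```

When the number of rows is not known in advance, squeezing only the column axis is the safer choice.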

I will update this article from time to time to add more data structure conversions. Please, stay foolish, stay hungry!

Ke Gui
The Startup

An ordinary guy who wants to be the reason someone believes in the goodness of people. He lives in Brisbane, Australia, with a lovely backyard.