✍Tips and Tricks in Python
Data Structure Conversion
The conversion between list, dictionary, ndarray, Series and DataFrame
Note: These are learning notes on common data structure conversions between list, dictionary, ndarray, Pandas' Series and DataFrame. Nothing fancy, but handy.
📈Python For Finance Series
In data analysis, there is always a need to convert between different data containers.
List, dictionary, ndarray, Series and DataFrame are the data structures used most often in data analysis. Being familiar with the conversions between these containers is very helpful in daily data science work.
From List to Dictionary
Let's create a sample list first:
data_list = [*range(2, 30, 2)]
data_list
We can convert the list to a dict using the list index as the keys:
data_dict = {i:k for i,k in enumerate(data_list)}
data_dict
And we can convert the dictionary back to a list by unpacking its values with an asterisk:
data_list_dict = [*data_dict.values()]
data_list_dict
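As a quick check of the round trip above, the following sketch rebuilds both containers and compares them. It uses `dict(enumerate(...))` as an equivalent spelling of the comprehension; the names `alt_dict` and `round_trip` are made up for illustration.

```python
# Sample list of even numbers, same definition as above.
data_list = [*range(2, 30, 2)]

# dict(enumerate(...)) builds the same index -> value mapping as the comprehension.
alt_dict = dict(enumerate(data_list))

# Unpacking the values restores the list, because dicts keep insertion order (Python 3.7+).
round_trip = [*alt_dict.values()]

print(alt_dict == {i: k for i, k in enumerate(data_list)})  # True
print(round_trip == data_list)  # True
```

The round trip only restores the original list because the keys were generated in list order; a dictionary built in a different order would give a differently ordered list.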
From List, Dictionary to Series and DataFrame
Pandas' Series and DataFrame are flexible enough that they can be converted to other data structures very neatly.
Pandas' Series is a one-dimensional ndarray with axis labels (including time series). Labels need not be unique but must be a hashable type. The object supports both integer- and label-based indexing and provides a host of methods for performing operations involving the index. Statistical methods from ndarray have been overridden to automatically exclude missing data (currently represented as NaN).
Operations between Series (+, -, /, *, **) align values based on their associated index values; they need not be the same length. The resulting index will be the sorted union of the two indexes.
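To make the alignment rule concrete, here is a minimal sketch with two small Series of different lengths. The data and the names `s1`, `s2` and `total` are made up for illustration.

```python
import pandas as pd

# Two Series of different lengths with partially overlapping labels (made-up data).
s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20], index=["b", "d"])

# Addition aligns on labels; a label present in only one Series yields NaN.
total = s1 + s2
print(total)

# Statistical methods skip NaN by default, so only the "b" entry contributes.
print(total.sum())  # 12.0
```

Only the label `b` exists in both Series, so it is the only position with a numeric result; the index of `total` is the union `a, b, c, d`.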
The conversion from a list or a dict to a Series is straightforward.
import pandas as pd

data_Series_list = pd.Series(data_list)
data_Series_dict = pd.Series(data_dict)
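Both give the same values with the same index labels (0 to 13), because the dictionary keys were taken from the list positions.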
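One way to confirm this is to compare the two Series directly. The sketch below rebuilds them from scratch so it runs on its own; `Series.equals` is the pandas method for comparing two Series.

```python
import pandas as pd

data_list = [*range(2, 30, 2)]
data_dict = {i: k for i, k in enumerate(data_list)}

data_Series_list = pd.Series(data_list)  # index is generated automatically
data_Series_dict = pd.Series(data_dict)  # index comes from the dict keys

# Compare index labels and values explicitly.
print(list(data_Series_list.index) == list(data_Series_dict.index))
print(data_Series_list.tolist() == data_Series_dict.tolist())

# Series.equals performs the comparison in a single call.
print(data_Series_list.equals(data_Series_dict))
```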
For DataFrame, however, there is a small difference. Pandas' DataFrame is two-dimensional, size-mutable, potentially heterogeneous tabular data. The data structure also contains labeled axes (rows and columns). Arithmetic operations align on both row and column labels. It can be thought of as a dict-like container for Series objects.
data_df_list = pd.DataFrame(data_list, columns=['items'])
data_df_list.shape
A list gives a single-column DataFrame, as expected.
A dict is another story: each key becomes a column, and its scalar value is repeated for every row of the index.
data_df_dict = pd.DataFrame(data_dict, index=data_dict.keys())
data_df_dict
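The sketch below shows why the dict route is "another story". It passes `list(data_dict)` as the index instead of `data_dict.keys()`, which is a small change of my own to keep the example explicit.

```python
import pandas as pd

data_list = [*range(2, 30, 2)]
data_dict = {i: k for i, k in enumerate(data_list)}

# Each dict key becomes a column; the scalar value is repeated for every index label.
data_df_dict = pd.DataFrame(data_dict, index=list(data_dict))

print(data_df_dict.shape)        # (14, 14)
print(data_df_dict[0].unique())  # column 0 holds only the value 2
```

So instead of 14 rows and one column, we get a 14 x 14 table in which every column is constant.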
But there is always another way in Pandas.
data_df_dict = pd.DataFrame.from_dict(data_dict,
                                      columns=['items'],
                                      orient='index')
data_df_dict
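As a sanity check on the `from_dict` call, the sketch below compares its result with an alternative route through a Series. The name `data_df_alt` is made up for illustration.

```python
import pandas as pd

data_list = [*range(2, 30, 2)]
data_dict = {i: k for i, k in enumerate(data_list)}

# orient='index' treats the dict keys as row labels, giving one row per key.
data_df_dict = pd.DataFrame.from_dict(data_dict, orient="index", columns=["items"])

# Alternative route: build a Series first, then name its single column.
data_df_alt = pd.Series(data_dict).to_frame(name="items")

print(data_df_dict.shape)  # (14, 1)
print(data_df_dict["items"].tolist() == data_df_alt["items"].tolist())  # True
```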
From Series, DataFrame to ndarray
In machine learning, this conversion is often done just before feeding the data to a model for training. Just as we got a list from a dict through its values, we can get an ndarray from a Series through its values.
data_ndarray = data_Series_list.values
data_ndarray
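But be aware of the shape difference: a Series gives a one-dimensional array, while a single-column DataFrame gives a two-dimensional array.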
data_Series_list.values.shape, data_df_list.values.shape
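The shape difference matters when a model expects a particular layout. The sketch below uses `to_numpy()`, which the pandas documentation recommends over `.values`, and shows how to move between the two shapes with NumPy's `reshape` and `ravel`. The names `series_arr` and `frame_arr` are made up for illustration.

```python
import pandas as pd

data_list = [*range(2, 30, 2)]
data_Series_list = pd.Series(data_list)
data_df_list = pd.DataFrame(data_list, columns=["items"])

series_arr = data_Series_list.to_numpy()  # 1-D array
frame_arr = data_df_list.to_numpy()       # 2-D array with one column

print(series_arr.shape, frame_arr.shape)  # (14,) (14, 1)

# reshape(-1, 1) turns the 1-D array into a column; ravel() flattens the 2-D one.
print(series_arr.reshape(-1, 1).shape)  # (14, 1)
print(frame_arr.ravel().shape)          # (14,)
```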
From Series to DataFrame
There are many ways to do this. The following one works well in method chains.
data_Series_list.to_frame(name='items')
And back to a Series:
data_df_list.squeeze()
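A small sketch of the round trip, with one caveat worth knowing: `squeeze()` collapses every axis of length one, so a DataFrame with a single row and a single column becomes a scalar rather than a Series. The names `frame`, `back`, `single` and `single_col` are made up for illustration.

```python
import pandas as pd

data_list = [*range(2, 30, 2)]
data_Series_list = pd.Series(data_list)

frame = data_Series_list.to_frame(name="items")
back = frame.squeeze()  # single-column DataFrame -> Series named 'items'

print(type(back).__name__, back.name)  # Series items

# Caveat: a 1x1 DataFrame squeezes all the way down to a scalar.
single = pd.DataFrame({"items": [2]}).squeeze()

# squeeze("columns") collapses only the column axis, so a Series comes back.
single_col = pd.DataFrame({"items": [2]}).squeeze("columns")
print(type(single_col).__name__)  # Series
```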
I will update this article from time to time to add more data structure conversions. Please, stay foolish, stay hungry!