Data frame with pandas

04 January, 2019

Read this in "about 2 minutes".

Hi!

A data frame is like a table in SQL: it can be loaded from a file and manipulated in memory. This manipulation is part of data preprocessing, which can be very tedious work.

Luckily, Pandas is a powerful tool to help us.

0.import package

import pandas as pd
import numpy as np  # needed for np.nan in step 5

1.create a data frame

df = pd.DataFrame()

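Or load it from a (CSV) file

header : if not set, the first line is used as the header; pass header=None when the file has no header row

delimiter : if not set, the delimiter is ','

bad line handling : error_bad_lines=False, warn_bad_lines=True (pandas 1.3 and later replace both with on_bad_lines='warn')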

df = pd.read_csv(filepath_or_buffer=file_root, header=None, delimiter='\t')
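To try the loading step without a file on disk, here is a minimal sketch that reads tab-separated text from memory. The sample data and the column names `id`, `name`, `score` are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical tab-separated data with no header row.
raw = "1\talice\t3.5\n2\tbob\t4.0\n"

# header=None tells pandas the first line is data, not column names,
# so the columns get the integer labels 0, 1, 2.
df = pd.read_csv(io.StringIO(raw), header=None, delimiter="\t")

# Give the columns readable names after loading.
df.columns = ["id", "name", "score"]
```

`read_csv` accepts any file-like object, so the same call should work with a real path in place of `io.StringIO(raw)`.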

2.rename columns

df.rename(columns={'old_column_name_1': 'new_column_name_1', 'old_column_name_2': 'new_column_name_2'}, inplace=True)
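If you would rather not modify the frame in place, `rename` also returns a new frame when `inplace` is left out. A small sketch, with a made-up extra column `keep` and a made-up key `not_a_column`:

```python
import pandas as pd

df = pd.DataFrame({"old_column_name_1": [1], "old_column_name_2": [2], "keep": [3]})

# Without inplace=True the original frame is untouched and a renamed copy is returned.
# Keys that do not match any column are ignored by default.
renamed = df.rename(columns={"old_column_name_1": "new_column_name_1",
                             "not_a_column": "whatever"})
```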

3.add a new column

df['new_column_name'] = 0
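A new column does not have to be a constant; it can also be computed from existing columns. A short sketch, where the columns `price`, `qty` and `total` are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({"price": [2.0, 3.0], "qty": [5, 4]})

# A scalar is broadcast to every row.
df["new_column_name"] = 0

# An expression on existing columns is evaluated row by row.
df["total"] = df["price"] * df["qty"]
```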

4.concat two data frames

add_df = pd.concat([one_part_df, two_part_df], axis=1)
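The `axis` argument decides the direction of the concat. Here is a small sketch with two made-up one-column frames, showing side-by-side (`axis=1`) versus stacked (`axis=0`) results:

```python
import pandas as pd

one_part_df = pd.DataFrame({"a": [1, 2]})
two_part_df = pd.DataFrame({"b": [3, 4]})

# axis=1 places the frames side by side, aligning rows on the index.
add_df = pd.concat([one_part_df, two_part_df], axis=1)

# axis=0 (the default) stacks rows instead; ignore_index=True renumbers them.
stacked_df = pd.concat([one_part_df, one_part_df], axis=0, ignore_index=True)
```

Because `axis=1` aligns on the index, frames whose indexes differ will produce NaN in the non-matching rows.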

5.column replace

df = df.replace('None', np.nan)

6.column change

df['column_name'] = df['column_name'].apply(lambda x: trans_float(x, col))
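The post does not show `trans_float`, so the version below is only a guess at what such a helper might look like: it converts a value to float and falls back to NaN when that fails. The `col` argument is accepted to mirror the call above, but this sketch does not use it.

```python
import numpy as np
import pandas as pd


def trans_float(value, col=None):
    """Hypothetical helper: return value as a float, or NaN if it cannot be parsed."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return np.nan


df = pd.DataFrame({"column_name": ["1.5", "abc", None, "2"]})
col = "column_name"

# apply() calls the function once per cell of the column.
df[col] = df[col].apply(lambda x: trans_float(x, col))
```

For plain numeric parsing, `pd.to_numeric(df[col], errors="coerce")` is a built-in alternative that avoids the per-row Python call.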

7.fill null

df['column_name'] = df['column_name'].fillna(0)
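Zero is not always the right filler. As a sketch on made-up numbers, here is the same call next to a variant that fills with the column mean instead:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"column_name": [1.0, np.nan, 3.0]})

# Fill missing values with a constant.
filled_zero = df["column_name"].fillna(0)

# Fill missing values with the mean of the non-missing values.
filled_mean = df["column_name"].fillna(df["column_name"].mean())
```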

8.remove duplicate columns

df = df.loc[:, ~df.columns.duplicated()]
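Duplicate column names often appear after a side-by-side concat. The sketch below builds such a frame from two made-up parts and shows what the mask looks like before it is applied:

```python
import pandas as pd

left = pd.DataFrame({"id": [1, 2], "x": [10, 20]})
right = pd.DataFrame({"id": [1, 2], "y": [30, 40]})

# Both parts carry an "id" column, so the result has columns id, x, id, y.
df = pd.concat([left, right], axis=1)

# duplicated() marks every repeat of a name after its first occurrence.
mask = df.columns.duplicated()

# Keep only the columns that are not marked as repeats.
df = df.loc[:, ~mask]
```

Note that this compares column names only, not the values inside the columns.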

9.drop rows that contain an invalid value

df = df.drop(df[df['column_name'] == -1.0].index)
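The same rows can also be removed by keeping everything that does not match. A small sketch on made-up data, assuming -1.0 is the marker for an invalid value:

```python
import pandas as pd

df = pd.DataFrame({"column_name": [0.5, -1.0, 2.0, -1.0],
                   "other": ["a", "b", "c", "d"]})

# Variant 1: look up the index labels of the bad rows and drop them.
dropped = df.drop(df[df["column_name"] == -1.0].index)

# Variant 2: keep only the rows that pass the check.
filtered = df[df["column_name"] != -1.0]

# Optional: renumber the rows so the index has no gaps.
filtered_reset = filtered.reset_index(drop=True)
```

The two variants agree when index labels are unique. With repeated index labels, `drop` removes every row sharing a label, so the boolean filter is the safer choice there.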

10.read in chunks when the file is too large

for chunk in pd.read_csv(file_name, chunksize=chunk_size):
    do_something(chunk)
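Each chunk is an ordinary data frame, so a running result can be built up across chunks. Here is a minimal sketch that sums a column chunk by chunk; the in-memory data and the column name `value` are made up.

```python
import io

import pandas as pd

# Hypothetical data: one column "value" holding the numbers 0 to 9.
raw = "value\n" + "\n".join(str(i) for i in range(10)) + "\n"

chunk_size = 4
total = 0
row_counts = []

# With chunksize set, read_csv yields data frames of at most chunk_size rows.
for chunk in pd.read_csv(io.StringIO(raw), chunksize=chunk_size):
    total += chunk["value"].sum()
    row_counts.append(len(chunk))
```

Only one chunk is held in memory at a time, and the last chunk may be shorter than `chunk_size`.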

Goodbye!

Author

This is
April Cai.