Top 20 Pandas Data Cleaning Code Snippets

Stephen

Intro

Data cleaning is necessary for accurate insights and machine learning models that move your business forward.

Sadly, data cleaning takes a long time and is only slowly getting easier.

To make it easier, we collected the most common data cleaning tasks. Each one comes with an example, the library it relies on, and, most importantly, a code snippet wrapped in an easy-to-use function with a pandas implementation.

Just find what you want to clean, copy and paste the snippet, install the library, and you are done.

Phone — International Validation & Format

Parses and validates phone numbers from various countries

Phonenumbers Library

Example

Input: +442083661177

Output: +44 20 8366 1177

Phone — US Domestic Validation & Format

Parses and validates phone numbers from the United States

Phonenumbers Library

Example

Input: +12001230101

Output: 2001230101

Address — Standardize

Parses and orders the parts of the address into a consistent string

usaddress Library

Example

Input: 3743 Carson Shores New Glenn, NC 21452

Output: 3743 Carson Shores New Glenn NC 21452

Address — Address to Street Number and Name

Parses the address and returns the street number and name

usaddress Library

Example

Input: 3743 Carson Shores New Glenn, NC 21452

Output: 3743 Carson Shores

Address — Address to Street Number

Parses the address and returns the street number

usaddress Library

Example

Input: 3743 Carson Shores New Glenn, NC 21452

Output: 3743

Address — Address to City

Parses the address and returns the city

usaddress Library

Example

Input: 3743 Carson Shores New Glenn, NC 21452

Output: New Glenn

Address — Address to State

Parses the address and returns the state

usaddress Library

Example

Input: 3743 Carson Shores New Glenn, North Carolina 21452

Output: North Carolina

Address — Address to State Code

Parses the address and returns the state code

usaddress Library

Example

Input: 3743 Carson Shores New Glenn, North Carolina 21452

Output: NC

Address — Address to Zipcode

Parses the address and returns the zipcode

usaddress Library

Example

Input: 3743 Carson Shores New Glenn, North Carolina 21452

Output: 21452

Currency to float

Turns currency into a float value

price-parser Library

Example

Input: Price: $119.00

Output: 119.00

Date Format to American Date Format (MM/DD/YYYY)

Formats a date into the American format (MM/DD/YYYY)

dateparser Library

Example

Input: 9-20-2021

Output: 09/20/2021

Date Format to Year-First Date Format (YYYY/MM/DD)

Formats a date into the year-first format (YYYY/MM/DD)

dateparser Library

Example

Input: 9-20-2021

Output: 2021/09/20

Date Format to Quarter

Formats a date to a quarter

dateparser Library

Example

Input: 1987-08-10

Output: 3

Date Format to Timestamp

Formats a date to a unix timestamp

dateparser Library

Example

Input: 9-20-2021

Output: 1632096000 (midnight UTC)

Uppercase String

Turns the string into uppercase

Example

Input: 3 brown foxes jump after 1 rabbit

Output: 3 BROWN FOXES JUMP AFTER 1 RABBIT
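
This one needs no extra library, since pandas has a vectorized string method for it. A short sketch (the `text` column name is an assumption):

```python
import pandas as pd

# Hypothetical DataFrame; the column name is an assumption
df = pd.DataFrame({"text": ["3 brown foxes jump after 1 rabbit"]})

# .str.upper() works on the whole column at once and leaves digits alone
df["text_upper"] = df["text"].str.upper()
```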

Lowercase String

Turns the string into lowercase

Example

Input: 3 BROWN FOXES JUMP AFTER 1 RABBIT

Output: 3 brown foxes jump after 1 rabbit
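
The mirror image of the uppercase case, sketched with the built-in pandas string method (the `text` column name is an assumption):

```python
import pandas as pd

# Hypothetical DataFrame; the column name is an assumption
df = pd.DataFrame({"text": ["3 BROWN FOXES JUMP AFTER 1 RABBIT"]})

# .str.lower() works on the whole column at once and leaves digits alone
df["text_lower"] = df["text"].str.lower()
```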

Strip Numbers

Strips numbers from a string

Example

Input: 3 brown foxes jump after 1 rabbit

Output: brown foxes jump after rabbit
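
A sketch using a regular expression through pandas. The function name `strip_numbers` is made up. Removing digits leaves stray spaces behind, so the sketch also collapses repeated whitespace and trims the ends, which is an assumption based on the example output.

```python
import pandas as pd


def strip_numbers(series):
    """Remove digits from every string in the Series and tidy the spacing."""
    return (
        series.str.replace(r"\d+", "", regex=True)   # drop runs of digits
        .str.replace(r"\s+", " ", regex=True)        # collapse leftover gaps
        .str.strip()                                 # trim leading/trailing space
    )


# Hypothetical DataFrame; the column name is an assumption
df = pd.DataFrame({"text": ["3 brown foxes jump after 1 rabbit"]})
df["text_clean"] = strip_numbers(df["text"])
```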

Strip Alpha Characters

Strips alpha (A-Z,a-z) characters

Example

Input: 3 brown foxes jump after 1 rabbit

Output: 3 1
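
The same idea with the pattern flipped to letters. `strip_alpha` is a hypothetical name, and only ASCII letters (A-Z, a-z) are removed, so accented characters would survive. Whitespace is collapsed afterwards to match the example output.

```python
import pandas as pd


def strip_alpha(series):
    """Remove ASCII letters from every string in the Series and tidy spacing."""
    return (
        series.str.replace(r"[A-Za-z]+", "", regex=True)  # drop runs of letters
        .str.replace(r"\s+", " ", regex=True)             # collapse leftover gaps
        .str.strip()                                      # trim the ends
    )


# Hypothetical DataFrame; the column name is an assumption
df = pd.DataFrame({"text": ["3 brown foxes jump after 1 rabbit"]})
df["text_clean"] = strip_alpha(df["text"])
```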

Strip Special Characters

Strips special characters

Example

Input: 3 brown foxes, jump after 1 rabbit!

Output: 3 brown foxes jump after 1 rabbit
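
What counts as a "special character" depends on your data. The sketch below assumes it means anything that is not an ASCII letter, a digit, or whitespace; `strip_special` is a made-up name.

```python
import pandas as pd


def strip_special(series):
    """Keep only ASCII letters, digits, and whitespace."""
    return (
        series.str.replace(r"[^A-Za-z0-9\s]+", "", regex=True)
        .str.replace(r"\s+", " ", regex=True)  # collapse any leftover gaps
        .str.strip()
    )


# Hypothetical DataFrame; the column name is an assumption
df = pd.DataFrame({"text": ["3 brown foxes, jump after 1 rabbit!"]})
df["text_clean"] = strip_special(df["text"])
```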

Full Name to First Name

Parses and returns the first name from the full name

nameparser Library

Example

Input: Stephen Weber

Output: Stephen

Full Name to Last Name

Parses and returns the last name from the full name

nameparser Library

Example

Input: Stephen Weber

Output: Weber

Email To Domain Name

Parses an email string and returns the domain name if the domain is a valid structured domain

Example

Input: info@bitrook.com

Output: bitrook.com
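
No extra library is named for this one, so here is a sketch with the standard `re` module. The function name `email_to_domain` and the pattern in `DOMAIN_RE` are assumptions: the pattern only checks that the domain is shaped like dot-separated labels ending in a top-level domain, not that the domain actually exists.

```python
import re

import pandas as pd

# Dot-separated labels (letters, digits, inner hyphens) plus a TLD of letters
DOMAIN_RE = re.compile(
    r"^(?:[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?\.)+[A-Za-z]{2,63}$"
)


def email_to_domain(value):
    """Return the lowercased domain of an email address, or None."""
    # Split on the last "@" so everything after it is the domain
    local, sep, domain = str(value).strip().rpartition("@")
    if not sep or not local:
        return None
    domain = domain.lower()
    return domain if DOMAIN_RE.match(domain) else None


# Hypothetical DataFrame; the column name is an assumption
df = pd.DataFrame({"email": ["info@bitrook.com", "info@bitrook"]})
df["domain"] = df["email"].apply(email_to_domain)
```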

Not working, or need something easier? Check out BitRook and have a desktop app do it for you and even write the Python code too.