Basics of Pandas — Part 1

Aarish Alam
Published in Analytics Vidhya
Nov 18, 2020 · 3 min read

If you work with tabular data, such as data stored in spreadsheets or databases, pandas is a good fit: it helps you explore, clean, and process that data. In pandas, a data table is called a DataFrame. This post poses some frequently asked questions about pandas and answers each one in turn.

To demonstrate pandas, I’ll mainly use the UFO dataset, which is stored as an Excel file named ufo.xlsx.

How do I read a tabular data file into pandas?

The code below reads a tab-separated file from a URL and a sheet of an Excel file into DataFrames.

import pandas as pd
# read a dataset of Chipotle orders directly from a URL and store the results in a DataFrame
orders = pd.read_table('http://bit.ly/chiporders', sep='\t')
# read the 'ufo' sheet of the Excel file
ufo = pd.read_excel('ufo.xlsx', sheet_name='ufo')
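
If you want to try the same idea without downloading anything, here is a minimal sketch that parses tab-separated text held in memory. The sample rows and the name orders_demo are made up for illustration; they are not the real Chipotle data. Note that read_table already defaults to a tab separator, so read_csv with sep='\t' reads the same format.

```python
import io

import pandas as pd

# Hypothetical tab-separated text standing in for a real file or URL
raw = "order_id\titem_name\tquantity\n1\tChips\t2\n2\tBurrito\t1\n"

# read_csv accepts any file-like object; sep='\t' tells it the fields are tab-separated
orders_demo = pd.read_csv(io.StringIO(raw), sep="\t")

print(orders_demo)
print(orders_demo.shape)  # (number of rows, number of columns)
```

The first line of the text is used as the header row, so the DataFrame gets its column names from it.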

The code below shows how to view the first five rows of one DataFrame and the last five rows of another.

# examine the first 5 rows of orders
print(orders.head())
# examine the last 5 rows of ufo
print(ufo.tail())
Chiporders Dataset
UFO dataset
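
Both head and tail take an optional number of rows, which is handy when five is too many or too few. A small sketch on a made-up DataFrame (the city and state values are illustrative only, not rows from the UFO file):

```python
import pandas as pd

# Hypothetical data used only to illustrate head() and tail()
demo = pd.DataFrame({
    "City": ["Ithaca", "Willingboro", "Holyoke", "Abilene",
             "New York", "Valley City", "Crater Lake", "Alma"],
    "State": ["NY", "NJ", "CO", "KS", "NY", "ND", "CA", "MI"],
})

first_three = demo.head(3)  # first 3 rows instead of the default 5
last_two = demo.tail(2)     # last 2 rows

print(first_three)
print(last_two)
```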

How do I select a pandas Series from a DataFrame?

# select the 'City' Series using bracket notation
ufo['City']
# or equivalently, use dot notation
ufo.City

Limitations of dot notation

  • Dot notation doesn’t work if there are spaces in the Series name
  • Dot notation doesn’t work if the Series has the same name as a DataFrame method or attribute (like ‘head’ or ‘shape’)
  • Dot notation can’t be used to define a new Series; use bracket notation to create one
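
The three limitations above can be sketched on a small made-up DataFrame. The column names here, including a column deliberately called 'shape', and the new 'Location' column are assumptions chosen for the demonstration.

```python
import pandas as pd

# Hypothetical data; 'shape' is named to collide with the DataFrame.shape attribute
demo = pd.DataFrame({
    "City": ["Ithaca", "Holyoke"],
    "Colors Reported": ["RED", None],
    "shape": ["TRIANGLE", "OVAL"],
})

# 1. A name containing a space is not a valid attribute name, so use brackets
colors = demo["Colors Reported"]

# 2. demo.shape is the built-in attribute (a tuple), not the 'shape' column
dims = demo.shape
shape_column = demo["shape"]

# 3. A new Series has to be created with bracket notation
demo["Location"] = demo["City"] + ", " + demo["shape"]

print(dims)
print(shape_column)
print(demo.columns)
```

Bracket notation works in every one of these cases, which is why many people use it by default.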

Why do some pandas commands end with parentheses (and others don’t)?

The simple answer is that methods end with parentheses, while attributes don’t. A method is an action that you call on the object; an attribute is a piece of information stored about it.

Example of methods

ufo.head()
ufo.info()
ufo.City.nunique()  # number of unique cities; outputs 6476
# use the optional 'include' parameter of the describe method to summarize only 'object' columns
ufo.describe(include='object')

Example of attributes

ufo.columns
ufo.shape  # outputs (18241, 6): (number of rows, number of columns)
ufo.dtypes
All the columns from UFO dataset
Data Types of different columns
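
To make the difference concrete, here is a small sketch on a made-up DataFrame (the values are illustrative). It also shows what happens if you forget the parentheses on a method: you get the method object itself rather than its result.

```python
import pandas as pd

# Hypothetical data used only to contrast attributes and methods
demo = pd.DataFrame({
    "City": ["Ithaca", "Holyoke", "Ithaca"],
    "State": ["NY", "CO", "NY"],
})

# Attribute: stored information, accessed without parentheses
dims = demo.shape

# Method: an action, so it must be called with parentheses
n_cities = demo["City"].nunique()

# Without parentheses you get the method object, not the rows
not_called = demo.head

print(dims)
print(n_cities)
print(callable(not_called), callable(dims))
```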

How do I rename columns in a pandas DataFrame?

Renaming columns with the rename method

ufo.rename(columns={'Colors Reported':'Colors_Reported', 'Shape Reported':'Shape_Reported'}, inplace=True)
ufo.columns
Renamed Columns

Renaming columns by overwriting the columns attribute

# the list must contain exactly one name per existing column, in order
ufo_cols = ['city', 'colors reported', 'shape reported', 'state', 'time']
ufo.columns = ufo_cols
ufo.columns
Rolled back the changes applied by method ‘rename’

Replacing column spaces by underscores

# replace all spaces with underscores in the column names by using the 'str.replace' method
ufo.columns = ufo.columns.str.replace(' ', '_')
ufo.columns
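
The renaming approaches above can be compared on an empty made-up DataFrame that only has column names. This sketch leaves out inplace=True, so rename returns a new DataFrame and the original stays untouched; the names renamed and cleaned are hypothetical.

```python
import pandas as pd

# Hypothetical DataFrame with no rows, only column names
demo = pd.DataFrame(columns=["City", "Colors Reported", "Shape Reported"])

# Without inplace=True, rename returns a new DataFrame
renamed = demo.rename(columns={"Colors Reported": "Colors_Reported"})

# String methods on the column index can be chained:
# lowercase every name, then replace spaces with underscores
cleaned = demo.copy()
cleaned.columns = cleaned.columns.str.lower().str.replace(" ", "_")

print(list(demo.columns))
print(list(renamed.columns))
print(list(cleaned.columns))
```

Names without spaces can then be used with dot notation.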

How do I remove columns from a pandas DataFrame?

Removing a single column

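# axis=1 refers to columns
# the name must match the current column name exactly; this assumes the original names, before any renaming
ufo.drop('Colors Reported', axis=1, inplace=True)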

Removing multiple columns at once

ufo.drop(['City', 'State'], axis=1, inplace=True)
ufo.head()
After removing the columns

To remove rows, pass their index labels and use axis=0:

ufo.drop([2,9], axis=0, inplace=True)
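
As an alternative to the axis argument, drop also accepts the columns and index keywords, which some readers find easier to remember. A minimal sketch on made-up data, again without inplace=True so the original DataFrame is left unchanged:

```python
import pandas as pd

# Hypothetical data used only to illustrate drop()
demo = pd.DataFrame({
    "City": ["Ithaca", "Willingboro", "Holyoke", "Abilene"],
    "State": ["NY", "NJ", "CO", "KS"],
    "Time": ["6/1/1930", "6/30/1930", "2/15/1931", "6/1/1931"],
})

# columns=... is equivalent to passing the labels with axis=1
no_state = demo.drop(columns=["State"])

# index=... is equivalent to passing the labels with axis=0
fewer_rows = demo.drop(index=[0, 2])

print(no_state.columns)
print(fewer_rows)
print(demo.shape)  # the original still has all rows and columns
```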

This marks the end of Part 1 of this introductory post. In the next part, I will cover a few more basic questions about pandas.

Thanks 😉
