pandas
Uses of Pandas:
Pandas is a Python library for data analysis. It works with many kinds of tabular data, and one of its most common uses is reading and analyzing CSV (comma-separated values) files. Pandas is one of the most popular Python libraries for analyzing CSV files, and its many useful methods save you lots of time. Pandas is built on another very popular Python library that you might have heard of, NumPy, which provides support for multi-dimensional arrays. Think of a CSV file as a spreadsheet without separate boxes for each cell: commas divide the columns and each new line starts a new row. There's more to Pandas than that, and you'll learn about it next.
CSV files have a specific format. They don't look like normal spreadsheets with boxes; instead, they use commas and new lines. Take a look at the image below to see how CSV files are formatted.
[Image: CSV File Format]
The image above is a CSV file with three columns and three rows. Here are a few things to remember when adding to a CSV file:
- CSV file names end in .csv, for example user_info.csv.
- By default, Pandas treats the 1st line in a CSV file as the headers of the document.
- The image above has three headers: name, email, and age.
- Name is the first header, so the first value in each following line goes under the category: name.
- You can gather every name by selecting the column with its header, "name".
- The columns in a CSV file are separated by a comma. Leave out spaces around the comma, because by default a space becomes part of the value.
- A cell can hold an entire sentence if desired, but write "Bob,My name is Bob." rather than "Bob, My name is Bob.", otherwise the second value starts with a space.
- The rows in a CSV file are separated by a new line.
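To make the rules above concrete, here is a small sketch that reads CSV text and pulls out one column by its header. The rows and email addresses are made up for illustration, io.StringIO stands in for a real file so the snippet runs on its own, and I'm using the common pd alias instead of the star import.

```python
import io

import pandas as pd

# Hypothetical contents of user_info.csv; the rows are made up for illustration.
csv_text = """name,email,age
Bob,bob@example.com,37
Billy,billy@example.com,59
Linda,linda@example.com,18
"""

# read_csv accepts a file path or a file-like object; StringIO stands in for a file.
df = pd.read_csv(io.StringIO(csv_text))

# The first line of the text became the column headers.
headers = list(df.columns)

# Selecting a column by its header gathers every value under it.
names = df["name"].tolist()

print(headers)
print(names)
```

With a real file you would pass the file path to read_csv instead of the StringIO object.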
As said previously, a CSV file is basically a spreadsheet in a plain-text form that works well with code. If the CSV file content in the last image was transferred into Google Sheets, this is what it would look like.
You can even use APIs (Application Programming Interfaces) to turn a CSV file into a Google Sheet and vice versa with just Python code. We won't go over that here, but if you feel ready for APIs or already have knowledge of them, resources can be found at the bottom of this post. Before you go on to those resources, check out the next section for how to install Pandas, import it into your Python file, and try some examples with methods to get you started.
Installation and Initialization:
Just like any other library that doesn't come with Python, Pandas needs to be installed using pip. Recent Python installers usually include pip, so before you install it, check whether you already have it by running the command pip --version. If the command isn't found, you can install pip by following https://pip.pypa.io/en/stable/installation/. Once pip is good to go, run the following command in your terminal to install Pandas.
pip install pandas
Just like that, Pandas will be downloaded and ready to use in your Python project. A little initialization is required to start reading your CSV files through a .py file. You need to import Pandas and then read the file. Check out how to import Pandas and read a file with the code below.
# Imports
from pandas import *
# Read the CSV file into a DataFrame
file = read_csv("CSV FILE PATH HERE")
# DO WHAT YOU WANT WITH THE DATA HERE
# No close() call is needed: read_csv has already closed the file
read_csv opens the file, loads its contents into a DataFrame, and closes the file again before it returns. That means there is nothing to close afterwards, and the result can't be used in a with block, because the variable holds the data in memory rather than an open file. Just remember, the folder your terminal is running from might not be the folder that contains the CSV file, so use the correct file path to navigate your way to the file.
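Here is a minimal sketch of reading a file by its path. In the pandas versions I'm aware of, read_csv returns a DataFrame that lives in memory, so the data stays usable after the file itself is gone. The folder, file name, and rows are hypothetical and created on the fly so the snippet runs anywhere.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Create a throwaway folder and CSV file so the example runs anywhere.
# The file name and rows are hypothetical.
with tempfile.TemporaryDirectory() as folder:
    csv_path = Path(folder) / "user_info.csv"
    csv_path.write_text(
        "name,email,age\nBob,bob@example.com,37\nLinda,linda@example.com,18\n"
    )

    # read_csv opens the file, loads it into a DataFrame, and closes it again.
    users = pd.read_csv(csv_path)

# The temporary file is deleted by now, but the DataFrame still holds the data.
row_count = len(users)
ages = users["age"].tolist()
print(row_count, ages)
```

Building the path with pathlib is one way to avoid the wrong-folder problem, since the path no longer depends on where the terminal was started.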
Something to take note of from the code snippet is the line I used to import pandas. That form is completely optional. Let's say that you have a library called this_is_the_greatest_person_of_all_time instead of a short name like pandas. You wouldn't want to type this_is_the_greatest_person_of_all_time.say_my_name("Nikhil") every time. The purpose of the * is to import every public name from the library directly into your file. If you import that long-named library with "from LIBRARY NAME HERE import *", you can just type say_my_name("Nikhil") instead of this_is_the_greatest_person_of_all_time.say_my_name("Nikhil"). For a side-by-side comparison, check out the snippet below. It won't work if you copy it because you need the imported file.
# VARIATION ONE
# Imports
from this_is_a_super_long_file_name import *
# Methods from the imported file
say_hi()
say_good_morning()
say_goodbye()
# VARIATION TWO
# Imports
import this_is_a_super_long_file_name
# Methods from the imported file
this_is_a_super_long_file_name.say_hi()
this_is_a_super_long_file_name.say_good_morning()
this_is_a_super_long_file_name.say_goodbye()
Although you might not have a problem with this in a small file, let's say you're making a video game. You might need to import something like a toolbox file with useful methods for every aspect of your game. What if that name is extremely long and you need to type it out every single time? If you use the *, you won't have to. Keep in mind that a star import can overwrite names you've already defined, which is why many Python projects use a short alias such as import pandas as pd instead. Think about trying the * just to get used to the command; you never know when you're going to need it. Next up: examples.
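For comparison, here is a quick sketch of a third option that sits between the two variations: importing the library under a short alias. The alias pd is a widely used convention rather than a requirement, and the data is made up.

```python
import pandas as pd

# Hypothetical data, only here to have something to call methods on.
df = pd.DataFrame({"name": ["Bob", "Linda"], "age": [37, 18]})

# Every pandas name is reached through the short prefix "pd",
# so it stays clear that DataFrame came from pandas.
oldest = df["age"].max()
print(oldest)
```

You type two extra letters per call, but nothing from the library can collide with your own function or variable names.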
Examples:
Pandas has a variety of useful methods for adding data to a CSV file and for collecting and analyzing the data in one. We will go over some of the methods that are essential for beginners using Pandas for the first time. Some of these are mandatory for your code to work, so please read them.
# Imports
from pandas import *
# Read the CSV file into a DataFrame
file = read_csv("CSV FILE PATH HERE")
# Print the first and last five rows
print(file.head())
print(file.tail())
The methods head() and tail() return the first and last parts of the data, and print() shows them in your terminal. The way you use these is the variable name, then a period, then the method name, no spaces. The default value for each method is five, meaning you will see the headers, then 5 rows. You can change this by putting another number as an argument in the methods, such as file.head(10). Even though you imported Pandas using *, you still need the variable where the CSV file was read, then a period, then the method, because head() and tail() belong to the DataFrame stored in that variable and not to the Pandas library itself. There is no close() call because read_csv doesn't leave the file open.
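Here is a small sketch of head() and tail() with and without an argument. The eight rows are made up so that the default of five is visible, and the DataFrame is built directly instead of being read from a file.

```python
import pandas as pd

# Hypothetical data with eight rows.
df = pd.DataFrame({
    "id": range(1, 9),
    "score": [50, 61, 72, 83, 94, 45, 56, 67],
})

first_rows = df.head()   # no argument: the first 5 rows
last_two = df.tail(2)    # argument given: the last 2 rows

print(first_rows)
print(last_two)
```

Both methods return a new, smaller DataFrame, so you can store the result in a variable and keep working with it.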
# Imports
from pandas import *
# Create the data to add to the CSV file
data = {
    "Name": ["Bob", "Billy", "Linda"],
    "Email": ["bob@gmail.com", "billy.senior@live.com", "linda@gmail.com"],
    "Age": [37, 59, 18]
}
# Make a DataFrame of the data above so it can integrate with the CSV file
df = DataFrame(data)
# Append the DataFrame to the end of an existing CSV file
df.to_csv("user_info.csv", mode="a", index=False, header=False)
The code snippet above has multiple methods in it. First of all, we don't need to read a CSV file here because the code appends to an existing CSV file. Reading a CSV file is for collecting data or selecting values through headers or row numbers. The data variable is a dictionary where the keys are the headers that are already in the CSV file. Each value is a list in which every item goes in its own row, under the header given as the key. The df variable converts the data into a DataFrame, a data structure that organizes data into a set of rows and columns. Something to take note of is DataFrame itself. It comes from pandas, and the syntax I used to type it wouldn't work without the line from pandas import *. The last line writes df to a CSV file with the method to_csv(existing_file, mode, index, header). Because the headers are not written again, the columns in the DataFrame need to be in the same order as the columns already in the file.
Parameters:
- The first argument, called existing_file here, is the path of the file you want to append the data to.
- The argument mode lets Python know how to open the file ("a" is for append, so the existing content is kept).
- The argument index takes either True or False, written without quotes. False leaves out the row indexes, and True writes them as an extra first column.
- The argument header, when set to False, lets Python know that the headers already exist and not to write them again.
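The parameters above can be tried out safely with a sketch like the one below, which appends to a throwaway file and reads the result back. The file location, rows, and email addresses are hypothetical. Note that index and header are given as the booleans False; a non-empty string such as "False" counts as true in Python, so it would most likely not switch those options off.

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as folder:
    csv_path = Path(folder) / "user_info.csv"

    # Start with an existing file that already has headers and one row.
    csv_path.write_text("name,email,age\nBob,bob@example.com,37\n")

    # Hypothetical rows to add, with columns in the same order as the file.
    new_rows = pd.DataFrame({
        "name": ["Billy", "Linda"],
        "email": ["billy@example.com", "linda@example.com"],
        "age": [59, 18],
    })

    # mode="a" appends; index=False and header=False skip row numbers and headers.
    new_rows.to_csv(csv_path, mode="a", index=False, header=False)

    # Read the file back to check the result.
    combined = pd.read_csv(csv_path)

print(combined)
```

The existing file should end with a new line before you append, otherwise the first new row is glued onto the last old one.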
Use the knowledge you've learned and create amazing projects with Python. If you're a pro at Python web development using libraries like Flask or Django, you can create websites with information collected from a CSV file. You could even add info to CSV files when a user clicks a submit button for a form or something like that on your website. Pandas can be used in all kinds of Python projects; you just have to create them. Check out my other posts and leave a comment for suggestions, errors, updates, etc.
Sheety API for Python: https://sheety.co/
Methods in pandas: 13 Important Methods in Pandas