Hello all,
This is going to be a very basic and simple way of migrating excel data to a mysql table with the help of Pandas . Being a popular data analysis and manipulation python library, it provides a lot of features to work with data. Let us see one of those here.
Install packages
As we are going to work with an excel using pandas and mysql database, first we need to install the required packages.
pip install pandas
openpyxl
is required to work with .xlsx files.
pip install openpyxl
sqlalchemy
and pymysql
is required for connecting with mysql.
pip install sqlalchemy
pip install pymysql
Be ready with a sample .xlsx file
I have created a sample employee file 'employee.xlsx' having employee's first name, last name, age and profession.
Make a connection to your mysql
You should have your MySQL database ready.
create_engine()
lets you create an engine instance, basically a starting point of sqlalchemy application. You have to pass here a URL string that indicates database dialect and connection parameters. Replace username, password, host, port, database with your corresponding values.
from sqlalchemy import create_engine
engine = create_engine("mysql+pymysql://" + "username" + ":" + "password" + "@" + "host" + ":" + "port" + "/" + "database" + "?" + "charset=utf8mb4")
After creating the instance, now you should establish a connection to the database using connect()
.
conn = engine.connect()
Now that we have our connection established, it's time to work a little with Pandas.
Reading an excel file using Pandas
First create an excel file obj using pandas ExcelFile()
and pass your filename inside.
import pandas as pd
excel_file = pd.ExcelFile('employee.xlsx')
After creating the excel file obj, now it's time to parse the excel sheet using parse()
function. Pass the sheet name that you want to process.
excel_dataframe = excel_file.parse(sheet_name="employee_data")
And yes! The data has been copied to a dataframe.
Let's move on to the last step where we load this into our mysql table.
Load the data into mysql
To load our excel_dataframe to mysql, we use to_sql()
with table name and sql connection arguments. Here I have given table name as employee_data
. The parameter if_exists
checks if the table already exists in the database. If yes, in this case it will append the data.
excel_dataframe.to_sql("employee_data", conn, if_exists="append")
Don't worry if you don't have the table already. A table gets created with the name that you pass into to_sql()
.
Now go and query for the table results. You will see the data inserted.
Thank you all for reading this. Hope you liked it. Please do share your valuable feedbacks.
References
- pandas.pydata.org/pandas-docs/stable/refere..
- docs.sqlalchemy.org/en/14/core/engines.html
- pandas.pydata.org/pandas-docs/stable/refere..
I will be back with another article. Till then, take care all !!