[Python] Pandas.DataFrame, PostgreSQL, and Autoincrementing Columns
Pandas.DataFrame has a to_sql() convenience method for pumping dataframes to SQL tables. However, there is no parameterization of to_sql() that will create an autoincrementing column index. This post details a simple workaround.
Create an Autoincrementing Column
The key observation is that sqlalchemy supports the creation of a SERIAL type and that this functionality is exposed through pandas, just not via the to_sql() method. The relevant pandas component is the SQLTable class, which can be used to create a table in the database with a specified index, index_label, and primary key state.
Thus the following Python code,
import pandas.io.sql
import sqlalchemy
# [code to populate my_df, a pandas.DataFrame]
# Must drop the index prior to db table creation
idx = my_df.get_index()
my_df.reset_index(drop=True, inplace=True)
# Set up the database connection
sql_engine = sqlalchemy.create_engine("postgresql+psycopg2:///my_database")
sql_db = pandas.io.sql.SQLDatabase(sql_engine)
# Create the db table if necessary
if not sql_db.has_table("my_table"):
args = ["my_table", sql_db]
kwargs = {
"frame" : my_df,
"index" : True,
"index_label" : "id",
"keys" : "id"
}
sql_table = pandas.io.sql.SQLTable(*args, **kwargs)
sql_table.create()
# Insert the DataFrame rows into the table via to_sql() method (omit the index)
my_df.to_sql("my_table", dbcon, if_exists='append', index=False)
generates the following SQL
CREATE TABLE my_table (
id BIGSERIAL NOT NULL,
geoid BIGINT,
west FLOAT(53),
east FLOAT(53),
south FLOAT(53),
north FLOAT(53),
geoname TEXT,
CONSTRAINT my_table_pk PRIMARY KEY (id)
)
and then populates it with the rows of my_df plus values from the autoincrementing index.