[Python] Pandas.DataFrame, PostgreSQL, and Autoincrementing Columns

Pandas.DataFrame has a to_sql() convenience method for pumping dataframes to SQL tables. However, there is no parameterization of to_sql() that will create an autoincrementing column index. This post details a simple workaround.

Create an Autoincrementing Column

The key observation is that sqlalchemy supports the creation of a SERIAL type and that this functionality is exposed through pandas, just not via the to_sql() method. The relevant pandas component is the SQLTable class, which can be used to create a table in the database with a specified index, index_label, and primary key state.

Thus the following Python code,

import pandas.io.sql
import sqlalchemy

# [code to populate my_df, a pandas.DataFrame]

# Must drop the index prior to db table creation
idx = my_df.get_index()
my_df.reset_index(drop=True, inplace=True)

# Set up the database connection
sql_engine = sqlalchemy.create_engine("postgresql+psycopg2:///my_database")
sql_db = pandas.io.sql.SQLDatabase(sql_engine)

# Create the db table if necessary
if not sql_db.has_table("my_table"):
    args = ["my_table", sql_db]
    kwargs = {
        "frame" : my_df,
        "index" : True, 
        "index_label" : "id",
        "keys" : "id"
    }
    sql_table = pandas.io.sql.SQLTable(*args, **kwargs)
    sql_table.create()

# Insert the DataFrame rows into the table via to_sql() method (omit the index)
my_df.to_sql("my_table", dbcon, if_exists='append', index=False)

generates the following SQL

CREATE TABLE my_table (
	id BIGSERIAL NOT NULL, 
	geoid BIGINT, 
	west FLOAT(53), 
	east FLOAT(53), 
	south FLOAT(53), 
	north FLOAT(53), 
	geoname TEXT, 
	CONSTRAINT my_table_pk PRIMARY KEY (id)
)

and then populates it with the rows of my_df plus values from the autoincrementing index.