API¶

Database connections¶

class sdssdb.connection.DatabaseConnection(dbname=None, profile=None, autoconnect=True, dbversion=None)[source]¶

Bases: object

A PostgreSQL database connection with profile and autoconnect features.

Provides a base class for PostgreSQL connections for either peewee or SQLAlchemy. The parameters for the connection can be passed directly (see connect_from_parameters) or, more conveniently, a profile can be used. By default dbname is left undefined and needs to be passed when initiating the connection. This is useful for databases such as apodb/lcodb for which the model classes are identical but the database name is not. For databases for which the database name is fixed (e.g., sdss5db), this class can be subclassed and dbname overridden.

Parameters

dbname (str) – The database name.
profile (str) – The configuration profile to use. The profile defines the default user, database server hostname, and port for a given location. If not provided, the profile is automatically determined based on the current domain, or defaults to local.
autoconnect (bool) – Whether to autoconnect to the database using the profile parameters. Requites dbname to be set.
dbversion (str) – A database version. If specified, appends to dbname as “dbname_dbversion” and becomes the dbname used for connection strings.

become(user)[source]¶: Change the connection to a certain user.

become_admin()[source]¶: Becomes the admin user.

become_user()[source]¶: Becomes the read-only user.

change_version(dbversion=None)[source]¶

Change database version and attempt to reconnect

Parameters: dbversion (str) – A database version

connect(dbname=None, silent_on_fail=False, **connection_params)[source]¶

Initialises the database using the profile information.

Parameters

dbname (str or None) – The database name. If None, defaults to dbname.
user (str) – Overrides the profile database user.
host (str) – Overrides the profile database host.
port (str) – Overrides the profile database port.
silent_on_fail (bool) – If True, does not show a warning if the connection fails.

Returns

connected (bool) – Returns True if the database is connected.

connect_from_parameters(dbname=None, **params)[source]¶

Initialises the database from a dictionary of parameters.

Parameters

dbname (str or None) – The database name. If None, defaults to dbname.
params (dict) – A dictionary of parameters, which should include user, host, and port.

Returns

connected (bool) – Returns True if the database is connected.

static list_profiles(profile=None)[source]¶

Returns a list of profiles.

Parameters: profile (str or None) – If None, returns a list of profile keys. If profile is not None returns the parameters for the given profile.

set_profile(profile=None, connect=True)[source]¶

Sets the profile from the configuration file.

Parameters

profile (str) – The profile to set. If None, uses the domain name to determine the profile.
connect (bool) – If True, tries to connect to the database using the new profile.

Returns

connected (bool) – Returns True if the database is connected.

connected = None¶: Reports whether the connection is active.

abstract property connection_params¶

Returns a dictionary with the connection parameters.

Returns: connection_params (dict) – A dictionary with the user, host, and part of the current connection. E.g., {'user': 'sdssdb', 'host': 'sdss4-db', 'port': 5432}

dbname = None¶: The database name.

class sdssdb.connection.PeeweeDatabaseConnection(*args, **kwargs)[source]¶

Bases: sdssdb.connection.DatabaseConnection, peewee.PostgresqlDatabase

Peewee database connection implementation.

Variables: models (list) – Models bound to this database. Only models that are bound using BaseModel are handled.

get_model(table_name, schema=None)[source]¶

Returns the model for a table.

Parameters

table_name (str) – The name of the table whose model will be returned.
schema (str) – The schema for the table. If None, the first model that matches the table name will be returned.

Returns

Model or None – The model associated with the table, or None if no model was found.

property connection_params¶: Returns a dictionary with the connection parameters.

class sdssdb.connection.SQLADatabaseConnection(*args, **kwargs)[source]¶

Bases: sdssdb.connection.DatabaseConnection

SQLAlchemy database connection implementation

add_base(base, prepare=True)[source]¶: Binds a base to this connection.

create_engine(db_connection_string=None, echo=False, pool_size=10, pool_recycle=1800, expire_on_commit=True)[source]¶

Create a new database engine

Resets and creates a new sqlalchemy database engine. Also creates and binds engine metadata and a new scoped session.

prepare_bases(base=None)[source]¶

Prepare a Model Base

Prepares a SQLalchemy Base for reflection. This binds a database engine to a specific Base which maps to a set of ModelClasses. If base is passed only that base will be prepared. Otherwise, all the bases bound to this database connection will be prepared.

reset_engine()[source]¶: Reset the engine, metadata, and session

property connection_params¶: Returns a dictionary with the connection parameters.

Peewee¶

class sdssdb.peewee.ReflectMeta[source]¶

Bases: peewee.ModelBase

A metaclass that supports model reflection on demand.

This metaclass expands PeeWee’s ModelBase to provide a hook for declaring/expanding fields and indexes using the introspection system. The feature is enabled by a new attribute in Metadata called use_reflection (which is set to False by default). When set to True the metaclass extends the model using the fields and indexes discovered using reflection. It is possible to mix explicitely defined fields with discovered ones; the latter never override the former.

Normally ReflectMeta is implemented by creating a base model that is then used to defined the table models

class ReflectBaseModel(peewee.Model, metaclass=ReflectMeta):

    class Meta:
        primary_key = False
        use_reflection = False
        database = database

class Tycho2(ReflectBaseModel):

    class Meta:
        use_reflection = True
        schema = 'catalogdb'
        table_name = 'tycho2'

Note that use_reflection is inheritable so if set to True in the base class that will affect to all subclasses, except if it’s overridden there. It’s also a good idea to set primary_key=False to prevent Peewee from creating an id column automatically.

If the database connection changes it’s possible to call reflect to rediscover the reflected fields for the new connection. This will remove all reflected fields (but not those explicitely added) and add the newly discovered ones.

If the database class is PeeweeDatabaseConnection, the database will call reflect for each model bound to the database each time it connects. This ensures that if the connection changes the reflected fields are updated. Note that this will not work if using Peewee’s PostgresqlDatabase.

By default, ReflectMeta will add all the fields from the reflected models, including foreign keys. Sometimes that is not desirable and it’s preferable to define the foreign keys explicitely. In that case it’s possible to disable the reflection of foreign keys by doing

class ReflectBaseModel(peewee.Model, metaclass=ReflectMeta):

        class Meta:
            primary_key = False
            use_reflection = False
            reflection_options = {'skip_foreign_keys': True}
            database = database

Foreign keys explicitely defined need to reference existing fields, so the referenced columns need to be added manually. In practice, this means that if you add a ForeignKeyField, the referenced field (usually the primary key) needs to be defined explicitely.

Caveats:

Many-to-many relationships need to be defined explicitely since it’s not possible to set the through model based on the reflected information.
When the primary key of a model is also a foreign key and reflection_options = {'skip_foreign_keys': True}, both the primary key and the foreign key need to be defined explicitely. Otherwise neither will be added.
Reflection will fail if a table is locked with AccessExclusiveLock. In that case reflection will be skipped and a warning issued. Note that if the table is locked with an exclusive lock you won’t be able to access the data in any case.
In this version, indexes discovered by reflection are not propagated to the model class. This should not have any impact in performance.

reflect()[source]¶: Adds fields and indexes to the model using reflection.

class sdssdb.peewee.BaseModel(*args, **kwargs)[source]¶

Bases: peewee.Model

A custom peewee Model with enhanced representation and methods.

By default it always prints pk, name, and label, if found. Models can define they own print_fields in Meta as a list of field names to be output in the representation.

DoesNotExist¶: alias of BaseModelDoesNotExist

cone_search(ra, dec, a, b=None, pa=None, ra_col='ra', dec_col='dec')[source]¶

Returns a query with the rows inside a region on the sky.

Defines a sky ellipse and returns the targets within. By default it assumes that the table contains two columns ra and dec. All units are expected to be in degrees.

Parameters

ra (float) – The R.A. of the centre of the ellipse.
dec (float) – The declination of the centre of the ellipse.
a (float) – Defines the semi-major axis of the ellipse for the cone search. If b=None, a circular search will be done with a as the radius.
b (float or None) – The semi-minor axis of the ellipse. If None, a circular cone search will be run. In that case, pa is ignored.
pa (float or None) – The parallactic angle of the ellipse.
ra_col (str) – The name of the column with the RA value.
dec_col (str) – The name of the column with the Dec value.

SQLAlchemy¶

class sdssdb.sqlalchemy.BaseModel[source]¶

Bases: object

A custom sqlalchemy declarative Base

By default it always prints pk, name, and label, if found. Models can define they own print_fields as a list of field to be output in the representation. Works with field names nested inside other models as well.

cone_search(ra, dec, a, b=None, pa=None, ra_col='ra', dec_col='dec')[source]¶

Returns a query with the rows inside a region on the sky.

Defines a sky ellipse and returns the targets within. By default it assumes that the table contains two columns ra and dec. All units are expected to be in degrees.

Parameters

ra (float) – The R.A. of the centre of the ellipse.
dec (float) – The declination of the centre of the ellipse.
a (float) – Defines the semi-major axis of the ellipse for the cone search. If b=None, a circular search will be done with a as the radius.
b (float or None) – The semi-minor axis of the ellipse. If None, a circular cone search will be run. In that case, pa is ignored.
pa (float or None) – The parallactic angle of the ellipse.
ra_col (str) – The name of the column with the RA value.
dec_col (str) – The name of the column with the Dec value.

get_id()[source]¶: get the pk

print_fields = []¶: A list of fields (as strings) to be included in the __repr__

Utils¶

sdssdb.utils.ingest.to_csv(table, path, header=True, delimiter='\t', use_multiprocessing=False, workers=4)[source]¶

Creates a PostgreSQL-valid CSV file from a table, handling arrays.

Parameters

table (astropy.table.Table) – The table to convert.
path (str) – The path to which to write the CSV file.
header (bool) – Whether to add a header with the column names.
delimiter (str) – The delimiter between columns in the CSV files.
use_multiprocessing (bool) – Whether to use multiple cores. The rows of the resulting file will not have the same ordering as the original table.
workers (int) – How many workers to use with multiprocessing.

sdssdb.utils.ingest.copy_data(data, connection, table_name, schema=None, chunk_size=10000, show_progress=False)[source]¶

Loads data into a DB table using COPY.

Parameters

data (Table) – An astropy table whose column names and types will be used to create the model.
connection (PeeweeDatabaseConnection) – The Peewee database connection to use.
table_name (str) – The name of the table.
schema (str) – The schema in which the table lives.
chunk_size (int) – How many rows to load at once.
show_progress (bool) – If True, shows a progress bar. Requires the progressbar2 module to be installed.

sdssdb.utils.ingest.drop_table(table_name, connection, cascade=False, schema=None)[source]¶

Drops a table. Does nothing if the table does not exist.

Parameters

table_name (str) – The name of the table to be dropped.
connection (PeeweeDatabaseConnection) – The Peewee database connection to use.
cascade (bool) – Whether to drop related tables using CASCADE.
schema (str) – The schema in which the table lives.

Returns

result (bool) – Returns True if the table was correctly dropped or False if the table does not exists and nothing was done.

sdssdb.utils.ingest.create_model_from_table(table_name, table, schema=None, lowercase=False, primary_key=None)[source]¶

Returns a Model from the columns in a table.

Parameters

table_name (str) – The name of the table.
table (Table) – An astropy table whose column names and types will be used to create the model.
schema (str) – The schema in which the table lives.
lowercase (bool) – If True, all column names will be converted to lower case.
primary_key (str) – The name of the column to mark as primary key.

sdssdb.utils.ingest.bulk_insert(data, connection, model, chunk_size=100000, show_progress=False)[source]¶

Loads data into a DB table using bulk insert.

Parameters

data (Table) – An astropy table with the data to insert.
connection (PeeweeDatabaseConnection) – The Peewee database connection to use.
model (~peewee:Model) – The model representing the database table into which to insert the data.
chunk_size (int) – How many rows to load at once.
show_progress (bool) – If True, shows a progress bar. Requires the progressbar2 module to be installed.

sdssdb.utils.ingest.file_to_db(input_, connection, table_name, schema=None, lowercase=False, create=False, drop=False, truncate=False, primary_key=None, load_data=True, use_copy=True, chunk_size=100000, show_progress=False)[source]¶

Loads a table from a file to a database.

Loads a file or a Table object into a database. If create=True a new table will be created, with column types matching the table ones. All columns are initially defined as NULL.

By default, the data are loaded using the COPY method to optimise performance. This can be disabled if needed.

Parameters

input_ (str or Table) – The path to a file that will be opened using Table.read or an astropy Table.
connection (PeeweeDatabaseConnection) – The Peewee database connection to use (SQLAlchemy connections are not supported).
table_name (str) – The name of the table where to load the data, or to be created.
schema (str) – The schema in which the table lives.
lowercase (bool) – If True, all column names will be converted to lower case.
create (bool) – Creates the table if it does not exist.
drop (bool) – Drops the table before recreating it. Implies create=True. Note that a CASCADE drop will be executed. Use with caution.
truncate (bool) – Truncates the table before loading the data but maintains the existing columns.
primary_key (str) – The name of the column to mark as primary key (ignored if the table is not being created).
load_data (bool) – If True, loads the data from the table; otherwise just creates the table in the database.
use_copy (bool) – When True (recommended) uses the SQL COPY command to load the data from a CSV stream.
chunk_size (int) – How many rows to load at once.
show_progress (bool) – If True, shows a progress bar. Requires the progressbar2 module to be installed.

Returns

model (~peewee:Model) – The model for the table created.

sdssdb.utils.ingest.create_adhoc_database(dbname, schema=None, profile='local')[source]¶

Creates an adhoc SQLA database and models, given an existing db

Creates an in-memory SQLA database connection given a database name to connect to, along with auto-generated models for the a given schema name. Currently limited to building models for one schema at a time. Useful for temporarily creating and trialing a database connection, and simple models, without building and committing a full fledged new database connection.

Parameters

dbname (str) – The name of the database to create a connection for
schema (str) – The name of the schema to create mappings for
profile (str) – The database profile to connect with

Returns

A temporary database connection and module of model classes

Example

>>> from sdssdb.utils.database import create_adhoc_database
>>> tempdb, models = create_adhoc_database('datamodel', schema='filespec')
>>> tempdb
>>> <DatamodelDatabaseConnection (dbname='datamodel', profile='local', connected=True)>
>>> models.File
>>> sqlalchemy.ext.automap.File

sdssdb.utils.internals.vacuum(database, table_name, analyze=True, verbose=False, schema=None)[source]¶

Vacuums (and optionally analyses) a table.

Parameters

database (PeeweeDatabaseConnection) – A PeeWee connection to the database to vacuum.
table_name (str) – The table name.
analyze (bool) – Whether to run ANALYZE when vacuumming.
verbose (bool) – Whether to run in verbose mode.
schema (str) – The schema to vacuum. If None, vacuums the entire database.

sdssdb.utils.internals.vacuum_all(database, analyze=True, verbose=False, schema=None)[source]¶

Vacuums all the tables in a database or schema.

Parameters

database (PeeweeDatabaseConnection) – A PeeWee connection to the database to vacuum.
analyze (bool) – Whether to run ANALYZE when vacuumming.
verbose (bool) – Whether to run in verbose mode.
schema (str) – The schema to vacuum. If None, vacuums the entire database.

sdssdb.utils.internals.get_cluster_index(connection, table_name=None, schema=None)[source]¶: Returns a tuple with the index on which a table has been clustered.

sdssdb.utils.internals.get_unclustered_tables(connection, schema=None)[source]¶: Lists tables not clustered.

sdssdb.utils.internals.get_row_count(connection, table_name, schema=None, approximate=True)[source]¶

Returns the model row count.

Parameters

connection (PeeweeDatabaseConnection) – The database connection.
table (str) – The table name.
schema (str) – The schema in which the table lives.
approximate (bool) – If True, returns the approximate row count from the pg_class table (much faster). Otherwise calculates the exact count.

sdssdb.utils.internals.is_table_locked(connection, table_name, schema=None)[source]¶: Returns the locks for a table or None if no lock is present.

sdssdb.utils.create_schema_graph(models=None, base=None, schema=None, show_columns=True, show_pks=True, show_indices=True, show_datatypes=True, skip_tables=[], font='Bitstream-Vera Sans', graph_options={}, relation_options={})[source]¶

Creates a graph visualisation from a series of Peewee models.

Produces a pydot graph including the tables and relationships from a series of models or from a base model class.

Parameters

models (list) – A list of Peewee models to be graphed.
base (Model) – A base model class. If passed, all the model classes that were created by subclassing from the base model will be used.
schema (str) – A schema name. If passed, will be used to limit the list of models or base subclasses to only the models that match the schema name.
show_columns (bool) – Whether to show the column names.
show_pks (bool) – Whether to show the primary key. Supersedes show_columns.
show_indices (bool) – Whether to show the indices from the table as separate rows.
show_datatypes (bool) – Whether to show the data type of each column.
skip_tables (list) – List of table names to skip.
font (str) – The name of the font to use.
graph_options (dict) – Options for creating the graph. Any valid Graphviz option.
relation_options (dict) – Additional parameters to be passed to pydot.Edge when creating the relationships.

Returns

graph (pydot.Dot) – A pydot.Dot object with the graph representation of the schema.

Example

>>> graph = create_schema_graph([User, Tweet])
>>> graph.write_pdf('tweetdb.pdf')