API¶

Database connections¶

class sdssdb.connection.DatabaseConnection(dbname=None, profile=None, autoconnect=None, dbversion=None, use_psycopg3=None)[source]¶

Bases: object

A PostgreSQL database connection with profile and autoconnect features.

Provides a base class for PostgreSQL connections for either peewee or SQLAlchemy. The parameters for the connection can be passed directly (see connect_from_parameters) or, more conveniently, a profile can be used. By default dbname is left undefined and needs to be passed when initiating the connection. This is useful for databases such as apodb/lcodb for which the model classes are identical but the database name is not. For databases for which the database name is fixed (e.g., sdss5db), this class can be subclassed and dbname overridden.

Parameters:

dbname (str) – The database name.
profile (str) – The configuration profile to use. The profile defines the default user, database server hostname, and port for a given location. If not provided, the profile is automatically determined based on the current domain, or defaults to local.
autoconnect (bool or None) – Whether to autoconnect to the database using the profile parameters. Requites dbname to be set. If None, whether to autoconnect is defined, in order, by the existence of an environment variable $SDSSDB_AUTOCONNECT or by sdssdb.autoconnect. If they are set to 0 or false the database won’t autoconnect. Note that this must be set before importing any model classes.
dbversion (str) – A database version. If specified, appends to dbname as “dbname_dbversion” and becomes the dbname used for connection strings.
use_psycopg3 (bool) – Whether to use psycopg3 instead of psycopg2. If None, defaults to the value of the environment variable $SDSSDB_PSYCOPG3 (which defaults to True if not set).

static list_profiles(profile=None)[source]¶

Returns a list of profiles.

Parameters:: profile (str or None) – If None, returns a list of profile keys. If profile is not None returns the parameters for the given profile.

become(user)[source]¶: Change the connection to a certain user.

become_admin(admin=None)[source]¶

Becomes the admin user.

If admin=None defaults to the admin value in the current profile.

become_user(user=None)[source]¶

Becomes the read-only user.

If user=None defaults to the user value in the current profile.

change_version(dbversion=None)[source]¶

Change database version and attempt to reconnect

Parameters:: dbversion (str) – A database version

connect(dbname=None, silent_on_fail=False, **connection_params)[source]¶

Initialises the database using the profile information.

Parameters:

dbname (str or None) – The database name. If None, defaults to dbname. dbname can also be a full database URI, in which case the other connection parameters are ignored.
user (str) – Overrides the profile database user.
host (str) – Overrides the profile database host.
port (str) – Overrides the profile database port.
silent_on_fail (bool) – If True, does not show a warning if the connection fails.

Returns:

connected – Returns True if the database is connected.

Return type:

bool

connect_from_parameters(dbname=None, **params)[source]¶

Initialises the database from a dictionary of parameters.

Parameters:

dbname (str or None) – The database name. If None, defaults to dbname.
params (dict) – A dictionary of parameters, which should include user, host, and port.

Returns:

connected – Returns True if the database is connected.

Return type:

bool

get_connection_uri()[source]¶: Returns the URI to the database connection.

post_connect()[source]¶: Hook called after a successfull connection.

set_profile(profile=None, connect=True, **params)[source]¶

Sets the profile from the configuration file.

Parameters:

profile (str) – The profile to set. If None, uses the domain name to determine the profile.
connect (bool) – If True, tries to connect to the database using the new profile.
params – Connection parameters (user, host, port, password) that will override the profile values.

Returns:

connected – Returns True if the database is connected.

Return type:

bool

auto_reflect = True¶: # Whether to call Model.reflect() in Peewee after a connection.

abstract property connection_params: dict | None¶

Returns a dictionary with the connection parameters.

Returns:: connection_params – A dictionary with the user, host, and part of the current connection. E.g., {'user': 'sdssdb', 'host': 'sdss4-db', 'port': 5432}
Return type:: dict

dbname = None¶: The database name.

dbversion = None¶: Database version

class sdssdb.connection.PeeweeDatabaseConnection(*args, **kwargs)[source]¶

Bases: DatabaseConnection, PostgresqlDatabase

Peewee database connection implementation.

models¶

Models bound to this database. Only models that are bound using BaseModel are handled.

Type:: list

get_fields(table_name, schema=None, cache=True)[source]¶: Returns a list of Peewee fields for a table.

get_introspector(schema=None)[source]¶: Gets a Peewee database peewee:Introspector.

get_model(table_name, schema=None)[source]¶

Returns the model for a table.

Parameters:

table_name (str) – The name of the table whose model will be returned.
schema (str) – The schema for the table. If None, the first model that matches the table name will be returned.

Returns:

The model associated with the table, or None if no model was found.

Return type:

Model or None

get_primary_keys(table_name, schema=None, cache=True)[source]¶: Returns the primary keys for a table.

property connected¶: Reports whether the connection is active.

property connection_params¶: Returns a dictionary with the connection parameters.

property psycopg_version¶: Returns the version of psycopg in use.

class sdssdb.connection.SQLADatabaseConnection(*args, **kwargs)[source]¶

Bases: DatabaseConnection

SQLAlchemy database connection implementation

add_base(base, prepare=True)[source]¶: Binds a base to this connection.

create_engine(db_connection_string=None, echo=False, pool_size=10, pool_recycle=1800, expire_on_commit=True)[source]¶

Create a new database engine

Resets and creates a new sqlalchemy database engine. Also creates and binds engine metadata and a new scoped session.

prepare_bases(base=None)[source]¶

Prepare a Model Base

Prepares a SQLalchemy Base for reflection. This binds a database engine to a specific Base which maps to a set of ModelClasses. If base is passed only that base will be prepared. Otherwise, all the bases bound to this database connection will be prepared.

reset_engine()[source]¶: Reset the engine, metadata, and session

connected¶: Reports whether the connection is active.

property connection_params¶: Returns a dictionary with the connection parameters.

Peewee¶

class sdssdb.peewee.ReflectMeta(name, bases, attrs)[source]¶

Bases: ModelBase

A metaclass that supports model reflection on demand.

This metaclass expands PeeWee’s ModelBase to provide a hook for declaring/expanding fields and indexes using the introspection system. The feature is enabled by a new attribute in Metadata called use_reflection (which is set to False by default). When set to True the metaclass extends the model using the fields and indexes discovered using reflection. It is possible to mix explicitely defined fields with discovered ones; the latter never override the former.

Normally ReflectMeta is implemented by creating a base model that is then used to defined the table models

class ReflectBaseModel(peewee.Model, metaclass=ReflectMeta):

    class Meta:
        primary_key = False
        use_reflection = False
        database = database

class Tycho2(ReflectBaseModel):

    class Meta:
        use_reflection = True
        schema = 'catalogdb'
        table_name = 'tycho2'

Note that use_reflection is inheritable so if set to True in the base class that will affect to all subclasses, except if it’s overridden there. It’s also a good idea to set primary_key=False to prevent Peewee from creating an id column automatically.

If the database connection changes it’s possible to call reflect to rediscover the reflected fields for the new connection. This will remove all reflected fields (but not those explicitely added) and add the newly discovered ones.

If the database class is PeeweeDatabaseConnection, the database will call reflect for each model bound to the database each time it connects. This ensures that if the connection changes the reflected fields are updated. Note that this will not work if using Peewee’s PostgresqlDatabase.

By default, ReflectMeta will add all the fields from the reflected models, including foreign keys. Sometimes that is not desirable and it’s preferable to define the foreign keys explicitely. In that case it’s possible to disable the reflection of foreign keys by doing

class ReflectBaseModel(peewee.Model, metaclass=ReflectMeta):

        class Meta:
            primary_key = False
            use_reflection = False
            reflection_options = {'skip_foreign_keys': True}
            database = database

Foreign keys explicitely defined need to reference existing fields, so the referenced columns need to be added manually. In practice, this means that if you add a ForeignKeyField, the referenced field (usually the primary key) needs to be defined explicitely.

The default Peewee reflection process requires multiple queries against the database for each table. This can become quite slow if the schema contains many tables or if the connection has significant overhead, for example when connected over an SSH tunnel. In these cases one can set Meta with reflection_options = {'use_peewee_reflection': False}. This will use a reflection system that is designed to minimise the number of queries needed for schema introspection. Note that this system is experimental and doesn’t reflect foreign keys or constraints.

Caveats:

Due to a bug in PeeWee primary keys are not discovered correctly if the user connected to the database does not have write access. In that case a composite key encompassing all the fields in the model is created. To avoid this, explicitely define the primary key in the model.
Many-to-many relationships need to be defined explicitely since it’s not possible to set the through model based on the reflected information.
When the primary key of a model is also a foreign key and reflection_options = {'skip_foreign_keys': True}, both the primary key and the foreign key need to be defined explicitely. Otherwise neither will be added.
Reflection will fail if a table is locked with AccessExclusiveLock. In that case reflection will be skipped and a warning issued. Note that if the table is locked with an exclusive lock you won’t be able to access the data in any case.
In this version, indexes discovered by reflection are not propagated to the model class. This should not have any impact in performance.

reflect()[source]¶: Adds fields and indexes to the model using reflection.

class sdssdb.peewee.BaseModel(*args, **kwargs)[source]¶

Bases: Model

A custom peewee Model with enhanced representation and methods.

By default it always prints pk, name, and label, if found. Models can define they own print_fields in Meta as a list of field names to be output in the representation.

DoesNotExist¶: alias of BaseModelDoesNotExist

cone_search(ra, dec, a, b=None, pa=None, ra_col='ra', dec_col='dec')[source]¶

Returns a query with the rows inside a region on the sky.

Defines a sky ellipse and returns the targets within. By default it assumes that the table contains two columns ra and dec. All units are expected to be in degrees.

Parameters:

ra (float) – The R.A. of the centre of the ellipse.
dec (float) – The declination of the centre of the ellipse.
a (float) – Defines the semi-major axis of the ellipse for the cone search. If b=None, a circular search will be done with a as the radius.
b (float or None) – The semi-minor axis of the ellipse. If None, a circular cone search will be run. In that case, pa is ignored.
pa (float or None) – The parallactic angle of the ellipse.
ra_col (str) – The name of the column with the RA value.
dec_col (str) – The name of the column with the Dec value.

SQLAlchemy¶

class sdssdb.sqlalchemy.BaseModel[source]¶

Bases: object

A custom sqlalchemy declarative Base

By default it always prints pk, name, and label, if found. Models can define they own print_fields as a list of field to be output in the representation. Works with field names nested inside other models as well.

cone_search(ra, dec, a, b=None, pa=None, ra_col='ra', dec_col='dec')[source]¶

Returns a query with the rows inside a region on the sky.

Defines a sky ellipse and returns the targets within. By default it assumes that the table contains two columns ra and dec. All units are expected to be in degrees.

Parameters:

ra (float) – The R.A. of the centre of the ellipse.
dec (float) – The declination of the centre of the ellipse.
a (float) – Defines the semi-major axis of the ellipse for the cone search. If b=None, a circular search will be done with a as the radius.
b (float or None) – The semi-minor axis of the ellipse. If None, a circular cone search will be run. In that case, pa is ignored.
pa (float or None) – The parallactic angle of the ellipse.
ra_col (str) – The name of the column with the RA value.
dec_col (str) – The name of the column with the Dec value.

get_id()[source]¶: get the pk

print_fields = []¶: A list of fields (as strings) to be included in the __repr__

Utils¶

sdssdb.utils.ingest.bulk_insert(data, connection, model, chunk_size=100000, show_progress=False)[source]¶

Loads data into a DB table using bulk insert.

Parameters:

data (Table) – An astropy table with the data to insert.
connection (.PeeweeDatabaseConnection) – The Peewee database connection to use.
model (~peewee:Model) – The model representing the database table into which to insert the data.
chunk_size (int) – How many rows to load at once.
show_progress (bool) – If True, shows a progress bar. Requires the progressbar2 module to be installed.

sdssdb.utils.ingest.copy_data(data, connection, table_name, schema=None, chunk_size=10000, show_progress=False)[source]¶

Loads data into a DB table using COPY.

Parameters:

data (Table) – An astropy table whose column names and types will be used to create the model.
connection (.PeeweeDatabaseConnection) – The Peewee database connection to use.
table_name (str) – The name of the table.
schema (str) – The schema in which the table lives.
chunk_size (int) – How many rows to load at once.
show_progress (bool) – If True, shows a progress bar. Requires the progressbar2 module to be installed.

sdssdb.utils.ingest.create_adhoc_database(dbname, schema=None, profile='local')[source]¶

Creates an adhoc SQLA database and models, given an existing db

Creates an in-memory SQLA database connection given a database name to connect to, along with auto-generated models for the a given schema name. Currently limited to building models for one schema at a time. Useful for temporarily creating and trying a database connection, and simple models, without building and committing a full fledged new database connection.

Parameters:

dbname (str) – The name of the database to create a connection for
schema (str) – The name of the schema to create mappings for
profile (str) – The database profile to connect with

Returns:

A temporary database connection and module of model classes

Return type:

tuple

Example

>>> from sdssdb.utils.ingest import create_adhoc_database
>>> tempdb, models = create_adhoc_database('datamodel', schema='filespec')
>>> tempdb
>>> <DatamodelDatabaseConnection (dbname='datamodel', profile='local', connected=True)>
>>> models.File
>>> sqlalchemy.ext.automap.File

sdssdb.utils.ingest.create_model_from_table(table_name, table, schema=None, lowercase=False, primary_key=None)[source]¶

Returns a Model from the columns in a table.

Parameters:

table_name (str) – The name of the table.
table (Table) – An astropy table whose column names and types will be used to create the model.
schema (str) – The schema in which the table lives.
lowercase (bool) – If True, all column names will be converted to lower case.
primary_key (str) – The name of the column to mark as primary key.

sdssdb.utils.ingest.drop_table(table_name, connection, cascade=False, schema=None)[source]¶

Drops a table. Does nothing if the table does not exist.

Parameters:

table_name (str) – The name of the table to be dropped.
connection (.PeeweeDatabaseConnection) – The Peewee database connection to use.
cascade (bool) – Whether to drop related tables using CASCADE.
schema (str) – The schema in which the table lives.

Returns:

result – Returns True if the table was correctly dropped or False if the table does not exists and nothing was done.

Return type:

bool

sdssdb.utils.ingest.file_to_db(input_, connection, table_name, schema=None, lowercase=False, create=False, drop=False, truncate=False, primary_key=None, load_data=True, use_copy=True, chunk_size=100000, show_progress=False)[source]¶

Loads a table from a file to a database.

Loads a file or a Table object into a database. If create=True a new table will be created, with column types matching the table ones. All columns are initially defined as NULL.

By default, the data are loaded using the COPY method to optimise performance. This can be disabled if needed.

Parameters:

input (str or Table) – The path to a file that will be opened using Table.read or an astropy Table.
connection (.PeeweeDatabaseConnection) – The Peewee database connection to use (SQLAlchemy connections are not supported).
table_name (str) – The name of the table where to load the data, or to be created.
schema (str) – The schema in which the table lives.
lowercase (bool) – If True, all column names will be converted to lower case.
create (bool) – Creates the table if it does not exist.
drop (bool) – Drops the table before recreating it. Implies create=True. Note that a CASCADE drop will be executed. Use with caution.
truncate (bool) – Truncates the table before loading the data but maintains the existing columns.
primary_key (str) – The name of the column to mark as primary key (ignored if the table is not being created).
load_data (bool) – If True, loads the data from the table; otherwise just creates the table in the database.
use_copy (bool) – When True (recommended) uses the SQL COPY command to load the data from a CSV stream.
chunk_size (int) – How many rows to load at once.
show_progress (bool) – If True, shows a progress bar. Requires the progressbar2 module to be installed.

Returns:

model – The model for the table created.

Return type:

~peewee:Model

sdssdb.utils.ingest.to_csv(table, path, header=True, delimiter='\t', use_multiprocessing=False, workers=4)[source]¶

Creates a PostgreSQL-valid CSV file from a table, handling arrays.

Parameters:

table (astropy.table.Table) – The table to convert.
path (str) – The path to which to write the CSV file.
header (bool) – Whether to add a header with the column names.
delimiter (str) – The delimiter between columns in the CSV files.
use_multiprocessing (bool) – Whether to use multiple cores. The rows of the resulting file will not have the same ordering as the original table.
workers (int) – How many workers to use with multiprocessing.

sdssdb.utils.internals.get_cluster_index(connection, table_name=None, schema=None)[source]¶: Returns a tuple with the index on which a table has been clustered.

sdssdb.utils.internals.get_database_columns(database, schema=None)[source]¶

Returns database column metadata.

Queries the pg_catalog schema to retrieve column information for all the tables in a schema.

Parameters:

database (.PeeweeDatabaseConnection) – The database connection.
schema (str) – The schema to query. Defaults to the public schema.

Returns:

metadata – A mapping keyed by the table name. For each table the list of primary keys is accessible via the key pk, as well as the column metadata as columns. Each column metadata is a named tuple with attributes name, field_type (the Peewee column class), array_type, and nullable.

Return type:

dict

sdssdb.utils.internals.get_row_count(connection, table_name, schema=None, approximate=True)[source]¶

Returns the model row count.

Parameters:

connection (.PeeweeDatabaseConnection) – The database connection.
table (str) – The table name.
schema (str) – The schema in which the table lives.
approximate (bool) – If True, returns the approximate row count from the pg_class table (much faster). Otherwise calculates the exact count.

sdssdb.utils.internals.get_unclustered_tables(connection, schema=None)[source]¶: Lists tables not clustered.

sdssdb.utils.internals.is_table_locked(connection, table_name, schema=None)[source]¶: Returns the locks for a table or None if no lock is present.

sdssdb.utils.internals.vacuum(database, table_name, analyze=True, verbose=False, schema=None)[source]¶

Vacuums (and optionally analyses) a table.

Parameters:

database (.PeeweeDatabaseConnection) – A PeeWee connection to the database to vacuum.
table_name (str) – The table name.
analyze (bool) – Whether to run ANALYZE when vacuumming.
verbose (bool) – Whether to run in verbose mode.
schema (str) – The schema to vacuum. If None, vacuums the entire database.

sdssdb.utils.internals.vacuum_all(database, analyze=True, verbose=False, schema=None)[source]¶

Vacuums all the tables in a database or schema.

Parameters:

database (.PeeweeDatabaseConnection) – A PeeWee connection to the database to vacuum.
analyze (bool) – Whether to run ANALYZE when vacuumming.
verbose (bool) – Whether to run in verbose mode.
schema (str) – The schema to vacuum. If None, vacuums the entire database.

sdssdb.utils.registry.display_table(pprint: bool = None, mask_dups: bool = False, fill: str = '', **kwargs) → Type[Table][source]¶

Display sdssdb databases and schema as an Astropy Table

Displays the list of available sdssdb databases organized by ORM and includes the schema for each database. Produces a table with columns “orm”, “db”, and “schema”.

Parameters:

pprint (bool, optional) – Pretty print the Astropy Table, by default None
mask_dups (bool, optional) – If True, masks duplicate orm and db entries, by default False
fill (str, optional) – The column mask fill value, by default ‘’
kwargs – extra kwargs passed to Table.pprint

Returns:

an Astropy Table of sdssdb databases

Return type:

Table

Raises:

ImportError – when astropy is not installed

sdssdb.utils.registry.list_databases(orm: str = None, with_schema: bool = False) → dict | list[source]¶

Return a list of sdssdb databases

Returns a list of available databases in sdssdb. When no orm is specified, returns a dict of orm:database key:values. If with_schema is specified, also returns a list of schema for each database.

Parameters:

orm (str, optional) – The type of ORM to select on, by default None
with_schema (bool, optional) – If True, also includes the schemas for each database, by default False

Returns:

A list of databases for a given ORM or a dict of database:schema values or a dict of orm:database values

Return type:

Union[dict, list]

Raises:

TypeError – when input orm is not a string
ValueError – when input orm is not either peewee or sqlalchemy

sdssdb.utils.schemadisplay.create_schema_graph(models=None, base=None, schema=None, show_columns=True, show_pks=True, show_indices=True, show_datatypes=True, skip_tables=[], font='Bitstream-Vera Sans', graph_options={}, relation_options={})[source]¶

Creates a graph visualisation from a series of Peewee models.

Produces a pydot graph including the tables and relationships from a series of models or from a base model class.

Parameters:

models (list) – A list of Peewee models to be graphed.
base (Model) – A base model class. If passed, all the model classes that were created by subclassing from the base model will be used.
schema (str) – A schema name. If passed, will be used to limit the list of models or base subclasses to only the models that match the schema name.
show_columns (bool) – Whether to show the column names.
show_pks (bool) – Whether to show the primary key. Supersedes show_columns.
show_indices (bool) – Whether to show the indices from the table as separate rows.
show_datatypes (bool) – Whether to show the data type of each column.
skip_tables (list) – List of table names to skip.
font (str) – The name of the font to use.
graph_options (dict) – Options for creating the graph. Any valid Graphviz option.
relation_options (dict) – Additional parameters to be passed to pydot.Edge when creating the relationships.

Returns:

graph – A pydot.Dot object with the graph representation of the schema.

Return type:

pydot.Dot

Example

>>> graph = create_schema_graph([User, Tweet])
>>> graph.write_pdf('tweetdb.pdf')