API¶
Database connections¶
-
class
sdssdb.connection.DatabaseConnection(dbname=None, profile=None, autoconnect=True, dbversion=None)[source]¶ Bases:
objectA PostgreSQL database connection with profile and autoconnect features.
Provides a base class for PostgreSQL connections for either peewee or SQLAlchemy. The parameters for the connection can be passed directly (see
connect_from_parameters) or, more conveniently, a profile can be used. By defaultdbnameis left undefined and needs to be passed when initiating the connection. This is useful for databases such asapodb/lcodbfor which the model classes are identical but the database name is not. For databases for which the database name is fixed (e.g.,sdss5db), this class can be subclassed anddbnameoverridden.- Parameters
dbname (str) – The database name.
profile (str) – The configuration profile to use. The profile defines the default user, database server hostname, and port for a given location. If not provided, the profile is automatically determined based on the current domain, or defaults to
local.autoconnect (bool) – Whether to autoconnect to the database using the profile parameters. Requites
dbnameto be set.dbversion (str) – A database version. If specified, appends to dbname as “dbname_dbversion” and becomes the dbname used for connection strings.
-
change_version(dbversion=None)[source]¶ Change database version and attempt to reconnect
- Parameters
dbversion (str) – A database version
-
connect(dbname=None, silent_on_fail=False, **connection_params)[source]¶ Initialises the database using the profile information.
- Parameters
dbname (
strorNone) – The database name. IfNone, defaults todbname.user (str) – Overrides the profile database user.
host (str) – Overrides the profile database host.
port (str) – Overrides the profile database port.
silent_on_fail (
bool) – IfTrue, does not show a warning if the connection fails.
- Returns
connected (bool) – Returns True if the database is connected.
-
connect_from_parameters(dbname=None, **params)[source]¶ Initialises the database from a dictionary of parameters.
-
connected= None¶ Reports whether the connection is active.
-
abstract property
connection_params¶ Returns a dictionary with the connection parameters.
- Returns
connection_params (dict) – A dictionary with the
user,host, andpartof the current connection. E.g.,{'user': 'sdssdb', 'host': 'sdss4-db', 'port': 5432}
-
dbname= None¶ The database name.
-
class
sdssdb.connection.PeeweeDatabaseConnection(*args, **kwargs)[source]¶ Bases:
sdssdb.connection.DatabaseConnection,peewee.PostgresqlDatabasePeewee database connection implementation.
- Variables
models (list) – Models bound to this database. Only models that are bound using
BaseModelare handled.
-
property
connection_params¶ Returns a dictionary with the connection parameters.
-
class
sdssdb.connection.SQLADatabaseConnection(*args, **kwargs)[source]¶ Bases:
sdssdb.connection.DatabaseConnectionSQLAlchemy database connection implementation
-
create_engine(db_connection_string=None, echo=False, pool_size=10, pool_recycle=1800, expire_on_commit=True)[source]¶ Create a new database engine
Resets and creates a new sqlalchemy database engine. Also creates and binds engine metadata and a new scoped session.
-
prepare_bases(base=None)[source]¶ Prepare a Model Base
Prepares a SQLalchemy Base for reflection. This binds a database engine to a specific Base which maps to a set of ModelClasses. If
baseis passed only that base will be prepared. Otherwise, all the bases bound to this database connection will be prepared.
-
property
connection_params¶ Returns a dictionary with the connection parameters.
-
Peewee¶
-
class
sdssdb.peewee.ReflectMeta[source]¶ Bases:
peewee.ModelBaseA metaclass that supports model reflection on demand.
This metaclass expands PeeWee’s
ModelBaseto provide a hook for declaring/expanding fields and indexes using the introspection system. The feature is enabled by a new attribute inMetadatacalleduse_reflection(which is set toFalseby default). When set toTruethe metaclass extends the model using the fields and indexes discovered using reflection. It is possible to mix explicitely defined fields with discovered ones; the latter never override the former.Normally
ReflectMetais implemented by creating a base model that is then used to defined the table modelsclass ReflectBaseModel(peewee.Model, metaclass=ReflectMeta): class Meta: primary_key = False use_reflection = False database = database class Tycho2(ReflectBaseModel): class Meta: use_reflection = True schema = 'catalogdb' table_name = 'tycho2'
Note that
use_reflectionis inheritable so if set toTruein the base class that will affect to all subclasses, except if it’s overridden there. It’s also a good idea to setprimary_key=Falseto prevent Peewee from creating anidcolumn automatically.If the database connection changes it’s possible to call
reflectto rediscover the reflected fields for the new connection. This will remove all reflected fields (but not those explicitely added) and add the newly discovered ones.If the database class is
PeeweeDatabaseConnection, the database will callreflectfor each model bound to the database each time it connects. This ensures that if the connection changes the reflected fields are updated. Note that this will not work if using Peewee’sPostgresqlDatabase.By default,
ReflectMetawill add all the fields from the reflected models, including foreign keys. Sometimes that is not desirable and it’s preferable to define the foreign keys explicitely. In that case it’s possible to disable the reflection of foreign keys by doingclass ReflectBaseModel(peewee.Model, metaclass=ReflectMeta): class Meta: primary_key = False use_reflection = False reflection_options = {'skip_foreign_keys': True} database = database
Foreign keys explicitely defined need to reference existing fields, so the referenced columns need to be added manually. In practice, this means that if you add a
ForeignKeyField, the referenced field (usually the primary key) needs to be defined explicitely.Caveats:
Many-to-many relationships need to be defined explicitely since it’s not possible to set the through model based on the reflected information.
When the primary key of a model is also a foreign key and
reflection_options = {'skip_foreign_keys': True}, both the primary key and the foreign key need to be defined explicitely. Otherwise neither will be added.Reflection will fail if a table is locked with
AccessExclusiveLock. In that case reflection will be skipped and a warning issued. Note that if the table is locked with an exclusive lock you won’t be able to access the data in any case.In this version, indexes discovered by reflection are not propagated to the model class. This should not have any impact in performance.
-
class
sdssdb.peewee.BaseModel(*args, **kwargs)[source]¶ Bases:
peewee.ModelA custom peewee
Modelwith enhanced representation and methods.By default it always prints
pk,name, andlabel, if found. Models can define they ownprint_fieldsinMetaas a list of field names to be output in the representation.-
DoesNotExist¶ alias of
BaseModelDoesNotExist
-
cone_search(ra, dec, a, b=None, pa=None, ra_col='ra', dec_col='dec')[source]¶ Returns a query with the rows inside a region on the sky.
Defines a sky ellipse and returns the targets within. By default it assumes that the table contains two columns
raanddec. All units are expected to be in degrees.- Parameters
ra (float) – The R.A. of the centre of the ellipse.
dec (float) – The declination of the centre of the ellipse.
a (float) – Defines the semi-major axis of the ellipse for the cone search. If
b=None, a circular search will be done withaas the radius.b (
floatorNone) – The semi-minor axis of the ellipse. IfNone, a circular cone search will be run. In that case,pais ignored.ra_col (str) – The name of the column with the RA value.
dec_col (str) – The name of the column with the Dec value.
-
SQLAlchemy¶
-
class
sdssdb.sqlalchemy.BaseModel[source]¶ Bases:
objectA custom sqlalchemy declarative Base
By default it always prints
pk,name, andlabel, if found. Models can define they ownprint_fieldsas a list of field to be output in the representation. Works with field names nested inside other models as well.-
cone_search(ra, dec, a, b=None, pa=None, ra_col='ra', dec_col='dec')[source]¶ Returns a query with the rows inside a region on the sky.
Defines a sky ellipse and returns the targets within. By default it assumes that the table contains two columns
raanddec. All units are expected to be in degrees.- Parameters
ra (float) – The R.A. of the centre of the ellipse.
dec (float) – The declination of the centre of the ellipse.
a (float) – Defines the semi-major axis of the ellipse for the cone search. If
b=None, a circular search will be done withaas the radius.b (
floatorNone) – The semi-minor axis of the ellipse. IfNone, a circular cone search will be run. In that case,pais ignored.ra_col (str) – The name of the column with the RA value.
dec_col (str) – The name of the column with the Dec value.
-
print_fields= []¶ A list of fields (as strings) to be included in the
__repr__
-
Utils¶
-
sdssdb.utils.ingest.to_csv(table, path, header=True, delimiter='\t', use_multiprocessing=False, workers=4)[source]¶ Creates a PostgreSQL-valid CSV file from a table, handling arrays.
- Parameters
table (astropy.table.Table) – The table to convert.
path (str) – The path to which to write the CSV file.
header (bool) – Whether to add a header with the column names.
delimiter (str) – The delimiter between columns in the CSV files.
use_multiprocessing (bool) – Whether to use multiple cores. The rows of the resulting file will not have the same ordering as the original table.
workers (int) – How many workers to use with multiprocessing.
-
sdssdb.utils.ingest.copy_data(data, connection, table_name, schema=None, chunk_size=10000, show_progress=False)[source]¶ Loads data into a DB table using
COPY.- Parameters
data (Table) – An astropy table whose column names and types will be used to create the model.
connection (PeeweeDatabaseConnection) – The Peewee database connection to use.
table_name (str) – The name of the table.
schema (str) – The schema in which the table lives.
chunk_size (int) – How many rows to load at once.
show_progress (bool) – If
True, shows a progress bar. Requires the progressbar2 module to be installed.
-
sdssdb.utils.ingest.drop_table(table_name, connection, cascade=False, schema=None)[source]¶ Drops a table. Does nothing if the table does not exist.
- Parameters
table_name (str) – The name of the table to be dropped.
connection (PeeweeDatabaseConnection) – The Peewee database connection to use.
cascade (bool) – Whether to drop related tables using
CASCADE.schema (str) – The schema in which the table lives.
- Returns
result (bool) – Returns
Trueif the table was correctly dropped orFalseif the table does not exists and nothing was done.
-
sdssdb.utils.ingest.create_model_from_table(table_name, table, schema=None, lowercase=False, primary_key=None)[source]¶ Returns a
Modelfrom the columns in a table.- Parameters
table_name (str) – The name of the table.
table (Table) – An astropy table whose column names and types will be used to create the model.
schema (str) – The schema in which the table lives.
lowercase (bool) – If
True, all column names will be converted to lower case.primary_key (str) – The name of the column to mark as primary key.
-
sdssdb.utils.ingest.bulk_insert(data, connection, model, chunk_size=100000, show_progress=False)[source]¶ Loads data into a DB table using bulk insert.
- Parameters
data (Table) – An astropy table with the data to insert.
connection (PeeweeDatabaseConnection) – The Peewee database connection to use.
model (~peewee:Model) – The model representing the database table into which to insert the data.
chunk_size (int) – How many rows to load at once.
show_progress (bool) – If
True, shows a progress bar. Requires the progressbar2 module to be installed.
-
sdssdb.utils.ingest.file_to_db(input_, connection, table_name, schema=None, lowercase=False, create=False, drop=False, truncate=False, primary_key=None, load_data=True, use_copy=True, chunk_size=100000, show_progress=False)[source]¶ Loads a table from a file to a database.
Loads a file or a
Tableobject into a database. Ifcreate=Truea new table will be created, with column types matching the table ones. All columns are initially defined asNULL.By default, the data are loaded using the
COPYmethod to optimise performance. This can be disabled if needed.- Parameters
input_ (str or Table) – The path to a file that will be opened using
Table.reador an astropyTable.connection (PeeweeDatabaseConnection) – The Peewee database connection to use (SQLAlchemy connections are not supported).
table_name (str) – The name of the table where to load the data, or to be created.
schema (str) – The schema in which the table lives.
lowercase (bool) – If
True, all column names will be converted to lower case.create (bool) – Creates the table if it does not exist.
drop (bool) – Drops the table before recreating it. Implies
create=True. Note that aCASCADEdrop will be executed. Use with caution.truncate (bool) – Truncates the table before loading the data but maintains the existing columns.
primary_key (str) – The name of the column to mark as primary key (ignored if the table is not being created).
load_data (bool) – If
True, loads the data from the table; otherwise just creates the table in the database.use_copy (bool) – When
True(recommended) uses the SQLCOPYcommand to load the data from a CSV stream.chunk_size (int) – How many rows to load at once.
show_progress (bool) – If
True, shows a progress bar. Requires theprogressbar2module to be installed.
- Returns
model (~peewee:Model) – The model for the table created.
-
sdssdb.utils.ingest.create_adhoc_database(dbname, schema=None, profile='local')[source]¶ Creates an adhoc SQLA database and models, given an existing db
Creates an in-memory SQLA database connection given a database name to connect to, along with auto-generated models for the a given schema name. Currently limited to building models for one schema at a time. Useful for temporarily creating and trialing a database connection, and simple models, without building and committing a full fledged new database connection.
- Parameters
- Returns
A temporary database connection and module of model classes
Example
>>> from sdssdb.utils.database import create_adhoc_database >>> tempdb, models = create_adhoc_database('datamodel', schema='filespec') >>> tempdb >>> <DatamodelDatabaseConnection (dbname='datamodel', profile='local', connected=True)> >>> models.File >>> sqlalchemy.ext.automap.File
-
sdssdb.utils.internals.vacuum(database, table_name, analyze=True, verbose=False, schema=None)[source]¶ Vacuums (and optionally analyses) a table.
- Parameters
database (PeeweeDatabaseConnection) – A PeeWee connection to the database to vacuum.
table_name (str) – The table name.
analyze (bool) – Whether to run
ANALYZEwhen vacuumming.verbose (bool) – Whether to run in verbose mode.
schema (str) – The schema to vacuum. If
None, vacuums the entire database.
-
sdssdb.utils.internals.vacuum_all(database, analyze=True, verbose=False, schema=None)[source]¶ Vacuums all the tables in a database or schema.
- Parameters
database (PeeweeDatabaseConnection) – A PeeWee connection to the database to vacuum.
analyze (bool) – Whether to run
ANALYZEwhen vacuumming.verbose (bool) – Whether to run in verbose mode.
schema (str) – The schema to vacuum. If
None, vacuums the entire database.
-
sdssdb.utils.internals.get_cluster_index(connection, table_name=None, schema=None)[source]¶ Returns a tuple with the index on which a table has been clustered.
-
sdssdb.utils.internals.get_unclustered_tables(connection, schema=None)[source]¶ Lists tables not clustered.
-
sdssdb.utils.internals.get_row_count(connection, table_name, schema=None, approximate=True)[source]¶ Returns the model row count.
- Parameters
connection (PeeweeDatabaseConnection) – The database connection.
table (str) – The table name.
schema (str) – The schema in which the table lives.
approximate (bool) – If True, returns the approximate row count from the
pg_classtable (much faster). Otherwise calculates the exact count.
-
sdssdb.utils.internals.is_table_locked(connection, table_name, schema=None)[source]¶ Returns the locks for a table or
Noneif no lock is present.
-
sdssdb.utils.create_schema_graph(models=None, base=None, schema=None, show_columns=True, show_pks=True, show_indices=True, show_datatypes=True, skip_tables=[], font='Bitstream-Vera Sans', graph_options={}, relation_options={})[source]¶ Creates a graph visualisation from a series of Peewee models.
Produces a pydot graph including the tables and relationships from a series of models or from a base model class.
- Parameters
base (Model) – A base model class. If passed, all the model classes that were created by subclassing from the base model will be used.
schema (str) – A schema name. If passed, will be used to limit the list of models or
basesubclasses to only the models that match the schema name.show_columns (bool) – Whether to show the column names.
show_pks (bool) – Whether to show the primary key. Supersedes
show_columns.show_indices (bool) – Whether to show the indices from the table as separate rows.
show_datatypes (bool) – Whether to show the data type of each column.
skip_tables (list) – List of table names to skip.
font (str) – The name of the font to use.
graph_options (dict) – Options for creating the graph. Any valid Graphviz option.
relation_options (dict) – Additional parameters to be passed to
pydot.Edgewhen creating the relationships.
- Returns
graph (
pydot.Dot) – Apydot.Dotobject with the graph representation of the schema.
Example
>>> graph = create_schema_graph([User, Tweet]) >>> graph.write_pdf('tweetdb.pdf')