API¶
Database connections¶
- class sdssdb.connection.DatabaseConnection(dbname=None, profile=None, autoconnect=None, dbversion=None)[source]¶
Bases:
object
A PostgreSQL database connection with profile and autoconnect features.
Provides a base class for PostgreSQL connections for either peewee or SQLAlchemy. The parameters for the connection can be passed directly (see
connect_from_parameters
) or, more conveniently, a profile can be used. By defaultdbname
is left undefined and needs to be passed when initiating the connection. This is useful for databases such asapodb/lcodb
for which the model classes are identical but the database name is not. For databases for which the database name is fixed (e.g.,sdss5db
), this class can be subclassed anddbname
overridden.- Parameters:
dbname (str) – The database name.
profile (str) – The configuration profile to use. The profile defines the default user, database server hostname, and port for a given location. If not provided, the profile is automatically determined based on the current domain, or defaults to
local
.autoconnect (bool or None) – Whether to autoconnect to the database using the profile parameters. Requites
dbname
to be set. IfNone
, whether to autoconnect is defined, in order, by the existence of an environment variable$SDSSDB_AUTOCONNECT
or bysdssdb.autoconnect
. If they are set to0
orfalse
the database won’t autoconnect. Note that this must be set before importing any model classes.dbversion (str) – A database version. If specified, appends to dbname as “dbname_dbversion” and becomes the dbname used for connection strings.
- become_admin(admin=None)[source]¶
Becomes the admin user.
If
admin=None
defaults to theadmin
value in the current profile.
- become_user(user=None)[source]¶
Becomes the read-only user.
If
user=None
defaults to theuser
value in the current profile.
- change_version(dbversion=None)[source]¶
Change database version and attempt to reconnect
- Parameters:
dbversion (str) – A database version
- connect(dbname=None, silent_on_fail=False, **connection_params)[source]¶
Initialises the database using the profile information.
- Parameters:
dbname (
str
orNone
) – The database name. IfNone
, defaults todbname
.user (str) – Overrides the profile database user.
host (str) – Overrides the profile database host.
port (str) – Overrides the profile database port.
silent_on_fail (
bool
) – IfTrue
, does not show a warning if the connection fails.
- Returns:
connected (bool) – Returns True if the database is connected.
- connect_from_parameters(dbname=None, **params)[source]¶
Initialises the database from a dictionary of parameters.
- auto_reflect = True¶
# Whether to call Model.reflect() in Peewee after a connection.
- abstract property connection_params¶
Returns a dictionary with the connection parameters.
- Returns:
connection_params (dict) – A dictionary with the
user
,host
, andpart
of the current connection. E.g.,{'user': 'sdssdb', 'host': 'sdss4-db', 'port': 5432}
- dbname = None¶
The database name.
- dbversion = None¶
Database version
- class sdssdb.connection.PeeweeDatabaseConnection(*args, **kwargs)[source]¶
Bases:
DatabaseConnection
,PostgresqlDatabase
Peewee database connection implementation.
- Variables:
models (list) – Models bound to this database. Only models that are bound using
BaseModel
are handled.
- get_fields(table_name, schema=None, cache=True)[source]¶
Returns a list of Peewee fields for a table.
- get_introspector(schema=None)[source]¶
Gets a Peewee database
Introspector
.
- get_primary_keys(table_name, schema=None, cache=True)[source]¶
Returns the primary keys for a table.
- property connected¶
Reports whether the connection is active.
- property connection_params¶
Returns a dictionary with the connection parameters.
- class sdssdb.connection.SQLADatabaseConnection(*args, **kwargs)[source]¶
Bases:
DatabaseConnection
SQLAlchemy database connection implementation
- create_engine(db_connection_string=None, echo=False, pool_size=10, pool_recycle=1800, expire_on_commit=True)[source]¶
Create a new database engine
Resets and creates a new sqlalchemy database engine. Also creates and binds engine metadata and a new scoped session.
- prepare_bases(base=None)[source]¶
Prepare a Model Base
Prepares a SQLalchemy Base for reflection. This binds a database engine to a specific Base which maps to a set of ModelClasses. If
base
is passed only that base will be prepared. Otherwise, all the bases bound to this database connection will be prepared.
- connected¶
Reports whether the connection is active.
- property connection_params¶
Returns a dictionary with the connection parameters.
Peewee¶
- class sdssdb.peewee.ReflectMeta(name, bases, attrs)[source]¶
Bases:
ModelBase
A metaclass that supports model reflection on demand.
This metaclass expands PeeWee’s
ModelBase
to provide a hook for declaring/expanding fields and indexes using the introspection system. The feature is enabled by a new attribute inMetadata
calleduse_reflection
(which is set toFalse
by default). When set toTrue
the metaclass extends the model using the fields and indexes discovered using reflection. It is possible to mix explicitely defined fields with discovered ones; the latter never override the former.Normally
ReflectMeta
is implemented by creating a base model that is then used to defined the table modelsclass ReflectBaseModel(peewee.Model, metaclass=ReflectMeta): class Meta: primary_key = False use_reflection = False database = database class Tycho2(ReflectBaseModel): class Meta: use_reflection = True schema = 'catalogdb' table_name = 'tycho2'
Note that
use_reflection
is inheritable so if set toTrue
in the base class that will affect to all subclasses, except if it’s overridden there. It’s also a good idea to setprimary_key=False
to prevent Peewee from creating anid
column automatically.If the database connection changes it’s possible to call
reflect
to rediscover the reflected fields for the new connection. This will remove all reflected fields (but not those explicitely added) and add the newly discovered ones.If the database class is
PeeweeDatabaseConnection
, the database will callreflect
for each model bound to the database each time it connects. This ensures that if the connection changes the reflected fields are updated. Note that this will not work if using Peewee’sPostgresqlDatabase
.By default,
ReflectMeta
will add all the fields from the reflected models, including foreign keys. Sometimes that is not desirable and it’s preferable to define the foreign keys explicitely. In that case it’s possible to disable the reflection of foreign keys by doingclass ReflectBaseModel(peewee.Model, metaclass=ReflectMeta): class Meta: primary_key = False use_reflection = False reflection_options = {'skip_foreign_keys': True} database = database
Foreign keys explicitely defined need to reference existing fields, so the referenced columns need to be added manually. In practice, this means that if you add a
ForeignKeyField
, the referenced field (usually the primary key) needs to be defined explicitely.The default Peewee reflection process requires multiple queries against the database for each table. This can become quite slow if the schema contains many tables or if the connection has significant overhead, for example when connected over an SSH tunnel. In these cases one can set
Meta
withreflection_options = {'use_peewee_reflection': False}
. This will use a reflection system that is designed to minimise the number of queries needed for schema introspection. Note that this system is experimental and doesn’t reflect foreign keys or constraints.Caveats:
Due to a bug in PeeWee primary keys are not discovered correctly if the user connected to the database does not have write access. In that case a composite key encompassing all the fields in the model is created. To avoid this, explicitely define the primary key in the model.
Many-to-many relationships need to be defined explicitely since it’s not possible to set the through model based on the reflected information.
When the primary key of a model is also a foreign key and
reflection_options = {'skip_foreign_keys': True}
, both the primary key and the foreign key need to be defined explicitely. Otherwise neither will be added.Reflection will fail if a table is locked with
AccessExclusiveLock
. In that case reflection will be skipped and a warning issued. Note that if the table is locked with an exclusive lock you won’t be able to access the data in any case.In this version, indexes discovered by reflection are not propagated to the model class. This should not have any impact in performance.
- class sdssdb.peewee.BaseModel(*args, **kwargs)[source]¶
Bases:
Model
A custom peewee
Model
with enhanced representation and methods.By default it always prints
pk
,name
, andlabel
, if found. Models can define they ownprint_fields
inMeta
as a list of field names to be output in the representation.- DoesNotExist¶
alias of
BaseModelDoesNotExist
- cone_search(ra, dec, a, b=None, pa=None, ra_col='ra', dec_col='dec')[source]¶
Returns a query with the rows inside a region on the sky.
Defines a sky ellipse and returns the targets within. By default it assumes that the table contains two columns
ra
anddec
. All units are expected to be in degrees.- Parameters:
ra (float) – The R.A. of the centre of the ellipse.
dec (float) – The declination of the centre of the ellipse.
a (float) – Defines the semi-major axis of the ellipse for the cone search. If
b=None
, a circular search will be done witha
as the radius.b (
float
orNone
) – The semi-minor axis of the ellipse. IfNone
, a circular cone search will be run. In that case,pa
is ignored.ra_col (str) – The name of the column with the RA value.
dec_col (str) – The name of the column with the Dec value.
SQLAlchemy¶
- class sdssdb.sqlalchemy.BaseModel[source]¶
Bases:
object
A custom sqlalchemy declarative Base
By default it always prints
pk
,name
, andlabel
, if found. Models can define they ownprint_fields
as a list of field to be output in the representation. Works with field names nested inside other models as well.- cone_search(ra, dec, a, b=None, pa=None, ra_col='ra', dec_col='dec')[source]¶
Returns a query with the rows inside a region on the sky.
Defines a sky ellipse and returns the targets within. By default it assumes that the table contains two columns
ra
anddec
. All units are expected to be in degrees.- Parameters:
ra (float) – The R.A. of the centre of the ellipse.
dec (float) – The declination of the centre of the ellipse.
a (float) – Defines the semi-major axis of the ellipse for the cone search. If
b=None
, a circular search will be done witha
as the radius.b (
float
orNone
) – The semi-minor axis of the ellipse. IfNone
, a circular cone search will be run. In that case,pa
is ignored.ra_col (str) – The name of the column with the RA value.
dec_col (str) – The name of the column with the Dec value.
- print_fields = []¶
A list of fields (as strings) to be included in the
__repr__
Utils¶
- sdssdb.utils.ingest.bulk_insert(data, connection, model, chunk_size=100000, show_progress=False)[source]¶
Loads data into a DB table using bulk insert.
- Parameters:
data (Table) – An astropy table with the data to insert.
connection (.PeeweeDatabaseConnection) – The Peewee database connection to use.
model (~peewee:Model) – The model representing the database table into which to insert the data.
chunk_size (int) – How many rows to load at once.
show_progress (bool) – If
True
, shows a progress bar. Requires the progressbar2 module to be installed.
- sdssdb.utils.ingest.copy_data(data, connection, table_name, schema=None, chunk_size=10000, show_progress=False)[source]¶
Loads data into a DB table using
COPY
.- Parameters:
data (Table) – An astropy table whose column names and types will be used to create the model.
connection (.PeeweeDatabaseConnection) – The Peewee database connection to use.
table_name (str) – The name of the table.
schema (str) – The schema in which the table lives.
chunk_size (int) – How many rows to load at once.
show_progress (bool) – If
True
, shows a progress bar. Requires the progressbar2 module to be installed.
- sdssdb.utils.ingest.create_adhoc_database(dbname, schema=None, profile='local')[source]¶
Creates an adhoc SQLA database and models, given an existing db
Creates an in-memory SQLA database connection given a database name to connect to, along with auto-generated models for the a given schema name. Currently limited to building models for one schema at a time. Useful for temporarily creating and trying a database connection, and simple models, without building and committing a full fledged new database connection.
- Parameters:
- Returns:
tuple – A temporary database connection and module of model classes
Example
>>> from sdssdb.utils.ingest import create_adhoc_database >>> tempdb, models = create_adhoc_database('datamodel', schema='filespec') >>> tempdb >>> <DatamodelDatabaseConnection (dbname='datamodel', profile='local', connected=True)> >>> models.File >>> sqlalchemy.ext.automap.File
- sdssdb.utils.ingest.create_model_from_table(table_name, table, schema=None, lowercase=False, primary_key=None)[source]¶
Returns a
Model
from the columns in a table.- Parameters:
table_name (str) – The name of the table.
table (Table) – An astropy table whose column names and types will be used to create the model.
schema (str) – The schema in which the table lives.
lowercase (bool) – If
True
, all column names will be converted to lower case.primary_key (str) – The name of the column to mark as primary key.
- sdssdb.utils.ingest.drop_table(table_name, connection, cascade=False, schema=None)[source]¶
Drops a table. Does nothing if the table does not exist.
- sdssdb.utils.ingest.file_to_db(input_, connection, table_name, schema=None, lowercase=False, create=False, drop=False, truncate=False, primary_key=None, load_data=True, use_copy=True, chunk_size=100000, show_progress=False)[source]¶
Loads a table from a file to a database.
Loads a file or a
Table
object into a database. Ifcreate=True
a new table will be created, with column types matching the table ones. All columns are initially defined asNULL
.By default, the data are loaded using the
COPY
method to optimise performance. This can be disabled if needed.- Parameters:
input (str or Table) – The path to a file that will be opened using
Table.read
or an astropyTable
.connection (.PeeweeDatabaseConnection) – The Peewee database connection to use (SQLAlchemy connections are not supported).
table_name (str) – The name of the table where to load the data, or to be created.
schema (str) – The schema in which the table lives.
lowercase (bool) – If
True
, all column names will be converted to lower case.create (bool) – Creates the table if it does not exist.
drop (bool) – Drops the table before recreating it. Implies
create=True
. Note that aCASCADE
drop will be executed. Use with caution.truncate (bool) – Truncates the table before loading the data but maintains the existing columns.
primary_key (str) – The name of the column to mark as primary key (ignored if the table is not being created).
load_data (bool) – If
True
, loads the data from the table; otherwise just creates the table in the database.use_copy (bool) – When
True
(recommended) uses the SQLCOPY
command to load the data from a CSV stream.chunk_size (int) – How many rows to load at once.
show_progress (bool) – If
True
, shows a progress bar. Requires theprogressbar2
module to be installed.
- Returns:
model (~peewee:Model) – The model for the table created.
- sdssdb.utils.ingest.to_csv(table, path, header=True, delimiter='\t', use_multiprocessing=False, workers=4)[source]¶
Creates a PostgreSQL-valid CSV file from a table, handling arrays.
- Parameters:
table (astropy.table.Table) – The table to convert.
path (str) – The path to which to write the CSV file.
header (bool) – Whether to add a header with the column names.
delimiter (str) – The delimiter between columns in the CSV files.
use_multiprocessing (bool) – Whether to use multiple cores. The rows of the resulting file will not have the same ordering as the original table.
workers (int) – How many workers to use with multiprocessing.
- sdssdb.utils.internals.get_cluster_index(connection, table_name=None, schema=None)[source]¶
Returns a tuple with the index on which a table has been clustered.
- sdssdb.utils.internals.get_database_columns(database, schema=None)[source]¶
Returns database column metadata.
Queries the
pg_catalog
schema to retrieve column information for all the tables in a schema.- Parameters:
database (.PeeweeDatabaseConnection) – The database connection.
schema (str) – The schema to query. Defaults to the public schema.
- Returns:
metadata (
dict
) – A mapping keyed by the table name. For each table the list of primary keys is accessible via the keypk
, as well as the column metadata ascolumns
. Each column metadata is a named tuple with attributesname
,field_type
(the Peewee column class),array_type
, andnullable
.
- sdssdb.utils.internals.get_row_count(connection, table_name, schema=None, approximate=True)[source]¶
Returns the model row count.
- Parameters:
- sdssdb.utils.internals.get_unclustered_tables(connection, schema=None)[source]¶
Lists tables not clustered.
- sdssdb.utils.internals.is_table_locked(connection, table_name, schema=None)[source]¶
Returns the locks for a table or
None
if no lock is present.
- sdssdb.utils.internals.vacuum(database, table_name, analyze=True, verbose=False, schema=None)[source]¶
Vacuums (and optionally analyses) a table.
- Parameters:
database (.PeeweeDatabaseConnection) – A PeeWee connection to the database to vacuum.
table_name (str) – The table name.
analyze (bool) – Whether to run
ANALYZE
when vacuumming.verbose (bool) – Whether to run in verbose mode.
schema (str) – The schema to vacuum. If
None
, vacuums the entire database.
- sdssdb.utils.internals.vacuum_all(database, analyze=True, verbose=False, schema=None)[source]¶
Vacuums all the tables in a database or schema.
- sdssdb.utils.registry.display_table(pprint: bool = None, mask_dups: bool = False, fill: str = '', **kwargs) Type[Table] [source]¶
Display sdssdb databases and schema as an Astropy Table
Displays the list of available sdssdb databases organized by ORM and includes the schema for each database. Produces a table with columns “orm”, “db”, and “schema”.
- Parameters:
- Returns:
Table
– an Astropy Table of sdssdb databases- Raises:
ImportError – when astropy is not installed
- sdssdb.utils.registry.list_databases(orm: str = None, with_schema: bool = False) dict | list [source]¶
Return a list of sdssdb databases
Returns a list of available databases in sdssdb. When no orm is specified, returns a dict of orm:database key:values. If with_schema is specified, also returns a list of schema for each database.
- Parameters:
- Returns:
Union[dict, list] – A list of databases for a given ORM or a dict of database:schema values or a dict of orm:database values
- Raises:
TypeError – when input orm is not a string
ValueError – when input orm is not either peewee or sqlalchemy
- sdssdb.utils.schemadisplay.create_schema_graph(models=None, base=None, schema=None, show_columns=True, show_pks=True, show_indices=True, show_datatypes=True, skip_tables=[], font='Bitstream-Vera Sans', graph_options={}, relation_options={})[source]¶
Creates a graph visualisation from a series of Peewee models.
Produces a pydot graph including the tables and relationships from a series of models or from a base model class.
- Parameters:
base (Model) – A base model class. If passed, all the model classes that were created by subclassing from the base model will be used.
schema (str) – A schema name. If passed, will be used to limit the list of models or
base
subclasses to only the models that match the schema name.show_columns (bool) – Whether to show the column names.
show_pks (bool) – Whether to show the primary key. Supersedes
show_columns
.show_indices (bool) – Whether to show the indices from the table as separate rows.
show_datatypes (bool) – Whether to show the data type of each column.
skip_tables (list) – List of table names to skip.
font (str) – The name of the font to use.
graph_options (dict) – Options for creating the graph. Any valid Graphviz option.
relation_options (dict) – Additional parameters to be passed to
pydot.Edge
when creating the relationships.
- Returns:
graph (
pydot.Dot
) – Apydot.Dot
object with the graph representation of the schema.
Example
>>> graph = create_schema_graph([User, Tweet]) >>> graph.write_pdf('tweetdb.pdf')