API¶
Database connections¶
- class sdssdb.connection.DatabaseConnection(dbname=None, profile=None, autoconnect=None, dbversion=None, use_psycopg3=None)[source]¶
Bases:
objectA PostgreSQL database connection with profile and autoconnect features.
Provides a base class for PostgreSQL connections for either peewee or SQLAlchemy. The parameters for the connection can be passed directly (see
connect_from_parameters) or, more conveniently, a profile can be used. By defaultdbnameis left undefined and needs to be passed when initiating the connection. This is useful for databases such asapodb/lcodbfor which the model classes are identical but the database name is not. For databases for which the database name is fixed (e.g.,sdss5db), this class can be subclassed anddbnameoverridden.- Parameters:
dbname (str) – The database name.
profile (str) – The configuration profile to use. The profile defines the default user, database server hostname, and port for a given location. If not provided, the profile is automatically determined based on the current domain, or defaults to
local.autoconnect (bool or None) – Whether to autoconnect to the database using the profile parameters. Requites
dbnameto be set. IfNone, whether to autoconnect is defined, in order, by the existence of an environment variable$SDSSDB_AUTOCONNECTor bysdssdb.autoconnect. If they are set to0orfalsethe database won’t autoconnect. Note that this must be set before importing any model classes.dbversion (str) – A database version. If specified, appends to dbname as “dbname_dbversion” and becomes the dbname used for connection strings.
use_psycopg3 (bool) – Whether to use psycopg3 instead of psycopg2. If
None, defaults to the value of the environment variable$SDSSDB_PSYCOPG3(which defaults toTrueif not set).
- become_admin(admin=None)[source]¶
Becomes the admin user.
If
admin=Nonedefaults to theadminvalue in the current profile.
- become_user(user=None)[source]¶
Becomes the read-only user.
If
user=Nonedefaults to theuservalue in the current profile.
- change_version(dbversion=None)[source]¶
Change database version and attempt to reconnect
- Parameters:
dbversion (str) – A database version
- connect(dbname=None, silent_on_fail=False, **connection_params)[source]¶
Initialises the database using the profile information.
- Parameters:
dbname (
strorNone) – The database name. IfNone, defaults todbname.dbnamecan also be a full database URI, in which case the other connection parameters are ignored.user (str) – Overrides the profile database user.
host (str) – Overrides the profile database host.
port (str) – Overrides the profile database port.
silent_on_fail (
bool) – IfTrue, does not show a warning if the connection fails.
- Returns:
connected – Returns True if the database is connected.
- Return type:
- connect_from_parameters(dbname=None, **params)[source]¶
Initialises the database from a dictionary of parameters.
- set_profile(profile=None, connect=True, **params)[source]¶
Sets the profile from the configuration file.
- Parameters:
- Returns:
connected – Returns True if the database is connected.
- Return type:
- auto_reflect = True¶
# Whether to call Model.reflect() in Peewee after a connection.
- abstract property connection_params: dict | None¶
Returns a dictionary with the connection parameters.
- Returns:
connection_params – A dictionary with the
user,host, andpartof the current connection. E.g.,{'user': 'sdssdb', 'host': 'sdss4-db', 'port': 5432}- Return type:
- dbname = None¶
The database name.
- dbversion = None¶
Database version
- class sdssdb.connection.PeeweeDatabaseConnection(*args, **kwargs)[source]¶
Bases:
DatabaseConnection,PostgresqlDatabasePeewee database connection implementation.
- models¶
Models bound to this database. Only models that are bound using
BaseModelare handled.- Type:
- get_fields(table_name, schema=None, cache=True)[source]¶
Returns a list of Peewee fields for a table.
- get_primary_keys(table_name, schema=None, cache=True)[source]¶
Returns the primary keys for a table.
- property connected¶
Reports whether the connection is active.
- property connection_params¶
Returns a dictionary with the connection parameters.
- property psycopg_version¶
Returns the version of psycopg in use.
- class sdssdb.connection.SQLADatabaseConnection(*args, **kwargs)[source]¶
Bases:
DatabaseConnectionSQLAlchemy database connection implementation
- create_engine(db_connection_string=None, echo=False, pool_size=10, pool_recycle=1800, expire_on_commit=True)[source]¶
Create a new database engine
Resets and creates a new sqlalchemy database engine. Also creates and binds engine metadata and a new scoped session.
- prepare_bases(base=None)[source]¶
Prepare a Model Base
Prepares a SQLalchemy Base for reflection. This binds a database engine to a specific Base which maps to a set of ModelClasses. If
baseis passed only that base will be prepared. Otherwise, all the bases bound to this database connection will be prepared.
- connected¶
Reports whether the connection is active.
- property connection_params¶
Returns a dictionary with the connection parameters.
Peewee¶
- class sdssdb.peewee.ReflectMeta(name, bases, attrs)[source]¶
Bases:
ModelBaseA metaclass that supports model reflection on demand.
This metaclass expands PeeWee’s
ModelBaseto provide a hook for declaring/expanding fields and indexes using the introspection system. The feature is enabled by a new attribute inMetadatacalleduse_reflection(which is set toFalseby default). When set toTruethe metaclass extends the model using the fields and indexes discovered using reflection. It is possible to mix explicitely defined fields with discovered ones; the latter never override the former.Normally
ReflectMetais implemented by creating a base model that is then used to defined the table modelsclass ReflectBaseModel(peewee.Model, metaclass=ReflectMeta): class Meta: primary_key = False use_reflection = False database = database class Tycho2(ReflectBaseModel): class Meta: use_reflection = True schema = 'catalogdb' table_name = 'tycho2'
Note that
use_reflectionis inheritable so if set toTruein the base class that will affect to all subclasses, except if it’s overridden there. It’s also a good idea to setprimary_key=Falseto prevent Peewee from creating anidcolumn automatically.If the database connection changes it’s possible to call
reflectto rediscover the reflected fields for the new connection. This will remove all reflected fields (but not those explicitely added) and add the newly discovered ones.If the database class is
PeeweeDatabaseConnection, the database will callreflectfor each model bound to the database each time it connects. This ensures that if the connection changes the reflected fields are updated. Note that this will not work if using Peewee’sPostgresqlDatabase.By default,
ReflectMetawill add all the fields from the reflected models, including foreign keys. Sometimes that is not desirable and it’s preferable to define the foreign keys explicitely. In that case it’s possible to disable the reflection of foreign keys by doingclass ReflectBaseModel(peewee.Model, metaclass=ReflectMeta): class Meta: primary_key = False use_reflection = False reflection_options = {'skip_foreign_keys': True} database = database
Foreign keys explicitely defined need to reference existing fields, so the referenced columns need to be added manually. In practice, this means that if you add a
ForeignKeyField, the referenced field (usually the primary key) needs to be defined explicitely.The default Peewee reflection process requires multiple queries against the database for each table. This can become quite slow if the schema contains many tables or if the connection has significant overhead, for example when connected over an SSH tunnel. In these cases one can set
Metawithreflection_options = {'use_peewee_reflection': False}. This will use a reflection system that is designed to minimise the number of queries needed for schema introspection. Note that this system is experimental and doesn’t reflect foreign keys or constraints.Caveats:
Due to a bug in PeeWee primary keys are not discovered correctly if the user connected to the database does not have write access. In that case a composite key encompassing all the fields in the model is created. To avoid this, explicitely define the primary key in the model.
Many-to-many relationships need to be defined explicitely since it’s not possible to set the through model based on the reflected information.
When the primary key of a model is also a foreign key and
reflection_options = {'skip_foreign_keys': True}, both the primary key and the foreign key need to be defined explicitely. Otherwise neither will be added.Reflection will fail if a table is locked with
AccessExclusiveLock. In that case reflection will be skipped and a warning issued. Note that if the table is locked with an exclusive lock you won’t be able to access the data in any case.In this version, indexes discovered by reflection are not propagated to the model class. This should not have any impact in performance.
- class sdssdb.peewee.BaseModel(*args, **kwargs)[source]¶
Bases:
ModelA custom peewee
Modelwith enhanced representation and methods.By default it always prints
pk,name, andlabel, if found. Models can define they ownprint_fieldsinMetaas a list of field names to be output in the representation.- DoesNotExist¶
alias of
BaseModelDoesNotExist
- cone_search(ra, dec, a, b=None, pa=None, ra_col='ra', dec_col='dec')[source]¶
Returns a query with the rows inside a region on the sky.
Defines a sky ellipse and returns the targets within. By default it assumes that the table contains two columns
raanddec. All units are expected to be in degrees.- Parameters:
ra (float) – The R.A. of the centre of the ellipse.
dec (float) – The declination of the centre of the ellipse.
a (float) – Defines the semi-major axis of the ellipse for the cone search. If
b=None, a circular search will be done withaas the radius.b (
floatorNone) – The semi-minor axis of the ellipse. IfNone, a circular cone search will be run. In that case,pais ignored.ra_col (str) – The name of the column with the RA value.
dec_col (str) – The name of the column with the Dec value.
SQLAlchemy¶
- class sdssdb.sqlalchemy.BaseModel[source]¶
Bases:
objectA custom sqlalchemy declarative Base
By default it always prints
pk,name, andlabel, if found. Models can define they ownprint_fieldsas a list of field to be output in the representation. Works with field names nested inside other models as well.- cone_search(ra, dec, a, b=None, pa=None, ra_col='ra', dec_col='dec')[source]¶
Returns a query with the rows inside a region on the sky.
Defines a sky ellipse and returns the targets within. By default it assumes that the table contains two columns
raanddec. All units are expected to be in degrees.- Parameters:
ra (float) – The R.A. of the centre of the ellipse.
dec (float) – The declination of the centre of the ellipse.
a (float) – Defines the semi-major axis of the ellipse for the cone search. If
b=None, a circular search will be done withaas the radius.b (
floatorNone) – The semi-minor axis of the ellipse. IfNone, a circular cone search will be run. In that case,pais ignored.ra_col (str) – The name of the column with the RA value.
dec_col (str) – The name of the column with the Dec value.
- print_fields = []¶
A list of fields (as strings) to be included in the
__repr__
Utils¶
- sdssdb.utils.ingest.bulk_insert(data, connection, model, chunk_size=100000, show_progress=False)[source]¶
Loads data into a DB table using bulk insert.
- Parameters:
data (Table) – An astropy table with the data to insert.
connection (.PeeweeDatabaseConnection) – The Peewee database connection to use.
model (~peewee:Model) – The model representing the database table into which to insert the data.
chunk_size (int) – How many rows to load at once.
show_progress (bool) – If
True, shows a progress bar. Requires the progressbar2 module to be installed.
- sdssdb.utils.ingest.copy_data(data, connection, table_name, schema=None, chunk_size=10000, show_progress=False)[source]¶
Loads data into a DB table using
COPY.- Parameters:
data (Table) – An astropy table whose column names and types will be used to create the model.
connection (.PeeweeDatabaseConnection) – The Peewee database connection to use.
table_name (str) – The name of the table.
schema (str) – The schema in which the table lives.
chunk_size (int) – How many rows to load at once.
show_progress (bool) – If
True, shows a progress bar. Requires the progressbar2 module to be installed.
- sdssdb.utils.ingest.create_adhoc_database(dbname, schema=None, profile='local')[source]¶
Creates an adhoc SQLA database and models, given an existing db
Creates an in-memory SQLA database connection given a database name to connect to, along with auto-generated models for the a given schema name. Currently limited to building models for one schema at a time. Useful for temporarily creating and trying a database connection, and simple models, without building and committing a full fledged new database connection.
- Parameters:
- Returns:
A temporary database connection and module of model classes
- Return type:
Example
>>> from sdssdb.utils.ingest import create_adhoc_database >>> tempdb, models = create_adhoc_database('datamodel', schema='filespec') >>> tempdb >>> <DatamodelDatabaseConnection (dbname='datamodel', profile='local', connected=True)> >>> models.File >>> sqlalchemy.ext.automap.File
- sdssdb.utils.ingest.create_model_from_table(table_name, table, schema=None, lowercase=False, primary_key=None)[source]¶
Returns a
Modelfrom the columns in a table.- Parameters:
table_name (str) – The name of the table.
table (Table) – An astropy table whose column names and types will be used to create the model.
schema (str) – The schema in which the table lives.
lowercase (bool) – If
True, all column names will be converted to lower case.primary_key (str) – The name of the column to mark as primary key.
- sdssdb.utils.ingest.drop_table(table_name, connection, cascade=False, schema=None)[source]¶
Drops a table. Does nothing if the table does not exist.
- sdssdb.utils.ingest.file_to_db(input_, connection, table_name, schema=None, lowercase=False, create=False, drop=False, truncate=False, primary_key=None, load_data=True, use_copy=True, chunk_size=100000, show_progress=False)[source]¶
Loads a table from a file to a database.
Loads a file or a
Tableobject into a database. Ifcreate=Truea new table will be created, with column types matching the table ones. All columns are initially defined asNULL.By default, the data are loaded using the
COPYmethod to optimise performance. This can be disabled if needed.- Parameters:
input (str or Table) – The path to a file that will be opened using
Table.reador an astropyTable.connection (.PeeweeDatabaseConnection) – The Peewee database connection to use (SQLAlchemy connections are not supported).
table_name (str) – The name of the table where to load the data, or to be created.
schema (str) – The schema in which the table lives.
lowercase (bool) – If
True, all column names will be converted to lower case.create (bool) – Creates the table if it does not exist.
drop (bool) – Drops the table before recreating it. Implies
create=True. Note that aCASCADEdrop will be executed. Use with caution.truncate (bool) – Truncates the table before loading the data but maintains the existing columns.
primary_key (str) – The name of the column to mark as primary key (ignored if the table is not being created).
load_data (bool) – If
True, loads the data from the table; otherwise just creates the table in the database.use_copy (bool) – When
True(recommended) uses the SQLCOPYcommand to load the data from a CSV stream.chunk_size (int) – How many rows to load at once.
show_progress (bool) – If
True, shows a progress bar. Requires theprogressbar2module to be installed.
- Returns:
model – The model for the table created.
- Return type:
~peewee:Model
- sdssdb.utils.ingest.to_csv(table, path, header=True, delimiter='\t', use_multiprocessing=False, workers=4)[source]¶
Creates a PostgreSQL-valid CSV file from a table, handling arrays.
- Parameters:
table (astropy.table.Table) – The table to convert.
path (str) – The path to which to write the CSV file.
header (bool) – Whether to add a header with the column names.
delimiter (str) – The delimiter between columns in the CSV files.
use_multiprocessing (bool) – Whether to use multiple cores. The rows of the resulting file will not have the same ordering as the original table.
workers (int) – How many workers to use with multiprocessing.
- sdssdb.utils.internals.get_cluster_index(connection, table_name=None, schema=None)[source]¶
Returns a tuple with the index on which a table has been clustered.
- sdssdb.utils.internals.get_database_columns(database, schema=None)[source]¶
Returns database column metadata.
Queries the
pg_catalogschema to retrieve column information for all the tables in a schema.- Parameters:
database (.PeeweeDatabaseConnection) – The database connection.
schema (str) – The schema to query. Defaults to the public schema.
- Returns:
metadata – A mapping keyed by the table name. For each table the list of primary keys is accessible via the key
pk, as well as the column metadata ascolumns. Each column metadata is a named tuple with attributesname,field_type(the Peewee column class),array_type, andnullable.- Return type:
- sdssdb.utils.internals.get_row_count(connection, table_name, schema=None, approximate=True)[source]¶
Returns the model row count.
- Parameters:
- sdssdb.utils.internals.get_unclustered_tables(connection, schema=None)[source]¶
Lists tables not clustered.
- sdssdb.utils.internals.is_table_locked(connection, table_name, schema=None)[source]¶
Returns the locks for a table or
Noneif no lock is present.
- sdssdb.utils.internals.vacuum(database, table_name, analyze=True, verbose=False, schema=None)[source]¶
Vacuums (and optionally analyses) a table.
- Parameters:
database (.PeeweeDatabaseConnection) – A PeeWee connection to the database to vacuum.
table_name (str) – The table name.
analyze (bool) – Whether to run
ANALYZEwhen vacuumming.verbose (bool) – Whether to run in verbose mode.
schema (str) – The schema to vacuum. If
None, vacuums the entire database.
- sdssdb.utils.internals.vacuum_all(database, analyze=True, verbose=False, schema=None)[source]¶
Vacuums all the tables in a database or schema.
- sdssdb.utils.registry.display_table(pprint: bool = None, mask_dups: bool = False, fill: str = '', **kwargs) Type[Table][source]¶
Display sdssdb databases and schema as an Astropy Table
Displays the list of available sdssdb databases organized by ORM and includes the schema for each database. Produces a table with columns “orm”, “db”, and “schema”.
- Parameters:
- Returns:
an Astropy Table of sdssdb databases
- Return type:
- Raises:
ImportError – when astropy is not installed
- sdssdb.utils.registry.list_databases(orm: str = None, with_schema: bool = False) dict | list[source]¶
Return a list of sdssdb databases
Returns a list of available databases in sdssdb. When no orm is specified, returns a dict of orm:database key:values. If with_schema is specified, also returns a list of schema for each database.
- Parameters:
- Returns:
A list of databases for a given ORM or a dict of database:schema values or a dict of orm:database values
- Return type:
- Raises:
TypeError – when input orm is not a string
ValueError – when input orm is not either peewee or sqlalchemy
- sdssdb.utils.schemadisplay.create_schema_graph(models=None, base=None, schema=None, show_columns=True, show_pks=True, show_indices=True, show_datatypes=True, skip_tables=[], font='Bitstream-Vera Sans', graph_options={}, relation_options={})[source]¶
Creates a graph visualisation from a series of Peewee models.
Produces a pydot graph including the tables and relationships from a series of models or from a base model class.
- Parameters:
base (Model) – A base model class. If passed, all the model classes that were created by subclassing from the base model will be used.
schema (str) – A schema name. If passed, will be used to limit the list of models or
basesubclasses to only the models that match the schema name.show_columns (bool) – Whether to show the column names.
show_pks (bool) – Whether to show the primary key. Supersedes
show_columns.show_indices (bool) – Whether to show the indices from the table as separate rows.
show_datatypes (bool) – Whether to show the data type of each column.
skip_tables (list) – List of table names to skip.
font (str) – The name of the font to use.
graph_options (dict) – Options for creating the graph. Any valid Graphviz option.
relation_options (dict) – Additional parameters to be passed to
pydot.Edgewhen creating the relationships.
- Returns:
graph – A
pydot.Dotobject with the graph representation of the schema.- Return type:
pydot.Dot
Example
>>> graph = create_schema_graph([User, Tweet]) >>> graph.write_pdf('tweetdb.pdf')