Getting started with sdssdb¶
sdssdb
provides general utilities and functionality for connecting to SDSS databases using Object-relational Mapping (ORM). sdssdb
supports two Python ORM libraries for mapping between database tables and Python classes: Peewee and SQLAlchemy.
Making a simple query with sdssdb
¶
Imagine that you want to query catalogdb
(the schema containing all the catalogues used for target selection in SDSS-V) and get all the Gaia targets within a range of magnitudes. In most cases this only requires a couple lines of code. This example assumes that you’re running the code from a machine at Utah which has direct access to the operations.sdss.org
machine.
>>> from sdssdb.peewee.sdss5db.catalogdb import database
>>> database.set_profile('operations')
True
>>> from sdssdb.peewee.sdss5db.catalogdb import Gaia_DR2
>>> targets = Gaia_DR2.select().where((Gaia_DR2.phot_g_mean_mag > 15) & (Gaia_DR2.phot_g_mean_mag < 16)).limit(10)
This will returns the first 10 results from Gaia DR2 with g magnitude in the range \((15, 16)\). Simple.
A subtlety is that the order of imports is important. Since the Gaia_DR2
model is populated dynamically, it must be imported once a connection to the database has been accomplished. Alternatively, one can do
>>> from sdssdb.peewee.sdss5db import catalogdb
>>> catalogdb.database.set_profile('operations')
True
>>> targets = catalogdb.Gaia_DR2.select().where((catalogdb.Gaia_DR2.phot_g_mean_mag > 15) & (catalogdb.Gaia_DR2.phot_g_mean_mag < 16)).limit(10)
Available databases¶
Currently, we support the following databases and schemas:
- operationsdb: a global name for the APO (
apodb
) and LCO (lcodb
) operations database. platedb: schema for plate observations and metadata.
mangadb: schema for MaNGA observations and metadata.
apogeeqldb: schema for APOGEE observations and quick reductions.
- operationsdb: a global name for the APO (
- mangadb: the SDSS-IV MaNGA database.
datadb: schema for MaNGA DRP data products.
dapdb: schema for MaNGA DAP data products.
sampledb: schema for MaNGA target sample.
auxdb: schema for auxiliary information for MaNGA.
- sdss5db: the SDSS-V development database.
catalogdb: schema for source catalogues used for target selection.
targetdb: schema with the results of the target selection and positioner information.
opsdb: schema for the operations database
apogee_drpdb: schema with the results of the MWM DRP
boss_drp: schema with the results of the BHM DRP
vizdb: primary schema for the data visualization webapp
astradb: primary schema for the Astra pipeline
- archive: the SDSS science archive database.
sas: schema for SAS.
- lvmdb: the LVM database
lvmopsdb: schema for the LVM operations database at LCO
Note that the level of readiness is not necessarily identical in both Peewee and SQLAlchemy. This table summarises what schemas are available for each library. Green indicates fully supported, yellow partial support, and red means that there are currently not model classes available for that schema. You can download the graph visualisation of the schema, showing the tables, columns, and relations between tables.
Database | Schema | Peewee | SQLAlchemy | Graph |
---|---|---|---|---|
operationsdb | platedb | |||
mangadb | ||||
apogeeqldb | ||||
manga | auxdb | |||
dapdb | ||||
datadb | ||||
sampledb | ||||
sdss5db | catalogdb | |||
targetdb | ||||
opsdb | ||||
apogee_drpdb | ||||
boss_drp | ||||
vizdb | ||||
astradb | ||||
archive | sas |
Connecting to a Database¶
The DatabaseConnection
abstract class allows to connect to a PostgreSQL database using a profile (see the Supported Profiles) or a custom set of connection parameters. In most cases, the user will need to use either PeeweeDatabaseConnection
or SQLADatabaseConnection
depending on the backend library used. Regarding the implementation details, their behaviour is identical. To open a connection to the database manga
we can do
>>> from sdssdb.connection import SQLADatabaseConnection
>>> db = SQLADatabaseConnection('manga')
>>> db
<SQLADatabaseConnection (dbname='manga', profile='local', connected=True)>
(note that this example will only work if you have a local database called manga
)
What happened here? SQLADatabaseConnection
connected to the manga
database using the local
profile. A profile is simply a set of username, hostname, and port on which to look for a PostgreSQL server. sdssdb
tries to be smart and select a profile that matches the machine on which you are working. That may not always work. For example, imagine that you are working on manga.wasatch.peaks
but trying to connect to sdss5db
which is running on operations-test.sdss.utah.edu
>>> from sdssdb.connection import PeeweeDatabaseConnection
>>> db = PeeweeDatabaseConnection('sdss5db')
<PeeweeDatabaseConnection (dbname='sdss5db', profile='manga', connected=False)>
In this case the profile is not the appropriate for connecting to sdss5db
and the connection fails. We can fix that by connecting with the correct profile
>>> db.set_profile('operations')
True
>>> db
<PeeweeDatabaseConnection (dbname='sdss5db', profile='operations', connected=True)>
Or we could have connected to the database passing it a full set of parameters
>>> db.connect_from_parameters(user='sdss', host='operations.sdss.org', port=5432)
True
In other cases you may have several databases running on the same server. You can prepare a connection using the appropriate profile and then connect to a specific database
>>> local_db = PeeWeeDatabaseConnection(profile='local')
>>> local_db.connect('apodb')
connect
will try to use the current profile to connect to the given database.
In general you will not usually create database connections directly. Each database schema is bound to a database connection which will try to connect to the correct database. For example
>>> from sdssdb.peewee.operationsdb import database
>>> database
<PeeweeDatabaseConnection (dbname='apodb', profile='apo', connected=True)>
Now imagine the case in which you are running sdssdb
from your local computer and are trying to connect to apodb
at APO. You do not have the database locally but have created a tunnel connection to sdss4-db.apo.nmsu.edu
and redirected it to your localhost port 6666. To connect to that tunnel you do
>>> from sdssdb.peewee.operationsdb import database
>>> database
<PeeweeDatabaseConnection (dbname=None, profile='local', connected=False)>
>>> database.connect_from_parameters(dbname='apodb', host='localhost', port=6666, user='sdssdb')
True
>>> database
<PeeweeDatabaseConnection (dbname='apodb', profile='local', connected=True)>
There are two database connections, SQLADatabaseConnection
and PeeWeeDatabaseConnection
, one for each mapping library. Each database connection has two keyword arguments: a user/machine profile, a database name. The connection will automatically attempt to connect to the specified database with the profile unless the autoconnect
keyword is set to False
.
# load a database connection with the Utah manga machine profile and connect to the manga database. To create a Peewee conenction replace with PeeweeDatabaseConnection.
from sdssdb.connection import SQLADatabaseConnection
db = SQLADatabaseConnection(profile='manga', dbname='manga')
A note about passwords¶
sdssdb
does not allow you to pass plaintext passwords when creating a connection, or to store them in the profiles. Instead, you should use pgpass to set your passwords. A typical ~/.pgpass
file looks something like
*:*:apodb:sdssdb:XXXX
localhost:5432:sdss5db:sdss:YYYY
operations-test.sdss.utah.edu:5432:sdss5db:sdss:ZZZZ
where XXXX
, YYYY
, etc are the associated passwords for each set of parameters.
Supported Profiles¶
The following profiles are included with sdssdb. When a database connection is created without an explicit profile, the hostname of the current machine is used to find the best possible profile. Profiles can be added or modified by creating a YAML file in ~/.config/sdss/sdssdb.yaml
with the same structure.
local: a generic localhost profile. Used if the hostname does not match any other profile.
apo: a user on the APO machines.
lco: a user on the LCO machines.
manga: a user on the Utah manga machine.
operations: a user on the Utah operations machine.
sdssadmin: a user on the Utah sdssadmin machine.
lore: a user on the Utah lore machine.
A list of available profiles (including custom ones) can also be accessed via de list_profiles
classmethod
>>> import sdssdb
>>> profiles = sdssdb.DatabaseConnection.list_profiles()
>>> profiles
dict_keys(['apo', 'lco', 'operations-test', 'local', 'lore', 'jhu', 'sdssadmin', 'manga'])
>>> sdssdb.DatabaseConnection.list_profiles('apo')
{'user': 'sdssdb',
'admin': 'sdssdb_admin',
'host': 'sdss4-db',
'port': 5432,
'domain': 'apo.nmsu.edu'}
Accessing the model classes¶
A model class is a Python class that abstracts a database table so that it can be accessed by the ORM libraries. In sdssdb
the model class for a given table can always be found under sdssdb.XXX.YYY.ZZZ
where XXX
is either peewee
or sqlalchemy
depending on the library you want to use, YYY
is the database name, and ZZZ
is the schema name. For instance, if you want to use peewee to query the target
table in the targetdb
schema in sdss5db
, you need to import
from sdssdb.peewee.sdss5db.targetdb import Target
Note that we use the standard of capitalising class names. Frequently, you’ll want to import the whole schema as
from sdssdb.peewee.sdss5db import targetdb
which gives you access to all the model classes for that schema. The database bound to those model classes can be accessed from the submodule containing the database or from the schema
>>> from sdssdb.peewee.sdss5db import database
>>> from sdssdb.peewee.sdss5db import targetdb
>>> database
<SDSS5dbDatabaseConnection (dbname='sdss5db', profile='local', connected=True)>
>>> targetdb.database
<SDSS5dbDatabaseConnection (dbname='sdss5db', profile='local', connected=True)>
>>> targetdb.database == database
True
SQLAlchemy specifics¶
The database handling with SQLAlchemy is mostly the same as with Peewee. The main difference is the need to create a database session before connecting and querying
# connecting to the manga database
from sdssdb.sqlalchemy.mangadb import database, datadb
# start a session
session = database.Session()
# write a query
cube = session.query(datadb.Cube).first()
If you connect to a different database, you must recreate the database session
# connect to a separate database
database.connect('other-mangadb')
session = database.Session()
The case of operationsdb
¶
If you are familiar with the SDSS databases you will know that there is no operationsdb
. Instead, there is apodb
and lcodb
, two databases that share the same schemas but are located on computers are APO and LCO respectively. Instead of creating different sets of identical model classes for both databases, the models and database connections can be found under the operationsdb
submodule (sdssdb.peewee.operationsdb
or sdssdb.sqlalchemy.operationsdb
).
When you import the database connection sdssdb
will try use the profile name to decide to which database to connect. For example, if you are at APO the apo
profile will be used by default and the database connection will try to connect to apodb
>>> from sdssdb.peewee.operationsdb import database
>>> database
<PeeweeDatabaseConnection (dbname='apodb', profile='apo', connected=True)>
If that fails, you will need to define the database name and profile. In the following example the user has apodb
available locally
>>> from sdssdb.peewee.operationsdb import database
>>> database
<PeeweeDatabaseConnection (dbname=None, profile='local', connected=False)>
>>> database.connect('apodb')
True
>>> database
<PeeweeDatabaseConnection (dbname='apodb', profile='local', connected=True)>
If both apodb
and lcodb
are available the local
profile will not connect to either of them automatically
>>> from sdssdb.peewee.operationsdb import database
>>> database
<OperationsDBConnection (dbname=None, profile='local', connected=False)>
>>> database.connect('lcodb')
True
>>> database
<OperationsDBConnection (dbname='lcodb', profile='local', connected=True)>
We can switch from one to the other in runtime
>>> database
<PeeweeDatabaseConnection (dbname='lcodb', profile='local', connected=True)>
>>> from sdssdb.peewee.operationsdb import platedb
>>> plate_9781 = platedb.Plate.get(plate_id=9781)
>>> plate_9781.plate_run.label
'2017.03.b.apogee2s.south'
>>> database.connect('apodb')
True
>>> database
<PeeweeDatabaseConnection (dbname='apodb', profile='local', connected=True)>
>>> plate_10k = platedb.Plate.get(plate_id=10000)
>>> plate_10k.plate_run.label
'2015.08.z.eboss'
Where to go from here?¶
Once the connection has been created and the model classes imported you can use them as you would with any Peewee or SQLALchemy model. It is beyond the purpose of this documentation to explain how to use those libraries. Instead, refer to the Peewee or SQLAlchemy documentation.
The How to use sdssdb for target selection section provides a detailed example of how to use sdssdb
that highlights the advantages of the ORM approach.