ORM Configuration

    How do I map a table that has no primary key?

    Most ORMs require that objects have some kind of primary key defined because the object in memory must correspond to a uniquely identifiable row in the database table; at the very least, this allows the object to be targeted for UPDATE and DELETE statements which will affect only that object's row and no other. However, the importance of the primary key goes far beyond that. In SQLAlchemy, all ORM-mapped objects are at all times linked uniquely within a Session to their specific database row using a pattern called the identity map, a pattern that's central to the unit of work system employed by SQLAlchemy, and is also key to the most common (and not-so-common) patterns of ORM usage.

    Note

    It's important to note that we're only talking about the SQLAlchemy ORM; an application which builds on Core and deals only with Table objects, select() constructs and the like, does not need any primary key to be present on or associated with a table in any way (though again, in SQL, all tables should really have some kind of primary key, lest you need to actually update or delete specific rows).

    In almost all cases, a table does have a so-called candidate key, which is a column or series of columns that uniquely identify a row. If a table truly doesn't have this, and has actual fully duplicate rows, the table does not correspond to first normal form and cannot be mapped. Otherwise, whatever columns comprise the best candidate key can be applied directly to the mapper:
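
    For example, a minimal sketch of applying a candidate key directly to the mapper, assuming some_table_with_no_pk is a Table object defined or reflected elsewhere:

    class SomeClass(Base):
        __table__ = some_table_with_no_pk
        __mapper_args__ = {
            "primary_key": [some_table_with_no_pk.c.uid, some_table_with_no_pk.c.bar]
        }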

    Better yet, when using fully declared table metadata, use the primary_key=True flag on those columns:

    class SomeClass(Base):
        __tablename__ = "some_table_with_no_pk"

        uid = Column(Integer, primary_key=True)
        bar = Column(String, primary_key=True)

    All tables in a relational database should have primary keys. Even a many-to-many association table - the primary key would be the composite of the two association columns:

    CREATE TABLE my_association (
        user_id INTEGER REFERENCES user(id),
        account_id INTEGER REFERENCES account(id),
        PRIMARY KEY (user_id, account_id)
    )

    How do I configure a Column that is a Python reserved word or similar?

    Column-based attributes can be given any name desired in the mapping. See Naming Declarative Mapped Columns Explicitly.
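
    As a brief sketch (the table and column names here are illustrative, and Base/Column are assumed from the earlier examples), a column whose name is a Python reserved word such as "from" can be mapped under any legal attribute name:

    class MyClass(Base):
        __tablename__ = "my_stuff"

        id = Column(Integer, primary_key=True)

        # the attribute name "from_" differs from the database column name "from"
        from_ = Column("from", String)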

    How do I get a list of all columns, relationships, mapped attributes, etc. given a mapped class?

    This information is all available from the Mapper object.

    To get at the Mapper for a particular mapped class, call the inspect() function on it:

    from sqlalchemy import inspect

    mapper = inspect(MyClass)

    From there, all information about the class can be accessed through properties such as:

    • Mapper.attrs - a namespace of all mapped attributes. The attributes themselves are instances of MapperProperty, which contain additional attributes that can lead to the mapped SQL expression or column, if applicable.

    • Mapper.relationships - namespace of all RelationshipProperty attributes.

    • Mapper.all_orm_descriptors - namespace of all mapped attributes, plus user-defined attributes defined using systems such as hybrid_property, and others.

    • Mapper.columns - A namespace of Column objects and other named SQL expressions associated with the mapping.

    • Mapper.mapped_table - The Table or other selectable to which this mapper is mapped.

    • Mapper.local_table - The Table that is “local” to this mapper; this differs from Mapper.mapped_table in the case of a mapper mapped using inheritance to a composed selectable.
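
    For example, a small sketch of reading a few of these namespaces (MyClass stands in for any mapped class):

    from sqlalchemy import inspect

    mapper = inspect(MyClass)

    # names of all mapped attributes
    print(list(mapper.attrs.keys()))

    # names of relationship attributes only
    print(list(mapper.relationships.keys()))

    # Column objects and other SQL expressions in the mapping
    print(list(mapper.columns))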

    I’m getting a warning or error about “Implicitly combining column X under attribute Y”

    This condition refers to when a mapping contains two columns that are being mapped under the same attribute name due to their name, but there’s no indication that this is intentional. A mapped class needs to have explicit names for every attribute that is to store an independent value; when two columns have the same name and aren’t disambiguated, they fall under the same attribute and the effect is that the value from one column is copied into the other, based on which column was assigned to the attribute first.

    This behavior is often desirable and is allowed without warning in the case where the two columns are linked together via a foreign key relationship within an inheritance mapping. When the warning or exception occurs, the issue can be resolved by either assigning the columns to differently-named attributes, or if combining them together is desired, by using column_property() to make this explicit.

    Given the example as follows:

    from sqlalchemy import Integer, Column, ForeignKey
    from sqlalchemy.ext.declarative import declarative_base

    Base = declarative_base()

    class A(Base):
        __tablename__ = "a"
        id = Column(Integer, primary_key=True)

    class B(A):
        __tablename__ = "b"
        id = Column(Integer, primary_key=True)
        a_id = Column(Integer, ForeignKey("a.id"))

    As of SQLAlchemy version 0.9.5, the above condition is detected, and will warn that the id column of A and B is being combined under the same-named attribute id, which above is a serious issue since it means that a B object’s primary key will always mirror that of its A.

    A mapping which resolves this is as follows:
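
    One such mapping, sketched here, gives B's id column a distinctly named attribute (the name b_id is arbitrary) so that it no longer collides with A.id:

    class A(Base):
        __tablename__ = "a"
        id = Column(Integer, primary_key=True)

    class B(A):
        __tablename__ = "b"
        b_id = Column("id", Integer, primary_key=True)
        a_id = Column(Integer, ForeignKey("a.id"))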

    Suppose we did want A.id and B.id to be mirrors of each other, despite the fact that B.a_id is where the A row is related. We could combine them together using column_property():

    from sqlalchemy.orm import column_property

    class A(Base):
        __tablename__ = "a"
        id = Column(Integer, primary_key=True)

    class B(A):
        __tablename__ = "b"
        # probably not what you want, but this is a demonstration
        id = column_property(Column(Integer, primary_key=True), A.id)
        a_id = Column(Integer, ForeignKey("a.id"))

    I’m using Declarative and setting primaryjoin/secondaryjoin using an and_() or or_(), and I am getting an error message about foreign keys.

    Are you doing this?:

    class MyClass(Base):
        # ....

        foo = relationship(
            "Dest", primaryjoin=and_("MyClass.id==Dest.foo_id", "MyClass.foo==Dest.bar")
        )

    That’s an and_() of two string expressions, which SQLAlchemy cannot apply any mapping towards. Declarative allows arguments to be specified as strings, which are converted into expression objects using eval(). But this doesn’t occur inside of an and_() expression - it’s a special operation declarative applies only to the entirety of what’s passed to primaryjoin or other arguments as a string:

    class MyClass(Base):
        # ....

        foo = relationship(
            "Dest", primaryjoin="and_(MyClass.id==Dest.foo_id, MyClass.foo==Dest.bar)"
        )

    Or if the objects you need are already available, skip the strings:

    class MyClass(Base):
        # ....

        foo = relationship(
            Dest, primaryjoin=and_(MyClass.id == Dest.foo_id, MyClass.foo == Dest.bar)
        )

    The same idea applies to all the other arguments, such as foreign_keys:
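
    A brief sketch (foo_id and bar_id are illustrative column names); a string is evaluated only when the entire argument is a single string:

    # wrong - strings inside a Python list are not evaluated
    foo = relationship(Dest, foreign_keys=["Dest.foo_id", "Dest.bar_id"])

    # correct - pass the actual column objects
    foo = relationship(Dest, foreign_keys=[Dest.foo_id, Dest.bar_id])

    # also correct - pass the entire expression as one string
    foo = relationship(Dest, foreign_keys="[Dest.foo_id, Dest.bar_id]")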

    Why is ORDER BY recommended with LIMIT (especially with subqueryload())?

    When ORDER BY is not used for a SELECT statement that returns rows, the relational database is free to return matched rows in any arbitrary order. While this ordering very often corresponds to the natural order of rows within a table, this is not the case for all databases and all queries. The consequence of this is that any query that limits rows using LIMIT or OFFSET, or which merely selects the first row of the result, discarding the rest, will not be deterministic in terms of what result row is returned, assuming there’s more than one row that matches the query’s criteria.

    While we may not notice this for simple queries on databases that usually return rows in their natural order, it becomes more of an issue if we also use subqueryload() to load related collections, and we may not be loading the collections as intended.

    SQLAlchemy implements subqueryload() by issuing a separate query, the results of which are matched up to the results from the first query. We see two queries emitted like this:

    >>> session.scalars(select(User).options(subqueryload(User.addresses))).all()
    -- the "main" query
    SELECT users.id AS users_id
    FROM users
    -- the "load" query issued by subqueryload
    SELECT addresses.id AS addresses_id,
           addresses.user_id AS addresses_user_id,
           anon_1.users_id AS anon_1_users_id
    FROM (SELECT users.id AS users_id FROM users) AS anon_1
    JOIN addresses ON anon_1.users_id = addresses.user_id
    ORDER BY anon_1.users_id

    The second query embeds the first query as a source of rows. When the inner query uses OFFSET and/or LIMIT without ordering, the two queries may not see the same results:

    >>> user = session.scalars(
    ...     select(User).options(subqueryload(User.addresses)).limit(1)
    ... ).first()
    -- the "main" query
    SELECT users.id AS users_id
    FROM users
    LIMIT 1
    -- the "load" query issued by subqueryload
    SELECT addresses.id AS addresses_id,
           addresses.user_id AS addresses_user_id,
           anon_1.users_id AS anon_1_users_id
    FROM (SELECT users.id AS users_id FROM users LIMIT 1) AS anon_1
    JOIN addresses ON anon_1.users_id = addresses.user_id
    ORDER BY anon_1.users_id

    Depending on database specifics, there is a chance we may get a result like the following for the two queries:

    -- query #1
    +--------+
    |users_id|
    +--------+
    |       1|
    +--------+

    -- query #2
    +------------+-----------------+---------------+
    |addresses_id|addresses_user_id|anon_1_users_id|
    +------------+-----------------+---------------+
    |           3|                2|              2|
    +------------+-----------------+---------------+
    |           4|                2|              2|
    +------------+-----------------+---------------+

    Above, we receive two addresses rows for user.id of 2, and none for 1. We’ve wasted two rows and failed to actually load the collection. This is an insidious error because without looking at the SQL and the results, the ORM will not show that there’s any issue; if we access the addresses for the User we have, it will emit a lazy load for the collection and we won’t see that anything actually went wrong.

    The solution to this problem is to always specify a deterministic sort order, so that the main query always returns the same set of rows. This generally means that you should use order_by() on a unique column on the table. The primary key is a good choice for this:

    session.scalars(
        select(User).options(subqueryload(User.addresses)).order_by(User.id).limit(1)
    ).first()

    Note that the joinedload() eager loader strategy does not suffer from the same problem because only one query is ever issued, so the load query cannot be different from the main query. Similarly, the selectinload() eager loader strategy also does not have this issue as it links its collection loads directly to primary key values just loaded.
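
    A minimal sketch of the same query using selectinload() instead, assuming the same User/Address mapping as above:

    from sqlalchemy.orm import selectinload

    user = session.scalars(
        select(User).options(selectinload(User.addresses)).limit(1)
    ).first()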

    See also: Subquery Eager Loading