Working with Large Collections
When a related collection is potentially very large, it may not be feasible for such a collection to be populated into memory under any circumstances, as the operation may be overly consuming of time, network and memory resources.
This section includes API features intended to allow relationship() to be used with large collections while maintaining adequate performance.
The write only loader strategy is the primary means of configuring a relationship() that will remain writeable, but will not load its contents into memory. A write-only ORM configuration in modern type-annotated Declarative form is illustrated below:
Above, the relationship is configured not using the ordinary annotation, but instead using the WriteOnlyMapped type annotation, which at runtime will assign the of lazy="write_only"
to the target relationship(). The annotation is an alternative form of the Mapped annotation which indicate the use of the collection type on instances of the object.
The above relationship() configuration also includes several elements that are specific to what action to take when Account
objects are deleted, as well as when AccountTransaction
objects are removed from the account_transactions
collection. These elements are:
passive_deletes=True
- allows the to forego having to load the collection whenAccount
is deleted; see Using foreign key ON DELETE cascade with ORM relationships.ondelete="cascade"
configured on the constraint. This is also detailed at Using foreign key ON DELETE cascade with ORM relationships.cascade="all, delete-orphan"
- instructs the to deleteAccountTransaction
objects when they are removed from the collection. See delete-orphan in the document.
New in version 2.0: Added “Write only” relationship loaders.
The write-only collection allows for direct assignment of the collection as a whole only for or pending objects. With our above mapping, this indicates we can create a new Account
object with a sequence of AccountTransaction
objects to be added to a . Any Python iterable may be used as the source of objects to start, where below we use a Python list
:
>>> new_account = Account(
... identifier="account_01",
... account_transactions=[
... AccountTransaction(description="initial deposit", amount=Decimal("500.00")),
... AccountTransaction(description="transfer", amount=Decimal("1000.00")),
... AccountTransaction(description="withdrawal", amount=Decimal("-29.50")),
... ],
... )
>>> with Session(engine) as session:
... session.add(new_account)
... session.commit()
BEGIN (implicit)
INSERT INTO account (identifier) VALUES (?)
[...] ('account_01',)
INSERT INTO account_transaction (account_id, description, amount, timestamp) VALUES
(?, ?, ?, CURRENT_TIMESTAMP), (?, ?, ?, CURRENT_TIMESTAMP), (?, ?, ?, CURRENT_TIMESTAMP)
RETURNING id, timestamp
[...] (1, 'initial deposit', 500.0, 1, 'transfer', 1000.0, 1, 'withdrawal', -29.5)
COMMIT
Once an object is database-persisted (i.e. in the persistent or state), the collection has the ability to be extended with new items as well as the ability for individual items to be removed. However, the collection may no longer be re-assigned with a full replacement collection, as such an operation requires that the previous collection is fully loaded into memory in order to reconcile the old entries with the new ones:
>>> new_account.account_transactions = [
... AccountTransaction(description="some transaction", amount=Decimal("10.00"))
... ]
Traceback (most recent call last):
...
sqlalchemy.exc.InvalidRequestError: Collection "Account.account_transactions" does not
support implicit iteration; collection replacement operations can't be used
Adding New Items to an Existing Collection
For write-only collections of persistent objects, modifications to the collection using processes may proceed only by using the WriteOnlyCollection.add(), and WriteOnlyCollection.remove() methods:
>>> from sqlalchemy import select
>>> session = Session(engine, expire_on_commit=False)
>>> existing_account = session.scalar(select(Account).filter_by(identifier="account_01"))
BEGIN (implicit)
SELECT account.id, account.identifier
FROM account
WHERE account.identifier = ?
[...] ('account_01',)
>>> existing_account.account_transactions.add_all(
... [
... AccountTransaction(description="paycheck", amount=Decimal("2000.00")),
... AccountTransaction(description="rent", amount=Decimal("-800.00")),
... ]
... )
>>> session.commit()
INSERT INTO account_transaction (account_id, description, amount, timestamp)
VALUES (?, ?, ?, CURRENT_TIMESTAMP), (?, ?, ?, CURRENT_TIMESTAMP)
RETURNING id, timestamp
[...] (1, 'paycheck', 2000.0, 1, 'rent', -800.0)
COMMIT
The items added above are held in a pending queue within the until the next flush, at which point they are INSERTed into the database, assuming the added objects were previously transient.
The WriteOnlyCollection does not at any point store a reference to the current contents of the collection, nor does it have any behavior where it would directly emit a SELECT to the database in order to load them; the overriding assumption is that the collection may contain many thousands or millions of rows, and should never be fully loaded into memory as a side effect of any other operation.
Instead, the includes SQL-generating helpers such as WriteOnlyCollection.select(), which will generate a construct pre-configured with the correct WHERE / FROM criteria for the current parent row, which can then be further modified in order to SELECT any range of rows desired, as well as invoked using features like server side cursors for processes that wish to iterate through the full collection in a memory-efficient manner.
The statement generated is illustrated below. Note it also includes ORDER BY criteria, indicated in the example mapping by the parameter of relationship(); this criteria would be omitted if the parameter were not configured:
>>> print(existing_account.account_transactions.select())
SELECT account_transaction.id, account_transaction.account_id, account_transaction.description,
account_transaction.amount, account_transaction.timestamp
FROM account_transaction
WHERE :param_1 = account_transaction.account_id ORDER BY account_transaction.timestamp
We may use this construct along with the Session in order to query for AccountTransaction
objects, most easily using the method that will return a Result that yields ORM objects directly. It’s typical, though not required, that the would be modified further to limit the records returned; in the example below, additional WHERE criteria to load only “debit” account transactions is added, along with “LIMIT 10” to retrieve only the first ten rows:
>>> account_transactions = session.scalars(
... existing_account.account_transactions.select()
... .limit(10)
... ).all()
BEGIN (implicit)
SELECT account_transaction.id, account_transaction.account_id, account_transaction.description,
account_transaction.amount, account_transaction.timestamp
FROM account_transaction
WHERE ? = account_transaction.account_id AND account_transaction.amount < ?
ORDER BY account_transaction.timestamp LIMIT ? OFFSET ?
[...] (1, 0, 10, 0)
>>> print(account_transactions)
Removing Items
Individual items that are loaded in the state against the current Session may be marked for removal from the collection using the method. The flush process will implicitly consider the object to be already part of the collection when the operation proceeds. The example below illustrates removal of an individual AccountTransaction
item, which per cascade settings results in a DELETE of that row:
>>> existing_transaction = account_transactions[0]
>>> existing_account.account_transactions.remove(existing_transaction)
>>> session.commit()
DELETE FROM account_transaction WHERE account_transaction.id = ?
[...] (3,)
COMMIT
As with any ORM-mapped collection, object removal may proceed either to de-associate the object from the collection while leaving the object present in the database, or may issue a DELETE for its row, based on the configuration of the relationship().
Collection removal without deletion involves setting foreign key columns to NULL for a relationship, or deleting the corresponding association row for a many-to-many relationship.
The WriteOnlyCollection can generate DML constructs such as objects, which may be used in an ORM context to produce bulk insert behavior. See the section ORM Bulk INSERT Statements for an overview of ORM bulk inserts.
One to Many Collections
For a regular one to many collection only, the WriteOnlyCollection.insert() method will produce an construct which is pre-established with VALUES criteria corresponding to the parent object. As this VALUES criteria is entirely against the related table, the statement can be used to INSERT new rows that will at the same time become new records in the related collection:
See also
ORM Bulk INSERT Statements - in the
One To Many - at
Many to Many Collections
For a many to many collection, the relationship between two classes involves a third table that is configured using the parameter of relationship. To bulk insert rows into a collection of this type using , the new records may be bulk-inserted separately first, retrieved using RETURNING, and those records then passed to the WriteOnlyCollection.add_all() method where the unit of work process will proceed to persist them as part of the collection.
Supposing a class BankAudit
referred to many AccountTransaction
records using a many-to-many table:
>>> from sqlalchemy import Table, Column
>>> audit_to_transaction = Table(
... "audit_transaction",
... Base.metadata,
... Column("audit_id", ForeignKey("audit.id", ondelete="CASCADE"), primary_key=True),
... Column(
... "transaction_id",
... ForeignKey("account_transaction.id", ondelete="CASCADE"),
... primary_key=True,
... ),
... )
>>> class BankAudit(Base):
... __tablename__ = "audit"
... id: Mapped[int] = mapped_column(primary_key=True)
... account_transactions: WriteOnlyMapped["AccountTransaction"] = relationship(
... secondary=audit_to_transaction, passive_deletes=True
... )
To illustrate the two operations, we add more AccountTransaction
objects using bulk insert, which we retrieve using RETURNING by adding returning(AccountTransaction)
to the bulk INSERT statement (note that we could just as easily use existing AccountTransaction
objects as well):
>>> new_transactions = session.scalars(
... existing_account.account_transactions.insert().returning(AccountTransaction),
... [
... {"description": "odd trans 1", "amount": Decimal("50000.00")},
... {"description": "odd trans 2", "amount": Decimal("25000.00")},
... {"description": "odd trans 3", "amount": Decimal("45.00")},
... ],
... ).all()
BEGIN (implicit)
INSERT INTO account_transaction (account_id, description, amount, timestamp) VALUES
(?, ?, ?, CURRENT_TIMESTAMP), (?, ?, ?, CURRENT_TIMESTAMP), (?, ?, ?, CURRENT_TIMESTAMP)
RETURNING id, account_id, description, amount, timestamp
[...] (1, 'odd trans 1', 50000.0, 1, 'odd trans 2', 25000.0, 1, 'odd trans 3', 45.0)
With a list of AccountTransaction
objects ready, the method is used to associate many rows at once with a new BankAudit
object:
>>> bank_audit = BankAudit()
>>> session.add(bank_audit)
>>> bank_audit.account_transactions.add_all(new_transactions)
>>> session.commit()
INSERT INTO audit DEFAULT VALUES
[...] ()
INSERT INTO audit_transaction (audit_id, transaction_id) VALUES (?, ?)
[...] [(1, 10), (1, 11), (1, 12)]
COMMIT
See also
ORM Bulk INSERT Statements - in the
Many To Many - at
Bulk UPDATE and DELETE of Items
In a similar way in which can generate Select constructs with WHERE criteria pre-established, it can also generate and Delete constructs with that same WHERE criteria, to allow criteria-oriented UPDATE and DELETE statements against the elements in a large collection.
One To Many Collections
As is the case with INSERT, this feature is most straightforward with one to many collections.
In the example below, the WriteOnlyCollection.update() method is used to generate an UPDATE statement is emitted against the elements in the collection, locating rows where the “amount” is equal to -800
and adding the amount of 200
to them:
>>> session.execute(
... existing_account.account_transactions.update()
... .values(amount=AccountTransaction.amount + 200)
... .where(AccountTransaction.amount == -800),
... )
BEGIN (implicit)
UPDATE account_transaction SET amount=(account_transaction.amount + ?)
WHERE ? = account_transaction.account_id AND account_transaction.amount = ?
[...] (200, 1, -800)
<...>
In a similar way, will produce a DELETE statement that is invoked in the same way:
>>> session.execute(
... existing_account.account_transactions.delete().where(
... AccountTransaction.amount.between(0, 30)
... ),
... )
DELETE FROM account_transaction WHERE ? = account_transaction.account_id
AND account_transaction.amount BETWEEN ? AND ? RETURNING id
<...>
Many to Many Collections
Tip
The techniques here involve multi-table UPDATE expressions, which are slightly more advanced.
For bulk UPDATE and DELETE of many to many collections, in order for an UPDATE or DELETE statement to relate to the primary key of the parent object, the association table must be explicitly part of the UPDATE/DELETE statement, which requires either that the backend includes supports for non-standard SQL syntaxes, or extra explicit steps when constructing the UPDATE or DELETE statement.
For backends that support multi-table versions of UPDATE, the method should work without extra steps for a many-to-many collection, as in the example below where an UPDATE is emitted against AccountTransaction
objects in terms of the many-to-many BankAudit.account_transactions
collection:
>>> session.execute(
... bank_audit.account_transactions.update().values(
... )
... )
UPDATE account_transaction SET description=(account_transaction.description || ?)
FROM audit_transaction WHERE ? = audit_transaction.audit_id
AND account_transaction.id = audit_transaction.transaction_id RETURNING id
[...] (' (audited)', 1)
<...>
The above statement automatically makes use of “UPDATE..FROM” syntax, supported by SQLite and others, to name the additional audit_transaction
table in the WHERE clause.
To UPDATE or DELETE a many-to-many collection where multi-table syntax is not available, the many-to-many criteria may be moved into SELECT that for example may be combined with IN to match rows. The WriteOnlyCollection still helps us here, as we use the method to generate this SELECT for us, making use of the Select.with_only_columns() method to produce a :
class sqlalchemy.orm.WriteOnlyCollection
Write-only collection which can synchronize changes into the attribute event system.
The is used in a mapping by using the "write_only"
lazy loading strategy with relationship(). For background on this configuration, see .
New in version 2.0.
See also
Members
Class signature
class (sqlalchemy.orm.writeonly.AbstractCollectionWriter
)
method sqlalchemy.orm.WriteOnlyCollection.add(item: _T) → None
Add an item to this .
The given item will be persisted to the database in terms of the parent instance’s collection on the next flush.
method sqlalchemy.orm.WriteOnlyCollection.add_all(iterator: Iterable[_T]) → None
Add an iterable of items to this .
The given items will be persisted to the database in terms of the parent instance’s collection on the next flush.
method sqlalchemy.orm.WriteOnlyCollection.delete() →
Produce a Delete which will refer to rows in terms of this instance-local .
method sqlalchemy.orm.WriteOnlyCollection.insert() →
For one-to-many collections, produce a Insert which will insert new rows in terms of this this instance-local .
This construct is only supported for a Relationship that does not include the parameter. For relationships that refer to a many-to-many table, use ordinary bulk insert techniques to produce new objects, then use
AbstractCollectionWriter.add_all()
to associate them with the collection.method sqlalchemy.orm.WriteOnlyCollection.remove(item: _T) → None
Remove an item from this .
The given item will be removed from the parent instance’s collection on the next flush.
method sqlalchemy.orm.WriteOnlyCollection.select() → [Tuple[_T]]
Produce a construct that represents the rows within this instance-local WriteOnlyCollection.
method update() → Update
Produce a which will refer to rows in terms of this instance-local WriteOnlyCollection.
class sqlalchemy.orm.WriteOnlyMapped
Represent the ORM mapped attribute type for a “write only” relationship.
The type annotation may be used in an Annotated Declarative Table mapping to indicate that the lazy="write_only"
loader strategy should be used for a particular .
E.g.:
class User(Base):
__tablename__ = "user"
id: Mapped[int] = mapped_column(primary_key=True)
addresses: WriteOnlyMapped[Address] = relationship(
cascade="all,delete-orphan"
)
See the section Write Only Relationships for background.
New in version 2.0.
See also
- complete background
DynamicMapped - includes legacy support
Class signature
class sqlalchemy.orm.WriteOnlyMapped (sqlalchemy.orm.base._MappedAnnotationBase
)
Legacy Feature
The “dynamic” lazy loader strategy is the legacy form of what is now the “write_only” strategy described in the section Write Only Relationships.
The “dynamic” strategy produces a legacy object from the related collection. However, a major drawback of “dynamic” relationships is that there are several cases where the collection will fully iterate, some of which are non-obvious, which can only be prevented with careful programming and testing on a case-by-case basis. Therefore, for truly large collection management, the WriteOnlyCollection should be preferred.
The dynamic loader is also not compatible with the extension. It can be used with some limitations, as indicated in Asyncio dynamic guidelines, but again the , which is fully compatible with asyncio, should be preferred.
The dynamic relationship strategy allows configuration of a relationship() which when accessed on an instance will return a legacy object in place of the collection. The Query can then be modified further so that the database collection may be iterated based on filtering criteria. The returned object is an instance of AppenderQuery, which combines the loading and iteration behavior of along with rudimentary collection mutation methods such as AppenderQuery.append() and .
The “dynamic” loader strategy may be configured with type-annotated Declarative form using the DynamicMapped annotation class:
from sqlalchemy.orm import DynamicMapped
class User(Base):
__tablename__ = "user"
id: Mapped[int] = mapped_column(primary_key=True)
posts: DynamicMapped[Post] = relationship()
Above, the User.posts
collection on an individual User
object will return the object, which is a subclass of Query that also supports basic collection mutation operations:
jack = session.get(User, id)
# filter Jack's blog posts
posts = jack.posts.filter(Post.headline == "this is a post")
# apply array slices
posts = jack.posts[5:20]
The dynamic relationship supports limited write operations, via the and AppenderQuery.remove() methods:
oldpost = jack.posts.filter(Post.headline == "old post").one()
jack.posts.remove(oldpost)
jack.posts.append(Post("new post"))
Since the read side of the dynamic relationship always queries the database, changes to the underlying collection will not be visible until the data has been flushed. However, as long as “autoflush” is enabled on the in use, this will occur automatically each time the collection is about to emit a query.
Dynamic Relationship Loaders - API
Object Name | Description |
---|---|
A dynamic query that supports basic collection storage operations. | |
Represent the ORM mapped attribute type for a “dynamic” relationship. |
class sqlalchemy.orm.AppenderQuery
A dynamic query that supports basic collection storage operations.
Methods on include all methods of Query, plus additional methods used for collection persistence.
Members
, add_all(), , count(), , remove()
Class signature
class (sqlalchemy.orm.dynamic.AppenderMixin
, sqlalchemy.orm.Query)
method add(item: _T) → None
inherited from the
AppenderMixin.add()
method ofAppenderMixin
The given item will be persisted to the database in terms of the parent instance’s collection on the next flush.
This method is provided to assist in delivering forwards-compatibility with the WriteOnlyCollection collection class.
New in version 2.0.
method add_all(iterator: Iterable[_T]) → None
inherited from the
AppenderMixin.add_all()
method ofAppenderMixin
Add an iterable of items to this AppenderQuery.
The given items will be persisted to the database in terms of the parent instance’s collection on the next flush.
This method is provided to assist in delivering forwards-compatibility with the collection class.
New in version 2.0.
method sqlalchemy.orm.AppenderQuery.append(item: _T) → None
inherited from the
AppenderMixin.append()
method ofAppenderMixin
Append an item to this .
The given item will be removed from the parent instance’s collection on the next flush.
method sqlalchemy.orm.AppenderQuery.count() → int
inherited from the
AppenderMixin.count()
method ofAppenderMixin
Return a count of rows this the SQL formed by this would return.
This generates the SQL for this Query as follows:
SELECT count(1) AS count_1 FROM (
SELECT <rest of query follows...>
) AS anon_1
The above SQL returns a single row, which is the aggregate value of the count function; the Query.count() method then returns that single integer value.
Warning
It is important to note that the value returned by count() is not the same as the number of ORM objects that this Query would return from a method such as the .all() method. The object, when asked to return full entities, will deduplicate entries based on primary key, meaning if the same primary key value would appear in the results more than once, only one object of that primary key would be present. This does not apply to a query that is against individual columns.
See also
My Query does not return the same number of objects as query.count() tells me - why?
For fine grained control over specific columns to count, to skip the usage of a subquery or otherwise control of the FROM clause, or to use other aggregate functions, use expressions in conjunction with Session.query(), i.e.:
from sqlalchemy import func
# count User records, without
# using a subquery.
session.query(func.count(User.id))
# return count of user "id" grouped
# by "name"
session.query(func.count(User.id)).\
group_by(User.name)
from sqlalchemy import distinct
# count distinct "name" values
session.query(func.count(distinct(User.name)))
See also
method sqlalchemy.orm.AppenderQuery.extend(iterator: Iterable[_T]) → None
inherited from the
AppenderMixin.extend()
method ofAppenderMixin
Add an iterable of items to this .
The given items will be persisted to the database in terms of the parent instance’s collection on the next flush.
method sqlalchemy.orm.AppenderQuery.remove(item: _T) → None
inherited from the
AppenderMixin.remove()
method ofAppenderMixin
Remove an item from this .
The given item will be removed from the parent instance’s collection on the next flush.
class sqlalchemy.orm.DynamicMapped
Represent the ORM mapped attribute type for a “dynamic” relationship.
The DynamicMapped type annotation may be used in an mapping to indicate that the lazy="dynamic"
loader strategy should be used for a particular relationship().
Legacy Feature
The “dynamic” lazy loader strategy is the legacy form of what is now the “write_only” strategy described in the section .
E.g.:
See the section Dynamic Relationship Loaders for background.
New in version 2.0.
See also
- complete background
WriteOnlyMapped - fully 2.0 style version
Class signature
class (sqlalchemy.orm.base._MappedAnnotationBase
)
A “raise”-loaded relationship will raise an where the attribute would normally emit a lazy load:
class MyClass(Base):
__tablename__ = "some_table"
# ...
Above, attribute access on the children
collection will raise an exception if it was not previously populated. This includes read access but for collections will also affect write access, as collections can’t be mutated without first loading them. The rationale for this is to ensure that an application is not emitting any unexpected lazy loads within a certain context. Rather than having to read through SQL logs to determine that all necessary attributes were eager loaded, the “raise” strategy will cause unloaded attributes to raise immediately if accessed. The raise strategy is also available on a query option basis using the raiseload() loader option.
See also
An important aspect of collection management in SQLAlchemy is that when an object that refers to a collection is deleted, SQLAlchemy needs to consider the objects that are inside this collection. Those objects will need to be de-associated from the parent, which for a one-to-many collection would mean that foreign key columns are set to NULL, or based on settings, may instead want to emit a DELETE for these rows.
The unit of work process only considers objects on a row-by-row basis, meaning a DELETE operation implies that all rows within a collection must be fully loaded into memory inside the flush process. This is not feasible for large collections, so we instead seek to rely upon the database’s own capability to update or delete the rows automatically using foreign key ON DELETE rules, instructing the unit of work to forego actually needing to load these rows in order to handle them. The unit of work can be instructed to work in this manner by configuring on the relationship() construct; the foreign key constraints in use must also be correctly configured.
For further detail on a complete “passive delete” configuration, see the section .