How to batch insert large amounts of data efficiently
Updated for Xcode 15
The best way to perform a bulk data import with SwiftData – i.e., to create a lot of model objects at the same time, perhaps as a result of a network call – is to move the work to a background task, for example by using an actor with its own model context.
Tips
If you're inserting more than 5000 or so objects, or if your models are particularly large because they include binary data, I would strongly suggest you split your inserts into smaller batches and trigger a manual save at the end of each batch so that your memory usage doesn't skyrocket. You might also want to consider very slightly throttling your inserts if you anticipate extended periods of work.
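The examples that follow use a simple Movie model with a title and a cast array. If you don't already have one, a minimal sketch like this is enough to follow along – the exact properties are an assumption, matching only what the sample code below needs:

import SwiftData

@Model
class Movie {
    var title: String
    var cast: [String]

    init(title: String, cast: [String]) {
        self.title = title
        self.cast = cast
    }
}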
To get you started, here's some sample code for a BackgroundImporter actor that inserts lots of sample data away from the main actor:
import Foundation
import SwiftData

actor BackgroundImporter {
    var modelContainer: ModelContainer

    init(modelContainer: ModelContainer) {
        self.modelContainer = modelContainer
    }

    func backgroundInsert() async throws {
        // Create the context locally – repeatedly reading an actor
        // property is significantly slower.
        let modelContext = ModelContext(modelContainer)

        let batchSize = 1000
        let totalObjects = 100_000

        for i in 0..<(totalObjects / batchSize) {
            for j in 0..<batchSize {
                // Uncomment to throttle the inserts slightly:
                // try await Task.sleep(for: .milliseconds(1))
                let movie = Movie(title: "Movie \(i * batchSize + j)", cast: [])
                modelContext.insert(movie)
            }

            // Save once per batch to keep peak memory usage down.
            try modelContext.save()
        }
    }
}
That does a few important things you should take note of for maximum performance:
- It creates the model context inside the backgroundInsert() method, thus avoiding the significant cost of accessing an actor's property repeatedly.
- It inserts 100,000 test objects in chunks of 1000.
- After each batch it calls save() to write the data to disk, keeping the peak memory overhead lower.
- It has a commented-out 1-millisecond pause between each object insertion – the code goes as fast as possible, but if you want to throttle your inserts you should uncomment that line.
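To put the actor to work, you might call it like this – a minimal sketch that assumes you already have a ModelContainer available, for example via modelContext.container in a SwiftUI view:

// `container` is assumed to be an existing ModelContainer,
// e.g. modelContext.container from the SwiftUI environment.
let importer = BackgroundImporter(modelContainer: container)

Task {
    do {
        try await importer.backgroundInsert()
    } catch {
        print("Background import failed: \(error)")
    }
}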
Important
It bears repeating that if you create the model context as a property of your actor rather than inside the method, your batch inserts will run an order of magnitude slower.
Each time you call save(), your changes will automatically be reflected in your main context, so for example any @Query properties will update.
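For example, a hypothetical list view like this would refresh as each batch is saved:

import SwiftData
import SwiftUI

struct MoviesView: View {
    // Re-evaluated automatically as the main context picks up each save.
    @Query(sort: \Movie.title) var movies: [Movie]

    var body: some View {
        List(movies) { movie in
            Text(movie.title)
        }
    }
}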
In your own projects, you would presumably either be calculating your data dynamically or fetching it from a remote server, but the concept is the same: push the work onto a model context running on a different actor, and it will automatically be synchronized when save() is called.
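As a rough illustration – not part of the original example – a network-backed import might look something like this, where the URL and the RemoteMovie type are hypothetical stand-ins for whatever your server actually returns:

import Foundation
import SwiftData

// Hypothetical DTO matching the server's JSON.
struct RemoteMovie: Decodable {
    let title: String
    let cast: [String]
}

extension BackgroundImporter {
    func importMovies(from url: URL) async throws {
        let (data, _) = try await URLSession.shared.data(from: url)
        let remoteMovies = try JSONDecoder().decode([RemoteMovie].self, from: data)

        // Same pattern as before: a local context and batched saves.
        let modelContext = ModelContext(modelContainer)
        let batchSize = 1000

        for start in stride(from: 0, to: remoteMovies.count, by: batchSize) {
            let end = min(start + batchSize, remoteMovies.count)

            for remote in remoteMovies[start..<end] {
                modelContext.insert(Movie(title: remote.title, cast: remote.cast))
            }

            try modelContext.save()
        }
    }
}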