Upsides of Mongo ObjectIds in Postgres
6/22/2023

After porting a database from MongoDB to Postgres, I decided to keep generating and using MongoDB ObjectIds. They solve a real problem: generating unique IDs at scale without any synchronization in a distributed architecture. Your horizontally scaled application can have multiple app servers generating IDs without collisions, which also eases load on the database: there's no locking on a sequence generator, and you don't have to read the inserted row back to get its new ID.
Plus, an ObjectId includes a time component, so it still sorts naturally. They're pretty handy! UUIDs are the more common choice for this, but ObjectId is a little smaller (12 bytes versus 16) and has similar guarantees.
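For reference, those 12 bytes break down as a 4-byte big-endian Unix timestamp, a 5-byte random value fixed per process, and a 3-byte incrementing counter. A minimal sketch of a generator in Python (an illustration of the layout, not production code; real drivers handle process forks and counter seeding more carefully):

```python
import os
import time

# Per the ObjectId layout: a 5-byte random value chosen once per
# process, and a 3-byte counter seeded randomly.
_RANDOM = os.urandom(5)
_counter = int.from_bytes(os.urandom(3), "big")

def new_object_id() -> str:
    """Generate a 24-char hex ObjectId: 4-byte big-endian timestamp,
    5-byte process-random value, 3-byte counter (mod 2**24)."""
    global _counter
    _counter = (_counter + 1) % (1 << 24)
    ts = int(time.time()).to_bytes(4, "big")
    return (ts + _RANDOM + _counter.to_bytes(3, "big")).hex()
```

Because the timestamp leads, IDs generated over time sort roughly chronologically, both as raw bytes and as hex strings, and no coordination is needed: any number of app servers can run this independently, since a collision would require the same second, the same 5 random bytes, and the same counter value.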
When reading about UUID index performance in "Unexpected downsides of UUID keys in PostgreSQL", I wondered whether ObjectId performs similarly to the article's updated UUIDv7.
I followed the same setup to create a records table in Postgres 15.2, except I also added a bid column and filled it with values from this Postgres function that generates ObjectIds.
I was happy to see ObjectId's data locality working out well:
postgres=# SELECT COUNT(id) FROM records;
  count
----------
 10000000
(1 row)

Time: 411.971 ms

postgres=# SELECT COUNT(uuid_v4) FROM records;
  count
----------
 10000000
(1 row)

Time: 748.385 ms

postgres=# SELECT COUNT(uuid_v7) FROM records;
  count
----------
 10000000
(1 row)

Time: 495.941 ms

postgres=# SELECT COUNT(bid) FROM records;
  count
----------
 10000000
(1 row)

Time: 451.762 ms
The explain plans:
postgres=# EXPLAIN (BUFFERS, ANALYZE, TIMING OFF) SELECT COUNT(id) FROM records;
                                                                       QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=322243.93..322243.94 rows=1 width=8) (actual rows=1 loops=1)
   Buffers: shared hit=52135 read=54647
   ->  Gather  (cost=322243.72..322243.93 rows=2 width=8) (actual rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=52135 read=54647
         ->  Partial Aggregate  (cost=321243.72..321243.73 rows=1 width=8) (actual rows=1 loops=3)
               Buffers: shared hit=52135 read=54647
               ->  Parallel Index Only Scan using records_id_idx on records  (cost=0.43..310847.46 rows=4158501 width=8) (actual rows=3333333 loops=3)
                     Heap Fetches: 0
                     Buffers: shared hit=52135 read=54647
 Planning Time: 0.081 ms
 JIT:
   Functions: 8
   Options: Inlining false, Optimization false, Expressions true, Deforming true
 Execution Time: 411.976 ms
(16 rows)

Time: 412.711 ms
postgres=# EXPLAIN (BUFFERS, ANALYZE, TIMING OFF) SELECT COUNT(uuid_v4) FROM records;
                                                                         QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=410928.06..410928.07 rows=1 width=8) (actual rows=1 loops=1)
   Buffers: shared hit=8704564 read=76635
   ->  Gather  (cost=410927.84..410928.05 rows=2 width=8) (actual rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=8704564 read=76635
         ->  Partial Aggregate  (cost=409927.84..409927.85 rows=1 width=8) (actual rows=1 loops=3)
               Buffers: shared hit=8704564 read=76635
               ->  Parallel Index Only Scan using records_uuid_v4_idx on records  (cost=0.56..399531.59 rows=4158501 width=16) (actual rows=3333333 loops=3)
                     Heap Fetches: 0
                     Buffers: shared hit=8704564 read=76635
 Planning Time: 0.108 ms
 JIT:
   Functions: 8
   Options: Inlining false, Optimization false, Expressions true, Deforming true
 Execution Time: 826.196 ms
(16 rows)

Time: 827.001 ms
postgres=# EXPLAIN (BUFFERS, ANALYZE, TIMING OFF) SELECT COUNT(uuid_v7) FROM records;
                                                                         QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=410928.06..410928.07 rows=1 width=8) (actual rows=1 loops=1)
   Buffers: shared hit=52829 read=76641
   ->  Gather  (cost=410927.84..410928.05 rows=2 width=8) (actual rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=52829 read=76641
         ->  Partial Aggregate  (cost=409927.84..409927.85 rows=1 width=8) (actual rows=1 loops=3)
               Buffers: shared hit=52829 read=76641
               ->  Parallel Index Only Scan using records_uuid_v7_idx on records  (cost=0.56..399531.59 rows=4158501 width=16) (actual rows=3333333 loops=3)
                     Heap Fetches: 0
                     Buffers: shared hit=52829 read=76641
 Planning Time: 0.075 ms
 JIT:
   Functions: 8
   Options: Inlining false, Optimization false, Expressions true, Deforming true
 Execution Time: 511.102 ms
(16 rows)

Time: 511.919 ms
postgres=# EXPLAIN (BUFFERS, ANALYZE, TIMING OFF) SELECT COUNT(bid) FROM records;
                                                                       QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=345740.06..345740.07 rows=1 width=8) (actual rows=1 loops=1)
   Buffers: shared hit=123595 read=60245
   ->  Gather  (cost=345739.84..345740.05 rows=2 width=8) (actual rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=123595 read=60245
         ->  Partial Aggregate  (cost=344739.84..344739.85 rows=1 width=8) (actual rows=1 loops=3)
               Buffers: shared hit=123595 read=60245
               ->  Parallel Index Only Scan using records_bid_idx on records  (cost=0.56..334343.59 rows=4158501 width=25) (actual rows=3333333 loops=3)
                     Heap Fetches: 0
                     Buffers: shared hit=123595 read=60245
 Planning Time: 0.060 ms
 JIT:
   Functions: 8
   Options: Inlining false, Optimization false, Expressions true, Deforming true
 Execution Time: 431.256 ms
(16 rows)

Time: 431.965 ms
ObjectId's data locality worked out as expected: close to UUIDv7, and well ahead of UUIDv4. A small cost over a plain bigint for some nicer semantics, if you need them.
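One of those nicer semantics is that the creation time rides along in the ID itself. Decoding the leading 4 bytes recovers it; a quick sketch, using a commonly cited sample ObjectId value for illustration:

```python
from datetime import datetime, timezone

def object_id_time(oid_hex: str) -> datetime:
    """Extract the creation time from a 24-char hex ObjectId.
    The first 4 bytes (8 hex chars) are a big-endian Unix timestamp."""
    seconds = int(oid_hex[:8], 16)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

# A sample ObjectId often used in MongoDB documentation:
print(object_id_time("507f1f77bcf86cd799439011"))  # an October 2012 timestamp
```

Handy for debugging, and it means a created-at timestamp is recoverable even from tables that never stored one explicitly (at one-second resolution).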