Upsides of Mongo ObjectIds in Postgres
6/22/2023

After porting a database from MongoDB to Postgres, I decided to keep generating and using MongoDB ObjectIds. They solve a real problem: generating unique IDs at scale without any synchronization in a distributed architecture. Your horizontally scaled application can have multiple app servers generating IDs without collisions, which also eases load on the database: there's no locking on a sequence generator, and you don't have to read the inserted row back to get its new ID.
Plus, an ObjectId includes a time component, so it still sorts naturally. They're pretty handy! UUIDs are the more common choice for this, but ObjectId is a little smaller (12 bytes versus 16) and has similar guarantees.
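For reference, those 12 bytes break down as a 4-byte big-endian Unix timestamp, a 5-byte random value fixed per process, and a 3-byte incrementing counter. A minimal sketch of a generator in Python (an illustration of the layout, not production code; real drivers handle process forks and counter seeding more carefully):

```python
import os
import time

# Per the ObjectId layout: a 5-byte random value chosen once per
# process, and a 3-byte counter seeded randomly.
_RANDOM = os.urandom(5)
_counter = int.from_bytes(os.urandom(3), "big")

def new_object_id() -> str:
    """Generate a 24-char hex ObjectId: 4-byte big-endian timestamp,
    5-byte process-random value, 3-byte counter (mod 2**24)."""
    global _counter
    _counter = (_counter + 1) % (1 << 24)
    ts = int(time.time()).to_bytes(4, "big")
    return (ts + _RANDOM + _counter.to_bytes(3, "big")).hex()
```

Because the timestamp leads, IDs generated over time sort roughly chronologically, both as raw bytes and as hex strings, and no coordination is needed: any number of app servers can run this independently, since a collision would require the same second, the same 5 random bytes, and the same counter value.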
When reading about UUID index performance in "Unexpected downsides of UUID keys in PostgreSQL", I wondered whether ObjectId performs similarly to the article's updated UUIDv7.
I followed the same setup to create a records table in Postgres 15.2, except I also added a bid column and filled it with values from this Postgres function that generates ObjectIds.
I was happy to see ObjectId's data locality working out well:
postgres=# SELECT COUNT(id) FROM records;
  count
----------
 10000000
(1 row)

Time: 411.971 ms

postgres=# SELECT COUNT(uuid_v4) FROM records;
  count
----------
 10000000
(1 row)

Time: 748.385 ms

postgres=# SELECT COUNT(uuid_v7) FROM records;
  count
----------
 10000000
(1 row)

Time: 495.941 ms

postgres=# SELECT COUNT(bid) FROM records;
  count
----------
 10000000
(1 row)

Time: 451.762 ms
The explain plans:
postgres=# EXPLAIN (BUFFERS, ANALYZE, TIMING OFF) SELECT COUNT(id) FROM records;
                                                                       QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=322243.93..322243.94 rows=1 width=8) (actual rows=1 loops=1)
   Buffers: shared hit=52135 read=54647
   ->  Gather  (cost=322243.72..322243.93 rows=2 width=8) (actual rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=52135 read=54647
         ->  Partial Aggregate  (cost=321243.72..321243.73 rows=1 width=8) (actual rows=1 loops=3)
               Buffers: shared hit=52135 read=54647
               ->  Parallel Index Only Scan using records_id_idx on records  (cost=0.43..310847.46 rows=4158501 width=8) (actual rows=3333333 loops=3)
                     Heap Fetches: 0
                     Buffers: shared hit=52135 read=54647
 Planning Time: 0.081 ms
 JIT:
   Functions: 8
   Options: Inlining false, Optimization false, Expressions true, Deforming true
 Execution Time: 411.976 ms
(16 rows)

Time: 412.711 ms
postgres=# EXPLAIN (BUFFERS, ANALYZE, TIMING OFF) SELECT COUNT(uuid_v4) FROM records;
                                                                         QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=410928.06..410928.07 rows=1 width=8) (actual rows=1 loops=1)
   Buffers: shared hit=8704564 read=76635
   ->  Gather  (cost=410927.84..410928.05 rows=2 width=8) (actual rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=8704564 read=76635
         ->  Partial Aggregate  (cost=409927.84..409927.85 rows=1 width=8) (actual rows=1 loops=3)
               Buffers: shared hit=8704564 read=76635
               ->  Parallel Index Only Scan using records_uuid_v4_idx on records  (cost=0.56..399531.59 rows=4158501 width=16) (actual rows=3333333 loops=3)
                     Heap Fetches: 0
                     Buffers: shared hit=8704564 read=76635
 Planning Time: 0.108 ms
 JIT:
   Functions: 8
   Options: Inlining false, Optimization false, Expressions true, Deforming true
 Execution Time: 826.196 ms
(16 rows)

Time: 827.001 ms
postgres=# EXPLAIN (BUFFERS, ANALYZE, TIMING OFF) SELECT COUNT(uuid_v7) FROM records;
                                                                         QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=410928.06..410928.07 rows=1 width=8) (actual rows=1 loops=1)
   Buffers: shared hit=52829 read=76641
   ->  Gather  (cost=410927.84..410928.05 rows=2 width=8) (actual rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=52829 read=76641
         ->  Partial Aggregate  (cost=409927.84..409927.85 rows=1 width=8) (actual rows=1 loops=3)
               Buffers: shared hit=52829 read=76641
               ->  Parallel Index Only Scan using records_uuid_v7_idx on records  (cost=0.56..399531.59 rows=4158501 width=16) (actual rows=3333333 loops=3)
                     Heap Fetches: 0
                     Buffers: shared hit=52829 read=76641
 Planning Time: 0.075 ms
 JIT:
   Functions: 8
   Options: Inlining false, Optimization false, Expressions true, Deforming true
 Execution Time: 511.102 ms
(16 rows)

Time: 511.919 ms
postgres=# EXPLAIN (BUFFERS, ANALYZE, TIMING OFF) SELECT COUNT(bid) FROM records;
                                                                       QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=345740.06..345740.07 rows=1 width=8) (actual rows=1 loops=1)
   Buffers: shared hit=123595 read=60245
   ->  Gather  (cost=345739.84..345740.05 rows=2 width=8) (actual rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=123595 read=60245
         ->  Partial Aggregate  (cost=344739.84..344739.85 rows=1 width=8) (actual rows=1 loops=3)
               Buffers: shared hit=123595 read=60245
               ->  Parallel Index Only Scan using records_bid_idx on records  (cost=0.56..334343.59 rows=4158501 width=25) (actual rows=3333333 loops=3)
                     Heap Fetches: 0
                     Buffers: shared hit=123595 read=60245
 Planning Time: 0.060 ms
 JIT:
   Functions: 8
   Options: Inlining false, Optimization false, Expressions true, Deforming true
 Execution Time: 431.256 ms
(16 rows)

Time: 431.965 ms
ObjectId's data locality worked out as expected: close to UUIDv7, and well ahead of UUIDv4. A small cost over a plain bigint for some nicer semantics, if you need them.
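One of those nicer semantics is that the creation time rides along in the ID itself. Decoding the leading 4 bytes recovers it; a quick sketch, using a commonly cited sample ObjectId value for illustration:

```python
from datetime import datetime, timezone

def object_id_time(oid_hex: str) -> datetime:
    """Extract the creation time from a 24-char hex ObjectId.
    The first 4 bytes (8 hex chars) are a big-endian Unix timestamp."""
    seconds = int(oid_hex[:8], 16)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

# A sample ObjectId often used in MongoDB documentation:
print(object_id_time("507f1f77bcf86cd799439011"))  # an October 2012 timestamp
```

Handy for debugging, and it means a created-at timestamp is recoverable even from tables that never stored one explicitly (at one-second resolution).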