# Scaling Self Hosted
Logfire is designed to be horizontally scalable and can handle a lot of traffic. Depending on your usage patterns, however, you may need to scale certain pods to maintain performance.

Please use the architecture diagram as a reference.
## PostgreSQL Configuration
PostgreSQL is not managed by the Logfire Helm chart; it is assumed that an existing PostgreSQL cluster is available in your environment. If not, a good option for deploying PostgreSQL on Kubernetes is CloudNativePG.
Note: No telemetry data is stored within PostgreSQL. We use PostgreSQL to manage organisations, projects, dashboards, etc., and to track and compact files within object storage.
A recommended starting size would be 4 vCPUs and 16 GB of RAM.
Here are some parameters you can use to start tuning:
| Parameter | Value |
|---|---|
| `autovacuum_analyze_scale_factor` | 0.05 |
| `autovacuum_analyze_threshold` | 50 |
| `autovacuum_max_workers` | 6 |
| `autovacuum_naptime` | 30 |
| `autovacuum_vacuum_cost_delay` | 1 |
| `autovacuum_vacuum_cost_limit` | 2000 |
| `autovacuum_vacuum_scale_factor` | 0.1 |
| `autovacuum_vacuum_threshold` | 50 |
| `idle_in_transaction_session_timeout` | 60000 |
| `log_autovacuum_min_duration` | 600000 |
| `log_lock_waits` | on |
| `log_min_duration_statement` | 1000 |
| `maintenance_work_mem` | 4000000 |
| `max_connections` | 2048 |
| `max_wal_size` | 16000000 |
| `max_slot_wal_keep_size` | 8000000 |
| `random_page_cost` | 1.1 |
| `work_mem` | 128000 |
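If you deploy PostgreSQL with CloudNativePG, a minimal sketch of a `Cluster` resource applying the starting size and the tuning parameters above might look like the following. The cluster name, namespace, instance count, and storage size are illustrative and not prescribed by the chart:

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: logfire-postgres          # illustrative name
  namespace: logfire              # illustrative namespace
spec:
  instances: 2
  # Recommended starting size: 4 vCPUs and 16 GB of RAM
  resources:
    requests:
      cpu: "4"
      memory: 16Gi
    limits:
      cpu: "4"
      memory: 16Gi
  storage:
    size: 100Gi                   # illustrative; size to your retention needs
  postgresql:
    parameters:
      # Tuning parameters from the table above (values are strings)
      autovacuum_analyze_scale_factor: "0.05"
      autovacuum_max_workers: "6"
      max_connections: "2048"
      work_mem: "128000"
      # ... remaining parameters from the table
```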
## Scaling Configuration
Each service can have standard Kubernetes replicas, resource limits, and autoscaling configured:
```yaml
<service_name>:
  # -- Number of pod replicas
  replicas: 1
  # -- Resource limits and allocations
  resources:
    cpu: "1"
    memory: "1Gi"
  # -- Autoscaler settings
  autoscaling:
    minReplicas: 2
    maxReplicas: 4
    memAverage: 65
    cpuAverage: 20
  # -- Pod Disruption Budget
  pdb:
    maxUnavailable: 1
    minAvailable: 1
```
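Note that a Kubernetes PodDisruptionBudget accepts either `maxUnavailable` or `minAvailable`, not both, so set only one of the two `pdb` fields for a given service.

As a rough sketch of how the autoscaling settings presumably map onto Kubernetes primitives: assuming the chart renders a standard `autoscaling/v2` HorizontalPodAutoscaler for each service, `memAverage` and `cpuAverage` would correspond to target average utilization percentages, roughly equivalent to:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: <service_name>              # rendered by the chart
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment                # or StatefulSet, depending on the service
    name: <service_name>
  minReplicas: 2
  maxReplicas: 4
  metrics:
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 65    # memAverage
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 20    # cpuAverage
```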
## Recommended Starting Values
By default, the Helm chart includes only a single replica for each pod and no configured resource limits. When bringing self-hosted Logfire to production, you will need to adjust the scaling of each service, depending on the usage patterns of your instance.

For example, if there is a lot of querying, or a high number of dashboards, you may need to scale up the query API and cache. Conversely, if you are write-heavy but don't query as much, you may need to scale up ingest. You can use CPU and memory usage to gauge how busy different parts of Logfire are.

If the system is not performing well and there are no obvious CPU/memory spikes, have a look at accessing the meta project in the troubleshooting section to understand what's going on internally.
Here are some recommended values to get you started:
```yaml
logfire-backend:
  replicas: 2
  resources:
    cpu: "2"
    memory: "2Gi"
  autoscaling:
    minReplicas: 2
    maxReplicas: 4
    memAverage: 65
    cpuAverage: 20

logfire-ff-query-api:
  replicas: 2
  resources:
    cpu: "2"
    memory: "2Gi"
  autoscaling:
    minReplicas: 2
    maxReplicas: 8
    memAverage: 65
    cpuAverage: 20

logfire-ff-cache:
  replicas: 2
  cacheStorage: "256Gi"
  resources:
    cpu: "4"
    memory: "8Gi"

logfire-ff-conhash-cache:
  replicas: 2
  resources:
    cpu: "1"
    memory: "1Gi"

logfire-ff-ingest:
  volumeClaimTemplates:
    storageClassName: my-storage-class
    storage: "16Gi"
  resources:
    cpu: "2"
    memory: "4Gi"
  autoscaling:
    minReplicas: 6
    maxReplicas: 24
    memAverage: 25
    cpuAverage: 15

logfire-ff-compaction-worker:
  replicas: 2
  resources:
    cpu: "4"
    memory: "8Gi"
  autoscaling:
    minReplicas: 2
    maxReplicas: 4
    memAverage: 50
    cpuAverage: 50

logfire-ff-maintenance-worker:
  replicas: 2
  resources:
    cpu: "4"
    memory: "8Gi"
  autoscaling:
    minReplicas: 2
    maxReplicas: 4
    memAverage: 50
    cpuAverage: 50
```
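The values above do not set Pod Disruption Budgets. If you want to limit voluntary disruption during node drains and upgrades, you can add them per service using the `pdb` schema shown earlier, for example:

```yaml
logfire-backend:
  pdb:
    maxUnavailable: 1

logfire-ff-ingest:
  pdb:
    maxUnavailable: 1
```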