Database Performance: Optimization Techniques for SQL and NoSQL

Why Database Performance Matters

Databases are often the bottleneck in application performance. A slow database query can bring an entire application to its knees. Understanding how to optimize database performance is crucial for building scalable, responsive applications that provide excellent user experiences.

Understanding Query Performance

Query Execution Plan

Before optimizing, understand how your database executes queries.

PostgreSQL EXPLAIN

EXPLAIN ANALYZE
SELECT u.name, COUNT(p.id) as post_count
FROM users u
LEFT JOIN posts p ON u.id = p.user_id
WHERE u.created_at > '2025-01-01'
GROUP BY u.id, u.name
ORDER BY post_count DESC
LIMIT 10;

-- Output shows:
-- - Execution time
-- - Rows scanned
-- - Index usage
-- - Join methods
-- - Sort operations

MySQL EXPLAIN

EXPLAIN
SELECT * FROM orders
WHERE customer_id = 123
AND order_date > '2025-01-01';

-- Key columns to watch:
-- type: ALL = full table scan (worst), index = full index scan,
--       range/ref (good), eq_ref/const (best)
-- possible_keys: Available indexes
-- key: Actually used index
-- rows: Estimated rows to examine

Indexing Strategies

When to Use Indexes

  • Columns in WHERE clauses
  • Columns in JOIN conditions
  • Columns in ORDER BY clauses
  • Columns frequently searched or filtered

When NOT to Use Indexes

  • Small tables (full scan is faster)
  • Columns with low cardinality (few unique values; see the quick check after this list)
  • Columns frequently updated (index maintenance overhead)
  • Tables with heavy INSERT/UPDATE operations
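
Before indexing a column you suspect has low cardinality, it helps to measure it first. A minimal sketch, assuming a hypothetical orders table with a status column:

-- Compare distinct values to total rows
SELECT COUNT(DISTINCT status) AS distinct_values,
       COUNT(*) AS total_rows
FROM orders;

-- A handful of distinct values across millions of rows means a plain
-- B-tree index on status is unlikely to help; a partial index might.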

Index Types

B-Tree Index (Default)

-- PostgreSQL
CREATE INDEX idx_users_email ON users(email);

-- MySQL
CREATE INDEX idx_orders_date ON orders(order_date);

-- Composite index
CREATE INDEX idx_posts_user_status ON posts(user_id, status);

Unique Index

CREATE UNIQUE INDEX idx_users_email_unique ON users(email);

-- A unique index is also created automatically when you add a UNIQUE constraint:
ALTER TABLE users ADD CONSTRAINT email_unique UNIQUE (email);

Partial Index (PostgreSQL)

-- Index only active users
CREATE INDEX idx_active_users ON users(email)
WHERE active = true;

-- Index recent orders
CREATE INDEX idx_recent_orders ON orders(order_date)
WHERE order_date > '2025-01-01';

Full-Text Index

-- PostgreSQL
CREATE INDEX idx_posts_content_fts ON posts
USING gin(to_tsvector('english', content));

SELECT * FROM posts
WHERE to_tsvector('english', content) @@ to_tsquery('database & performance');

-- MySQL
CREATE FULLTEXT INDEX idx_posts_content ON posts(content);

SELECT * FROM posts
WHERE MATCH(content) AGAINST('database performance' IN NATURAL LANGUAGE MODE);

Covering Index

-- PostgreSQL 11+: INCLUDE stores extra non-key columns in the index
CREATE INDEX idx_users_email_name ON users(email)
INCLUDE (name, created_at);

-- Query can be satisfied entirely from index
SELECT name, created_at FROM users WHERE email = 'user@example.com';
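
MySQL has no INCLUDE clause; a rough equivalent is a composite index whose columns cover the whole query. A sketch against the same hypothetical users table:

-- MySQL: a composite index acting as a covering index
CREATE INDEX idx_users_email_covering ON users(email, name, created_at);

-- EXPLAIN reports "Using index" in the Extra column when the query
-- is answered from the index alone
SELECT name, created_at FROM users WHERE email = 'user@example.com';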

Query Optimization Techniques

1. Select Only the Columns You Need

-- Bad: Returns all columns
SELECT * FROM users;

-- Good: Only needed columns
SELECT id, name, email FROM users;

2. Avoid SELECT DISTINCT When Possible

-- Instead of:
SELECT DISTINCT user_id FROM posts;

-- Consider:
SELECT user_id FROM posts GROUP BY user_id;

-- Note: modern optimizers usually produce the same plan for both,
-- so verify with EXPLAIN before assuming a win.

3. Use EXISTS Instead of IN for Subqueries

-- Often slower when the subquery returns many rows:
SELECT * FROM users
WHERE id IN (SELECT user_id FROM posts WHERE published = true);

-- Often faster; modern planners may treat both as a semi-join, so verify with EXPLAIN:
SELECT * FROM users u
WHERE EXISTS (
  SELECT 1 FROM posts p
  WHERE p.user_id = u.id AND p.published = true
);

4. Avoid Functions on Indexed Columns

-- Bad: Can't use index
SELECT * FROM users WHERE YEAR(created_at) = 2025;

-- Good: Can use index
SELECT * FROM users
WHERE created_at >= '2025-01-01'
AND created_at < '2026-01-01';

5. Use LIMIT for Pagination

-- Efficient pagination
SELECT * FROM posts
ORDER BY created_at DESC
LIMIT 20 OFFSET 40;

-- Better: Use cursor-based pagination for large offsets
SELECT * FROM posts
WHERE id < 12345
ORDER BY id DESC
LIMIT 20;
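
Keyset pagination also works when ordering by a timestamp, as long as ties are broken by a unique column. A sketch, assuming the last row of the previous page had created_at = '2025-06-01 12:00:00' and id = 9876:

-- Keyset pagination on (created_at, id); confirm with EXPLAIN that
-- a composite index on (created_at, id) is used
SELECT * FROM posts
WHERE (created_at, id) < ('2025-06-01 12:00:00', 9876)
ORDER BY created_at DESC, id DESC
LIMIT 20;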

6. Optimize JOIN Operations

-- Ensure joined columns are indexed
CREATE INDEX idx_posts_user_id ON posts(user_id);
CREATE INDEX idx_comments_post_id ON comments(post_id);

-- Join with indexes
SELECT u.name, p.title, COUNT(c.id) as comment_count
FROM users u
JOIN posts p ON u.id = p.user_id
LEFT JOIN comments c ON p.id = c.post_id
WHERE u.active = true
GROUP BY u.id, u.name, p.id, p.title;

Database Design Best Practices

Normalization vs Denormalization

Normalization (3NF)

-- Normalized design (avoids redundancy)
CREATE TABLE users (
  id SERIAL PRIMARY KEY,
  name VARCHAR(100),
  email VARCHAR(255)
);

CREATE TABLE posts (
  id SERIAL PRIMARY KEY,
  title VARCHAR(200),
  content TEXT,
  user_id INTEGER REFERENCES users(id)
);

CREATE TABLE tags (
  id SERIAL PRIMARY KEY,
  name VARCHAR(50) UNIQUE
);

CREATE TABLE post_tags (
  post_id INTEGER REFERENCES posts(id),
  tag_id INTEGER REFERENCES tags(id),
  PRIMARY KEY (post_id, tag_id)
);

Strategic Denormalization

-- Add computed columns to avoid JOINs
ALTER TABLE posts ADD COLUMN comment_count INTEGER DEFAULT 0;

-- Update with trigger
CREATE OR REPLACE FUNCTION update_comment_count()
RETURNS TRIGGER AS $$
BEGIN
  IF TG_OP = 'INSERT' THEN
    UPDATE posts SET comment_count = comment_count + 1
    WHERE id = NEW.post_id;
  ELSIF TG_OP = 'DELETE' THEN
    UPDATE posts SET comment_count = comment_count - 1
    WHERE id = OLD.post_id;
  END IF;
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trigger_update_comment_count
AFTER INSERT OR DELETE ON comments
FOR EACH ROW EXECUTE FUNCTION update_comment_count();
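
The trigger only maintains the counter from now on, so existing rows need a one-time backfill. A sketch:

-- Backfill comment_count before relying on the trigger
UPDATE posts p
SET comment_count = (
  SELECT COUNT(*) FROM comments c WHERE c.post_id = p.id
);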

Proper Data Types

-- Use appropriate types to save space
-- Bad
id VARCHAR(255)
age VARCHAR(10)
price VARCHAR(20)
active VARCHAR(10)

-- Good
id INTEGER or BIGINT
age SMALLINT
price DECIMAL(10,2)
active BOOLEAN

-- Use ENUM for fixed sets
CREATE TYPE user_role AS ENUM ('admin', 'user', 'moderator');
ALTER TABLE users ADD COLUMN role user_role DEFAULT 'user';

Caching Strategies

Query Result Caching with Redis

const redis = require('redis');
const client = redis.createClient();
client.connect(); // node-redis v4: connect once before issuing commands

async function getUsers() {
  const cacheKey = 'users:active';
  
  // Try cache first
  const cached = await client.get(cacheKey);
  if (cached) {
    return JSON.parse(cached);
  }
  
  // Query database
  const users = await db.query(
    'SELECT * FROM users WHERE active = true'
  );
  
  // Store in cache (expire in 5 minutes)
  await client.setEx(
    cacheKey,
    300,
    JSON.stringify(users)
  );
  
  return users;
}

// Invalidate cache on updates
async function updateUser(userId, data) {
  await db.query(
    'UPDATE users SET name = $1 WHERE id = $2',
    [data.name, userId]
  );
  
  // Clear cache
  await client.del('users:active');
}

Database Query Cache (MySQL)

-- Note: the query cache was deprecated in MySQL 5.7.20 and removed in MySQL 8.0,
-- so the settings below apply only to MySQL 5.7 and earlier.

-- Check query cache settings
SHOW VARIABLES LIKE 'query_cache%';

-- Set cache size
SET GLOBAL query_cache_size = 268435456; -- 256MB

-- Qualifying queries are cached automatically; an individual query can opt out:
SELECT SQL_NO_CACHE * FROM users;

Connection Pooling

Node.js with pg-pool

const { Pool } = require('pg');

const pool = new Pool({
  host: 'localhost',
  port: 5432,
  database: 'mydb',
  user: 'postgres',
  password: 'password',
  // Pool configuration
  max: 20,                  // Maximum connections
  min: 5,                   // Minimum connections
  idleTimeoutMillis: 30000, // Close idle connections after 30s
  connectionTimeoutMillis: 2000, // Wait 2s for connection
});

// Using the pool
async function getUser(id) {
  const client = await pool.connect();
  try {
    const result = await client.query(
      'SELECT * FROM users WHERE id = $1',
      [id]
    );
    return result.rows[0];
  } finally {
    client.release(); // Always release!
  }
}

// Or use pool directly (auto-releases)
async function getUserSimple(id) {
  const result = await pool.query(
    'SELECT * FROM users WHERE id = $1',
    [id]
  );
  return result.rows[0];
}

NoSQL Optimization

MongoDB Indexing

// Single field index
db.users.createIndex({ email: 1 });

// Compound index
db.posts.createIndex({ userId: 1, createdAt: -1 });

// Text index for search
db.posts.createIndex({ title: "text", content: "text" });

// Geospatial index
db.locations.createIndex({ coordinates: "2dsphere" });

// Check index usage
db.posts.find({ userId: 123 }).explain("executionStats");

// Find unused indexes
db.posts.aggregate([
  { $indexStats: {} },
  { $sort: { "accesses.ops": 1 } }
]);

MongoDB Query Optimization

// Project only needed fields
db.users.find(
  { active: true },
  { name: 1, email: 1, _id: 0 }
);

// Use aggregation pipeline efficiently
db.posts.aggregate([
  // Match early to reduce data
  { $match: { published: true } },
  // Then lookup/join
  { $lookup: {
      from: "users",
      localField: "userId",
      foreignField: "_id",
      as: "author"
  }},
  // Project after all transformations
  { $project: {
      title: 1,
      "author.name": 1
  }}
]);

// Limit results
db.posts.find().limit(20).sort({ createdAt: -1 });

Redis Performance

// Batch multiple commands into one round trip
// (node-redis v4 queues MULTI commands and also auto-pipelines
// commands issued in the same tick)
await client.multi()
  .set('key1', 'value1')
  .set('key2', 'value2')
  .set('key3', 'value3')
  .exec();

// Use appropriate data structures
// Hash for objects
await client.hSet('user:123', {
  name: 'Alice',
  email: 'alice@example.com',
  age: '30'
});

// Sorted set for leaderboards
await client.zAdd('leaderboard', [
  { score: 100, value: 'user1' },
  { score: 95, value: 'user2' },
  { score: 90, value: 'user3' }
]);

// Get top 10
const top10 = await client.zRange('leaderboard', 0, 9, { REV: true });

Monitoring and Maintenance

Identify Slow Queries

PostgreSQL

-- Enable slow query logging
ALTER SYSTEM SET log_min_duration_statement = 1000; -- Log queries > 1s
SELECT pg_reload_conf();

-- Install the pg_stat_statements extension
-- (requires shared_preload_libraries = 'pg_stat_statements' and a server restart)
CREATE EXTENSION pg_stat_statements;

-- Find slowest queries
SELECT
  query,
  calls,
  total_exec_time,
  mean_exec_time,
  max_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
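
After changing indexes or queries, the accumulated statistics can be cleared so new measurements start from a clean baseline:

-- Reset pg_stat_statements counters
SELECT pg_stat_statements_reset();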

MySQL

-- Enable slow query log
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;
SET GLOBAL log_queries_not_using_indexes = 'ON';

-- Analyze the slow query log with pt-query-digest (Percona Toolkit),
-- run from the shell:
pt-query-digest /var/log/mysql/slow-query.log

Database Maintenance

PostgreSQL VACUUM

-- Reclaim space and update statistics
VACUUM ANALYZE users;

-- Full vacuum (locks table)
VACUUM FULL users;

-- Auto-vacuum settings
ALTER TABLE users SET (autovacuum_enabled = true);
ALTER TABLE users SET (autovacuum_vacuum_scale_factor = 0.1);
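
To check which tables are accumulating dead rows and whether autovacuum is keeping up, query the standard statistics view:

-- Tables with the most dead tuples
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;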

Reindex

-- PostgreSQL
REINDEX TABLE users;
REINDEX INDEX idx_users_email;

-- MySQL
OPTIMIZE TABLE users;

Partitioning Large Tables

PostgreSQL Table Partitioning

-- Create partitioned table
CREATE TABLE orders (
  id BIGSERIAL,
  customer_id INTEGER,
  order_date DATE,
  total DECIMAL(10,2),
  PRIMARY KEY (id, order_date)
) PARTITION BY RANGE (order_date);

-- Create partitions
CREATE TABLE orders_2024 PARTITION OF orders
FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

CREATE TABLE orders_2025 PARTITION OF orders
FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');

-- Indexes on partitioned table
CREATE INDEX idx_orders_customer ON orders(customer_id);

-- Queries automatically use correct partition
SELECT * FROM orders WHERE order_date = '2025-06-15';
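
Partition pruning can be confirmed with EXPLAIN; the plan should scan only the matching partition (orders_2025 here):

-- Verify partition pruning
EXPLAIN SELECT * FROM orders WHERE order_date = '2025-06-15';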

Performance Testing

Load Testing with pgbench

# Initialize test database
pgbench -i -s 50 testdb

# Run benchmark
pgbench -c 10 -j 2 -t 1000 testdb

# Custom script
cat > test.sql << EOF
\set id random(1, 100000)
SELECT * FROM users WHERE id = :id;
EOF

pgbench -c 10 -f test.sql -T 60 testdb

Database Checklist

  • ✓ Indexes on frequently queried columns
  • ✓ No SELECT * in production code
  • ✓ Connection pooling configured
  • ✓ Query result caching where appropriate
  • ✓ Slow query monitoring enabled
  • ✓ Regular VACUUM/ANALYZE (PostgreSQL)
  • ✓ Appropriate data types used
  • ✓ Foreign key constraints defined
  • ✓ Backup and recovery procedures tested
  • ✓ Performance baselines established

Conclusion

Database optimization is an ongoing process that requires monitoring, testing, and refinement. Start with proper indexing, optimize your queries, implement caching where appropriate, and use connection pooling. Always measure before and after optimization to ensure your changes have the desired effect.

Remember that premature optimization can be counterproductive. Focus first on correct functionality, then optimize based on actual performance metrics and user experience. The best optimization is often the simplest: proper indexes, efficient queries, and appropriate hardware resources.
