20 Tips for Great Database Creation and Best Schema Practices
Normalize for integrity, but denormalize for performance. Start by structuring your database in 3rd Normal Form to reduce redundancy, but introduce denormalization selectively if specific joins are proven bottlenecks.
Every table must have a Primary Key. This is a strict requirement, as it acts as the unique identifier for a row, allows you to reliably update or delete records, and helps the database engine organize data efficiently.
Use surrogate keys over natural keys. Natural keys (like a Social Security Number or email) can change, forcing you to update every related foreign key. Surrogate keys (like UUIDs or auto-incrementing IDs) are immutable and safer.
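A minimal sketch of the surrogate-key idea, using SQLite's in-memory engine (the `users` table and its columns are illustrative, not from any particular system):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Surrogate primary key: an auto-incrementing id the business never sees.
# A UNIQUE constraint still protects the natural key (email) from duplicates.
conn.execute("""
    CREATE TABLE users (
        id    INTEGER PRIMARY KEY AUTOINCREMENT,
        email TEXT NOT NULL UNIQUE
    )
""")
conn.execute("INSERT INTO users (email) VALUES ('a@example.com')")
# The natural key can change freely without touching any foreign key,
# because related tables reference the immutable id, not the email.
conn.execute("UPDATE users SET email = 'b@example.com' WHERE id = 1")
row = conn.execute("SELECT id, email FROM users").fetchone()
print(row)  # (1, 'b@example.com')
```

Note the best of both worlds: the surrogate `id` is the join target, while the UNIQUE constraint keeps the natural key honest.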
Use NOT NULL by default. Allowing NULLs introduces complex “three-valued logic.” Forcing columns to be NOT NULL ensures data quality at the point of entry and simplifies application logic.
Enforce proper data types. Never store dates or numbers as strings. Proper data types allow the database to validate information, compress storage, and let you utilize built-in functions.
Standardize your naming conventions. Pick a consistent format (like snake_case), choose either singular or plural for your table names (e.g., user vs. users), and never mix them.
Always use Foreign Keys. Foreign keys are the only way to guarantee that a parent record actually exists before creating a related child record, which prevents orphaned data from breaking your application.
Maintain a “Single Source of Truth.” Do not store calculated values (such as a total price derived from quantity * price) directly in a table unless extreme performance requires it, to avoid the risk of values falling out of sync.
Include created_at and updated_at timestamps. Without these audit columns, you cannot properly troubleshoot data corruption or sync effectively to external analytics tools.
Version your schema. Never run manual ALTER TABLE commands in production. Track every change using migration tools (like Flyway or Liquibase) and version control.
Isolate Personally Identifiable Information (PII). Store sensitive data like passwords and addresses in heavily restricted, separate tables to apply stricter access controls.
Use Check Constraints. Constraints like CHECK (price > 0) act as a final line of defense against application bugs, ensuring your data remains logically sound.
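A quick demonstration of a check constraint acting as that last line of defense, sketched with SQLite in memory (the `products` table is hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE products (
        id    INTEGER PRIMARY KEY,
        price REAL NOT NULL CHECK (price > 0)
    )
""")
conn.execute("INSERT INTO products (price) VALUES (9.99)")
try:
    # Even if a buggy application layer sends a negative price,
    # the database itself refuses the row.
    conn.execute("INSERT INTO products (price) VALUES (-1)")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
count = conn.execute("SELECT COUNT(*) FROM products").fetchone()[0]
print(count)  # 1 — only the valid row made it in
```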
Design for the “Delete.” Clearly define what should happen when a parent record is removed using ON DELETE CASCADE (to remove children) or SET NULL (to keep them), preventing database errors and garbage data.
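A sketch of cascading deletes with SQLite (note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`; the `orders`/`order_items` names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite requires this per connection
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
conn.execute("""
    CREATE TABLE order_items (
        id       INTEGER PRIMARY KEY,
        order_id INTEGER NOT NULL
                 REFERENCES orders(id) ON DELETE CASCADE
    )
""")
conn.execute("INSERT INTO orders (id) VALUES (1)")
conn.execute("INSERT INTO order_items (order_id) VALUES (1)")
# Removing the parent automatically removes its children —
# no orphaned rows, no manual cleanup query.
conn.execute("DELETE FROM orders WHERE id = 1")
left = conn.execute("SELECT COUNT(*) FROM order_items").fetchone()[0]
print(left)  # 0
```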
Index your Foreign Keys. Most databases do not index foreign keys automatically. Since almost every join uses a foreign key, failing to index them is a massive cause of slow performance.
Utilize Partial Indexes. If you frequently query a specific subset of data (like WHERE status = 'active'), create a partial index to keep it tiny and incredibly fast compared to indexing the entire table.
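A partial index in miniature, again using SQLite (which has supported `CREATE INDEX ... WHERE` since 3.8.0; the `tasks` table and `status` values are made up for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, status TEXT NOT NULL)")
# Index only the small 'active' slice rather than the whole table;
# the index stays tiny no matter how much archived data accumulates.
conn.execute(
    "CREATE INDEX idx_tasks_active ON tasks (status) WHERE status = 'active'")
conn.executemany("INSERT INTO tasks (status) VALUES (?)",
                 [("active",)] * 3 + [("archived",)] * 100)
active = conn.execute(
    "SELECT COUNT(*) FROM tasks WHERE status = 'active'").fetchone()[0]
print(active)  # 3
# On typical SQLite builds, EXPLAIN QUERY PLAN for the query above
# reports a search using idx_tasks_active rather than a full scan.
```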
Prefer BIGINT over INT. A signed 32-bit INT caps out at 2,147,483,647, so an auto-incrementing ID exhausts itself at roughly 2.1 billion rows. For high-traffic tables, use BIGINT from the start to avoid an outage when the sequence runs out.
Use schemas and namespaces. Keep your database organized by grouping related tables into schemas (like billing, audit, and public) rather than dumping hundreds of tables into one space.
Document your columns. Add comments directly to the database columns (e.g., COMMENT ON COLUMN) so that the “living documentation” stays intimately tied to the data itself.
Use BOOLEAN instead of INT(1). A proper true/false flag is far more semantic, and in databases with a native BOOLEAN type (such as PostgreSQL) it also rejects invalid integers (like “99”) that a tiny integer column would silently accept.
Design in three tiers. Create a conceptual data model for business scope, a logical data model for detailed attributes without technical compromise, and a physical data model tailored to your specific database software.
10 Things People Constantly Do Wrong
Creating “The God Table.” Cramming 150+ columns into a single table makes the database slow to query, difficult to reason about, and implies multiple entities were improperly mashed together.
Using SELECT *. Fetching every single column wastes network I/O, prevents the database from answering the query with a highly optimized “covering index,” and risks breaking the application when columns are added, removed, or reordered.
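A small illustration of the covering-index point using SQLite (the `users` table is hypothetical; the exact plan wording varies by database and version):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, bio TEXT)")
conn.execute("CREATE INDEX idx_users_email ON users (email)")
conn.execute("INSERT INTO users VALUES (1, 'a@example.com', 'a very long bio')")

# SELECT * must visit the table row itself to fetch every column (like bio).
star_plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM users WHERE email = 'a@example.com'"
).fetchall()

# Naming only the indexed column lets the index alone answer the query;
# SQLite typically reports a COVERING INDEX here.
narrow_plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT email FROM users WHERE email = 'a@example.com'"
).fetchall()

print(star_plan)
print(narrow_plan)
```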
Storing files or images directly in the database. Pushing large BLOBs into relational tables makes backups massive and painfully slow. You should store the actual files in an object store (like S3) and only save the file path in the database.
Falling for the “N+1 Query Problem.” Fetching a list of 100 records and then looping through your application to run 100 separate queries for their related data is a top application killer. You should use a JOIN to fetch it all in one network trip.
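The N+1 pattern and its JOIN fix, side by side in a SQLite sketch (the `authors`/`books` schema is invented for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE books   (id INTEGER PRIMARY KEY, author_id INTEGER NOT NULL,
                          title TEXT NOT NULL);
    INSERT INTO authors VALUES (1, 'Ann'), (2, 'Bo');
    INSERT INTO books   VALUES (1, 1, 'A1'), (2, 1, 'A2'), (3, 2, 'B1');
""")

# N+1 anti-pattern: one query for the list, then one query per row.
n_plus_1 = []
for author_id, name in conn.execute(
        "SELECT id, name FROM authors ORDER BY id"):
    for (title,) in conn.execute(
            "SELECT title FROM books WHERE author_id = ? ORDER BY id",
            (author_id,)):
        n_plus_1.append((name, title))

# The fix: one JOIN, one round trip to the database.
joined = conn.execute("""
    SELECT a.name, b.title
    FROM authors a JOIN books b ON b.author_id = a.id
    ORDER BY a.id, b.id
""").fetchall()

print(joined == n_plus_1)  # True — same result, 1 query instead of N+1
```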
Overusing “Soft Deletes.” Adding an is_deleted column to everything breaks unique constraints and forces developers to clutter every single future query with WHERE is_deleted = false.
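Where a soft delete genuinely is needed, a partial unique index can repair the broken-unique-constraint problem. A sketch with SQLite (the `accounts` table is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE accounts (
        id         INTEGER PRIMARY KEY,
        email      TEXT NOT NULL,
        is_deleted INTEGER NOT NULL DEFAULT 0
    )
""")
# A plain UNIQUE(email) would block re-registering a soft-deleted address.
# A partial unique index enforces uniqueness only among live rows.
conn.execute("""
    CREATE UNIQUE INDEX idx_accounts_live_email
    ON accounts (email) WHERE is_deleted = 0
""")
conn.execute("INSERT INTO accounts (email) VALUES ('a@example.com')")
conn.execute("UPDATE accounts SET is_deleted = 1 WHERE email = 'a@example.com'")
conn.execute("INSERT INTO accounts (email) VALUES ('a@example.com')")  # allowed
live = conn.execute(
    "SELECT COUNT(*) FROM accounts WHERE is_deleted = 0").fetchone()[0]
print(live)  # 1 — one live row, one soft-deleted row
```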
Performing math inside WHERE clauses. Wrapping an indexed column in a function or calculation (e.g., WHERE YEAR(date) = 2023) completely blinds the database to the index. Keep the column “naked” on one side of the operator.
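The sargable rewrite in practice, sketched with SQLite and an invented `events` table (dates stored as ISO-8601 text, SQLite's usual convention):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, created_at TEXT NOT NULL)")
conn.execute("CREATE INDEX idx_events_created ON events (created_at)")
conn.executemany("INSERT INTO events (created_at) VALUES (?)",
                 [("2022-06-01",), ("2023-03-15",),
                  ("2023-11-30",), ("2024-01-02",)])

# Anti-pattern: wrapping the column in a function hides the index.
wrapped = conn.execute(
    "SELECT COUNT(*) FROM events WHERE strftime('%Y', created_at) = '2023'"
).fetchone()[0]

# Sargable rewrite: keep the column "naked" and compare against a range,
# so the database can seek directly into the index.
ranged = conn.execute(
    "SELECT COUNT(*) FROM events "
    "WHERE created_at >= '2023-01-01' AND created_at < '2024-01-01'"
).fetchone()[0]

print(wrapped, ranged)  # 2 2 — same answer, only the second can use the index
```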
Abusing EAV (Entity-Attribute-Value) patterns. Storing your schema as rows of attributes and values might seem infinitely flexible, but it makes even basic queries require dozens of joins, turning data retrieval into a nightmare.
Never testing backups. It is easy to believe you have a backup strategy when you have never successfully practiced a “Point-in-Time Recovery” (PITR). A backup is useless until you prove you can restore it.
Sorting with ORDER BY RAND(). This clause forces the database to assign a random number to every single row, sort the entire table, and then discard all but one result, which makes it a severe performance bottleneck on large tables.
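One common alternative is to pick a random key and do a single indexed lookup instead of sorting the whole table. A sketch with SQLite (the `quotes` table is hypothetical; this approach assumes reasonably dense ids and is slightly biased when there are gaps):

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE quotes (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")
conn.executemany("INSERT INTO quotes (body) VALUES (?)",
                 [(f"quote {i}",) for i in range(1, 101)])

# Instead of ORDER BY RANDOM() over the whole table, draw a random id
# and fetch the first row at or above it — one primary-key seek.
max_id = conn.execute("SELECT MAX(id) FROM quotes").fetchone()[0]
pick = random.randint(1, max_id)
row = conn.execute(
    "SELECT id, body FROM quotes WHERE id >= ? ORDER BY id LIMIT 1", (pick,)
).fetchone()
print(row is not None)  # True — no full-table sort required
```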
Using the database as a Message Queue. Treating a relational table as a high-volume “to-do list” for background jobs causes massive bloat and lock contention. Dedicated tools like RabbitMQ or Kafka should be used instead.