The database is the most fundamental part of your project, probably even more so than the programming language. I've seen many projects that chose their DB carelessly, or operated it with suboptimal practices, and suffered a great deal as a result. Here are some guidelines on how to choose a database and move forward happily.
2. Why bother
It's the most fundamental thing about the project
Even programming languages are switched more often
With the wrong choice you'll suffer. But why?
3. Different cases
I certainly know what I need
I'm not sure, something will do
This tech is cool!
Serious business: risk/benefit management
Small project: ??
Pet project: fun/learning management
4. I know what I need
Huge time series DB
Cross-datacenter replication
Petabytes of data
...
This talk is not for you
5. This tech is COOL
Pet projects
Experiments
Beware otherwise
6. Consider this
Data correctness (ACID, enforced schema)
Easy data modeling
Operational complexity
Migrations
Scaling
Project use cases: no one knows them yet!
7. Data correctness
Silently losing data is not much fun
Schema-less is a lie
Heterogeneous data is hard to analyze, change, display
Especially true in data-heavy projects
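As a minimal sketch of what an enforced schema buys you, the snippet below uses Python's built-in `sqlite3` module. The `measurements` table, its columns, and the `try_insert` helper are made up for illustration. The point is only that bad rows are rejected loudly at write time instead of surfacing later as heterogeneous data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: NOT NULL and CHECK constraints are the "enforced schema".
conn.execute(
    """
    CREATE TABLE measurements (
        id     INTEGER PRIMARY KEY,
        sensor TEXT NOT NULL,
        value  REAL NOT NULL CHECK (typeof(value) = 'real')
    )
    """
)


def try_insert(sensor, value):
    """Return True if the row was stored, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO measurements (sensor, value) VALUES (?, ?)",
                (sensor, value),
            )
        return True
    except sqlite3.IntegrityError:
        return False


accepted = try_insert("s1", 21.5)        # well-formed row
rejected_null = try_insert(None, 3.0)    # missing sensor name
rejected_type = try_insert("s2", "n/a")  # text where a number is expected
row_count = conn.execute("SELECT COUNT(*) FROM measurements").fetchone()[0]
```

In a schema-less store all three writes would succeed, and the cleanup would move into every piece of code that reads the data.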
8. Data modeling
In a document-oriented DB (e.g. MongoDB) we need to specially craft "tables" according to anticipated queries.
And re-craft them when queries change!
Much easier in an RDBMS: just dump it in adequate tables and slap some indexes on
Datomic is best here:
Entity-Attribute-Value
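To make the RDBMS point above concrete, here is a small sketch using Python's built-in `sqlite3` module. The `orders` table, the index name, and the query are invented for illustration. When a new query shows up, the table layout stays as it is and only an index is added. The `plan` helper returns the text SQLite reports for `EXPLAIN QUERY PLAN`; its exact wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        total       REAL NOT NULL
    )
    """
)


def plan(sql):
    """Return SQLite's query plan for `sql` as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)  # 4th column is the plan detail


# A query nobody anticipated when the table was designed.
NEW_QUERY = "SELECT id, total FROM orders WHERE customer_id = 42"

plan_before = plan(NEW_QUERY)  # no suitable index yet

# "Slap an index on": the data model itself is untouched.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")

plan_after = plan(NEW_QUERY)   # the planner can now use the new index
```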
9. Migrations
Often overlooked part
Keep them in repository!
Transactional DDL
Developing migrations in REPL
Downgrade migrations are useless
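A rough sketch of why transactional DDL matters, using SQLite through Python's `sqlite3` module (SQLite is one of the databases where schema changes take part in transactions). The `users` table and the deliberately broken second statement are made up. A migration that fails halfway leaves the schema exactly as it was.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so
# BEGIN / COMMIT / ROLLBACK are issued explicitly below.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")


def run_migration(statements):
    """Apply all statements atomically; return True on success."""
    conn.execute("BEGIN")
    try:
        for statement in statements:
            conn.execute(statement)
        conn.execute("COMMIT")
        return True
    except sqlite3.Error:
        conn.execute("ROLLBACK")  # undoes the DDL that already ran
        return False


def column_names(table):
    """List the column names of a table (hypothetical helper)."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


# The second statement fails because `missing_table` does not exist.
failed = run_migration([
    "ALTER TABLE users ADD COLUMN email TEXT",
    "CREATE TABLE copy AS SELECT * FROM missing_table",
])
columns_after_failure = column_names("users")

succeeded = run_migration(["ALTER TABLE users ADD COLUMN email TEXT"])
columns_after_success = column_names("users")
```

Without transactional DDL the first `ALTER TABLE` would have stuck, and re-running the fixed migration would fail on the already-existing column.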
10. Migration tools
Native migrations (SQL, CQL, Datalog, etc.) are best
Don't do auto-migrations ever
Don't use tools that give you auto-migrations
Use something like nomad, migrate
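The idea behind such tools can be sketched in a few lines. This is not how nomad or migrate are implemented, only an illustration under assumptions: the migrations are hand-written native SQL kept in an ordered list (in a real project, files in the repository), and a hypothetical `schema_migrations` table records which ones have been applied. It uses Python's built-in `sqlite3` module.

```python
import sqlite3

# Hand-written, ordered, native SQL. Names and statements are illustrative.
MIGRATIONS = [
    ("001_create_users",
     "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"),
    ("002_add_email",
     "ALTER TABLE users ADD COLUMN email TEXT"),
]


def migrate(conn, migrations):
    """Apply migrations that are not yet recorded; return their names."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_migrations (name TEXT PRIMARY KEY)"
    )
    applied = {row[0] for row in conn.execute("SELECT name FROM schema_migrations")}
    newly_applied = []
    for name, sql in migrations:
        if name in applied:
            continue
        conn.execute("BEGIN")
        try:
            conn.execute(sql)
            conn.execute("INSERT INTO schema_migrations (name) VALUES (?)", (name,))
            conn.execute("COMMIT")
        except sqlite3.Error:
            conn.execute("ROLLBACK")
            raise
        newly_applied.append(name)
    return newly_applied


conn = sqlite3.connect(":memory:", isolation_level=None)  # explicit transactions
first_run = migrate(conn, MIGRATIONS)   # applies both migrations
second_run = migrate(conn, MIGRATIONS)  # nothing left to do
```

Nothing here is generated from model classes: every schema change is a statement somebody wrote, reviewed, and committed.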
12. Complex migrations
Create new column/table/database
Read from the old place and write to both
Migrate all old data to the new place
Read from both places and compare
Clean up old code and data
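The steps above can be sketched end to end on a toy example. Everything named here is hypothetical: a `full_name` column in `users` is being moved to a new `profiles` table, and `save_name`, `backfill`, and `mismatches` stand in for application code. The sketch uses Python's built-in `sqlite3` module and stops before the final clean-up step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, full_name TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO users (id, full_name) VALUES (?, ?)",
    [(1, "Ada Lovelace"), (2, "Alan Turing")],
)

# Step 1: create the new place.
conn.execute(
    "CREATE TABLE profiles (user_id INTEGER PRIMARY KEY, display_name TEXT NOT NULL)"
)


# Step 2: keep reading from the old place, but write to both.
def save_name(user_id, name):
    with conn:  # both writes commit or roll back together
        conn.execute(
            "INSERT OR REPLACE INTO users (id, full_name) VALUES (?, ?)",
            (user_id, name),
        )
        conn.execute(
            "INSERT OR REPLACE INTO profiles (user_id, display_name) VALUES (?, ?)",
            (user_id, name),
        )


# Step 3: migrate old data, skipping rows that dual writes already created.
def backfill():
    with conn:
        conn.execute(
            "INSERT OR IGNORE INTO profiles (user_id, display_name) "
            "SELECT id, full_name FROM users"
        )


# Step 4: read from both places and compare.
def mismatches():
    return conn.execute(
        """
        SELECT u.id
        FROM users u
        LEFT JOIN profiles p ON p.user_id = u.id
        WHERE p.user_id IS NULL OR p.display_name <> u.full_name
        ORDER BY u.id
        """
    ).fetchall()


save_name(3, "Grace Hopper")        # a write arriving during the migration
before_backfill = mismatches()      # old rows are still missing in profiles
backfill()
after_backfill = mismatches()       # empty once both places agree
```

Only when the comparison stays empty for long enough is it safe to switch reads over and do step 5.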
17. ORMs considered harmful
They provoke massive data over-fetch
Easy-to-miss 1+N queries
Hard to refactor and move parts of data to other DBs
Very leaky abstraction
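The 1+N problem is easy to reproduce without any ORM. The sketch below, with invented `authors` and `books` tables and a hypothetical `query` wrapper that counts round trips, mimics what lazy loading of a relation does and compares it with a single join. It uses Python's built-in `sqlite3` module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE books (
        id        INTEGER PRIMARY KEY,
        author_id INTEGER NOT NULL REFERENCES authors(id),
        title     TEXT NOT NULL
    );
    INSERT INTO authors VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO books VALUES (1, 1, 'A1'), (2, 1, 'A2'), (3, 2, 'B1');
    """
)

stats = {"queries": 0}


def query(sql, params=()):
    """Run a query and count it, so round trips become visible."""
    stats["queries"] += 1
    return conn.execute(sql, params).fetchall()


def titles_lazy():
    """One query for the authors, then one more per author (1+N)."""
    result = {}
    for author_id, name in query("SELECT id, name FROM authors ORDER BY id"):
        rows = query(
            "SELECT title FROM books WHERE author_id = ? ORDER BY id",
            (author_id,),
        )
        result[name] = [title for (title,) in rows]
    return result


def titles_joined():
    """The same data fetched with a single join."""
    result = {}
    rows = query(
        """
        SELECT a.name, b.title
        FROM authors a
        LEFT JOIN books b ON b.author_id = a.id
        ORDER BY a.id, b.id
        """
    )
    for name, title in rows:
        result.setdefault(name, [])
        if title is not None:  # authors without books still get an entry
            result[name].append(title)
    return result


stats["queries"] = 0
lazy = titles_lazy()
lazy_queries = stats["queries"]

stats["queries"] = 0
joined = titles_joined()
joined_queries = stats["queries"]
```

Both functions return the same dictionary, but the lazy version issues one query per author on top of the first one. With an ORM the inner query is hidden behind an attribute access, which is why it is so easy to miss.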
34. Availability
Services mostly die from other problems
Untested "available" DB can be a problem
A properly available system (CAP-available) is a pain and a resource sink