Because the actual process of upgrading Postgres is terrible.
I say this as someone who absolutely loves using it, but the actual process of upgrading Postgres takes significant downtime, is error-prone, and you're often better off just dumping all the databases and reimporting them into a new data directory. (A good idea in general, since it'll vacuum and compact indexes as well if you do it, combining a few maintenance routines in your upgrade path.)
It requires having the previous version of Postgres installed, something which can mess with a number of distro policies (not to mention docker, which is the most popular way to deploy software that will typically rely on Postgres), and unlike most software with that issue, Postgres is software you want to be managed by your distro.
Therefore, most people only upgrade by necessity - when their distro forces the upgrade or the version they're using reaches EOL.
kardianos [3 hidden]5 mins ago
There are two primary issues with PostgreSQL:
* Upgrades
* Database Protocol
MS SQL Server does better on both of these fronts: upgrading, when running MS SQL Server in a container, is just restarting the database with the new database engine version. The DBMS sees the old files, upgrades them in place, and starts the DB completely seamlessly 100% of the time.
The MS SQL Server protocol, TDS, supports native named parameters and executes the T-SQL procedural language inline with SQL. TDS also supports inline query cancellation because the protocol is framed, and both T-SQL and TDS support returning multiple result sets from a single query.
But yes, whenever I see the PG docs saying that major versions require an export and re-load, I shudder.
Tostino [3 hidden]5 mins ago
As someone who has a decade of professional postgres experience (and built my company on it), and 6 years of SqlServer experience, I couldn't agree more with your comment. I really, really like the TDS wire protocol and what it enables.
I also really miss their query optimizer. It was just so damn smart.
chasil [3 hidden]5 mins ago
I've only done it once, but doing an "rpm -Uvh" on the SQL Server binaries in Linux, then starting up the database did an immediate upgrade of the database files.
Oracle's DBUA is so much more painful.
whartung [3 hidden]5 mins ago
I'm sorry, not to quibble about the quality of the TDS protocol, but what does it have to do with the upgradability of SqlServer?
Tostino [3 hidden]5 mins ago
Nothing, they're unrelated.
However, I agree with OP that SQL Server does have a solid upgrade story and it's generally pain free.
They are just two aspects I really wish Postgres could do better on, and I miss working with those parts of SQL Server.
jeltz [3 hidden]5 mins ago
While I partially agree, the solution to the multiple-versions problem is pretty simple: use the official RPM and APT repos.
But obviously this is not suitable for all use cases. I don't know why pg doesn't invest in this sector.
thomaslord [3 hidden]5 mins ago
I'll confess - I have a project that uses Heroku's managed Postgres and my preferred upgrade method is to set the maintenance window to the middle of the night, create a backup, and be awake at 1am to make sure that nothing is broken after they force the upgrade. Their auto-upgrade process hasn't failed me so far, but there's no way to manually trigger it.
ktosobcy [3 hidden]5 mins ago
This!
I already mentioned this a while back and was basically berated (by someone that seemed like a dev/close to a dev) that the current setup is just fine and that because of Postgres's complexity (extensions) it has to be done that way... and while I like Postgres a lot, it's quite annoying that the upgrade is such a terrible experience... :|
anymouse123456 [3 hidden]5 mins ago
Also a fan of Postgresql, but compiling and installing this tool is also a complete nightmare.
Every bit of the experience before getting it working feels like it was designed by an MBA who owns a Postgres hosting service.
Other than that, it's amazing.
forinti [3 hidden]5 mins ago
I have always compiled from source so that I can have more than one version in place. The installation takes less than 1GB, it's quite easy and fast to compile, and I've never had any issues with upgrades. If you forget to compile an extension, you can do it later, copy the .so to the right folder, and you don't even have to restart the cluster.
If you don't want to use pg_upgrade, you can dump from one cluster and pipe directly into another, with no need for a temporary file. It couldn't be easier.
Can't say the same for Oracle. Even applying a patch can ruin several weekends.
icedchai [3 hidden]5 mins ago
I worked at a place that was running an 8-year-old install of Oracle 8i because upgrading was just too painful. Oracle 10g was out by that point. It's been over 15 years since... I wonder if they ever upgraded?
literalAardvark [3 hidden]5 mins ago
> It couldn't be easier
If you have a trivial database, yes.
If you have a large, complicated database that takes hours/days to restore and has an HA SLA, very hell no.
forinti [3 hidden]5 mins ago
That's not going to be easy with any database.
Too [3 hidden]5 mins ago
With mongodb in a HA setup you can perform a staggered upgrade of individual nodes in the cluster with no downtime. Very smooth in my experience.
tgv [3 hidden]5 mins ago
Add to that: if it works, don't fix it.
chefandy [3 hidden]5 mins ago
And considering how many people use some sort of abstraction layer like an ORM, have databases installed as part of some deployed application, or just do basic CRUD stuff, it probably would have worked on a postgres install from the 90s.
I keep an eye out for vulnerabilities, and that's about it.
beeboobaa3 [3 hidden]5 mins ago
This was solved by logical replication a few versions ago.
1. Leave your old database running
2. Start your new database with new version
3. Run logical replication and let it catch up
4. Decide you want to switch over.
5. Using a script: Stop your apps writing to the database (or switch to read only mode), let logical replication catch up one final time, stop logical replication, run a procedure to fix your sequences, and switch your apps over to the new database.
You are now done. Sure, it requires a one-time investment to work out the exact steps needed for your situation, but it's easy to do with minimal downtime.
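A minimal SQL sketch of the replication part (steps 2-5), assuming a single database and hypothetical names (pub_upgrade, sub_upgrade, old-db); slot cleanup, DDL changes and large objects are left out:

    -- On the old (source) cluster: publish every table.
    CREATE PUBLICATION pub_upgrade FOR ALL TABLES;

    -- On the new (target) cluster, after restoring the schema (pg_dump --schema-only):
    CREATE SUBSCRIPTION sub_upgrade
        CONNECTION 'host=old-db dbname=app user=replicator'
        PUBLICATION pub_upgrade;

    -- Watch catch-up on the source until the slot is (nearly) caught up.
    SELECT slot_name, confirmed_flush_lsn FROM pg_replication_slots;

    -- At cutover: stop writes, let it drain, fix sequences, then detach.
    DROP SUBSCRIPTION sub_upgrade;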
Of course if you don't care about downtime just shut down your database server, make a backup(!), and then do pg_upgrade.
ellisv [3 hidden]5 mins ago
> Of course if you don't care about downtime just shut down your database server, make a backup(!), and then do pg_upgrade.
I think this is the right answer for 99% of companies. Yes, everyone wants zero downtime, but I see incredibly few cases where it is actually worth the cost and effort.
RegnisGnaw [3 hidden]5 mins ago
This has not been solved by logical replication. There are features in PostgreSQL not supported by logical replication. One of the issues I'm dealing with is that our product switched from Oracle to PostgreSQL and the ORM stuff loves LOBs, and you know LOBs can't be logically replicated.
crabbone [3 hidden]5 mins ago
> Postgres is software you want to be managed by your distro.
Why?
For production systems I generally try to avoid using anything not built internally. When it comes to PostgreSQL, for example, if you want to submit benchmarks, or you want to report bugs, an important or sometimes mandatory component is to provide compilation flags. Also, seeing how every kind of storage can act in a different way, tuning of higher-level storage programs seems inevitable. Even though PostgreSQL has a lot of configuration that can be applied to the compiled program, some settings (e.g. block size) are compile-time options that can only be changed by recompiling.
digitallis42 [3 hidden]5 mins ago
Depends on your scale. If you're a startup or even just a small side service, performance isn't going to be a bottleneck and you/the org wants the thing to be fire and forget, including bug fixes and especially security patches. A distro takes care of all of those generally and makes sure the dependencies have the same care taken.
throw0101c [3 hidden]5 mins ago
> A distro takes care of all of those generally and makes sure the dependencies have the same care taken.
This way when you upgrade your OS you don't have to worry about suddenly getting a new version (which, in the case of MySQL, may take a long time to convert its database files to the new version format).
necovek [3 hidden]5 mins ago
> This way when you upgrade your OS you don't have to worry about suddenly getting a new version (which, in the case of MySQL, may take a long time to convert its database files to the new version format).
At least on Ubuntu (and likely Debian), your existing Postgres version is never dropped until you are ready to manually upgrade yourself.
RealStickman_ [3 hidden]5 mins ago
Using third party repos makes upgrading the OS in general more difficult though and is more dangerous than simply using what your distro provides.
jamespo [3 hidden]5 mins ago
How does that work for python/perl/ruby libs etc?
MichaelZuo [3 hidden]5 mins ago
So the real question is, why is the upgrade process so incompetently designed, and why has no one fixed this?
phil21 [3 hidden]5 mins ago
My opinion is Postgres was designed by software developers for software developers. The split on “which relational database to use” in my career has almost always been perfectly split between SWE vehemently demanding pgsql for the feature set, and the sysadmins having to support maintenance and production availability preferring MySQL.
One of the few things I’ve enjoyed with the move into devops and companies forcing previously “pure” developers into operational roles was their discovery that Postgres was utterly horrible to administer at a systems level. Apparently us lowly sysadmins may have had a point after all.
This is a bit tongue in cheek but really not far from my lived reality. When the focus is on features and “correctness” at the near total expense of sane systems tooling folks can develop some myopia on the subject. So many arguments with devs on my teams over this subject that were utterly horrified to find we were running MySQL for a given service.
Open source projects tend to fix the pain points their contributors experience, and I assume there were not too many contributors wanting to deal with the boring work of making administration an easy task - it's thankless "sideways" work that won't result in many accolades or personal satisfaction for most SWEs.
The end users are almost always developers, most of whose experiences in production entail either the equivalent of a docker container level scale system, or are simply given a connection string and the rest is a black box to them. Under those contexts I’d personally prefer Postgres as well and it wouldn’t even be close. When you get into backups, clustering, upgrades, and high availability under extreme load? IMO the story falls apart real fast.
jeltz [3 hidden]5 mins ago
That has not been my experience at all. The sysadmins I have worked with have strongly preferred PostgreSQL over running MySQL, while the developers have cared, but less so. The reason is that when something goes wrong PostgreSQL is much easier to diagnose. So while some tasks like upgrades take less manual effort with MySQL, PostgreSQL is more stable and gives better error messages when you get paged in the middle of the night.
PostgreSQL has prioritized correctness and stability, which, while it has made certain features more clunky to use or taken longer for them to be implemented, means that when you get paged in the middle of the night you get the time you spent back. PostgreSQL has also spent a lot of time on improving the DBA experience, so mixed DBA/sysadmins also usually prefer PG.
sgarland [3 hidden]5 mins ago
> The split on “which relational database to use” in my career has almost always been perfectly split between SWE vehemently demanding pgsql for the feature set
I’ve seen this as well, but when pressed, none of them could articulate what part of its feature set they actually needed to use.
> One of the few things I’ve enjoyed with the move into devops and companies forcing previously “pure” developers into operational roles was their discovery that Postgres was utterly horrible to administer at a systems level.
Are you (or your devs, rather) actually running your own DBs? If so, respect. My experience has been that they spin up either a hideously under or over-provisioned RDS or Aurora instance, and then never touch it until it breaks, at which point they might ask for help, or they might just make it bigger.
ttfkam [3 hidden]5 mins ago
> none of them could articulate what part of its feature set they actually needed to use.
Transactional DDL: migration errors never leave the database in an intermediate/inconsistent state.
Range types + exclusion constraint: just no way to do this in MySQL without introducing a race condition.
Writeable CTEs: creating insert/update/delete pipelines over multiple tables deterministically. Seriously though, the RETURNING clause is something I use all the time both in and out of CTEs.
Filtered aggregates and grouping sets: cleanly get multiple data points for a dashboard in one shot.
Unnest: converting arrays into a set of rows. Inverse of array_agg(...).
Types: arrays, booleans, IP/subnets, UUIDs (without binary(16) hacks), etc.
Materialized views: seriously, how does MySQL not have this yet?
Statement-level triggers: another option from per-row.
Row-level security: setting data visibility based on configurable policies.
I can cite specific use cases I've deployed to production for each of these and more.
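As a rough illustration of two of those (range types + exclusion constraints, and writable CTEs with RETURNING), with made-up table names; the archive table is assumed to exist:

    -- btree_gist is a bundled contrib extension; it lets plain columns like
    -- room_id participate in a GiST exclusion constraint.
    CREATE EXTENSION IF NOT EXISTS btree_gist;

    -- Range type + exclusion constraint: no two bookings for the same room may overlap.
    CREATE TABLE booking (
        room_id int       NOT NULL,
        during  tstzrange NOT NULL,
        EXCLUDE USING gist (room_id WITH =, during WITH &&)
    );

    -- Writable CTE + RETURNING: archive and delete old rows in one statement.
    WITH moved AS (
        DELETE FROM booking
        WHERE upper(during) < now() - interval '1 year'
        RETURNING *
    )
    INSERT INTO booking_archive SELECT * FROM moved;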
evanelias [3 hidden]5 mins ago
That's a good list. A handful of these are already doable in modern MySQL and/or MariaDB though.
JSON can often be used in place of arrays, and JSON_TABLE in both MySQL and MariaDB converts JSON into tabular data. MySQL supports multi-valued indexes over JSON, where each row can have multiple index entries (or no entries, e.g. partial index).
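For instance, something along these lines should work in MySQL 8.0 / recent MariaDB (a sketch, not checked against every version):

    -- Expand a JSON array into rows, roughly MySQL's answer to unnest().
    SELECT jt.tag
    FROM JSON_TABLE(
        '["red", "green", "blue"]',
        '$[*]' COLUMNS (tag VARCHAR(32) PATH '$')
    ) AS jt;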
MariaDB has built-in convenience types for ipv4, ipv6, and uuid. Or in MySQL you can just use virtual columns to add human-readable conversions of binary columns, although that is admittedly slightly annoying.
MariaDB supports RETURNING.
ttfkam [3 hidden]5 mins ago
> My experience has been that they spin up either a hideously under or over-provisioned RDS or Aurora instance, and then never touch it until it breaks
That's a true shame considering how easy it is to make a read replica of any size and then fail over to it as the new primary. Definite skill issues.
dalyons [3 hidden]5 mins ago
> My experience has been that they spin up either a hideously under or over-provisioned RDS or Aurora instance, and then never touch it until it breaks, at which point they might ask for help, or they might just make it bigger.
Yep, that's exactly what I've seen too :). I still overall prefer this distributed database model - yes, you spend more and people make mistakes (and learn). But if you can afford it you get higher velocity and more incentive-aligned ownership than the old central gatekeeping DBA team model.
sgarland [3 hidden]5 mins ago
I don't mind the model IFF the team has interest in learning how to do it correctly. My biggest complaint as both an SRE and now a DBRE has been that dev-managed infrastructure inevitably means that during an incident I had nothing to do with, I'll be paged to fix it anyway. Actually, that's not the problem; the problem is later, when I explain precisely how and why it broke, and how to avoid it in the future, there's rarely any interest in doing so.
“You have an unpartitioned table with a few billion rows and a UUIDv4 PK. I’m amazed it’s been working as long as it has. You need to change your schema.”
“Or we can just buy a bigger instance.”
“…”
Rinse and repeat six months later. I’m aware this is an organizational problem, but from what I’ve seen, it’s endemic.
Re: incentives, yes, also a problem. Dev teams are generally incentivized by Product, who doesn't care at all whether or not something is optimal, only that new features are being shipped. I despise this mentality, but it's not usually the devs' fault.
necovek [3 hidden]5 mins ago
UUID (version does not matter for storage, only for generation and distribution) is basically a 128-bit unsigned int, so a double "word" on 64-bit platforms, and it's natively supported by Postgres since at least 8.3 (earliest version with docs up).
While most versions ensure it's random, there are plenty of indexing algorithms that make searching through that quick and close to O(1), so that should not be the schema problem.
Unless you used a string field, but there is a quick workaround for that with Postgres too (make an index on `UUID(field)`, and look it up by `UUID(value) = UUID(field)`).
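A sketch of that workaround, assuming a text column id_text that actually holds UUIDs:

    -- Expression index over the cast, so lookups by UUID value can use it.
    CREATE INDEX idx_t_id_uuid ON t ((id_text::uuid));

    -- The query must use the same expression for the planner to pick the index.
    SELECT * FROM t WHERE id_text::uuid = '550e8400-e29b-41d4-a716-446655440000'::uuid;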
That's why both "devops" (DBAs?) and "devs" prefer Postgres over many other databases: you can easily handle some small mistakes in schemas too while you do the full migration in parallel.
sgarland [3 hidden]5 mins ago
Who said this was Postgres? MySQL (with the default InnoDB engine) and MSSQL both use clustered indexes; they store tuples around the PK. For a UUIDv4 PK – or anything else non-k-sortable, for that matter – this results in a massive amount of B+tree bloat from the random inserts.
But sure, let’s talk about Postgres. After all, it stores tuples in a heap, and so is immune to this behavior.
Except that its MVCC implementation means that it has to maintain a Visibility Map [0] to keep track of which pages contain only tuples which are visible to every active transaction. This is nominally used for vacuum decisions, but is also cleverly exploited for use with index-only scans. This poses a problem when referencing non-k-sortable entries, because while the tuples are in a heap, the indices are generally a B+tree. So now we’re back to the problem of massive IO amplification. Even if the VM and index are entirely cached, reads are reads, and they add up.
Then there’s the issue of WAL bloat due to full page writes. tl;dr Postgres pages are nominally 8 KiB, and storage device pages are nominally 4 KiB. To guarantee an atomic write, Postgres writes the entire page for a given tuple for the first write after checkpoint, regardless of how many bytes were altered. Again, non-k-sortable: if your tuples are spread randomly across pages, you now have far more pages to write. This can and does matter for either write-heavy workloads, instances with limited network bandwidth (oh hi small RDS / Aurora), or the worst, both combined.
Re: search complexity, I’m pretty sure B+trees (and B-trees) have O(log n) time complexity, not O(1). Whether or not that’s “close” depends on n, I suppose, but in this scenario I specifically said “billions of rows.”
> That's why both "devops" (DBAs?) and "devs" prefer Postgres over many other databases
I’m a DBRE, and like both MySQL and Postgres. They both have strengths and weaknesses, but you need to deeply understand those – and properly design your schema and query patterns around them – in order to make an informed decision.
> Rinse and repeat six months later. I’m aware this is an organizational problem, but from what I’ve seen, it’s endemic.
Easy enough: almost no one writes SQL queries by hand these days, not for querying the database nor for doing schema upgrades. It's all done by tools - Doctrine in the PHP world for example. And pretty much no one but actual CS graduates knows anything deeper about databases.
Result is, devs are happy enough if they found something that works, and don't want to risk being the one who broke prod because they applied some schema change suggested by their DBA who doesn't know some random thing about the application.
Tostino [3 hidden]5 mins ago
You would be very surprised to see the workflow of DB heavy development teams. Some oracle devs have entire backends coded in pl/sql.
My last company had an absolute ton of pl/pgsql written to support hundreds of ELT pipelines, migrations were all hand written and managed with liquibase.
There are more of them than you'd think out there. Just generally supporting some boring b2b or backend software.
mxey [3 hidden]5 mins ago
As someone who operates both I much prefer to run pg_upgrade every few years rather than dealing with mysqldump, replication issues and babysitting Orchestrator.
literalAardvark [3 hidden]5 mins ago
just as long as you don't forget to reindex after
mxey [3 hidden]5 mins ago
> All failure, rebuild, and reindex cases will be reported by pg_upgrade if they affect your installation; post-upgrade scripts to rebuild tables and indexes will be generated automatically.
djbusby [3 hidden]5 mins ago
It's a step of your script or runbook. Right?
eru [3 hidden]5 mins ago
Of course, sqlite is even easier on the sys-admins, (but not necessarily the right tool for the job.)
gramakri2 [3 hidden]5 mins ago
How does one backup a sqlite without stopping the app or disrupting service? I couldn't find a simple answer to this
zaphar [3 hidden]5 mins ago
The typical solution is https://litestream.io/. Comparatively easy to setup for most usages of sqlite.
tommy_axle [3 hidden]5 mins ago
With sqlite3's .backup command.
There's an online backup API.
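If shelling out isn't convenient, VACUUM INTO (SQLite 3.27+, if I recall correctly) also takes a consistent snapshot at the SQL level without blocking writers for long:

    -- Writes a compacted, consistent copy of the live database to a new file.
    VACUUM INTO '/backups/app-snapshot.db';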
jeltz [3 hidden]5 mins ago
To avoid having to slow down development of new PostgreSQL features. Improving upgrades in a way where PostgreSQL does not need to either maintain multiple different versions of parts of the code or lock down internal interfaces (which currently can change freely every major version, so they could no longer be refactored and improved in the future) is not a trivial task, maybe even impossible. Even just the existence of pg_upgrade has to some degree limited what can be done to improve PostgreSQL. Obviously pg_upgrade is worth it, but hindering development even further might not be popular.
The PostgreSQL team simply does not have the resources to do this. At least not without significantly slowing down development of everything else which there is no political will for. Maybe someone will come up with a genius idea which solves this but I am doubtful. Usually there is no free lunch.
Maybe some core dev will correct me, I am quite familiar with the PostgreSQL project but not one of the core devs. :)
paulryanrogers [3 hidden]5 mins ago
What is this holding back? A redo based alternative to MVCC?
jeltz [3 hidden]5 mins ago
Nope, that is totally unrelated. To support in-place upgrades without an old version of PostgreSQL installed:
1. The new version of PostgreSQL would need to be able to read all old catalog table formats and migrate them.
2. The new version of PostgreSQL would need to support all old versions of the parse tree to migrate views.
3. Likely a bunch more things that I do not know of. For example, I doubt it is trivial to just read an old catalog without having a fully up-and-running cluster which supports almost everything in that old cluster. The catalog has TOAST tables and indexes, for example.
Right now 1 and 2 are implemented in pg_dump plus by having pg_dump call functions in a running old version of PostgreSQL.
ttfkam [3 hidden]5 mins ago
It is a PITA, but I've written scripts that pg_dump just the schema, load the schema into the new db with the new version, set up logical replication between the two, wait for them to sync, reset all the sequences, and rebuild indexes before doing the handover.
It works with basically no downtime but I agree this kind of thing should definitely be easier, even turnkey.
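For the sequence-reset step, one approach (assuming PG 10+, where the pg_sequences view exists) is to generate setval() calls on the old primary and replay the output on the new one:

    -- Run on the old primary at cutover; pipe the generated statements into the new primary.
    SELECT format('SELECT setval(%L, %s);',
                  schemaname || '.' || sequencename,
                  last_value)
    FROM pg_sequences
    WHERE last_value IS NOT NULL;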
paulryanrogers [3 hidden]5 mins ago
Does pg_upgrade not do all that?
Or do you mean the new Pg server should transparently do the upgrade automatically? And while online?
whizzter [3 hidden]5 mins ago
I think one really BIG factor is that built-in logical replication wasn't introduced until PostgreSQL 10 in 2017. Before that you only had physical replication for master-slave setups, and iirc that didn't work between versions, so doing a "hot" upgrade was more or less impossible without third-party tools.
So even if it's available these days, the number of people still subjected to upgrades from older versions leaves an impression that it's really bad.
RegnisGnaw [3 hidden]5 mins ago
There is still major issues with logical replication, mainly limited feature support.
creshal [3 hidden]5 mins ago
Yeah, everywhere I've worked there are usually two clusters of postgres databases:
- whatever made it past the postgres 10 hump is on the most recent version and keeps getting migrated
- whatever is still on 9.x is scheduled to remain that way until the product using it will be EOL'd
WJW [3 hidden]5 mins ago
The answer to both is probably "because maintainer time is scarce and nobody is interested enough".
throwaway19972 [3 hidden]5 mins ago
There are many answers to this question and few are as satisfying as the question implies.
bityard [3 hidden]5 mins ago
You might as well ask, why does anyone run an older version of anything? The reasons will be largely the same.
Most of the software on my machines are "old" because they are part of a Linux distribution that (aside from security issues) was frozen in time a year or two ago so that it could be tested, released, and maintained. I am quite happy to have a system that I know is not going to break (either itself, or my workflow) when I apply security updates.
People who MUST HAVE the latest version of everything I feel either have some deeper FOMO issues to work out, suffer from boredom, or look at their computers as hobbies themselves rather than tools. (Which is fine, just be honest about what it is.)
That said, much of my career has been spent working at companies that got so busy shipping features that upgrading infrastructure never makes it above the fold. You can tell the managers that working around old software adds costs that scale with the age of the infrastructure, but they don't always listen. I currently work at a company that still has loads of CentOS 7 hosts in production, and only fairly recently began upgrading them to RHEL 8. (Not 9!)
efields [3 hidden]5 mins ago
These are the companies you want to be at IMHO. Provided the compensation is adequate, slow and stable > fast and pivot-y.
necheffa [3 hidden]5 mins ago
> These are the companies you want to be at IMHO. Provided the compensation is adequate, slow and stable > fast and pivot-y.
Absolutely...not.
Slow does not mean stable. Slow means the floor is rotting out from under you constantly.
Being prudent about when and where to upgrade is a very active, intentional process that the typical company simply doesn't have the stomach or skill for.
chasil [3 hidden]5 mins ago
We are still running OS2200 EXEC-8.
That platform got SMP in 1964.
throwaway894345 [3 hidden]5 mins ago
Yeah, eventually you will have to upgrade and deal with all of the accumulated debt. You don’t have to be on the bleeding edge but you should still be updating regularly.
stackskipton [3 hidden]5 mins ago
SRE here, not at all.
Not chasing shiny is important, but when tech debt builds up this high, life is generally hell in terms of outages, being unable to accomplish basic tasks, and dealing with a bunch of people who have NIH syndrome.
Spivak [3 hidden]5 mins ago
Which is why you build on a platform like Alma/Redhat that gives you 10 years of support. You can call it outdated I guess but I prefer "supported." Make everyone else work out the new bugs before upgrading-- it used to be the rule not to update to a .0 release but being a hip modern developer means moving the 0 to the other side and yoloing it.
ziml77 [3 hidden]5 mins ago
The problem is that software ends up not getting touched for all those years, but eventually needs an upgrade when it's at end of support. And at that point you end up having to make changes to a system where no one remembers how it works or how to deploy it. Keeping software up to date to me is similar to how you practice disaster recovery scenarios. You do it regularly so no one forgets the processes and so any issues can be dealt with while you're not under a short time limit.
willsmith72 [3 hidden]5 mins ago
older versions can also mean deprecated packages everyone's too scared to touch, failure to invest in maintenance and tech debt reduction, or use of old technologies which stopped receiving security updates
throwaway894345 [3 hidden]5 mins ago
I don’t necessarily need to be on the latest version, but I prefer to take many small upgrades rather than one big upgrade at least when it comes to databases. Frequent upgrades also forces an organization to get good at upgrading and managing the risk.
ellisv [3 hidden]5 mins ago
I enjoyed this thread (and the article) as I'm preparing to upgrade several Postgres databases from 14 to 16, which should take about 25 minutes or less.
My experience has been that most people (including devs) hardly think about their database at all. Everyone just takes the database for granted until a dev runs a bad query that locks a table. Devs let their ORM manage everything for them and don't take the time to think for themselves.
Also I rarely see teams with a culture that prioritizes maintenance, even for their own software. It's common for teams to become wary of upgrading because they do it infrequently. Like most things, the more often you perform an upgrade the easier it is. Smaller changes are easier to revert. The more often you do something, the better at it you get. Delaying maintenance often just increases the work later.
jgb1984 [3 hidden]5 mins ago
I've used postgresql in most of my previous jobs, but using mariadb in my current one, and I must say it's a joy to administer.
The replication and high availability that mariadb offers is rock solid, and much more advanced than what postgresql has to offer.
It works out of the box, no plugins or third party shenanigans needed, and there is "one obvious way" to do it. Not a dozen options like in the postgresql ecosystem, each with its own quirks and drawbacks.
Also, upgrades are a dream. I did 4 major long term stable release upgrades so far and everything was fully automatic and smooth sailing all around.
All of that with about 10 seconds downtime only for each upgrade, despite being a chunky 6TB database, thanks to the user friendly replication process.
I respect postgresql a lot, but mariadb allows me to sleep much better at night thanks to replication, smooth upgrades and no VACUUM to worry about.
kstrauser [3 hidden]5 mins ago
To be fair to PostgreSQL, it has multiple replication setups because they have different inherent properties. For instance, it comes with built in streaming and logical replication setups. Which should you use? It’s impossible to say without knowing your own exact use case.
By analogy: should a database cluster optimize consistency or availability? Answer: there’s no way its author can guess which is more important to how you want to use it.
homebrewer [3 hidden]5 mins ago
MySQL/MariaDB have binary and logical replication too. They also support much more advanced topologies like multi-master out of the box. It's just that what you need for 99% of situations relies on a well trodden path and requires no configuration at all.
I have suspected for a long time that most people who criticize MySQL have never actually worked with it, or have done so a couple of decades ago. It's often the default choice if you don't need some of the PostgreSQL features (like PostGIS) and can work with either one.
ttfkam [3 hidden]5 mins ago
Since v16, Postgres supports bidirectional logical replication out of the box. In other words, multiple writers.
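If I'm reading the release notes right, the relevant knob is the subscription's origin option; a sketch with hypothetical names, run symmetrically on both nodes:

    -- Publish local tables...
    CREATE PUBLICATION pub_all FOR ALL TABLES;

    -- ...and subscribe to the peer, skipping rows that originated elsewhere so the
    -- two subscriptions don't echo each other's changes back and forth.
    CREATE SUBSCRIPTION sub_from_peer
        CONNECTION 'host=peer-node dbname=app user=replicator'
        PUBLICATION pub_all
        WITH (origin = none, copy_data = false);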
mxey [3 hidden]5 mins ago
MySQL has only logical replication; there is no equivalent to PostgreSQL's WAL/physical replication.
throw0101c [3 hidden]5 mins ago
> They also support much more advanced topologies like multi-master out of the box.
This is the one thing in My/MariaDB that I miss in Pg: Galera. So handy for in-house stuff that needs some kind of HA: either a three-server setup, or a two-server+arbitrator.
The same is true for MariaDB so I have no idea what he is talking about. Both databases have multiple options.
mxey [3 hidden]5 mins ago
> The replication and high availability that mariadb offers is rock solid, and much more advanced than what postgresql has to offer. It works out of the box, no plugins or third party shenanigans needed, and there is "one obvious way" to do it.
This is highly interesting to me because I was on the PostgreSQL bandwagon way before it was cool (mostly because I'm a BSD/MIT license fanatic).
jeremycarter [3 hidden]5 mins ago
I've maintained a few large postgres databases and completely agree.
elric [3 hidden]5 mins ago
Lots of dogmatism in this discussion, it seems. A couple of things:
1. Most psql deployments are not exposed to the interwebz, they are typically only accessible to the applications that need them by virtue of network setup (firewalls etc). This limits the attack vector to whatever the application does. Good.
2. Distro vendors (RHEL et al) often stick to a major psql release for the lifecycle of the OS version. If the OS lives longer than the psql major version, they take on the responsibility of backporting critical security issues.
3. While upgrades aren't hard, they're not easy either.
4. Psql is pretty much feature complete for many workloads, and pretty stable in general. For many people, there is little need to chase the latest major version.
bravetraveler [3 hidden]5 mins ago
> 4. Psql is pretty much feature complete for many workloads, and pretty stable in general. For many people, there is little need to chase the latest major version.
To drive this a little further, "latest and greatest" doesn't always apply. I've chosen software - even databases - for greenfield deployments one or two releases behind for their known characteristics.
Stability doesn't imply perfection, but rather, predictability.
mhd [3 hidden]5 mins ago
> 3. While upgrades aren't hard, they're not easy either.
And in my experience, apart from ease of use there's also a major trust issue here. If you're upgrading your app server framework/language, it's easy enough to do a rollback. With databases, people are worried that they might not notice errors right away and then you have to merge the data accumulated since the upgrade with the last backup in case of a rollback.
Not saying that this is entirely rational...
Also, new features on the SQL level are hard to sell if all you're doing is lowest common denominator ORM ("New window functions and faster lateral joins? But we're doing all that in our code!").
atoav [3 hidden]5 mins ago
Also:
5. If your IT department is spread thin already and that old version is running fine, the incentive to potentially create more work for yourself is not gigantic.
Dalewyn [3 hidden]5 mins ago
One of the first laws of the universe that a good engineer learns is: Do not fix what is not broken.
And no, being old is not broken.
WJW [3 hidden]5 mins ago
The entire field of maintenance engineering would like a word. Over longer periods of time it's vastly cheaper to regularly fix things even before they break, and software is no exception.
Amongst other reasons:
- Performing regular upgrades is an easy and cheap way to maintain a healthy knowledge base about the system. It's always easier to fix a system that is well understood than a black box which nobody has touched in 20 years. Upgrading regularly also leads to people being more comfortable with the upgrade process and it is likely to become a "regular thing" with checklists and other process improvements that make maintenance safer and more reliable.
- Doing preemptive maintenance means YOU get to choose when the system is unavailable and then you can plan accordingly, perhaps by planning it during a period with low usage or even arranging for a redundant system to take up the load. The alternative is leaving it up to chance when a system will fail, and that's frequently at high-load moments when you can least afford it.
- As a corollary to the previous point: a team which is in control of its systems can have more predictable output of feature work, since there will be fewer "unexpected" events.
Not maintaining your systems is only cheaper in the long run if you don't count the engineering time required to fix things when they break or need to be replaced. Ounce of prevention vs pound of cure and all that.
diffeomorphism [3 hidden]5 mins ago
Maintenance engineering is in full agreement with that. You are maintaining the old version, backporting fixes etc. instead of tearing down the whole thing and replacing it with new shiny all the time.
caeril [3 hidden]5 mins ago
Not to mention:
- If you're scared to perform upgrades, this is a good indicator that you lack confidence in your backup/recovery/reversion policies and procedures, probably for good reason, so maybe fix that.
LaGrange [3 hidden]5 mins ago
> The entire field of maintenance engineering would like a word. Over longer periods of time it's vastly cheaper to regularly fix things even before they break, and software is no exception.
I mean I think it's because maintenance is so unglamorous. So when it happens and everything doesn't collapse, nobody remembers and starts asking why we even do it (and then sysadmins and middle management suddenly aren't a thing, and companies and IT become exactly as fragile as anyone with _actual_ understanding of management would expect).
Meanwhile when regular maintenance fails in progress, it often ends up in the news with a body count attached.
One of my favourite podcasts has a running joke that you should never do maintenance (because so many industrial disasters happen during it). Of course the bias is the point of the joke - and usually the reason things went bad is because either the maintenance was neglected until that point, or the engineering picked Postgresql, I mean, didn't consider the necessity of maintenance.
It leads to a lot of old software which is never going to be upgraded. Then the entire project dies and gets rewritten from scratch, because nobody wants to work with Windows Server 2003 running Delphi 7, Java 1.4 and Oracle 9i in 2020 (personal experience).
Old software is not necessarily broken, but it is always tech debt. And you can't live in debt forever; our IT does not work that way.
sunnybeetroot [3 hidden]5 mins ago
I do agree; however, I think it's often easier to upgrade iteratively and deal with smaller issues as they arise, as opposed to upgrading across a huge version diff and struggling to understand and fix all the failing parts.
elric [3 hidden]5 mins ago
I think there's a balance to be struck there. On large databases, upgrade time can be very slow. Is it worth upgrading iteratively at great expense if there are no benefits to be gained (e.g. no bugfixes that affect you, no critical security issues)? Maybe, maybe not.
comprev [3 hidden]5 mins ago
Preventative maintenance is a thing in both software and hardware world.
It's often lower risk to apply frequent smaller changes than wait years for one enormous "bundled" upgrade.
yxhuvud [3 hidden]5 mins ago
Being old is not being broken, but being old can make it so that something that is broken cannot be fixed.
atoav [3 hidden]5 mins ago
This is solid advice, however I would caveat that you can't know if it is broken if you are not checking. E.g. if your old-as-hell database has a known vulnerability that can be exploited, unless you can rule out that it can be used in your setup, it is broken by all definitions of the word.
Dalewyn [3 hidden]5 mins ago
It's not a caveat since checking is a prerequisite to the law that shall not be violated.
If you are "fixing" a Schroedinger's Cat, my dude the Supreme Court of the Universe will not be kind.
LaGrange [3 hidden]5 mins ago
One of the first laws of universe that an experienced engineer learns is that "do not fix what is not broken" never actually applies, and is only brought up by people invulnerable to consequences.
That doesn't mean "upgrade recklessly," but it does mean you should know _why_ you're either upgrading or _NOT_ upgrading. That's your job, much more than the act of upgrading itself.
Unpublished vulnerabilities in old software are not a hypothetical. And very old packages are usually broken, just coped with at the expense of significant opportunity cost - or because the failure is rare and impactful enough that once it happens everyone is out of a job anyway.
Seriously, I've yet to encounter a sysadmin who used that old, silly adage at me and didn't later have to admit I was right.
Edit: so no, you don't stay on an ancient version of the database because "it's not broken." You're staying on it because _the upgrade process itself_ is so broken you're terrified of it.
kayodelycaon [3 hidden]5 mins ago
I generally follow: if it's not broken, fixes need to be carefully planned. I can't tell you how many times I thought I'd quickly do an upgrade and things went wrong, like all of my home automation stopping right before bed.
KaiserPro [3 hidden]5 mins ago
Upgrading a datastore is a massive, massive risk.
It requires a lot of work, planned downtime, or some way to smear updates across the estate.
The cost of any failure is very high. The benefit of any major upgrade is also vanishingly small. Unless you need a specific feature, its just not worth it.
ttfkam [3 hidden]5 mins ago
Or when the version is EOL, not getting security patches, and/or only compatible with an OS that isn't getting security updates.
…or you're on AWS RDS, which will automatically bump your db cluster if it goes EOL and you ignore the notices for more than a year.
newaccount74 [3 hidden]5 mins ago
> 3. While upgrades aren't hard, they're not easy either
I guess it depends on scale? I was surprised how easy it was on Ubuntu. There was an automatic migration script, and it worked. Took less than 5 minutes to upgrade.
Sure, there was downtime, but I think most applications out there can live with scheduled downtime of a few minutes.
If you can't have downtime, then nothing is easy.
ForHackernews [3 hidden]5 mins ago
Lots of companies pretend like they can't afford 5 minutes of scheduled downtime. Then they end up with 12 hours of unscheduled downtime when shtf.
bravetraveler [3 hidden]5 mins ago
For SRE (site reliability engineering) the term of art is 'error budget'
To others reading, be mindful: database upgrade times depend greatly on the data stored within them/where/who is using them. Your development environment doesn't represent production. If the distinction even exists!
A five minute upgrade can become indefinite with a single lock [row/table, depending on the storage engine/etc]
JeremyNT [3 hidden]5 mins ago
I think this is a really good take. It all boils down to "if it ain't broke don't fix it."
No matter how easy it is, it takes more effort and thought to do an upgrade than it does to not do an upgrade at all, and for most users the upside is basically invisible if it exists at all.
xvinci [3 hidden]5 mins ago
"What the application does" may not be what you think of, as it is dependent on how secure the application or the layers beneath it are. This is how people get everything pwned step by step. The database server may then reveal credentials to other apps etc.
Dylan16807 [3 hidden]5 mins ago
If the database server has significant "other apps", which it probably doesn't.
xvinci [3 hidden]5 mins ago
Sure, but then chances are it's hosted on a nas with other data which you dont want ransomware'd, has access to other parts of the network, etc. - it's easy to underestimate the potential impact
elric [3 hidden]5 mins ago
Sure. Defense in depth is important. But I hope that your application is only able to talk TCP/5432 to psql. No amount of psql upgrading will protect you against SQL injections in shitty application code.
paulryanrogers [3 hidden]5 mins ago
Upgrades are hard. There was no replication in the before times. The original block-level replication didn't work among different major versions. Slony was a painful workaround based on triggers that amplified writes.
Newer PostgreSQL versions are better. Yet still not quite as robust or easy as MySQL.
At a certain scale even MySQL upgrades can be painful. At least when you cannot spare more than a few minutes of downtime.
darth_avocado [3 hidden]5 mins ago
> At least when you cannot spare more than a few minutes of downtime.
I think it boils down to this. We used to constantly run an obsolete pg version until it became an emergency, mostly because upgrading with the tooling available at the time was very painful. Today, however, we stay relatively up to date. Once you figure out the data replication, you can almost do blue-green deployments on databases with almost no downtime.
"MySQL is a popular relational database. We revisit Kleppmann’s 2014 Hermitage and confirm that MySQL’s Repeatable Read still allows G2-item, G-single, and lost update. Using our transaction consistency checker Elle, we show that MySQL Repeatable Read also violates internal consistency. Furthermore, it violates Monotonic Atomic View: transactions can observe some of another transaction’s effects, then later fail to observe other effects of that same transaction. We demonstrate violations of ANSI SQL’s requirements for Repeatable Read. We believe MySQL Repeatable Read is somewhat stronger than Read Committed. As a lagniappe, we show that AWS RDS MySQL clusters routinely violate Serializability."
"PostgreSQL is a widely-known relational database system. We evaluated PostgreSQL using Jepsen’s new transactional isolation checker Elle, and found that transactions executed with serializable isolation on a single PostgreSQL instance were not, in fact, serializable. Under normal operation, transactions could occasionally exhibit G2-item: an anomaly involving a set of transactions which (roughly speaking) mutually fail to observe each other’s writes. In addition, we found frequent instances of G2-item under PostgreSQL “repeatable read”, which is explicitly proscribed by commonly-cited formalizations of repeatable read. As previously reported by Martin Kleppmann, this is due to the fact that PostgreSQL “repeatable read” is actually snapshot isolation. This behavior is allowable due to long-discussed ambiguities in the ANSI SQL standard, but could be surprising for users familiar with the literature. A patch for the bug we found in serializability is scheduled for the next minor release, on August 13th, and the presence of G2-item under repeatable read could be readily addressed through documentation."
slotrans [3 hidden]5 mins ago
"Not as robust as MySQL"? Surely you're joking.
sgarland [3 hidden]5 mins ago
They’re not wrong. If you’ve ever spent meaningful time administering both, you’ll know that Postgres takes far more hands-on work to keep it going.
To be clear, I like both. Postgres has a lot more features, and is far more extensible. But there’s no getting around the fact that its MVCC implementation means that at scale, you have to worry about things that simply do not exist for MySQL: vacuuming, txid wraparound, etc.
lelanthran [3 hidden]5 mins ago
My experience of both is that MySQL is easier for developers, PostgreSQL is easier for sysads.
That was true in 2012; dunno if it still applies though.
sofixa [3 hidden]5 mins ago
I doubt it was true in 2012, because sysadmins would be the ones trying to make it run reliably, including things like replication, upgrades, etc.
Pretty sure that even in 2012 MySQL had very easy to use replication, which Postgres didn't have well into the late 2010s (does it today? It's been a while since I've run any databases).
yxhuvud [3 hidden]5 mins ago
In 2012 MySQL had several flavors of replications, each with its own very serious pitfalls that could introduce corruption or loss of data. I saw enough of MySQL replication issues in those days that I wouldn't want to use it.
But sure, it was easy to get a proof of concept working. But when you tried to break it by turning off the network and/or machines, things broke down in ways that were not recoverable. I'm guessing most that set up MySQL replication didn't actually verify that it worked well when SHTF.
est [3 hidden]5 mins ago
> pitfalls that could introduce corruption or loss of data
sometimes, repairing broken data is easier than, say, upgrading a god damn hot DB.
MVCC is overrated. Not every row in a busy MySQL table is your wallet balance.
evanelias [3 hidden]5 mins ago
Many of the largest US tech companies were successfully using MySQL replication in 2012 without frequent major issues.
source: direct personal experience.
sofixa [3 hidden]5 mins ago
Maybe that was true in 2012 (maybe it was related to MyISAM) but by ~2015 with InnoDB MySQL replication was rock solid.
yxhuvud [3 hidden]5 mins ago
It was not related to MyISAM.
How did you verify that it was rock solid? And which of the variants did you use?
lelanthran [3 hidden]5 mins ago
> I doubt it was true in 2012, because sysadmins would be the ones trying to make it run reliably, including things like replication, upgrades, etc.
Possibly I got it wrong and switched around which was easier on the devs and which was easier on the sysads?
In my defence, ISTR, when talking to sysads about MySQL vs PostgreSQL, they preferred the latter due to having less to worry about once deployed (MySQL would apparently magically lose data sometimes).
karamanolev [3 hidden]5 mins ago
MyISAM in the olden days could/would magically lose data. InnoDB has been the de facto standard for a while and I haven't seen data loss attributed to it.
GoblinSlayer [3 hidden]5 mins ago
>MySQL is easier for developers
Except that search doesn't work, because all text is in latin1 encoding.
sgarland [3 hidden]5 mins ago
While obviously I prefer Unicode, latin-1 is perfectly acceptable for most use cases in Western cultures.
What part of searching have you found to not work well using it?
homebrewer [3 hidden]5 mins ago
This was never true if you know what you're doing, and hasn't been true since 2018 even if you don't.
yxhuvud [3 hidden]5 mins ago
Either latin1 or that absolutely horrible max-three-byte utf8 encoding that used to be the only available option for utf8.
williamdclt [3 hidden]5 mins ago
Interestingly, someone else in another comment is arguing the exact opposite!
wbl [3 hidden]5 mins ago
Yeah but you don't need to worry about your data existing. MySQL has been known to silently fail the one job of a DB.
maxk42 [3 hidden]5 mins ago
Not in around 15 years. You're thinking of when MyISAM was the default storage engine for MySQL. It has been InnoDB for over a decade. InnoDB is very reliable - I've never had a single data loss incident in all that time, and I've managed some very large (PB-scale) and active databases.
Postgres is definitely more difficult to administer.
yxhuvud [3 hidden]5 mins ago
MySQL used to have horrible and very unsafe defaults for new installations that persisted well after the introduction of InnoDB. Those went unfixed for a very long time.
Symbiote [3 hidden]5 mins ago
People are criticising decade-old releases of PostgreSQL, so it seems fair to compare them to similarly aged MySQL releases.
SavageBeast [3 hidden]5 mins ago
I recall this being the case A LOOOONG time ago but I haven't heard of, read about, been warned to look out for or personally seen such a thing in forever. Have you?
* I'm running a lot of MySQL stuff and such a topic might be of interest to me
o11c [3 hidden]5 mins ago
From what I can tell, MySQL is supposed to be safe since 2018 if you have no data from before 2010.
The fact that you still can't use DDL in transactions makes life exceedingly painful, but it's technically safe if you write your migration code carefully enough.
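For contrast, a tiny example of what transactional DDL buys you on the Postgres side (hypothetical table):

    BEGIN;
    ALTER TABLE orders ADD COLUMN note text;
    -- ...a later migration step fails, or you change your mind...
    ROLLBACK;  -- the column is gone again; no half-applied schema left behind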
p_l [3 hidden]5 mins ago
Some places still have columns declared as utf8 instead of utf8mb4, and there's a special place in hell for the authors of the MySQL general clusterfuck regarding encodings - it was all nice and great if you didn't care about anything other than latin1 or ASCII, but go outside that before the utf8 option existed and it was a horror that even experienced operators managed to fuck up (I have a badge from a Google conference in 2017 with the nicely visible effect of "we have mixed up one of the three separate encoding settings in MySQL and now you have mojibake in your badge").
And then there's UTF8 not actually being UTF8, which can result in total lockup of a table if someone inputs a character that does not fit in UCS-2 and now you need to recover the database from backup and preferably convert all instances of utf8 to utf8mb4, because fuck you that's why.
lol768 [3 hidden]5 mins ago
Yeah, the whole charset/collation design is a complete dumpster fire.
sgarland [3 hidden]5 mins ago
In fairness, reasoning about collations is like peering into the abyss. I get why they’re required to have so many levels of detail, and the Unicode Consortium has done a fantastic job, but to say they’re complicated is putting it mildly.
The_Colonel [3 hidden]5 mins ago
Lack of transactional DDL is certainly painful, but not unique for MySQL. Oracle doesn't support it either.
justinclift [3 hidden]5 mins ago
Both databases are owned by the same company. Wonder if it's something inherent to the company which has keep that limitation going?
elisbce [3 hidden]5 mins ago
Yes, it is messy when you want your MySQL databases to be mission critical in production, e.g. handling a large amount of customer data. Historically MySQL's High Availability architecture has a lot of design and implementation issues because it was an afterthought. Dealing with large amount of critical data means you need it to be performant, reliable and available at the same time, which is hard and requires you to deal with caching, sharding, replication, network issues, zone/resource planning, failovers, leader elections and semi-sync bugs, corrupted logs, manually fixing bad queries that killed the database, data migration, version upgrades, etc. There is a reason why big corps like Google/Meta has dedicated teams of experts (like people who actually wrote the HA features) to maintain their mission critical MySQL deployments.
pritambarhate [3 hidden]5 mins ago
My experience has been exactly the opposite. The ability to do vacuums is good. MySQL doesn't free up space taken by deleted rows. The only option to free up the space is to mysqldump the db and load it again. Not practical in most situations.
sgarland [3 hidden]5 mins ago
VACUUM rarely reclaims space from the OS' perspective, if that's what you meant. It can in certain circumstances, but they're rare. VACUUM FULL is the equivalent of OPTIMIZE TABLE – both lock the table to do a full rewrite, and optimally binpack it to the extent that is possible.
EDIT: my mistake, OPTIMIZE TABLE is an online DDL. I’ve been burned in the past from foreign key constraint metadata locks essentially turning it into a blocking operation.
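For reference, the two commands being compared (hypothetical table name):

    -- Postgres: full rewrite of the table and its indexes; takes an ACCESS EXCLUSIVE
    -- lock for the duration, so plan for downtime on big tables.
    VACUUM FULL my_table;

    -- MySQL/MariaDB (InnoDB): rebuilds the table; runs mostly online, with brief
    -- metadata locks at the start and end.
    OPTIMIZE TABLE my_table;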
iamjkt [3 hidden]5 mins ago
Running 'optimize table <table>' reclaims the space on MySQL/MariaDB.
benoitg [3 hidden]5 mins ago
Not really, the innodb_file_per_table variable has been set to 1 for a long time. Running OPTIMIZE TABLE frees up the disk space in this case.
sudhirj [3 hidden]5 mins ago
Is this process materially different from a vacuum? Does it manage to optimise without a write lock?
I don't know how VACUUM works, so I couldn't tell you about the differences.
OPTIMIZE works almost exclusively with online DDL statements. There's only a brief table lock held during table metadata operations, but I haven't found that to be a problem in practice. (https://dev.mysql.com/doc/refman/8.4/en/optimize-table.html#...)
erik_seaberg [3 hidden]5 mins ago
Early MySQL versions made egregious design choices like quietly ignoring missing foreign keys and enum typos, truncating long strings, and randomly choosing rows from groups.
Yeah, it was bad. What kills me is SQLite has its own absurd set of gotchas [0] yet is seen as amazing and wonderful by devs. PKs can have NULLs? Sure! Strings can have \0 in the middle of them? Why not? FKs aren’t enforced by default? Yeah, who needs referential integrity, anyway?
My only conclusion is that the majority of devs don’t actually read documentation, and rely purely on the last blog post they read to influence their infrastructure decisions.
MySQL was immortalized as the database in every LAMP stack. And just like PHP, it has improved considerably since then.
p_l [3 hidden]5 mins ago
Also for licensing reasons for a long time there was surviving contingent of MySQL 3.23 in LAMP hosting.
johannes1234321 [3 hidden]5 mins ago
That change from LGPL to GPL affected only the client library (the server was always GPL + commercial), and the MySQL company relatively quickly reacted with a FOSS exception to the GPL and by providing a reimplementation of the client library under the PHP license (mysqlnd) to serve that market.
(I joined MySQL shortly after that mess, before the Sun acquisition)
cenamus [3 hidden]5 mins ago
Is slony some sort of language joke? Slon is Czech (and probably Slavic in general) for elephant.
(which may actually be derived from the Turkish aslan, for lion, but somehow the animal got mixed up)
newaccount74 [3 hidden]5 mins ago
I think so. The PostgreSQL logo is an elephant, and the community calls him "Slonik", probably derived from the Russian word for elephant. There is also a node.js driver that is called "slonik"
Symbiote [3 hidden]5 mins ago
I think you're describing versions before 9.2, which was released 12 years ago.
aorloff [3 hidden]5 mins ago
A shudder went through my body hearing that, I had forgotten all about Slony.
api [3 hidden]5 mins ago
I've always wondered why Postgres is so insanely popular. I mean it has some nice things like very powerful support for a very comprehensive subset of SQL functionality, but most apps don't need all that.
It really feels like early 1990s vintage Unix software. It's clunky and arcane and it's hard to feel confident doing anything complex with it.
tpmoney [3 hidden]5 mins ago
> I've always wondered why Postgres is so insanely popular.
In no particular order, my preference for postgres is driven by:
* Date / time functions that don't suck
* UTF-8 is really UTF-8
* 99% of a backup can be done live with nothing more than rsyncing the data directory and the WAL files
* Really comprehensive documentation
* LTREE and fuzzy string match extensions
* Familiarity from using it for years
MySQL/Maria I'm sure is fine, but it's one of those things where it's just different enough, and I haven't encountered a compelling use case for changing my preference.
fhdsgbbcaA [3 hidden]5 mins ago
UTF-8 is what made me switch. It’s insane that MySQL has something called UTF-8 that isn't really UTF-8, but does have a type UTF8MB4 that actually is correct. This means if you use UTF-8 in MySQL, you can’t use emoji, for example.
homebrewer [3 hidden]5 mins ago
I won't defend that utf8 brain damage, but the defaults have been sane since 2018 — you don't need to set the encoding, it's set to proper utf8 out of the box. MySQL 8 cleaned up a lot of this legacy stuff.
bastawhiz [3 hidden]5 mins ago
And the fact that adding real utf-8 support limited (limits?) the length of strings that can be indexed
evanelias [3 hidden]5 mins ago
Postgres limits btree keys to 2704 bytes, which is actually slightly smaller than MySQL's limit of 3072 bytes, assuming the default InnoDB storage engine.
That said, when using utf8mb4 in an index key, MySQL uses the "worst case" of each character being 4 bytes. So it effectively limits the max key size to 3072/4 = 768 characters, when a column is using the utf8mb4 character set.
For practical purposes, this doesn't cause much pain, as it's generally inadvisable to use complete long-ish strings as a key. And there are various workarounds, like using prefixes or hashes as the key, or using binary strings as keys to get the full 3072 bytes (if you don't need collation behaviors).
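To illustrate the prefix and hash workarounds, a rough sketch in MySQL syntax (table and column names hypothetical):

CREATE TABLE pages (
  url      VARCHAR(2048) NOT NULL,
  -- fixed-width hash of the full value, usable for exact-match lookups and uniqueness
  url_hash BINARY(32) GENERATED ALWAYS AS (UNHEX(SHA2(url, 256))) STORED,
  -- prefix index: only the first 255 characters participate (255 x 4 = 1020 bytes, under the limit)
  KEY idx_url_prefix (url(255)),
  UNIQUE KEY idx_url_hash (url_hash)
);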
bastawhiz [3 hidden]5 mins ago
> So it effectively limits the max key size to 3072/4 = 768 characters, when a column is using the utf8mb4 character set.
This is exactly what I mean. 768 characters for an index is woefully bad. And for no obviously great reason: you can just index the encoded UTF-8 text.
This was literally the reason why a former company (which will remain nameless) refused to add Unicode support. It's not even an imagined problem.
sgarland [3 hidden]5 mins ago
You should not be indexing 768 characters in any circumstance I can imagine. Go ahead and try it. Spin up two tables, fill them with a few million rows, and slap an index on them. Give one a reasonable prefix limit, and let the other go wild. Make sure you ANALYZE each, then run queries in a loop and check the times.
Spoiler: I literally did this a couple of days ago. The index size bloat means that any possible savings you might have gained from collisions are obliterated from page fetches. I tested with a measly 128 characters vs. a prefix of 16, and that was enough for the average query time to be equal, with the smaller index winning for the minimum.
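A sketch of that kind of test in MySQL syntax (names hypothetical, data load elided):

CREATE TABLE t_full   (s VARCHAR(128) NOT NULL, KEY k_full (s));       -- indexes all 128 characters
CREATE TABLE t_prefix (s VARCHAR(128) NOT NULL, KEY k_prefix (s(16))); -- indexes a 16-character prefix

-- load the same few million rows into both tables, then:
ANALYZE TABLE t_full;
ANALYZE TABLE t_prefix;

-- time repeated lookups against each and compare
SELECT COUNT(*) FROM t_full   WHERE s = 'some sample value';
SELECT COUNT(*) FROM t_prefix WHERE s = 'some sample value';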
evanelias [3 hidden]5 mins ago
Why did you need to index fairly long strings in their entirety in a way that preserves collation behaviors?
And why is a 768 character limit woefully bad, but a 2704 character limit is totally fine?
bastawhiz [3 hidden]5 mins ago
A URL, for instance, can't be safely stored in 768 characters, but it can be stored safely in 2704. If you then wanted to sort those URLs so that all URLs for each domain and path within that domain are adjacent, you need an index. Especially if you want to paginate over them with a cursor. Doing that without an index on the raw value is a royal pain in the ass.
Hell, even just being able to sort user-submitted strings up to a kilobyte. Why up to a kilobyte? Some users have strings that are kind of long. If I have to define a second column that's the truncated prefix, that's just a silly waste of space because MySQL decided to use utf-32 under the hood.
The_Colonel [3 hidden]5 mins ago
> it can be stored safely in 2704
No, it can't. URL doesn't have any length limit, regardless of the fact that different software will impose different limits.
evanelias [3 hidden]5 mins ago
> A URL, for instance
VARBINARY is typically ok for that I'd think? Then you can utilize the full 3072 byte limit for the key, since there's no character set applied.
> even just being able to sort user-submitted strings up to a kilobyte
As a software engineer, I completely agree. But as a DBA, I am obligated to make a "tsk" sound and scowl disapprovingly!
crazygringo [3 hidden]5 mins ago
To be honest, indexes aren't designed for that. They're meant for fast lookup of short identifiers. Things like people's names and product IDs. Not long URLs. It's not performant.
If you need to keep a million long URL's in a defined sort order, my first recommendation would be, don't -- see if there's another way to achieve your end result. But if you absolutely have to, then create a new integer column to be your sort key, and use a little bit of extra code to give it values that produce the same sort order.
Creating short numerical primary keys for long strings is a common database technique.
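A minimal sketch of that technique (MySQL syntax, names hypothetical; the application assigns sort_key values that mirror the desired URL order):

CREATE TABLE urls (
  id       BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  url      TEXT NOT NULL,
  sort_key BIGINT NOT NULL,   -- maintained by application code
  KEY idx_sort_key (sort_key)
);

-- cursor pagination then runs off the compact integer index
SELECT id, url FROM urls WHERE sort_key > 12345 ORDER BY sort_key LIMIT 100;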
fweimer [3 hidden]5 mins ago
You might just load someone else's data, and the index is desirable in general for speeding up analytic queries. It's possible to work around that, of course. But depending on what you do, it can make writing efficient queries against the data more difficult. That's just a distraction because most of the time, those long columns won't matter anyway.
sgarland [3 hidden]5 mins ago
> This means if you use UTF-8 in MySQL, you can’t use emoji, for example.
I for one have always viewed this as a perk.
fhdsgbbcaA [3 hidden]5 mins ago
A database that doesn’t give you back what you put into it is never a perk. It literally can’t handle storing and retrieving the data.
sgarland [3 hidden]5 mins ago
I don’t want to see emoji in my database. The customer is only right in matters of taste, not engineering.
ttfkam [3 hidden]5 mins ago
Uhh… not wanting to see emojis is a matter of taste, not engineering.
ttfkam [3 hidden]5 mins ago
Transactional DDL!
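For anyone unfamiliar, a tiny sketch of what transactional DDL buys you in Postgres (table and index names hypothetical):

BEGIN;
ALTER TABLE accounts ADD COLUMN closed_at timestamptz;
CREATE INDEX accounts_closed_at_idx ON accounts (closed_at);
-- if anything above fails, or you change your mind, the schema is left exactly as it was
ROLLBACK;  -- or COMMIT;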
sgarland [3 hidden]5 mins ago
MySQL does have ON UPDATE for its DATETIME, though; something that Postgres inexplicably still lacks.
Maybe they're thinking of TIMESTAMP in MySQL, which IIRC would auto-update its value on any update to the row. Which was useful for updated_at-like columns. Though I think they later limited it to only the first TIMESTAMP column in a table.
sgarland [3 hidden]5 mins ago
No, it works for both [0] types. The first-TIMESTAMP thing you’re referring to is that if a specific variable isn’t set, the first TIMESTAMP column automatically gets auto-updates applied on creation and update, unless you explicitly define it not to. This was the default behavior in 5.7, but has since been changed.
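For reference, a sketch of the MySQL column definitions being described (table name hypothetical; the usual Postgres workaround is a small trigger that sets updated_at = now()):

CREATE TABLE widgets (
  id         BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name       VARCHAR(255) NOT NULL,
  created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
  -- maintained by the server on every UPDATE, no trigger needed
  updated_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
);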
* extremely thought-out and standardized approach to development: if a feature is included it generally works well in interaction with everything else
* syntax, semantics and performance are all very predictable
* great documentation
Regarding MySQL / MariaDB: MySQL optimized for performance first. Until 2010 the standard storage engine, MyISAM, didn't even support transactions.
PostgreSQL always focused on correctness and stability and then made sure everything performed.
arkh [3 hidden]5 mins ago
> * custom datatypes
Good in theory. But last time I checked the main libs to connect to pgsql, everything you get back from the database is strings. So you need something in your app to convert those strings to the equivalent data structures.
ttfkam [3 hidden]5 mins ago
You're thinking only in terms of application. Types in the db save storage space, allow for better validation than plain strings, can be correlated cleanly with other columns with the same type, etc.
Yes, more drivers and libraries should support the more expansive data type list, but even just within the database itself there are multiple advantages.
stickfigure [3 hidden]5 mins ago
What's the alternative? MySQL? No transactional DDL, immediate fail.
jes5199 [3 hidden]5 mins ago
I worked for a company that migrated from mysql to postgres, but then got big enough they wanted to hire fulltime database experts and ended up migrating back to mysql because it was easier to find talent
icedchai [3 hidden]5 mins ago
Ugh. I worked with MySQL earlier in my career (until about 10 years ago.) All the companies since have been Postgres. All my personal projects are Postgres. I can't imagine going back.
bastawhiz [3 hidden]5 mins ago
Dunno if that says much about Postgres, but it says a lot about the company
cosmotic [3 hidden]5 mins ago
It's not just DDL that isn't transactional; there's a whole bunch of other things that aren't. And they break the transactionality silently. It's like an obstacle course where bumping into something might be fatal.
evanelias [3 hidden]5 mins ago
What specific non-DDL things are you referring to here?
Aside from DDL, the only other major ones are manipulating users/grants, manipulating replication, a small number of other administrative commands, and LOCK TABLES.
> It really feels like early 1990s vintage Unix software. It's clunky and arcane and it's hard to feel confident doing anything complex with it.
How software "feels" is subjective. Can you be more specific?
dalyons [3 hidden]5 mins ago
It requires a ton of somewhat arcane maintenance at scale: vacuum shenanigans, index fragmentation requiring manual reindexing, txid wraparounds. I like Postgres, but it's definitely way more work to maintain a large instance than MySQL. MySQL just kinda works.
arkh [3 hidden]5 mins ago
Having to tinker with pg_hba.conf files on the server to manage how users can connect.
paulryanrogers [3 hidden]5 mins ago
I'd agree that is annoying yet usually just a one off task, unless you really want different IP allowlists per user.
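For context, per-user allowlists in pg_hba.conf look roughly like this (databases, users and networks are hypothetical):

# TYPE  DATABASE  USER       ADDRESS        METHOD
host    app_db    app_user   10.0.1.0/24    scram-sha-256
host    app_db    reporting  10.0.2.0/24    scram-sha-256
host    all       all        0.0.0.0/0      reject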
Tostino [3 hidden]5 mins ago
In complex environments it is not just a one off task. I dealt with it by automating my infrastructure with ansible, but without some tooling it sucks.
threeseed [3 hidden]5 mins ago
The command line experience is old school, e.g. to show tables:
\c database
\dt
Versus:
use database
show tables
georgyo [3 hidden]5 mins ago
I started with MySQL in 2006 for my personal projects, but what first won me over to psql was those commands.
Today I use CLIs like usql to interact with MySQL and SQLite so I can continue to use those commands.
At first glance they may be less obvious, but they are significantly more discoverable. \? Just shows you all of them. In MySQL it always feels like I need to Google it.
stephenr [3 hidden]5 mins ago
> At first glance they may be less obvious, but they are significantly more discoverable. \? Just shows you all of them. In MySQL it always feels like I need to Google it.
In MySQL either `?` or `help` or `\?` will show you the help...
rootusrootus [3 hidden]5 mins ago
I assume this is really what it comes down to. If psql added those verbose-but-descriptive commands a whole bunch of people comfortable with mysql would be a lot happier using postgres.
dventimi [3 hidden]5 mins ago
That's psql.
fhdsgbbcaA [3 hidden]5 mins ago
It’s also faster to type.
eYrKEC2 [3 hidden]5 mins ago
Not after you have to google, "What's the equivalent of `show tables` in postgres?", because the psql command names are completely arbitrary.
Symbiote [3 hidden]5 mins ago
They are clearly abbreviations.
\c is for connect.
\dt is for describe tables.
mxey [3 hidden]5 mins ago
\? shows the help
fhdsgbbcaA [3 hidden]5 mins ago
Which you need to do exactly once.
kalleboo [3 hidden]5 mins ago
I need to manually admin my database server maybe once every 2 years or so. Definitely not remembering them 2 years later.
Scramblejams [3 hidden]5 mins ago
> I've always wondered why Postgres is so insanely popular.
Just another anecdote: MySQL lost data for me (2004). I spent some time evaluating the projects and Postgres’ development process seemed much more mature — methodical, careful, and focused on correctness. Boring, which I loved.
I didn’t need whatever perf advantage MySQL had so I switched to Postgres and never looked back. And then the Oracle drama and Monty’s behavior around it — not saying he was wrong or right, but it was the opposite of boring — just reinforced my decision.
I like to play with new tech in various spots of the stack, but for filesystems and databases I go boring all the way.
paulryanrogers [3 hidden]5 mins ago
I've never lost data with PostgreSQL. MySQL had enough data loss bugs and foot guns that I ran into a few of them.
vbezhenar [3 hidden]5 mins ago
For me Postgres is 100% predictable and reliable. It's neither clunky nor arcane in my experience. I don't need to think about it, I just SQL it and that's about it. It quietly works in the background. At some scale there might be some issues, but there is always known path to solve things.
moogly [3 hidden]5 mins ago
> I've always wondered why Postgres is so insanely popular
Real answer: no licensing cost
DonHopkins [3 hidden]5 mins ago
Because it's not tainted and cursed by Oracle, like MySQL (and Oracle).
immibis [3 hidden]5 mins ago
That's what MariaDB is for, right? I'm surprised to hear people recommend the Oracle fork of MySQL (still called MySQL because they own the trademark) rather than the original project (now called MariaDB)
dwwoelfel [3 hidden]5 mins ago
Here's how we did it at OneGraph (RIP), where we not only upgraded versions without downtime, but we also moved hosting providers from GCP to Aurora without downtime.
1. Set up logical replication to a new database server. We used https://github.com/2ndQuadrant/pglogical, but maybe you don't need that any more with newer versions of postgres?
2. Flip a feature flag that pauses all database queries and wait for the queue of queries to complete.
3. Wait for the query queue to drain and for replication to catch up.
4. Flip a feature flag that switches the connection from the old db to the new db.
5. Flip the flag to resume queries.
It helped that we were written in OCaml. We had to write our own connection pooling, which meant that we had full control over the query queue. Not sure how you would do it with e.g. Java's Hikari, where the query queue and the connection settings are complected.
We also had no long-running queries, with a default timeout of 30 seconds.
It helped to over-provision servers during the migration, because any requests that came in while the migration was ongoing would have to wait for the migration to complete.
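For step 1, newer versions can indeed use built-in logical replication instead of pglogical. A rough sketch, with hypothetical host, user and object names:

-- on the old primary (publisher)
CREATE PUBLICATION upgrade_pub FOR ALL TABLES;

-- on the new server (subscriber), after copying the schema with pg_dump --schema-only
CREATE SUBSCRIPTION upgrade_sub
  CONNECTION 'host=old-primary dbname=app user=replicator'
  PUBLICATION upgrade_pub;

-- on the publisher: check replication lag before flipping the cutover flag
SELECT application_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS lag_bytes
  FROM pg_stat_replication;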
droobles [3 hidden]5 mins ago
That is awesome, I dream of being able to do zero downtime SQL migrations.
In that post we walk through all the steps we took to go from Postgres 11.9 to 15.3.
buro9 [3 hidden]5 mins ago
Upgrading isn't automatic.
Let me check what I'm on... brb... Postgres 14.
Because it's not automatic I leave it, I leave it until it's so unsupported that I must upgrade the whole system, then I build a new system with a new Postgres and I migrate the old to the new.
I want, so badly, for Postgres to just automatically update itself, that a new binary just works with the data directory of an old version, and that if required it does an in-place upgrade to those data files when it can (i.e. if it can detect the last version was the same major as the current version, upgrade the files transparently to the admin).
My databases are all backed up each night, and these are single server Postgres with no replication or other trickery, an automatic upgrade for a single-server Postgres should be possible.
As it's not done... I assume (incorrectly?) that there be dragons and risks, and I mitigate that by never upgrading, just waiting and migrating. Migrating puts all of the risk on me, human error, and I am definitely fallible, so I can best handle this risk by just not doing it until I must.
Last migration I performed was from Postgres 7 > Postgres 14 in October 2021... I guess I have quite a few years of Postgres 14 ahead of me still. I would take downtime, my systems aren't zero downtime critical, just give me a super simple one-command no-questions upgrade as I really dislike migrations.
ttfkam [3 hidden]5 mins ago
Unless you want support for the MERGE command (v15), bidirectional replication (v16), and some behind the scenes optimizations, you're not really missing anything.
Whether you go from v14 to v17 or v14 to v20, I doubt it'll make a difference in migration strategy. You've still got a fair amount of time before EOL. No need to stress or FOMO.
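For example, the v15 MERGE mentioned above looks like this (hypothetical tables and columns):

MERGE INTO inventory AS i
USING incoming_shipments AS s
   ON i.sku = s.sku
WHEN MATCHED THEN
  UPDATE SET qty = i.qty + s.qty
WHEN NOT MATCHED THEN
  INSERT (sku, qty) VALUES (s.sku, s.qty);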
justinclift [3 hidden]5 mins ago
If you use Docker, then the pgautoupgrade project might be your kind of thing:
Thank you for this! I was looking for an "easy" way to upgrade a dev container DB.
Symbiote [3 hidden]5 mins ago
If you're using Debian or Ubuntu, investigate 'pg_upgradecluster'.
teddyh [3 hidden]5 mins ago
A full example procedure is documented in /usr/share/doc/postgresql-common/README.Debian.gz, under the heading “Default clusters and upgrading”.
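Roughly, the documented Debian flow looks like this (cluster names and version numbers are illustrative; read the README first):

pg_lsclusters                       # list clusters, e.g. 14/main plus the empty 17/main the new package created
sudo pg_dropcluster 17 main --stop  # drop that empty cluster so the name is free
sudo pg_upgradecluster 14 main      # upgrade 14/main to the newest installed major version
# once the applications have been verified against the new cluster:
sudo pg_dropcluster 14 main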
pelagicAustral [3 hidden]5 mins ago
I feel like I've been on 14 for an eternity... everything is running smooth... I reckon I should upgrade a few of the servers... oh well..
olavgg [3 hidden]5 mins ago
The PostgreSQL team releases an incompatible on-disk data format every year. Do they really need to break the data format every year? In my opinion, the release model for PostgreSQL should be refactored. Breaking stuff is no fun for users, so please stop doing that if you are a software developer. And if you have to, make it automatic for the user.
ttfkam [3 hidden]5 mins ago
If you want performance to steadily increase rather than steadily decrease, the answer to your question is yes. That doesn't mean tools couldn't be better, but there are clear technical advantages to their on-disk format updates.
Havoc [3 hidden]5 mins ago
The risk/reward ratio of fucking with something that works perfectly fine as is is not great.
So for fresh installs yes but existing ones not so much
cedws [3 hidden]5 mins ago
That and capitalism doesn’t reward slow and steady, keeping things stable and well oiled. It rewards delivering shoddy features at break neck speed even if they need to be recalled after two weeks. That’s what Wall Street and the shareholders want. Hence why senior management rarely cares about addressing tech debt.
“Show me the incentive, I’ll show you the outcome”
justin_oaks [3 hidden]5 mins ago
My upgrade policy for everything:
Significant security vulnerability? Upgrade
Feature you need? Upgrade
All other reasons: Don't upgrade.
Upgrading takes effort and it is risky. The benefits must be worth the risks.
occz [3 hidden]5 mins ago
Upgrading when multiple versions behind is significantly more risky than doing it when the update is relatively fresh.
Additionally, actions done frequently are less risky than actions done rarely, since you develop skills in performing that action as an organization - see high deployment frequency as a strategy of managing deployment risk.
This adds up to continuous upgrading being the least risky option in aggregate.
ttfkam [3 hidden]5 mins ago
Upgrading from v11 to v16 is not materially different in Postgres from v14 to v16. Same tools. Same strategies.
enraged_camel [3 hidden]5 mins ago
We are planning to upgrade from 11 to 17 soon. Even thinking about it is giving me ulcers. Our infra provider said we actually need to upgrade to 13 first, and then to 17. They did not provide a reason.
kortilla [3 hidden]5 mins ago
Not if software regressions are the main concern.
hinkley [3 hidden]5 mins ago
Once your version doesn’t receive security fixes you’re one CERT advisory away from having your whole week pre-empted by an emergency upgrade.
I’ve been there with products that were still internal at the time. I can only imagine how much fun that is with a public product. But then I do have a pretty vivid imagination. We changed to periodic upgrades after that to avoid the obvious problem staring us in the face.
natmaka [3 hidden]5 mins ago
Suggestion: add "End of life (no more maintenance for this version)? Upgrade"
Gormo [3 hidden]5 mins ago
Why? If the implemented featureset meets your needs, and there are no unresolved bugs or security vulnerabilities relevant to your use cases, what further "maintenance" do you need?
abraham [3 hidden]5 mins ago
When a critical security patch comes out, you don't want to have to do a major version upgrade to get it.
FearNotDaniel [3 hidden]5 mins ago
Because when the maintainers have stopped patching that version against all known security vulnerabilities, that doesn't stop the bad guys from looking for more vulnerabilities. When they find one, it will get exploited. So you either wake up to an email from Have I Been Pwned to say all your customer data has been exfiltrated [0], or (if you're lucky) you have a mad scramble to do that update before they get you.
[0] Probably including those passwords you didn't hash, and those credit card numbers you shouldn't be storing in the first place because, what the heck, it meets your needs.
throwaway918299 [3 hidden]5 mins ago
Here’s another reason to upgrade: your version is end of life and your cloud provider forced it.
Thank you Amazon!
mkesper [3 hidden]5 mins ago
Yes, this is actually a good thing and comes with warnings beforehand.
throwaway918299 [3 hidden]5 mins ago
I agree. It helped me completely bypass any discussion from management about “not high enough priority”. Amazon definitely did me a favour in many ways.
Gigachad [3 hidden]5 mins ago
Eventually you get forced to update it when the other stuff you use starts having minimum version requirements.
c0balt [3 hidden]5 mins ago
Ancient version like 15 or 16? I have recently encountered a PostgreSQL 12 instance.
Once the database runs you are inclined to let it keep running until you must do an upgrade (bugfixes, EOS). Upgrading for new features/improved performance is nice, but can be a pain and is prone to cause downtime. PostgreSQL's upgrade tooling story is also a bit painful, but it has proven to be an awesome way to semi-regularly test my backup infrastructure.
chasil [3 hidden]5 mins ago
In Oracle, ALTER TABLE MOVE in 8i was a godsend, finally enabling a table reorganization without export/import.
My timid management forbade an upgrade from Oracle 7.3.4 until 2013. It was agony to remain on that museum piece for as long as we did.
I am upgrade-minded, but my management is not. I always lose.
I am retiring in two years. I will not miss their problems, not at all.
Edit: Oracle 10g was the last release that (for us) brought must-have features. Sure, upgrading to 19 or 23 would be great, but it doesn't bring anything that I really want.
Narkov [3 hidden]5 mins ago
> Postgres 17.0 has been out for a bit
From the same blog:
> Sep 26, 2024 - Postgres 17 is Now Available
3 weeks....for a new major release...and we are asking ourselves why people haven't updated?
jeltz [3 hidden]5 mins ago
Yeah, upgrading to PostgreSQL 17 now would be weird unless you have some very specific feature you need in it and spent resources testing your application on the betas and rcs.
majewsky [3 hidden]5 mins ago
My team has upgraded several dozen databases from 16.x to 17.3. Went entirely smoothly. The thing is that we're running on a process of upgrading all dependencies every Friday, and then promoting to prod on Monday unless there are specific issues, so our definition of "would be weird" is the reverse from what you say.
(Granted, we have rather small DBs and simple applications where FOR UPDATE SKIP LOCKED is about the most fancy feature we use.)
webprofusion [3 hidden]5 mins ago
Lol, try upgrading old MongoDB stuff.
Database engines (every single one) are notorious for incompatibilities between major versions, upgrading mission critical stuff means updating and re-testing entire applications, which in some cases can be a multi-million dollar process, before going into production.
Even if you deeply know/think that there's no problem upgrading, if something does fail in production after an upgrade and it's mission critical..
x-_-x [3 hidden]5 mins ago
This. I was tasked with upgrading Postgresql from a very old version (I think 9?) to one that was still supported a couple of years ago. Backwards compatibility is paramount and from my experience upgrading MySQL/MariaDB I know that changes in versions can break it.
For this reason, I chose to upgrade to version 11 because it was only a couple of versions apart and still had repositories available at the time.
So the first thing I do is stop the VM and take a snapshot. Then I start it back up and go check for database corruption before I dump them... wait there's no utility to check for corruption...? Yep that's right! You basically have to YOLO the whole thing and hope it works. OK...
So I dump the databases and back up the directory. I shut down the old version and then install the new version from the repo. I start to import the databases and notice in the scrolling logs that there's some incompatibility... Oh F*$&. I Google the error and spend a good hour trying to figure it out. Apparently there are external plugins for Postgres that were installed in the old version. I search for the plugins online and they are long discontinued. OK, so let's just copy them over from the old version. I stop Postgres, copy them over, and start it back up. It starts up ok. I reimport the databases and no more errors. Yay! I start the applications and pray to the SQL gods. So far so good, everything seems to work.
Thankfully the applications tested well and worked post upgrade.
All of this was done for a critical multi-million dollar healthcare platform. There were no official guides. Nothing. I had to find some random admin's blog for guidance. MySQL on the other hand has full documentation on just about every aspect of everything. The whole process was super hackish and not having any way to check database integrity would have been a show stopper for me had I designed this configuration.
Netcob [3 hidden]5 mins ago
My personal reason: While I haven't had to deal with a Postgres update at work yet, I've been running a pretty large Postgres 12 database in my homelab for a few years now.
My homelab projects mostly center around a "everything is an MQTT message" idea. Zigbee sensors, Tasmota power readings, OwnTracks locations, surveillance camera events, motion sensors for light switches, currently active app on my PC, status of my 3D printer, whatever my vacuum robots are up to and so on. It all gets recorded into a Postgres db. From there I can use it for data mining experiments, but mostly as a source for Grafana. I tried counting the rows but that query didn't even complete while I was writing this comment.
I like trying out all kinds of dockerized OSS services, and I keep them updated using Watchtower. I run a GitLab instance, which is usually the most annoying service to update because there's an upgrade path and post-start migrations. With my Postgres instance, which is isolated from the internet, I'll have to figure out what the fastest way is to move all that data around, not leave a huge gap in the record, and so on. Sounds like at least a day of work - and since it's technically all for "fun", it'll have to wait until it actually is that.
Symbiote [3 hidden]5 mins ago
A good approach for this is to use pg_upgrade in-place, which should give you a downtime of a few minutes at most. (I have 800GB at work and would expect 1-2 minutes for this.)
I recommend installing PG12 on a temporary VM, duplicating the existing database, and testing the upgrade in isolation.
A more complicated approach uses replication, and upgrades the standby server before promoting it to the primary server.
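A hedged sketch of the in-place path, assuming packaged 12 and 17 binaries and Debian-style paths (adjust to your layout, stop both clusters first, and do a dry run with --check before the real thing):

/usr/lib/postgresql/17/bin/pg_upgrade \
  --old-bindir=/usr/lib/postgresql/12/bin \
  --new-bindir=/usr/lib/postgresql/17/bin \
  --old-datadir=/var/lib/postgresql/12/main \
  --new-datadir=/var/lib/postgresql/17/main \
  --link   # hard-links data files instead of copying them, which is what keeps downtime to minutes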
Netcob [3 hidden]5 mins ago
Thank you! Looks like the best way to do this.
And since I have backups, I might not even need the testing step, considering the low risk. Might do it anyway just out of curiosity at how long it would take to duplicate.
yen223 [3 hidden]5 mins ago
Databases tend to be "stickier" than other parts of any large software system. Largely because database migrations are costly. You can't just tear down an old database and rebuild a new one, you have to figure out how to move all that data across too.
The consequence is that things in database-land tend to move slower than other types of software. This, I think, is the major reason why we still use SQL.
jart [3 hidden]5 mins ago
Have postgres updates actually been requiring users do migrations? Or is this just a fear that something will go wrong?
magicalhippo [3 hidden]5 mins ago
From what I can gather, yes[1]:
A dump/reload of the database or use of the pg_upgrade application is required for major upgrades.
Now, it seems one can mostly use pg_upgrade, which only rewrites the system tables[2], so it's fairly quick. But if the on-disk format has changed, it seems you're forced to dump and reload.
At work we've mainly been using SQLAnywhere, which would just disable new functionality for databases using old on-disk format. So upgrading major versions has usually been fairly painless and quick.
Well, if it's self-hosted you have to do it yourself. You can either back up your databases from the old version and restore them to the new version once it's installed, or you can use pg_upgrade to upgrade/copy an old version's data directory to the new version.
I don't think this is done automatically when you simply install a new postgres version, but I'm not certain of that.
polishdude20 [3 hidden]5 mins ago
What's the SQL alternative?
aitchnyu [3 hidden]5 mins ago
Hopefully Edgedb. It could make most ORMs and heavyweight web frameworks redundant, maybe some BI tools and jobs too.
JSON - er, JSON-based document storage - documents with unique identifiers, and the ability to define and set schemas for the JSON, and... we're back to a relational database.
ogogmad [3 hidden]5 mins ago
I think he was talking about the query language.
KronisLV [3 hidden]5 mins ago
In many orgs out there, the version that's picked when a project is started will stick around for a while.
Suppose you join a project and see that the PostgreSQL version used is pretty old.
Do you:
A) convince people to migrate it for some nebulous benefits (features that aren't currently used, performance improvements that will be difficult to measure in lieu of tooling and aggregated statistics) while also taking on the risks of breaking everything or even worse, data loss (since you won't always be able to provision multiple instances and backup restore might take a while and you could still mess that up)
B) or just leave it as it is and focus on anything else
Many will prefer to not get burned even if there shouldn't be that many risks with upgrading your average PostgreSQL install, which is why you'll get minor/patch releases at best, alongside whatever the runtime environment is getting upgraded.
Containers and bind mounts make all of this way easier, but then again, many places don't use containers.
gonzo41 [3 hidden]5 mins ago
So I've got a small but important app that I run on PG14 via a container. I literally just put the pgdata dir outside the container, and I can keep the host OS totally fine and have an isolated db environment. It's a very nice way to not worry about the db.
KronisLV [3 hidden]5 mins ago
Some might disagree (believing that the system package manager should be responsible for the PostgreSQL install), but I agree that the approach you’re using is one of the better ones.
That way you can update the database version when you want, while still installing the base system updates quite frequently. Plus, I think it’s really nice to separate the runtime from the persistent data, which such setups make obvious.
nomercy400 [3 hidden]5 mins ago
For me it is a data vs code thing.
If I run my application/code v1 right now, I generate data. I expect that if I move to application/code v2, I can leave my data in place and it will automatically apply changes to my data.
I do not get that with postgres. If I am on postgres 16, and I want to upgrade to postgres 17, I want to leave my data folder untouched. When I then start postgres 17, it should just work (tm).
It should also work with code that assumes Postgres 16, so I can upgrade my database separately from my application. I cannot wait 10 days for a large database to be migrated from 16 to 17 without being able to run it. However, I can wait 10 days before updating my code to support features in 17.
The current upgrade process does not give me such confidence in restoring data and uptime. So I don't upgrade until I really have to.
wiz21c [3 hidden]5 mins ago
Maybe they don't upgrade because currently installed PG instances are enough? Meaning that Oracle still holds the "big" deployments ?
cpburns2009 [3 hidden]5 mins ago
Why? Because it's risky when a lot of business critical data and processes rely on it. I'm just happy I was able to convince my boss to let me upgrade our 9.1 instance to 14 two years ago. Now to upgrade those two pesky 9.4 instances.
Postgres is the only thing on my Debian that doesn't seamlessly automatically upgrade across dist-upgrades, but instead leaves old versions around for me to deal with manually... which I seem to never get around to.
That way you can have multiple versions of the same package.
xpasky [3 hidden]5 mins ago
You mean there is some other way to install postgresql in Debian?
dboreham [3 hidden]5 mins ago
Quick note that the article is talking about why folks are on versions of PG that are not ancient (14, 15).
craigds [3 hidden]5 mins ago
right? we're on 13 and only now starting to consider upgrading to 16, and I don't think we're very abnormal and I don't consider 13 ancient. We have customers still using 9.3. (That last one does actually count as ancient.)
briffle [3 hidden]5 mins ago
Same exact boat. Logical replication is nicer than our 'in place' upgrades we did before, but still leaves lots of other issues.
We are constantly making changes to our schemas (adding tables, columns, etc). It's never an issue on physical standbys, it just gets created, but with logical replication we have to manually run the changes on the subscriber.
We have lots of instances where we create a new table for a feature coming, and alter another to add a column.
If those get missed on the logical subscriber, you don't even know until someone tries to write data to that new table or new column.
I know logical is supposed to be flexible, but I wish there was a setting to have a full replica (schema changes included) via logical replication, so I could handle upgrades more easily.
Apreche [3 hidden]5 mins ago
Because upgrading is a lot of work, and is higher risk than upgrading other software.
kevin_thibedeau [3 hidden]5 mins ago
Seems like a massive design fail if they can't maintain backwards compatibility and provide a safe, low-friction upgrade process.
ggregoire [3 hidden]5 mins ago
I think it's more about avoiding downtime (I just upgraded a pg with 1TB of data from v11 to v16 and I didn't notice any breaking changes). In an ideal world, every client of the DB should be able to handle the case where the DB is down and patiently wait for the DB to come back to keep doing its job. But from my experience, it's rarely the case, there is always at least 1 micro service running somewhere in the cloud that everybody forgot about that will just crash if the DB is down, which could mean losing data.
voidfunc [3 hidden]5 mins ago
95% of projects are design fails in this regard... few projects think about Day 2 Ops
forinti [3 hidden]5 mins ago
I find that upgrading Postgresql is really easy.
Testing all the apps that use it, not so much.
Ozzie_osman [3 hidden]5 mins ago
If you're on something like RDS, major version upgrades are pretty easy with Blue Green deployments. You can do it with just seconds of downtime and pretty low risk because it uses replication under the hood and handles the cutover for you.
vivzkestrel [3 hidden]5 mins ago
I'll tell you why from my end. I installed Postgres 14 via Homebrew many years ago on my Apple M1 Mac mini. I searched a lot for how to "upgrade" this installation but found nothing. I have a few databases running with data on them which I can't afford to lose if something goes down for more than 1 hour. I wish someone would guide me on how to actually install a newer Postgres such as v17 without breaking an existing v14 install or losing data, since I am not an expert by any means.
sofixa [3 hidden]5 mins ago
And this, kids, is a good example of why homebrew is not a real package manager, and why macOS is not the best at running production services.
I hope you have a backup somewhere else, not on the Mac Mini. One option would be to restore it on another machine on Postgres 14, follow the upgrade path to 17 (idk if you can jump directly but a few Google searches will cover this), verify the data, back up the new db, somehow upgrade Posgtres on your Mac, and restore the new back up. Done.
mxey [3 hidden]5 mins ago
I don’t see how this is Homebrew's fault? Homebrew lets you install multiple versions of PostgreSQL at once, which you need for pg_upgrade, just like you can do with APT:
brew install postgresql@17
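Continuing from that, a rough sketch of the rest of the Homebrew path. The paths below are the Apple Silicon defaults and are assumptions; confirm them with brew info and take a backup first:

brew services stop postgresql@14
/opt/homebrew/opt/postgresql@17/bin/initdb /opt/homebrew/var/postgresql@17   # skip if brew already created this directory
/opt/homebrew/opt/postgresql@17/bin/pg_upgrade \
  -b /opt/homebrew/opt/postgresql@14/bin \
  -B /opt/homebrew/opt/postgresql@17/bin \
  -d /opt/homebrew/var/postgresql@14 \
  -D /opt/homebrew/var/postgresql@17
brew services start postgresql@17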
thrdbndndn [3 hidden]5 mins ago
Forget about something as major as Postgres: I have trouble updating packages (that aren't installed via pip/npm/cargo) on Linux all the time as a newbie. The experience is worse than Windows for some reason.
Hell, I have a hard time telling the version of some built-in system binaries.
A few months ago, I had trouble unzipping a file which turned out to be AES-encrypted. Some answers on SO [1] said I should update my `unzip` to a newer version, but I couldn't find any updates for my distro, and I have no idea (still don't, so feel free to teach me) how to update it manually to make my `unzip` support AES. And all the versions, the good and the bad, say they're "version 6.0.0" despite behaving obviously differently.
> I have trouble updating packages (that aren't installed via pip/npm/cargo) on Linux all the time as a newbie. The experience is worse than Windows for some reason
If you haven't installed them via your programming language's package manager, you either installed them manually or via the OS package manager. The first one you'd know how to upgrade, and for the second you can ask it what version it is and what version is available to upgrade to (for compatibility reasons it might not be the latest, or latest major, unless you use the software vendor's own package manager repositories).
It's actually much easier than in Windows, because you have a piece of software (package manager) that is your one stop shop to knowing what is installed, update it, check versions available, etc. unless you've manually installed stuff.
In Windows you... google and download random .exes? Cool. As good as the worst possible option on Linux.
reshlo [3 hidden]5 mins ago
> In Windows you… google and download random .exes?
Windows has an official package manager now too.
sofixa [3 hidden]5 mins ago
And there's Microsoft provided and supported software in it?
Not exactly random. It's not hard to tell which website is the official 7-zip website. Also, choco and scoop exist on Windows.
> As good as the worst possible option on Linux.
I understand this is not a fair comparison, but in practice they're not as easy. When using Windows, I usually use it with a proper GUI, so popping up a browser and downloading the newest installer for a piece of software from its official website takes me less than a minute.
Doing similar for my Linux VPS with only a terminal is much more complicated.
KronisLV [3 hidden]5 mins ago
> Doing similar for my Linux VPS with only a terminal is much more complicated.
Of course, if the package you need isn't available in the standard repos, then you'll need to look elsewhere (e.g. PPAs or third party repos). There's also options like Flatpak and AppImage if you want something that's a bit closer to how you'd choose to install new releases on Windows.
If I wanted to update all of the installed software I have on my Windows install, there'd basically be no way for me to do this, outside of shady update manager software.
At the same time, I get the appeal of being able to just download a new release and install it, both AppImage on Linux and the way you install software on macOS (just drag the file into Applications) seem similarly pleasant to me in that regard.
To expand on the latter (the response got deleted), you can very much do something like https://peazip.github.io/peazip-linux.html (I wish 7-Zip was available on Linux natively, but as far as GUI software goes, PeaZip is pretty nice) however that's not the most common approach. You should generally prefer using the package manager when you can.
thrdbndndn [3 hidden]5 mins ago
Sorry, I wasn't very clear; it's totally on me.
On average, the experience of upgrading/managing packages is obviously much better than on Windows.
I meant to say that in certain cases (like the `unzip` example I mentioned above), when the system's built-in package manager fails, I can't seem to find alternatives like I could on Windows (just find the binary I want and manually install it). To this day I still can't find a way to update `unzip` to a version that supports AES on my Debian VPS.
MaKey [3 hidden]5 mins ago
> To this day I still can't find a way to update `unzip` to a version that supports AES on my Debian VPS.
The author in this answer clearly has a version of unzip that can detect "AES_WG". Unfortunately they only vaguely said (in one of the comments) "Since then the main Linux distros have added patches to fix various issues" and didn't specify which distro.
MaKey [3 hidden]5 mins ago
> Doing similar for my Linux VPS with only a terminal is much more complicated.
sudo apt-get install p7zip-full
thrdbndndn [3 hidden]5 mins ago
I'm replying to the "the worst possible option on Linux" part, i.e. when the said software is not available in the package manager. 7-zip is just a (bad) example, since you can install 7-zip using `choco install 7zip.install` on Windows too.
I meant to say when you can't find the software you want in package manager, it's easier to download it manually and install it on Windows than (again, unfair comparison) a terminal-only Linux server.
MaKey [3 hidden]5 mins ago
> I meant to say when you can't find the software you want in package manager, it's easier to download it manually and install it on Windows than (again, unfair comparison) a terminal-only Linux server.
In that case you would just copy the download link and paste it into your terminal session. It's rarely needed though as most software is available through your distribution's software repositories.
reshlo [3 hidden]5 mins ago
> choco and scoop
And winget.
Klonoar [3 hidden]5 mins ago
This kind of sounds like a Homebrew issue…?
I’ve run multiple variants of PostgreSQL on my MacBook under MacPorts. Can’t even remember the last time I thought about it, to be honest.
Any db upgrade is a crapshoot. You need to test -everything-, and even then you may not catch things that run occasionally.
mebcitto [3 hidden]5 mins ago
If there is anyone from Neon watching this thread, is there a way to suggest updates to the pgversions website? It currently pins Xata to 15.5, which is true for the shared cluster environment depending on the region, but one can start dedicated clusters up to 16.4 at the moment.
arend321 [3 hidden]5 mins ago
I guess I'm lucky and 10 minutes of downtime is not a problem for my customers during off-peak hours, every year or so. Upgrading has always been a painless experience with just a pg_dump and pg_restore on the upgraded db server.
smellybigbelly [3 hidden]5 mins ago
I wouldn’t upgrade major versions until the x.2 version is out. New major versions come with new bugs and I’d like to wait two minor versions until I start using the image in production.
p5v [3 hidden]5 mins ago
It’s always a trade-off of how much you’ll get from an upgrade, vs. how much time, effort, and pain you’ll have to invest to do the switch. Postgres is at a stage, where a single version can easily outlive the duration of the software you’ve built with it. Let’s be honest here, a vast majority of software doesn’t need a fraction of the requirements we think it does, frankly, because no one uses it to the extent that latest and greatest features would make the experience more pleasant.
TomK32 [3 hidden]5 mins ago
Ha, I run Arch on my dev machine and they just LOVE upgrading Postgres. It has always been a pain, until my projects all got their Postgres in a Docker container. I just don't like doing extra steps for the upgrade process between major versions.
Hawxy [3 hidden]5 mins ago
As a counterpoint a lot of managed postgres providers offer automatic upgrades. Upgrading Aurora Postgres is very easy and we just schedule it in once or twice a year.
ZiiS [3 hidden]5 mins ago
Because they already work great, you are unlikely to be forced to upgrade, and they are the part of your system doing the most critical work under the heaviest load.
redslazer [3 hidden]5 mins ago
Funnily enough, Neon does not offer an easy way to upgrade a database compared to other managed database hosts.
andrenotgiant [3 hidden]5 mins ago
(Neon employee) We auto-upgrade minor versions as long as they can be done autonomously. For major versions, you're right it's still manual but we're working on improving that. Here is our version policy: https://neon.tech/docs/postgresql/postgres-version-policy
liampulles [3 hidden]5 mins ago
Our team uses AWS managed postgres, and the ease of operations and upgrades really alleviates stress for us. Managed DBs are worth the cost.
yobert [3 hidden]5 mins ago
I have a large production deployment that is still on 9.6 because the software depends on table inheritance. (Oh man!)
mnahkies [3 hidden]5 mins ago
Unless I'm mistaken, table inheritance is still a thing in current PG versions. In terms of partitioning, at least, it's just less commonly used in favour of declarative partitioning, since that's easier to manage.
It's been a long time since I worked with v9.x in anger, so I could well be forgetting things though
Our problem isn't the inheritance feature itself-- it's the removal of the config knob `sql_inheritance` which lets you change whether inherited tables are included in queries by default or not (behavior controlled manually by the * notation). It's a goofy behavior nobody liked, but we have a mountain of SQL that depends on that knob. It's fixable... Just a pain :)
forinti [3 hidden]5 mins ago
For a while I had a separate version 9 cluster because of one application.
But it wasn't really a nuisance. Postgresql doesn't give a DBA much to worry about.
phendrenad2 [3 hidden]5 mins ago
Why don't people upgrade? Why don't we ask why upgrading is necessary? I understand that Postgres is free software, and if you choose to use it, you're signing up for whatever upgrade cadence the developers give you. But really, I wish the developers would spend a bit more time patching "old" versions. 6 years is really too short a time to have to do an upgrade, especially if you have many database servers.
pjmlp [3 hidden]5 mins ago
For the same reason lots of companies are stuck on Java 8, .NET Framework, C++98 / C++11, C89,.....
lousken [3 hidden]5 mins ago
pg15 is not ancient at all and it has been part of the latest Debian, hence I don't see an issue. Same with Python or any other dependency. You rarely need the newest shiny these days, and being within the support window is fine.
DonnyV [3 hidden]5 mins ago
One of the many things I love about MongoDB is the upgrade process. Shut down the server, delete the old exes, copy the new ones and start the server. Done.
Any needed updates to databases are done by the new exe, automatically.
Obviously backup everything before you do the upgrade.
I've never had one fail.
erik_seaberg [3 hidden]5 mins ago
If PostgreSQL has replication, why are they talking about "minimal" downtime? Is there no quorum strategy that delivers high availability? I don't know as much as I should.
bastawhiz [3 hidden]5 mins ago
Writes happen on your primary. At some point, you need to stop accepting writes, wait for the replica to fully catch up, reverse the replication so the replica is the new primary, then direct writes to the new primary. That's hard to do without any downtime.
There's no option where the nodes all accept writes.
That still only has one primary: you can't just start pointing writes at the upgraded standbys. Synchronous replication (besides having its own downsides) just shortens the time needed for the replicas to catch up. You still need to perform a cutover.
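For the catch-up and promotion steps, the moving pieces look roughly like this (pg_promote() is v12+; older releases use pg_ctl promote):

-- on the current primary, after writes are paused: note the final WAL position
SELECT pg_current_wal_lsn();

-- on the standby: wait until replay has reached that LSN...
SELECT pg_last_wal_replay_lsn();

-- ...then promote the standby to be the new primary
SELECT pg_promote();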
chistev [3 hidden]5 mins ago
Use MySQL
sublinear [3 hidden]5 mins ago
corporate friction
linuxandrew [3 hidden]5 mins ago
Exactly this. At my old employer we (the sysadmins/operational DBAs) were pushing for newer, supported versions of Postgres. The pushback was always the regression testing cycle, the amount of desired change, and the willingness of the product owner to even engage. The testing cycle was so long that I tried to convince them to test on the Postgres beta, because it would well and truly be prod with a few bugfix releases by the time testing was completed (alas, they went with the n-1 release of Postgres instead).
ldjkfkdsjnv [3 hidden]5 mins ago
Honestly, aside from React and Java (8 -> 21 is big, but still not that big), there's very little software that I've updated and noticed a major step-change difference in the system. Once it works, it's fine.
ggregoire [3 hidden]5 mins ago
Postgres and mysql usually have changes in each new version that are important enough to motivate an upgrade, whatever it is new features or better performance or both. Although it really depends if your are using the features they are improving or not (e.g. if you don't use partitions, well of course that 30% perf improvement on write operations on partitions won't benefit you).
You can check the article about Uber migrating its MySQL from v5 to v8 posted here 3 days ago [1]. Among other things, they observed a "~94% reduction in overall database lock time." The before/after graph is pretty impressive. It also gave them window functions and better JSON support, which are two very big features.
I've always found it fascinating that there is a vocal contingent at HN that seems to legitimately hate advertising. But then an article like this turns up that is obvious advertising and is also a good article - we get a nice summary of what the major performance features over different postgres versions are, and some interesting case studies (I'd never even heard of the NOT VALID option although apparently it is nothing new).
esperent [3 hidden]5 mins ago
This is something I've heard called "permission marketing". The idea is that you show genuinely useful ads to only the few people who will benefit from them, rather than indiscriminately blasting millions of innocent bystanders. Then these few people will actually welcome your marketing efforts.
The classic example is advertising a new improved fishing reel in a fishing magazine. People buy the magazine (well, 20 years ago they did) because they want to know about things like new improved fishing reels.
It's a world away from the overwhelming avalanche of bullshit that is modern advertising/spam. There's nothing at all weird about hating advertising in general but being ok with permission marketing.
If you follow this idea further you'll find that very few people, even the most vocal, genuinely hate advertising. We all want to know about useful products and services. We just don't want to see a million ads a day for Apple, Coke, Pepsi, Nike, erectile dysfunction, fake single women in your area, Nigerian princes...
Because when it reaches a certain scale, and when too many psychological tricks are being played, and everything is always, BRIGHT, BIG, hyper-sexualized, when you can't walk down any street, watch anything, read anything, without seeing people richer, smarter, younger, sexier, happier than you, it goes far beyond just advertising. It's brainwashing. It has to stop because it's extremely unhealthy for our societies, our mental health, our children.
shiroiushi [3 hidden]5 mins ago
Well said. If I'm reading about fishing, for instance, an ad for a new piece of fishing gear would not be too annoying, as long as it isn't too intrusive (like popping up in the middle of my reading).
But when I'm watching a YouTube video, having the video cut mid-sentence to some hyper-annoying and unrelated noisy ad simply angers me and makes me look for an ad-blocker.
OJFord [3 hidden]5 mins ago
I rarely see much objection to contentful 'advertising' like this. Anyway, the answer really is that it's fully handled by submission/voting/flagging mechanisms, doesn't matter what anyone might say.
jart [3 hidden]5 mins ago
Yes but Neon databases is a funder of Postgres development. So I'm interested in hearing what they have to say. If they're advertising then I think helping open source is the right way to go about it. To me it sounds like they just want to make sure people benefit from all the money they're spending.
TacticalCoder [3 hidden]5 mins ago
> Postgres 17.0 has been out for a bit and ...
No. It was released in September 2024. That's not "quite a bit".
Now as to why people aren't all on 17 and not even on 16 yet, here's an acronym for you: LTS [1]
Debian 11 Bullseye is the current LTS. It came out in 2021.
Weird that the maybe-AI-generated image of a column says "Postgres 13" on it when the article talks about Postgres 17.
codetrotter [3 hidden]5 mins ago
Seems perfectly reasonable to me. The article is about people not upgrading from older versions. One could imagine that PostgreSQL 13.0 is the “pillar” of some company, that their whole system relies upon. The article then goes into detail on what they are missing out on by not upgrading PostgreSQL to a more recent major version, and why it might be that so many stay on ancient versions, and also how you can actually perform major version upgrades of PostgreSQL.
FearNotDaniel [3 hidden]5 mins ago
Yeah, I mean it's not hugely imaginative, but it kind of makes sense, if you want to describe pg 13 as "ancient" that you use some kind of Greek/Roman temple as a visual analogy for that. Not particularly well executed - a serif font might have helped to drive the "joke" home - but nobody's trying to win an art contest here.
0xbadcafebee [3 hidden]5 mins ago
Because we don't really like it to begin with.
The RDBMS has not fundamentally changed much since the 80's. It's the same basic design inside and out, with a shitload of tweaks and optimizations. Don't get me wrong - you can get very far with tweaks and optimizations. But the foundations root you to a particular model, and some things in that model will always be painful.
The important question to me isn't why don't people upgrade. It's why do people run Postgres?
Is there no other kind of database? No, there are hundreds of different databases. Is it because Postgres does things fundamentally different and better than anything else? No, lots of things work somewhat like Postgres, with pros and cons. Is it because you can't do things without Postgres? No, there are always alternatives that you can make work. Is it because it's impossible for it to keep working without upgrading? No, any database with long-term support could continue working without upgrading.
So why use Postgres at all? Answer: it's the incumbency, stupid.
When a big fucking honking giant "thing" sucks up all the air in the room, provides for most of the use cases, and is accepted by all the users, then it's extremely hard to justify not using it. Incumbents usually win, even if they're old, disreputable, buggy, and annoying. Even if they're missing some obvious features other things have. Even if people loudly complain that they want change. It's just more annoying not to use them.
We're used to them. We've adapted. Trying to do something else is going to be annoying and hard. So we stick to what we know. But that doesn't mean we have to stroke the balls while working the shaft. Upgrading isn't exactly fun. It doesn't even buy us anything, other than the promise of "support", or "compatibility" with whatever else will eventually require it. So we upgrade, eventually, when we must.
But the constant mind-numbing march away from entropy isn't a fantastic reason to keep using the thing. When you have to convince yourself to stick with it, it's likely you're in a toxic relationship. If you're honest with yourself, you'll agree that it's time for a change. You deserve better.
But switching is full of unknowns. Ominous. Time-consuming. Hard. You know you want to, maybe even need to. But there's too much to lose. So you stick to the familiar, to what you can handle. Maybe something better will come down the pike soon. You even tell yourself you're lucky to be here. But deep down you know you're just comforting yourself. You wish you had something else. Something that brings you joy. Something better. Something... right. But that something isn't here right now. So until Mr. Right gets here, you'll stick with Mr. Right Now.
Time to get ready for the next upgrade..... sigh...
ttfkam [3 hidden]5 mins ago
You think people choose Postgres because it's the incumbent? That's an… interesting take.
Every bit of the experience before getting it working feels like it was designed by an MBA who owns a Postgres hosting service.
Other than that, it's amazing.
If you don't want to use pg_upgrade, you can dump from one cluster and pipe directly into another, with no need for a temporary file. It couldn't be easier.
Can't say the same for Oracle. Even applying a patch can ruin several weekends.
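For anyone who hasn't done it, a minimal sketch of the dump-and-pipe approach, assuming the old cluster listens on 5432 and the new one on 5433 (ports and roles are placeholders):

    # stream everything from the old cluster straight into the new one,
    # no intermediate dump file needed
    pg_dumpall -p 5432 | psql -p 5433 -d postgres

Downtime is the full dump-plus-restore duration, of course, which is why this only works for modest databases.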
If you have a trivial database, yes.
If you have a large, complicated database that takes hours/days to restore and has an HA SLA, very hell no.
I keep an eye out for vulnerabilities, and that's about it.
1. Leave your old database running
2. Start your new database with new version
3. Run logical replication and let it catch up
4. Decide you want to switch over.
5. Using a script: Stop your apps writing to the database (or switch to read only mode), let logical replication catch up one final time, stop logical replication, run a procedure to fix your sequences, and switch your apps over to the new database.
You are now done. Sure, it requires a one-time investment to work out the exact steps needed for your situation, but it's easy to do with minimal downtime.
Of course if you don't care about downtime just shut down your database server, make a backup(!), and then do pg_upgrade.
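A rough sketch of the replication part of those steps, assuming native logical replication (PG 10+), wal_level=logical on the old server, and made-up names throughout:

    -- on the old (source) server
    CREATE PUBLICATION upgrade_pub FOR ALL TABLES;

    -- on the new server, after restoring the schema with pg_dump --schema-only
    CREATE SUBSCRIPTION upgrade_sub
      CONNECTION 'host=old-db dbname=app user=replicator'
      PUBLICATION upgrade_pub;

    -- at switchover, fix each sequence, e.g.:
    SELECT setval('orders_id_seq', (SELECT max(id) FROM orders));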
I think this is the right answer for 99% of companies. Yes, everyone wants zero downtime, but I see incredibly few cases where it is actually worth the cost and effort.
Why?
For production systems I generally try to avoid using anything not built internally. When it comes to PostgreSQL, for example, if you want to submit benchmarks or report bugs, an important (sometimes mandatory) component is providing the compilation flags. Also, seeing how every kind of storage can act in a different way, tuning of higher-level storage programs seems inevitable. Even though PostgreSQL has a lot of configuration that can be applied to the compiled program, some settings (e.g. block size) are compile-time options that can only be changed by recompiling.
So does using the vendor repos:
* https://apt.postgresql.org/ / https://yum.postgresql.org
* https://dev.mysql.com/downloads/repo/
This way when you upgrade your OS you don't have to worry about suddenly getting a new version (which, in the case of MySQL, may take a long time to convert its database files to the new version format).
At least on Ubuntu (and likely Debian), your existing Postgres version is never dropped until you are ready to manually upgrade yourself.
One of the few things I’ve enjoyed with the move into devops and companies forcing previously “pure” developers into operational roles was their discovery that Postgres was utterly horrible to administer at a systems level. Apparently us lowly sysadmins may have had a point after all.
This is a bit tongue in cheek but really not far from my lived reality. When the focus is on features and “correctness” at the near-total expense of sane systems tooling, folks can develop some myopia on the subject. So many arguments with devs on my teams over this subject who were utterly horrified to find we were running MySQL for a given service.
Open source projects tend to fix the pain points their contributors experience, and I assume there were not too many contributors wanting to deal with the boring work of making administration an easy task - it’s thankless “sideways” work that won’t result in many accolades or personal satisfaction for most SWEs.
The end users are almost always developers, most of whose experiences in production entail either the equivalent of a docker container level scale system, or are simply given a connection string and the rest is a black box to them. Under those contexts I’d personally prefer Postgres as well and it wouldn’t even be close. When you get into backups, clustering, upgrades, and high availability under extreme load? IMO the story falls apart real fast.
PostgreSQL has prioritized correctness and stability, which, while it has made certain features more clunky to use or slower to be implemented, means that when you get paged in the middle of the night you get the time you spent back. PostgreSQL has also spent a lot of time on improving the DBA experience, so mixed DBA/sysadmins also usually prefer PG.
I’ve seen this as well, but when pressed, none of them could articulate what part of its feature set they actually needed to use.
> One of the few things I’ve enjoyed with the move into devops and companies forcing previously “pure” developers into operational roles was their discovery that Postgres was utterly horrible to administer at a systems level.
Are you (or your devs, rather) actually running your own DBs? If so, respect. My experience has been that they spin up either a hideously under or over-provisioned RDS or Aurora instance, and then never touch it until it breaks, at which point they might ask for help, or they might just make it bigger.
Transactional DDL: migration errors never leave the database in an intermediate/inconsistent state.
Range types + exclusion constraint: just no way to do this in MySQL without introducing a race condition.
Writeable CTEs: creating insert/update/delete pipelines over multiple tables deterministically. Seriously though, the RETURNING clause is something I use all the time both in and out of CTEs.
Filtered aggregates and grouping sets: cleanly get multiple data points for a dashboard in one shot.
Unnest: converting arrays into a set of rows. Inverse of array_agg(...).
Types: arrays, booleans, IP/subnets, UUIDs (without binary(16) hacks), etc.
Materialized views: seriously, how does MySQL not have this yet?
Statement-level triggers: another option from per-row.
Row-level security: setting data visibility based on configurable policies.
I can cite specific use cases I've deployed to production for each of these and more.
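To make a couple of those concrete, a sketch with a made-up schema: an exclusion constraint over a range type, and a writeable CTE using RETURNING.

    -- no double-booking a room, enforced by the database itself
    CREATE EXTENSION IF NOT EXISTS btree_gist;
    CREATE TABLE booking (
      room   int,
      during tstzrange,
      EXCLUDE USING gist (room WITH =, during WITH &&)
    );

    -- archive and delete expired rows in one statement
    WITH expired AS (
      DELETE FROM booking WHERE upper(during) < now()
      RETURNING *
    )
    INSERT INTO booking_archive SELECT * FROM expired;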
JSON can often be used in place of arrays, and JSON_TABLE in both MySQL and MariaDB converts JSON into tabular data. MySQL supports multi-valued indexes over JSON, where each row can have multiple index entries (or no entries, e.g. partial index).
MariaDB has built-in convenience types for ipv4, ipv6, and uuid. Or in MySQL you can just use virtual columns to add human-readable conversions of binary columns, although that is admittedly slightly annoying.
MariaDB supports RETURNING.
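For the curious, JSON_TABLE from the earlier point looks roughly like this (MySQL 8.0 / MariaDB 10.6+; the posts table and its tags JSON column are hypothetical):

    SELECT p.id, jt.tag
    FROM posts AS p,
         JSON_TABLE(p.tags, '$[*]' COLUMNS (tag VARCHAR(64) PATH '$')) AS jt;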
That's a true shame considering how easy it is to make a read replica of any size and then fail over to it as the new primary. Definite skill issues.
Yep that’s exactly what I’ve seen too :). I still overall prefer this distributed database model - yes you spend more and people make mistakes (and learn). But if you can afford it you get higher velocity and more incentive aligned ownership than the old central gate keeping DBA team model.
“You have an unpartitioned table with a few billion rows and a UUIDv4 PK. I’m amazed it’s been working as long as it has. You need to change your schema.”
“Or we can just buy a bigger instance.”
“…”
Rinse and repeat six months later. I’m aware this is an organizational problem, but from what I’ve seen, it’s endemic.
Re: incentives, yes, also a problem. Dev teams are generally incentivized by Product, who doesn’t care at all whether or not something is optimal, only that new features are being shipped. I despise this mentality, but it’s not usually the devs fault.
While most versions ensure it's random, there are plenty of indexing algorithms that make searching through that quick and close to O(1), so that should not be the schema problem.
Unless you used a string field, but there is a quick workaround for that with Postgres too (make an index on `UUID(field)`, and look it up by `UUID(value) = UUID(field)`).
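That workaround is an expression index, something along these lines (hypothetical events table with the UUID stored as text):

    CREATE INDEX idx_events_uuid ON events ((external_id::uuid));
    -- the lookup must use the same expression to hit the index
    SELECT * FROM events
    WHERE external_id::uuid = 'a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11'::uuid;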
That's why both "devops" (DBAs?) and "devs" prefer Postgres over many other databases: you can easily handle some small mistakes in schemas too while you do the full migration in parallel.
But sure, let’s talk about Postgres. After all, it stores tuples in a heap, and so is immune to this behavior.
Except that its MVCC implementation means that it has to maintain a Visibility Map [0] to keep track of which pages contain only tuples which are visible to every active transaction. This is nominally used for vacuum decisions, but is also cleverly exploited for use with index-only scans. This poses a problem when referencing non-k-sortable entries, because while the tuples are in a heap, the indices are generally a B+tree. So now we’re back to the problem of massive IO amplification. Even if the VM and index are entirely cached, reads are reads, and they add up.
Then there’s the issue of WAL bloat due to full page writes. tl;dr Postgres pages are nominally 8 KiB, and storage device pages are nominally 4 KiB. To guarantee an atomic write, Postgres writes the entire page for a given tuple for the first write after checkpoint, regardless of how many bytes were altered. Again, non-k-sortable: if your tuples are spread randomly across pages, you now have far more pages to write. This can and does matter for either write-heavy workloads, instances with limited network bandwidth (oh hi small RDS / Aurora), or the worst, both combined.
Re: search complexity, I’m pretty sure B+trees (and B-trees) have O(log n) time complexity, not O(1). Whether or not that’s “close” depends on n, I suppose, but in this scenario I specifically said “billions of rows.”
> That's why both "devops" (DBAs?) and "devs" prefer Postgres over many other databases
I’m a DBRE, and like both MySQL and Postgres. They both have strengths and weaknesses, but you need to deeply understand those – and properly design your schema and query patterns around them – in order to make an informed decision.
[0]: https://www.postgresql.org/docs/current/storage-vm.html
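If you want to see how this plays out on a real table, the pg_visibility extension exposes the visibility map directly (table name is a placeholder):

    CREATE EXTENSION pg_visibility;
    -- counts of pages that are all-visible / all-frozen for a given table
    SELECT * FROM pg_visibility_map_summary('orders');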
Easy enough: almost no one writes SQL queries by hand these days, not for querying the database nor for doing schema upgrades. It's all done by tools - Doctrine in the PHP world for example. And pretty much no one but actual CS graduates knows anything deeper about databases.
Result is, devs are happy enough if they found something that works, and don't want to risk being the one who broke prod because they applied some schema change suggested by their DBA who doesn't know some random thing about the application.
My last company had an absolute ton of pl/pgsql written to support hundreds of ELT pipelines, migrations were all hand written and managed with liquibase.
There are more of them than you'd think out there. Just generally supporting some boring b2b or backend software.
The PostgreSQL team simply does not have the resources to do this. At least not without significantly slowing down development of everything else which there is no political will for. Maybe someone will come up with a genius idea which solves this but I am doubtful. Usually there is no free lunch.
Maybe some core dev will correct me, I am quite familiar with the PostgreSQL project but not one of the core devs. :)
1. The new version of PostgreSQL would need to be able to read all old catalog table formats and migrate them.
2. The new version of PostgreSQL would need to support all old versions of the parse tree to migrate views.
3. Likely a bunch of more things that I do not know of. I for example doubt it is trivial to just read an old catalog without having a fully up and running cluster which supports almost everything in that old cluster. The catalog has TOAST tables and indexes for example.
Right now 1 and 2 are implemented in pg_dump plus by having pg_dump call functions in a running old version of PostgreSQL.
It works with basically no downtime but I agree this kind of thing should definitely be easier, even turnkey.
Or do you mean the new Pg server should transparently do the upgrade automatically? And while online?
So even if it's available these days, the number of people still subjected to upgrades from older versions leaves an impression that it's really bad.
- whatever made it past the postgres 10 hump is on the most recent version and keeps getting migrated
- whatever is still on 9.x is scheduled to remain that way until the product using it will be EOL'd
Most of the software on my machines are "old" because they are part of a Linux distribution that (aside from security issues) was frozen in time a year or two ago so that it could be tested, released, and maintained. I am quite happy to have a system that I know is not going to break (either itself, or my workflow) when I apply security updates.
People who MUST HAVE the latest version of everything I feel either have some deeper FOMO issues to work out, suffer from boredom, or look at their computers as hobbies themselves rather than tools. (Which is fine, just be honest about what it is.)
That said, much of my career has been spent working at companies who got so busy shipping features that upgrading infrastructure never makes it above the fold. You can tell the managers that working around old software adds costs that scale with the age of the infrastructure, but they don't always listen. I currently work at a company that still has loads of CentOS 7 hosts still in production, and only fairly recently began upgrading them to RHEL 8. (Not 9!)
Absolutely...not.
Slow does not mean stable. Slow means the floor is rotting out from under you constantly.
Being prudent about when and where to upgrade is a very active, intentional process that the typical company simply doesn't have the stomach or skill for.
That platform got SMP in 1964.
Not chasing shiny is important, but when tech debt builds up this high, life is generally hell: outages, an inability to accomplish basic tasks, and dealing with a bunch of people who have NIH syndrome.
My experience has been that most people (including devs) hardly think about their database at all. Everyone just takes the database for granted until a dev runs a bad query that locks a table. Devs let their ORM manage everything for them and don't take the time to think for themselves.
Also I rarely see teams with a culture that prioritizes maintenance, even for their own software. It's common for teams to become wary of upgrading because they do it infrequently. Like most things, the more often you perform an upgrade the easier it is. Smaller changes are easier to revert. The more often you do something, the better at it you get. Delaying maintenance often just increases the work later.
The replication and high availability that mariadb offers is rock solid, and much more advanced than what postgresql has to offer. It works out of the box, no plugins or third party shenanigans needed, and there is "one obvious way" to do it. Not a dozen options like in the postgresql ecosystem, each with its own quirks and drawbacks.
Also, upgrades are a dream. I did 4 major long term stable release upgrades so far and everything was fully automatic and smooth sailing all around.
All of that with about 10 seconds downtime only for each upgrade, despite being a chunky 6TB database, thanks to the user friendly replication process.
I respect postgresql a lot, but mariadb allows me to sleep much better at night thanks to replication, smooth upgrades and no VACUUM to worry about.
By analogy: should a database cluster optimize consistency or availability? Answer: there’s no way its author can guess which is more important to how you want to use it.
I have suspected for a long time that most people who criticize MySQL have never actually worked with it, or have done so a couple of decades ago. It's often the default choice if you don't need some of the PostgreSQL features (like PostGIS) and can work with either one.
This is the one thing in My/MariaDB that I miss in Pg: Galera. So handy for in-house stuff that needs some kind of HA: either a three-server setup, or a two-server+arbitrator.
Looking at https://mariadb.com/kb/en/standard-replication/, is the one obvious way binlog replication or Galera?
1. Most psql deployments are not exposed to the interwebz, they are typically only accessible to the applications that need them by virtue of network setup (firewalls etc). This limits the attack vector to whatever the application does. Good.
2. Distro vendors (RHEL et al) often stick to a major psql release for the lifecycle of the OS version. If the OS lives longer than the psql major version, they take on the responsibility of backporting critical security issues.
3. While upgrades aren't hard, they're not easy either.
4. Psql is pretty much feature complete for many workloads, and pretty stable in general. For many people, there is little need to chase the latest major version.
To drive this a little further, "latest and greatest" doesn't always apply. I've chosen software - even databases - for greenfield deployments one or two releases behind for their known characteristics.
Stability doesn't imply perfection, but rather, predictability.
And in my experience, apart from ease of use there's also a major trust issue here. If you're upgrading your app server framework/language, it's easy enough to do a rollback. With databases, people are worried that they might not notice errors right away and then you have to merge the data accumulated since the upgrade with the last backup in case of a rollback.
Not saying that this is entirely rational...
Also, new features on the SQL level are hard to sell if all you're doing is lowest common denominator ORM ("New window functions and faster lateral joins? But we're doing all that in our code!").
5. If your IT department is spread thin already and that old version is running fine, the incentive to potentially create more work for yourself is not gigantic.
And no, being old is not broken.
Amongst other reasons:
- Performing regular upgrades is an easy and cheap way to maintain a healthy knowledge base about the system. It's always easier to fix a system that is well understood than a black box which nobody has touched in 20 years. Upgrading regularly also leads to people being more comfortable with the upgrade process and it is likely to become a "regular thing" with checklists and other process improvements that make maintenance safer and more reliable.
- Doing preemptive maintenance means YOU get to choose when the system is unavailable and then you can plan accordingly, perhaps by planning it during a period with low usage or even arranging for a redundant system to take up the load. The alternative is leaving it up to chance when a system will fail, and that's frequently at high-load moments when you can least afford it.
- As a corollary to the previous point: a team which is in control of its systems can have more predictable output of feature work, since there will be fewer "unexpected" events.
Not maintaining your systems is only cheaper in the long run if you don't count the engineering time required to fix things when they break or need to be replaced. Ounce of prevention vs pound of cure and all that.
- If you're scared to perform upgrades, this is a good indicator that you lack confidence in your backup/recovery/reversion policies and procedures, probably for good reason, so maybe fix that.
I mean I think it's because maintenance is so unglamorous. So when it happens and everything doesn't collapse, nobody remembers and starts asking why we even do it (and then sysadmins and middle management suddenly aren't a thing, and companies and IT become exactly as fragile as anyone with _actual_ understanding of management would expect).
Meanwhile when regular maintenance fails in progress, it often ends up in the news with a body count attached.
One of my favourite podcasts has a running joke that you should never do maintenance (because so many industrial disasters happen during it). Of course the bias is the point of the joke - and usually the reason things went bad is because either the maintenance was neglected until that point, or the engineering picked Postgresql, I mean, didn't consider the necessity of maintenance.
https://en.wikipedia.org/wiki/G._K._Chesterton#Chesterton's_...
Old software is not necessarily broken, but it is always tech debt. And you can't live in debt forever; IT does not work that way.
It's often lower risk to apply frequent smaller changes than wait years for one enormous "bundled" upgrade.
If you are "fixing" a Schroedinger's Cat, my dude the Supreme Court of the Universe will not be kind.
That doesn't mean "upgrade recklessly," but it does mean you should know _why_ you're either upgrading or _NOT_ upgrading. That's your job, much more than the act of upgrading itself.
Unpublished vulnerabilities in old software are not a hypothetical. And very old packages are usually broken, just coped with at significant opportunity cost - or because the failure is so rare and so impactful that once it happens, everyone is out of a job anyway.
Seriously, I've yet to encounter a sysadmin who used that old, silly adage at me and didn't later have to admit I was right.
Edit: so no, you don't stay on an ancient version of the database because "it's not broken." You're staying on it because _the upgrade process itself_ is so broken you're terrified of it.
It requires a lot of work, planned downtime, or some way to smear updates across the estate.
The cost of any failure is very high. The benefit of any major upgrade is also vanishingly small. Unless you need a specific feature, its just not worth it.
…or you're on AWS RDS, which will automatically bump your db cluster if it goes EOL and you ignore the notices for more than a year.
I guess it depends on scale? I was surprised how easy it was on Ubuntu. There was an automatic migration script, and it worked. Took less than 5 minutes to upgrade.
Sure, there was downtime, but I think most applications out there can live with scheduled downtime of a few minutes.
If you can't have downtime, then nothing is easy.
To others reading, be mindful: database upgrade times depend greatly on the data stored within them/where/who is using them. Your development environment doesn't represent production. If the distinction even exists!
A five minute upgrade can become indefinite with a single lock [row/table, depending on the storage engine/etc]
No matter how easy it is, it takes more effort and thought to do an upgrade than it does to not do an upgrade at all, and for most users the upside is basically invisible if it exists at all.
Newer PostgreSQL versions are better. Yet still not quite as robust or easy as MySQL.
At a certain scale even MySQL upgrades can be painful. At least when you cannot spare more than a few minutes of downtime.
I think it boils down to this. We used to constantly be running the obsolete pg version until it became an emergency mostly because upgrading with the tooling available at the time was very painful. Today however, we stay relatively up to date. Once you figure out the data replication, you can almost do blue green deployments on databases with almost no down time.
https://jepsen.io/analyses/mysql-8.0.34 ( MySQL 8.0.34 2023-12-19 )
HN: https://news.ycombinator.com/item?id=38695750
"MySQL is a popular relational database. We revisit Kleppmann’s 2014 Hermitage and confirm that MySQL’s Repeatable Read still allows G2-item, G-single, and lost update. Using our transaction consistency checker Elle, we show that MySQL Repeatable Read also violates internal consistency. Furthermore, it violates Monotonic Atomic View: transactions can observe some of another transaction’s effects, then later fail to observe other effects of that same transaction. We demonstrate violations of ANSI SQL’s requirements for Repeatable Read. We believe MySQL Repeatable Read is somewhat stronger than Read Committed. As a lagniappe, we show that AWS RDS MySQL clusters routinely violate Serializability."
---------------
https://jepsen.io/analyses/postgresql-12.3 ( PostgreSQL 12.3 2020-06-12 )
HN: https://news.ycombinator.com/item?id=23498781
"PostgreSQL is a widely-known relational database system. We evaluated PostgreSQL using Jepsen’s new transactional isolation checker Elle, and found that transactions executed with serializable isolation on a single PostgreSQL instance were not, in fact, serializable. Under normal operation, transactions could occasionally exhibit G2-item: an anomaly involving a set of transactions which (roughly speaking) mutually fail to observe each other’s writes. In addition, we found frequent instances of G2-item under PostgreSQL “repeatable read”, which is explicitly proscribed by commonly-cited formalizations of repeatable read. As previously reported by Martin Kleppmann, this is due to the fact that PostgreSQL “repeatable read” is actually snapshot isolation. This behavior is allowable due to long-discussed ambiguities in the ANSI SQL standard, but could be surprising for users familiar with the literature. A patch for the bug we found in serializability is scheduled for the next minor release, on August 13th, and the presence of G2-item under repeatable read could be readily addressed through documentation."
To be clear, I like both. Postgres has a lot more features, and is far more extensible. But there’s no getting around the fact that its MVCC implementation means that at scale, you have to worry about things that simply do not exist for MySQL: vacuuming, txid wraparound, etc.
That was true in 2012; dunno if it still applies though.
Pretty sure that even in 2012 MySQL had very easy to use replication, which Postgres didn't have well into the late 2010s (does it today? It's been a while since I've ran any databases).
But sure, it was easy to get a proof of concept working. But when you tried to break it by turning off the network and/or machines, then shit broke down in very broken ways that were not recoverable. I'm guessing most who set up MySQL replication didn't actually verify that it worked well when SHTF.
sometimes, repairing broken data is easier than, say, upgrading a god damn hot DB.
MVCC is overrated. Not every row in a busy MySQL table is your wallet balance.
source: direct personal experience.
How did you verify that it was rock solid? And which of the variants did you use?
Possibly I got it wrong and switched around which was easier on the devs and which was easier on the sysads?
In my defence, ISTR, when talking to sysads about MySQL vs PostgreSQL, they preferred the latter due to having less to worry about once deployed (MySQL would apparently magically lose data sometimes).
Except that search doesn't work, because all text is in latin1 encoding.
What part of searching have you found to not work well using it?
Postgres is definitely more difficult to administer.
* I'm running a lot of MySQL stuff and such a topic might be of interest to me
The fact that you still can't use DDL in transactions makes life exceedingly painful, but it's technically safe if you write your migration code carefully enough.
And then there's UTF8 not actually being UTF8, which can result in total lockup of a table if someone inputs a character that does not fit in UCS-2 and now you need to recover the database from backup and preferably convert all instances of utf8 to utf8mb4, because fuck you that's why.
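For reference, the conversion itself is a one-liner per table, though it rewrites the table and you still have to sort out index key lengths afterwards (table name is hypothetical):

    ALTER TABLE posts CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;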
EDIT: my mistake, OPTIMIZE TABLE is an online DDL. I’ve been burned in the past from foreign key constraint metadata locks essentially turning it into a blocking operation.
The OPTIMIZE works almost exclusively with online DDL statements. There's only a brief table lock held during table metadata operations, but I haven't found that to be a problem in practice. (https://dev.mysql.com/doc/refman/8.4/en/optimize-table.html#...)
https://web.archive.org/web/20230922210124/https://grimoire....
My only conclusion is that the majority of devs don’t actually read documentation, and rely purely on the last blog post they read to influence their infrastructure decisions.
[0]: https://www.sqlite.org/quirks.html
MySQL was immortalized as the database in every LAMP stack. And just like PHP it improved considerably since then.
(I joined MySQL shortly after that mess, before the Sun acquisition)
(which may be actually derived from the turkish aslan, for lion, but somehow the animal got mixed up)
It really feels like early 1990s vintage Unix software. It's clunky and arcane and it's hard to feel confident doing anything complex with it.
In no particular order, my preference for postgres is driven by:
MySQL/Maria I'm sure is fine, but it's one of those things where it's just different enough, and I haven't encountered a compelling use case for changing my preference.
That said, when using utf8mb4 in an index key, MySQL uses the "worst case" of each character being 4 bytes. So it effectively limits the max key size to 3072/4 = 768 characters when a column is using the utf8mb4 character set.
For practical purposes, this doesn't cause much pain, as it's generally inadvisable to use complete long-ish strings as a key. And there are various workarounds, like using prefixes or hashes as the key, or using binary strings as keys to get the full 3072 bytes (if you don't need collation behaviors).
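A sketch of both workarounds, with made-up names:

    -- prefix index: only the first 191 characters participate in the key
    ALTER TABLE pages ADD INDEX idx_title (title(191));

    -- or key on a hash of the full value via a stored generated column
    ALTER TABLE pages
      ADD COLUMN title_hash BINARY(32) AS (UNHEX(SHA2(title, 256))) STORED,
      ADD UNIQUE INDEX idx_title_hash (title_hash);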
This is exactly what I mean. 768 characters for an index is woefully bad. And for no obviously great reason: you can just index the encoded UTF-8 text.
This was literally reason why a former company (who will remain nameless) refused to add Unicode support. It's not even an imagined problem.
Spoiler: I literally did this a couple of days ago. The index size bloat means that any possible savings you might have gained from collisions are obliterated from page fetches. I tested with a measly 128 characters vs. a prefix of 16, and that was enough for the average query time to be equal, with the smaller index winning for the minimum.
And why is a 768 character limit woefully bad, but a 2704 character limit is totally fine?
Hell, even just being able to sort user-submitted strings up to a kilobyte. Why up to a kilobyte? Some users have strings that are kind of long. If I have to define a second column that's the truncated prefix, that's just a silly waste of space because MySQL decided to use utf-32 under the hood.
No, it can't. URL doesn't have any length limit, regardless of the fact that different software will impose different limits.
VARBINARY is typically ok for that I'd think? Then you can utilize the full 3072 byte limit for the key, since there's no character set applied.
> even just being able to sort user-submitted strings up to a kilobyte
As a software engineer, I completely agree. But as a DBA, I am obligated to make a "tsk" sound and scowl disapprovingly!
If you need to keep a million long URL's in a defined sort order, my first recommendation would be, don't -- see if there's another way to achieve your end result. But if you absolutely have to, then create a new integer column to be your sort key, and use a little bit of extra code to give it values that produce the same sort order.
Creating short numerical primary keys for long strings is a common database technique.
I for one have always viewed this as a perk.
[0]: https://dev.mysql.com/doc/refman/8.0/en/timestamp-initializa...
* comprehensive transaction model using different modes
* PostGIS and lots of other great extensions
* supports most of the current SQL standard and is clear on interpretation of edge-cases in the documentation
* support for writing stored procedures in any major programming language
* many useful functions regarding dates, sets, ranges, json, xml, ...
* custom datatypes
* extremely thought-out and standardized approach to development: if a feature is included it generally works well in interaction with everything else
* syntax, semantics and performance are all very predictable
* great documentation
Regarding MySQL / MariaDB: MySQL optimized for performance first. Until 2010 the standard storage engine, MyISAM, didn't even support transactions.
PostgreSQL always focused on correctness and stability and then made sure everything performed.
Good in theory. But last time I checked the main libs to connect to pgsql, everything you get back from the database are strings. So you need something in your app to convert those strings to the equivalent data structures.
Yes, more drivers and libraries should support the more expansive data type list, but even just within the database itself there are multiple advantages.
Aside from DDL, the only other major ones are manipulating users/grants, manipulating replication, a small number of other administrative commands, and LOCK TABLES.
This is all documented very clearly on https://dev.mysql.com/doc/refman/8.4/en/implicit-commit.html. Hardly an "obstacle course".
How software "feels" is subjective. Can you be more specific?
Today I use CLIs like usql to interact with MySQL and SQLite so I can continue to use those commands.
At first glance they may be less obvious, but they are significantly more discoverable. \? Just shows you all of them. In MySQL it always feels like I need to Google it.
In MySQL either `?` or `help` or `\?` will show you the help...
\c is for connect.
\dt is for describe tables.
Just another anecdote: MySQL lost data for me (2004). I spent some time evaluating the projects and Postgres’ development process seemed much more mature — methodical, careful, and focused on correctness. Boring, which I loved.
I didn’t need whatever perf advantage MySQL had so I switched to Postgres and never looked back. And then the Oracle drama and Monty’s behavior around it — not saying he was wrong or right, but it was the opposite of boring — just reinforced my decision.
I like to play with new tech in various spots of the stack, but for filesystems and databases I go boring all the way.
Real answer: no licensing cost
1. Set up logical replication to a new database server. We used https://github.com/2ndQuadrant/pglogical, but maybe you don't need that any more with newer versions of postgres?
2. Flip a feature flag that pauses all database queries and wait for the queue of queries to complete.
3. Wait for the query queue to drain and for replication to catch up.
4. Flip a feature flag that switches the connection from the old db to the new db.
5. Flip the flag to resume queries.
It helped that we were written in OCaml. We had to write our own connection pooling, which meant that we had full control over the query queue. Not sure how you would do it with e.g. Java's Hikari, where the query queue and the connection settings are complected.
We also had no long-running queries, with a default timeout of 30 seconds.
It helped to over-provision servers during the migration, because any requests that came in while the migration was ongoing would have to wait for the migration to complete.
In that post we walk through all the steps we took to go from Postgres 11.9 to 15.3.
Let me check what I'm on... brb... Postgres 14.
Because it's not automatic I leave it, I leave it until it's so unsupported that I must upgrade the whole system, then I build a new system with a new Postgres and I migrate the old to the new.
I want, so badly, for Postgres to just automatically update itself, that a new binary just works with the data directory of an old version, and that if required it does an in-place upgrade to those data files when it can (i.e. if it can detect the last version was the same major as the current version, upgrade the files transparently to the admin).
My databases are all backed up each night, and these are single server Postgres with no replication or other trickery, an automatic upgrade for a single-server Postgres should be possible.
As it's not done... I assume (incorrectly?) that there be dragons and risks, and I mitigate that by never upgrading, just waiting and migrating. Migrating puts all of the risk on me, human error, and I am definitely fallible, so I can best handle this risk by just not doing it until I must.
Last migration I performed was from Postgres 7 > Postgres 14 in October 2021... I guess I have quite a few years of Postgres 14 ahead of me still. I would take downtime, my systems aren't zero downtime critical, just give me a super simple one-command no-questions upgrade as I really dislike migrations.
Whether you go from v14 to v17 or v14 to v20, I doubt it'll make a difference in migration strategy. You've still got a fair amount of time before EOL. No need to stress or FOMO.
https://github.com/pgautoupgrade/docker-pgautoupgrade
So for fresh installs, yes, but for existing ones, not so much.
“Show me the incentive, I’ll show you the outcome”
Significant security vulnerability? Upgrade
Feature you need? Upgrade
All other reasons: Don't upgrade.
Upgrading takes effort and it is risky. The benefits must be worth the risks.
Additionally, actions done frequently are less risky than actions done rarely, since you develop skills in performing that action as an organization - see high deployment frequency as a strategy of managing deployment risk.
This adds up to continuous upgrading being the least risky option in aggregate.
I’ve been there with products that were still internal at the time. I can only imagine how much fun that is with a public product. But then I do have a pretty vivid imagination. We changed to periodic upgrades after that to avoid the obvious problem staring us in the face.
[0] Probably including those passwords you didn't hash, and those credit card numbers you shouldn't be storing in the first place because, what the heck, it meets your needs.
Thank you Amazon!
Once the database runs you are inclined to let it keep on running until you must do an upgrade (bugfixes, EOS). Upgrading for new features / improved performance is nice but can be a pain and is prone to cause downtime. PostgreSQL's upgrade tooling story is also a bit painful, but it has proven to be an awesome way to semi-regularly test my backup infrastructure.
My timid management forbade an upgrade from Oracle 7.3.4 until 2013. It was agony to remain on that museum piece for as long as we did.
I am upgrade-minded, but my management is not. I always lose.
I am retiring in two years. I will not miss their problems, not at all.
Edit: Oracle 10g was the last release that (for us) brought must-have features. Sure, upgrading to 19 or 23 would be great, but it doesn't bring anything that I really want.
From the same blog:
> Sep 26, 2024 - Postgres 17 is Now Available
3 weeks....for a new major release...and we are asking ourselves why people haven't updated?
(Granted, we have rather small DBs and simple applications where FOR UPDATE SKIP LOCKED is about the most fancy feature we use.)
Database engines (every single one) are notorious for incompatibilities between major versions, upgrading mission critical stuff means updating and re-testing entire applications, which in some cases can be a multi-million dollar process, before going into production.
Even if you deeply know/think that there's no problem upgrading, if something does fail in production after an upgrade and it's mission critical..
For this reason, I chose to upgrade to version 11 because it was only a couple of versions apart and still had repositories available at the time.
So the first thing I do is stop the VM and take a snapshot. Then I start it back up and go check for database corruption before I dump them... wait there's no utility to check for corruption...? Yep that's right! You basically have to YOLO the whole thing and hope it works. OK...
So I dump the databases and back up the directory. I shut down the old version and then install the new version from the repo. I start to import the databases and notice in the scrolling logs that there's some incompatibility... Oh F*$&. I google the error and spend a good hour trying to figure it out. Apparently there are external plugins for postgres that were installed in the old version. I search for the plugins online and they are long discontinued. OK, so let's just copy them over from the old version. I stop postgres, copy them over, and start it back up. It starts up ok. I reimport the databases and no more errors. Yay! I start the applications and pray to the SQL gods. So far so good, everything seems to work.
Thankfully the applications tested well and worked post upgrade.
All of this was done for a critical multi-million dollar healthcare platform. There were no official guides. Nothing. I had to find some random admin's blog for guidance. MySQL on the other hand has full documentation on just about every aspect of everything. The whole process was super hackish and not having any way to check database integrity would have been a show stopper for me had I designed this configuration.
My homelab projects mostly center around a "everything is an MQTT message" idea. Zigbee sensors, Tasmota power readings, OwnTracks locations, surveillance camera events, motion sensors for light switches, currently active app on my PC, status of my 3D printer, whatever my vacuum robots are up to and so on. It all gets recorded into a Postgres db. From there I can use it for data mining experiments, but mostly as a source for Grafana. I tried counting the rows but that query didn't even complete while I was writing this comment.
I like trying out all kinds of dockerized oss services, and I keep them updated using watchtower. I run a gitlab instance which is usually the most annoying service to update because there's an upgrade path and post-start migrations. With my Postgres instance, which is isolated from the internet, I'll have to figure out what the fastest way is to move all that data around, not leave a huge gap in the record and so on. Sounds like at least a day of work - and since it's technically all for "fun", it'll have to wait until it actually is that.
I recommend installing PG12 on a temporary VM, duplicating the existing database, and test the upgrade in isolation.
https://www.postgresql.org/docs/current/pgupgrade.html
A more complicated approach uses replication, and upgrades the standby server before promoting it to the primary server.
And since I have backups, I might not even need the testing step, considering the low risk. Might do it anyway just out of curiosity at how long it would take to duplicate.
The consequence is that things in database-land tends to move slower than other types of software. This I think is the major reason why we still use SQL.
A dump/reload of the database or use of the pg_upgrade application is required for major upgrades.
Now, it seems one can mostly use pg_upgrade, which only rewrites the system tables[2] and so is fairly quick. But if the on-disk format has changed, it seems you're forced to dump and reload.
At work we've mainly been using SQLAnywhere, which would just disable new functionality for databases using old on-disk format. So upgrading major versions has usually been fairly painless and quick.
[1]: https://www.postgresql.org/support/versioning/
[2]: https://www.postgresql.org/docs/current/pgupgrade.html
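For completeness, a pg_upgrade run with --link (to avoid copying data files) looks roughly like this; all paths below are placeholders and will differ per system and packaging:

    # both clusters stopped; run as the postgres OS user
    pg_upgrade \
      --old-bindir=/usr/lib/postgresql/16/bin \
      --new-bindir=/usr/lib/postgresql/17/bin \
      --old-datadir=/srv/pg16/data \
      --new-datadir=/srv/pg17/data \
      --link --check        # drop --check to perform the real upgrade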
I don't think this is done automatically when you simply install a new postgres version, but I'm not certain of that.
https://www.edgedb.com/
Suppose you join a project and see that the PostgreSQL version used is pretty old.
Do you:
Many will prefer to not get burned even if there shouldn't be that many risks with upgrading your average PostgreSQL install, which is why you'll get minor/patch releases at best, alongside whatever the runtime environment is getting upgraded. Containers and bind mounts make all of this way easier, but then again, many places don't use containers.
That way you can update the database version when you want, while still installing the base system updates quite frequently. Plus, I think it’s really nice to separate the runtime from the persistent data, which such setups make obvious.
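A minimal sketch of that setup with the official image, assuming a host directory for the data:

    docker run -d --name pg \
      -e POSTGRES_PASSWORD=change-me \
      -v /srv/pgdata:/var/lib/postgresql/data \
      postgres:16

Bumping the tag to postgres:17 then becomes an explicit decision, separate from OS updates, and the data stays put in /srv/pgdata (though a major-version bump still needs pg_upgrade or a dump/restore).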
If I run my application/code v1 right now, I generate data. I expect that if I move to application/code v2, I can leave my data in place and it will automatically apply changes to my data.
I do not get that with postgres. If I am on postgres 16, and I want to upgrade to postgres 17, I want to leave my data folder untouched. When I then start postgres 17, it should just work (tm).
It should also work with code that assumes postgres 16, so I can upgrade my database separate from my application. I can not wait 10 days for a large database to be migrated from 16 to 17 without being able to run it. However, I can wait 10 days before updating my code to support features in 17.
The current upgrade process does not give me such confidence in restoring data and uptime. So I don't upgrade until I really have to.
That way you can have multiple versions of the same package.
We are constantly making changes to our schemas (adding tables, columns, etc). Its never an issue on physical standby's, it just gets created, but logical replication, we have to manually run the changes on the subscriber.
We have lots of instances where we create a new table for a feature coming, and alter another to add a column.
If those get missed on the logical subscriber, you don't even know until someone tries to write data to that new table or new column.
I know logical is supposed to be flexible, but I wish there was a setting to have a replica via logical, so I can handle upgrades easier.
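For what it's worth, the manual dance for a new table looks something like this (names made up; the ADD TABLE step isn't needed for FOR ALL TABLES publications, but the refresh still is):

    -- subscriber: create the table by hand, since DDL is not replicated
    CREATE TABLE new_feature_table (id bigint PRIMARY KEY, payload text);

    -- publisher: add it to the publication
    ALTER PUBLICATION my_pub ADD TABLE new_feature_table;

    -- subscriber: pick up the publication change and start copying the table
    ALTER SUBSCRIPTION my_sub REFRESH PUBLICATION;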
Testing all the apps that use it, not so much.
I hope you have a backup somewhere else, not on the Mac Mini. One option would be to restore it on another machine on Postgres 14, follow the upgrade path to 17 (idk if you can jump directly but a few Google searches will cover this), verify the data, back up the new db, somehow upgrade Posgtres on your Mac, and restore the new back up. Done.
brew install postgresql@17
Hell, I have a hard time telling the version of some system built-in binaries.
A few months ago, I had trouble unzipping a file which turned out to be AES-encrypted. Some answers on SO [1] said I should update my `unzip` to a newer version, but I can't find any updates for my distro, and I have no idea (still don't, so feel free to teach me) how to update it manually to make my `unzip` support AES. And all the versions, the good and the bad, say they're "version 6.0.0" despite obviously behaving differently.
[1] https://stackoverflow.com/questions/60674080/how-to-open-win...
If you haven't installed them via your programming language's package manager, you either installed them manually or via the OS package manager. The first one you'd know how to upgrade, and for the second you can ask it what version it is and what version is available to upgrade to (for compatibility reasons it might not be the latest, or latest major, unless you use the software vendor's own package manager repositories).
It's actually much easier than in Windows, because you have a piece of software (package manager) that is your one stop shop to knowing what is installed, update it, check versions available, etc. unless you've manually installed stuff.
In Windows you... google and download random .exes? Cool. As good as the worst possible option on Linux.
Windows has an official package manager now too.
https://github.com/microsoft/winget-pkgs/tree/master/manifes...
> As good as the worst possible option on Linux.
I understand this is not a fair comparison, but in practice, they're not as easy. When using Windows, I usually use it with a proper GUI interface, so popping up a browser and download the newest installer for a software from their official website would take me less than 1 min.
Doing similar for my Linux VPS with only a terminal is much more complicated.
Debian/Ubuntu, Fedora/RHEL, Arch, and Alpine Linux all make this a single package-manager command (see the sketch below). Of course, if the package you need isn't available in the standard repos, then you'll need to look elsewhere (e.g. PPAs or third party repos). There's also options like Flatpak and AppImage if you want something that's a bit closer to how you'd choose to install new releases on Windows.
If I wanted to update all of the installed software I have on my Windows install, there'd basically be no way for me to do this, outside of shady update manager software.
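For reference, the per-distro commands are roughly as follows (the package name is a placeholder):

    # Debian/Ubuntu
    sudo apt install <package>
    # Fedora/RHEL
    sudo dnf install <package>
    # Arch
    sudo pacman -S <package>
    # Alpine Linux
    sudo apk add <package>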
At the same time, I get the appeal of being able to just download a new release and install it, both AppImage on Linux and the way you install software on macOS (just drag the file into Applications) seem similarly pleasant to me in that regard.
To expand on the latter (the response got deleted), you can very much do something like https://peazip.github.io/peazip-linux.html (I wish 7-Zip was available on Linux natively, but as far as GUI software goes, PeaZip is pretty nice) however that's not the most common approach. You should generally prefer using the package manager when you can.
On average, the experience of upgrading/managing packages is obviously much better than Windows.
I meant to say that in certain cases (like the `unzip` example I mentioned above), when the system's built-in package manager fails, I can't seem to find alternatives like what I'd do on Windows (just find the binary I want and install it manually). I to this day still can't find a way to update `unzip` to a version that supports AES on my Debian VPS.
Maybe because there is none? I quickly googled and found this bug: https://bugs.launchpad.net/ubuntu/+source/unzip/+bug/220654
For archives encrypted with AES-256, the p7zip-full package can be used.
This is not a Linux only issue though, the native Windows unzip tool also doesn't seem to support aes-256 (yet): https://answers.microsoft.com/en-us/windows/forum/all/how-do...
The author in this answer clearly has a version of unzip that can detect "AES_WG". Unfortunately they only vaguely said (in one of the comment) "Since then the main Linux distros have added patches to fix various issues" and didn't specify which distro.
sudo apt-get install p7zip-full
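After which extracting the AES-encrypted archive is just:

    7z x encrypted-file.zip   # prompts for the password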
I meant to say when you can't find the software you want in package manager, it's easier to download it manually and install it on Windows than (again, unfair comparison) a terminal-only Linux server.
In that case you would just copy the download link and paste it into your terminal session. It's rarely needed though as most software is available through your distribution's software repositories.
And winget.
I’ve run multiple variants of PostgreSQL on my MacBook under MacPorts. Can’t even remember the last time I thought about it, to be honest.
It's been a long time since I worked with v9.x in anger, so I could well be forgetting things though
https://www.postgresql.org/docs/current/ddl-partitioning.htm...
But it wasn't really a nuisance. Postgresql doesn't give a DBA much to worry about.
Any needed updates to databases are done by the new exe, automatically. Obviously backup everything before you do the upgrade. I've never had one fail.
There's no option where the nodes all accept writes.
You can check this article about Uber migrating its Mysql from v5 to v8 posted here 3 days ago [1]. Among other things, they observed a "~94% reduction in overall database lock time." The before/after graph is pretty impressive. It also gave them window functions and better JSON support, which are two very big features.
[1] https://www.uber.com/en-JO/blog/upgrading-ubers-mysql-fleet
The classic example is advertising a new improved fishing reel in a fishing magazine. People buy the magazine (well, 20 years ago they did) because they want to know about things like new improved fishing reels.
It's a world away from the overwhelming avalanche of bullshit that is modern advertising/spam. There's nothing at all weird about hating advertising in general but being ok with permission marketing.
If you follow this idea further you'll find that very few people, even the most vocal, genuinely hate advertising. We all want to know about useful products and services. We just don't want to see a million ads a day for Apple, Coke, Pepsi, Nike, erectile dysfunction, fake single women in your area, Nigerian princes...
Because when it reaches a certain scale, and when too many psychological tricks are being played, and everything is always, BRIGHT, BIG, hyper-sexualized, when you can't walk down any street, watch anything, read anything, without seeing people richer, smarter, younger, sexier, happier than you, it goes far beyond just advertising. It's brainwashing. It has to stop because it's extremely unhealthy for our societies, our mental health, our children.
[1] https://en.wikipedia.org/wiki/Long-term_support
[1] https://www.postgresql.org/support/versioning/
Debian has Stable. That's it.