Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I’ve found shared app, shared database completely workable by utilizing Postgres’ row level security. Each row in any table is locked by a “tenant.id” value matching a tenant_id column. At the application level, make all requests set the appropriate tenant ID at request time. You get the data “isolation” while using the simplest infrastructure setup.


There's another benefit to using a tenant_id. You can partition your tables using Postgres partitioning. That keeps the tenant's data together and keeps queries fast.

It's also amenable to distributed processing if you use something like Citus.

I'm really looking for an alternative to Citus, though. Citus itself is a bit tricky to use, and the SaaS version of it is owned by Microsoft, which means Azure-only. Also, Microsoft makes the SaaS version insanely expensive. If they had Citus-like features on Amazon Aurora I'd be there in a heartbeat.


I use Citus. Their team is great, but it is very expensive.

The move from AWS to Azure came with a reduction of IOPS per instance. We have not had the same level of performance since migrating to Azure.

I am considering running our own Citus instances.


What about postgres' schema to isolate tenants?

https://www.postgresql.org/docs/current/ddl-schemas.html


These guys wrote the most popular Ruby on Rails multitenant gem and eventually moved away from it:

https://influitive.io/our-multi-tenancy-journey-with-postgre...

Some things were mistakes but others look like pretty fundamental flaws. Performance is a problem, database changes are a problem, and you aren't able to query across tenants.


I'm using this in practice and it has served me well for years.


100% this works. Just be careful to ensure all your queries and indexes include the tenant Id.


this is essential the key (haha) to all shared/shared scenarios, regardless of what tech they are implemented with. The challenge can be migrating from single tennant to this could be fairly impactful, depending on how you built your original solution.


Not really. Your queries don't have to contain tenant Id if as part of your tenant context they have a connection string to their tenant DB.


Wouldn’t that mean you’re holding lots of non reusable connections?

I have recently been using row level security with a transaction middleware where I set the tenant ID.

Nice article regarding it - https://aws.amazon.com/blogs/database/multi-tenant-data-isol...


We have about 6 physical DB servers, each client has their own db/schema on one of those boxes. Several thousand clients each with 10s of users per client. It brings in $15m ARR.

So, it works.

We're wanting to move off that architecture to something more future proof, but it's not our biggest pain point at this point in time.


The issue I've seen with this is that to get true security you need 1 db user per tenant. That makes connection pooling difficult and presents its own security issues on limiting the tenant to only their DB user.


>That makes connection pooling difficult and presents its own security issues on limiting the tenant to only their DB user.

Can you use SET LOCAL ROLE <user> on each transaction ?


If you do it that way then you don't gain much security. Any SQL exploit would just need to add the Set Local Role to break out of the tenant row level security. Any code error would (probably) still allow unauthorized access because that error will likely also set the incorrect user.

It adds a layer of security so it might prevent some bugs leading to exploits. But in itself is not enough to rely on to separate tenants.


Well if you have SQL injection bugs then you have bigger issues to worry about - I've used this to enforce multi-tenancy on database access level (like another poster said - preventing queries accessing wrong data by accident, which is far more common I think).


A SQL injection bug is (probably) not that big of a deal as long as the tenant boundary isn't crossed. They'd be stealing their own data.


True, I'm just not sure that I'd trust the DB isolation once the user has SQL injection. I never saw a SQL injection report on a project (well since the PHP days) ORMs solved this for the most part, but I did see multiple instances of accidental data leaks from bugs on different projects.

It looks like you could also use SET SESSION AUTHORISATION for this but I haven't used it so I don't know how this works with data access/pooling


If you are running a copy of the same software for each tenant anyways it doesn't matter much as a SQL injection for one tenant is most likely available on all tenants.

I think for this use case security is focused on accidentally returning the wrong tenant's data (fully or partially)


> available on all tenants

Yes, but typically not across tenants. Maybe the flaw is only exploitable to admins of each tenant and they shouldn’t see other tenants data.

I.e. https://news.ycombinator.com/item?id=24216009


Thanks, this really helped me understand how row level security can be implemented effectively to partition tenants. It probably seems an obvious idea to many, but I appreciate it nonetheless


My strategy is using a separate, but equivalent schema for every tenant and setting the tenant context with every request before running queries.


This works and solves a lot of problems. The downside is that schema changes are cumbersome because you have to make them in many places. If you want to roll out a new feature in a shared app which depends on a schema change, it's hard to do without downtime or complicated feature flags.


It seems like some continuous deployment strategies could work here, as long as you limit the number of schema changes in flight to a handful.

Customer A is running the new schema, so gets one cluster. Customer Z will get migrated in the next hour, and then none of the old system is running.


Ideally setting the tenant context happens early during request authorization, is a required to get access to a database connection, and is configured outside the scope of any request business logic.


Yeah, I'm using Java, so in practice this means I extended my JDBC DataSource. On every call to getConnection() it sets the schema.


Same. This is the most sane way, in my experience. It’s pretty easy to move a tenant out of this model and into isolation if needed (never had to do it, but I dry-ran it). It’s harder to go the other way. Deployments, total system queries for analysis, etc are all simpler with this approach.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: