
What cloud advocates always say is that the $50k monthly will save you money by not needing to hire a team to manage it for you, and that over the course of 10+ years you will be ahead.

Is that true in anyone's experience? Every once in a while somebody posts about their competing bare-metal system, and it looks like a lot of people have managed to cut their server costs by 99% (based on the numbers they post) by avoiding the cloud as a service.

Honestly curious



There's a lot -- and an increasing amount -- of knowledge that's specific to each cloud platform, and increasingly specialized (and complex) software. In the last couple of years or so I've definitely seen companies having to hire people to manage their cloud setups.

I suspect it depends on a lot of things -- the complexity of the project, its architecture, the development, maintenance and management practices. For a few years, a colleague and I used to manage a 30+ server setup without needing more than 8-10 hours a month. But we managed to pull it off not because of where the servers were, but because we chose a good and stable tech stack, we had the knowledge and experience required to manage it efficiently, and no one decreed that we shall henceforth move everything to the cloud because it cuts costs.

Given the same situation -- right stack, right experience, useful management practices -- I'm 100% sure you can pull off the same thing in the cloud, at least as far as efficiently managing the whole setup is concerned.

But IMHO the idea that cloud services give you all of that for free is snake oil. As soon as you need more than a virtual machine running a web server or whatever, what you end up with is exactly what you build. If what you build is crap, it's gonna run like crap, and you're going to need a crapload of money to keep it running, and two craploads of money to fix it.


I have an experience related to this. We used to run a [scalable SQL engine, PaaS] for ~$1000/mo, but it wasn't cool enough, so people started migrating stuff to Spark, Databricks, S3 etc. Now the infra costs are ~$8000/mo + several people to manage it and build tooling around it (~$6000 per person).

People just want to run their selects, man.


>>I have an experience related to this. We used to run a [scalable SQL engine, PaaS] for ~$1000/mo, but it wasn't cool enough, so people started migrating stuff to Spark, Databricks, S3 etc. Now the infra costs are ~$8000/mo + several people to manage it and build tooling around it (~$6000 per person).<<

IMHO cloud services are only really logical options for new startups or ventures. Existing IT shops that are already heavily invested in infrastructure ops will NOT readily move to the cloud unless there's an obvious attempt to grab power or subvert the IT/Ops fiefdom.

>>People just want to run their selects, man.<<

I read that in the "Dude's" voice. :-)


This is so true. Also, I see tremendous value in not being reliant on any of the three major cloud providers. They all kinda seem untrustworthy in their own, unique way.

And also:

That JOIN really tied the tables together.


In the Venn diagram of "new startups or ventures", "existing IT shops that are already heavily invested in infrastructure", and "every business", the first two do not cover even close to the entire area of the third.


Hey man there's a query here!


5 years ago I built our own company CDN using bare metal clusters hosted in multiple POPs that was 1/10th the cost of using commercial CDNs. It brought us ahead by literally hundreds of thousands of dollars per year and it serves us extremely well (continues to this day). The problem is that I am the CEO of the company, which has now expanded to 45 people, and to date I've been unsuccessful at assigning a team to manage this piece of tech due to economics. At the point where I have to get engineers to work on the CDN, it becomes cheaper to move to a commercial CDN where it becomes someone else's problem. But at the moment, it ain't broke, the team loves the stability of it and it continues to save us hundreds of thousands of dollars every year.


Something to turn into a product? Economies of scale could help staff it then?


It's something to think about! :)


What product? A CDN?


woah, are you the ceo of artstation? I like that site


Yessir, reporting in. Thanks for the thumbs up.


Great, could you tell your Android contractor:

- "Magazine" in the sidebar just shows "No internet connection" (which is wrong) when clicking on it.

- Why can't the app remember my Filter settings? I'm only interested in digital 2D when on the "Browse" screen, but I have to apply the filter every single time I open the app.

- On Browse selected, click on Search, type in anything (e.g. Blender), then "View All Projects." Scroll through the first "page" (so that results show up that weren't on the screen before). Now these results begin to show up duplicated later when continuing scrolling (I assume pagination and/or the RecyclerView is implemented incorrectly).

- The app doesn't seem to cache images. Click on any project. Let the image load. Hit back. Click the same project again. The image needs to load again.

Looking at it again, some UI elements feel kind of uncanny, which makes me think that the app isn't fully native. There are some other small things that make the app look a little bit unpolished, which I think a site like ArtStation doesn't deserve.


I’ve sent this over for the team to take a look, thanks.


Is the stack too complicated? I mean, from a layman's perspective it doesn't look like fancy tech, am I wrong?


It depends on who is working on it. :) For devs who are comfortable with bare/close-to-metal clusters and networking, it's really not complex (which is why, over the years, it's just not been an issue for me to hack on it on the side). But for other devs, just the context switching from working on application domains becomes an overhead you can't ignore. Then when you factor in all the other overheads of assigning a "team" to it -- scrum events, refinement, just talking about it -- the cost ends up getting driven up.

It is a little bit more complex now - e.g. I added on-the-fly image resizing routines. But the core concepts of caching on multiple POPs, being able to purge, etc. -- no, it is not fancy tech by any stretch of the imagination.
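
To give a rough idea of the shape (a generic sketch rather than our actual config -- the origin hostname and cache sizes are made up), the core of a POP is basically a caching reverse proxy:

    # One POP: cache responses from the origin on local disk
    proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=edge_cache:100m
                     max_size=50g inactive=7d use_temp_path=off;

    server {
        listen 80;

        location / {
            proxy_pass            http://origin.example.com;
            proxy_cache           edge_cache;
            proxy_cache_valid     200 302 1h;
            proxy_cache_valid     404 1m;
            # Keep serving stale content if the origin hiccups
            proxy_cache_use_stale error timeout updating;
            add_header            X-Cache-Status $upstream_cache_status;
        }
    }

Purging is the only bit that needs extras (nginx Plus or a third-party purge module, or just versioned URLs), which is roughly where the "not fancy" part ends.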


It can be true, but not with a difference between $50k and $400; the chances of you coming out ahead of that are not very large. The chance of your company still existing in 10 years isn't great either.

As I have said here before: many people abusing AWS (etc.) tout this reason, but in many cases it's used with a 'we have infinite resources' (no constraints) mindset, so literally nothing is optimised, and the case at hand is often something that would most likely not require much work on bare metal either. So for that price difference, you're securing against the chance that you maybe have to fork out $2.5k -- ah, let's splurge, let's say $25k -- in those 10+ years for server management/problems etc. Of all the bare metal we have running 10+ years, only one thing has broken, and that is not bare metal but an old-fashioned VPS. The rest has never had any issues. That's luck, I know, but even if it had broken, the cost to fix would've been vastly less than $25k per 10 years. Let alone $50k/mo.

Sure, there are plenty of exceptions, but I dare say most companies, by far, don't need setups like that. But then again, let them set them up; usually they make things so needlessly complex, expensive and slow that I get hired to fix it somewhere in those 10 years (yes, fixing! in the cloud! so much for 'not needing to hire a team to manage it for you').


We maintain 30 bare metal servers at a colo center, and between me (primarily a developer) and the CTO we spend maybe 1 day per month "managing" them. The last time we had a hardware failure was months ago. The last hardware emergency was years ago.

Servers run on electricity, not sysadmin powered hamster wheels.


Yes, the maintenance is cheap. The changes are more costly.

We run a dozen bare metal servers and I see the difference between what it takes to spin up a new VM and what it takes to set up a new physical server. There's planning, OS installation (we use preseed images but we weren't able to automate everything), and sometimes the redundant network setup doesn't play well with what the switches expect (so you need to call the datacenter).

Still, it works out in favor of the bare metal servers. But I'm looking forward to a slightly bigger scale that would justify a MaaS tool to avoid this gruntwork.
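
To give a flavour of it (a heavily trimmed sketch, not our actual preseed -- the values are placeholders), the automatable part of an unattended Debian install is a handful of answers like these; the parts that resist automation tend to be disk layout quirks and firmware:

    # Minimal Debian preseed fragment for unattended installs
    d-i debian-installer/locale string en_US.UTF-8
    d-i keyboard-configuration/xkb-keymap select us
    d-i netcfg/choose_interface select auto

    # Wipe the first disk and use LVM with a single big root
    d-i partman-auto/disk string /dev/sda
    d-i partman-auto/method string lvm
    d-i partman/confirm boolean true
    d-i partman/confirm_nooverwrite boolean true

    # Only a base system plus SSH; config management does the rest
    tasksel tasksel/first multiselect standard, ssh-server
    d-i pkgsel/include string openssh-server python3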


I completely agree that some types of changes are much more expensive with a bare metal architecture than with cloud.

6 years ago, I worked for a company in the mobile space. This was around the time of the Candy Crush boom, and our traffic and processing/storage needs doubled roughly every six months. Our primary data center was rented space co-located near our urban office. For a while, our sysadmins could simply drive over and rack more servers. We reached a point where our cages were full, and the data center was not willing to rent us adjacent space. We were now looking at a very large project to stand up additional capacity elsewhere to augment what we had (with pretty serious implications for the architecture of the whole system beyond the hardware) or move the whole operation to a larger space.

This problem ended up hamstringing the business for many months, as many of our decisions were affected by concern about hitting the scale ceiling. We also devoted significant engineering/sysadmin resources to dealing with this problem instead of building new features to grow the business. If the company had chosen a cloud provider or even VPS, it would have been less critical to try to guess how much capacity we'd need a few years down the road to avoid the physical ceiling we dealt with.


Yes, the cloud premium is also a kind of insurance - you know you'll probably be able to double your capacity anytime you need it.


OpenStack Ironic and Bifrost are pretty useful OSS tools for managing bare metal servers.


Yeah, "CAPEX always increases, OPEX always decreases".


[I'm speaking only from the US perspective]

The only difference is whether you own the gear or not. If you do own it, then it is CAPEX, and the gear goes on the balance sheet and you can only depreciate it according to the schedules (in some cases immediately at 100%, but most companies blow through that number really quickly).

In all other cases it is OPEX.

The rule of thumb is that all OPEX can be used to offset the revenue, which is a godsend to most companies that aren't printing gobs of money.

So if you make some money and you are past the 100% deduction thresholds, then owning gear beefs up the balance sheet and at best slightly decreases taxes, while spending money on OPEX significantly decreases taxes.
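
A simplified illustration with made-up numbers (ignoring the 100% bonus depreciation discussed elsewhere in the thread):

    $120k of servers, 5-year straight-line depreciation -> ~$24k/year deduction
    $120k/year of cloud spend (OPEX)                     -> $120k deducted against that year's revenue

Same cash out the door, very different effect on this year's taxable income.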


I didn't parse that in terms of bookkeeping. I took it to mean that ongoing operational expenses -- salary for the ops team, dev team, expertise, etc. -- may be difficult to estimate, but once you figure out how to do something, it can be automated.


Under the 2017 Tax Cuts and Jobs Act, you can take a 100% deduction on new and old equipment in the first year. It expires in 2022, but that's because of the shenanigans to avoid formally violating deficit rules, which use a 10-year window for assessing budget impact. I think there's a good chance that it'll be extended as you can game the budget analysis that way indefinitely.


Does that time include security updates for the OS and installed services? I would assume so, but 8 (16?) hours a month seems lower than I'd expect given the frequency of vulnerability discovery and security patches.


Yes.


Thanks!


Everything in one colo facility with no geographic redundancy?


If you look beyond the marketing, a single top-tier DC has better uptime than an AWS AZ. You are way more likely to be bitten by AWS control-plane issues than by a DC in a good location dropping.


For most use cases there is just no need.


Some colo companies, even the small ones, offer multiple datacenters. You then have to either use public IPs for the inter-service traffic, maintain a VPN or contract it from them.


Your suggested solutions aren't reliable at all. AWS/GCP/Azure backbones work absolutely differently.


What about software updates and reboots?


There is the same issue with a cloud provider unless you run immutable deployments -- and then you need to invest in the tooling to produce the immutable images.
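
Something like Packer is one common shape for that tooling (a rough sketch -- the region, base-image filter and packages here are placeholders, not anything specific from this thread): bake the updates into a fresh image and replace instances instead of patching them in place.

    source "amazon-ebs" "web" {
      region        = "us-east-1"
      instance_type = "t3.micro"
      ssh_username  = "ubuntu"
      ami_name      = "web-${formatdate("YYYYMMDDhhmm", timestamp())}"

      source_ami_filter {
        filters = {
          name                = "ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-*"
          virtualization-type = "hvm"
          root-device-type    = "ebs"
        }
        owners      = ["099720109477"] # Canonical
        most_recent = true
      }
    }

    build {
      sources = ["source.amazon-ebs.web"]

      # Bake OS updates and the app into the image itself
      provisioner "shell" {
        inline = [
          "sudo apt-get update",
          "sudo apt-get -y upgrade",
          "sudo apt-get -y install nginx",
        ]
      }
    }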


We didn't find that managed Elasticsearch saved us any money over running our own ES cluster, but the best part of the cloud for my company is the scalability -- our peak monthly load is 10X more than our light weekend load. So to accommodate those peaks, instead of buying and maintaining 1,000 servers to handle our light base load, we'd need to own 10,000 servers that would be less than 30% utilized on average. And for redundancy, we'd need to spread those servers across multiple data centers, as well as manage a 2 petabyte storage system (also replicated across data centers) to replace our S3 usage.

And we'd need to have staff to manage the physical servers, run our network infrastructure, etc.

We've been through the numbers many times and the cloud always wins, even when compared to running our own servers for base load with some hybrid cloud that lets us expand into AWS for peak loads (which saves some dollars on servers, but doesn't make up for the added complexity).

And nearly for "free", we have a cold standby site that has a replica of our hot data (a small portion of our full dataset), so if there were a full region outage, we could be back up and running in 30 minutes with reduced functionality.


As someone who doesn't know a lot about this stuff, is there a way to have the base load internal and use a cloud provider for elastic load?


Yes, with one caveat: latency. For example, it's hard to split a typical webapp so the application servers are far from the database. The latency kills performance.

On the other hand, for stuff like CI jobs, batch processing or anything else which doesn't depend on tight synchronization with the other location, you can mix and match.


I've found that most large companies who use cloud at scale have a team (or teams) managing the cloud, providing tooling around management and deployment, or even full-on PaaS solutions on top of the providers. So you might not have your traditional sysadmins, but you have a team of what recruiters like to call DevOps engineers instead.

The main advantage of cloud is the flexibility: treating resources as ephemeral, being able to click a button and get more/fewer resources. You don't need to wait for a server to ship and be installed in a DC; if you don't know what specs you need, just twiddle the dials until you get something that performs as you want. It allows you to deploy/release more quickly and easily.

It's possible to architect so that the cloud is cheaper, but it almost never happens. Optimising is labour-intensive, and it's often called over-engineering by PMs/product owners. Costs normally start out low, then blow up as products roll out feature after feature and teams start onboarding; if performance is an issue, it's easier to increase an instance size than to profile and change the architecture. Only when the person paying the bill says this is too much do architecture and other optimisations happen, sometimes at a point when it's hard to retrofit.


DevOps tools are force multipliers though. Cloud has its own set of problems to deal with, but manualOps BaU work requires so much more labor to accomplish so much less.


> not needing to hire a team to manage it

You absolutely do need an experienced team to operate any significant cloud deployment, or misunderstanding the cost model will kill you.

Good cloud people cost more than good conventional sysadmins.

Unless cloud is simply an excuse to kill off a really, really complacent in-house entrenched IT dept, it will not save anyone anything cash-wise.

These are the facts.


I work for a massive corporation that's been on the cloud journey for about 10 years now with no signs of slowing. I think what my company loves most about cloud is that you can throw money at the bad-management and volatility problems.

Even if a department spins up 500 servers for some executive's sacred cow, when the project goes bust it's trivial to tear down the whole thing without requiring semi trucks.

The other very important thing is inventory auditing. Managing inventory in 10 datacenters built and populated by 100 teams over 30 years is a nightmare. Cloud provides a mechanism to build the coveted "single pane of glass" which large companies need desperately.


The answer to bad management is the cloud. Well put.


The answer is: it depends

Clearly in the example above, you can afford to hire two sysadmins to manage the bare metal servers (you can probably afford more, but 2 is the minimum that gives some peace of mind).

We had an application that was a glorified quiz engine, and our customers would mass-enrol and mass-take the quizzes, so the scaling offered by Azure made it a no-brainer for us, especially since, given the nature of our customers, quizzes would only take place during working hours; we'd scale right down for half the day and scale up and out at peak times.

Total costs were about 1/2 of what we estimated for bare metal


2 sysadmins for 2 servers? What are they going to do all day? Managing those two servers should probably be 5-10% of their job -- 2-4h per week for each of them seems plenty -- while doing 90-95% other stuff. Like a sibling comment said, "Servers run on electricity, not sysadmin powered hamster wheels." I'm our de facto sysadmin here (university lab) and I spend maybe an hour a week taking care of our ~dozen bare-metal servers.


> What are they going to do all day?

Be on-call. You need the sysadmins to be available to fix issues when they happen, even if the fixing itself takes little time.

> ... university lab ...

I developed a few web sites for universities, and they were hosted by the university. You really have to hope nothing breaks on Friday afternoon, or you have to wait until Monday morning to get it fixed...


Facebook with 400m users had just one DBA if I remember correctly. I wonder if people ever run the numbers.


I was on the DBA team there around that MAU. I had lots of company and support from SRO (aka jr DBAs) as well as other teams (provisioning, etc.).


> Facebook with 400m users had just one DBA if I remember correctly.

You don’t remember correctly. There are multiple and to my knowledge there hasn’t been just one in at least the past ten years.


Millions of Facebook accounts are a pretty harmless size metric. DBA workload increases with the number of employees, which is proportional to support requests, accidents, changes, strange needs that need to be addressed, and so on.

Buying and provisioning disks before space runs out is a small part of a DBA's job, and a modest constant-size task compared to predicting how fast space is running out.


Have you got a citation for that claim? That figure seems a little hard to believe.

In terms of your general point: it really depends on your business. In my last job there were 4 DBAs out of 12 total IT staff and they were constantly busy. In my current job there are no DBAs in a much larger team and yet no requirement to need one either. The two businesses produce vastly different products.


Yes, if you can benefit from frequent scaling up and down an order of magnitude, this is where the cloud really shines.

For more continuous workloads, you can overprovision on bare metal more cheaply to deal with the spikes.


Would that not just be part of your existing sysadmin team's job?

Back when I worked at BT, the Unix developers all went on the basic sysadmin course from Sun as part of their induction.


That might have worked for the specific hardware BT needed those developers to manage, but it's not good advice in the general sense. Systems administration is as much a detailed speciality as being a software developer. There are so many edge cases to learn -- particularly when it comes to hardening public-facing infrastructure -- that you really should be hiring an experienced sysadmin if your company is handling any form of user data.

As an aside, this is one of the other reasons company directors like the cloud -- or serverless specifically: it absolves responsibility for hardening host infrastructure. Except it doesn't, because you then need to manage AWS IAM policies, CloudWatch logs and security groups instead of UNIX user groups, syslog and iptables (to name a few arbitrary examples). But that reality is often not given as part of the cloud migration sales pitch.
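
To make that concrete (a toy example, the bucket name is made up): the replacement for UNIX groups and file permissions is still a policy document you have to get right.

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "ReadOnlyUploads",
          "Effect": "Allow",
          "Action": ["s3:GetObject", "s3:ListBucket"],
          "Resource": [
            "arn:aws:s3:::example-uploads",
            "arn:aws:s3:::example-uploads/*"
          ]
        }
      ]
    }

Forget the /* on the object resource, or mix up which actions apply to the bucket versus its objects, and you get the same class of mistakes as a botched umask -- just with different spelling.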


True, but in BT it was SD's (Security Division) job to set standards, and our team's sysadmin handled that.

SD was the employer of Bruce Schneier for a few years, BTW.


Kahoot?


I am very happy to pay for Heroku, exactly because I don't want to be the poor guy getting up in the middle of the night to restart Apache.

But for a system that is invisible to customers, like our logging system here, I don't care that much about timely maintenance.


> I am very happy to pay for Heroku, exactly because I don't want to be the poor guy getting up in the middle of the night to restart Apache.

As an employee I am also happy if my company pays more, so I don't have to be woken up in the middle of the night.

As a small business owner with limited resources and liquidity, waking up once a month for $50K in savings looks like a good deal.


I have a very nice quote from a discussion I remember:

"If you need to get up at 3AM to keep services running, you're doing something wrong."

You can make sure that most services in the *NIX world take care of themselves while you're away, without using any fancy SaaS or PaaS offering.
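
For example (a minimal sketch -- the unit name and paths are made up), plain systemd will already restart a crashed daemon without anyone waking up:

    [Unit]
    Description=Example app server
    After=network-online.target
    Wants=network-online.target

    [Service]
    ExecStart=/usr/local/bin/exampled --config /etc/exampled.conf
    Restart=on-failure
    RestartSec=5s

    [Install]
    WantedBy=multi-user.target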

Heck, you can even do failovers with heartbeatd. Even over a serial cable if you feel fancy.

Bonus: This is the first thing I try to teach anyone in system administration and software engineering. "Do your work with good quality, so you don't have to wake up at 3AM".


I'd pick a call in the morning any time, given that the cause of the call occurs rarely and the alternative is to spend a lot of time automating things with the possibility of blowing things up in a much bigger way. If a situation like [0] had happened to me at night, I'd happily take time off my sleep and do a manual standby server promotion (or no promotion at all) rather than spend days recovering from diverged servers that the Raft kool-aid was supposed to save me from.

[0] https://github.blog/2018-10-30-oct21-post-incident-analysis/


I'm not against your point of view, to be honest. It's perfectly rational and pragmatic to act this way.

I'm also not advocating that "complete, complex automation" is the definitive answer to this problem. On the contrary, I advocate "incremental automation", which solves a single problem in a single step. If well documented, it works much better and more reliably in the long run and can be maintained with ease.

Quoting John Gall:

> A complex system that works is invariably found to have evolved from a simple system that worked. A complex system designed from scratch never works and cannot be patched up to make it work.


I'm a good engineer, but I'm not qualified as a system administrator. I know that I will do something wrong, and I don't have the time to learn everything.

So I'd rather pay amazon.


So instead of learning systems you have to learn the AWS spaghetti of microservices.


You can try and build and test redundancy and contingency management, and you can lower the frequency of surprises through good choices.

But you're still going to get woken up at 3am sometimes. Things break, in unexpected ways. Maybe the hot spare didn't actually work when a raid set started rebuilding onto it. Maybe third party software did something unexpected. Or maybe something broke and your failover didn't actually work because of subtle configuration drift since the last test.


We have a standard routine called the restart test. We reboot the machine in a normal way to see how it behaves, but in the middle of a workload. Also, if the system is critical, sometimes we just yank the power cables to see what happens.

Normally all plausible scenarios are tested and systems are well tortured before being put into production.

It also helps that the majority of our servers are cattle, not pets, so a missing cattle can wait until morning. Also, all "pet" servers have active and tested failover, so they can wait until morning too.

We once had a problem with a power outage when our generators failed to kick in, so we lost the whole datacenter. Even in that case we can return to all-operational in 2 hrs or less.

I forgot to add: we use installations from well-tested templates, so installations have no wiggle room configuration-wise. If something is working, we can replicate it pretty reliably.


Sure, this is typical of well-run environments.

But you probably don't yank power on critical things mid-load after making a trivial change. Excessive testing breeds its own risks.

But it's really, really easy to gank up a trivial change now and then.

In the past 10 years, I've been woken up three times. One was from third-party software having a certificate that we didn't know about expiring; one was from a very important RAID set degrading and failing to auto-rebuild to the hotspare (it was RAID-10, so didn't want to leave it with a single copy of one stripe any longer than necessary); and one was from a bad "trivial change" that actually wasn't. I don't see how you can get to a rate much lower than this if you are running critical, 24x7 infrastructure.


Doesn't look like you have much experience with what you're talking about - there is no such thing as heartbeatd; it's called keepalived (or Pacemaker if you prefer unnecessarily complex solutions). No ops person would even misspell that.


Sorry, you're right. I confused it with its cousin, which is indeed called heartbeatd [0].

I'm new at Linux and system administration. I've been using Linux for just 20 years and managing systems for 13.

[0]: https://www.manpagez.com/man/8/heartbeatd/


Having just set up HA Postgres with Patroni - I disagree. Honestly, I think we should've just stuck with a single Postgres server.

Sure, you can have an orchestration tool to "make sure everything is running, and respond to failures", but that's yet another tool that can break, be misconfigured, etc.


> If you need to get up at 3AM to keep services running

I just turn off the phone until I get up. That way I don't have to get up at 3 AM, I don't even know they were down until five hours later. :)


Same here.

As a small business owner I run my own apps on Digital Ocean, which to me offers very nice balance between features, reliability and price.


If your application needs to be restarted in the middle of the night, how does Heroku help?


Looking at what the market rate for an AWS architect (I wear this hat as well) or DevOps engineer is, it doesn't work out cost-wise on that front either.

The fact is that cloud is expensive, and outside a few use cases (such as extreme horizontal scalability, aka elasticity, or machine learning) the costs just don't work out well.


I think cloud's way harder than managing actual hardware (and/or your own VMs and such on actual hardware, or even traditional VMs on someone else's hardware) since once you're beyond the trivial it quickly becomes a superset of traditional admin knowledge, not a replacement—you end up having to know how to do things the old way to understand WTF the cloud is doing and troubleshoot issues, or to integrate or transition some older system, or whatever.

And then of course every "cloud" is full of about a billion hidden gotchas ("well the marketing page says that'll work, but on Tuesdays in February it won't, so use this instead, but only if you're writing your logic in JavaScript because the tools for the other allegedly-supported SDKs are broken in weird ways half the time, so instead use this other thing, unless you're a Libra, then...") none of which knowledge transfers between "clouds", and each has a pile of dumb names to memorize and a ton of other per se useless, but necessary, knowledge you've gotta pick up, just to rub salt in the wound.


I have a friend with a rack of co-located servers I manage. I drive from San Diego to LA where they are located maybe once every 6 months or more. I don't believe I have been to the rack for about a year. The rack with 20mbs of bandwidth costs around $2,500 a month.


Did you really mean twenty megabits?


If it's unmetered 20 Mbps, I'd happily take that over ingress/egress costs at a cloud provider such as AWS.
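
Back-of-the-envelope, assuming you could actually saturate the link and taking AWS's list price of roughly $0.09/GB for internet egress:

    20 Mbps = 2.5 MB/s
    2.5 MB/s x ~2.6M seconds/month ~= 6.5 TB/month
    ~6,500 GB x $0.09/GB ~= $585/month for egress alone

So the bandwidth by itself, if saturated, would run a few hundred dollars a month at AWS before you pay for a single instance.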


Yeah AWS bandwidth egress is extortionate. Digital Ocean is something like 5-7x less expensive. It's ridiculous.


That's kind of shocking when compared to European prices. Over a full order of magnitude of difference, comparing to the list price of the first Google result.


Yes, not sure of the abbreviation apparently! But 20 megabits.

I like to watch Neil Patel's SEO videos on Facebook occasionally, and he mentioned that for one service he runs he spends over $100k a month on hosting. It blows my mind, because he could buy a top-end server with dozens of processors and terabytes of RAM, co-locate it, and at least host some things on it, or even turn it into his own cloud hosting server.

Even if a person bought 2 over-the-top servers for $30k each and paid around $2,500 a month for hosting, it would save huge money.


20 motherboards?


Not in mine. All that stuff you build out of cloud provider tools still needs care and feeding.


Honestly, it depends on the situation. Skilled engineers who can maintain systems reliably are hard to find, in general. If it's not a core system and you have the capital, it makes a lot more sense to pay a cloud hosting provider and focus on your product rather than attempt to build something simply to save on costs.


I too never got this. I learned on bare metal before AWS was around. I took a look at AWS in 2008 and decided that it was interesting, but nothing I had a use for at the time. Since then, I never felt any need for it, except for a few times when I wanted to emulate a distributed network for testing purposes.

If I look into the AWS dashboard today, I feel totally overwhelmed, while running my own servers with LXC containers on it feels effortless. I guess for many it's the other way round.

Don't really know what I'm missing, but I'm happy about the old-school skillset I have; it allows me to have a fraction of the infrastructure costs I would otherwise have for what I'm running.


My company builds out enterprise and B2B apps. For most of our clients, infrastructure just isn't a big enough expense to really think about or optimize for. If you're selling your web app at $1,000-$10,000 a license per year, it doesn't really matter what you spend on servers. But it still matters how much time/money you spend managing your servers. So here it makes a lot of sense to go cloud, because it's a lot faster to go from having no infrastructure to running a web server and database with automatic backups.


$50k monthly would pay for a team of the world's best system administrators.


No.


You know sysadmins that make > $100K / year?

If we assume $20K fringe benefits, that $50K / month is $600K annually, which gets you 5 $100K/year sysadmins.


Yes, I know plenty of Devopsish people that make more than that. And I'm not even talking about SRE-minded folk there.


You live in a bubble. Outside of the rarefied air of SFBA, wages are much lower. I do know super sharp people who are paid far less.


And how much is your non-bubbled "far less"? I'm not even close to the Bay Area, btw.


Under 150k in a major urban area.


You also have to consider the features you either have to do without, or create yourself. Dashboards, alerts, saved searches, web GUI, etc... You might legitimately not need any of that, but simply implementing search and storage isn’t replacing the whole product on its own.


I work at a big enterprise SaaS company that's moving a couple billion dollars' worth of hardware into cloud services, and the price difference is wild. Original unoptimized costs were 8x, dropping to 6x in our first round of optimizations. Even the rosiest middle-management projections put eventual costs at 2x owning the hardware.

Add to that the fact that we threw out plans to do things like virtualization on our owned hardware, and that engineering headcount has consistently grown faster than revenue, and it's not clear there are any savings to be had there at our scale.


From my experience (consulting for a decent-size company that has lots of its infrastructure on Azure), it looks to me like the amount of management involved is no less than if they had it on-premises or on co-located bare metal.


Seems like the cloud services are complex enough that you're paying someone to manage them anyway.


I work at a small company; we have 5 programmers and 3 IT/server/network people.

I'm one of the programmers, and for the past few months I've filled the role of devops/deployment engineer for our new website.

One great example is Sentry. I love Sentry, and it's been invaluable for us; it saves the web developers a crapload of time. Now, Sentry has a self-hosted option, and that's what we're currently using.

I know little about Sentry's internals, and frankly I don't really care. But sometimes it breaks, or we want it to be updated, etc. Sentry offers hosting at $25/mo, which would mean we don't have to worry about it at all: stability, upgrades and scale are all handled by them.

$25 a month is less than 1 hour of my time. All it has to do is save me 1 hour a month to easily pay for itself.

---

Another example is that I just spent a large amount of time trying to set up an HA Postgres cluster. This meant I had to dive into the internals of Postgres, how our orchestrator (Patroni) works, setting up Consul to manage the state, etc. This has taken a significant amount of time (several weeks); it's easy to get a POC working, but actually ironing out the bugs is a different story.
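
For anyone wondering where the time goes, the Patroni side alone looks roughly like this per node (a trimmed sketch -- addresses, paths and credentials are placeholders), and that's before you sort out client routing, monitoring and backups:

    scope: main-cluster
    name: pg-node-1

    restapi:
      listen: 0.0.0.0:8008
      connect_address: 10.0.0.11:8008

    consul:
      host: 127.0.0.1:8500

    bootstrap:
      dcs:
        ttl: 30
        loop_wait: 10
        retry_timeout: 10
        maximum_lag_on_failover: 1048576

    postgresql:
      listen: 0.0.0.0:5432
      connect_address: 10.0.0.11:5432
      data_dir: /var/lib/postgresql/12/main
      authentication:
        superuser:
          username: postgres
          password: change-me
        replication:
          username: replicator
          password: change-me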

Also, nobody else on the team fully understands its setup, so if it breaks, welp....

All this to say: for us, a hosted DB option would likely have been cheaper (compared to my time and pay) and would have better uptime and support than us rolling our own solution.

----

We don't have any central log store, and I really wish we did. Similar situation here: I could spend a month or two configuring and tuning Elasticsearch, or we could just pay for a hosted option.

---

Tl;dr - I'm a developer at a small company and have spent little time doing application development the past few months because of all the time that has been required to set up the infrastructure for our application.


So why don't you move to cloud/managed services? I can tell you from my own experience that having an HA solution you don't really understand is a really bad idea. Just go with master/replica for Pg and a well-documented troubleshooting and manual failover procedure. It'll likely be more reliable than some HA black magic. And it will likely take less than several weeks to get up and running.


I think the real answer is that you don't have to trust an employee's expertise.

I can give you a real life example.

Years ago I did work for this company that built flight simulators for the government. Millions of dollars rolling through their company. One day I get called up by this company and the woman is panicking because their website went down.

Well, come to find out, the entire site was running on a server sitting in an office of their building and the electricity went out due to a winter storm. To say that I was floored is an understatement. I started asking questions.

Well, when I did, the "sys admin" (and I'm putting that in quotes...) started talking to the owner of this company and it was ultimately decided that I couldn't be trusted.

Fast forward a few years, and around August of last year this company contacts me again for more work. They apparently scored another huge government contract, built a new facility to be able to house actual government planes so they can strip them down and turn them into flight simulators and so forth.

While I'm at the facility I learn that they're running all their software off of a server sitting in the building. Now, this isn't completely unreasonable, especially for government work. But I again started asking questions.

- Is it environmentally controlled?

- Do you have a generator?

- Do you have multiple lines in case your ISP goes down (which 100% will happen at some point)?

- Do you have backups?

- ARE YOU EXERCISING THOSE BACKUPS AT LEAST ONCE/QUARTER?

When this got back to the "sys admin", he was livid. I also found out that they didn't have the source code for the latest changes I made, despite me pushing said source code to a private git server this "sys admin" had stood up. Said virtual server apparently got removed when they moved facilities, but despite this the guy accused me of pushing it to my private github repo based purely on the fact that I stated in no uncertain terms that I had pushed that to the git repo.

But this software was a part of the government contract.

I just sent out an email the next day thanking them for the opportunity, but that I would have to pass on it.

My point is this.

They're an engineering company, that's where their expertise lies. But due to the nature of what they're doing, they were forced into the software side of things. They hired an incompetent.

Companies like these are probably better off throwing money at the problem and putting it in the cloud. The skill level required to successfully run something in the cloud and not completely lose everything is much lower. That's not to say there isn't skill involved, but you don't have to hire someone who may or may not decide to run your software that's involved in a multi-million dollar contract in a closet in your building.


The cloud won't fix "hired an incompetent".


what I specifically said (with emphasis)

> Companies like these are probably better off throwing money at the problem and putting it in the cloud. __THE SKILL LEVEL REQUIRED__ to successfully run something in the cloud __AND NOT COMPLETELY LOSE EVERYTHING__ is much lower. That's not to say there isn't skill involved, but you don't have to hire someone who may or may not decide to run your software that's involved in a multi-million dollar contract in a closet in your building.


> What cloud advocates always say is that the $50k monthly will save you money by not needing to hire a team to manage it for you, and that over the course of 10+ years you will be ahead. Is that true in anyone's experience? Every once in a while somebody posts about their competing bare-metal system, and it looks like a lot of people have managed to cut their server costs by 99% (based on the numbers they post) by avoiding the cloud as a service.

Too many weasel words.



