Hacker News
Things I learned after getting users (basementcommunity.bearblog.dev)
249 points by HermanMartinus on March 13, 2023 | hide | past | favorite | 149 comments


"this is mostly because i relied on a SQL ORM which in short is a tool that makes writing SQL easier to pick up and faster to develop. the biggest downside is that it might execute 50 queries to your database to get a list of information, when it probably only needs 1, which will cause slowdown."

I appreciate this honesty. Listen to this old man's advice: learn SQL properly. It's not that hard. Focus on it for a few weeks intensely and you've mastered it for life. Then just write SQL directly.

I've had weekends ruined troubleshooting my "highly productive ORM layer" after it nuked a production database, even though, functionally speaking, my ORM code was in no way incorrect. I'm talking differences of a thousandfold in query load depending on how one expresses the ORM calls.

You can then become proficient in reasoning about and predicting what your ORM calls do in the actual database, but when you're several joins deep, this becomes near impossible. At that point you become the ORM, and might as well just write SQL.
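The thousandfold difference above usually comes down to the N+1 pattern. A minimal sketch using Python's stdlib sqlite3 as a stand-in database (schema and data invented for illustration), counting the statements each approach actually issues:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE post (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO author VALUES (1, 'ann'), (2, 'bob'), (3, 'cam');
    INSERT INTO post VALUES (1, 1, 'a'), (2, 2, 'b'), (3, 3, 'c');
""")

# Record every SQL statement the connection executes.
queries = []
conn.set_trace_callback(queries.append)

# N+1 pattern: one query for the parent rows, then one per parent row.
authors = conn.execute("SELECT id, name FROM author").fetchall()
for author_id, _name in authors:
    conn.execute("SELECT title FROM post WHERE author_id = ?",
                 (author_id,)).fetchall()
n_plus_one = len(queries)   # 1 + 3 = 4 statements

queries.clear()
# Single-query equivalent: one JOIN.
conn.execute("""
    SELECT author.name, post.title
    FROM author JOIN post ON post.author_id = author.id
""").fetchall()
joined = len(queries)       # 1 statement
```

With three parent rows the loop issues four statements while the JOIN issues one; with thousands of rows that gap is exactly the kind of difference in query load described above.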


Honestly, I jumped on the ORM hate train when I first started and wrote my own queries in an effort to be "performant", but my frankenstein of direct queries and reusable abstractions became basically a super shitty ORM that was a nightmare to maintain in a large codebase lol. I don't think there's anything wrong with an ORM, and for 90% of CRUD actions it's what you should choose. By the time you hit performance issues you've probably thought of something else anyway.


I am the same; the advice to "learn SQL" is still very important. Most ORMs have the escape hatch of just mapping the results of a straight SQL query to an expected structure (thinking of typed languages like C#/Java here). 95% of the time I can just use the ORM to make simple queries in an easy-to-maintain way for your application. For the 5% of queries that are either complex or can't be easily represented via your ORM of choice, declare the expected response type and use that escape hatch to write your performant SQL.
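A sketch of that escape hatch in Python with stdlib sqlite3 (the dataclass, table names, and data are invented for illustration; typed languages like C#/Java do the same with a declared record type): hand-written SQL for the hard query, mapped straight into a declared type.

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class OrderSummary:
    customer: str
    total: float

def fetch_order_summaries(conn: sqlite3.Connection) -> list[OrderSummary]:
    # Hand-written SQL for the query the ORM can't express well;
    # results are mapped straight into the declared type.
    rows = conn.execute("""
        SELECT c.name AS customer, SUM(o.amount) AS total
        FROM customers c JOIN orders o ON o.customer_id = c.id
        GROUP BY c.name
        ORDER BY total DESC
    """)
    return [OrderSummary(*row) for row in rows]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 5.0), (3, 2, 7.5);
""")
summaries = fetch_order_summaries(conn)
```

The rest of the application keeps using the ORM; only this one query drops down to SQL, and the declared type keeps the boundary explicit.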

I've been on projects where the team has re-invented an ORM organically, and I've never seen it go well. Likewise with projects that are ORM 'purists', bending over backwards to use the ORM for a query that can't easily be represented by whatever query syntax it has.

This is why I still think lightweight ORMs are the best of both worlds: they usually provide a pretty good experience for common/easy queries, and when things get tricky, it's best to just use SQL and map the results to an app-language data structure. I mostly use OrmLite [0] in dotnet and have found it strikes a balance that has served me well for years (note: I now work at ServiceStack, who built OrmLite).

[0] https://docs.servicestack.net/ormlite/ormlite-apis#query-exa...


"95% of the time I can just use the ORM to make simple queries in an easy-to-maintain way for your application. For the 5% of queries that are either complex or can't be easily represented via your ORM of choice, declare the expected response type and use that escape hatch to write your performant SQL."

site creator here - yeah this is the approach i'm taking now. the ORM is useful for sure and there's still a benefit to using it, but anything that needs to read from a few different tables, i'm definitely going with raw SQL


I felt the same way until I found sqlx (https://github.com/launchbadge/sqlx). The package lets you write sql in your app with compile-time checks. Was so much easier to get up and running than learning a new ORM.


Need this for Go. Been considering writing a static checker for ages.


Yeah, I am old enough to have started back when ORMs did not exist yet. A world with an actual tested ORM is way more maintainable, way easier to deal with. You can write your own queries when you really need to... but you should do it only when you really, really need to.

Writing queries is easy part. Maintaining that thing is the hard part.


I'm wired differently.

Even if I'd run a database operation just on a local machine and it takes 100x more load than an optimized SQL query, I still care. It's craftsmanship.

Sounds like you may have a case of the abstraction disease. You keep reaching for abstractions that never accomplish the actual benefits of a truly solid abstraction.


> I'm talking differences of a thousand fold in query load depending on how one expresses the ORM calls

That doesn’t sound like an ORM problem… more like an N+1 problem. Lazy loading makes N+1 easy to hit with ORMs, but it’s easy to avoid when you know what to look for.

ORMs are designed to reduce querying, not increase it a thousand-fold :)


N+1 is a common problem with many ORMs, and a classic example of the object-relation impedance mismatch.

Even the notion of eager vs implied-lazy loading suggests N+1, just spread out over time. Granted that might be optimal for a whole lot of use cases! But it’s definitely not optimal for a use case where you need a join upfront and your ORM does it in memory.

Also granted many ORMs can handle this in a lot of general cases if you know how to use them, and know their limitations. But they’re inherently highly dynamic, and inevitably deoptimize for some cases where just querying the database directly will be much more effective. That’s not even an admonishment to “learn SQL” as is the common retort, it’s just the generic “abstractions are leaky and sometimes it’s better to bail out a layer or more”.


Upfront: I really dislike ORMs. They're an unnecessary abstraction and a performance headache as an extra layer of middleware to worry about keeping up to date. I'd never use one in production.

But we do always need a clean way to handle CRUD between apps and DBs, one that ideally doesn't require custom calls for each and every data view. Here's the thing: probably 90% of CRUD can be handled with generic updates and inserts. Raw JSON reads can be passed back to the client; let the client know how to cast or structure those into complex data types. The rest, which the server has to manipulate, can be cast/structured using the same classes the client uses, if in node. The actual meat of really complex reads or really optimized writes should always be exceptional and done by hand.
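The "generic updates and inserts" idea above can be sketched in a few lines of Python with stdlib sqlite3 (function, table, and column names invented for illustration). The important caveat is in the comment: identifiers must come from a trusted allow-list, since only values can be parameterized.

```python
import sqlite3

def generic_insert(conn, table, data: dict):
    # NOTE: `table` and the column names must come from a trusted
    # allow-list, never from user input; only values are parameterized.
    cols = ", ".join(data)
    marks = ", ".join("?" for _ in data)
    conn.execute(f"INSERT INTO {table} ({cols}) VALUES ({marks})",
                 tuple(data.values()))

def generic_update(conn, table, key, data: dict):
    sets = ", ".join(f"{col} = ?" for col in data)
    conn.execute(f"UPDATE {table} SET {sets} WHERE id = ?",
                 (*data.values(), key))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE widgets (id INTEGER PRIMARY KEY, name TEXT, qty INTEGER)")
generic_insert(conn, "widgets", {"id": 1, "name": "gear", "qty": 3})
generic_update(conn, "widgets", 1, {"qty": 5})
row = conn.execute("SELECT name, qty FROM widgets WHERE id = 1").fetchone()
```

Two small helpers like these cover most routine CRUD without pulling in a full ORM, leaving the hand-written SQL for the genuinely hard reads and writes.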


Right. In fact any ORM worth its salt will have methods specifically designed to let you address the N+1 problem easily.

E.G: django "select_related"

Not to mention they will have ways to use raw SQL through the lib infrastructure, which is still better than querying the database manually and gives you full control of the query.

Also, WordPress shows that you can write manual SQL and still have terrible performance because your application is badly structured, which an ORM helps with.

In the end, I don't think there are fewer traps with raw SQL; the traps are just different. It's not a matter of "better".


It's hard to explain to developers why an ORM isn't really that great a solution, but at a high level:

* your database is the core of your app. Abstraction implies that you don't care about the details of your data store, which eventually will lead to two major problems:

1. When things are slow you won't be able to debug it, because you've abstracted away your database and don't understand what queries your ORM is running to do things, and

2. when your ORM migration fails you won't know what to do, because you don't understand your database. In general this happens with every ORM product, ever: you didn't add a constraint initially because you didn't know what it meant, then you added it later, and the migration failed because the constraint was being violated. Or the migration fails because the tool uses some attributes and not others for drift detection, so it tries to delete your table/database because it's detected drift.

* the way you deal with data in code and in a database are different. Code iterates over objects. If you just use your ORM naively you'll end up doing some ridiculous number of selects because in code it looks like you're just iterating over objects...when in reality the ORM is doing a select/join for each one of those objects. And if you start getting more complicated there are structures that are just harder to do with ORMs...because you have to map what you're trying to do in SQL to the way your ORM works. At that point why not ditch your ORM?

SQL isn't really that hard... but thinking about how to do queries is harder than you'd expect. It's almost an order of magnitude faster to do joins in the database than to join stuff in code... but that also assumes your schema isn't screwed up, that it has indexes, etc.

Lastly, a lot of ORMs don't put indexes on columns for some reason, which kills performance. You'd think that would be a detail that would be abstracted away for you at the ORM level, but it isn't. I mean it knows what you're using to select data, so it should auto-create indexes for you, right?


Ad 2: Every single real-world Java project I have seen used Hibernate (ORM) with Flyway. I have never seen any project in production where some underlying tech would delete tables autonomously.

Ad 1: People do debug ORMs, actually, including query efficiency.


Ad 2 -> I've run into lots of Hibernate apps. They also tend to use Spring, and the team often gets lost trying to figure out which one is the cause of the problem. It's usually both.

Hibernate and Prisma migrations will delete tables if you allow them to.

Ad 1 -> I'm sure they do... but usually they call in a DBA, who basically says "this is a disaster" and then recommends they refactor everything, because their database schema is fucked up: it's based on some weird object hierarchy instead of RDBMS principles.


I don't find either of your responses consistent with reality. First, database administrators do not debug ORMs. Second, it does not take a specialist to debug them; developers debug them. Third, the recommendation is rarely "refactor everything"; usually it is adding an index, changing the query, or some other minor tweak. Fourth, "weird object hierarchy instead of RDBMS principles" does not even mean anything.

Fifth, Spring has nothing to do with the database whatsoever. Sixth, every production codebase I have ever seen used Flyway.


I think you're mixing up someone who's decided to use an ORM rather than some hodgepodge of string concatenation with someone who has no idea what a database is, is unable to set it up effectively, and is unable for some reason to debug the sql emitted by ORM queries or use their ORM at all effectively to avoid N+1 problems. Not really a fair comparison.


> I mean it knows what you're using to select data, so it should auto-create indexes for you, right?

Curious, how does it know what you're using to select data?


Because you're using the ORM to do selects.

I'll give you an example: prisma has a "find/findMany." As part of the build process it should examine your usage and put indexes in the column that you're using in find/findMany

Indexes are an implementation detail - but to use RDBMSs successfully you need indexes. ORMs try to hide those implementation details, but without those kinds of details you get bad performance.

Prisma just fixed another problem it had with transaction isolation levels. To even understand the problem you need to understand SQL, how databases work, and how each database implements that. It's another important implementation detail that's abstracted away...until it bites you in the ass.

Prisma also finally allowed you to use database-specific types instead of just using varchar(191), which was its default for strings.

I'm picking on prisma because I happen to be using it right now. There are good things about it, like the way it does migrations and handles schema changes. But there are times its abstractions get in the way, like when you're trying to do aggregated group by (which is easier in SQL than it is in prisma). And its migrations often fail, and to understand why you really need to understand SQL.

Hibernate was worse, because it provides a transparent object-level backing store.


> As part of the build process it should examine your usage and put indexes in the column that you're using in find/findMany

This is not as easy as it sounds, though. Reliably inspecting source code is hard, and definitely not in the scope of an ORM. It also can't catch dynamically generated queries (think user-configurable filters). It might be a good idea for a third-party product that analyses your code and suggests adding indexes to models.

You know who knows best about your queries? It's the RDBMS. There are tools that can add/suggest indexes by searching your database logs for slow running queries.
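That "ask the RDBMS" point is easy to demonstrate: most databases will tell you the plan they'd use for a query. A sketch with Python's stdlib sqlite3 and EXPLAIN QUERY PLAN (table names invented; the exact plan wording varies by SQLite version):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, kind TEXT)")

def plan(sql):
    # EXPLAIN QUERY PLAN asks SQLite how it would execute the statement;
    # the last column of each row is the human-readable plan detail.
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT * FROM events WHERE user_id = 42"
before = plan(query)   # without an index: a full table scan of events
conn.execute("CREATE INDEX idx_events_user ON events(user_id)")
after = plan(query)    # with the index: a search using idx_events_user
```

The same idea scales up: Postgres's EXPLAIN ANALYZE and the slow-query logs of MySQL/Postgres are the inputs that index-suggestion tools actually work from.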


Learn SQL and use the ORM is usually the correct answer. Let it print out the SQL queries and how long they take (ORMs do that out of the box), then check for indexes and other standard issues. If there’s some unusual complex query to be written, sure, write that by hand.


If writing SQL directly, what process do you use to update your queries during schema changes? Do you rely on a test suite to catch errors then update queries by hand? Are you using compile-time checks through libraries like sqlx [1]?

[1]: https://github.com/launchbadge/sqlx


Finally something that bucks my instinct that integration tests are a waste of time. If I’m deploying a SQL query, everything that gets in or out gets thorough automated testing. Regardless of your attitude on the subject, directly interfacing a database is business logic. Test it like you would anything else that interacts with an external dependency. If you made a schema change and it broke something, you’ll either get test failures, or learn where your coverage is incomplete, or learn where you don’t care about a particular failure mode.

Normally I’d prefer testing more isolated units, but if it’s your own data access, just write the integration tests. They’re at least implicitly part of your “unit” anyway.


I recommend taking a step back before making a backward-incompatible schema change. Do you really need this? Chances are, it's just faffing about out of a sense of neatness and orderliness and you don't actually need to break the schema.


Sorry to butt heads, but what ORM automatically handles schema changes? I have seen so many teams with highly abstracted ORMs telling me that no-downtime schema changes are impossible, no matter how trivial the changes are, or would be in e.g. Hibernate.

And the only team capable of zero-downtime schema changes uses a minimal DSL to SQL lib.


There are two parts to the schema change, the table changes (migrations) and the code changes. I think the person you were replying to was asking about the code part. Every ORM I've used or looked at[0] does that automatically in the sense that you make a change to a model and all queries generated by that model will now have that change. Depending on the change they could break, but the point is the changes propagate. If you're not using an ORM and are just using string templates, this won't happen. There are query builders, too, which are sort of in between.

[0] Django, SQLAlchemy, Active Record, Piccolo, Tortoise, SQLModel, peewee


^yes, that's indeed what I had in mind, and you've articulated the position very clearly.


Thanks, I’m glad! It’s a topic that pops up for me a lot. Related: https://news.ycombinator.com/item?id=26905647


I don't know of any ORM that handles schema changes automatically (edit: for the most strict definition of "automatically" -- I've used ORMs that generate migration files and am generally happy with them).

My experience here is quite limited, so I'm inclined to defer to yours. With that caveat, I find that ORMs play more nicely with my IDE (e.g. I can rename a column name and automatically update all call sites) and often raise early warnings if their declarative classes are out of sync with the underlying table schema.


The persistent[0] library can automatically generate and run migrations to match schema changes.

[0]: https://hackage.haskell.org/package/persistent-2.14.5.0/docs...


This problem seems foreign to me. Are your teams not using Laravel? If not, maybe it's time to upgrade. https://laravel.com/docs/10.x/eloquent


This is exactly why I find no solution fits all here. For me, I use an ORM as a catch-all and then for certain applications I manually write the queries. It's best of both worlds, and I know what components in my app have custom queries so I can test against them.


I'm happy to write my own SELECT queries, but it sucks to be manually schlepping row[0], row[1], etc. into the fields of my domain entities, or making sure I provide the parameters to the parameterized INSERT statement in the right order. ORMs usually let you drop down to SQL as desired, and I do that when I have a nontrivial query to express, but CRUD plumbing is nice.
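For what it's worth, even the bare drivers often have a middle ground. In Python's stdlib sqlite3, for instance, a row factory plus named parameters removes both the row[0] schlepping and the positional-INSERT hazard (table and column names invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row   # rows become name-addressable, not just row[0], row[1]
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")

# Named parameters: the dict's key order no longer matters.
conn.execute("INSERT INTO users (id, email) VALUES (:id, :email)",
             {"email": "a@example.com", "id": 1})

row = conn.execute("SELECT id, email FROM users").fetchone()
email = row["email"]   # access by column name instead of index
```

Other stacks have close equivalents (DictCursor in psycopg2, @Results mappers in Go libraries), which is most of what people actually want from the "M" in ORM.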


Not sure which programming language you are using. For Java, I find Spring JDBC templates with Row Mappers more productive than ORMs. DataClassRowMapper will automatically populate the domain entity.


Go, Java (language level 8), Python. All have an approach, usually via some third-party library, but the bone-stock database drivers deal in field indexes. And people call those libraries "ORMs" whether or not they technically are.


The core problem with ORMs is that you end up having to learn the ORM on top of learning SQL and the particular database. It ends up being like working with the database through an opaque indirect layer.


Better yet, learn stored procedures, then call them just like any API with a well established set of inputs and expected return typed outputs. You'll thank me later once you decide to split your DB from your app server.


It's been at least 12 years since I had a DB on the same server as the app or service that called it... I can't fathom why anyone would do that these days. But either way, the stored procedures would work just as well. The issue you're alluding to (I think) is the time to set up connections for a dozen calls between the backend and the DB if they're on separate servers. But that's actually less of a bottleneck now than it used to be, with persistent pooled connections and most things hosted in the same VPN zones. I'm not arguing for ORMs... but appside processing [edit: "joins"] can still be appropriate if you're dealing with a monolithic DB and don't want certain giant queries or procedures to block rapid reads. The point is more to understand your infrastructure, and ORMs obfuscate it to a point that makes you irresponsible as a developer in some cases.

Also, with lateral joins and window functions in MySQL 8 you can really control whether you prefer your execution plan to loop or scan, which takes away most of the rest of the argument for appside processing.


In my limited experience, the best thing about ORM isn't the querying, but the mapping to objects. This is why I really like ActiveRecord's `find_by_sql`[1], which allows to write a raw SQL query and get an array of objects.

[1]: https://guides.rubyonrails.org/v3.2/active_record_querying.h...


> learn SQL properly. It's not that hard. Focus on it for a few weeks intensely and you've mastered it for life.

Exactly the same is true of ORMs. I find most of the people who advocate "just use SQL" have a bizarre aversion to applying the same learning effort to their ORM.

> I've had weekends ruined troubleshooting my "highly productive ORM layer" that nuked a production database. Whilst functionally speaking my ORM code was in no way incorrect. I'm talking differences of a thousand fold in query load depending on how one expresses the ORM calls.

That kind of problem happens with regular SQL all the time, and the tools to test/investigate it are less available.


> bizarre aversion to applying the same learning effort to their ORM

Oh, sure, instead of just learning SQL I'll just learn SQL, the ORM, the ORM's weird edge case features that actually support the SQL I want, the undocumented ORM internals that prevent the good query from actually being generated, and then I'll commit a patch to the open source project to fix the undocumented internals and shepherd a custom dependency for 6 months while it gets into a numbered release. Then I'll do it all over again when I switch stacks and have to learn a completely new ORM.

Or I could just use SQL.


> the ORM, the ORM's weird edge case features that actually support the SQL I want, the undocumented ORM internals that prevent the good query from actually being generated

ORMs are much less bad than SQL at that, IME. If the ORM generated a particular query there is usually documentation for why, often an option you can change. If the SQL engine decided not to use the right index for this query... tough, there's literally nothing you can do.


"SQL is worse" is a nonsensical comparison. The problems of ORMs add to the problems of SQL and compound the problems of SQL, they do not substitute the problems of SQL.

I'm glad you found an ORM you like, but it seems we have very different experiences with ORMs.


Depends. Sometimes the ORM knows about the pitfalls and works around them automatically. Same as how using a compile-to-JS language can often end up being much nicer than writing javascript yourself.


I would always use an ORM over raw SQL, but this doesn't make any sense: your ORM is still going to run a query on the same DB engine. It is much harder to optimize an ORM query, but if you're only writing very simple operations you don't typically need to. The power of ORMs (at least lightweight ones) is compile-time types and parameters by default.


> If the SQL engine decided not to use the right index for this query... tough, there's literally nothing you can do.

MySQL Index Hints: https://dev.mysql.com/doc/refman/8.0/en/index-hints.html

MariaDB Index Hints: https://mariadb.com/kb/en/use-index/

PostgreSQL: No. More information: https://stackoverflow.com/questions/309786/how-do-i-force-po...

SQL Server INDEX hint: https://learn.microsoft.com/en-us/sql/t-sql/queries/hints-tr...

Oracle INDEX Hint: https://docs.oracle.com/cd/B19306_01/server.102/b14200/sql_e...

Of course, knowing exactly what is wrong and what hint you need to use (whether the issue is the index in particular or something else) is also a bit of work to figure out.

I recall working on a project that had Oracle as the RDBMS of choice and there was this one query that took approx. 45 minutes to execute, which wasn't acceptable. Merely a SELECT that JOINed a few tables and checked the data with some EXISTS constructions and such. If memory serves me right, looking through the AWR reports and adding a single NO_EXPAND hint made the query execute in not 45 minutes but around 3 seconds.

Of course, ORMs don't necessarily solve the aforementioned types of problems either, since it's still SQL statements being executed under the hood and figuring out how to add hints to those might be a bit more challenging. I've had cases where you need to drop down to native query level when using ORMs for particular queries, because there were no elegant ways of doing it otherwise (apart from creating a DB view and mapping against that in some cases, for read only access).

I think both using ORMs and using SQL directly suck, just in their own unique ways. That said, I wouldn't outright dismiss either: depending on what you're doing, one or the other is going to be a good enough tool for the job. You occasionally also see some pretty interesting projects in the ORM space, which is nice.

For example, jOOQ allows you to use a type safe fluent API for constructing your queries: https://www.jooq.org/

Oh, and MyBatis lets you generate your own SQL dynamically, which is an interesting approach: https://mybatis.org/mybatis-3/dynamic-sql.html


How do I map my complex join into a list of structs with embedded associations? I'd probably need to write a good bit of code just to unmarshal the data, right? How are you doing this every time you write a new query?


Can you give an example?


> have a bizarre aversion to applying the same learning effort to their ORM.

SQL is a lingua franca while an ORM is specific to a stack.

If you know that you will always, say, be using SQLAlchemy on Python, then learning it well might be a good investment, but I don't find learning multiple ORMs to be a good time investment over just learning SQL. Most every stack has good, lightweight query builders.


Yes, both SQL and ORMs can give you problems, but I have seen far more serious, hard to diagnose trouble from ORMs than straight SQL.

Learning straight SQL is far more transferable across databases than learning an ORM.

There are reasons to have ORMs, for e.g. to keep a large code base consistent. It's useful for keeping relatively simple queries consistent across an ERP system, stuff like that.

However, learning SQL is going to have a far higher payoff across your career.


> Learning straight SQL is far more transferable across databases than learning an ORM.

Not my experience. For example a recursive/tree query usually has one way to do it in an ORM, but three or four different ways to express it in SQL depending on the database, some of which seem like they're wilfully screwing you over (one popular database requires you to only write "WITH" and won't recognise "WITH RECURSIVE"; another requires you to write "WITH RECURSIVE", even though the error message it gives you when you only write "WITH" tells you that it knew exactly what you meant).
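For reference, the recursive form itself is small once a dialect accepts it. A sketch with Python's stdlib sqlite3, which takes the WITH RECURSIVE spelling (schema invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT);
    INSERT INTO category VALUES
        (1, NULL, 'root'), (2, 1, 'child'), (3, 2, 'grandchild');
""")

# Walk the tree downward from the root, tracking depth.
rows = conn.execute("""
    WITH RECURSIVE tree(id, name, depth) AS (
        SELECT id, name, 0 FROM category WHERE parent_id IS NULL
        UNION ALL
        SELECT c.id, c.name, tree.depth + 1
        FROM category c JOIN tree ON c.parent_id = tree.id
    )
    SELECT name, depth FROM tree ORDER BY depth
""").fetchall()
```

The dialect friction described above is real, though: the CTE body ports across databases, but the WITH vs WITH RECURSIVE keyword and a few edge-case rules do not.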


I’m not sure I understand the consistency argument… Do you have an example, even if it’s just a silly one?


The example is from a python ERP that I modified a few years ago. All the db interaction was done through an ORM and I thought that it was an OK strategy for keeping the SQL consistent and manageable.

There were other problems with the ORM approach -- mostly that it ran tens or hundreds of extra queries for relatively straightforward SQL queries. But the software was only used by a few people at a time, so the performance wasn't that big of a deal.


"That kind of problem happens with regular SQL all the time"

Well, no, it doesn't. I told you to learn SQL properly. When you do, you know what a query plan is, which is something you'd rarely need if you know how to design a database and its indices.

All of this would be covered in every beginner's book.


I'm aware of what a query plan is. I can see that the query plan is wrong. There's nothing I can do about it, since I took the standard advice on HN and used PostgreSQL as my database.

Meanwhile if you'd applied the same "learning it properly" standard to your ORM you would never have had the problem you described.


An alternative to learning SQL is to use a non-relational database that has more of an impedance match with your host language to start with (e.g. XTDB or another Datalog DB in the case of Clojure).


I'm pretty proficient with SQL, and I completely agree that you need to know it well if you use a relational DB for something that isn't trivial, but I still love using ActiveRecord. In most web applications I worked on I needed to manually optimise some queries, either massaging AR a bit or directly writing SQL when necessary, but I would say that's much less than 1% of all interactions with the DB. In all the other situations, using AR saved me a lot of time.


ORMs are great for avoiding boilerplate and adding some typesafety. But once you start doing complicated things like JOINs, it's better to drop into raw SQL.

> Listen to this old man's advise: learn SQL properly. It's not that hard.

I couldn't agree more with the first half, or disagree more with the second :) Once you're operating at scale, it takes a lot of fine-tuning, know-how, experimentation, and reading docs to get things moving efficiently.


What are some examples of fine-tuning or trial-and-error you might need at scale?


Postgres has a ton of parameters that can greatly affect performance. I've used https://pgtune.leopard.in.ua/ to help w/ that.

In terms of schemas, where you put your indexes matters a TON. Indexes make reads faster and writes slower (to oversimplify).

Figuring out where to break out a separate table vs keep everything in one table is also a common issue.


True, all of these would still be an issue with ORMs.


Honestly, ORM is an unnecessary layer most of the time.

You can have database schemas, and you can have software schemas, and they don't necessarily need to coincide as long as your database schema allows efficient queries. The rest of your relational-manager-logic can be embedded in the post-processing methods you apply to the query results.


What do you think of "micro ORMs" like knex.js, or Dapper for C#? I personally agree with you and my personal projects just use stored procedures, but these micro orms have always piqued my interest since they're just a SQL abstraction for building single queries.


I would consider those "query builders" rather than fully fledged ORMs.


Yeah no. A micro ORM still expects you to write the full, raw SQL.

The raison d'être for a micro ORM is to capture the results of that SQL query into a simple, flat collection of objects in a way that has a better UX than your framework's default behavior.

Micro ORMs are the perfect middle ground for debates like these.


I personally like those. They are qualitatively different from ORMs. They give you three things:

- sensible ways of composing queries beyond direct string concatenation

- safe parametrization

- return results into usable data structures (not objects)

None of these things carry the problems of ORMs. They don’t impose a paradigm that isn’t SQL. They simply let you interface with SQL in an ergonomic manner.
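A toy illustration of those three points in Python with stdlib sqlite3 (function and schema names invented): queries are composed from fragments, values are always parameterized, and results come back as plain data.

```python
import sqlite3

def build_select(table, columns, filters: dict):
    # Compose the SQL from fragments; values never touch the SQL string.
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    params = []
    if filters:
        clauses = []
        for col, value in filters.items():
            clauses.append(f"{col} = ?")   # placeholder, not the value itself
            params.append(value)
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, year INTEGER)")
conn.executemany("INSERT INTO books VALUES (?, ?, ?)",
                 [(1, "A", 1999), (2, "B", 2005), (3, "C", 1999)])

sql, params = build_select("books", ["title"], {"year": 1999})
titles = [row[0] for row in conn.execute(sql, params)]
```

Real query builders (knex.js, Dapper's helpers) are this idea with more polish: you still read and write SQL semantics, there's just no string concatenation of values and no hidden object graph.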


This is typically as far as I go with an ORM, CRUD operations on single tables that give you some type safety at compile time. If you need more complex data shapes I usually create a view or occasional stored procedure but that's about it. I use Entity Framework in my day job and do like the migrations though...


These days I build applications that actually use SQL.

Typically a single statement to get the job done for any query.


who you callin old man i'm still in my 20s!


One foot in the grave. Practically a fossil.


> when the site first got a surge of users from hacker news, there was one poster in particular who came to the site, registered a bunch of offensive, racist usernames and proceeded to post and create threads that were just full of dumb slurs. this was definitely a learning experience because i had to act quickly, so i tried a bunch of different methods to get rid of him.

it's sad that people like this exist in the world. what could possibly motivate someone to spend their time doing this?


I dealt with a similar basket case once.

It seems like parasocial relationships can swing both ways. You know how some fans develop a creepy, obsessive sort of love for creators? Well, the same goes for hatred. They feel slighted by that person that doesn't know them, and they retaliate from behind their keyboard.


The thing that really surprised me was that even when he implemented IP blocking, the user used a VPN to continue abusing the site. That's a step beyond "casual" trolling that someone might do to test a site's security (not that this is justifiable behavior) and enters the territory of targeted harassment.


I don't find it surprising. This happens in every project I run. It's almost a guarantee that when you ban someone they will come back with a new account and when you ban their IP they start using VPNs. It's such a common occurrence and they get more aggressive for every step where they consider it a victory when you manually take action against them. I often need to resort to banning whole IP ranges.


Richard Bartle has a taxonomy of what he calls "players", but it works for any social media[1]. One of the types is the "killer", with a sub-type of "griefer": those whose "... vague aim is to get a big, bad reputation". So, from that perspective, they actually do get something out of it.

[1]https://en.wikipedia.org/wiki/Bartle_taxonomy_of_player_type...


Griefers. Some people get enjoyment from hurting/annoying others.

First time I bumped into this was in the DAOC MMORPG. Some players were deliberately annoying others, to the point of "wasting" their online time doing stupid shit that annoyed people. It really shocked me.

Thing is, there's lots of griefers in any online game. Which means there's lots of people out there in the real world who would do this if they could get away with it. Anonymity allows them to do this online with very little repercussion, so it shows how many there really are. But now every time I do an interview, or look at a rental, I'm thinking "is this person a griefer? are they going to enjoy making my life a misery?"


> I said that's life

> (That's life)

> And as funny as it may seem

> Some people get their kicks

> Stomping on a dream

> But I don't let it, let it get me down

> Cause this fine old world, it keeps spinnin' around

- That’s Life

By Dean Kay and Kelly Gordon. Most famously recorded by Frank Sinatra.



> what could possibly motivate someone to spend their time doing this?

Probably being between the ages of 10-14 years old. Bartle Killer-explorer?


The hardcore persistent trolls making it their life's mission to ruin my site have always been 20+ after I've managed to identify them. Kid trolls just spam, get banned and move on.


Attention seeking + lack of moral compass.


A lot of them are edgy underage children who don't know any better. They think they have a dark sense of humor, but really they've lived a life disconnected from those words, so they like the idea of pushing other people's buttons at no cost to their selfish existence.


not only attention seeking; sometimes the motivation is "to spread the truth", at least as they perceive it, and sometimes it's people who get triggered and are unable to stop their rants


Had a similar experience recently. A random person started repeatedly filling out contact forms for one of our clients. They were doing it manually, and they did it for several days straight until finally blocked.

It also left me wondering why that person would spend an hour or two each day for several days in a row, filling out online forms. What’s the motivation?

My suspicion is that the person doing it works at the company and was trying to mess with their systems. But I’ll never know for sure.


I spent some time working in a prison, it was very eye opening for someone like me from a pretty sheltered background. Most of the inmates were decent enough people that made bad choices but there were a few who were just horrible sadistic bullies.


Whitehat moderation tester


Mental illness


i'm more sad to see founders repeatedly put out ugc-oriented software or communities without any design priority for safety.

safety is the hard part of ugc products but is left to figure out after scale and ossification.


How would you plan for 'safety' in UGC-oriented software? I'm actually interested in this.


I have a lot of thoughts on this that I plan to start writing on under indiedevstack.com before long, for now you can follow my one biz social where I will promote it once ready @manabiSRS

But some quick references... One approach is to avoid algorithmic surfacing of UGC outside of one's own network (or secondary connections etc), which makes discovery harder (must be compensated in other ways). Twitter may explore similar ideas with pluggable algorithms (though I don't trust them). Another approach is to constrain UGC: eg a music/audio community which has barriers to sharing new audio outside your network, but which freely allows remixing and promoting remixed audio of known good audio without as much safety control over the remixes because the operations allowed on the "good" source material make it difficult to subvert. This kind of idea is at the core of a product concept, not an additional layer to tack on later. I believe that finding these kind of cheaper ways of managing UGC and lending discovery to UGC can be a huge competitive edge.


Hey man, thanks. Really interesting. I've added indiedevstack.com to my bookmarks, so I'll check in that once you've launched


Cheers, good luck with your own work


competitor obviously


"Is my girlfriend pregante?"


Alcohol.


> some users have suggested pretty smart features that i've since implemented, like this back-to-top button to quickly get back to the top of the page

To me all position:fixed elements (headers, footers, this back-to-top button, etc) feel like a kind of annoying dirt on the screen. Their absence is a big part of why I love the web 1.0 aesthetic.


My god yes. There is literally a key on a full keyboard dedicated to this function (Home), smaller keyboards have a chord for it (like Fn-Up), and "mobile" touchscreen UI has a global mechanism for this (on iOS: touching the status bar)... to plop down a position:fixed button on top of the content to make it even easier to access this feature that is already extremely easy to access is just gratuitous.


Does Android have this? I had an Android phone for some time and the thing I missed most was the back-to-top shortcut


Android doesn’t have this, and I missed it at first as well, but the scroll acceleration is so fast (much faster than on iOS) that you can nevertheless scroll to the top very quickly.


On MacOS Fn+left arrow = back to top, fn+up arrow = up one window.


I have this in a bookmarklet:

https://github.com/t-mart/kill-sticky


Me too, but I still added a table of contents button on my long, structured articles. It's very helpful in my opinion.


Wikipedia does this now, and I find it annoying, in particular the changing “current section” highlighting, and the fact that it hides when the browser window is a bit narrower. I’d rather press Home to get to the TOC again when it scrolled off.


Agreed, seeing something changing out of the corner of my eye is very off-putting.


Hm, I can see where the criticism comes from. Is there a way to keep this feature that would make it more acceptable?


Personally I don’t think it’s worth the benefits over just having a TOC at the top of the page, as is otherwise customary. It’s different if this is a web site or web app where you have an account, and where users can permanently enable/disable it as an option when they prefer it. But on a public web page for a general audience, it always introduces friction by being visually distracting (because it doesn‘t scroll with the rest of the page), or by hiding due to responsive layout (requiring the browser width to be adjusted, or having to toggle it by mouse instead of scrolling to the top by keyboard), and so on.


Fair point. I personally end up using those a lot, especially in articles that I skim (instead of reading end to end).

I'll start collecting data on its use, because people on the orangey site (including me) tend to have opinions that don't represent the average user.


I don't see it on my phone. Do you mean the desktop table of contents? I have one of those too. I use it all the time on other websites.

I think it's a really good feature, but the Wikipedia implementation needs work.


Yes, there are numerous automated and human-powered nuisance traffic streams.

1. CMS sites are constant maintenance, as most are an endless supply of issues. However, some have content caching to reduce the SQL workload.

2. Delayed registration with CAPTCHA and a brief explanation of why you are there. Quiet-ban IP filter applied to the list to boot pending users who enter emails that bounce or fail to authenticate.

3. Firewall blacklist areas of the world where you don't do business (better yet, whitelist the ISPs in the regions you do business in), blacklist proxy/tor/spam IP ranges, add port tripwires, and set up rate-limited traffic per IP (see slowloris mitigation methods if you are not using nginx).

4. add peer site content blocker for forum spammers/bots i.e. share exploit probes preemptively with the rest of the net.

5. add email filter for mention of bitcoin/BTC, and black-hole the entire IP block if in an irrelevant region.

6. lookup same-origin enforcement for your web-server, add Subresource Integrity Hash to your core, and re-scale/watermark/scrub all media to protect users from themselves.

7. fail2ban rules for common site security scanners, known exploit attempts, and common email scams.

You owe nonpaying users nothing, so the collateral cost of blanket bans is $0 in hostile regions. Remote traffic monitoring is also recommended if you have a game engine running.
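The per-IP rate limiting in point 3 can be sketched as a token bucket. This is a generic Python illustration, not this commenter's actual setup; in production you'd more likely use nginx's limit_req or equivalent, and the class and parameter names here are invented:

```python
import time
from collections import defaultdict

class TokenBucket:
    """Per-IP rate limiter: each IP may make `rate` requests per second,
    with bursts of up to `burst` requests."""
    def __init__(self, rate=5.0, burst=10):
        self.rate = rate
        self.burst = burst
        # Each IP starts with a full bucket; first access records its clock.
        self.tokens = defaultdict(lambda: burst)
        self.last = defaultdict(time.monotonic)

    def allow(self, ip):
        now = time.monotonic()
        elapsed = now - self.last[ip]
        self.last[ip] = now
        # Refill proportionally to elapsed time, capped at the burst size.
        self.tokens[ip] = min(self.burst, self.tokens[ip] + elapsed * self.rate)
        if self.tokens[ip] >= 1:
            self.tokens[ip] -= 1
            return True
        return False
```

A burst of requests from one IP drains its bucket and gets rejected, while other IPs are unaffected.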

On day 2 we can look at how BTC tumblers/launderers fund most of these issues, and whether it is OK to also preemptively blanket-ban most cloud/hosting providers (costs under 7% of your users in most cases). Remember, adversaries will often pretend to be from wherever they wish to inflict harm, and time does not have an associated cost in the 3rd world.

Have a gloriously wonderful day =)


Hey there! Thanks for your comment. What do you specifically mean by rescale/watermark/scrub content to protect users from themselves?


This is done mainly to strip off meta/EXIF/GPS Data, and mitigate other format violations often used to cause "trouble". Resampling/transcoding also tends to corrupt steganography utilities, thrash user hardware thumbprints, and standardize the web experience by excluding unsafe formats in favor of well-tested efficient traditional codecs.

Specifics preclude the multifaceted nature of the policy.

https://www.youtube.com/watch?v=cJMwBwFj5nQ

Happy computing =)


I'm surprised that they're surprised about trolls. As someone who's been doing PC gaming a long time, I always assume there'll be people who just want to ruin things for everyone else. What they're attracted to is popularity, so the better you do, the more you'll have to deal with them (they tend to grow slightly superlinearly relative to overall user growth).

This is basically every game or internet forum that acquires even a little popularity: there will be some (few) people who just wanna ruin everything, and I'm always surprised by how many people are surprised by this even when they're the technically literate sort.

For example, some Japanese fighting game devs still try to count disconnects during a match as different from losses for someone's record. One guess as to what this encourages as far as player behavior goes.


Man, on the abuse front—it’s amazing the lengths that people will go through to put spam on the web. There are apparently canned solutions for pushing stuff to any Mediawiki site, although I found that a really stupid captcha¹ was enough to bring that down almost to zero, but early on with rejectionwiki, I had the same sort of chronic abuser things happening that are described in the article.

1. Basically a set of really obvious questions, like “Who wrote Hamlet?” and what’s “Shakespeare’s first name?” that any writer (for whom the site is targeted) should be able to answer.
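A question captcha like the one described is only a few lines server-side. This is a hypothetical sketch (not rejectionwiki's actual code), with a made-up question pool and simple answer normalization:

```python
import random

# Hypothetical question pool; each question maps to a set of accepted answers.
QUESTIONS = {
    "Who wrote Hamlet?": {"shakespeare", "william shakespeare"},
    "What's Shakespeare's first name?": {"william"},
}

def pick_question():
    """Choose a random question to show on the registration form."""
    return random.choice(list(QUESTIONS))

def check_answer(question, answer):
    """Match case-insensitively, ignoring surrounding whitespace
    and trailing punctuation."""
    return answer.strip().lower().rstrip(".!") in QUESTIONS[question]
```

The point isn't being hard to answer; it's that canned spam software fills the field with garbage and fails.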


> Who wrote Hamlet?

Sir Francis Bacon :smirk:


> that any writer (for whom the site is targeted) should be able to answer

hey now, not all writers who discuss things in English are necessarily familiar with the anglo literary tradition

(it's probably a higher overlap than average, and shouldn't be too hard for them to search, but be careful throwing the assumption around)


(Ooh, three-em dash. Fancy! I use two-em dash very occasionally in prose, and —⁂— as a section and footnote divider, but haven’t found anywhere I really wanted to use three-em dash, though I did still add a Compose key binding for it.)


> Basically a set of really obvious questions, like “Who wrote Hamlet?” and what’s “Shakespeare’s first name?” that any writer (for whom the site is targeted) should be able to answer.

Given that ChatGPT exists now, I assume these questions will need to be replaced with something harder to automate.


ChatGPT would be an extremely expensive way to answer these questions.


At MOST that is a 100 token prompt/response, so that is like $0.0002 to answer with gpt-turbo. Hardly going to break the bank...


Really? I thought these types of questions would be limited. They seem hand-crafted.

If you cache the answers, you're probably looking at 10 queries or so until the site admin gives up on that idea and tries something different.


Given that the spammers seem to use some sort of canned software, it might have been enough to figure out how to change one or two internal URLs in MediaWiki, actually.


Yep. I think writing code is quite difficult for most spammers. They often depend on hacked together scripts and things other people have written. Making those obvious scripts fail can make a massive difference. I suppose people who can write good code can usually make a lot more money by getting a real programming job.

Years ago I heard of a simple anti-spam technique where you add extra form fields to a web form. Then use CSS to make those fields invisible. Put a check in your backend where if you see any content in those form fields, you respond with 200 OK but ignore the request.

The programmer in me can immediately think of 10 ways to get around that - the most obvious being to fill in spam using a real web browser, automated via webdriver or something. But apparently that one trick removed ~95% of spam on their site.
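The honeypot technique described above boils down to one server-side check. A minimal sketch, with invented field names (the fields would be hidden via CSS so humans leave them empty):

```python
# Fields rendered in the form but hidden with CSS; real users never fill them.
HONEYPOT_FIELDS = ("website", "fax_number")

def is_spam(form: dict) -> bool:
    """Return True if any honeypot field was filled in.
    The caller should still respond 200 OK and silently drop the
    request, so the bot doesn't learn it was caught."""
    return any(form.get(field) for field in HONEYPOT_FIELDS)
```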


>listen to your users. they might have better ideas than you!

So true. My products have improved greatly from listening to (some!) user feedback.


And if you're going to do it, make it easy. Many sites have missed out on my thoughtful feedback (not to say I'm 'right', but even if they don't want to act on it, it's considered and reasoned) simply because the process, although offered, requires a login (i.e. sign-up), or because the last time I tried I got some dumb email back thanking me for my question: it's not really possible, but I can work around it by (x, y, z that I already said would be easier if w), and do get in touch if I have any more (sic) questions.

If you're going to solicit feedback, just dump it in an ideas bucket, no need to reply, certainly don't funnel it through the support channel for bugs/questions.


I always reply to feedback. Seems only polite.


Certainly, I've had some great even more thoughtful/in-depth replies too, it's just the ~zero effort template support-channel 'thanks for your question'/'unfortunately that's not possible today' type response - I know, that's why I gave the feedback; all it does is tell me it didn't get anywhere near the right person, which is a souring experience (and waste of my time & effort) and makes me wish there just hadn't been a feedback button to begin.


If you can recognize good feedback and manage expectations well.

From a user’s perspective, there is typically a tipping point where a thing becomes popular enough that user feedback turns into useless, superficial, lowest-common-denominator crap that doesn't understand the value prop, the quality standards, or the implications of change vs stability.

I believe this type of feedback often bubbles up for reasons similar to why bikeshedding becomes a problem, and is then perpetuated through social media.

At this point one needs a filter.


Users that give great feedback also generally give great feedback in the future.


hey, i'm the author of this site! going through comments now, but looks like i still have some work to do because all this traffic caused some slowdown :(. looks like adding a caching layer is next.


> someone is going to abuse your site

Would gating access with Google Sign-in, or Facebook sign-in, etc, be sufficient for rate limiting bad actors?


Google and Facebook both have too much data about me already; why would I voluntarily give them more? This probably isn't that common an attitude, though.


> this worked for a little bit longer, but he proceeded to get on a VPN, and then another when i blocked that IP, then another when i blocked that IP, etc, etc.

Beyond VPNs, I've even seen attackers leverage residential IP networks which makes VPN detection ineffective as well [1]. If you ever need a more permanent identifier to ban users on, consider using a device/browser fingerprinting tool [2]. It helps avoid the whack-a-mole issue of more sophisticated attackers churning IPs/emails/user agents/etc.

[1] https://brightdata.com/proxy-types/residential-proxies [2] https://stytch.com/products/device-fingerprinting (I'm admittedly biased towards our solution as I work at Stytch)


Although difficult and not widely known about, fingerprinting can be randomized[1]. I have only been successful at creating a random fingerprint on Brave so far, and I did need to tweak some browser settings.

[1] Check your fingerprint details here: https://coveryourtracks.eff.org/


Are you paying for Sentry? What type of monitoring does it provide? I'm working on a project I'd like to add some monitoring so I'm on the lookout for a good solution. Looking for something free though until there's a need to have better insight than I can get without paying for it


I'm using the Sentry free tier and it is great. The main value is how it manages repeated errors. You can group different exceptions for the "same" error, resolve errors that have been fixed, or ignore known errors until they occur for more users, more often, or what have you.

If you are good about squashing errors you can make it very far on the free plan. Plus they have some burst detection built in. Just make sure that "expected" errors aren't just ignored in the UI; stop emitting them in the app itself so they don't count towards your quota (and it keeps your logs tidy).

I haven't been using their tracing or anything because their Rust SDK doesn't seem to support it despite claiming that it does (or I have set it up wrong).


Thank you for elaborating, this is useful :)


My team got really far using something called GlitchTip. It's compatible with the Sentry SDKs, but really cheap, so it felt like there were minimal consequences to switch if we ever had the need to. We only switched to Sentry cause our company was acquired and they already had a Sentry subscription going.


Hadn't heard of GlitchTip, but will look into it. Thanks!


You can self-host Sentry in a docker container. It's pretty easy to get running, and essentially free.


i am not! right now, i'm mostly just using it for performance monitoring - how many ms my queries are taking and how long my first-contentful paints are on the front-end

also for errors from the BE and FE


The best way to deal with point one: ghosting. It was sad to see three largely ineffective methods, and one desperate one, used to get rid of the abusive spammer. I had experience running a community-driven portal, and trust me, the best way to exhaust the spammer is to hide his posts from others. Don't give him a single clue you acted on him; just let him continue posting in a special vacuum prison crafted exactly for him.
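The "vacuum prison" amounts to filtering shadow-banned users' posts out of everyone else's view while leaving them visible to their author. A minimal sketch with invented names:

```python
# Usernames currently shadow-banned (hypothetical storage; a real site
# would keep this in the database).
SHADOW_BANNED = {"troll42"}

def visible_posts(posts, viewer):
    """Shadow-banned authors still see their own posts, so they get no
    clue they were acted on; nobody else sees them."""
    return [p for p in posts
            if p["author"] not in SHADOW_BANNED or p["author"] == viewer]
```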


Currently I'm using a code method, where registration requires a "code". I share this code lightly, and can change it at any time. Plus there's a reCAPTCHA on both the register and login pages to annoy abusers a bit more.


I have a site that will likely need a denylist for usernames. Do you have any resources on implementing that? I mean, it sounds obvious how to do it, but if it's already been done, I'd rather just have a list to work from.


the denylist on the site is actually pretty simple. for now, it's just a list of chunks of words, and the form validation compares the username against that list, where each chunk in the list is used as a regexp
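The chunks-as-regexps approach described here looks roughly like this (a sketch based on the description, not the site's actual code; the example patterns are invented):

```python
import re

# Each chunk is treated as a regexp and matched anywhere in the
# lowercased username; character classes catch simple letter swaps.
DENYLIST = [r"badword", r"sl[u0]r"]

def username_allowed(username: str) -> bool:
    """Reject a username if any denylist pattern matches it."""
    name = username.lower()
    return not any(re.search(pattern, name) for pattern in DENYLIST)
```

Lowercasing first keeps the patterns simple; `re.search` (rather than `re.match`) catches chunks embedded anywhere in the name.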


You could also consider shadow banning these users. If you're immediately preventing them from creating accounts with certain usernames, they'll probably just get more creative with the profanity or save it for their posts.


Hey hey! Been loving your forum so far. It's been great chatting with folks. Hoping for many years to come.


A good way to combat abuse is to not feed the trolls: don't engage beyond a warning or two, simply delete, and definitely don't argue publicly.


AKA blacklists are useful.


Blacklist please.

Blocklist if you absolutely have to.

"denylist" is an abomination.

Oh I see, a goon.



