
> Shell should only be used for small utilities or simple wrapper scripts.

Oh, you mean we weren’t supposed to write an ETL solution in bash?



90% of the time ETL is trivial and not system-critical and can be done with a 3 line bash script.

And in the other 10% a simple typo can cost your company millions of dollars.


Unless it was a toy project with an elementary transformation, then no, you probably shouldn’t have. And if it wasn’t a toy project and it still went well for your company, I would say that was despite bash, not thanks to bash.


I’ve actually seen this happen. It’s terrifying.


You don't know true terror.

I've seen an "ETL" solution (minus the "T") written entirely in SQL but using XML instead of tables and columns. After traveling through multiple servers via OpenQuery it eventually gets loaded into a domain-specific piece. That piece passes the data into a frontend that can only run in quirks mode.

That frontend has 60-80 separate projects that all parse out basically the same data but from different fields and with different parsing/mapping logic. There is no ability to load shared libraries or otherwise de-duplicate code and no authorization to call out to any internal web services that could be written to handle it. Nor is there any form of templating available; it's all static files with no nesting of folders allowed. And the whole thing is managed in a domain-specific editor that effectively precludes the use of version control thanks to the way it stores files.

Two people are responsible for managing that frontend piece and more.


They should have fired the author/s of this abomination much earlier instead of allowing that thing to evolve up to that size.


> They should have fired the author/s of this abomination much earlier instead of allowing that thing to evolve up to that size.

The main author of the SQL side and original author of the frontend has been promoted to the top because the system is critical to the business needs, and since everyone else struggles to work with this monstrosity they must be a pretty great developer. Can't afford to lose them.

The frontend team have threatened to quit several times unless things get better. They've been demanding version control, CI, automated deployments, a shared library, and to not use the domain-specific editor piece that prevents all this.

Management (including, or because of, the OG author) is convinced that they're pushing for these things because of job security and refuses to let them overcomplicate things. But every time they threaten to quit they get pretty substantial raises because no one else is willing/able to deal with the mess they were left with.

If you ask me, no amount of money is worth that environment, but all parties involved still work there...


Sounds very much like IBM AS400


I’m not sure when you worked on AS/400, but from what I remember there was no XML involved at all, although you had to deal with DB2 and the almost comical table and column names...


I meant the difficulty of storing source files in version control, the data input being split across 60-70 projects with very little code shared between them, and the very small number of people responsible for understanding the system. That all sounds very much like an AS400+RPG environment created for parsing XML files.


Sure, but not as terrifying as people rewriting shell one-liners in Python or (gasp) Java.


Then a lot of people don’t understand the limits of what they’re doing.

For instance, piping through commands works fine until one character sequence is interpreted as EOF. The fun part is that it will work most of the time, and when it fails nobody will understand why (“we didn’t touch anything”), and rewriting the thing will be a political nightmare (“it was working before and was only 3 lines, why do you need so much time to redo it?”).

I think most people (me included) don’t do enough shell programming to really know the trade-off of the one-liner vs doing it in Groovy, so the latter becomes the safer option (rightly so IMO).


> one character sequence is interpreted as EOF

I know this is just an example, but how would that happen?


You can hit EOF or EOT in a stream of UTF-8 or other multi-byte encoding as part of a character. There must be ways to have it handled by something that understands character encodings and works around the issue, but I know that dumb shell piping will fumble on those.


    % printf '\x04\necho EOT is not EOF in a pipe.\n' | bash
    bash: line 1: $'\004': command not found
    EOT is not EOF in a pipe.
    %

Pipes have no line discipline (that lives in the terminal driver), and thus do no special-character processing.
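
A second way to check the same claim, as a rough sketch rather than a proof: push an EOT byte through a pipe and see what comes out the other end. The variable names `count`, `first`, and `second` are made up for this illustration, and the octal escape `\004` is used instead of `\x04` because plain POSIX `printf` is only guaranteed to understand octal escapes.

```shell
#!/bin/sh
# Three bytes go in: 'a', EOT (octal 004), 'b'.
# If the pipe treated EOT as end-of-file, wc would see fewer than 3 bytes.
count=$(printf 'a\004b' | cat | wc -c | tr -d ' ')
echo "bytes through the pipe: $count"

# A reader on the far side keeps going past the EOT byte:
# the first read gets the line holding EOT, the second gets the next line.
second=$(printf '\004\nsecond line\n' | {
    read -r first
    read -r second
    printf '%s' "$second"
})
echo "line after EOT: $second"
```

On a typical system this should report 3 bytes and still deliver `second line`. EOT only means "end of input" when a terminal in canonical mode sees it typed at the keyboard; a pipe passes it along like any other byte.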


If you can do it as a one-liner in Bash, then you're not using the JVM. The benefit of using Apache Groovy for scripting is access to the JVM (and of course defining Gradle builds). If you don't need the JVM, then do it in Python or Ruby.


I have been there, porting ksh scripts to Java.

As much as I might like Java, this surely did not make any sense, but when people want to pay you for doing it, oh well.


I hope there was a reason for rewriting them. Anyway, Java is for sure NOT a scripting language. F# is much, much better from that point of view, with the right tools, but sadly it is not famous enough.


Well, rewriting one-liners is kind of stupid, but it's probably not going to result in a nearly incomprehensible mess running in production.


As have I. It actually worked for a one-time migration, but it required a lot of handholding.



