- If you find you need to use arrays for anything more than assignment of ${PIPESTATUS}, you should use Python.
- If you are writing a script that is more than 100 lines long, you should probably be writing it in Python instead. Bear in mind that scripts grow. Rewrite your script in another language early to avoid a time-consuming rewrite at a later date.
Python seems to handle complexity better, but only on the surface. (Basically Python has nice data structures, great string formatting, and .. that's it.)
Setting up the Python environment takes real effort, because Python comes in many flavors (2 vs 3, system installed, user installed, /usr/local) and there are a bunch of factors (pip, pyenv, pipenv, PYTHONPATH, modules in the current directory, the need for python-dev headers and GCC to compile native parts of packages, and then of course maybe extra shared libraries too) that can affect the "script".
Yet Python provides no way to sanity check itself. Mypy is wonderful, but doesn't really help with scripts, where the enemy is raised exceptions, not simple type errors.
And Python lacks efficient, quick and dirty process control. Sure, there are nice packages for handling subprocesses, but Bash is far more batteries-included there.
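To show what I mean by batteries-included: a multi-stage pipeline plus per-stage exit codes is a couple of lines. A rough sketch (DUMP_URL and the commands are made up):

#!/bin/bash
set -o pipefail                      # the pipeline fails if any stage fails
# fetch a dump, drop comment lines, compress the rest
curl -sS "$DUMP_URL" | grep -v '^#' | gzip > dump.gz
echo "per-stage exit codes: ${PIPESTATUS[*]}"

Getting the same behaviour out of Python's subprocess module means wiring the Popen objects and their pipes together by hand.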
Do I like Bash? No, but debugging a Bash script is just -x: even the most vile, many-thousand-line pile of Bash will eventually stop somewhere with a clean-ish error or finish running.
I have no great recommendation for an alternative (maybe Ammonite, maybe scripting in Rust), but I'm interested in what people think about this.
>Python seems to handle complexity better, but only on the surface. (Basically Python has nice data structures, great string formatting, and .. that's it.)
It's a complete scripting language, with tons of features, from a comprehensive standard library with something for most needs, to a packaging system, isolated environments, async io, and more. And lots of expressivity in structuring your program (including optional types).
So, not sure what the above "and that's it" means. That it's not Haskell? Yes, it's not.
>Setting up the Python environment takes real effort, because Python comes in many flavors (2 vs 3, system installed, user installed, /usr/local) and there are a bunch of factors (pip, pyenv, pipenv, PYTHONPATH, modules in the current directory, the need for python-dev headers and GCC to compile native parts of packages, and then of course maybe extra shared libraries too) that can affect the "script".
If you're doing dev-ops with such scripts, you already need to handle the environment in general.
> not sure what the above "and that's it" means. That it's not Haskell? Yes, it's not.
That scripting in Python is pretty bad. It makes things harder, and at the same time you still have to think a lot about what can go wrong, because there's no enforced error handling.
Yeah, even Java doesn't force you to handle errors. Please, calm down. "Scripting" in a compiled language is not practical. People use the word "script" for a reason - it's because they don't want to deal with a language that forces them to think about everything that could go wrong. For some tasks that's perfectly sensible. For other tasks, there are Rust, Haskell and many others.
Checked exceptions are nice. You can use them if you want to make them part of your API surface.
> "Scripting" in a compiled language is not practical.
The premise is that scripts grow, and you should switch to Python. Which might be okay for Google, but seems to be only a very small step up from Bash, if any at all. (I've outlined the reasons why I think it's not worth it.)
I do a lot of scripting in Bash, and when that starts to grow on me, it's time to think about the problem for real. And usually a blind rewrite in Python is not the way to go.
> For some tasks that's perfectly sensible.
Those are the one-off scripts, no? Anything that should run with some regularity, that should be reproducible, or that handles valuable data is not something that should be cobbled together in bash/python and thrown into /etc/cron.d. Or apparently that's how Google rolls.
Yeah, well, maybe not everyone is an SRE, so the use case here seems to be build scripts and other not-so-critical scripts that still have to be maintained, where knowledge of Bash/POSIX sh is not really assumed: it should just work for anyone who checks out the repo.
It's really not about data structures or anything like that. The big problem with any large shell script is that it's extremely difficult to do proper error handling. Usually the best you can do is check return values to detect an error and 'exit' immediately, hoping that your "trap" works correctly to clean up behind you.
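That pattern usually ends up looking something like this (a sketch; the archive name and paths are made up):

#!/bin/bash
set -euo pipefail
workdir="$(mktemp -d)"
cleanup() { rm -rf "$workdir"; }   # the only real "error handling" there is
trap cleanup EXIT

# each step either succeeds or kills the script right here,
# and we hope the trap fires and tidies up behind us
tar -xzf release.tar.gz -C "$workdir"
cp "$workdir/app.conf" /etc/myapp/app.conf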
>Python lacks efficient, quick and dirty process control.
Yeah, quick and dirty, that's kind of the problem really. It's great for 20-line scripts but it turns into a huge mess for larger projects unless you enforce super strict guidelines and everybody has their doctorate in bash.
Python, perl and friends make it harder to spawn 3rd party programs and pipe them together for instance, but it also makes it massively easier to handle errors and give decent diagnostics to the user.
I don't have more faith in Python's except than in Bash's trap.
If you use 3rd party code or a subprocess, then both are pretty weak compared to something kernel enforced.
> makes it massively easier to handle errors and give decent diagnostics to the user.
I don't really find that. You have to fight with subprocess input/output in Python.
Sure, logging is easier in Python/perl/PHP/whatever than in Bash, but error handling is about the same. You have to do everything manually, and there's no compiler, nor type (or other magical) system, that'd warn you if you have missed something.
Let's be clear, I'm not a huge fan of python or exception-based error handling either but I'll take it any day over... whatever you want to call shell ad-hoc error handling or lack thereof.
Trap is not like an exception, it's more like "atexit" or a signal handler. You can't really nest them, you can't really recover from them. So I definitely disagree that error handling is about the same, python's a lot better.
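Concretely, setting a second EXIT trap simply replaces the first, so there's no stack of handlers to unwind the way nested try/except blocks give you. A minimal sketch:

#!/bin/bash
trap 'echo "outer cleanup"' EXIT

do_risky_step() {
    trap 'echo "inner cleanup"' EXIT   # silently replaces the outer handler
    false                              # some failing command
}

do_risky_step
# when the script exits, only "inner cleanup" runs; the outer handler is gone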
>there's no compiler, nor type (or other magical) system, that'd warn you if you have missed something.
Well that's the whole static vs. dynamic debate, but that's a different story. Use C# or Rust or whatever you feel like if you really can't stand python or scheme. Maybe there's a niche to be filled for statically typed scripting languages; there are no popular ones that I'm aware of.
Still, that's beside the point: while I personally prefer static typing like you do, there's really no contest between dynamic typing and "everything is a string and you can only concatenate them" typing.
You are aware that there's no compiler, and that you have to write the try-except-finally?
I'm talking about how weak the guarantees are that Python gives, compared to Bash, about the correctness of the program. It's easy to end up with Nones, and it's too easy to forget to handle a missing file or a lack of permissions.
For most scripts, handling file not found isn't gonna happen anyway. Often the result is just printing to standard out "hey I couldn't find file X" then bailing. An uncaught exception works well enough for this in many cases, such as build scripts.
Same goes for permissions issues and a whole slew of similar scenarios. For many use cases, the lack of forced error checking is a feature.
Concrete example: build scripts. As long as the cause of the build failure is clear from the uncaught exception, there's no reason to handle it. I don't need a friendly UI to avoid scaring off non technical users, I just need something that handles the happy path well and crashes with a meaningful stack trace when something unexpected happens.
I always add 'set -ex' to the top of my bash scripts to exit on any non-zero top-level return code and to print out every statement being run, including expanded variables and inner statements.
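For anyone unfamiliar, the effect looks roughly like this (the commands are just placeholders):

#!/bin/bash
set -ex          # -e: exit on the first failing command, -x: trace each command as it runs
make clean       # printed to stderr as "+ make clean" before executing
make all         # a non-zero exit status here stops the whole script
./run_tests.sh

One caveat: -e on its own doesn't catch a failure on the left-hand side of a pipeline; you need 'set -o pipefail' as well for that.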
You've hit most of it. The desire is a lowest common denominator. Python, Ruby, etc. would be great, except that in some places there aren't current versions, and beyond the most mundane tasks you end up wanting 3rd party packages, which complicates your packaging or removes that lowest-common-denominator property.
What this begs for is a new tool: a single-file, statically linked "super shell" that lets you do structured scripting and has builtins for a bunch of common things (reading in addresses, daemonizing processes, reading the process table, etc.), and that tool needs to be buildable on nearly everything. Now, I like Rust and the attention it is getting and the interest in safety and correctness, but I can't think of many places where I'd reach for Bash yet be happier writing Rust; a lot of it is 15-minute disposable programming jobs. At a glance, I think I'd trade 5 minutes of set -e to find some error I skipped for 45 minutes of trying to rejigger things to properly borrow some shared value.
Tcl meets all of these requirements. I have a Linux distribution whose PID 1 is a Tcl shell, and I can set up everything from there without spawning any new processes.
Yeah, but Tcl is old and quite crufty. The docs are a mess, everything is a string, a lot of concepts it uses are not very familiar and/or mainstream (upvar?).
A modern take on Tcl is what we need, in my opinion.
Surely we should be able to build something nicer 30 years after the first release of Tcl.
I agree that Tcl isn't mainstream and thus a lot of concepts used by Tcl are not mainstream, but other than that it's got a lot going for it.
Tcl's string representation for objects is just a serialization/deserialization mechanism, which seems to be pretty popular in other languages as well.
Additionally, Tcl has a lot of other cool things going for it: coroutines, threads (as an additional, but included, package), great documentation delivered as man pages, virtual filesystem access, virtual time, safe interpreters for running untrusted code, a stable ABI (stubs) so that compiled extensions can be used for decades, a small memory as well as disk footprint, it's extremely cross-platform, easy to embed into larger programs as an extension language, easy to compile into a large static program with Tcl as the "main" entrypoint, native BigNum support, many thousands of great packages, ...
What would a modern take on Tcl improve that Tcl itself couldn't just build more easily?
The modern take on Tcl arrived almost 20 years ago, when "everything is a string" turned into "everything is a typed object with a string representation". The docs are clear and complete, and the concepts it uses make it a powerful and elegant tool.
Has something changed suddenly in the last 5 years? Yes, the docs had a decent API reference, but everything even slightly higher-level than "what parameters does this command take" was on that wiki, with comments from 2004 marked as out of date by out-of-date comments from 2007...
Not really; it's really just used to set up the environment for a network boot. It supports VLANs, network configuration, loading modules and stuff (even from Tcl virtual filesystems); it's basically just Tcl+TUAPI[1].
I thought of Rust because of the strict typing. If something TypeScript-like could compile to Bash, that'd be amazing. (Maybe there's something like that already.)
But now that I think of it, just using Node + TS (+ yarn) sounds better than Python.
It's possible to write 2/3-compatible Python code. Furthermore, for basic "scripting" it's reasonable to assume no external dependencies, so there's no need to worry about packaging, pip, pipenv, etc.
The major disadvantage of any "comfortable" language aside from Python is that it isn't installed by default on nearly every distribution of every OS.
This is why, for example, Ansible only requires Python 2.6 on managed nodes, allowing literally zero installation on almost any host you want to manage.
If you already have a good mechanism for packaging and distributing software to all your machines (which I think is a reasonable expectation for Google) then go ahead and use whatever language you're well equipped for. But know you'll be sacrificing the easy and wide distribution that using Bash or Python brings you.
> This is why, for example, Ansible only requires Python 2.6 on managed nodes, allowing literally zero installation on almost any host you want to manage.
Yes, Ansible transfers everything. It's ugly and slow :(
I pretty much gave up on [vanilla] OpenStack because their idea of DevOps is Ansible plays generated by custom Python scripts.
Re. a recommended alternative to quick and dirty bash scripts, do you think golang would do? I've barely dabbled with the language, but its default packaging (a single, static binary) and build system (`go build x`) seem well suited to rapid setup, deployment and testing, which is presumably what you'd want in this scenario.
Many people use Go for such things. I simply don't like Go; it's too crude (too C), and the packaging system is... well, it's not Cargo.
But for deploy scripts, Go with a static binary (hosted on, let's say, your GitLab instance, fetched with wget/curl through the GitLab HTTP API with a token) is probably nigh unbeatable in end-to-end development and deployment time.
Aah, makes sense. I'm not a huge fan of it for similar reasons, but the appeal of a language that's quick, dirty, and relatively acceptable to coworkers is strong.
I had some success with Node.js (though it was in a Node-based project to begin with). I haven't tried piping commands, but for running a bunch of commands and then keeping all the logic in JavaScript it's quite convenient.
Here's an example that builds a React Native app, while updating version numbers and Git tags:
My main problem with scripting in Node vs python is that the event loop gets in your way when you're just trying to write a quick and dirty sequence of steps. Sure, you can use async/await, but now you are transpiling your script... seems easier to just bang it out in python.
I love Node but I use it for backend where the event-loop model is useful.
Personally I've found myself writing lots of simple command line utilities in Go. YMMV but the standard library handles file IO, network stuff, running external commands and a bunch of standard encodings really well. It also has static typing, and awesome concurrency features.
It's obviously not perfect, but it has replaced lots of Bash scripts for me.
I'm one of those people who finds perverse pleasure in writing portable Bourne shell, and doing so correctly. I know I spend far more brain cycles on it than I should; once it gets past a certain complexity limit, I should just switch to Python. But I always say "It's just a little bit more code, there must be a good way to do this," and there generally is, but only after passing over several bad ways to do it first.
Oh, and of course in doing so, I always try to minimize the amount of calling out to other processes. Why call awk when you could use `while read` instead?
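For example, instead of reaching for `awk -F: '{print $1, $7}'` on a colon-delimited file, I'd end up with something like this little sketch:

# print the user name and login shell from each line of /etc/passwd,
# without spawning awk or cut
while IFS=: read -r user _ _ _ _ _ shell; do
    printf '%s uses %s\n' "$user" "$shell"
done < /etc/passwd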
You only get one array to work with in portable Bourne shell, $@. You can parse lines using read, and strip prefixes and suffixes using parameter expansion (`${parameter%pattern}`). It's a restrictive little language, with a lot of quirks and edge cases, but it always makes a nice little challenge to figure out how you can use it safely.
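A small sketch of the sort of thing I mean: `set --` rebuilds the one array you have, and parameter expansion stands in for basename and sed:

# collect the .c files in the current directory into "the" array
set --
for f in ./*.c; do
    [ -e "$f" ] && set -- "$@" "$f"
done

# strip prefix and suffix without calling out to basename or sed
for f in "$@"; do
    name=${f##*/}      # drop the leading path (longest */ prefix)
    stem=${name%.c}    # drop the trailing .c
    printf 'would compile %s into %s.o\n' "$f" "$stem"
done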
One of the constraints I impose is that spaces in pathnames should always be handled correctly; the biggest form of trouble you can easily get in is to forget to quote something or to parse something on space delimiters when it can include spaces. I've seen someone write a script that accidentally turned into an `rm -rf /path/to/important/data` because of poor handling of spaces.
I generally do not attempt to handle newlines in pathnames. While it's possible to do so in some cases with null delimiters, I have never seen a case of newlines in pathnames in the wild. Of course, this means that I have to make sure that these scripts are never exposed to anything that could contain malicious filenames.
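When I do accept that dependency, the null-delimited version looks something like this (the path is made up, and -print0/-0 are GNU/BSD extensions rather than strict POSIX):

# remove stale temp files even if their names contain spaces or newlines
find /var/tmp/myapp -name '*.tmp' -type f -print0 | xargs -0 rm -f --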
Also, I move away from portable shell if it will really make the code too convoluted or fragile. For instance, if I need to parse long options (--foo), I use GNU getopt rather than the built-in getopts (with appropriate test to make sure the getopt I'm using is GNU, so it won't break silently on a system without GNU getopt), or move to a real language.
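The check I use is roughly the usual one from the util-linux examples: the enhanced (GNU) getopt exits with status 4 when called with --test, so the script can bail out early elsewhere (the option names below are just placeholders):

# refuse to run with a non-GNU getopt instead of silently mis-parsing --long options
getopt --test > /dev/null 2>&1
if [ "$?" -ne 4 ]; then
    echo "error: GNU getopt is required for long option parsing" >&2
    exit 1
fi

args=$(getopt -o 'vo:' --long 'verbose,output:' -n "$0" -- "$@") || exit 1
eval set -- "$args"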
Anyhow, while I know it's counterproductive, the puzzle-solving part of me really likes trying to solve the problem of implementing something in portable Bourne shell.
> One of the constraints I impose is that spaces in pathnames should always be handled correctly;
I tend to do the exact opposite. If a filename has spaces in it, I like my script to fail spectacularly while insulting the user (who is typically me...).
Filenames are variable names. It makes no sense to allow spaces in them.
tcc is neat software and I used it for some time almost exclusively for "-run", but after many years I ultimately replaced it with a small shell rc-function for compiling, linking and running a C/C++/x86/etc. file from the shell.
I think it's nicer.
#!/bin/bash
crun() {
    local file="$1"
    shift
    local exepath="$(mktemp)"
    if [[ "$file" =~ \.c$ ]]; then
        gcc -g -Wall "$file" -o "$exepath" || return $?
    else
        echo "no filetype detected"
        return 126
    fi
    # start the compiled program as a background job, then bring it to the foreground
    "$exepath" "$@" & fg
}
... along with a more sophisticated version for .zshrc as well.
#!/usr/bin/env zsh
function crun {
    zparseopts -E -D -- -gcc::=use_gcc \
        c:=custom_compiler \
        o+:=copts \
        Wl+:=lopts \
        -dump::=dump_asm \
        v::=verbose \
        h::=usage \
        g::=debug
    if [[ -n $usage ]]; then
        cat <<- EOF
usage: crun [options]... <filename>
  --clang   (default) use clang for C & C++ files
  --gcc     use GCC for C & C++ files
  --dump    dump assembly of program
  -o        supply an option (e.g. -o -Wall)
  -v        verbose
  -g        debug
Compiles and runs a C, C++ or x86 Assembly file.
EOF
        return 126
    fi
    # select unique entries of `copts` and then slice copts[2..] (copts[1]
    # contains the flag, e.g. "-o")
    local file=${@[-1]}
    local options=${${(u)copts}[2,-1]}
    local exepath="$(mktemp)"
    if [[ $file =~ \.(cc|cpp|cxx)$ ]]; then
        local compiler="clang++"
        $compiler -std=c++1z -g -Wall -Weffc++ ${=options} $file -o $exepath
    elif [[ $file =~ \.c$ ]]; then
        local compiler="clang"
        [[ -n $use_gcc ]] && compiler="gcc"
        $compiler -g -Wall ${=options} $file -o $exepath
    elif [[ $file =~ \.(s|asm)$ ]]; then
        local objpath="$(mktemp)"
        nasm -felf64 $file -o $objpath && ld $objpath -o $exepath
    else
        echo "no filetype detected"
        return 126
    fi || return $?
    if [[ -n $dump_asm ]]; then
        objdump -S -M intel -d $exepath
    else
        [[ -n $verbose ]] && echo "exepath: $exepath"
        if [[ -n $debug ]]; then
            gdb --args "$exepath" "$@"
        else
            "$exepath" "$@" & fg
        fi
    fi
}
Not really. It has rules for function naming and loops. I would set my rule as "never define a function or a loop". Once you get past a totally linear, stateless and deterministic script, stop writing bash. Roast me if you want, but my default language for this kind of stuff is Perl.
I really don't understand the "switch to Python" thing. Bash scripts are good for calling sequences of command line programs. That is not particularly convenient in Python.
I've told developers that if I catch them writing shell scripts in python they must commit a shell script doing the same set of operations. 2x the work usually dissuades them from this type of nonsense.
My route is to make a rule that any Python script with more than 'n' os.*, subprocess.* and *.Popen calls automatically requires a shell-script counterpart, in case we find your urge to pythonize made it unsafe, unreadable, or slow|broken|buggy.
There are obviously programs that are better written as a shell script, no denying it! I'm just saying use the best tool for the job. If you're writing complex logic in a shell script, you're probably barking up the wrong tree. Likewise, if all you're doing is calling external programs from python, you're probably doing it wrong.