Use Haskell for shell scripting

cies · on Jan 30, 2015

Thanks Gabriel Gonzalez! There is a comment on the blog post (by Chris Done) asking how it deals with piping. I really wonder about that too.

Some related projects:

- Joey Hess recently released a nice Haskell-to-sh compiler. I like this approach as the resulting sh scripts are runnable on pretty much every *nix. https://joeyh.name/blog/entry/shell_monad/

- Chris Done also released a lib to do shell stuff from Haskell, which build on the conduit library http://chrisdone.com/posts/shell-conduit

- Chris also wrote a shell in Haskell https://github.com/chrisdone/hell

- Then there is Shelly by Greg Weber https://github.com/yesodweb/Shelly.hs

There are probably more...

Gabriel439 · on Jan 30, 2015

You use `inproc` and `inshell` for piping. For example, here's the type of `inshell`:

    inshell
        :: Text        -- Shell command
        -> Shell Text  -- Standard input to feed command
        -> Shell Text  -- Standard output produced by command

I made one intentional simplification in the API, which was to not provide a way to capture standard error. It's definitely possible to provide such a utility, but I wanted to simplify things as much as possible in the first release before the slow onslaught of feature cruft begins. If there were such a utility, it would have this type:

    both
        :: Text        -- Shell command
        -> Shell Text  -- Standard input to feed command
        -> Shell (Either Text Text)

... and you could selectively listen to just stderr or stdout by taking advantage of the fact that pattern match failures short-circuit downstream commands:

    Left txt <- both -- only read stderr

There is one more shell library that I know of: `process-streaming`. I actually didn't know about `shell_monad` (that's the one most similar in spirit to what I wrote).

The main reason I rolled my own library is that this was written with the specific audience of people who didn't know any Haskell, but were comfortable with Python or Bash. My actual goal is to convince people internally at Twitter to use Haskell instead of Python for large scripts. I reviewed all those libraries (with the exception of shell_monad) to see if I felt comfortable marketing them to non-Haskell programmers and none of them felt like the right level of abstraction to me. I almost ended up going with Shelly, but in the process of polishing shelly for internal usage I found myself continually wrapping things with better names, different types, and providing missing features to get a single import umbrella, so I just stopped and asked: "why not just do this as a cohesive single library instead?". Also, `shelly` does not provide any `IO`-only commands: everything has to be wrapped in the `Sh` monad.

As for the other libraries, `shell-conduit` was too complex for new users in my opinion and `hell` is not embedded within Haskell (it's a separate language), and I wanted to keep the features of Haskell. I still need some more time to review `shell_monad` to see if I made a mistake by ignoring it.

danidiaz · on Jan 30, 2015

"process-streaming" is more like a set of helper functions for "process"; it doesn't provide formatting, regexps, or OS-independent implementations of typical shell commands. It does support piping of processes, though.

kenko · on Jan 30, 2015

Why `Either Text Text`? What if you're interested in both stdout and stderr?

Gabriel439 · on Jan 30, 2015

Then you can do this:

    fmap (either id id) (both ...)

... which is equivalent to:

    x <- both ...
    return (case x of
        Left  txt -> txt
        Right txt -> txt)

That removes the `Either` tag and fuses them into a single stream.

cies · on Jan 30, 2015

Thanks for this elaborate response.

Gabriel439 · on Jan 30, 2015

You're welcome!

Doji · on Jan 30, 2015

The tutorial does a great job of explaining why this is interesting: http://hackage.haskell.org/package/turtle-1.0.0/docs/Turtle-...

For example, the pwd function returns a FilePath type rather than a String:

  Prelude Turtle> :type pwd
  pwd :: IO Turtle.FilePath

The datefile function is also typed:

  Prelude Turtle> :type datefile
  datefile :: Turtle.FilePath -> IO UTCTime

So this really does seem to structure the data passed between commands, instead of the "stringly typing" unix shells have historically been known for.

TazeTSchnitzel · on Jan 30, 2015

Are those types just aliases of String?

throwaway283719 · on Jan 30, 2015

No. For example, a FilePath is (after resolving a few other type aliases)

  data Root
	  = RootPosix
	  | RootWindowsVolume Char
	  | RootWindowsCurrentVolume

  data FilePath = FilePath
	  { pathRoot        :: Maybe Root
	  , pathDirectories :: [String]
	  , pathBasename    :: Maybe String
	  , pathExtensions  :: [String]
	  }

marcosdumay · on Jan 30, 2015

What do you mean by "alias"?

Path carries String-like information, it can even be easily converted to and from strings. Yet, it's a strong type that won't let you write something like 'path </> file_contents' (although, with overloaded strings, you can do 'path </> "file_name"').

UTCTime is not String-like.

barrkel · on Jan 30, 2015

OK. How do you easily fork to run a command in the background? How does setting up pipes work? What's the idiom for chdir'ing to a subdirectory such that you pop back out again when you're done (I'd use a subshell with (ch xxx; ...) in bash)?

Getting into more tricky stuff, what's the equivalent of <() in bash?

This doesn't really demonstrate anything that shell scripts are actually written for: orchestrating and composing other processes, and job control.

If you wanted to leverage type checking for safety, it would be more interesting to typecheck the streams input and output by pipes.

tinco · on Jan 30, 2015

Did we read the same article? The entire 'streaming section' is about pipes and I/O redirects. Running a command in the background is just forkIO $ proc ..etc.., as in regular Haskell.

Nothing tricky about it.

barrkel · on Jan 30, 2015

The streaming section of the article has nothing about composing processes, that I could see; it appeared to be about treating the output of commands as input to Haskell lazy lists. I may have misread it, though.

Here's a pattern that comes up fairly frequently for me:

  foo | fgrep -v -f <(cut -f 2 info.csv) | bar

It uses the second column in info.csv as fixed strings to match inside lines in the output of foo, and filters them out, with the remaining lines going to bar.

All 4 processes (foo, bar, fgrep, cut) run concurrently. Likely fgrep will block on cut sooner or later, but the point is that multiple communicating concurrent processes are set up using a fairly easy to use DSL.

That's what a shell is, to me.

Gabriel439 · on Jan 30, 2015

There are two ways you can embed that within `turtle`. You can either embed each step as its own concurrent process, like this:

    -- Note, the flow is right-to-left, not left-to-right
    inshell "bar" (inshell "fgrep ..." (inshell "foo" empty))

Or you can just embed the entire thing within a single `inshell` command:

    inshell "foo | fgrep ... | bar"

The reason this works is that the type of `inshell` is:

    inshell
        :: Text        -- Command line
        -> Shell Text  -- Standard input to feed
        -> Shell Text  -- Standard output from command

This leads you stream to any shell command's input and read the command's output also as a stream.

danidiaz · on Jan 30, 2015

> What's the idiom for chdir'ing to a subdirectory such that you pop back out again when you're done

This can be done with the "bracket" function, which works roughly like a context manager in Python:

  import Control.Exception
  import System.Directory

  withDirectory :: FilePath -> IO a -> IO a
  withDirectory path action = bracket (getCurrentDirectory <* setCurrentDirectory path) 
                                      setCurrentDirectory 
                                      (const action)

Gabriel439 · on Jan 30, 2015

> How do you easily fork to run a command in the background?

`turtle` provides `fork` for running a command in the background. Example usage:

    example = do
        using (fork commandToForkInAnotherThread)
        theseCommandsStillRunInTheOriginalThread

> How does setting up pipes work?

See the `inproc` and `inshell` commands, which let you convert any shell command into a stream transformation embedded within Haskell.

> What's the idiom for chdir'ing to a subdirectory such that you pop back out again when you're done (I'd use a subshell with (ch xxx; ...) in bash)?

You can write a combinator for this using `turtle` pretty easily:

    pushd newDir = do
        oldDir <- pwd
        cd newDir
        return (cd oldDir)

... and you use it like this:

    example = do
        popDir <- pushDir "/tmp"
        ... do stuff ...
        popDir

> what's the equivalent of <() in bash?

`inproc`/`inshell` which let you read in a command's standard output as a stream

barrkel · on Jan 30, 2015

FWIW:

<(foo) in bash creates a fifo, and pipes the output of foo to the fifo. It then replaces the whole <(foo) argument with the path to the fifo. This means that commands that normally expect to read from a file on the command line can instead be wired to read their input from a process. And, of course, both processes run concurrently.

>(foo) does the same thing, except the other way around, for process output.

danidiaz · on Jan 30, 2015

There is a "createPipe" function in the "unix" package http://hackage.haskell.org/package/unix-2.7.1.0/docs/System-... that gets us half-way towards process substitution.

Unfortunately, I don't know how to get the name of the device file associated to the pipe, and I need it in order to pass it as an argument to the reading process :(

thinkpad20 · on Jan 30, 2015

In my opinion it be nice to write it as a wrapper type thing:

    inDir d action = do
      oldDir <- pwd
      cd newDir
      result <- action
      cd oldDir
      return result

Then you could use it like a python `with` statement. :)

S4M · on Jan 30, 2015

I don't know much about Haskell, but I thought it had some properties to isolate side effects, but the code he gives:

    main = do
        cd "/tmp"
        mkdir "test"
        output "test/foo" "Hello, world!"  -- Write "Hello, world!" to "test/foo"
        stdout (input "test/foo")          -- Stream "test/foo" to stdout
        rm "test/foo"
        rmdir "test"
        sleep 1
        die "Urk!"

Clearly doesn't (it creates a directory, writes in a file, removes that file and that directory all in one go without anything indicated by the function main. Is it because it's the main function of the program, or am I missing something?

tinco · on Jan 30, 2015

People are talking about monads and stuff like that. No need to worry about maths and words you don't need to know. That 'do' keyword up there indicates the start of a simple DSL. The DSL goes like this, every line is the beginning of a lambda. And the result of each lambda evaluation is passed into the next lambda.

So you get a sort of cascading scope of lambdas where the result of each lambda is available passed into the next. Each lambda depends on the evaluation of the previous one. Normally in Haskell functions are executed lazily, this structure forces the sequential evaluation.

So what are these cd, mkdir, output etc functions? They return an object with a specific type called 'IO'. This type is monadic, but that's irrelevant for now. Haskell as you know has no side effects in the language itself. The IO type basically is a command pattern, it says "execute this I/O with these parameters".

The monadic aspect of IO makes it so that at the end the commands will have accumulated in a list, of which you can get an item if you give it the results of the previous item. So that's what the main function returns, a list of commands with some lazily evaluated Haskell code in between them. Now comes the side effect part. The Haskell runtime system iterates over the list of commands and executes them. The result of each command is used to get the next command in the list.

So that's the core of the magic trick of monadic I/O, you make a lazy list of I/O commands, and have something external to the language execute those I/O commands, giving the results back to the language to get the next I/O command to execute.

zoomerang · on Jan 30, 2015

Haskell functions return side-effects using the IO type, with the boilerplate plumbing being hidden with monads and do-notation. "main" in Haskell by default has a return type of "IO ()", and any "IO" values returned by that function are executed by the runtime.

The end result in this case is something that just looks and feels completely imperative.

but if you were to try and call, say, the "rmdir" function inside another function that didn't have an IO return type, you'd get a compile error. (More specifically, you could technically call the function, you just couldn't return the "IO" value as a result, so it couldn't perform any actions).

tome · on Jan 30, 2015

> IO type

:)

omaranto · on Jan 30, 2015

Haskell does not force you to indicate that a function has possible side effects in the program source code, the type of the function will however have such an indication. Here main has type IO (), and the IO indicates it can do arbitrary I/O and mutation. Haskell will infer the type for you so you don't need to declare it in the source code.

So I disagree with the claim "without anything indicated by the function main", and would amend to "without anything indicated explicitly by the source code, leaving the only indication in the inferred type".

lmm · on Jan 30, 2015

It's in a do block, so you can see that these are not simple function calls; they're being composed via some monad. In cases where you're actually chaining values together it's more obvious which things are which:

    do
      value1 <- function1 --effectful function
      let value2 = function2(value1) --pure function
      value3 <- function3(value1, value2) --effectful function
      ...

But the notation is a bit more magic for this "no return" case; I prefer the Scala approach where even if you don't care about the return values you'd have to write this as

    for {
      _ ← cd "/tmp" // effectful function
      _ ← mkdir "test" //effectful function
      _ = someCalculation() //pure function
      ...

Peaker · on Jan 30, 2015

If you enable -Wall, Haskell forces you to use "_ <-" for any action that has a stateful result.

But the "let" vs no "let" is a pretty strong hint anyway :)

mercurial · on Jan 30, 2015

Control.Monad.void is your friend.

Peaker · on Jan 31, 2015

Data.Functor.void you mean? :)

mightybyte · on Jan 30, 2015

That's because the whole block you're pointing to there is in Haskell's side effects box. So this is simply not the right example for illustrating how Haskell does isolate side effects. Shell scripts in general are very side effecting, so for this application it makes sense.

sbergot · on Jan 30, 2015

You are right. In this case effects are not isolated. But in this particular script, there are no interesting things to move into a pure function. It does not mean that it wouldn't be the case in a more complex script.

Like everything, you have to learn to balance your IO code and your pure code. A bit like learning when to factor something into a separate class, or leave it in a few statement/methods. If you write everything in IO & do notation, you don't get the main benefits of haskell. But if your code is more than 10 lines long, chances are that there will be useful pure functions in it.

S4M · on Jan 30, 2015

I agree about the benefits of isolating side effects and IO - I generally code in python, and my code tends to look like:

    def main_function(args):
        data = get_data(args)
        result = do_calculations(data)
        push_results(result, args)

Where the function do_calculations is somewhat pure - no side effects, but I do use local variables that I modify inside the function.

> You are right. In this case effects are not isolated. But in this particular script, there are no interesting things to move into a pure function. It does not mean that it wouldn't be the case in a more complex script.

Well, I thought that the point of Haskell (of one of its points) is that it forces the programmer to declare whatever side effect in the type of the function. But here, there is no way to know that main, on top of printing stuff, also messes up with the directory and there is no type signature indicating it - in this example it's no big deal but I could write something like:

    main = do
        rm "/"
        sleep 1
        die "Oooops my files!"

sordina · on Jan 30, 2015

You're right that Haskell doesn't distinguish different IO actions other than by the type they return. There are certainly libraries that do this though, although they aren't widely used.

Even though Haskell doesn't distinguish different classes of IO actions, it still distinguishes IO actions from other kinds of actions (such as stateful actions as per your example), and pure computations and that provides a hell of a lot of bang for buck.

The Idris language has the notion of effect types [1] to make achieving the goal of categorising the kinds of effects being used in a function easier to deal with, but that uses the dependent capabilities of the language.

[1] http://eb.host.cs.st-andrews.ac.uk/drafts/eff-tutorial.pdf

tel · on Jan 30, 2015

> Not widely used...

I use them in every single application I write, or I make my own tighter, more specific ones. They're incredibly useful in real world apps.

sordina · on Jan 31, 2015

Sorry Tel, I didn't mean to imply that nobody uses them, more that I would guess that they are used less than 1% of the time where their inclusion could be beneficial.

tel · on Jan 31, 2015

Ha, no offense. I just wanted to emphasize that they are used.

In particular, I think they're more useful in applications than libraries and most Haskell code you can find in the wild is library code---so you end up not seeing them much.

sbergot · on Jan 30, 2015

`main` is the entry point of the program. Its type is `IO ()`. It is a warning to omit a type signature in a top level definition, so in a normal program you would have a `main :: IO ()` type definition. The `IO` allows any kind of effect.

sukilot · on Jan 31, 2015

Note that style of Pyhton uses block/strict IO (not streaming/lazy) which makes it rather inefficient if any part of he process exits early (due to error or because the user only wanted the first line of output). Most common command line programs (and simple Haskell programs) are streaming not block.

amelius · on Jan 30, 2015

Yes, the code uses the IO monad, you might want to look it up.

_qc3o · on Jan 30, 2015

Who's the target audience of this exactly? I already see a language pragma, do notation, liftIO, parser combinators.

Hamming has this great set of lectures on how he became a world renowned scientist and in one of the lectures he explains why Ada failed and other languages succeeded. The difference was that Ada was designed logically and most successful languages were designed psychologically. Even when government contracts mandated Ada people still wrote in Fortan and hand translated to Ada. You can watch the videos and take from it what you will.

A minimal bash file is`#!/bin/bash`. A minimal turtle file is already way too long and logical.

The set of videos: https://www.youtube.com/playlist?list=PL2FF649D0C4407B30.

nandemo · on Jan 30, 2015

Presumably the target is existing haskellers. Basically, "if you use Haskell, here's something to help you use it for shell scripts too".

I don't think it's reasonable to assert that Ada "failed" (e.g. it runs on large passenger airplanes), but in any case that's kinda beside the point, TFA isn't primarily about Haskell evangelism/advocacy.

> language pragma, do notation, liftIO, parser combinators.

Arguably, all of this is within the reach of an intermediate-level Haskell programmer. OverloadedStrings is considered a basic pragma.

Gabriel439 · on Jan 30, 2015

The target audience is non-Haskell programmers, and if you don't think the tutorial is good enough to onboard such a programmer then I consider that a bug against the library. I would actually appreciate if people submitted Github issues highlighting any pedagogical problem with the tutorial.

I think the use of `liftIO` is a reasonable objection. When I wrote the library I had the choice of utomatically pre-wrapping all `IO` commands with `liftIO` for the user (making them all `Shell` commands) by default. However, I decided not to do that for two reasons:

* If you do that you can't use them outside of a `Shell` any longer * The user has to learn `liftIO` anyway if they want to use `IO` actions not provided by the `turtle` library. I didn't want to teach the user a leaky abstraction

I don't see any issue with `do` notation is bad. Same thing with parser combinators, which are just strings in the simple case, and the "Patterns" section of tutorial has a table showing you how to convert regular expression idioms to `Pattern`s:

http://hackage.haskell.org/package/turtle-1.0.0/docs/Turtle-...

The language pragma is sort of a grey area. I decided to keep it because it doesn't take a long time to explain and it significantly increases the usability of the library.

_qc3o · on Jan 31, 2015

I think you mentioned at some point the goal is to get folks using large python scripts to use this instead. I would be very curious to hear how that progresses.

pjmlp · on Jan 30, 2015

Ada might have failed on OS, but that is just because few startups that based their workstation OS in UNIX succeeded in the market at large.

C goes hand-in-hand with UNIX, so clearly no UNIX vendor would have it in their SDK and UNIX developers weren't willing to pay for tools.

As history has shown, the moment UNIX vendors started doing "Home" and "Pro" editions, GCC got lots of help.

As Ada talks at FOSDEM show, it is present everywhere where safety matters and its use has been slowly increasing since the Internet has shown how bad idea is to connect C code to the outside world.

http://www.kb.cert.org/vuls/byupdate?open=&start=1&count=10

yosefk · on Jan 30, 2015

I think there's much (~10x) more C code than Ada code everywhere where safety matters. (No hard data, just a feeling from experience - if you have hard data proving me wrong, do share.)

Also, it's not just Unix that's written in C or a descendant - there's also, well, Windows, and a load of embedded RTOSes.

If Ada made you as productive as C with extra benefits or something to that effect, you'd expect Ada to succeed at the marketplace at a scale at least comparable to C's - especially with the government support it had which put C at a disadvantage, not?

pjmlp · on Jan 30, 2015

> I think there's much (~10x) more C code than Ada code everywhere where safety matters

Probably, but using C dialects and certification processes that make C just look like Ada with another syntax.

http://www.misra-c.com/MISRAChome/tabid/181/Default.aspx

http://en.wikipedia.org/wiki/DO-178B

http://www.programmingresearch.com/solutions/medical-devices...

> Also, it's not just Unix that's written in C or a descendant - there's also, well, Windows, and a load of embedded RTOSes.

Windows did not exist when UNIX was created.

MS-DOS was based on CP/M which copied ideas from UNIX into home computers. So while C didn't had a special place in home computers, UNIX was gaining adoption in the enterprise even Microsoft had their own UNIX, Xenix.

Which they used to cross compile some of their MS-DOS applications.

So it was only natural that when they started developing Windows, they used their in-house languages and both Quick Basic and Quick Pascal were not that up to the task, leaving C as the option.

Embedded RTOS are traditionally POSIX compliant, wich leads again to C.

Microsof is actually moving away from C, this is why they don't care about compliance any longer and speak about C++ and .NET Native.

Even their latest C99 related changes are only related to what ANSI C++11/14 require and a few key open source projects that they wanted to see supported.

Which is kind of funny, because Microsoft was the last C compiler vendor in the home computing space, to add a C++ compiler to their tools, with Microsoft C/C++ 7.0.

> If Ada made you as productive as C with extra benefits or something to that effect, you'd expect Ada to succeed at the marketplace at a scale at least comparable to C's - especially with the government support it had which put C at a disadvantage, not?

Not if people are expected to pay for the compilers.

madawan · on Jan 30, 2015

Would it be possible to create a /bin/turtle script which prepends those import and language statements? That way all the boilerplate that's needed would be "main = do ...", which seems acceptable imo.

npsimons · on Jan 30, 2015

As a working programmer and a bit of a language geek, a few years ago I decided to try to get as many programming languages as possible installed on my dev machines. From this, I eventually tried to get as many of them "scriptable" as possible, creating at a bare minimum a "Hello, world!" template that could be run from the command line. I like having options, or a bare minimum, having things around to play with/learn when I've got down time.

I remember something in "Mythical Man-Month" that extolled the virtues of scripting programming for concept exploration, and I've often felt this is one of the major advantages traditionally scriptable languages have over compiled. Once you can run a program without compiling it, iteration tends to go faster.

So sure, some languages require more boilerplate to get started than others, but I've got templates for that, and I happily scripted almost all the exercises in "Thinking in C++" because it just made working them out faster, even in emacs where I can bind the compile key to any command I can dream of.

boothead · on Jan 30, 2015

For all the considerable awesomeness that Gabriel produces, I always think the best part is the *.Tutorial module he includes. I always learn a lot and it's always a great over view that puts the work in context.

Everyone should do this!

chrisBob · on Jan 30, 2015

After learning Perl I started using it where some more educated people might recommend a proper shell script. My thinking is that using what you know is a whole lot more efficient than learning a new tool for a small job, even if some people think it is the right tool. I am sure it is no different for people familiar with Haskell.

loudmax · on Jan 30, 2015

I do a lot of shell scripting, and I'm not sure there is such a thing as a "proper" shell script. The shell just isn't a great programming language. Just about any modern scripting language is better, starting with Perl. But the shell has been the lingua franca of the Unix world for decades now. It's the one language that you can pretty much guarantee is on any Unix or Linux server, even pretty ancient ones.

I don't doubt that Haskell isn't a better scripting language language than the shell, but you can't assume /usr/bin/env runhaskell is going to return anything on random Linux servers. Perl and Python, maybe, but Haskell isn't there yet.

klibertp · on Jan 30, 2015

> you can pretty much guarantee is on any Unix or Linux server, even pretty ancient ones

Well, yes and no. You can get reasonable compatibility with different Unix flavours if you stick to sh. Your script is not going to work on BSDs once you start using bash specific features, though.

Fun fact: on FreeBSD bash does not live in /bin/bash, it's in /usr/local/bin/bash. Every time you write a shebang with /bin/bash hardcoded you're making your script harder to use there.

Perl is everywhere almost by default and it's more compatible as it has just one implementation, without sh/bash/csh/ksh/tcsh/zsh madness. I'd say it's a good idea to use Perl instead of shell script for anything more complicated than a few lines of code if it's meant to be portable. (And I'm not Perl programmer at all).

loudmax · on Jan 30, 2015

Oh that's interesting. I'd assumed that FreeBSD had made bash the default shell around the same time that Mac OS did. I guess my point still stands for /bin/sh. Not a fun programming language though.

floatboth · on Jan 30, 2015

No way. One of FreeBSD's goals is to get rid of everything that's GPL-licensed. bash is not only that, but it's also horrible code.

And it's a user-friendly shell with all the tab completions and history searches, which DOES NOT BELONG IN /bin/sh!

klibertp · on Jan 30, 2015

Exactly, which is the reason why BSDs were not affected by shellshock. And moreover, modern tcsh is quite a powerful and full-featured shell, too.

I unfortunately had to switch to Linux a few years ago (after using FreeBSD for almost a decade) and I still miss how consistent and well laid out BSDs seem in comparison.

Gabriel439 · on Jan 30, 2015

Note that you only need `/usr/bin/env runhaskell` if you want to interpret the script. You can also compile the script as a native binary, which is the recommended approach on Windows.

falcolas · on Jan 30, 2015

Agreed. I look upon people who use PHP for shell scripting with a sigh and a shaken head, but I can't fault them for it. PHP works, PHP is quick to write, and for many tasks, PHP is sufficient.

I hope Haskel can gain traction in this area, if only because options are always nice to have, and competition forces everyone to bring their best game.

mercurial · on Jan 30, 2015

I like the Pattern thing. However, it seems to me that you're going to quickly run into trouble if you need to even vaguely emulate shell scripting. Shell utilities live and die by their options. It's unfortunate Haskell supports neither named arguments nor default values. Which means that in order to emulate options, you would need to pass records to your "shell" utility, which, on top of being cumbersome, forces you to prefix every option in a way unique to your utility, since you cannot have two records with the same fields in the same namespace...

mightybyte · on Jan 30, 2015

It's going to be difficult for something like this to match the ease of shell scripting. For instance, typing `cd "foo"` is significantly more painful than `cd foo`. (Maybe this could be fixed by forking or adding to ghci so that when you hit space after the function name it automatically puts the quotes in for you and places the cursor between them.) Options are certainly important, and as others have said, they can be emulated with records. Is the syntax going to just as convenient as shell scripting? No. But that's not the point. The point is that shell scripting is massively painful in a lot of other ways where Haskell blows it away. So the task is to find a happy medium that gets fairly close to the convenience of shell scripting while still giving us the power of Haskell.

There are a number of potential approaches for coming close to the ease of shell scripting. One is options records as others have mentioned. For defaults you can have a Default instance (no, that's not boilerplate because you would have had to specify the defaults somewhere anyway). Then there is plenty of room for infix operator combinators to make it easier to change individual options. A second option could be to put options into a string that would get parsed into a record. You could use patterns similar to those used in existing command line argument processors like optparse-applicative. Or, if you don't like that, then maybe a quasiquote could give more power.

Do these things require some boilerplate? Yes. We know that is going to be required since Haskell wasn't designed for the convenience that shells were designed for. But that's fine in this case because the potential benefits are huge.

gamegoblin · on Jan 30, 2015

Quite a lot of libraries here: https://wiki.haskell.org/Command_line_option_parsers

mercurial · on Jan 30, 2015

That's not the issue.

The issue is that, if you want to simulate both "grep" and "grep -r", you need to different functions, or you need to have your "grep" function accept a record of parameters.

vlastachu · on Jan 30, 2015

I'm not good enough haskell programmer, but there is possible solution (records as you mentioned).

    import Prelude hiding ((-))

    data Grep = Grep {isRecursive :: Bool, maxCount :: Maybe Int} --etc
        deriving (Show)
    grep = Grep False Nothing
    --short pseudonim
    r :: Grep -> Grep
    r command = command{isRecursive = True}
    m :: Int -> Grep -> Grep
    m num command = command{maxCount = Just num}
    
    (-) :: a -> (a -> a) -> a
    (-) command flag = flag command
    ourGrep = grep -m 50 -r 
    
    main = print ourGrep -- > Grep {isRecursive = True, maxCount = Just 50}
    
    --then we should write monad which execute that data

mercurial · on Jan 30, 2015

Yes, that's exactly what I wouldn't want to type. Also, your record is going to blow up in the likely case another utility uses a recursive flag, because of Haskell's pervasive namespacing problems.

tel · on Jan 30, 2015

It probably wouldn't be too hard to do something like

    grep "foo"
      & "recursive" <~ True
      & "maxCount"  <~ 100

in a typesafe way. It just probably wouldn't be worth the complexity. It also probably couldn't be a straight `IO` action then, which was a design constraint of Gabriel's.

Gabriel439 · on Jan 30, 2015

Actually, you can do `grep -r` by just combining `grep` and `lstree`. Here's an example:

    example = do
        file <- lstree "some/dir"
        True <- liftIO (testfile file)
        grep "Some pattern" (input file)

This is an example of how most of Bash's option heavy ecosystem is an outgrowth of Bash's limitation as a language (individual commands accumulate flags to work around functionality difficult to implement within the Host language). I think having a decent host language decreases the need for so many configuration knobs for every command.

mercurial · on Jan 30, 2015

You're right to some extent, but I think many of these 'knobs' have a good reason to exist (--dry-run, -a, -z for rsync for instance) and cannot be usefully, or at all, replaced by more composability. And attempting to implement support for them will run against the limitations of Haskell's syntax.

Something like OCaml would be better suited, since polymorphic variants, named and default arguments give a lot more flexibility, though the fact that shell commands happily return different outputs depending on their options would still be an issue.

sukilot · on Jan 31, 2015

Unix has

     find . -exec grep $pattern {} \;

but `grep -r` is easier to write.

Hz8NSD · on Jan 30, 2015

I saw those as regular parser that returns text but not just grep wrapper. Yes It has function named 'grep', but its not grep wrapper. It looks like `lstree` could be combined with `grep` function for emulate `grep -r`.

mercurial · on Jan 30, 2015

Take lstree then. How would you give it an option giving the kind of ordering you want?

barsoap · on Jan 30, 2015

I guess you wouldn't, you'd have a sort function afterwards.

...for Unix' insistence on composability, the shell tools are often unnecessarily monolithic, probably because that's the only sane way if the only type you have in interconnect is `string`.

mercurial · on Jan 30, 2015

Sure, but consider the common case of wanting the most recent file/directory in a directory. This would be something like "ls -t|head -n 1". It's pretty convenient, AND your ls function still returns only file and directory names, no additional information.

Here, you'd need to either parametrize the return type of ls to get simple strings (which is what you want most of time) or additional metadata, or alternatively to have different ls commands.

marcosdumay · on Jan 30, 2015

If your arguments aren't exclusive, you can pass a list of algebraic data. If they are exclusive, you can construct a type for them.

mercurial · on Jan 30, 2015

> If your arguments aren't exclusive, you can pass a list of algebraic data.

Which you need to prefix to avoid clashes.

> If they are exclusive, you can construct a type for them.

Which is going to end up being a record, which:

- is awkward to build (compared to just giving options to a command or arguments to a function)

- will most likely need to be an instance of Default

- which needs to have its fields prefixed to avoid clashes

Starts to sound like an awful amount of boilerplate.

vlastachu · on Jan 30, 2015

> prefixes, default instances

Also bad setters ("record {field = val}" looks nice but useless, as not a function).

I hope the developers of GHC also clearly see the problem and someday will be engaged in it. Then the language becomes much more expressive.

Is that task exists somewhere in roadmap?

tel · on Jan 30, 2015

There have been probably 15 different proposals for this throughout the years. They never make enough momentum to go through. While record pain points seem like a huge deal, in practice they're not sufficiently bad to motivate changes in the language in the face of the various tradeoffs that would have to be made.

Today, typeclasses and lenses cover 99% of "the record problem" as far as I've experienced.

joelwilliamson · on Jan 30, 2015

There has been quite a bit of discussion on ghc-devs about fixing records. The most promising approach seems to here:

http://nikita-volkov.github.io/record/

marcosdumay · on Jan 30, 2015

Prefixes are a problem, no arguing about that. But...

> is awkward to build (compared to just giving options to a command or arguments to a function)

Idiomatic bash: apt-get install package_name

Idiomátic haskell: apt_get $ AptInstall package_name

What is so awkward about that?

mercurial · on Jan 30, 2015

First, it doesn't let you install multiple packages in one go (or let you install a specific package version). Secondly, it doesn't support apt-get options like -m, -d -n...

You could extend your argument structure for this, but then you need to specify every argument all the time, or have the user modify a default value. This is definitely awkward compared to straight shell.

falcolas · on Jan 30, 2015

Please forgive my lack of familiarity with the concurrent workings of Haskell, but since the Shell streams are based off []/IO, and not Concurrent.Chan, does this mean one turtle function has to complete (and write its results to memory) before the next turtle function can run?

To me, magic bits of shell scripts which turtle would need to improve upon were it to replace said scripts are not the loop constructs, conditionals, or even the type system (even though it's completely lacking in bash), it is the ability to use pipes to link processes concurrently.

joeyh · on Jan 30, 2015

The streaming section shows some examples of combining turtle functions, this will be the same as shell pipes.

There's also nothing stopping you from using forkIO to spark off a separate thread, and doing IO in multiple threads concurrently.

Haskell's IO manager allows multiple threads doing concurrent IO in what looks like an imperative, one instruction after the other manner. Instead of async callbacks like you might expect from other languages.

Klasiaster · on Jan 30, 2015

For me combing the best parts of bash and ipython is the way to go. Up to now this seems more comfortable to me than using subprocess in python or this haskell aproach which needs to be aware of every programme output to give what it promises. You can easily copy big parts of existing bash scripts and e.g. add error handling in the python way :) Even I think for loops/list comprehensions are betten than the strange bash syntax.

And here a short example::

  #!/usr/bin/env ipython3
  #
  # 1. echo "#!/usr/bin/env ipython3" > scriptname.ipy    # creates new ipy-file
  #
  # 2. chmod +x scriptname.ipy                            # make it executable
  #
  # 3. starting with line 2, write normal python or do some of
  #    the ! magic of ipython, so that you can use shell commands
  #    within python and even assign their output to a variable via
  #    var = !cmd1 | cmd2 | cmd3                          # enjoy ;)
  #
  # 4. run via ./scriptname.ipy - if it fails with recognizing % and !
  #    but parses raw python fine, please check again for the .ipy suffix which must be there!
  #
  # ugly example, please go and find more in the wild
  files = !ls *.* | grep "y"
  for file in files:
    !echo $file | grep "p"
  # sorry for this nonsense example ;)
  # it's even possible to access the output of a command by outputvariable.s, .p or .n
  # see file:///usr/share/doc/ipython-doc/html/interactive/reference.html#system-shell-access

Better take a look here, it's more complete: https://blog.safaribooksonline.com/2014/02/12/using-shell-co...

codygman · on Jan 31, 2015

Oh, I'm going to have to see if I can use Turtle with IHaskell tomorrow!

0: http://gibiansky.github.io/IHaskell/ 1: https://registry.hub.docker.com/u/gregweber/ihaskell/

tel · on Jan 30, 2015

But why "turtle"?

strager · on Jan 30, 2015

Turtle shell.

tel · on Jan 30, 2015

And in retrospect that's quite obvious, hah!

I spent the whole time trying to think how this was connected to LOGO.

npsimons · on Jan 30, 2015

Nice! Now I can add Haskell to my list of languages I can script with.

I'm always on the lookout for new languages I can script with (or at least get closer to rapid prototyping) for easier learning, testing, problem solving, etc. I've got templates that I run against linters, style checkers, etc for many languages and it will be helpful to have even more options.

agumonkey · on Jan 31, 2015

It's not closely related but still, it reminded me of the wonderful https://pypi.python.org/pypi/sh to write 'shell' script in python with very low boilerplate.

akurilin · on Jan 30, 2015

This is awesome, I was actually looking for something like that out of sheer curiosity, but perhaps it'll make it into production at some point.

fallat · on Jan 30, 2015

I've been pushing for alternative shell scripts for awhile now. I mostly stick to Python and Haskell now. It is great. Highly recommended.

amelius · on Jan 30, 2015

I wonder why it uses the convention:

stdout (input "test/foo")

instead of:

output stdout (input "test/foo")

which would be expected considering the previous line.

qznc · on Jan 30, 2015

Haskell is low on boilerplate? Yes, in general I would agree. Those scripts however, all have to be prefixed with "{-# LANGUAGE OverloadedStrings #-} import Turtle main = do". This is tedious boilerplate.

sukilot · on Jan 31, 2015

You could trivially have a wrapper program that added that to every script file before calling runhaskell.

joelthelion · on Jan 30, 2015

I want to see how you implement the pipe :)

meekins · on Jan 30, 2015

q3k · on Jan 30, 2015

I don't really see the point of this, apart from academic research values.

POSIX shell is everywhere - your current Linux and OS X machines, old UNIX workstations, home routers, servers... Just drop in a file and it will probably run just fine, unless the author screwed something up completely. POSIX shell scripts are the perfect bootstrap mechanisms that will run almost anywhere regardless of architecture.

Haskell, on the other hand, is rarely present in an operating system - if you absolutely, positively need a higher-level language for „shell scripting”, then you have a much higher chance of finding a Perl interpreter, or even Python. Heck, even getting ghc and its' basic ecosystem running has always proved to be a huge burden to me. Try sticking a `cabal install` in your CI flow, you'll see your job times increase by hours.

Third, there's just the KISS aspect of it - if you're writing something that has logic so simple it can be stuck in a shell file, why not just write it in a shell file? You don't need category theory to get a few files installed...

comex · on Jan 30, 2015

Because shell is so deficient that even for "simple" things it is really easy to screw up - when whitespace or special characters in filenames cause some case you overlooked to screw up due to terrible quoting rules, when missing arguments cause [1], when you accidentally put bashisms in scripts labeled /bin/sh, when you suddenly have to do some basic text parsing (e.g. extracting capture groups from a regex) and have to either switch to perl or use some ugly bash extension that's incompatible between the version of bash OS X uses and the newer ones.

So you might want to use a different language - even for purely/mostly personal use, in which case Haskell would be fine.

[1] https://github.com/ValveSoftware/steam-for-linux/issues/3671

regularfry · on Jan 30, 2015

That's before you've even addressed the stultifying features of shell as a language: booleans and tests are odd, arrays are odder, they have things called "functions" which don't have return values, the list goes on. Basically if you're writing shell, you probably also have at least Perl available, and probably Python...

sbergot · on Jan 30, 2015

It is a bit easy to dismiss everything haskell related as only useful for "academic research". In fact I am not sure how do you make this link between this post and research at all.

Now if you don't know about haskell and want to write a quick and short lived script, there is 0 value in writing it in haskell. However, if you happen to now a bit of haskell and that your script is likely to be used several times you might find some benefits to this.

- haskell is quick to write and the code can be quite terse. You can create an myscript.hs file and run it with runhaskell. Zero platform complexity overhead.

- you get the benefits of static types which are easier maintenance and refactoring.

- if it evolves in anything more complex, it is easy to move it in a cabal project.

- if you need to do something cpu intensive, you can compile/profile/improve perfs.

Peaker · on Jan 30, 2015

The "Academic" language and "Category Theory" strawmen is getting tiring...

Many of us who use Haskell do so (without an academic degree or almost any CT knowledge, by the way) because it offers the best bang for our buck -- less code, more safety, more stuff done and done well! It also runs reasonably fast, unlike similarly terse languages.

Being able to use it in a light-weight manner for one-off scripts is nice too.

knivets · on Jan 30, 2015

As far as I understand, you need interpreter only if you want to run file as a script, but if you compile beforehand, you can just use it, so no need for Haskell to be present in an operating system.

q3k · on Jan 30, 2015

What's the point then? You just end up with opaque binary blobs that you have to first cross-compile to every possible OS and architecture... How is that scripting by any stretch of the definition?

rjsw · on Jan 30, 2015

What is the point of PowerShell on Windows, you can just use COMMAND.COM.

iopq · on Jan 30, 2015

Fine, have shell scripts rm -rf $VARIABLE/* while other people try to create sane alternatives

Every few years someone writes a retarded install script that wipes your drive, it's like it's inevitable