Is `$` meaningful in verb trains? Because that was what I was referring to when I said no loops/recursion/flow control.
N.B. A sibling comment says J has added a direct-definition construct while I wasn't watching, which renders my comment largely irrelevant, although I feel the general point still holds: a lot of J 'example code' tends toward verb trains that are difficult for noobs to parse.
I cannot share specific projects. J is particularly good for numeric/algorithmic experimentation. It is interactive/REPL-based. Types and boilerplate are minimal. There is no compilation step and minimal package overhead. This is the language I reach for when step 2 is not going to be “install these packages” or deploying to share code. It is most remarkable for data hacking, as it is trivial to manipulate structures while maintaining performance. I use Python, R, and Clojure frequently, but the ability to move quickly in J is without parallel. Weaknesses include namespacing and deployment, although I have seen substantial codebases deployed on both desktops and servers. Multi-threading and AVX-512 instructions in _your_ (not package) code, straight from the REPL, are some of what you get with j904.
It sounds like we do similar sorts of things (based on the tool list), but each time I poke at an APL-like system I back away, clear on the learning curve and unclear on the value.
At core, my job is arithmetic on 3D arrays of approx 10x1000x100,000,000.
The rub is that for every LOC written manipulating those structures, I’ve got 100 LOCs doing IO (broadly defined) and then 1,000-10,000 doing some form of ETL, QC, or normalization (i.e. finding and validating the correctness of the magic numbers that go in the cells of the big array).
Do you think J/APL would be of any use to me and if so where in your similar projects’ life cycle does it crop up?
Possibly so. We do simple IO, primarily from Parquet or CSV data, and I am working on the Apache Arrow/Flight package currently. Our data sets typically fit in RAM, but as long as you either have enough RAM or a file amenable to memory mapping, you shouldn’t have a problem. J has a memory-mapping utility. Numerical data is straightforward. We do ETL, QC, and normalization in J. Avoid transposes where you can, but reshapes are trivial and done with metadata, so they are fast. Overall, what you describe sounds like a fun project to try in J.
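For comparison, the memory-mapping technique mentioned above looks like this in NumPy (an illustrative sketch only: the file name, dtype, and shape are mine, with the shape a heavily scaled-down stand-in for the 10x1000x100,000,000 arrays mentioned upthread):

```python
import numpy as np

# Create a file-backed array; only pages that are touched get loaded into RAM.
shape = (10, 100, 1000)  # scaled-down stand-in for 10 x 1000 x 100,000,000
a = np.lib.format.open_memmap("cells.npy", mode="w+",
                              dtype=np.float64, shape=shape)

a[:] = 1.0      # writes go through the page cache to disk
a[3] *= 2.5     # arithmetic on a slice touches only that slice's pages
a.flush()

# Reopen read-only; nothing is copied into memory up front.
b = np.load("cells.npy", mmap_mode="r")
print(b[3, 0, 0])  # 2.5
```

The same idea (operate on a file as if it were an in-memory array) is what J's mapped-file utility provides.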
I suspect the J package Jd is probably the most non-trivial public codebase. I don’t love the coding style (functions are long and scripted), and it doesn’t make use of the newer lambda syntax (“direct definitions”), which is easier to read. https://github.com/jsoftware/data_jd
I did some work comparing Jd to data.table and found that Jd was more performant in some cases, such as on derived columns, and approximately equally performant on aggregations and queries. Jd is currently single-threaded, whereas multiple threads matter for some types of queries. I tried a further comparison with JuliaDB at the same time (maybe a year ago) and found it was incorrectly benchmarked by its authors and far slower than both; that might be different now. Jd is more of an on-disk equivalent to data.table; ClickHouse is far better at being a large-scale database.
Rules of thumb on memory usage:
Python/Pandas (not memory-mapped): "In Pandas, the rule of thumb is needing 5x-10x the memory for the size of your data."
R (not memory-mapped): "A rough rule of thumb is that your RAM should be three times the size of your data set."
Jd: "In general, performance will be good if available ram is more than 2 times the space required by the cols typically used in a query."
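To make those rules of thumb concrete, here is a worked comparison for a hypothetical 10 GB working set (the dataset size is my own assumption, not from the thread; the Jd figure assumes the worst case where every column participates in a query):

```python
data_gb = 10  # hypothetical working-set size, in GB

pandas_ram = (5 * data_gb, 10 * data_gb)  # Pandas: 5x-10x rule
r_ram = 3 * data_gb                       # R: 3x rule
jd_ram = 2 * data_gb                      # Jd: 2x the cols used in a query
                                          # (here: all columns, worst case)

print(pandas_ram, r_ram, jd_ram)  # (50, 100) 30 20
```

So for the same data, the vendor guidance suggests roughly 50-100 GB for Pandas, 30 GB for R, and 20 GB (or less, if queries touch few columns) for Jd.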
Re CSV reading, Jd has a fast CSV reader whereas J itself does not. I have written an Arrow integration to enable J to get to that fast CSV reader and read Parquet.
The claim here is an 80% reduction in time, i.e. roughly 5x the throughput of the current standard.
Nice to see illumination of the benefit of high-level algorithmic design associated with the first-principles aspect of APL.
This is also a timely claim, given that there's an active conversation in the Jsoftware community right now on how to handle tree structures.
Okay. Learn how to install and link a BLAS version appropriate for your machine architecture. Do not assume the particular compiler you are currently using has a magical optimizer.
This will future-proof your code and make it more portable, not to mention instill confidence that your code is correct. Of course if you are just playing around with a toy project, this is probably not worth the effort.
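One cheap way to get the "confidence that your code is correct" part: spot-check the linked BLAS against a naive reference implementation (NumPy is assumed here as the BLAS front end; sizes and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 40))
B = rng.standard_normal((40, 30))

fast = A @ B  # dispatched to the linked BLAS (dgemm)

# Naive triple loop as ground truth.
slow = np.zeros((50, 30))
for i in range(50):
    for j in range(30):
        slow[i, j] = sum(A[i, k] * B[k, j] for k in range(40))

print(np.allclose(fast, slow))  # True
```

If this disagrees beyond floating-point tolerance, the BLAS linkage (or the build) is suspect, regardless of how fast it runs.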
A bunch of questions: Is multi-threading happening at the level of user-defined functions? How are threads scheduled? What's the underlying method or library for enabling multi-threading (Cilk, OpenMP, some LWT library...)? To what extent has this level of granularity been tested against other levels of nested parallelism (e.g. SIMD or otherwise parallel operators)? Have you tested performance by OS, and if so, have you noted any necessary OS-level modifications related to thread management? Is this part of a broader roadmap for accelerator integration?
1. Domains are the unit of parallelism. A domain is essentially an OS thread with a bunch of extra runtime book-keeping data. You can use Domain.spawn (https://github.com/ocaml-multicore/ocaml-multicore/blob/5.00...) to spawn off a new domain, which will run the supplied function and terminate when it finishes. This is heavyweight, though; domains are expected to be long-running.