Is `$` meaningful in verb trains? Because that was what I was referring to when I said no loops/recursion/flow control.
N.B. A sibling comment says J has added a direct-definition construct while I wasn't watching, which renders my comment largely irrelevant, although I feel the general point still holds: a lot of J 'example code' tends toward verb trains that are difficult for noobs to parse.
I cannot share specific projects. J is particularly good for numeric/algorithmic experimentation. It is interactive/REPL-based. Types and boilerplate are minimal. There is no compilation step and minimal package overhead. This is the language I reach for when step 2 is not going to be “install these packages” or deploying to share code. It is most remarkable for data hacking, as it is trivial to manipulate structures while maintaining performance. I use Python, R, and Clojure frequently, but the ability to move quickly in J is without parallel. Weaknesses include namespacing and deployment, although I have seen substantial codebases deployed on both desktops and servers. Multi-threading and AVX-512 instructions in _your_ (not package) code, straight from the REPL, are some of what you get with j904.
It sounds like we do similar sorts of things (based on the tool list), but each time I poke at an APL-like system I back away, clear on the learning curve and unclear on the value.
At core, my job is arithmetic on 3D arrays of approx 10x1000x100,000,000.
The rub is that for every LOC written manipulating those structures, I’ve got 100 LOCs doing IO (broadly defined) and then 1,000-10,000 doing some form of ETL, QC, or normalization (i.e. finding and validating the correctness of the magic numbers that go in the cells of the big array).
Do you think J/APL would be of any use to me and if so where in your similar projects’ life cycle does it crop up?
Possibly so. We do simple IO, primarily from Parquet or CSV data, and I am working on the Apache Arrow/Flight package currently. Our data sets typically fit in RAM, but as long as you either have enough RAM or a file amenable to memory mapping, you shouldn’t have a problem. J has a memory-mapping utility. Numerical data is straightforward. We do ETL, QC, and normalization in J. Avoid transposes where you can, but reshapes are trivial and done with metadata, so they are fast. Overall, what you describe sounds like a fun project to try in J.
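For comparison, the memory-mapping technique mentioned above looks like this in NumPy (an illustrative sketch only: the file name, dtype, and shape are mine, with the shape a heavily scaled-down stand-in for the 10x1000x100,000,000 arrays mentioned upthread):

```python
import numpy as np

# Create a file-backed array; only pages that are touched get loaded into RAM.
shape = (10, 100, 1000)  # scaled-down stand-in for 10 x 1000 x 100,000,000
a = np.lib.format.open_memmap("cells.npy", mode="w+",
                              dtype=np.float64, shape=shape)

a[:] = 1.0      # writes go through the page cache to disk
a[3] *= 2.5     # arithmetic on a slice touches only that slice's pages
a.flush()

# Reopen read-only; nothing is copied into memory up front.
b = np.load("cells.npy", mmap_mode="r")
print(b[3, 0, 0])  # 2.5
```

The same idea (operate on a file as if it were an in-memory array) is what J's mapped-file utility provides.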
I suspect the J package Jd is probably the most non-trivial public codebase. I don’t love the coding style (functions are long and scripted), and it doesn’t make use of the newer lambda syntax (“direct definitions”), which is easier to read. https://github.com/jsoftware/data_jd
I did some work comparing Jd to data.table and found that Jd was more performant in some cases, such as on derived columns, and approximately equally performant on aggregations and queries. Jd is currently single-threaded, whereas multiple threads matter for some types of queries. I tried a further comparison with JuliaDB at the same time (maybe a year ago) and found it was incorrectly benchmarked by its authors and far slower than both; that might be different now. Jd is more of an on-disk equivalent to data.table; ClickHouse is far better at being a large-scale database.
Rules of thumb on memory usage:
Python/Pandas (not memory-mapped): "In Pandas, the rule of thumb is needing 5x-10x the memory for the size of your data."
R (not memory-mapped): "A rough rule of thumb is that your RAM should be three times the size of your data set."
Jd: "In general, performance will be good if available ram is more than 2 times the space required by the cols typically used in a query."
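To make those rules of thumb concrete, here is a worked comparison for a hypothetical 10 GB working set (the dataset size is my own assumption, not from the thread; the Jd figure assumes the worst case where every column participates in a query):

```python
data_gb = 10  # hypothetical working-set size, in GB

pandas_ram = (5 * data_gb, 10 * data_gb)  # Pandas: 5x-10x rule
r_ram = 3 * data_gb                       # R: 3x rule
jd_ram = 2 * data_gb                      # Jd: 2x the cols used in a query
                                          # (here: all columns, worst case)

print(pandas_ram, r_ram, jd_ram)  # (50, 100) 30 20
```

So for the same data, the vendor guidance suggests roughly 50-100 GB for Pandas, 30 GB for R, and 20 GB (or less, if queries touch few columns) for Jd.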
Re CSV reading, Jd has a fast CSV reader whereas J itself does not. I have written an Arrow integration to enable J to get to that fast CSV reader and read Parquet.
The claim here is an 80% reduction in time, i.e. roughly 5x the throughput of the current standard.
Nice to see illumination of the benefit of high-level algorithmic design associated with the first-principles aspect of APL.
This is also a timely claim, given that there's an active conversation in the Jsoftware community right now on how to handle tree structures.
Okay. Learn how to install and link a BLAS version appropriate for your machine architecture. Do not assume the particular compiler you are currently using has a magical optimizer.
This will future-proof your code and make it more portable, not to mention instill confidence that your code is correct. Of course if you are just playing around with a toy project, this is probably not worth the effort.
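One cheap way to get the "confidence that your code is correct" part: spot-check the linked BLAS against a naive reference implementation (NumPy is assumed here as the BLAS front end; sizes and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 40))
B = rng.standard_normal((40, 30))

fast = A @ B  # dispatched to the linked BLAS (dgemm)

# Naive triple loop as ground truth.
slow = np.zeros((50, 30))
for i in range(50):
    for j in range(30):
        slow[i, j] = sum(A[i, k] * B[k, j] for k in range(40))

print(np.allclose(fast, slow))  # True
```

If this disagrees beyond floating-point tolerance, the BLAS linkage (or the build) is suspect, regardless of how fast it runs.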
A bunch of questions: Is multi-threading happening at the level of user-defined functions? How are threads scheduled? What's the underlying method or library for enabling multi-threading (Cilk, OpenMP, some LWT library...)? To what extent has this level of granularity been tested against other levels of nested parallelism (e.g. SIMD or otherwise parallel operators)? Have you tested performance by OS, and if so, have you noted any necessary OS-level modifications related to thread management? Is this part of a broader roadmap for accelerator integration?
1. Domains are the unit of parallelism. A domain is essentially an OS thread with a bunch of extra runtime book-keeping data. You can use Domain.spawn (https://github.com/ocaml-multicore/ocaml-multicore/blob/5.00...) to spawn off a new domain, which will run the supplied function and terminate when it finishes. This is heavyweight, though; domains are expected to be long-running.