The thinking behind attaching a PDF with colors and not a Genbank file is why we...

rubatuga · on March 29, 2021

Wait, you mean you don't extract genomic data from Excel? The MARCH1 gene brings many interesting surprises.

perl4ever · on March 30, 2021

Excel finally has a facility for manipulating data that keeps it where you put it. It also incorporates a fairly decent functional programming language. It's called Power Query, not to be confused with all the other things MS has named starting with "Power", which have no relationship to it at all and are mostly awful.

The only real annoyance I have with it is that the editor window is modal, like it blocks all the spreadsheets you have open on your machine, and it's primitive even compared to VBA, especially for debugging.

It's not just that it's given me the experience of "this is the way a spreadsheet or BI tool should work" but also "this is the way SQL should work". It's a little cumbersome to do the standard SQL-type operations, but the clean integration of functions means you can implement anything that's missing. Like say, Oracle has grouping sets - you can, and I did, just write a function to do that. I always felt that having a separate procedural language in your database was wrong, but I'd never seen the alternative until now. And I've been falling in love with higher order functions.
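To make the grouping sets point concrete, here is a minimal sketch of the same idea in Python rather than Power Query's M language. It is not the function I wrote; the names `group_by` and `grouping_sets` and the column names are made up for illustration. The aggregate is passed in as a function, which is where the higher-order part comes in.

```python
from collections import defaultdict


def group_by(rows, keys, agg):
    """Group rows (dicts) by the given key columns and aggregate each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row)
    return [
        {**dict(zip(keys, key_vals)), "value": agg(members)}
        for key_vals, members in groups.items()
    ]


def grouping_sets(rows, sets, agg):
    """Run one group-by per key set and stack the results.

    Columns that are not part of a given key set are filled with None,
    mirroring the NULLs SQL produces for GROUPING SETS.
    """
    all_keys = []
    for keys in sets:
        for k in keys:
            if k not in all_keys:
                all_keys.append(k)
    out = []
    for keys in sets:
        for grouped in group_by(rows, keys, agg):
            out.append({**{k: grouped.get(k) for k in all_keys},
                        "value": grouped["value"]})
    return out


# Hypothetical sample data, only for illustration.
sales = [
    {"region": "EU", "product": "a", "amount": 1},
    {"region": "EU", "product": "b", "amount": 2},
    {"region": "US", "product": "a", "amount": 4},
]
total = lambda members: sum(r["amount"] for r in members)
report = grouping_sets(sales, [("region", "product"), ("region",), ()], total)
```

The empty key set `()` gives the grand total row, and swapping `total` for any other function over a list of rows changes the aggregate without touching the grouping code.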

andylynch · on March 30, 2021

Power Query is one of the best things to be added to Excel in recent years. I especially like how it makes imports and cleanups easier to reproduce vs. the old ways.

chromatin · on March 29, 2021

I am fond of September 2, myself.

For those not in the know:

https://genomebiology.biomedcentral.com/articles/10.1186/s13...

mmmrtl · on March 29, 2021

Now SEPTIN2! (and MARCHF1)
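For anyone who wants to check a gene list before it goes anywhere near a spreadsheet, a rough sketch in Python. The regex is only my approximation of what looks date-like, not Excel's actual parsing rules, and the rename table holds just the two symbols mentioned in this thread; `looks_like_date` and `safe_symbol` are made-up names.

```python
import re

# Assumption: a month name or abbreviation followed by a 1-2 digit number
# is the kind of symbol a spreadsheet may silently turn into a date.
DATE_LIKE = re.compile(
    r"^(JAN|FEB|MAR|MARCH|APR|MAY|JUN|JUL|AUG|SEP|SEPT|OCT|NOV|DEC)-?(\d{1,2})$",
    re.IGNORECASE,
)

# Only the renames mentioned in this thread; a real list would be longer.
RENAMED = {"SEPT2": "SEPTIN2", "MARCH1": "MARCHF1"}


def looks_like_date(symbol):
    """Return True if the symbol matches the date-like pattern above."""
    return DATE_LIKE.match(symbol) is not None


def safe_symbol(symbol):
    """Map an old symbol to its current name if we know one, else return it as-is."""
    return RENAMED.get(symbol.upper(), symbol)


for sym in ["SEPT2", "MARCH1", "SEPTIN2", "MARCHF1"]:
    print(sym, looks_like_date(sym), safe_symbol(sym))
```

The new names no longer match the pattern, which is the whole point of the rename.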

julienchastang · on March 29, 2021

Exactly. FAIR (Findable, Accessible, Interoperable, and Reusable) principles are at a loss here [1]. The "Reusable" part seems especially problematic since the sequence is buried in a PDF file, though all aspects of FAIR are compromised here. Edit: It looks like there is now a PR to address this issue [2]

[1] https://www.nature.com/articles/sdata201618

[2] https://github.com/NAalytics/Assemblies-of-putative-SARS-CoV...

ImaCake · on March 30, 2021

Things are getting better, but it's still so, so bad. The funny thing about that Nature article is that I recently had to parse an HTML table from a recent Nature article. Thank god pd.read_html did a decent job, and I then only needed another hour to hunt down all the typos and weird text issues.
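Roughly the kind of thing involved, as a stdlib-only sketch rather than the pandas call I actually used: pull the cells out of a table and normalize the text on the way. `TableParser`, `clean` and `parse_table` are made-up names, it assumes a single well-formed table, and NFKC plus whitespace collapsing only catches the mechanical problems (non-breaking spaces, stray line breaks), not real typos.

```python
import unicodedata
from html.parser import HTMLParser


def clean(text):
    """Fold compatibility characters (e.g. non-breaking spaces) and collapse whitespace."""
    return " ".join(unicodedata.normalize("NFKC", text).split())


class TableParser(HTMLParser):
    """Collect the text of <td>/<th> cells, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside <tr>
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append(clean("".join(self._cell)))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_table(html):
    """Return the table as a list of rows, each a list of cleaned cell strings."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows
```

The remaining hour of hunting down typos still has to be done by eye.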

brian_herman · on March 29, 2021

Here you go! https://github.com/brianherman/Assemblies-of-putative-SARS-C...

shellfishgene · on March 30, 2021

If there is no annotation or metadata, FASTA format is usually preferred ;)
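In case the format is unfamiliar: a FASTA record is a `>` header line followed by the sequence, usually wrapped at a fixed width. A minimal sketch of writing and reading it in Python; `to_fasta` and `parse_fasta` are made-up names, the 60-column width is just a common convention, and the record names are placeholders.

```python
def to_fasta(records, width=60):
    """Format (name, sequence) pairs as FASTA text, wrapping sequences at `width`."""
    lines = []
    for name, seq in records:
        lines.append(f">{name}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


def parse_fasta(text):
    """Parse FASTA text back into a list of (name, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Placeholder records, only for illustration.
example = [("contig_1", "ACGT" * 20), ("contig_2", "TTGCA")]
print(to_fasta(example), end="")
```

For anything beyond a quick conversion, an established parser is the safer choice; this only shows how little there is to the format.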

brian_herman · on March 30, 2021

Didn't know that, TYVM! https://raw.githubusercontent.com/brianherman/Assemblies-of-...

brian_herman · on March 30, 2021

Do you have a list of all popular formats?

flobosg · on March 29, 2021

My thoughts exactly!

Somewhere, Margaret O. Dayhoff is weeping.