
The thinking behind attaching a PDF with colors and not a Genbank file is why we can't have nice things in biotechnology.


Wait, you mean you don't extract genomic data from Excel? The MARCH1 gene brings many interesting surprises.


Excel finally has a facility for manipulating data that keeps it where you put it. It also incorporates a fairly decent functional programming language. It's called Power Query, not to be confused with all the other things that MS has named starting with "Power", which have no relationship to it at all and are mostly awful.

The only real annoyance I have with it is that the editor window is modal (it blocks all the spreadsheets you have open on your machine), and it's primitive even compared to VBA, especially for debugging.

It's not just that it's given me the experience of "this is the way a spreadsheet or BI tool should work" but also "this is the way SQL should work". It's a little cumbersome to do the standard SQL-type operations, but the clean integration of functions means you can implement anything that's missing. Like say, Oracle has grouping sets - you can, and I did, just write a function to do that. I always felt that having a separate procedural language in your database was wrong, but I'd never seen the alternative until now. And I've been falling in love with higher order functions.
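To make the grouping sets point concrete, here is a rough sketch of the same idea in Python rather than Power Query's own language. The function name `grouping_sets`, its parameters, and the sample rows are all made up for illustration; this is not the function described above, just one way a higher-order aggregation helper could look.

```python
from collections import defaultdict

def grouping_sets(rows, sets, value_key, agg=sum):
    """Aggregate rows once per grouping set, similar in spirit to SQL GROUPING SETS.

    `agg` is any function that reduces a list of values (sum, max, len, ...).
    """
    results = []
    for keys in sets:
        buckets = defaultdict(list)
        for row in rows:
            # The bucket key is the tuple of this grouping set's column values.
            buckets[tuple(row[k] for k in keys)].append(row[value_key])
        for group, values in buckets.items():
            results.append({"group": dict(zip(keys, group)), "value": agg(values)})
    return results

# Hypothetical sample data.
rows = [
    {"region": "EU", "product": "A", "sales": 10},
    {"region": "EU", "product": "B", "sales": 5},
    {"region": "US", "product": "A", "sales": 7},
]

# Group by region, by product, and by nothing (the grand total).
totals = grouping_sets(rows, [("region",), ("product",), ()], "sales")
```

Passing `agg=max` or `agg=len` instead of the default `sum` changes the aggregate without touching the grouping logic, which is the appeal of the higher-order approach.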


Power Query is one of the best things to be added to Excel in recent years. I especially like how it makes import and cleanup steps easier to reproduce compared with the old ways.


I am fond of September 2, myself.

For those not in the know:

https://genomebiology.biomedcentral.com/articles/10.1186/s13...


Now SEPTIN2! (and MARCHF1)
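If you want to check a gene list before it goes anywhere near a spreadsheet, something like the following might help. It is only a heuristic sketch: the regex is my own guess at "month-like prefix plus a small number", not Excel's actual conversion rule, and `looks_like_date` is a hypothetical helper name.

```python
import re

# Assumed heuristic: a month name or abbreviation, an optional hyphen,
# then a one- or two-digit number (e.g. MARCH1, SEPT2, DEC1).
DATE_LIKE = re.compile(
    r"^(JAN|FEB|MAR|MARCH|APR|MAY|JUN|JUL|AUG|SEP|SEPT|OCT|NOV|DEC)-?(\d{1,2})$",
    re.IGNORECASE,
)

def looks_like_date(symbol: str) -> bool:
    """Return True if a gene symbol resembles something a spreadsheet may read as a date."""
    return DATE_LIKE.match(symbol.strip()) is not None

symbols = ["MARCH1", "SEPT2", "MARCHF1", "SEPTIN2", "TP53"]
risky = [s for s in symbols if looks_like_date(s)]
```

With this pattern the old symbols are flagged and the renamed ones are not.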


Exactly. FAIR (Findable, Accessible, Interoperable and Reusable) principles are at a loss here [1]. The "Reusable" part seems to be especially problematic, as the sequence is buried in a PDF file, though all aspects of FAIR are compromised here. Edit: It looks like there is now a PR to address this issue [2]

[1] https://www.nature.com/articles/sdata201618

[2] https://github.com/NAalytics/Assemblies-of-putative-SARS-CoV...


Things are getting better, but it's still so, so bad. The funny thing about that Nature article is that I recently had to parse an HTML table from a recent Nature article. Thank god pd.read_html did a decent job, and I then only needed another hour to hunt down all the typos and weird text issues.



If there is no annotation or metadata, FASTA format is usually preferred ;)
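For anyone who hasn't seen it, a FASTA record is just a `>` header line followed by the sequence, conventionally wrapped at a fixed width. A minimal sketch of writing one in Python, where the helper name `to_fasta`, the 60-column width, and the sample sequence are illustrative choices rather than anything from the linked repository:

```python
def to_fasta(name: str, sequence: str, width: int = 60) -> str:
    """Format a single sequence as a FASTA record with wrapped lines."""
    # Drop any whitespace (spaces, newlines) picked up when copying the sequence.
    seq = "".join(sequence.split())
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return f">{name}\n" + "\n".join(lines) + "\n"

# Hypothetical 80-base sequence, just to show the wrapping.
record = to_fasta("example_seq", "acgt" * 20)
```

The result is plain text, so it diffs cleanly in version control and can be read by pretty much any sequence tool.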



Do you have a list of all popular formats?


My thoughts exactly!

Somewhere, Margaret O. Dayhoff is weeping.



