Hacker News | awenger's comments

The data from the project is released to the public domain (CC0). The research article is also free to access.

See https://github.com/marbl/CHM13 and https://www.science.org/doi/10.1126/science.abj6987.


Complete here means the full end-to-end sequence of all chromosomes in a single human cell line named CHM13. The typical human cell has 46 chromosomes, in 23 pairs (one from our mother, one from our father) named chromosome 1, chromosome 2, and so on. This CHM13 cell line is special in that each of its pairs is (nearly) identical. Each chromosome is a long string of A, C, G, T nucleotides. So, this complete genome is a full set of 23 sequences without any "not sure" positions or "gaps" in the sequence.
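To make "no 'not sure' positions or gaps" concrete, here is a minimal Python sketch. It assumes the common convention that an unresolved base is written as N and a gap as a run of Ns. The two sequences are made-up toy strings, not real CHM13 data, and the function names are hypothetical.

```python
import re

def count_unresolved(seq: str) -> dict:
    """Count unresolved bases (N) and gap runs in one chromosome sequence."""
    seq = seq.upper()
    n_runs = re.findall(r"N+", seq)  # each run of Ns is treated as one gap
    return {
        "length": len(seq),
        "unresolved_bases": sum(len(run) for run in n_runs),
        "gap_runs": len(n_runs),
    }

def is_complete(seq: str) -> bool:
    """A sequence counts as complete here if it contains no N at all."""
    return count_unresolved(seq)["unresolved_bases"] == 0

# Toy examples (assumptions for illustration only)
older = "ACGTTGCA" + "N" * 10 + "GGCATTAC" + "N" * 3 + "TTAGGG"
finished = "ACGTTGCAGATTACAGATTACAGGCATTACCCATTAGGG"

print(count_unresolved(older))
print(is_complete(older), is_complete(finished))
```

In this picture, a complete genome is simply one where every chromosome passes the `is_complete` check.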

One common analogy is to consider the genome sequence (a.k.a. assembly) as a map. Since the initial publication of the human genome in the early 2000s, most regions of human DNA have been known in full resolution. Other portions, most prominently the repetitive centromeres that lie at the middle of chromosomes, have remained unmapped. It was known that they existed, approximately how big they were, and which types of sequences lay inside, but the full order of the sequence had never been determined for any human genome until this work.

You could consider the genome like the earth and the centromeres like a dense rainforest. Previously we had detailed maps of most of the earth, and we had mapped the boundaries of the rainforest and had satellite-level images (i.e. we knew they were full of plants). Now we have on-the-ground pictures with full detail.

Having a map of these sequences makes them accessible to study. One of the most valuable uses of the human genome is as a shared coordinate system used by scientists to compare different individuals and identify and name genetic variants that explain human traits. We lacked that coordinate system for a big chunk of the genome until now.

As you say, this paper reports the sequence of a single human cell line named CHM13. Each of us has a slightly different genome sequence (really two of them, one from each parent). Now when scientists sequence the genomes of more individuals, they can look at these regions that were previously ignored. Certainly understanding those regions will improve our understanding of human biology. Exactly how much will remain to be seen.


Well not quite: There is still a lot of ambiguity and compression in centromeres. But I agree that we are almost there.

> So, this complete genome is a full set of 23 sequences without any "not sure" positions or "gaps" in the sequence.


What's a cell line, and do we know anything about who CHM13 is?


CHM13 is from a "complete hydatidiform mole" (https://en.wikipedia.org/wiki/Molar_pregnancy). The paper says "Local ancestry analysis shows that most of the CHM13 genome is of European origin, including regions of Neanderthal introgression, with some predicted admixture", and Fig. 1 shows a cool breakdown of the regions of the genome with different ancestries.


Seems to be an immortalized (telomerase*-transformed) cell line from a female fetus with near-complete homozygosity (https://sites.google.com/ucsc.edu/t2tworkinggroup/chm13-cell...).

* Telomerase is a reverse transcriptase that allows cells to achieve replicative immortality (https://academic.oup.com/hmg/article/9/3/403/715108).


> The typical human cell has 46 chromosomes, in 23 pairs

Mitochondria have their own DNA, which is also sequenced.


Same here. Only after login though.


OP here. That's correct. This is the work of Survata, not Yahoo. I see how the title might suggest a connection, but that was not our intent.


Survata co-founder here. To clarify, Survata is not a voluntary response sample. Voluntary samples often have a bias because the individuals who choose to respond are those with strong feelings on a topic. For our surveys, the primary incentive is access to premium content - and not a desire to express one's opinion on a topic. We aim to have a respondent pool that truly represents the population.


Survata co-founder here.

Good point. We had the same thought and did consider running a survey variant with a fixed 6-month time frame. And we may just give it a try to see how the results are affected.

Even with the current wording, I find the SMS comparison useful. It demonstrates that people are willing to admit to sexting in the anonymous Survata survey format. I like your hypothesis about greater willingness to admit to "bad behavior" in the distant past than in the recent past. My intuition is that anonymity weakens that effect, but we'll have to measure to know for certain.


"I like your hypothesis about greater willingness to admit to "bad behavior" in the distant past than in the recent past. My intuition is that anonymity weakens that effect, but we'll have to measure to know for certain."

If you look at this report on the validity of self-reported drug use, it goes into the issue of how people are more likely to admit bad behavior that happened long ago. Anonymity probably does ameliorate the problem, but I'm guessing that it would still be significant.

http://archives.drugabuse.gov/pdf/monographs/monograph167/do...


Great link, thanks for posting it. Too few people cite the sources for their beliefs. In a related context, "Truth and consequences: using the bogus pipeline to examine sex differences in self-reported sexuality" discusses how "some of the sex differences in self-reports of sexuality are not due to actual sex differences in behavior, but rather to differences in reporting as a function of differential normative expectations for men and women": http://www.thefreelibrary.com/Truth+and+consequences%3a+usin...


That's one of the first things that we (I'm a Survata co-founder) noted in seeing the data too.

I see a few possible explanations: (1) A woman who sexts could have multiple sexting partners. In the extreme, you could have every man in the world sext with one woman, making the male sexting prevalence 100% and the female near 0. (2) While we defined sexting as "sending or receiving", some respondents may interpret the question as primarily about sending. There could be a gender bias in the sending vs receiving of sexts. (3) As you point out, the data is reported behavior and not observed behavior. Reported behavior often is a good proxy for observed behavior, but it is not perfect. And there are known to be effects where certain demographics answer questions dishonestly for conscious or unconscious reasons. Perhaps women are less willing to admit to sexting behavior.
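Explanation (1) is easy to check with a toy calculation. The sketch below uses made-up populations of 100 men and 100 women (an assumption for illustration, not the survey data) and reproduces the extreme case where every man sexts with the same woman.

```python
def prevalence(pairs, men, women):
    """Share of men and of women who appear in at least one sexting pair."""
    men_sexting = {m for m, _ in pairs}
    women_sexting = {w for _, w in pairs}
    return len(men_sexting) / len(men), len(women_sexting) / len(women)

# Hypothetical populations, not survey respondents
men = [f"m{i}" for i in range(100)]
women = [f"w{i}" for i in range(100)]

# Extreme case: every man sexts with the same single woman
pairs = [(m, "w0") for m in men]

male_rate, female_rate = prevalence(pairs, men, women)
print(male_rate, female_rate)
```

The number of pairs is the same from both sides, but prevalence counts distinct people, so the two rates can differ widely.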


(Survata co-founder here)

Survata has a DIY survey creation tool, but we review and suggest wording changes to avoid biased questions. We also advise on how to arrange (and randomize) answer choices to allow us to calculate and compensate for answer biases like always clicking the first or last option.
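As a rough illustration of why randomizing answer order helps, here is a minimal Python sketch. It is not Survata's actual method, which isn't described here. The function names and the simulated "always click the first option" respondents are assumptions for illustration.

```python
import random
from collections import Counter

def present(choices, rng):
    """Return the answer choices in a fresh random order for one respondent."""
    order = list(choices)
    rng.shuffle(order)
    return order

def position_counts(responses):
    """Tally picks by displayed position (0 = first option shown)."""
    return Counter(order.index(picked) for order, picked in responses)

rng = random.Random(0)  # seeded so the sketch is repeatable
choices = ["Never", "Sometimes", "Often"]

# Simulated respondents who always click whatever is shown first
responses = []
for _ in range(300):
    order = present(choices, rng)
    responses.append((order, order[0]))

by_position = position_counts(responses)
by_choice = Counter(picked for _, picked in responses)
print(by_position)  # all picks land on position 0
print(by_choice)    # spread across the actual answers by the shuffle
```

Because the order is shuffled per respondent, a "first option" habit shows up as a spike in the position tally rather than inflating any one answer.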

Responses are gathered on surveywalls across the web, where visitors answer short surveys in exchange for free access to premium content (e.g. ebook or video).


(Survata co-founder here)

Garry's explanation is a good one. The data for this survey was collected via surveywalls (example at [1]), which let visitors access premium content online for free in exchange for answering a few questions. All respondents here have US IP addresses and self-report age in the 13-25 year range. We generally see honesty rates of 90% or higher on questions for which we can verify the answer (e.g. "Which OS are you currently using?" or "Who is the President of the US?").

1. Example surveywall: http://www.hyperink.com/So-You-Want-To-Be-A-Programmer-b1559...


I've enjoyed the recent 5-hour ENERGY ad about how many doctors approve of their product. A whopping 73% of doctors recommend low calorie energy products... when measuring the percent of those who recommend energy products.

http://www.youtube.com/watch?v=RCqT3fdAAHQ

While there are abusive uses like push polls and leading questions, there are legitimate uses too.


That example is even worse; 73% of doctors recommend that if you must use an energy product, you should use a low-calorie one.


I always forget how sarcasm is lost online. I meant that as an example of a misleading, bad poll.

