You Draw It: Family Income vs. College Attendence

erikb · on June 20, 2015

Drawing an assumption before evaluating data is a really good approach. Ten points to Gryffindor for that one.

They said their guess differed from the data the same way mine did. But then they think the data is correct and they just throw out their guess. That's actually wrong. This point is very important in statistics, but you actually already learn it in 4th or 5th grade math classes. Make a guess about the end result and if your calculated result is quite different be sceptical about your calculated result. Our ability to guess is not good at getting the exact numbers right, but it's really, really good at getting the big picture. Therefore when data and guess disagree there's a really good chance the data is wrong and we must question it and not just our guess (which we can also question, but questioning the data is more important).

PS: It's quite funny how often we consume lots of data and calculate a lot and still the right answer to the question is: "Dunno yet"

At least for me that's what I learned in the last few years on HN.

dnautics · on June 20, 2015

The analogy doesn't hold. The point of the NYT exercise is to challenge our perceptions/biases, not challenge calculations.

Yes, in science, say, you should make a guess about the end result, based on something, like first principles, direct observation, etc. Then, if the calculated result diverges, check both.

In this case though there is no reason for the general public to have had a good picture of the appropriate curve; it's a lot of hearsay and ideology which biases what we think it should look like, and it is right to trust that less than the actual data.

avereveard · on June 20, 2015

this seems more like a very clever way to collect data about perceived inequality and could be applied to many more distributions

sitkack · on June 20, 2015

I'd love to see the geo-spatial distribution of guesses.

fauigerzigerk · on June 20, 2015

Isn't what you learn in 4th or 5th grade math classes that you should be sceptical about your own calculations if they diverge a lot from your estimate? When it comes to empirical data, I think the only sensible approach is to remain equally sceptical about both your guess and the way the data was compiled or modelled.

But after a few rounds of scepticism, data analysis and model critique you should get to a point where you trust the data more than your own guess. Otherwise we could just as well stop doing science at all.

TheSpiceIsLife · on June 20, 2015

Science theater is the practice of investing in research intended to provide the feeling of improved understanding while doing little or nothing to actually achieve it.

Or the start-up equivalent 'innovation theatre'.

fauigerzigerk · on June 20, 2015

What's your point?

pygy_ · on June 20, 2015

FWIW, my guess was at the 98th percentile, relatively close to the data (on the other hand, I was really bad at guessing the position of the ball in yesterday's football/soccer test).

The median guess probably suffers from a "connect the dots" bias, going from (0,0) to (100,100) through the suggested point.

Seeing how smooth the data is, that it results from a large sample, and that the measurements are objective, I fail to see how they could have gotten them wrong.

_wjtv · on June 20, 2015

> (on the other hand, I was really bad at guessing the position of the ball in yesterday's football/soccer test)

Sounds interesting, but I seemed to have both missed that and am not noticing anything like it. Got a link?

beambot · on June 20, 2015

GP is being hyperbolic.

He's saying that, "Just because you can't guess the trend in the data (eg. the track of a soccer ball in yesterday's match) doesn't mean that something is wrong with the data; you often need to double-check both your hypothesis and your data. And in this case your hypothesis was probably at fault due to inherent bias rather than inaccuracies in the data."

maxerickson · on June 20, 2015

No, literal:

http://projects.nytimes.com/interactive/sports/worldcup/spot...

beambot · on June 20, 2015

HA! LoL. I stand corrected. ;)

pygy_ · on June 20, 2015

See here: https://news.ycombinator.com/item?id=9747102

cactusface · on June 20, 2015

Well, one thing they glossed over is the cost of tuition at the colleges in question varies. The poorest families are more likely to send their children to cheaper colleges. You might get an S curve if it was for colleges with $50K tuition, or whatever.

jschwartzi · on June 20, 2015

I agree but I would say it's not the poorest families that choose cheap colleges but lower middle-class families. People like teachers, office workers, et al. who can't afford every college or who have to take out loans to send their kids to college are going to be price-sensitive. The poorest people generally make so little that they qualify for great financial aid without having to do academic scholarships.

ma2rten · on June 20, 2015

That was actually mentioned in the text though.

cactusface · on June 20, 2015

Ooops...

technomancy · on June 20, 2015

I wonder what the graph would look like if it was parents' income as an absolute value vs as a percentile.

leereeves · on June 20, 2015

It would look like a cliff, as most of the population would be packed on the left.

kens · on June 20, 2015

Out of curiosity I drew the US wealth distribution on a linear axis a while ago, and it's really insanely packed to the left. As in everyone but a couple thousand people are in a single pixel column on the left with less than $100 million. Half the population are within a wavelength of light's distance of 0 on a linear wealth scale. The point is that wealth is nothing like a normal distribution - the very rich are so rich that even millions of dollars is nothing in comparison.

http://www.righto.com/2013/03/wealth-distribution-in-united-...
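To make the "packed on the left" effect concrete, here is a minimal sketch. It does not use real wealth or income data: it assumes a Pareto-shaped distribution with a made-up tail index (`ALPHA`), a synthetic population of `N` people, and a chart that is `BINS` pixel columns wide. All three names are illustrative assumptions, not anything from the linked post.

```python
# Illustrative only: a Pareto shape with a hypothetical tail index, not real data.
ALPHA = 1.16   # assumed tail index (hypothetical)
N = 100_000    # number of synthetic people
BINS = 1_000   # pixel columns on a linear axis running from 0 to the maximum


def pareto_quantile(p: float, alpha: float = ALPHA) -> float:
    """Inverse CDF of a Pareto distribution whose minimum value is 1."""
    return (1.0 - p) ** (-1.0 / alpha)


def share_in_first_column(n: int = N, bins: int = BINS) -> float:
    """Fraction of people who land in the leftmost column of a linear axis."""
    # Evenly spaced quantiles give a deterministic synthetic population.
    values = [pareto_quantile((i + 0.5) / n) for i in range(n)]
    column_width = max(values) / bins
    return sum(v < column_width for v in values) / n


if __name__ == "__main__":
    print(f"share in leftmost column: {share_in_first_column():.1%}")
```

On a percentile axis, by contrast, every column holds the same share of people by construction, which is why the NYT chart does not look like a cliff.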

tripzilch · on June 23, 2015

A few years ago, I made this histogram for a friend (I'm not American myself), who was interested in the modal income for the US, instead of the median income. The modal income is the most likely income you'd get if you picked a US citizen at random, which can be seen as the location of the peak in this histogram:

http://i.imgur.com/euPugla.png

Note that the data is from 2008, and only counts employed citizens.

I compiled the data from an official US government data/demographics/census website (I forget which one, sorry). I noticed I could query the average income for all counties split over many "occupation groups". This gave me relatively fine-grained buckets (it doesn't really matter what they were, just that they were small-ish), weighted by the number of people in it, allowing me to plot the histogram. The really proper way to build this histogram would be to bucket an actual list of income for each individual US citizen. But that list is not available, for obvious reasons.
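A rough sketch of the bucketing described above, assuming each group is reduced to an (average income, head count) pair. The `GROUPS` numbers are invented for illustration and are not census figures; `weighted_histogram` and `modal_bin` are hypothetical helper names.

```python
from collections import defaultdict

# Hypothetical (average_income, head_count) pairs, one per county/occupation group.
# These numbers are made up for illustration; they are not census figures.
GROUPS = [
    (18_000, 400),
    (22_000, 900),
    (27_000, 1_500),
    (31_000, 1_200),
    (48_000, 700),
    (95_000, 150),
    (240_000, 20),
]


def weighted_histogram(groups, bin_width):
    """Sum head counts into fixed-width income bins, keyed by each bin's lower edge."""
    counts = defaultdict(int)
    for income, people in groups:
        lower_edge = (income // bin_width) * bin_width
        counts[lower_edge] += people
    return dict(counts)


def modal_bin(histogram):
    """Return the lower edge of the bin holding the most people (the peak)."""
    return max(histogram, key=histogram.get)


if __name__ == "__main__":
    hist = weighted_histogram(GROUPS, bin_width=10_000)
    print("modal bin starts at", modal_bin(hist))
```

Because each group contributes only its average, the spread inside a group is lost, which is the limitation the comment itself points out.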

6502nerdface · on June 20, 2015

Yeah you'd have to use log or sqrt of income.

erikb · on June 20, 2015

You're right, it might also be about representation! Have completely forgotten about that one.

EGreg · on June 20, 2015

Actually aren't you supposed to adjust your beliefs according to Bayes' theorem? That's what the original Bayes used it for.
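For readers who want the mechanics, a minimal sketch of a single Bayesian update for a yes/no hypothesis (say, "the relationship is linear"). The prior and the two likelihoods in the usage example are made-up numbers chosen only to show the arithmetic; `bayes_update` is a hypothetical helper name.

```python
def bayes_update(prior: float, p_evidence_if_true: float, p_evidence_if_false: float) -> float:
    """Posterior P(H | E) from Bayes' theorem for a binary hypothesis H."""
    numerator = p_evidence_if_true * prior
    evidence = numerator + p_evidence_if_false * (1.0 - prior)
    return numerator / evidence


if __name__ == "__main__":
    # Hypothetical numbers: a weak prior that the curve is linear, and data that
    # would very likely look linear if it is, and rarely if it is not.
    belief = 0.2
    for round_number in range(1, 4):
        belief = bayes_update(belief, p_evidence_if_true=0.9, p_evidence_if_false=0.1)
        print(f"after check {round_number}: {belief:.3f}")
```

Under these assumptions neither the guess nor the data is thrown away outright: the guess sets the prior, and each independent check of the data moves the belief further.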

cookingrobot · on June 20, 2015

Totally agree. My guess about the curve was wrong the same way as yours and the authors'.

If that data's real then I want to know why it's tied to income and not something like wealth.

One wild theory.. quotas are based on income somehow, and that impressively straight line is because admissions officers are really good at their jobs.

eitally · on June 20, 2015

Well, for one (and I'm not saying this is the explanation, just an answer to your question), FAFSA student aid eligibility is based on income, not wealth.

jtuente · on June 20, 2015

Actually wealth is another factor for FAFSA, it's just so seldom that a family has wealth but not income. Things like college savings accounts are actually factored against FAFSA applicants.

carrotleads · on June 20, 2015

If your guess and data differ, wouldn't it be advisable to double check your data, collection mechanisms and your assumptions that led to your guess/estimate?

After a few goes your data needs to take precedence and one's guess dumped.

learnstats2 · on June 20, 2015

I learned also to check the original source: the data there doesn't seem to fit such a perfect straight line and does show a slight S curling at the edges.

It just shows an S curved the other way to what I drew.

erikb · on June 22, 2015

How did you draw an S curve the other way? I thought there is only one way, a little extreme: horizontal line, diagonal increase, horizontal line.

kabouseng · on June 20, 2015

Sometimes your intuition is wrong, and you can follow the wrong path for a very long time because the reality doesn't fit your intuition (the history of science is littered with examples).

But yes, the scientific method is: make a hypothesis, collect data, test your hypothesis, revise, do it again.

kristopolous · on June 20, 2015

The level of variations in the text of the article depending on the drawing is a nice touch: http://imgur.com/a/pbmfc (didn't get anything special for drawing a bell curve BTW)

inerte · on June 20, 2015

Nice. It is also possible to get a "trust-fund dip" if you move the line downward near the end. You get a text with something like "you thought really rich people don't feel compelled to go to college".

jimmaswell · on June 20, 2015

I did that and didn't get that message. Wasn't a very extreme dip though

jshap70 · on June 20, 2015

The real scientist's approach right here

Sealy · on June 20, 2015

I really like the way the article gets its readers to interact by drawing the line and giving tailored feedback. It makes the article way more memorable and engaging.

This is EDUCATION done right.... Quite fitting really as the article is about Uni and education in itself!

This post definitely gets my up-vote.

rileyriley · on June 20, 2015

I think it's an amazing achievement and definitely worth studying. The nytimes have been doing a great job with visualizations and engagements like this - not in a spurious "hey click the flashing light" way but in a "think carefully, and now here's this article in terms of what you think" way.

Incredible! If only text books worked like this.

Desmos is trying to do this for math lessons: https://teacher.desmos.com/

blfr · on June 20, 2015

Previously https://news.ycombinator.com/item?id=9618827 (95 comments)

bildung · on June 20, 2015

I really like the idea of having the readers draw their assumption out. A great approach to creating conscious engagement with a complex topic.

The scales in this particular example are problematic, though. By using a logarithmic scale for the x dimension, the non-linear relationship between income and college attendance is hidden to the majority of readers. Logarithmic scales are hard to grasp for most people not working with numbers all day. Having percentages on both axes but only one of them being linear further obfuscates the variable relationship.

Veedrac · on June 20, 2015

Am I missing something or are they not both linear?

jacalata · on June 20, 2015

It's income rank, rather than actual income. In my explanatory text they mention that the difference between two points on the far left is a few hundred dollars and the difference between two points on the right is a million or so (I forget the actual numbers).

bildung · on June 20, 2015

As jacalata described, the actual income differences in dollars are not distributed linearly across the x axis. Each percentile interval represents a different dollar interval.

The y axis uses percentages, too. But it displays an absolute number of people, so the number of people in the 10-20% bracket equals those in the 80-90% bracket and so on.

The actual number of children won't be the same for each income bracket, but they are evenly distributed on the y axis. I think :)

sireat · on June 20, 2015

I got the starting and end points reasonably close, but I too drew an S line since I did not think that reality could be so linear.

Something so linear just makes me question whether there is some trickery going on. You learn early on that data is never so easily fitted.

galfarragem · on June 20, 2015

Interesting article but mostly because of the form not the conclusion - as mentioned in other comments, most people will guess the big picture. What's really impressive IMHO is that people in NY times are consistently killing it in mainstream interactive journalism.

nicklovescode · on June 20, 2015

Would be a good weekend project to make a generalized form of this (enter your own data, it constructs a page). Maybe I'll try it if I go to a hackathon sometime later this year, but someone should beat me to it!

carlob · on June 20, 2015

I would really like to see this with actual income on the x axis rather than just the percentile. And then again with the log of income.

cocoflunchy · on June 20, 2015

See http://unside.t4you.in/data/intuitive-axes/

Hyvel · on June 20, 2015

The link to the study comparing income and graduation was broken.

Here it is : http://www.nber.org/papers/w17633

The female advantage graphs are truly interesting.

todd8 · on June 20, 2015

This sentence from the paper sums up to me a remarkable trend:

> For the most recent cohorts, the four-year college graduation rate for women (32 percent) is ten points higher than the comparable rate for males (22 percent).

This means that women were completing college (I believe in the year 2006) at roughly 1.5 times the rate that men did (32/22 ≈ 1.45). That's a huge difference, and from the charts it appears that the difference has been increasing for decades without signs of leveling off.

relicscattergun · on June 20, 2015

Hasn't this been posted on HN before...like, about a week or so ago?

GnarfGnarf · on June 20, 2015

In college they would say "Attendance".

mattmaroon · on June 20, 2015

Am I the only one who constantly has nytimes website flip to another page due to scrolling? It's every time I read an article.

jimmaswell · on June 20, 2015

Why do the richest bother going to college if they can just live very well off interest and investments?

rifung · on June 20, 2015

I have many friends who are so wealthy they could easily live off their family's money for several generations. There are some who don't want to depend on their parents and actually make something of themselves. Of course there are others who go because their parents tell them to go. It seems to be looked down upon to just take your parents' money and do nothing with your life, or at least that's how it was from where I grew up (southern California).

matthiasl · on June 20, 2015

It's a good place to meet people.

It's interesting.

It's fun.

jimmaswell · on June 20, 2015

I imagine you could get all that without having an obligation of coursework hanging over you, but I guess at that point that doesn't matter much either

jmaygarden · on June 20, 2015

It's what all their friends are doing.

logicallee · on June 20, 2015

Very interesting. I was extremely accurate, but the 50th percentile point helped tremendously.

I started drawing from the left, with the approach given below, however by the time I was at the 50th percentile mark I was nowhere near 50% so I moved those up a bit. Around 40th to 60th percentile is when I would have done the worst without the aid of the middle point. (Which brought my graph up.)

My graph: http://imgur.com/a/hjvdf

- You drew a more accurate picture of reality than about 98 percent of people who have tried so far.

- Your line was relatively straight, reflecting one of the more striking findings of this research: The relationship between college enrollment and parental-income rank is linear.

- Your guess was extremely accurate. Is that you, Raj Chetty?

This was my methodology for drawing "extremely accurately":

Starting at left (where I started drawing) I reasoned, primary and secondary education is free and mandatory in the United States, and there are huge scholarship and other support programs. So while the poorest of the poor have everything going against them in terms of family support and even culturally, plus likely the pressure of starting to work early, still, I reasoned at least one out of five can make use of the opportunities and begin attending college.

I then intended to proceed up steadily, but I intended to level off somewhat between the 40th and 70th percentile, because a lot of average-income people simply start working. As you can see, the free point moved my graph up (via adjustment by me when I saw that I was still short of it at that point) heavily.

I had then intended to proceed up linearly to a very high rate of college attendance, and after a certain income level (say, top 5%-3%) I intended to be around 100%. I thought basically 100% of the top of the top attended college, but for me being in the top 5% of income would have assured that. It's not like measuring "graduate degree" or something else. A college degree is quite standard for children of the top incomes. I also thought it would stop levelling up because some people from extremely rich families produce lazy children by spoiling them. If your child doesn't go to college when you're making $150,000, they're not going to go to college when you're making $200,000. If they're not going to go when you're pulling $1M per year, they certainly aren't when you're pulling $5M per year. (in fact might be slightly less likely to.)

As you can see from the rest of the album - http://imgur.com/a/hjvdf - I did quite well. But these effects aren't present at all.

Some of the poor attended exactly as I predicted, and this rose immediately with income. The effect I predicted that did NOT appear is that in the middle class, there is a firm distinction between being at the 40th or 60th percentile - it is still linear. I would have thought it didn't matter. If you look here - http://blogs.marketwatch.com/encore/2014/10/02/incomes-are-m... - sadly they only show a few select percentiles, but let's go around the $51,939 level at 50%. That is let's say $3,100 per month take-home pay, give or take. I wouldn't have expected there to be a huge decision on whether YOU will attend college, depending on whether YOUR FAMILY is earning $2,800 per month or $3,500 per month. Essentially, I think this wouldn't figure into your decision at all, period. Also, I think that being at the 40th to 60th percentile does NOT mark a shift in cultural status (what part of the middle class you're in), and also is incredibly fluid. A family's income could easily shift by this amount from one year to the next - what, would their child not attend college in 2003 but would attend in 2005, because they're taking home $3,500 per month instead of $2,800? Maybe to some small extent, but certainly not linearly.

So I expected a smooth or levelled-off part in the middle percentiles. This didn't happen, but the supplied point helped me avoid it - I actively adjusted my graph due to it.

I wonder what the reasoning was behind giving people a supplied point. I guess they wanted to know the shape people guessed, rather than the levels people guessed? Or set people's expectations, so that they don't have wild expectations about the percentage of the population that attends college?

alkonaut · on June 20, 2015

> I wonder what the reasoning was behind giving people a supplied point. I guess they wanted to know the shape people guessed, rather than the levels people guessed? Or set people's expectations, so that they don't have wild expectations about the percentage of the population that attends college?

Exactly. They essentially fix the "m" in the y = k*x + m equation, and let people guess only k. That is, you factor out a lot of the guesswork about how many actually attend higher education and let the question focus on how it varies with income. When you ask people for a guess it's much easier to average a scalar value than a tuple. Clever, if you ask me.
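A small sketch of the "one free parameter" idea: if every drawing is treated as a straight line forced through the supplied point, the only thing left to estimate is the slope k (the intercept then follows as m = y0 - k*x0). The anchor coordinates and the sample drawings below are hypothetical, and treating a drawing as a least-squares line is an assumption of this sketch, not something the NYT describes.

```python
from statistics import mean

# Hypothetical supplied point: (income percentile, percent attending college).
ANCHOR = (50.0, 60.0)


def slope_through_anchor(points, anchor=ANCHOR):
    """Least-squares slope k of the line y = y0 + k * (x - x0) through the anchor."""
    x0, y0 = anchor
    numerator = sum((x - x0) * (y - y0) for x, y in points)
    denominator = sum((x - x0) ** 2 for x, _ in points)
    if denominator == 0:
        raise ValueError("need at least one point away from the anchor's x value")
    return numerator / denominator


def average_slope(drawings, anchor=ANCHOR):
    """Average the single scalar k over many readers' drawings."""
    return mean(slope_through_anchor(points, anchor) for points in drawings)


if __name__ == "__main__":
    # Three made-up drawings, each a few sampled (x, y) points.
    drawings = [
        [(0, 25), (100, 95)],
        [(0, 10), (25, 40), (100, 100)],
        [(10, 30), (90, 85)],
    ]
    print(f"average guessed slope: {average_slope(drawings):.2f}")
```

Averaging one number per reader is simple; without the anchor each reader would contribute a (k, m) pair and the level and the shape of the guesses would be tangled together.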

logicallee · on June 20, 2015

Yes, but anyone who has information about their immediate percentile-neighbors (but nobody else) now has 2 data points (their own income-part of society and the supplied data point) and so unless they're around the 50th percentile themselves you have supplied them with a ton of information. I think it would have been far more instructive not to include that data point: people's guesses also would have been more revealing (the last, aggregate-guesses graph on the results page). (Especially since we can have a good idea about who the guessers are - readers of the NY Times who choose to complete this specific graph exercise.)

blumkvist · on June 20, 2015

    You drew a more accurate picture of reality than about 92 percent of people who have tried so far.
    You correctly guessed that children from the very poorest families face tough odds in going to college – only about one in four do.
    You underestimated the chances of college enrollment for the very richest children. In reality, about 94 percent of children from America’s richest families go to college. (You guessed around 77 percent.)

tezza · on June 20, 2015

One part Gimmick, two parts Chart Junk with a large dash of Navel Gazing and a squeeze of Self Re-inforcement.

This is almost the Comfortable Middle Class version of the Find-your-way-through-a-Maze puzzles that you find on paper placemats in roadside Burger Restaurants.