You Draw It: Family Income vs. College Attendence (nytimes.com)
245 points by rgbrgb on June 20, 2015 | 62 comments


Drawing an assumption before evaluating data is a really good approach. Ten points to Gryffindor for that one.

They said their guess and the data differ the same way as my guess. But then they think the data is correct and they just throw out their guess. That's actually wrong. This point is very important in statistics, but you actually already learn it in 4th or 5th grade math classes. Make a guess about the end result, and if your calculated result is quite different, be sceptical about your calculated result. Our ability to guess is not good at getting the exact numbers right, but it's really, really good at getting the big picture. Therefore when data and guess disagree there's a really good chance the data is wrong and we must question it and not just our guess (which we can also question, but questioning the data is more important).

PS: It's quite funny how often we consume lots of data and calculate a lot and still the right answer to the question is: "Dunno yet"

At least for me that's what I learned in the last few years on HN.


The analogy doesn't hold. The point of the NYT exercise is to challenge our perceptions/biases, not challenge calculations.

Yes, in science, say, you should make a guess about the end result, based on something, like first principles, direct observation, etc. Then, if the calculated result diverges, check both.

In this case though there is no reason for the general public to have had a good picture of the appropriate curve; it's a lot of hearsay and ideology which biases what we think it should look like, and it is right to trust that less than the actual data.


This seems more like a very clever way to collect data about perceived inequality, and it could be applied to many more distributions.


I'd love to see the geo-spatial distribution of guesses.


Isn't what you learn in 4th or 5th grade math classes that you should be sceptical about your own calculations if they diverge a lot from your estimate? When it comes to empirical data, I think the only sensible approach is to remain equally sceptical about both your guess and the way the data was compiled or modelled.

But after a few rounds of scepticism, data analysis and model critique you should get to a point where you trust the data more than your own guess. Otherwise we could just as well stop doing science at all.


Science theater is the practice of investing in research intended to provide the feeling of improved understanding while doing little or nothing to actually achieve it.

Or the start-up equivalent 'innovation theatre'.


What's your point?


FWIW, my guess was at the 98th percentile, relatively close to the data (on the other hand, I was really bad at guessing the position of the ball in yesterday's football/soccer test).

The median guess probably suffers from a "connect the dots" bias, going from (0,0) to (100,100) through the suggested point.

Given how smooth the data is, that it comes from a large sample, and that the measurements are objective, I fail to see how they could have gotten them wrong.


> (on the other hand, I was really bad at guessing the position of the ball in yesterday's football/soccer test)

Sounds interesting, but I seem to have missed that and am not finding anything like it. Got a link?


GP is being hyperbolic.

He's saying that, "Just because you can't guess the trend in the data (e.g. the track of a soccer ball in yesterday's match) doesn't mean that something is wrong with the data; you often need to double-check both your hypothesis and your data. And in this case your hypothesis was probably at fault due to inherent bias rather than inaccuracies in the data."



HA! LoL. I stand corrected. ;)



Well, one thing they glossed over is the cost of tuition at the colleges in question varies. The poorest families are more likely to send their children to cheaper colleges. You might get an S curve if it was for colleges with $50K tuition, or whatever.


I agree, but I would say it's not the poorest families that choose cheap colleges but lower middle-class families. People like teachers, office workers, et al. who can't afford every college or who have to take out loans to send their kids to college are going to be price-sensitive. The poorest people generally make so little that they qualify for great financial aid without having to do academic scholarships.


That was actually mentioned in the text though.


Ooops...


I wonder what the graph would look like if it was parents' income as an absolute value vs as a percentile.


It would look like a cliff, as most of the population would be packed on the left.


Out of curiosity I drew the US wealth distribution on a linear axis a while ago, and it's really insanely packed to the left. As in everyone but a couple thousand people are in a single pixel column on the left with less than $100 million. Half the population are within a wavelength of light's distance of 0 on a linear wealth scale. The point is that wealth is nothing like a normal distribution - the very rich are so rich that even millions of dollars is nothing in comparison.

http://www.righto.com/2013/03/wealth-distribution-in-united-...


A few years ago, I made this histogram for a friend (I'm not American myself), who was interested in the modal income for the US, instead of the median income. The modal income is the most common income, i.e. the value you are most likely to get if you pick a US citizen at random, which can be seen as the location of the peak in this histogram:

http://i.imgur.com/euPugla.png

Note that the data is from 2008, and only counts employed citizens.

I compiled the data from an official US government data/demographics/census website (I forget which one, sorry). I noticed I could query the average income for all counties split over many "occupation groups". This gave me relatively fine-grained buckets (it doesn't really matter what they were, just that they were small-ish), weighted by the number of people in it, allowing me to plot the histogram. The really proper way to build this histogram would be to bucket an actual list of income for each individual US citizen. But that list is not available, for obvious reasons.
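A minimal sketch of the bucketing described above, assuming each record is an (average income, head count) pair. The function names, the bin width, and the sample figures are made up for illustration; they are not the 2008 data behind the linked image.

```python
from collections import defaultdict

def weighted_histogram(groups, bin_width):
    """Sum head counts into income bins of fixed width.

    groups: iterable of (average_income, number_of_people) pairs.
    Returns {bin_start: total_people}.
    """
    bins = defaultdict(int)
    for avg_income, people in groups:
        bin_start = int(avg_income // bin_width) * bin_width
        bins[bin_start] += people
    return dict(bins)

def modal_bin(hist):
    """Return the start of the bin holding the most people (the peak)."""
    return max(hist, key=hist.get)

# Hypothetical (average income, people) groups, NOT real census figures.
sample_groups = [
    (18_500, 1_200),
    (22_000, 3_400),
    (24_900, 2_800),
    (31_000, 2_100),
    (47_000, 1_500),
    (120_000, 300),
]

hist = weighted_histogram(sample_groups, bin_width=5_000)
peak = modal_bin(hist)  # start of the most populated income bin
```

The peak depends on the chosen bin width, so a modal income read off a histogram like this is only as precise as the buckets.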


Yeah you'd have to use log or sqrt of income.
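A small sketch of why the log helps, using a handful of hypothetical incomes picked only to have a long right tail. On a linear axis nearly everything falls into the first bin; binning by order of magnitude spreads the same values out.

```python
import math
from collections import Counter

def linear_bins(values, bin_width):
    """Count values per fixed-width bin on a linear axis."""
    return Counter(int(v // bin_width) for v in values)

def log10_bins(values):
    """Count values per order of magnitude (bin index = floor(log10(v)))."""
    return Counter(math.floor(math.log10(v)) for v in values)

# Hypothetical incomes in dollars, chosen only to show a long right tail.
incomes = [12_000, 25_000, 40_000, 52_000, 75_000, 110_000, 400_000, 5_000_000]

# With bins one million dollars wide, all but one value land in bin 0.
lin = linear_bins(incomes, bin_width=1_000_000)
# On a log axis the same values spread over several bins.
log = log10_bins(incomes)
```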


You're right, it might also be about representation! Have completely forgotten about that one.


Actually aren't you supposed to adjust your beliefs according to Bayes' theorem? That's what the original Bayes used it for.
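As a rough sketch of what that update looks like for the guess-versus-data question: the two hypotheses, the prior, and the likelihoods below are all invented numbers for illustration, not anything measured.

```python
def bayes_update(prior, likelihoods):
    """Return posterior probabilities for each hypothesis.

    prior: {hypothesis: P(hypothesis)}
    likelihoods: {hypothesis: P(data | hypothesis)}
    """
    unnormalized = {h: prior[h] * likelihoods[h] for h in prior}
    evidence = sum(unnormalized.values())  # P(data)
    return {h: p / evidence for h, p in unnormalized.items()}

# Hypothetical numbers: a reader who is 80% sure of an S curve...
prior = {"s_curve": 0.8, "linear": 0.2}
# ...sees data that is 9x more likely under a linear relationship.
likelihoods = {"s_curve": 0.1, "linear": 0.9}

posterior = bayes_update(prior, likelihoods)
```

With these made-up inputs the reader ends up favoring the linear hypothesis without dropping the original guess to zero, which is the middle ground between "trust the guess" and "trust the data".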


Totally agree. My guess about the curve was wrong the same way as yours and the authors'.

If that data's real then I want to know why it's tied to income and not something like wealth.

One wild theory.. quotas are based on income somehow, and that impressively straight line is because admissions officers are really good at their jobs.


Well, for one (and I'm not saying this is the explanation, just an answer to your question), FAFSA student aid eligibility is based on income, not wealth.


Actually wealth is another factor for FAFSA, it's just so seldom that a family has wealth but not income. Things like college savings accounts are actually factored against FAFSA applicants.


If your guess and data differ, wouldn't it be advisable to double check your data, your collection mechanisms and the assumptions that led to your guess/estimate?

After a few goes, the data needs to take precedence and the guess should be dumped.


I learned also to check the original source: the data there doesn't seem to fit such a perfect straight line and does show a slight S curling at the edges.

It just shows an S curved the other way to what I drew.


How did you draw an S curve the other way? I thought there is only one way, a little extreme: horizontal line, diagonal increase, horizontal line.


Sometimes your intuition is wrong, and you can follow the wrong path for a very long time because the reality doesn't fit your intuition (the history of science is littered with examples).

But yes, the scientific method is: make a hypothesis, collect data, test your hypothesis, revise, do it again.


The level of variations in the text of the article depending on the drawing is a nice touch: http://imgur.com/a/pbmfc (didn't get anything special for drawing a bell curve BTW)


Nice. It is also possible to get a "trust-fund dip" if you move the line downward near the end. You get a text with something like "you thought really rich people don't feel compelled to go to college".


I did that and didn't get that message. Wasn't a very extreme dip though


The real scientist's approach right here


I really like the way the article gets its readers to interact by drawing the line and giving tailored feedback. It makes the article way more memorable and engaging.

This is EDUCATION done right.... Quite fitting really as the article is about Uni and education in itself!

This post definitely gets my up-vote.


I think it's an amazing achievement and definitely worth studying. The nytimes have been doing a great job with visualizations and engagements like this - not in a spurious "hey click the flashing light" way but in a "think carefully, and now here's this article in terms of what you think" way.

Incredible! If only text books worked like this.

Desmos is trying to do this for math lessons: https://teacher.desmos.com/



I really like the idea of having the readers draw their assumption out. A great approach to creating conscious engagement with a complex topic.

The scales in this particular example are problematic, though. By using a logarithmic scale for the x dimension, the non-linear relationship between income and college attendance is hidden to the majority of readers. Logarithmic scales are hard to grasp for most people not working with numbers all day. Having percentages on both axes but only one of them being linear further obfuscates the variable relationship.


Am I missing something or are they not both linear?


It's income rank, rather than actual income. In my explanatory text they mention that the difference between two points on the far left is a few hundred dollars and the difference between two points on the right is a million or so (I forget the actual numbers).


As jacalata described, the actual income differences in dollars are not distributed linearly across the x axis. Each percentile interval represents a different dollar interval.

The y axis uses percentages, too. But it displays an absolute number of people, so the number of people in the 10-20% bracket equals those in the 80-90% bracket and so on.

The actual number of children won't be the same for each income bracket, but they are evenly distributed on the y axis. I think :)
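To make the rank-versus-dollars point concrete, here is a small sketch with hypothetical incomes: every rank bracket holds the same number of households, but the dollar span of each bracket differs wildly.

```python
def bracket_widths(sorted_incomes, n_brackets):
    """Split sorted incomes into equal-count brackets and return each
    bracket's dollar span (max - min)."""
    size = len(sorted_incomes) // n_brackets
    widths = []
    for i in range(n_brackets):
        chunk = sorted_incomes[i * size:(i + 1) * size]
        widths.append(chunk[-1] - chunk[0])
    return widths

# Hypothetical, already-sorted household incomes in dollars.
incomes = [9_000, 9_400, 21_000, 23_000, 48_000, 55_000, 120_000, 1_300_000]

# Four brackets of two households each; only the dollar spans differ.
widths = bracket_widths(incomes, n_brackets=4)
```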


I got the starting and end points reasonably close, but I too drew an S line since I did not think that reality could be so linear.

Something so linear just makes me question whether there is some trickery going on. You learn early on that data is never so easily fitted.


Interesting article, but mostly because of the form, not the conclusion - as mentioned in other comments, most people will guess the big picture. What's really impressive IMHO is that people at the NY Times are consistently killing it in mainstream interactive journalism.


Would be a good weekend project to make a generalized form of this (enter your own data, it constructs a page). Maybe I'll try it if I go to a hackathon sometime later this year, but someone should beat me to it!


I would really like to see this with actual income on the x axis rather than just the percentile. And then again with the log of income.



The link to the study comparing income and graduation was broken.

Here it is: http://www.nber.org/papers/w17633

The female advantage graphs are truly interesting.


This sentence from the paper sums up to me a remarkable trend:

> For the most recent cohorts, the four-year college graduation rate for women (32 percent) is ten points higher than the comparable rate for males (22 percent).

This means that women were completing college (I believe in the year 2006) at roughly 1.5 times the rate that men did (32/22 ≈ 1.45). That's a huge difference, and from the charts it appears that the difference has been increasing for decades without signs of leveling off.


Hasn't this been posted on HN before...like, about a week or so ago?


In college they would say "Attendance".


Am I the only one who constantly has nytimes website flip to another page due to scrolling? It's every time I read an article.


Why do the richest bother going to college if they can just live very well off interest and investments?


I have many friends who are so wealthy they could easily live off their family's money for several generations. There are some who don't want to depend on their parents and actually make something of themselves. Of course there are others who go because their parents tell them to go. It seems to be looked down upon to just take your parents' money and do nothing with your life, or at least that's how it was where I grew up (southern California).


It's a good place to meet people.

It's interesting.

It's fun.


I imagine you could get all that without having an obligation of coursework hanging over you, but I guess at that point that doesn't matter much either


It's what all their friends are doing.


Very interesting. I was extremely accurate, but the 50th percentile point helped tremendously.

I started drawing from the left, with the approach given below; however, by the time I was at the 50th percentile mark I was nowhere near 50%, so I moved those up a bit. Around the 40th to 60th percentile is where I would have done the worst without the aid of the middle point. (Which brought my graph up.)

My graph: http://imgur.com/a/hjvdf

- You drew a more accurate picture of reality than about 98 percent of people who have tried so far.

- Your line was relatively straight, reflecting one of the more striking findings of this research: The relationship between college enrollment and parental-income rank is linear.

- Your guess was extremely accurate. Is that you, Raj Chetty?

This was my methodology for drawing "extremely accurately":

Starting at left (where I started drawing) I reasoned, primary and secondary education is free and mandatory in the United States, and there are huge scholarship and other support programs. So while the poorest of the poor have everything going against them in terms of family support and even culturally, plus likely the pressure of starting to work early, still, I reasoned at least one out of five can make use of the opportunities and begin attending college.

I then intended to proceed up steadily, but I intended to level off somewhat between the 40th and 70th percentile, because a lot of average-income people simply start working. As you can see, the free point moved my graph up (via adjustment by me when I saw that I was still short of it at that point) heavily.

I had then intended to proceed up linearly to a very high rate of college attendance, and after a certain income level (say, top 5%-3%) I intended to be around 100%. I thought basically 100% of the top of the top attended college, but for me being in the top 5% of income would have assured that. It's not like measuring "graduate degree" or something else. A college degree is quite standard for children of the top incomes. I also thought it would stop levelling up because some people from extremely rich families produce lazy children by spoiling them. If your child doesn't go to college when you're making $150,000, they're not going to go to college when you're making $200,000. If they're not going to go when you're pulling $1M per year, they certainly aren't when you're pulling $5M per year. (in fact might be slightly less likely to.)

As you can see from the rest of the album - http://imgur.com/a/hjvdf - I did quite well. But these effects aren't present at all.

Some of the poor attended exactly as I predicted, and this rose immediately with income. The effect I predicted that did NOT appear is that in the middle class, there is a firm distinction between being at the 40th or 60th percentile - it is still linear. I would have thought it didn't matter. If you look here - http://blogs.marketwatch.com/encore/2014/10/02/incomes-are-m... - sadly they only show a few select percentiles, but let's go around the $51,939 level at 50%. That is let's say $3,100 per month take-home pay, give or take. I wouldn't have expected there to be a huge decision on whether YOU will attend college, depending on whether YOUR FAMILY is earning $2,800 per month or $3,500 per month. Essentially, I think this wouldn't figure into your decision at all, period. Also, I think that being at the 40th to 60th percentile does NOT mark a shift in cultural status (what part of the middle class you're in), and also is incredibly fluid. A family's income could easily shift by this amount from one year to the next - what, would their child not attend college in 2003 but would attend in 2005, because they're taking home $3,500 per month instead of $2,800? Maybe to some small extent, but certainly not linearly.

So I expected a smooth or levelled-off part in the middle percentiles. This didn't happen, but the supplied point helped me avoid it - I actively adjusted my graph due to it.

I wonder what the reasoning was behind giving people a supplied point. I guess they wanted to know the shape people guessed, rather than the levels people guessed? Or set people's expectations, so that they don't have wild expectations about the percentage of the population that attends college?


> I wonder what the reasoning was behind giving people a supplied point. I guess they wanted to know the shape people guessed, rather than the levels people guessed? Or set people's expectations, so that they don't have wild expectations about the percentage of the population that attends college?

Exactly. They essentially fix the "m" in the y = k*x + m equation, and let people guess only k. That is, you factor out a lot of the guesswork about how many actually attend higher education and let the question focus on how it varies with income. When you ask people for a guess it's much easier to average a scalar value than a tuple. Clever, if you ask me.
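A minimal sketch of that idea: once one point of the line is pinned, a drawn guess boils down to a single slope. The anchor point and the drawn values below are hypothetical, not the article's numbers. Strictly, pinning a point ties m to k rather than fixing m outright, but either way only one free parameter remains.

```python
def slope_through_point(points, x0, y0):
    """Least-squares slope k of a line forced through (x0, y0)."""
    num = sum((x - x0) * (y - y0) for x, y in points)
    den = sum((x - x0) ** 2 for x, _ in points)
    return num / den

# Hypothetical drawn guess as (income percentile, % attending) pairs.
guess = [(0, 20), (25, 35), (75, 65), (100, 80)]
# Hypothetical anchor point; the article's actual value is not assumed here.
x0, y0 = 50, 50

k = slope_through_point(guess, x0, y0)
m = y0 - k * x0  # the intercept follows from the anchor once k is known
```

Averaging one k per reader is then a plain average of scalars, which matches the point about scalars being easier to aggregate than tuples.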


Yes, but anyone who has information about their immediate percentile-neighbors (but nobody else) now has 2 data points (their own income-part of society and the supplied data point), so unless they're around the 50th percentile themselves you have supplied them with a ton of information. I think it would have been far more instructive not to include that data point: people's guesses also would have been more revealing (the last, aggregate-guesses graph on the results page). (Especially since we can have a good idea about who the guessers are - readers of the NY Times who choose to complete this specific graph exercise.)


    You drew a more accurate picture of reality than about 92 percent of people who have tried so far.
    You correctly guessed that children from the very poorest families face tough odds in going to college – only about one in four do.
    You underestimated the chances of college enrollment for the very richest children. In reality, about 94 percent of children from America’s richest families go to college. (You guessed around 77 percent.)


One part Gimmick, two parts Chart Junk with a large dash of Navel Gazing and a squeeze of Self Re-inforcement.

This is almost the Comfortable Middle Class version of the Find-your-way-through-a-Maze puzzles that you find on paper placemats in roadside Burger Restaurants.



