Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Recursion isn't the problem. Not keeping track of seen tweets is the problem. Recursion can be used to detect cycles and traverse a cyclic graph in a way that doesn't blow up.


But you wouldn’t do that if your assumption is that the graph doesn’t cycle.

Edit: child comments are correct and I regret my oversight


For traversing a DAG you probably still would to avoid exploring an exponential number of paths (consider a chain of diamonds [1]).

1: https://www.researchgate.net/figure/A-diamond-shaped-DAG_fig...


But a diamond cannot occur in a Twitter-reply graph, right? It would require a Tweet to be able to reply to more than one tweet.


It can because the assumption is that we are crawling embedded links as well as native parents.


You can reply to one tweet while quote-retweeting another. Depending on how links are defined that could result in a diamond.


It’s a lot easier to just have a depth limit on such non-cyclic graphs than keep an in-memory list of previously seen nodes. its interesting for sure! but a much rarer edge-case imo


Graph doesn't need to cycle to hit a node twice. Good code does not visit the same node twice.


When doing recursion (Postgres recursive CTE) I keep a path on the latest edges and check to see new edges aren't already visited, so same nodes can appear in multiple branches but not on the same branch. Works flawlessly.


Can you provide an example of this? I’d like to understand this technique.

Edit: is this an example? https://stackoverflow.com/a/1757915


Irritation would probably better in terms of not blowing up your memory requirements. Just keep hashes of all visited nodes and a stack or queue of to-be-visited nodes and loop until you have no more to-be-visited nodes.


I think you meant "iteration", not "irritation".




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: