The date you've noticed was put there on purpose in this example Excel sheet by the original author.
>I then use the function pointblank::test_col_vals_in_set from the pointblank package to detect if there are any issues by setting no_issue to be either TRUE or FALSE.
>If there are any issues, I will isolate and display them.
>I will then correct the invalid dates accordingly and proceed. We just assume that the date was supposed to be 31-01-2017.
Yes, but the point I'm trying to make is that I think "isolation" is the wrong approach for data quality issues. That's like fishing out a turd. You need to quarantine the lot until you fix the upstream.
Automating that process (invalidation followed by filtering) is even worse: it will merely mask data quality issues when you want the opposite, which is to grind everything to a halt until you get workable and realistic data.
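To make the distinction concrete, here's a rough sketch of the two approaches. It's plain Python, not the pointblank API, and the function names, the `DataQualityError` class, and the sample rows are all made up for illustration. The date format is assumed to be day-month-year, as in the quoted post.

```python
from datetime import datetime


class DataQualityError(Exception):
    """Hypothetical error type used to stop the pipeline on bad data."""


def is_valid_date(text, fmt="%d-%m-%Y"):
    # strptime rejects impossible calendar dates such as 31 February.
    try:
        datetime.strptime(text, fmt)
        return True
    except ValueError:
        return False


def filter_invalid(rows):
    # "Isolation": silently keep only the rows that pass the check.
    return [row for row in rows if is_valid_date(row["date"])]


def require_all_valid(rows):
    # "Quarantine": refuse to hand over any rows while bad ones exist.
    bad = [row for row in rows if not is_valid_date(row["date"])]
    if bad:
        raise DataQualityError(f"{len(bad)} row(s) with invalid dates: {bad}")
    return rows


# Made-up sample data; the second date is deliberately impossible.
rows = [
    {"id": 1, "date": "30-01-2017"},
    {"id": 2, "date": "31-02-2017"},
]

kept = filter_invalid(rows)  # the bad row just disappears downstream

try:
    require_all_valid(rows)
    halted = False
except DataQualityError:
    halted = True  # nothing proceeds until the source is fixed
```

With the filtering version the downstream analysis runs on one row and nobody hears about the second; with the other version the run stops and somebody has to go look at the source.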
Perhaps I'm not "realistic" enough about real world data to work in data analytics. The frustration from ignoring bad data was a large part of why I quit data analysis for software development around 15 years ago.
For sure, you'd generally want to address the source of the issue rather than band-aid it, but that isn't always possible (or not possible immediately) and you just have to work with what you've got. Most of the time your boss isn't going to let you "grind everything to a halt".
But, more importantly, I think the blog is focused more on the practical "here's how to do x with some code", and less on the theory of data science.
Post author here. Thanks to both of these commenters for pointing out this issue with the intentionally bad date. I'll make a note of this and update the post.
https://jeremy-selva.netlify.app/blog/2024-02-15-tackling-fo...