
> https://verapdf.org/

That is one shitty site. Trying to shove Google Analytics down my throat, no contact information, no privacy page. Probably illegal under GDPR.

> so I've sometimes wondered if a cheap form of "sanitizing" PDFs would be to simply force their PDF/A flags on.

That's not really how PDF-standards work. You'll have to "rewrite" the problematic parts; the standards only define the ruleset that a file is checked against.

In professional media production we do this "rewrite" all the time (for the PDF/X standard). Though sometimes PDF files are just so "broken" that it's impossible to fix them.



> That is one shitty site.

Yes, I don't think it gets much attention - I should probably have pointed at the GitHub org, which is reasonably active: https://github.com/verapdf

> That's not really how PDF-standards work.

Well, it is how the standard works (don't make me dig out the relevant bit of what's publicly available from the standard) - the issue is whether common PDF readers actually do what they're "supposed to" or whether they just try and interpret as much as they can.
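
For anyone wondering what the "PDF/A flag" physically is: as far as I know it's just a claim in the file's XMP metadata (the `pdfaid:part` and `pdfaid:conformance` properties). A rough Python sketch that reads that claim is below. The function name `declared_pdfa_level` is made up for illustration, and it assumes the XMP packet is stored uncompressed and in UTF-8. It only reports what the file says about itself; it validates nothing, which is the gap a checker like veraPDF is meant to cover.

```python
import re

# Match both XMP serializations:
#   element form:   <pdfaid:part>1</pdfaid:part>
#   attribute form: pdfaid:part="1"
_PART = re.compile(rb'pdfaid:part(?:>|\s*=\s*["\'])\s*(\d+)')
_CONF = re.compile(rb'pdfaid:conformance(?:>|\s*=\s*["\'])\s*([A-Za-z]+)')


def declared_pdfa_level(data: bytes):
    """Return the PDF/A level a file *claims* (e.g. '1B'), or None.

    Hypothetical helper: scans raw bytes for an uncompressed XMP packet.
    It reads the declaration only and performs no conformance checks.
    """
    part = _PART.search(data)
    if part is None:
        return None  # no PDF/A claim found in the scanned bytes
    level = part.group(1).decode("ascii")
    conf = _CONF.search(data)
    if conf is not None:
        level += conf.group(1).decode("ascii").upper()
    return level
```

Usage would be something like `declared_pdfa_level(open("some.pdf", "rb").read())`. A file can carry this claim and still violate the rules, so forcing the flag on changes what the file says, not what it contains.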



