That is one shitty site. Trying to shove Google Analytics down my throat, no contact information, no privacy page. Probably illegal under GDPR.
> so I've sometimes wondered if a cheap form of "sanitizing" PDFs would be to simply force their PDF/A flags on.
That's not really how PDF standards work. You'd have to "rewrite" the problematic parts; the standards only define a ruleset that a file is checked against.
In professional media production we do this kind of "rewrite" all the time (for the PDF/X standard), though sometimes PDF files are so "broken" that it's impossible to fix them.
Yes, I don't think it gets much attention. I should probably have pointed at the GitHub org, which is reasonably active: https://github.com/verapdf
> That's not really how PDF-standards work.
Well, it is how the standard works (don't make me dig out the relevant bit of what's publicly available from the standard) - the issue is whether common PDF readers actually do what they're "supposed to" or whether they just try and interpret as much as they can.
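To make the "flag" being argued about concrete: a PDF/A claim is an XMP metadata declaration (the `pdfaid:part` and `pdfaid:conformance` properties), separate from whether the file's contents follow the rules. Below is a rough Python sketch that only reads that claim. The function name `claimed_pdfa_level` is made up for illustration, and it assumes the XMP packet is stored uncompressed in the file. It does not validate anything; that is what a checker such as veraPDF is for.

```python
import re
from typing import Optional

# PDF/A identification properties, in XMP element form (<pdfaid:part>1</...>)
# or attribute form (pdfaid:part="1").
_PDFA_PART = re.compile(rb'pdfaid:part(?:>\s*|\s*=\s*["\'])(\d)')
_PDFA_CONF = re.compile(rb'pdfaid:conformance(?:>\s*|\s*=\s*["\'])([A-Za-z])')


def claimed_pdfa_level(pdf_bytes: bytes) -> Optional[str]:
    """Return the PDF/A level the file *claims* (e.g. '1B'), or None.

    Hypothetical helper: it only finds uncompressed XMP metadata and says
    nothing about whether the file actually conforms.
    """
    part = _PDFA_PART.search(pdf_bytes)
    if part is None:
        return None
    level = part.group(1).decode("ascii")
    conf = _PDFA_CONF.search(pdf_bytes)
    if conf is not None:
        level += conf.group(1).decode("ascii").upper()
    return level
```

Usage would look like `claimed_pdfa_level(open("some.pdf", "rb").read())`. A file can return a level here and still fail validation, which is the gap between forcing the flag on and actually rewriting the problematic parts.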