• keepthepace@slrpnk.net
    3 hours ago

    Yes, PDFs are much more permissive and may not have any semantic information at all. Hell, some old publications are just scanned images!
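A rough, stdlib-only sketch of my own (not taken from any particular tool) for telling a PDF that has a text layer apart from one that is just scanned images. It looks for text-showing operators (`BT ... Tj/TJ ... ET`) in content streams and counts image XObjects. `inspect_pdf` and its result keys are hypothetical names. It assumes classic uncompressed object dictionaries and only decodes unfiltered or FlateDecode streams, so treat the output as a hint. Even a positive result only means glyphs get drawn; it says nothing about headings, paragraphs or reading order. An already-OCR'd scan usually carries an invisible text layer and will report `has_text` too.

```python
import re
import zlib

# Body of each stream; the lookbehind stops "endstream" from matching as a start.
_STREAM_RE = re.compile(rb"(?<!end)stream\r?\n(.*?)endstream", re.DOTALL)
# A text object (BT ... ET) containing a text-showing operator (Tj or TJ).
_TEXT_RE = re.compile(rb"\bBT\b.*?\bT[jJ]\b.*?\bET\b", re.DOTALL)


def inspect_pdf(raw: bytes) -> dict:
    """Heuristic: does this PDF draw any text, or only images?

    Assumption: object dictionaries are not packed into object streams, and
    only unfiltered or FlateDecode streams are decoded. Other filters and
    inline images are ignored, so the result is a hint, not a verdict.
    """
    has_text = False
    image_count = 0
    for match in _STREAM_RE.finditer(raw):
        # The stream's dictionary sits between "N 0 obj" and the "stream" keyword.
        header_start = raw.rfind(b"obj", 0, match.start())
        header = raw[max(header_start, 0):match.start()]
        if re.search(rb"/Subtype\s*/Image", header):
            image_count += 1
            continue
        body = match.group(1)
        if b"/FlateDecode" in header:
            try:
                # decompressobj keeps the trailing EOL before "endstream" in
                # unused_data instead of raising.
                body = zlib.decompressobj().decompress(body)
            except zlib.error:
                continue
        elif b"/Filter" in header:
            continue  # another filter we cannot decode with the stdlib
        if _TEXT_RE.search(body):
            has_text = True
    return {
        "has_text": has_text,
        "image_count": image_count,
        "likely_scanned": image_count > 0 and not has_text,
    }
```

Usage would be something like `inspect_pdf(open("paper.pdf", "rb").read())` (the filename is just an example). A result with `likely_scanned` set to `True` is the case where OCR is the only way to get text out at all.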

    Going from PDF to semantic structure seems to be a hard problem that basically requires OCR, like these people are doing.