The whispering is all in her head and says she sucks

  • FlorianSimon@sh.itjust.works · 6 upvotes · 6 days ago

    You can extract text from PDFs without using OCR; they aren’t all just images embedded in a file.

    I’m sure you’ve opened PDF documents before and selected text in them, or searched for something. That works because the text is embedded in the document.

    You can also create PDF documents with the text converted to images, but those are usually larger in size.
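
To illustrate the point about embedded text, here is a minimal sketch using only the Python standard library. In a text-based PDF, page content streams draw strings with text-showing operators such as `Tj`, and the streams are often Flate-compressed. The content stream below is a made-up toy, and `extract_tj_strings` is a hypothetical helper, not a real PDF parser: it ignores `TJ` arrays, hex strings, escape sequences, and font encodings, which real extractors have to handle.

```python
import re
import zlib

# Toy page content stream (an assumption for illustration):
# two strings drawn with the Tj text-showing operator.
content = b"BT /F1 12 Tf 72 720 Td (Jane Doe) Tj 0 -16 Td (Software Engineer) Tj ET"

# PDF streams are often Flate-compressed; simulate that here.
compressed = zlib.compress(content)


def extract_tj_strings(stream: bytes) -> list[str]:
    """Return the literal strings shown with Tj in a Flate-compressed stream.

    Simplified on purpose: no TJ arrays, hex strings, escapes, or font encodings.
    """
    raw = zlib.decompress(stream)
    matches = re.findall(rb"\(([^()\\]*)\)\s*Tj", raw)
    return [m.decode("latin-1") for m in matches]


print(extract_tj_strings(compressed))  # ['Jane Doe', 'Software Engineer']
```

No OCR is involved: the characters are already in the file. A PDF that stores the page as an image has no such operators to read, which is when OCR becomes necessary.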

    • just_an_average_joe@lemmy.dbzer0.com · 2 upvotes · 6 days ago

      Not necessarily. CVs have complicated formatting. Nobody writes (or should write) plain blocks of text, and you don’t know how many columns the candidate is using, or whether they show skill ratings with stars or with words. You can still search for individual keywords, but if you copy the whole PDF and paste it into a .txt file (which is what gets forwarded to the ATS), the result doesn’t make much sense. The structure is too complicated to reliably extract where you studied, what you studied and your grade, what other experience you have, how long you worked there, etc.
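
A rough sketch of the multi-column problem, assuming an extractor that reads the page row by row (the actual order depends on the tool and on how the PDF was produced). The two columns and the `flatten_rows` helper are made up for illustration.

```python
from itertools import zip_longest

# Hypothetical two-column CV: each list is one visual column, top to bottom.
left = ["SKILLS", "Python", "SQL"]
right = ["EXPERIENCE", "Acme Corp, 2019-2022", "Data engineer"]


def flatten_rows(left_col: list[str], right_col: list[str]) -> str:
    """Join the columns row by row, the way a naive copy-paste might read them."""
    rows = zip_longest(left_col, right_col, fillvalue="")
    return "\n".join(" ".join(part for part in row if part) for row in rows)


flat = flatten_rows(left, right)
print(flat)
# SKILLS EXPERIENCE
# Python Acme Corp, 2019-2022
# SQL Data engineer
```

Every keyword is still searchable, but the headings are fused and each line mixes a skill with an unrelated job entry.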

      Extracting structured data is a field of research in its own right. There is plenty of recent research on extracting structured data from academic PDFs (I was working on this at a research institute in Germany around 2022). Even when LLMs are used, it can get complicated enough that there are specialized LLMs for just that.

      But ATS software is too cheap, or too low a priority, to use even OCR, let alone LLMs, so unfortunately the responsibility for making an easily parsable CV falls on the candidate.

      Try this next time you look at your CV: copy its text into a .txt file, then think about whether you could write a program that reliably extracts your experience, education, interests, etc. It’s going to be super difficult, and even then it won’t generalize to thousands of other CVs.
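
As a sketch of that exercise, here is about the simplest parser one might try: split the pasted text on known section headings. The heading set, the sample texts, and `split_sections` are all assumptions for illustration, not how any particular ATS works.

```python
# Hypothetical heading list; real CVs use many more variants.
HEADINGS = {"EDUCATION", "EXPERIENCE", "SKILLS"}


def split_sections(text: str) -> dict[str, list[str]]:
    """Group lines under the most recent line that exactly matches a known heading."""
    sections: dict[str, list[str]] = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.upper() in HEADINGS:
            current = line.upper()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return sections


# Works on a simple single-column paste...
single_column = "Education\nBSc Physics, 2018\nExperience\nAcme Corp, 2019-2022"
print(split_sections(single_column))
# {'EDUCATION': ['BSc Physics, 2018'], 'EXPERIENCE': ['Acme Corp, 2019-2022']}

# ...but finds nothing when two column headings land on the same line.
two_column = "SKILLS EXPERIENCE\nPython Acme Corp, 2019-2022"
print(split_sections(two_column))
# {}
```

Loosening the heading match would rescue this one case, but then it starts misfiring on CVs that merely mention those words, which is the generalization problem.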

      • bufalo1973@lemmy.ml · 2 upvotes · 6 days ago

        All those “problems” apply to Word too. Maybe you use tables, maybe you use lists, maybe you use stars, maybe … So there’s no advantage in forcing people to use Word “because the machine can understand it better”. Because that’s a lie.

        • FlorianSimon@sh.itjust.works · 2 upvotes · 6 days ago

          Exactly what I was about to reply. Try copying a crazy multi-column Word document into text, and you’ll get similar results.

          Copy-pasting parts of your PDF document is not any more difficult than doing the same thing for a Word document.