• corship@feddit.de
    English
    arrow-up
    73
    ·
    1 year ago

    I wrote an AI that classifies spam emails with 99.9% accuracy.

    Our test set contained 1000 emails, 999 of which aren’t spam.

    The algorithm:

    • gandalf_der_12te@feddit.deOP
      English
      arrow-up
      2
      ·
      1 year ago

      Honestly, I’d rather have that than randomly miss an important e-mail because the system put it in the junk folder.
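
For anyone checking the arithmetic in the spam joke: a minimal sketch, assuming the "algorithm" is a classifier that labels every email as not spam. The names `classify`, `emails`, and `labels` are made up for illustration; the 999-to-1 split is the one quoted above.

```python
def classify(email: str) -> bool:
    """Hypothetical classifier: True would mean spam. It never says so."""
    return False

# Assumed test set from the joke: 999 legitimate emails and 1 spam email.
labels = [False] * 999 + [True]
emails = [f"email {i}" for i in range(len(labels))]

predictions = [classify(e) for e in emails]
correct = sum(p == y for p, y in zip(predictions, labels))
accuracy = correct / len(labels)

# Recall: the share of actual spam that the classifier caught.
spam_caught = sum(p and y for p, y in zip(predictions, labels))
recall = spam_caught / sum(labels)

print(f"accuracy = {accuracy:.1%}, spam recall = {recall:.0%}")
```

Accuracy looks great only because the classes are so unbalanced; recall on the one spam message is zero.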

  • notabot@lemm.ee
    English
    arrow-up
    62
    arrow-down
    1
    ·
    1 year ago

    All odd numbers are prime: 1 is prime, 3 is prime, 5 is prime, 7 is prime, 9 is experimental error, 11 is prime, and so on. I don’t have funding to check all of them, but it suggests an avenue of productive further work.

      • notabot@lemm.ee
        English
        arrow-up
        6
        ·
        1 year ago

        Look, just because it breaks everything, that’s no reason not to include it in a joke. We’ll just have to rebuild the entire edifice of mathematics.

        Seriously, thanks for the link, I hadn’t considered the implications of including 1 in the set of primes, and it really does seem to break a lot of ideas.

  • A_Very_Big_Fan@lemmy.world
    English
    arrow-up
    24
    arrow-down
    1
    ·
    1 year ago

    It’s been a fat minute since I last did any programming outside of batch scripts and AHK… I’m struggling to understand how it’s not returning false for 100% of the tests.

    • RAM@discuss.tchncs.de
      English
      arrow-up
      50
      ·
      edit-2
      1 year ago

      It is always returning false, but the screen shows the output of a test, where a non-prime evaluating as false is a pass and a prime evaluating as false is a fail :))

    • Troublehelix@feddit.nu
      English
      arrow-up
      27
      ·
      1 year ago

      The output shown is the result of a test for the function, not the result of the function itself.

    • JackGreenEarth@lemm.ee
      English
      arrow-up
      14
      ·
      1 year ago

      It’s returning false for all the tests, but it should only be returning false for about 95% of them, as about 5% are prime.
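
To make the function-versus-test distinction in these replies concrete, here is a small sketch. It assumes the function in the original post simply returns false for everything; `reference_is_prime` and `run_tests` are hypothetical helpers, not taken from the post.

```python
def is_prime(n: int) -> bool:
    """The function under test, as described in the thread: always False."""
    return False

def reference_is_prime(n: int) -> bool:
    """Slow but correct trial division, used only as the expected answer."""
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))

def run_tests(limit: int) -> float:
    """Return the fraction of numbers 1..limit where is_prime is correct."""
    passed = sum(is_prime(n) == reference_is_prime(n)
                 for n in range(1, limit + 1))
    return passed / limit

print(f"{run_tests(20):.1%} of tests passed")
```

The function's output is false every time; the percentage only reports how often false happened to be the right answer.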

  • idunnololz@lemmy.world
    English
    arrow-up
    15
    ·
    edit-2
    1 year ago

    How many primes are there between 1 and 2^31? IIRC prime numbers get rarer as the numbers get larger. I wouldn’t be surprised if this would pass 99% of tests if tested with all positive 32-bit integers.

    • Kogasa@programming.dev
      English
      arrow-up
      14
      ·
      1 year ago

      Per the prime number theorem, for large enough N the proportion of primes less than or equal to N is approximately 1/log(N), where log is the natural logarithm. For N = 2^(31) that’s ~0.0465. To get under 1% you’d need N ~ 2^(145).
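
Both numbers in that estimate can be reproduced in a few lines. This is only a sketch of the approximation 1/ln(N), not an exact prime count, and the variable names are made up for illustration.

```python
import math

def prime_density(n: float) -> float:
    """Prime number theorem estimate of the share of primes up to n."""
    return 1 / math.log(n)

density_31 = prime_density(2 ** 31)

# 1/ln(N) < 0.01 requires ln(N) > 100, i.e. N > e^100 = 2^(100 / ln 2).
bits_for_one_percent = math.ceil(100 / math.log(2))

print(f"estimated density up to 2^31: {density_31:.4f}")
print(f"density drops below 1% around N = 2^{bits_for_one_percent}")
```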

    • idunnololz@lemmy.world
      English
      arrow-up
      7
      ·
      1 year ago

      Wolfram Alpha says it’s about 4.9%. So 4.9% of numbers in the range 1 to 2^31 are prime. It’s more than I expected.

  • xthexder
    English
    arrow-up
    11
    ·
    edit-2
    1 year ago

    Ah yes, my favorite recurring lemmy post! It even has the same incorrect test output.

    Last time I saw this I did a few calculations based on comments people made:
    https://l.sw0.com/comment/32691 (when are we going to be able to link to comments across instances?)

    • There are 9592 prime numbers less than 100,000. Assuming the test suite only tests numbers 1-99999, the accuracy should actually be only 90.408%, not 95.121%.
    • The 1 trillionth prime number is 29,996,224,275,833. This would mean even testing every number up to about 30 trillion would only get you to roughly 96.67% accuracy.
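
Both figures can be checked with a short script. This is a sketch that assumes the suite tests every integer in the stated range exactly once; the sieve helper is hypothetical, and the trillionth prime is taken as quoted in the comment rather than computed.

```python
def count_primes_below(limit: int) -> int:
    """Count primes p < limit with a sieve of Eratosthenes."""
    if limit < 3:
        return 0
    sieve = bytearray([1]) * limit
    sieve[0] = sieve[1] = 0
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            # Cross out multiples of p, starting at p*p.
            sieve[p * p :: p] = bytearray(len(range(p * p, limit, p)))
    return sum(sieve)

tested = 99_999                          # assumed range: 1..99999
primes = count_primes_below(100_000)
pass_rate_small = (tested - primes) / tested

trillionth_prime = 29_996_224_275_833    # figure quoted in the comment
pass_rate_large = 1 - 10**12 / trillionth_prime

print(primes, f"{pass_rate_small:.3%}", f"{pass_rate_large:.2%}")
```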

    In response to the question of how long it would take to round up to 100%:

    • The density of primes can be approximated using the Prime Number Theorem: 1/ln(x). The pass rate rounds up to 100.000% once it reaches 99.9995%, and solving 99.9995 = 100 - 100 / ln(x) for x gives e^200000, or about 7.88 × 10^86858. In other words, the universe will end before any current computer could check that many numbers.
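
The number e^200000 is far too large for a float, so a quick check has to work with logarithms. A sketch, assuming the same 1/ln(x) approximation as above:

```python
import math

# From 99.9995 = 100 - 100 / ln(x): 100 / ln(x) = 0.0005, so ln(x) = 200000.
ln_x = 200_000

# Convert to base 10 and split into exponent and mantissa.
log10_x = ln_x / math.log(10)
exponent = math.floor(log10_x)
mantissa = 10 ** (log10_x - exponent)

print(f"x is roughly {mantissa:.2f} x 10^{exponent}")
```
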
    • lad@programming.dev
      English
      arrow-up
      1
      ·
      1 year ago

      But you can use randomised test cases. Better yet, you can randomise the values in the test cases once, throw away the ones you don’t like, and get arbitrarily close to 100% with a reasonable number of tests.
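
In the spirit of that suggestion, a tongue-in-cheek sketch of "randomise once and discard what you don't like". It assumes the always-false function from the original post; the seed, range, and helper names are arbitrary choices for illustration.

```python
import random

def is_prime(n: int) -> bool:
    """Function under test from the thread: always False."""
    return False

def reference_is_prime(n: int) -> bool:
    """Correct trial division, used to decide which cases to discard."""
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))

rng = random.Random(0)  # arbitrary fixed seed so the "random" suite is stable
candidates = [rng.randint(1, 10_000) for _ in range(1_000)]

# "Throw away the ones you don't like": keep only cases that already pass.
kept = [n for n in candidates if is_prime(n) == reference_is_prime(n)]
pass_rate = sum(is_prime(n) == reference_is_prime(n) for n in kept) / len(kept)

print(f"kept {len(kept)} of {len(candidates)} cases, pass rate {pass_rate:.0%}")
```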

  • corsicanguppy@lemmy.ca
    English
    arrow-up
    6
    ·
    1 year ago

    why would you store comments in git?

    Oh. Oh ha ha ha ha you just don’t know ‘checkout’ from ‘check out’. Clean out your desk.