INDEX
    Explanations

    accurately counts words

    Negative Logits
     mutants         0.75
     americanos      0.74
     masterpieces    0.70
    ENE              0.69
    ISM              0.69
    ORM              0.68
     mansions        0.68
     programmers     0.68
     brasileiros     0.68
     promotional     0.67
    Positive Logits
    is               0.71
    "},              0.71
    <0xBF>           0.70
                     0.67
    as               0.65
    <0xAB>           0.64
     ۋە              0.64
    س                0.64
    ت                0.64
    <0x94>           0.63
    Activation Density: 1.245%

    No Known Activations