INDEX
    Explanations

    instances of specific capital letters or characters

    Negative Logits
    Token             Logit
    "lying"           -0.15
    "oa"              -0.15
    "SAFE"            -0.15
    " stalking"       -0.15
    "bean"            -0.15
    "resource"        -0.15
    "TOOLS"           -0.14
    " affordability"  -0.14
    "otec"            -0.14
    "oke"             -0.14
    Positive Logits
    Token             Logit
    "alog"            0.22
    "apis"            0.21
    "aro"             0.20
    "ález"            0.18
    "izio"            0.17
    "акон"            0.17
    "азв"             0.17
    "pz"              0.16
    "abyte"           0.16
    "agra"            0.16
    Act Density 0.008%

    No Known Activations