INDEX
    Explanations

    instances of the word "Int" or related variations, likely indicating references to intelligence or introspection

    Negative Logits
    Token        Logit
    "elian"      -0.15
    " perror"    -0.15
    "ldb"        -0.14
    "rsp"        -0.14
    "mouseout"   -0.14
    "atrice"     -0.14
    "hatt"       -0.14
    " Norm"      -0.14
    "afi"        -0.14
    "ahi"        -0.14
    Positive Logits
    Token        Logit
    "RODUCTION"  0.19
    "ention"     0.19
    "umes"       0.19
    "roducing"   0.19
    "ended"      0.18
    "emann"      0.18
    "rog"        0.18
    "ensive"     0.18
    "ros"        0.17
    "érieur"     0.17
    Activation Density 0.032%

    No Known Activations