INDEX
    Explanations
    Negative Logits
    Token             Logit
    "abler"           -0.16
    "ushman"          -0.15
    "aub"             -0.15
    "ivre"            -0.14
    "ôi"              -0.14
    "akespeare"       -0.14
    "145"             -0.14
    "edin"            -0.14
    "óln"             -0.14
    ".framework"      -0.13
    Positive Logits
    Token                   Logit
    "ostat"                 0.17
    " spl"                  0.16
    "IGHL"                  0.15
    " sple"                 0.15
    "longleftrightarrow"    0.14
    "ยม"                    0.14
    "rial"                  0.14
    "cyan"                  0.14
    "rig"                   0.14
    "oodoo"                 0.13
    Activation Density: 0.095%

    No Known Activations