    NEGATIVE LOGITS
    Token            Logit
    " another"       -0.35
    " these"         -0.33
    " Yeo"           -0.33
    "קו"             -0.32
    "eo"             -0.32
    " a"             -0.31
    " cleanup"       -0.31
    " enforced"      -0.30
    " toege"         -0.29
    "ことなく"       -0.29
    POSITIVE LOGITS
    Token            Logit
    " surprised"     1.80
    "surprised"      1.73
    "Surprised"      1.41
    "shocked"        1.23
    " shocked"       1.20
    " surpris"       1.20
    " astonished"    1.14
    " überrascht"    1.09
    " amazed"        1.09
    " sorprend"      1.09
    Activation Density: 0.004%

    No Known Activations