INDEX
    Explanations
    Negative Logits
     multit      -0.82
     hors        -0.77
     COP         -0.76
     loudspe     -0.75
     toget       -0.75
     livest      -0.74
     grapp       -0.73
     seiz        -0.73
     sne         -0.73
     videog      -0.72
    Positive Logits
    ma           1.37
    na           1.32
    sa           1.29
    ia           1.23
    aga          1.21
    ya           1.16
    ava          1.12
    amia         1.12
    da           1.12
    ana          1.11
    Act Density 0.229%

    No Known Activations