    NEGATIVE LOGITS
     in            -1.84
     at            -1.63
     so            -1.63
     all           -1.62
     not           -1.54
     Although      -1.53
     on            -1.53
     two           -1.50
     their         -1.48
     just          -1.47
    POSITIVE LOGITS
     différentes    1.65
     verschillende  1.65
     nieuwe         1.48
     refroid        1.42
     différents     1.42
     adatt          1.39
     новую          1.37
     dépens         1.37
    いろんな        1.37
     новых          1.37
    Act Density 0.020%

    No Known Activations