INDEX
    Explanations
    Negative Logits
     binds         -0.06
     trainers      -0.06
                   -0.06
     reproduce     -0.06
     italiane      -0.06
    Heading        -0.06
     showed        -0.06
    _as            -0.06
     Find          -0.06
     usando        -0.06
    Positive Logits
    Cls            0.07
    νού            0.07
    fred           0.07
     period        0.06
     Brotherhood   0.06
    CHAT           0.06
     periods       0.06
    -Петерб        0.06
     Sa            0.06
    -FIRST         0.06
    Act Density 0.011%

    No Known Activations