    NEGATIVE LOGITS
    ","           0.52
    "/"           0.44
    " somewhat"   0.44
    "."           0.43
    " or"         0.43
    ")."          0.43
    " type"       0.42
    " helpful"    0.42
    "_"           0.42
    " useful"     0.42
    POSITIVE LOGITS
    " Każ"        0.66
    " każdej"     0.56
    " Ogni"       0.55
    " Every"      0.53
    " YOU"        0.53
    " Abbiamo"    0.53
    " Think"      0.52
    " NOTHING"    0.52
    " Siamo"      0.52
    " Estamos"    0.51
    Activation Density: 0.003%

    No Known Activations