INDEX
    Explanations

    words related to criticism, evaluation, information sharing, and knowledge

    Negative Logits
    Token          Logit
    " ("           -0.83
    "."            -0.82
    (not shown)    -0.82
    ","            -0.80
    "↵↵"           -0.80
    " in"          -0.80
    (not shown)    -0.78
    ";"            -0.74
    " ."           -0.73
    " -"           -0.73

    Positive Logits
    Token          Logit
    " milano"      1.95
    " marcato"     1.89
    " dispen"      1.88
    " tremb"       1.83
    " pessi"       1.82
    " nutr"        1.82
    " ritard"      1.82
    " doman"       1.82
    " igno"        1.80
    " napoli"      1.79
    Activation Density 0.074%

    No Known Activations