INDEX
Explanations
pronouns saying or thinking
Negative Logits
lika       -1.19
Cinta      -1.13
Moda       -1.12
bilden     -1.12
klaus      -1.07
plafon     -1.07
rah        -1.07
(blank)    -1.06
Aktu       -1.06
tarif      -1.06
Positive Logits
said       1.62
says       1.55
say        1.16
should     1.13
noted      1.07
And        1.00
he         0.97
に来て      0.96
Note       0.94
spune      0.92
Activations Density: 0.011%
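
For readers unfamiliar with the density figure, here is a minimal sketch of how such a number could be computed. It assumes that density means the percentage of sampled tokens on which the feature's activation is above zero; the dashboard itself does not state its definition. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly above `threshold`.

    Assumption: "density" is the share of sampled tokens on which the
    feature fires at all. This definition is not confirmed by the source.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: the feature fires on 11 of 100,000 sampled tokens.
sample = [0.0] * 99_989 + [1.0] * 11
print(f"{activation_density(sample):.3f}%")  # prints 0.011%
```

Under this reading, a density of 0.011% corresponds to roughly 11 active tokens per 100,000, which would make this a sparse feature.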