Explanations
Brazilian Portuguese diacritics used in words
the presence of the word "didn't" in various contexts
Negative Logits
hemor       -0.75
mathemat    -0.74
subsequ     -0.72
Seym        -0.70
imitation   -0.69
captives    -0.69
successes   -0.66
Swordsman   -0.65
culprit     -0.65
defeats     -0.65
Positive Logits
ï¸ı         1.03
Balt        0.96
sure        0.94
ï¸          0.93
âĢł         0.87
ski         0.85
own         0.85
âķIJ         0.84
atural      0.82
_>          0.82
Activations Density 0.254%