INDEX
Explanations
instances of strong emotional expressions or affirmations
Negative Logits
zan       -0.16
yer       -0.16
rang      -0.15
anka      -0.14
ittest    -0.14
dig       -0.14
elman     -0.14
bid       -0.14
utos      -0.13
dem       -0.13
Positive Logits
ussy      0.16
ocket     0.16
624       0.14
yon       0.14
vn        0.14
obus      0.14
ản        0.13
spb       0.13
lyn       0.13
andle     0.13
Activations Density 0.000%