INDEX
Explanations
expressions of personal dislike or negative opinions
Negative Logits
bì        -0.15
ings      -0.14
odcast    -0.14
aces      -0.14
ienia     -0.13
leine     -0.13
éĴ        -0.13
insky     -0.13
inking    -0.13
ingham    -0.13
Positive Logits
tog       0.16
toc       0.15
ì¦Ŀ       0.14
iaux      0.14
arth      0.13
Howe      0.13
arta      0.13
ieves     0.13
edir      0.13
_VEC      0.13
Activations Density: 0.138%