INDEX
Explanations
phrases indicating feelings of exclusion or inadequacy
Negative Logits
duk          -0.15
agens        -0.15
iams         -0.15
cak          -0.15
inus         -0.14
icari        -0.14
Kostenlose   -0.14
umd          -0.14
íĻį          -0.14
reeze        -0.14
Positive Logits
even         0.16
RING         0.16
even         0.15
ountain      0.14
Hick         0.14
sometimes    0.14
Neu          0.14
cav          0.14
cred         0.13
-basket      0.13
Activations Density: 0.037%