INDEX
Explanations
Words indicating positive feelings or states of being
Negative Logits
Token     Logit
onest     -0.15
urette    -0.14
las       -0.14
العم      -0.14
Wass      -0.14
anou      -0.14
τέ        -0.14
'gc       -0.14
емо       -0.14
estroy    -0.14
Positive Logits
Token     Logit
igu        0.17
据         0.16
inka       0.15
_BT        0.15
bearing    0.15
bear       0.15
EEK        0.15
-toast     0.14
icina      0.14
lij        0.14
Activation density: 0.002%