Explanations
words and phrases indicating emotional states or reactions
Negative Logits
ear         -0.16
isser       -0.15
δο          -0.14
fantasy     -0.14
enti        -0.14
зв          -0.14
caller      -0.14
ासन         -0.13
Opens       -0.13
encoding    -0.13
Positive Logits
ood         0.17
edia        0.15
odpad       0.15
jeta        0.15
ость        0.15
Buchanan    0.14
insula      0.14
ρά          0.14
amarin      0.14
ield        0.14
Activations Density 0.001%