Explanations
instances of emotional or subjective expressions
Negative Logits
Affero      -0.16
rawler      -0.15
ogan        -0.15
plusplus    -0.14
ylie        -0.14
žil         -0.14
à¸ŀร        -0.14
à¹Ģà¸Ĥà¸ķ   -0.14
ÏĥÏĦά       -0.14
žel         -0.14
Positive Logits
idor        0.16
Tib         0.16
sensor      0.15
quality     0.15
üre         0.14
typical     0.14
èĪ          0.14
bare        0.14
sf          0.14
aison       0.14
Activations Density: 0.006%