Explanations
words and phrases that convey friendliness and positive social interactions
Negative Logits
Ìĥ            -0.16
ãĤ¤ãĤ¯        -0.15
cpp           -0.15
ngo           -0.14
greed         -0.14
nam           -0.14
oen           -0.14
iao           -0.14
.Throw        -0.13
ngth          -0.13
Positive Logits
inkel         0.19
assen         0.18
yyyy          0.16
enough        0.15
iswa          0.15
liness        0.15
lier          0.15
ãĥ¶           0.15
ness          0.14
WithEvents    0.14
Activations Density: 0.023%