INDEX
Explanations
words related to expressions of affection
references to affection and emotional connection
Negative Logits
ulhu         -0.75
ramid        -0.67
DoS          -0.64
Somali       -0.64
Worlds       -0.63
Regulatory   -0.62
Hass         -0.62
akedown      -0.61
ÄŁ           -0.61
å¸           -0.61
Positive Logits
ately        1.45
uously       1.07
ate          1.06
iously       1.03
uous         1.03
acy          1.03
affection    0.98
naire        0.95
atile        0.93
atural       0.93
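The two logit lists above rank vocabulary tokens by how strongly the feature pushes them up or down. A minimal sketch of how such lists are typically produced follows. It assumes, as is common for sparse-autoencoder dashboards, that the values come from projecting the feature's decoder direction through the model's unembedding matrix. The function name `top_logit_effects` and the toy matrices are hypothetical and for illustration only.

```python
import numpy as np

def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Hypothetical helper: project one feature's decoder direction through
    the unembedding matrix and return the k most negative and k most
    positive (token, value) pairs.

    Assumption: decoder_dir has shape (d_model,) and W_U has shape
    (d_model, vocab_size), so their product is one value per token.
    """
    logits = decoder_dir @ W_U  # shape (vocab_size,)
    order = np.argsort(logits)  # token indices sorted by ascending value
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example: a 2-dimensional residual stream and a 4-token vocabulary.
decoder_dir = np.array([1.0, 0.0])
W_U = np.array([[0.5, -0.2, 1.5, 0.0],
                [9.0, 9.0, 9.0, 9.0]])
vocab = ["a", "b", "c", "d"]
neg, pos = top_logit_effects(decoder_dir, W_U, vocab, k=2)
print(neg)  # [('b', -0.2), ('d', 0.0)]
print(pos)  # [('c', 1.5), ('a', 0.5)]
```

Sorting the full vocabulary once and slicing both ends keeps the negative and positive lists consistent with each other; for very large vocabularies, `np.argpartition` could avoid the full sort.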
Activation Density: 0.017%
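Activation density is read here as the percentage of token positions on which the feature fires, so 0.017% corresponds to roughly 17 activating tokens per 100,000. The sketch below assumes "fires" means an activation strictly above a threshold of 0; the function name `activation_density` and the threshold default are hypothetical.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Hypothetical helper: percentage of token positions whose feature
    activation is strictly greater than `threshold` (assumed to be 0)."""
    acts = np.asarray(acts)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# 17 active positions out of 100,000 tokens.
acts = np.zeros(100_000)
acts[:17] = 1.0
print(activation_density(acts))  # approximately 0.017
```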