INDEX
Explanations
positive emotional words or sentiments, particularly focused on love and affection
references to emotions, particularly related to hearts and feelings
Negative Logits
ression      -0.67
Recomm       -0.66
Delivery     -0.65
ressive      -0.63
ECH          -0.62
ron          -0.62
rav          -0.60
redo         -0.59
Relations    -0.58
elim         -0.58
Positive Logits
chool        1.57
paces        1.49
pace         1.37
mith         1.35
creen        1.33
ystem        1.26
hips         1.26
pring        1.26
ynthesis     1.20
hip          1.19
Activations Density: 0.086%