Explanations
instances of emotional expressions or reactions
NEGATIVE LOGITS
Token       Logit
avoidance   -0.76
Suk         -0.63
pressures   -0.61
Companion   -0.61
Freed       -0.58
Drawn       -0.58
Mole        -0.58
Hug         -0.57
Blend       -0.56
ABE         -0.56
POSITIVE LOGITS
Token       Logit
ever        0.92
ttle        0.89
ï¸ı         0.87
lee         0.86
ggle        0.84
athom       0.83
reci        0.81
conom       0.80
etary       0.78
ufact       0.78
Activations Density: 0.139%