Explanations
expressions of affection and emotional attachment
Negative Logits
Token      Logit
enburg     -0.19
uz         -0.18
enberg     -0.17
iad        -0.15
akers      -0.15
iras       -0.15
allen      -0.15
unic       -0.15
chez       -0.14
mouseup    -0.14
Positive Logits
Token      Logit
toward     0.29
towards    0.28
border     0.28
-border    0.26
border     0.22
affair     0.22
Border     0.22
shown      0.20
_border    0.19
Towards    0.19
Activations Density 0.077%