Explanations
words and phrases expressing deep emotions and appreciation
Negative Logits
onth      -0.17
akers     -0.16
uz        -0.15
iad       -0.15
acco      -0.14
unic      -0.14
ddit      -0.14
Willow    -0.14
forces    -0.14
åħį       -0.14
Positive Logits
toward    0.25
towards   0.24
affair    0.23
border    0.19
-border   0.19
border    0.18
Towards   0.17
Towards   0.17
hacia     0.17
affinity  0.16
Activations Density: 0.075%