Explanations
phrases related to close relationships or strong emotional connections
instances of the word "best."
Negative Logits
cano        -0.76
zanne       -0.64
abolished   -0.63
outlawed    -0.63
OUR         -0.61
leted       -0.60
umat        -0.57
ller        -0.56
letal       -0.55
zzi         -0.55
Positive Logits
seller      1.14
friend      1.03
iary        1.02
friend      0.98
intentions  0.96
iaries      0.94
Friend      0.91
efforts     0.90
wishes      0.88
instincts   0.87
Activations Density: 0.048%