INDEX
Explanations
connections and references to social relationships, particularly involving friends and family
Negative Logits
Token         Logit
friend        -0.40
Friend        -0.40
friend        -0.38
Friend        -0.36
friendship    -0.34
friendships   -0.32
Friends       -0.31
friends       -0.30
friends       -0.29
Friendship    -0.28
Positive Logits
Token         Logit
foes          0.28
aqu           0.25
enemies       0.25
foe           0.23
Enemies       0.23
Aqu           0.21
col           0.21
neighbors     0.20
enemy         0.20
acqu          0.20
Activations Density 0.027%
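
One way to read the density figure, assuming it is the share of sampled token positions on which the feature has a nonzero activation (the page does not define it, so this is an assumption): 0.027% would correspond to roughly 27 active positions per 100,000 tokens. A minimal sketch of that calculation follows. The function name, the threshold parameter, and the sample data are all hypothetical and are not taken from the page.

```python
def activation_density_percent(activations, threshold=0.0):
    """Return the percentage of positions whose activation exceeds threshold.

    Assumption: "density" means the fraction of token positions on which
    the feature is active, expressed as a percentage.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 27 active positions out of 100,000 tokens.
sample = [1.5] * 27 + [0.0] * 99_973
print(f"{activation_density_percent(sample):.3f}%")  # prints 0.027%
```

A feature this sparse is silent on almost all text, so the logit lists above describe its effect only on the small fraction of positions where it is active.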