Explanations
mentions of friends and family
Negative Logits
Friend         -0.32
Friend         -0.30
friend         -0.29
friend         -0.28
friendship     -0.27
Friendship     -0.24
friendships    -0.23
Friends        -0.21
_friend        -0.20
друж           -0.20
Positive Logits
foes           0.31
enemies        0.29
acquaint       0.28
neighbors      0.27
Enemies        0.27
neighbours     0.25
colleagues     0.25
foe            0.24
associates     0.24
relatives      0.24
Activations Density 0.025%