Explanations
references to the concept of "friend"
references to friendships or mentions of friends
Negative Logits
Token      Logit
aeda       -0.82
tarians    -0.80
cale       -0.79
ceilings   -0.78
chloride   -0.70
overe      -0.67
plets      -0.67
mberg      -0.66
elsius     -0.65
Frames     -0.62
Positive Logits
Token      Logit
lier       1.31
liness     1.28
lies       1.19
liest      1.15
friend     0.94
ship       0.92
ships      0.88
hesis      0.85
finder     0.84
hetical    0.79
Activations Density: 0.038%