Explanations
mentions of friendships or friendly interactions
mentions of friendship
Negative Logits
tarians     -0.69
aeda        -0.65
ournal      -0.62
Analysis    -0.61
untreated   -0.61
chloride    -0.59
acent       -0.58
ERO         -0.57
heel        -0.57
ilion       -0.55
Positive Logits
liest       1.67
liness      1.66
lier        1.66
lies        1.53
ship        1.21
ships       1.18
finder      0.92
hips        0.88
hetical     0.84
nee         0.83
Activations Density: 0.040%