INDEX
Explanations
references to personal relationships and community connections
Negative Logits
asso      -0.15
onas      -0.15
ulton     -0.15
ertino    -0.15
ounter    -0.15
Rough     -0.14
izzo      -0.14
icers     -0.14
ilver     -0.14
ogui      -0.14
Positive Logits
loved     0.19
friends   0.16
yourself  0.16
someone   0.16
friend    0.16
Loved     0.15
_TBL      0.15
family    0.15
Boss      0.14
Male      0.14
Activations Density: 0.076%