INDEX
Explanations
references to professional or social relationships among individuals
Negative Logits
Pruitt    -0.16
ople      -0.15
dden      -0.14
andle     -0.14
arden     -0.14
kara      -0.14
orida     -0.14
izzo      -0.14
oucher    -0.13
exter     -0.13
Positive Logits
hood      0.21
hip       0.20
ship      0.20
ships     0.18
们        0.17
whom      0.17
/part     0.16
/op       0.16
innen     0.15
hips      0.15
Activations Density: 0.120%