INDEX
Explanations
references to social structures and interactions
Negative Logits
aming        -0.17
amed         -0.16
oure         -0.15
afil         -0.15
ibur         -0.15
usi          -0.15
aris         -0.14
çak          -0.14
ufe          -0.14
906          -0.14
Positive Logits
interacting   0.16
leme          0.15
fabric        0.15
loo           0.15
Thrown        0.15
.direct       0.14
otherwise     0.14
interaction   0.14
desc          0.14
&q            0.14
Activations Density: 0.009%