INDEX
Explanations
references to social hierarchy and relationships
Negative Logits
enheim          -0.19
Administrative  -0.17
wrongdoing      -0.16
.documentation  -0.15
Protective      -0.15
Typed           -0.15
folk            -0.15
exceptionally   -0.15
Stevenson       -0.15
deterrent       -0.15
Positive Logits
oe        0.19
publicly  0.18
domest    0.18
Åĵ        0.16
OPS       0.16
late      0.16
ÌĨ        0.15
zel       0.15
british   0.15
uncommon  0.15
Activations Density 0.064%