INDEX
Explanations
references to individuals or groups with specific roles or characteristics
Negative Logits
ifie      -0.15
ially     -0.15
enen      -0.15
inya      -0.14
kk        -0.14
fil       -0.13
ifiable   -0.13
ppers     -0.13
zew       -0.13
isms      -0.13
Positive Logits
own        0.27
soever     0.24
oping      0.21
own        0.21
whose      0.19
whose      0.18
Own        0.18
upon       0.17
commended  0.17
_own       0.16
Activations Density: 0.017%