Explanations
personal pronouns followed by actions or attributes in relation to a specific person
references to individuals or questions about identity
NEGATIVE LOGITS
MER       -0.82
³³³³      -0.71
BACK      -0.71
inence    -0.67
Glob      -0.67
âĶģ       -0.66
NB        -0.66
IVE       -0.66
OOL       -0.65
urb       -0.65
POSITIVE LOGITS
soever     0.95
else       0.83
sorts      0.80
redes      0.80
exactly    0.79
vou        0.73
kinds      0.73
happened   0.73
ingred     0.71
exchanged  0.70
Activations Density 0.061%