INDEX
Explanations
words related to mental conditions or behaviors associated with mental instability
references to insanity or madness
Negative Logits
fman      -0.73
gio       -0.70
rounder   -0.70
AUT       -0.67
iers      -0.67
nan       -0.65
çĦ        -0.65
ributed   -0.65
yer       -0.64
atu       -0.64
Positive Logits
asylum     0.87
agascar    0.81
Asylum     0.77
regress    0.76
insane     0.75
insanity   0.72
rid        0.70
itating    0.69
plea       0.69
abama      0.68
Activations Density 0.030%