INDEX
Explanations
references to psychological disorders and their associated characteristics
Negative Logits
Token     Logit
un        -0.77
pu        -0.75
,         -0.74
.         -0.72
no        -0.70
to        -0.68
ha        -0.68
in        -0.68
bu        -0.67
so        -0.65
Positive Logits
Token     Logit
Monfieur  1.64
Jefus     1.59
myſelf    1.57
Efq       1.51
Houſe     1.49
itſelf    1.49
Majefty   1.48
ſelf      1.44
himſelf   1.43
Eſ        1.41
Activations Density 0.828%