INDEX
Explanations
references to mental states or psychological conditions
Negative Logits
gis         -0.15
stiff       -0.15
discomfort  -0.15
urv         -0.14
nesty       -0.14
GIS         -0.14
dolor       -0.14
sting       -0.14
ustom       -0.14
ibble       -0.14
Positive Logits
insanity  0.49
madness   0.48
insane    0.45
crazy     0.45
lun       0.45
mad       0.44
mad       0.43
craz      0.42
Crazy     0.42
sanity    0.40
Activations Density: 0.397%