Explanations
words related to mental health conditions and experiences such as dissociation, identity confusion, and memory alteration
Negative Logits
inery      -0.78
©¶æ¥µ      -0.74
abba       -0.71
pione      -0.68
abor       -0.67
iHUD       -0.65
20439      -0.64
alde       -0.64
ullivan    -0.63
Ãį         -0.63
Positive Logits
?          1.99
?:         1.89
?'         1.86
?"         1.86
?)         1.79
?),        1.76
?",        1.75
?).        1.74
?!         1.72
?".        1.69
Activations Density 1.748%