INDEX
Explanations
references to mental health issues, particularly depression and anxiety
Negative Logits
anning        -0.17
åĪ©           -0.16
osy           -0.15
адж           -0.14
ãĥ³ãĤ¹        -0.14
tparam        -0.14
ãĤ¼          -0.14
ãĥ¬ãĥĥãĥĪ     -0.14
κι            -0.14
asant         -0.14
Positive Logits
mood          0.18
Sad           0.16
Depression    0.15
depressive    0.15
ductive       0.15
Mood          0.15
/an           0.14
Factory       0.14
antidepress   0.14
é¡Ķ           0.14
Activations Density 0.074%