INDEX
Explanations
words related to mental health conditions, especially anxiety
Negative Logits
endor        -0.85
estone       -0.80
anmar        -0.78
ingen        -0.78
nice         -0.72
estones      -0.70
sites        -0.69
announced    -0.67
ded          -0.67
eve          -0.67
Positive Logits
disorders    1.08
disorder     1.00
provoking    0.96
Disorders    0.94
sickness     0.83
inducing     0.81
illness      0.81
symptoms     0.80
anxiety      0.80
xiety        0.80
Activations Density: 0.031%