INDEX
Explanations
mentions of mental health and conditions
Head Attr Weights
0:0.03
1:0.00
2:0.11
3:0.41
4:0.05
5:0.02
6:0.03
7:0.04
8:0.08
9:0.07
10:0.07
11:0.05
Negative Logits
—"        -2.00
),"       -1.79
…"        -1.64
Advent    -1.56
ë         -1.53
Martial   -1.52
!),       -1.51
darling   -1.49
..."      -1.49
!"        -1.49
Positive Logits
chars     1.76
SW        1.75
rimp      1.59
**        1.54
lishes    1.52
kit       1.51
captcha   1.50
imeo      1.50
verified  1.48
STEP      1.47
Activations Density: 0.007%