INDEX
Explanations
expressions related to self-harm and suicide
Head Attr Weights
0:0.02
1:0.02
2:0.09
3:0.09
4:0.02
5:0.03
6:0.05
7:0.10
8:0.16
9:0.17
10:0.05
11:0.13
Negative Logits
taboola:-1.32
glers:-1.27
Presents:-1.23
ciplinary:-1.22
CLUD:-1.15
published:-1.12
lished:-1.08
spect:-1.06
══:-1.05
Statement:-1.03
Positive Logits
chunk:1.15
unborn:1.15
invaders:1.10
ego:1.08
damn:1.08
dro:1.06
crap:1.06
shit:1.05
pesky:1.04
goddamn:1.03
Activations Density 0.042%