INDEX
Explanations
references to suicide
references to suicide and related concepts
Negative Logits
Token      Logit
heny       -0.80
asonic     -0.76
Avg        -0.75
Collider   -0.72
artisan    -0.69
Stud       -0.68
Phar       -0.68
rium       -0.67
aunder     -0.66
uncture    -0.66
Positive Logits
Token      Logit
suicide    1.12
bomber     1.11
zai        1.03
bombers    1.01
itated     0.99
itating    0.97
icide      0.95
pact       0.87
icides     0.86
ide        0.86
Activations Density 0.025%