INDEX
Explanations
words related to being harmed or criticized
words related to harm or injury
Negative Logits
enhagen         -0.81
UX              -0.63
aceae           -0.63
DragonMagazine  -0.63
Dragonbound     -0.62
leck            -0.61
Highlands       -0.61
Integrity       -0.61
sylv            -0.61
ynski           -0.57
Positive Logits
ried    1.11
ped     1.10
ping    1.07
red     1.00
bled    0.99
med     0.97
ved     0.96
ked     0.95
ivated  0.94
gered   0.92
Activations Density: 0.146%