Explanations
words associated with destruction or damage
Negative Logits
eve        -0.89
birth      -0.78
ORGE       -0.78
women      -0.77
xon        -0.72
meal       -0.71
yip        -0.69
Sakuya     -0.68
hemor      -0.67
masters    -0.66
Positive Logits
anca       1.46
eness      1.10
anches     0.92
anch       0.91
ahn        0.90
ack        0.88
anc        0.87
ossom      0.86
ilty       0.86
oks        0.85
Activations Density: 0.004%