INDEX
Explanations
words related to surrendering or giving in
past tense verbs and actions related to causing harm or impact
Negative Logits
vas       -0.66
mania     -0.65
gram      -0.65
find      -0.65
spe       -0.64
fam       -0.63
grad      -0.62
von       -0.61
aceae     -0.61
Kills     -0.61
Positive Logits
oots      0.78
toe       0.70
ears      0.69
kered     0.68
Wrath     0.67
cffffcc   0.66
itors     0.65
goodbye   0.64
tails     0.64
pring     0.64
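The page does not say how these logit lists are computed. A common approach in sparse-autoencoder dashboards, assumed here rather than confirmed by the page, is to project the feature's decoder direction through the model's unembedding matrix and rank the resulting per-token values, taking the most negative and most positive tokens. The sketch below uses plain Python lists; the functions `logit_effects` and `top_logit_tokens`, the `unembed[i][j]` layout, and the toy token names are all hypothetical.

```python
def logit_effects(decoder_vec: list[float],
                  unembed: list[list[float]],
                  vocab: list[str]) -> dict[str, float]:
    """Project a feature's decoder direction onto every vocabulary token.

    Assumed (hypothetical) layout: unembed[i][j] is the weight from
    residual-stream dimension i to token vocab[j].
    """
    return {
        tok: sum(d * unembed[i][j] for i, d in enumerate(decoder_vec))
        for j, tok in enumerate(vocab)
    }


def top_logit_tokens(effects: dict[str, float], k: int = 10):
    """Return (most negative, most positive) (token, value) pairs, k of each."""
    ranked = sorted(effects.items(), key=lambda item: item[1])
    negative = ranked[:k]          # ascending order: most negative first
    positive = ranked[::-1][:k]    # descending order: most positive first
    return negative, positive


# Toy example with made-up weights, not the real model's values.
effects = logit_effects(
    decoder_vec=[1.0, -0.5],
    unembed=[[0.2, -0.4, 0.9],
             [0.6, 0.2, -0.3]],
    vocab=["tok_a", "tok_b", "tok_c"],
)
neg, pos = top_logit_tokens(effects, k=1)
print(neg, pos)
```

With these toy weights, `tok_b` has the most negative effect and `tok_c` the most positive, mirroring how the two lists above would be filled with k = 10.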
Activation Density: 0.058%
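Activation density is the share of sampled tokens on which the feature fires. The sketch below assumes that "fires" means an activation strictly above zero and that density is reported as a percentage; the function name `activation_density` and its `threshold` parameter are hypothetical, not taken from the dashboard.

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy data: 58 active tokens out of 100,000 sampled tokens.
acts = [0.0] * 99_942 + [1.3] * 58
print(activation_density(acts))
```

A density of 0.058% corresponds to roughly 58 activating tokens per 100,000, so this feature fires very rarely.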