INDEX
Explanations
references to seeking or enacting revenge
terms related to revenge and vengeance
Negative Logits
livest        -0.87
uddle         -0.76
ographics     -0.74
anian         -0.67
Transition    -0.66
hover         -0.64
regulated     -0.64
66666666      -0.63
kered         -0.63
interchange   -0.61
Positive Logits
against       1.22
Against       1.05
against       1.05
retaliation   1.04
revenge       1.02
vengeance     0.95
retali        0.95
retribution   0.95
Against       0.93
repr          0.91
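The page does not say how the logit values above are produced. A common approach for features like this one is to dot the feature's output direction with each token's unembedding vector and list the extremes. The sketch below illustrates that idea under that assumption only; the vectors, the token subset, and the function names are hypothetical and do not reproduce the numbers listed here.

```python
def logit_effects(direction, unembed):
    """Dot a feature direction with each token's unembedding vector.

    Both arguments are hypothetical stand-ins for real model weights.
    """
    return {
        token: sum(d * w for d, w in zip(direction, vector))
        for token, vector in unembed.items()
    }


def top_and_bottom(effects, k):
    """Return the k most positive and the k most negative (token, value) pairs."""
    ranked = sorted(effects.items(), key=lambda pair: pair[1])
    most_positive = ranked[-k:][::-1]  # largest value first
    most_negative = ranked[:k]         # smallest value first
    return most_positive, most_negative


# Toy 2-dimensional weights, invented for illustration.
direction = [1.0, -0.5]
unembed = {
    "revenge": [1.0, 0.0],
    "against": [1.0, -0.5],
    "hover": [-0.5, 0.5],
    "regulated": [-0.25, 0.75],
}

effects = logit_effects(direction, unembed)
positive, negative = top_and_bottom(effects, k=2)
print(positive)
print(negative)
```

With real weights the same ranking step would run over the full vocabulary, and k would be 10 to match the lists on this page.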
Activations Density 0.075%
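The page does not define the density figure. One plausible reading is the share of token positions on which the feature fires at all, expressed as a percentage. The sketch below assumes that definition; the function name, the threshold, and the sample data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of positions whose activation exceeds the threshold.

    Assumes "density" means the fraction of active positions; the
    threshold of 0.0 is an illustrative choice, not taken from the page.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for value in activations if value > threshold)
    return 100.0 * active / len(activations)


# Invented sample: 3 active positions out of 4000.
sample = [0.0] * 3997 + [1.3, 0.4, 2.1]
density = activation_density(sample)
print(f"{density:.3f}%")
```

Under this reading, a density of 0.075% would mean the feature is active on roughly 3 of every 4000 tokens.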