Explanations
adjectives related to violence and intense negative actions
references to "vicious" behavior or cycles in various contexts
Negative Logits
mberg     -0.90
ITNESS    -0.85
ylon      -0.78
ittee     -0.77
agine     -0.77
ovember   -0.77
ourced    -0.76
ĨĴ        -0.74
avez      -0.73
aintain   -0.72
Positive Logits
ly            1.36
nesses        0.91
ness          0.91
vicious       0.81
retribution   0.79
circle        0.74
icious        0.73
assault       0.73
iously        0.71
aggregation   0.70
Activations Density: 0.025%