INDEX
Explanations
words related to destruction and disruption
terms associated with harmful or disruptive forces
Negative Logits
ighed       -0.83
cedented    -0.82
oling       -0.81
igon        -0.77
olina       -0.77
elf         -0.76
puter       -0.76
ribe        -0.75
Veter       -0.75
buck        -0.75
Positive Logits
impulse       1.11
minded        0.99
behav         0.96
impulses      0.93
tendencies    0.87
behavi        0.85
arts          0.80
activity      0.79
tools         0.79
qualities     0.78
Activations Density 0.047%