Explanations
words related to policies, regulations, and societal issues
Negative Logits
Token       Logit
Typhoon     -0.65
Hilton      -0.63
Rhodes      -0.61
Pwr         -0.61
Polk        -0.61
Limbaugh    -0.60
Arpaio      -0.58
giveaway    -0.58
Sec         -0.57
RPM         -0.55
Positive Logits
Token       Logit
selves      1.35
pecially    0.94
lightly     0.92
self        0.90
atisf       0.87
own         0.86
selves      0.86
belongings  0.86
avior       0.84
ELF         0.84
Activations Density 0.106%
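
An activation density figure like the one above is usually read as the share of sampled token positions on which the feature has a nonzero activation. The following is a minimal sketch under that assumption; the function name `activation_density`, the `threshold` parameter, and the toy activations are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the fraction of sampled tokens on which
    the feature fires, expressed as a percentage.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy data (hypothetical): the feature fires on 1 of 1000 token positions.
toy = [0.0] * 999 + [2.5]
density = activation_density(toy)
print(f"{density:.3f}%")
```

On this reading, a density of 0.106% would correspond to the feature firing on roughly one token in every thousand sampled.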