INDEX
Explanations
strong and impactful words
terms associated with manipulation, control, and centralization of power or policies
Negative Logits
OTOS         -0.67
Annotations  -0.58
bsite        -0.57
scanned      -0.55
guy          -0.53
cyclopedia   -0.52
nudity       -0.51
HOME         -0.50
Bucks        -0.50
photograp    -0.50
Positive Logits
ibly         0.74
polit        0.73
otent        0.72
emonic       0.71
itatively    0.70
arious       0.70
iously       0.70
iable        0.70
ulative      0.68
iably        0.66
Activations Density 0.489%