INDEX
Explanations
the word "supreme" or related terms, as well as phrases related to authority and power
references to authority and supremacy
Negative Logits
OUT       -0.90
ppo       -0.83
TPS       -0.74
okemon    -0.72
uffy      -0.70
ugg       -0.69
FORE      -0.69
ople      -0.67
zl        -0.66
kson      -0.65
Positive Logits
ly           0.85
rament       0.80
most         0.76
essential    0.72
secrecy      0.70
vigilance    0.70
ITY          0.70
iour         0.69
doms         0.69
reme         0.68
Activations Density: 0.013%