INDEX
Explanations
phrases related to conflict, controversy, or heated discussions
Negative Logits
Provision      -0.76
sadd           -0.63
Rockets        -0.62
Voy            -0.62
Helm           -0.62
Ethics         -0.61
Sadd           -0.61
Grail          -0.60
Directorate    -0.58
Bottom         -0.58
Positive Logits
usal        1.09
iw          1.06
ocratic     1.06
umin        1.03
pron        1.01
aps         1.00
adic        1.00
ches        0.99
otropic     0.99
erb         0.99
Activations Density 1.975%