Explanations
mentions of authoritative or governmental figures and actions
discussions around decision-making and accountability in public contexts
Negative Logits
/             -0.66
Newsletter    -0.63
taboola       -0.63
respectively  -0.58
Byz           -0.56
ezvous        -0.56
igon          -0.56
resid         -0.55
%);           -0.55
Trog          -0.55
Positive Logits
?!"           0.96
such          0.96
!?"           0.88
blatantly     0.86
suddenly      0.85
?!            0.84
mere          0.84
!?            0.83
solely        0.80
someone       0.79
Activations Density 0.899%