Explanations
phrases related to accountability and public scrutiny of figures in positions of power
Negative Logits
erot         -0.17
Animate      -0.16
tel          -0.15
hol          -0.15
Animalia     -0.15
lg           -0.14
Inlining     -0.14
entai        -0.14
ham          -0.14
Marcos       -0.14
Positive Logits
ulet          0.16
bond          0.15
_ASC          0.14
ств           0.14
::$           0.14
abee          0.14
_conditions   0.14
δρο           0.13
elocity       0.13
altern        0.13
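The page does not say how these token lists were produced. Dashboards of this kind commonly rank vocabulary tokens by projecting a feature's decoder direction onto the model's unembedding matrix and keeping the most positive and most negative scores. The pure-Python sketch below assumes that method; the function names `feature_logit_weights` and `top_and_bottom`, the row/column layout of the unembedding, and the toy inputs are all hypothetical.

```python
from typing import List, Sequence, Tuple


def feature_logit_weights(
    decoder_dir: Sequence[float],        # assumed: the feature's decoder vector (length d_model)
    unembed: Sequence[Sequence[float]],  # assumed: W_U stored as d_model rows x vocab_size columns
) -> List[float]:
    """Dot the decoder direction with every unembedding column, one score per token."""
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[r] * unembed[r][t] for r in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_and_bottom(
    weights: Sequence[float], vocab: Sequence[str], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k highest-scoring and k lowest-scoring (token, score) pairs."""
    pairs = list(zip(vocab, weights))
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]  # largest first
    negative = sorted(pairs, key=lambda p: p[1])[:k]                 # most negative first
    return positive, negative


if __name__ == "__main__":
    # Toy 2-dimensional model with a 3-token vocabulary (hypothetical numbers).
    w = feature_logit_weights([1.0, 0.0], [[0.5, -0.2, 0.1], [9.0, 9.0, 9.0]])
    print(top_and_bottom(w, ["a", "b", "c"], 1))
```

Under this assumption, a score near ±0.15, as in the lists above, means the feature nudges that token's logit only slightly per unit of activation.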
Activation density: 0.008%
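Activation density is usually read as the share of sampled token positions on which the feature fires. The sketch below assumes "fires" means an activation strictly above a threshold of 0.0; both that threshold and the helper names `activation_density` and `format_density` are hypothetical, not taken from the dashboard. At 0.008%, the feature would fire on roughly 8 of every 100,000 tokens.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold` (assumed 0.0)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


def format_density(density: float) -> str:
    """Render a fraction as a percentage with three decimals, e.g. 8e-05 -> '0.008%'."""
    return f"{density * 100:.3f}%"


if __name__ == "__main__":
    # Hypothetical sample: 8 active positions out of 100,000.
    acts = [1.0] * 8 + [0.0] * 99_992
    print(format_density(activation_density(acts)))
```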