INDEX
Explanations
phrases related to political matters and discourses
Negative Logits
479         -0.13
ogl         -0.13
awan        -0.13
£p          -0.13
Assert      -0.13
enso        -0.13
Ñĩа         -0.13
ưá»Ŀng      -0.13
asta        -0.13
elog        -0.13
Positive Logits
objective    0.57
neutral      0.54
impartial    0.51
unbiased     0.48
neutrality   0.47
objective    0.46
Neutral      0.45
-neutral     0.45
neutral      0.45
Neutral      0.44
Activations Density 0.360%
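
The density figure above is presumably the fraction of sampled tokens on which this feature fires. The page does not define it, so the sketch below assumes a token counts as active when its activation is strictly greater than zero. The function name `activation_density` and the sample data are hypothetical and chosen only to illustrate the arithmetic.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "active" means activation > threshold (0.0 by default).
    """
    if not activations:
        raise ValueError("activations must not be empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 9 active tokens out of 2500 (not data from this page).
sample = [0.0] * 2491 + [1.2] * 9
density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.360%"
```

Under this assumption, a density of 0.360% means the feature is active on roughly 36 of every 10,000 tokens.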