INDEX
Explanations
mentions of political figures and government bodies
references to political entities, figures, and related terminology
Negative Logits
Brow          -0.49
seismic       -0.48
imensional    -0.48
Era           -0.48
Merit         -0.48
vortex        -0.47
Place         -0.47
Amen          -0.47
oven          -0.47
Filipino      -0.47
Positive Logits
tracks        0.68
glers         0.67
milo          0.67
imaru         0.63
issance       0.63
alike         0.62
enance        0.60
retty         0.59
stice         0.57
itiz          0.57
Activations Density: 0.881%