INDEX
Explanations
references to political parties and figures, as well as societal issues
references to political parties and their influence on societal issues
Negative Logits
Originally    -0.72
raft          -0.70
escription    -0.70
unusual       -0.69
âĨij          -0.65
hatt          -0.65
initially     -0.65
approached    -0.63
aback         -0.63
atta          -0.63
Positive Logits
theirs        0.93
civilized     0.90
negro         0.89
THEIR         0.89
idiots        0.88
tyranny       0.87
patri         0.87
ignor         0.87
goddamn       0.84
trillions     0.84
Activations Density: 1.601%