Explanations
words and phrases associated with political actions and statements
Negative Logits
iple       -0.96
raft       -0.65
Cros       -0.64
atars      -0.63
uckland    -0.63
utic       -0.62
Shard      -0.62
uber       -0.61
Hatt       -0.61
Cube       -0.60
Positive Logits
thereby        1.11
including      0.97
whereas        0.97
perpetrated    0.96
despite        0.95
citing         0.95
while          0.95
aka            0.94
which          0.94
lest           0.93
Activations Density: 0.296%