INDEX
Explanations
References to and quotes from news articles, particularly those about government actions and political figures
Negative Logits
tty        -0.77
gone       -0.73
course     -0.72
xual       -0.71
pants      -0.69
gaard      -0.68
ment       -0.66
lift       -0.64
changed    -0.64
abiding    -0.63
Positive Logits
TN         1.09
PLE        0.95
PLIED      0.94
PLA        0.92
OE         0.90
PLIC       0.90
Photo      0.84
PROV       0.84
ocalypse   0.84
osta       0.83
Activations Density: 3.161%