Explanations
names of political figures
specific references to politics and notable figures
Negative Logits
UNCLASSIFIED  -0.86
SPONSORED     -0.79
}.            -0.79
lihood        -0.76
_.            -0.73
>.            -0.72
.).           -0.72
)).           -0.71
:,            -0.68
};            -0.67
Positive Logits
awoke         0.71
unveiled      0.66
prepares      0.66
has           0.64
announced     0.64
watchdog      0.60
finally       0.59
announces     0.56
examines      0.56
continues     0.55
Activation density: 0.885%