INDEX
Explanations
political figures and government-related terms
proper nouns, particularly names of people and organizations
Negative Logits
Token    Logit
$.       -0.62
".       -0.58
}.       -0.58
".       -0.57
''.      -0.53
").      -0.51
.).      -0.50
.</      -0.49
().      -0.49
).       -0.48
Positive Logits
Token           Logit
spokesman       0.69
spokeswoman     0.67
spokesperson    0.60
countered       0.57
meanwhile       0.57
reacted         0.56
tweeted         0.52
echoed          0.52
commented       0.51
cautioned       0.51
Activations Density: 1.001%