Explanations
mentions of political figures, particularly senators
references to senators
Negative Logits
sticky            -0.67
civilisation      -0.66
tru               -0.64
theless           -0.64
misunderstanding  -0.64
managerial        -0.63
factor            -0.63
feats             -0.63
tolerance         -0.61
underpin          -0.61
Positive Logits
iors              1.26
eca               1.14
seless            1.00
essee             0.99
escent            0.99
pai               0.98
esse              0.96
iture             0.95
.,                0.91
itive             0.88
Activations Density: 0.018%