Explanations
names of political figures and representatives
Negative Logits
eric          -0.69
fields        -0.67
lag           -0.66
ultimate      -0.65
vered         -0.63
ADS           -0.60
conver        -0.60
rave          -0.60
icing         -0.58
fortun        -0.58
Positive Logits
asse           0.78
artment        0.72
esson          0.70
Rodham         0.69
ornings        0.69
avanaugh       0.68
ENN            0.68
bernatorial    0.68
ãģ®å®          0.66
Kavanaugh      0.65
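In SAE feature dashboards, logit lists like these are commonly produced by multiplying the feature's decoder direction by the model's unembedding matrix and reporting the lowest and highest entries. The sketch below assumes that procedure; the helper `top_logit_effects`, the toy matrix `W_U`, and the four-token vocabulary are hypothetical and not taken from this page.

```python
import numpy as np


def top_logit_effects(w_dec_row, W_U, vocab, k=10):
    """Project one feature's decoder direction through the unembedding.

    Assumption: the dashboard's logit scores are w_dec_row @ W_U.
    w_dec_row: (d_model,) decoder vector for a single feature
    W_U:       (d_model, d_vocab) unembedding matrix
    vocab:     list of d_vocab token strings
    Returns (negative, positive) lists of (token, score) pairs,
    each ordered from the most extreme score inward.
    """
    scores = np.asarray(w_dec_row) @ np.asarray(W_U)  # (d_vocab,)
    order = np.argsort(scores)  # indices sorted by ascending score
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return negative, positive


# Hypothetical toy model: d_model = 3, d_vocab = 4.
W_U = np.array([[1.0, 0.0, 0.0, 1.0],
                [0.0, 1.0, 0.0, 1.0],
                [0.0, 0.0, 1.0, -1.0]])
vocab = ["a", "b", "c", "d"]
w_dec_row = np.array([0.5, -0.2, 0.1])

neg, pos = top_logit_effects(w_dec_row, W_U, vocab, k=2)
print("Negative Logits:", neg)
print("Positive Logits:", pos)
```

Here the scores are roughly 0.5, -0.2, 0.1 and 0.2 for tokens a to d, so the negative list starts with "b" and the positive list starts with "a". On a real model the same call would use the full unembedding and vocabulary, with `k=10` matching the ten entries shown per list above.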
Activation Density: 0.008%
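Activation density is usually reported as the fraction of sampled tokens on which the feature fires. Assuming "fires" means an activation strictly above zero, a density of 0.008% corresponds to about 8 active tokens per 100,000. The helpers `activation_density` and `format_density` and the synthetic activation array below are hypothetical illustrations, not this dashboard's code.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold`.

    Assumption: density counts activations strictly greater than zero.
    """
    acts = np.asarray(acts)
    return float(np.mean(acts > threshold))


def format_density(density):
    """Render a fraction as a percentage with 3 significant digits."""
    return f"{density * 100:.3g}%"


# Synthetic example: the feature is active on 8 of 100,000 tokens.
acts = np.zeros(100_000)
acts[:8] = 1.0
print(format_density(activation_density(acts)))  # prints 0.008%
```

A very low density like this one is typical of narrowly scoped features; it also means the logit lists describe a direction that is rarely used in practice.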