INDEX
Explanations
names of political figures, organizations, and events
proper nouns related to news and political figures
Negative Logits
Token         Logit
)--           -0.56
''.           -0.55
......        -0.54
.''           -0.53
....          -0.53
utterstock    -0.53
}.            -0.51
)?            -0.51
Miami         -0.51
....          -0.50
Positive Logits
Token         Logit
seless        0.59
atism         0.51
osite         0.47
"#            0.46
uin           0.46
nered         0.46
hotline       0.46
cheat         0.46
TextColor     0.46
acial         0.46
Activations Density: 1.232%