Explanations
Twitter usernames
mentions of Twitter usernames or handles
Negative Logits
Enabled      -0.71
LIMITED      -0.71
ACTIONS      -0.68
CONTROL      -0.67
Islamists    -0.66
AFB          -0.63
FAC          -0.61
Ninth        -0.61
excess       -0.60
Scheme       -0.59
Positive Logits
rentice      0.93
yp           0.92
veyard       0.88
_            0.83
rick         0.82
CBC          0.79
Bow          0.79
onge         0.78
anie         0.77
neys         0.77
Activation density: 0.088%
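Activation density is read here as the share of tokens on which the feature fires, so 0.088% corresponds to roughly 88 active tokens per 100,000. The dashboard does not state its activation threshold or the token sample it measured, so the sketch below is an assumption: it counts a token as active when the activation is strictly above 0.0, and the helper name `activation_density` and the sample data are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption (not taken from the dashboard): a feature counts as active
    on a token when its activation is strictly greater than `threshold`.
    """
    if not activations:
        raise ValueError("need at least one token activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 100,000 tokens, 88 of which activate the feature.
acts = [0.0] * 100_000
for i in range(88):
    acts[i * 1000] = 1.5  # distinct positions 0, 1000, ..., 87000

density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.088%
```

A density this low means the feature is silent on almost all text, which fits an explanation as narrow as Twitter usernames.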