Explanations
Twitter handles and related metadata in text data
mentions of social media accounts or handles
Negative Logits
ATIVE  -0.77
CONTROL  -0.76
IZE  -0.74
INESS  -0.74
Gamble  -0.73
▬  -0.71
idates  -0.70
LIFE  -0.70
ingham  -0.68
Bunny  -0.68
Positive Logits
cd  1.02
cs  1.00
fp  0.93
cc  0.92
ucl  0.92
isoft  0.91
nm  0.91
ickets  0.90
gs  0.90
pd  0.89
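Dashboards of this kind commonly build the two logit lists above by projecting the feature's decoder direction through the model's unembedding matrix, then keeping the most negative and most positive entries. The sketch below assumes that definition. `decoder_dir`, `W_U`, `vocab`, and the `tok_*` names are hypothetical toy stand-ins, not this feature's actual weights or tokens.

```python
import numpy as np

def top_logit_tokens(decoder_dir, W_U, vocab, k=10):
    """Return the k most negative and k most positive (token, score) pairs.

    Assumption: each token's score is decoder_dir @ W_U, i.e. the feature's
    direct effect on the output logits (a logit-lens view of the decoder vector).
    """
    scores = decoder_dir @ W_U          # one score per vocabulary entry
    order = np.argsort(scores)          # indices sorted by ascending score
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example: d_model = 2 and a hypothetical 4-token vocabulary.
vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]
W_U = np.array([[1.0, 0.4, -0.7, -0.2],
                [0.0, 0.5, 0.0, -0.6]])
decoder_dir = np.array([1.0, 1.0])
neg, pos = top_logit_tokens(decoder_dir, W_U, vocab, k=2)
print(neg)
print(pos)
```

Sorting the full score vector once and slicing both ends keeps the two lists consistent with each other; for a real vocabulary of tens of thousands of tokens, `np.argpartition` would avoid the full sort.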
Activation Density: 0.117%
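A density of 0.117% means the feature fires on roughly one token in 850 of the sampled text. The sketch below assumes density is the fraction of sampled tokens whose activation is above zero. The zero threshold and the toy activation array are illustrative assumptions, not the dashboard's documented procedure.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold`.

    `activations` holds one activation value per sampled token; treating
    "active" as "> 0" is an assumption made for this sketch.
    """
    activations = np.asarray(activations)
    return float(np.mean(activations > threshold))

# Toy sample: 3 of 2,000 tokens fire, giving a density of 0.15%.
acts = np.zeros(2000)
acts[[10, 500, 1999]] = [0.8, 1.3, 0.2]
print(f"{activation_density(acts):.3%}")  # 0.150%
```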