INDEX
Explanations
Twitter handles preceded by special characters like '@' and dates
the presence of at-signs associated with social media handles
Negative Logits
blers        -0.77
attractions  -0.74
Lans         -0.74
icion        -0.72
criminals    -0.70
omorphic     -0.69
Ĥª           -0.66
treatments   -0.66
Ͻ            -0.64
houses       -0.64
Positive Logits
TPS          0.84
Twe          0.76
76561        0.76
VIDEOS       0.75
Official     0.74
Ùħ           0.69
Tweet        0.68
à¼           0.67
arthed       0.67
âĶľ          0.66
Activations Density 0.033%