INDEX
Explanations
social media platform names
mentions of social media platforms, particularly Facebook and Twitter
Negative Logits
stood    -0.72
downt    -0.66
shall    -0.61
Spac     -0.59
rano     -0.59
Gibbs    -0.58
Shore    -0.58
DEM      -0.57
rouse    -0.57
inav     -0.57
Positive Logits
ileaks       0.87
[missing]    0.86
[missing]    0.79
[missing]    0.79
PHOTO        0.78
Features     0.77
Messenger    0.76
uador        0.72
[missing]    0.71
[missing]    0.71
Activations Density: 0.033%
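
The density figure above can be read as the share of token positions on which the feature fires. The sketch below shows one way such a number could be computed. It is illustrative only: the function name `activation_density`, the zero threshold, and the toy data are assumptions, not the dashboard's actual implementation.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    `activations` is a flat sequence of per-token activation values for one
    feature. The threshold of 0.0 is an assumption; a real pipeline may use
    a different cutoff.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data (hypothetical): the feature fires on 3 of 10,000 token positions.
acts = [0.0] * 10_000
acts[5] = 1.2
acts[4_000] = 0.4
acts[9_999] = 2.1

density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.030%
```

A density of this order means the feature is sparse: it is silent on nearly all tokens and activates only on a small, specific subset.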