INDEX
Explanations
mentions of social media platforms and activities like posting, commenting, or sharing
references to posting on social media platforms, particularly Facebook
Negative Logits
osi          -0.72
mentation    -0.67
fts          -0.65
sen          -0.62
oso          -0.62
rica         -0.62
abe          -0.60
apy          -0.59
]=           -0.59
ppo          -0.59
Positive Logits
ulate        1.06
flyers       1.02
selfies      1.01
pics         0.98
pictures     0.96
screenshots  0.92
photos       0.91
videos       0.88
hum          0.87
anonymously  0.85
Activations Density: 0.062%