Explanations
social media-related terms and actions
references to social media platforms and applications
Negative Logits
ovember    -0.86
Ö¼         -0.76
Mane       -0.73
tower      -0.65
overc      -0.64
pacing     -0.61
abal       -0.60
sole       -0.59
Bere       -0.59
jewels     -0.58
Positive Logits
[token missing]    1.00
[token missing]    0.99
[token missing]    0.90
Telegram           0.87
[token missing]    0.83
Tumblr             0.83
Tumblr             0.82
[token missing]    0.80
Gmail              0.79
[token missing]    0.78
Activations Density 0.028%