INDEX
Explanations
social media handles and Twitter usernames
Negative Logits
Lisbon        -0.73
tape          -0.71
lipstick      -0.71
Borders       -0.69
Bran          -0.69
cholesterol   -0.67
Pyramid       -0.65
Corinth       -0.65
wards         -0.65
Bradford      -0.65
Positive Logits
realDonaldTrump   0.96
uers              0.92
Sports            0.92
ANI               0.92
groups            0.92
hash              0.89
news              0.87
Know              0.87
nat               0.87
thereal           0.87
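A minimal sketch of how ranked positive and negative logit lists like these are commonly produced. It assumes, since this page does not say so, that each value is the feature's decoder direction projected through the model's unembedding matrix W_U (its "direct logit effect"). The function names direct_logit_effect and top_logit_tokens are hypothetical, and the toy weights in the usage are illustrative, not this feature's real weights.

```python
from typing import Sequence


def direct_logit_effect(
    decoder_dir: Sequence[float], W_U: Sequence[Sequence[float]]
) -> list[float]:
    """Compute decoder_dir @ W_U, where W_U has shape (d_model, d_vocab).

    Assumption: the dashboard's logit values are this projection.
    """
    if len(decoder_dir) != len(W_U):
        raise ValueError("decoder_dir length must equal W_U's d_model rows")
    d_vocab = len(W_U[0])
    return [
        sum(decoder_dir[i] * W_U[i][j] for i in range(len(decoder_dir)))
        for j in range(d_vocab)
    ]


def top_logit_tokens(
    scores: Sequence[float], vocab: Sequence[str], k: int = 10
) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Return the k most positive and k most negative (token, score) pairs."""
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]           # lowest scores first
    positive = ranked[::-1][:k]     # highest scores first
    return positive, negative


if __name__ == "__main__":
    # Toy example: d_model = 2, vocabulary of three tokens.
    scores = direct_logit_effect([1.0, 0.5], [[0.2, -0.4, 0.9], [0.4, 0.2, 0.1]])
    pos, neg = top_logit_tokens(scores, ["a", "b", "c"], k=2)
    print(pos)  # "c" ranks highest, then "a"
    print(neg)  # "b" ranks lowest, then "a"
```

Sorting the full vocabulary once and slicing both ends keeps the two lists consistent with each other. For a real vocabulary of tens of thousands of tokens, a partial selection (for example heapq.nlargest and heapq.nsmallest) would avoid a full sort.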
Activation density: 0.064%
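A minimal sketch of the density figure, assuming (the page does not define it) that activation density is the fraction of sampled tokens on which the feature's activation is above zero. Under that reading, 0.064% means roughly 64 activating tokens per 100,000. The helpers activation_density and format_density are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


def format_density(density: float) -> str:
    """Render a fraction as a percentage with three decimals, e.g. 0.064%."""
    return f"{density * 100:.3f}%"


if __name__ == "__main__":
    # 64 active tokens out of 100,000 sampled tokens.
    sample = [1.0] * 64 + [0.0] * 99_936
    print(format_density(activation_density(sample)))
```

A feature this sparse fires on very few tokens, which is consistent with a narrow explanation such as social media handles.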