INDEX
Explanations
Twitter handles and related text
Negative Logits
specificity    -0.81
favors         -0.69
compress       -0.68
wards          -0.68
Norwich        -0.67
Takeru         -0.67
orche          -0.67
Romanian       -0.66
Sloven         -0.66
tape           -0.66
Positive Logits
realDonaldTrump    1.30
thereal            1.17
Real               1.08
username           1.03
Coach              1.00
#$                 0.99
Official           0.97
nat                0.97
meg                0.96
gmail              0.96
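The page does not say how these logit lists were computed. A common approach in sparse-autoencoder dashboards, assumed here, is to project the feature's decoder direction through the model's unembedding matrix and read off the tokens with the most positive and most negative scores. The minimal pure-Python sketch below illustrates that idea on a made-up model with a two-dimensional residual stream and a four-token vocabulary; `logit_weights`, `top_and_bottom`, and the toy numbers are all hypothetical.

```python
from typing import List, Sequence, Tuple


def logit_weights(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Project a feature's decoder direction (length d_model) through an
    unembedding matrix of shape d_model x vocab, giving one score per token."""
    vocab = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab)
    ]


def top_and_bottom(scores: Sequence[float], tokens: Sequence[str],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k highest-scoring tokens (descending) and the k
    lowest-scoring tokens (ascending), each paired with its score."""
    order = sorted(range(len(scores)), key=lambda t: scores[t])
    bottom = [(tokens[t], scores[t]) for t in order[:k]]
    top = [(tokens[t], scores[t]) for t in reversed(order[-k:])]
    return top, bottom


# Toy example: hypothetical decoder direction and unembedding matrix.
scores = logit_weights([1.0, 0.5], [[1.0, -1.0, 0.0, 2.0],
                                    [0.0, 2.0, -2.0, 1.0]])
top, bottom = top_and_bottom(scores, ["a", "b", "c", "d"], k=2)
print(top)     # [('d', 2.5), ('a', 1.0)]
print(bottom)  # [('c', -1.0), ('b', 0.0)]
```

Under this reading, the "Positive Logits" list corresponds to `top` and the "Negative Logits" list to `bottom`, with k = 10.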
Activation Density: 0.837%
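Activation density is usually read as the fraction of token positions on which the feature activates. That definition and the zero threshold are assumptions here, since the page does not state them. Under that reading, 0.837% means the feature fires on roughly 84 of every 10,000 tokens, or about one token in 120. A minimal sketch of the calculation, with a hypothetical `activation_density` helper:

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of positions whose activation is strictly above
    `threshold` (assumed to be 0.0, i.e. "the feature fired")."""
    if len(activations) == 0:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy example: the feature fires on 1 of 4 tokens.
print(activation_density([0.0, 0.3, 0.0, 0.0]))  # 0.25
```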