INDEX
Explanations
social media handles or usernames
usernames or handles from social media contexts
Negative Logits
charact      -0.64
streng       -0.62
reorgan      -0.62
surging      -0.62
unequ        -0.61
tha          -0.61
relegation   -0.59
stigmat      -0.59
delic        -0.58
heed         -0.57
Positive Logits
Jr      0.99
mma     0.89
TX      0.88
MD      0.87
NFL     0.87
DC      0.86
CT      0.84
music   0.84
007     0.83
_       0.82
Activations Density 0.080%