INDEX
Explanations
Twitter handles containing a combination of letters and numbers
Specific usernames or identifiers typically associated with social media accounts
Negative Logits
scrut          -0.85
conservancy    -0.74
ancies         -0.71
etheless       -0.69
mathemat       -0.65
nodd           -0.64
arrang         -0.60
differe        -0.60
withd          -0.59
suits          -0.59
Positive Logits
—      1.11
Jr     0.89
q      0.87
pic    0.85
X      0.83
Q      0.77
pic    0.77
8      0.73
&      0.73
E      0.72
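The card does not say how these logit values were obtained. Dashboards of this kind commonly project the feature's decoder direction onto each token's unembedding vector and list the most positive and most negative results. The following is a minimal sketch under that assumption. The function names, the toy decoder direction, and the toy unembedding vectors are all hypothetical, and real pipelines may apply additional scaling or normalization that this sketch omits.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Hypothetical helper: dot product of the feature's decoder direction
    with each token's unembedding vector (assumed method, not confirmed)."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}


def top_and_bottom(effects: Dict[str, float], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive (descending) and k most negative (ascending) tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    bottom = ranked[:k]                # most negative first
    top = list(reversed(ranked[-k:]))  # most positive first
    return top, bottom


if __name__ == "__main__":
    # Toy 2-dimensional example; these vectors are invented for illustration.
    decoder_dir = [1.0, -0.5]
    unembed = {"Jr": [1.0, 0.0], "q": [0.5, -0.5],
               "scrut": [-0.5, 0.5], "suits": [0.0, 1.0]}
    top, bottom = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
    print("Positive:", top)
    print("Negative:", bottom)
```

With the toy vectors above, `Jr` and `q` rank as the positive tokens and `scrut` and `suits` as the negative ones, which mirrors how the two lists on this card are ordered.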
Activation Density: 0.029%
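Activation density is usually read as the fraction of tokens on which the feature fires. The minimal sketch below assumes "fires" means an activation strictly above zero; the threshold, the function name, and the token counts are hypothetical and are not taken from this card. A density of 0.029% corresponds to roughly 29 active tokens per 100,000.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: fraction of tokens whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


if __name__ == "__main__":
    # Invented data: 29 active tokens out of 100,000.
    acts = [0.0] * 99_971 + [1.0] * 29
    print(f"{activation_density(acts):.3%}")
```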