INDEX
Explanations
Twitter usernames
social media handles or usernames
Negative Logits
margins       -0.83
ACTIONS       -0.75
jaws          -0.74
solvent       -0.74
soil          -0.71
warranties    -0.66
tense         -0.66
embodiments   -0.64
technicians   -0.63
overfl        -0.63
Positive Logits
Jew           0.96
_.            0.95
Jr            0.93
PB            0.87
Twe           0.87
_(            0.86
_             0.86
Buff          0.83
York          0.82
ibrary        0.80
Activations Density 0.147%