Explanations
Twitter handles or usernames
abbreviations or acronyms related to organizations or entities
Negative Logits
warts        -0.76
values       -0.68
suits        -0.66
laws         -0.65
lings        -0.62
etheless     -0.60
builders     -0.60
Collins      -0.59
vals         -0.58
challengers  -0.58
Positive Logits
[token missing]  0.93
qqa              0.81
tsky             0.78
daq              0.76
dq               0.74
zx               0.72
UTH              0.71
ibrary           0.70
zn               0.69
icz              0.69
Activations Density 0.070%