INDEX
Explanations
comparative phrases indicating superiority or inferiority
Negative Logits
åĢij           -0.16
esktop         -0.16
adlo           -0.16
DonaldTrump    -0.16
InstanceState  -0.15
umo            -0.15
usercontent    -0.14
fty            -0.14
achuset        -0.14
ätt            -0.14
Positive Logits
ever           0.24
usual          0.22
what           0.19
expected       0.19
meets          0.19
than           0.19
they           0.18
necessary      0.18
originally     0.18
anticipated    0.18
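The two logit lists are the tokens with the most negative and most positive scores. A minimal sketch of that ranking follows, assuming each score is the feature's per-token logit effect (dashboards like this one typically obtain it by projecting the feature's output direction onto the unembedding; that step is not shown here). The function name `top_and_bottom_tokens` and the `effects` dictionary are hypothetical and use a handful of the values listed above.

```python
def top_and_bottom_tokens(logit_effects, k=10):
    """Return the k most negative and k most positive (token, score) pairs.

    `logit_effects` is a hypothetical dict mapping each token string to the
    change in that token's logit attributed to the feature.
    """
    # Sort ascending by score; Python's sort is stable, so ties keep input order.
    ranked = sorted(logit_effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]          # most negative scores first
    positive = ranked[::-1][:k]    # most positive scores first
    return negative, positive


effects = {"ever": 0.24, "usual": 0.22, "than": 0.19, "esktop": -0.16, "fty": -0.14}
neg, pos = top_and_bottom_tokens(effects, k=2)
print(pos)  # [('ever', 0.24), ('usual', 0.22)]
print(neg)  # [('esktop', -0.16), ('fty', -0.14)]
```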
Activation Density: 0.071%
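Activation density is read here as the share of evaluated tokens on which the feature fires, so 0.071% means the feature is active on roughly 7 of every 10,000 tokens. The sketch below assumes "fires" means an activation strictly above a threshold of zero; the function name `activation_density` and the sample data are hypothetical, not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold` (assumed 0)."""
    if not activations:
        raise ValueError("need at least one activation")
    # Count tokens on which the feature is considered active.
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Hypothetical sample: 7 active tokens out of 10,000.
acts = [0.0] * 9_993 + [1.2] * 7
print(f"{activation_density(acts):.3%}")  # 0.070%
```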