INDEX
Explanations
quantifiers and comparisons
Negative Logits
aston      -0.70
Ãį         -0.69
may        -0.69
afia       -0.69
ccording   -0.67
76561      -0.67
elle       -0.65
ek         -0.65
ARE        -0.65
ãĤ¨        -0.65
Positive Logits
ours         0.89
actual       0.88
outright     0.80
theirs       0.74
hers         0.73
otherwise    0.71
ones         0.71
necessarily  0.70
yours        0.70
nons         0.69
Activations Density: 2.572%