INDEX
Explanations
quantitative comparisons or equivalences
comparative or relational phrases regarding equality or similarity
Negative Logits
encer     -0.68
eni       -0.67
sort      -0.64
azel      -0.63
Finish    -0.62
eworks    -0.62
akeru     -0.62
encers    -0.59
Soft      -0.59
glers     -0.59
Positive Logits
ours      1.60
hers      1.45
theirs    1.42
those     1.31
yours     1.24
mine      1.23
those     1.07
Those     0.84
ones      0.80
that      0.77
Activations Density: 0.468%