INDEX
Explanations
comparative phrases indicating a lower rank or status
Negative Logits
Token         Logit
gerald        -0.68
Downloadha    -0.64
velt          -0.63
bleacher      -0.62
kers          -0.62
=(            -0.61
vez           -0.60
ä½ľ           -0.60
issan         -0.59
Reward        -0.59
Positive Logits
Token         Logit
hand          1.29
arily         1.19
baseman       1.12
aries         1.05
ary           0.88
guessing      0.84
cousins       0.78
glance        0.78
cousin        0.76
halves        0.74
Activations Density 0.053%
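
The density figure above can be illustrated with a minimal sketch. It assumes "activations density" means the fraction of sampled token positions on which the feature fires above zero; the page itself does not define the term, so treat this as one plausible reading. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds threshold.

    Assumption: density = (number of active positions) / (total positions).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 10,000 token positions, 5 of them active.
sample = [0.0] * 9995 + [0.4, 1.2, 0.7, 2.1, 0.1]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.050%
```

Under this reading, a density of 0.053% would correspond to roughly 53 active positions per 100,000 sampled tokens.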