INDEX
Explanations
comparisons using the word "than"
comparative phrases indicating a preference or a choice
NEGATIVE LOGITS
ModLoader     -0.88
Juda          -0.75
Contract      -0.71
Winged        -0.70
enser         -0.66
Ire           -0.65
suspic        -0.64
ilic          -0.64
lied          -0.62
exemptions    -0.62
POSITIVE LOGITS
atos          1.13
lihood        0.94
assis         0.81
itars         0.70
apes          0.69
usual         0.69
acles         0.68
gins          0.68
vation        0.67
ply           0.67
Activations Density 0.039%