Explanations
comparisons indicating preference or superiority
comparative phrases indicating "more than" relationships
Negative Logits
Token       Logit
Juda        -0.73
Ire         -0.70
ModLoader   -0.68
ilic        -0.65
Contract    -0.63
Winged      -0.62
stead       -0.62
derog       -0.61
aird        -0.59
veter       -0.58
Positive Logits
Token       Logit
atos        1.14
lihood      0.86
pload       0.81
xual        0.80
ply         0.78
tz          0.77
assis       0.75
gs          0.74
lio         0.74
gins        0.69
Activations Density: 0.026%