Explanations
phrases that express preference or comparison
Negative Logits
Token      Logit
myſelf     -0.93
ſeveral    -0.90
purpoſe    -0.88
ſtate      -0.86
ſever      -0.85
houſe      -0.81
fevere     -0.81
himſelf    -0.81
uſed       -0.78
reaſon     -0.77
Positive Logits
Token      Logit
than       0.93
而非       0.89
而不是     0.83
rather     0.78
niż        0.77
THAN       0.72
وليس       0.69
bukan      0.67
فريبيس     0.67
колко      0.66
Activations Density: 0.151%