INDEX
Explanations
words indicating exclusivity or unique offerings
Negative Logits
(         -0.56
↵↵        -0.54
w         -0.52
val       -0.51
,         -0.51
he        -0.51
(         -0.51
lo        -0.50
          -0.49
ar        -0.49
Positive Logits
Efq       1.18
greateſt  1.15
Monfieur  1.13
Theſe     1.08
myſelf    1.07
exclusive 1.05
ſeveral   1.04
Majefty   1.03
purpoſe   1.03
becauſe   1.01
Activations Density 0.091%