Explanations
mentions of advancements and their novel nature
Negative Logits
Token        Value
<bos>        -1.44
Geplaatst    -0.98
itſelf       -0.96
Monfieur     -0.93
noDo         -0.83
myſelf       -0.83
preſent      -0.81
Мексичка     -0.79
raiſ         -0.79
Majefty      -0.78
Positive Logits
Token        Value
everyone     0.60
some         0.57
B            0.56
C            0.56
all          0.55
multi        0.55
obcí         0.52
O            0.51
a            0.51
⎩            0.50
Activations Density 0.421%