Explanations
phrases indicating a promise or commitment
Negative Logits
Efq         -1.06
Theſe       -0.95
myſelf      -0.94
ſeveral     -0.94
itſelf      -0.92
ſhall       -0.88
Мексичка    -0.88
Houſe       -0.88
fhall       -0.87
Majefty     -0.87
Positive Logits
trekken     0.59
For         0.59
plastiques  0.56
By          0.56
fram        0.55
Porque      0.55
raltar      0.54
For         0.52
since       0.51
perchè      0.51
Activations Density 0.023%