INDEX
Explanations
references to the concept of "more" in various contexts
a little more
Negative Logits
hroz        -0.46
masiva      -0.44
Turquía     -0.42
massive     -0.41
large       -0.40
huge        -0.40
múltiple    -0.37
Moscú       -0.36
gigantes    -0.36
aveug       -0.36
Positive Logits
esternos      0.77
もう少し      0.73
beetje        0.71
<pad>         0.68
<unused28>    0.68
<unused14>    0.68
<unused23>    0.68
<unused16>    0.68
[@BOS@]       0.68
<unused8>     0.68
Activations Density: 0.012%