INDEX
Explanations
instances of numbers and code formatting elements
code or sentence starters
Negative Logits
zwiſchen     -0.72
majánló      -0.71
[@BOS@]      -0.69
<unused8>    -0.69
<unused47>   -0.69
<unused79>   -0.69
<unused28>   -0.69
<unused23>   -0.69
<unused14>   -0.69
<unused16>   -0.69
Positive Logits
originally    0.44
nahilalakip   0.39
Originally    0.39
angelegt      0.37
Originally    0.36
The           0.35
Although      0.34
Though        0.33
Though        0.32
The           0.32
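A minimal sketch of how ranked lists like the negative and positive logits above are commonly produced, assuming a feature's effect on each token's logit is the dot product of the feature's decoder direction with that token's column of the unembedding matrix W_U. This page does not state its exact method; the helper `logit_effects`, its parameters, and the omission of any final layer norm or rescaling are assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]],
                  vocab: Sequence[str],
                  k: int = 10) -> Tuple[Ranked, Ranked]:
    """Return the k most negative and k most positive (token, effect) pairs.

    Hypothetical helper. Assumes effect[t] = decoder_dir . W_U[:, t],
    with `unembed` laid out as [d_model][vocab_size] and no layer norm applied.
    """
    if len(unembed) != len(decoder_dir):
        raise ValueError("unembed must have one row per residual dimension")
    # Dot product of the decoder direction with each token's unembedding column.
    effects = [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(len(vocab))
    ]
    # Sort ascending by effect: most negative tokens come first.
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return negative, positive


# Toy example: 2-dimensional residual stream, 3-token vocabulary.
vocab = ["a", "b", "c"]
W_U = [[1.0, -1.0, 0.0],
       [0.0, 2.0, -3.0]]
neg, pos = logit_effects([1.0, 0.5], W_U, vocab, k=1)
print(neg, pos)  # [('c', -1.5)] [('a', 1.0)]
```

Returning both ends of a single sorted list keeps the two rankings consistent with each other; a real model would use a matrix library rather than pure-Python loops over a vocabulary of tens of thousands of tokens.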
Activation Density: 0.004%
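Activation density is usually reported as the share of token positions on which a feature fires; at 0.004% (a fraction of 4e-5), this feature would be active on about 1 token in 25,000. The sketch below assumes "fires" means an activation strictly greater than zero and that density is measured over some sample of tokens; the page does not say which dataset or threshold it uses, and `activation_density` is a hypothetical helper.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Hypothetical helper: assumes density is the share of sampled tokens
    on which the feature fires (activation > threshold).
    """
    if len(activations) == 0:
        raise ValueError("activations must be non-empty")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# One firing token out of 25,000 gives 4e-05, which displays as 0.004%.
acts = [0.0] * 24_999 + [3.2]
density = activation_density(acts)
print(f"{density:.3%}")  # 0.004%
```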