INDEX
Explanation
Numerical values associated with specific codes or identifiers.
Negative Logits
Token          Logit
<em>           -0.82
</strong>      -0.80
…              -0.75
</em>          -0.71
(not shown)    -0.70
<eos>          -0.70
...            -0.70
(not shown)    -0.69
“              -0.65
é              -0.60
Positive Logits
Token          Logit
(not shown)    1.38
(not shown)    1.35
(not shown)    1.33
myſelf         1.32
(not shown)    1.28
―――――          1.27
(not shown)    1.26
(not shown)    1.25
(not shown)    1.24
Monfieur       1.24
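Dashboards like this one commonly build the two logit lists by projecting the feature's decoder direction through the model's unembedding matrix, then taking the most negative and most positive entries. The source does not say how its values were computed, so treat this as a minimal sketch under assumptions: NumPy, random toy weights, a toy vocabulary, and a hypothetical helper `top_logit_effects` that is not part of any library.

```python
import numpy as np

def top_logit_effects(w_dec_row, w_unembed, vocab, k=10):
    """Hypothetical helper: project one feature's decoder direction onto the vocabulary.

    Assumes w_dec_row has shape (d_model,) and w_unembed has shape
    (d_model, vocab_size). Returns (most_negative, most_positive) lists
    of (token, logit) pairs, each of length k.
    """
    logits = w_dec_row @ w_unembed            # direct logit effect per token, shape (vocab_size,)
    order = np.argsort(logits)                # indices sorted by ascending logit
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy data only: random weights stand in for a real SAE decoder and unembedding.
rng = np.random.default_rng(0)
d_model, vocab_size = 16, 50
vocab = [f"tok{i}" for i in range(vocab_size)]
w_dec_row = rng.normal(size=d_model)
w_unembed = rng.normal(size=(d_model, vocab_size))

neg, pos = top_logit_effects(w_dec_row, w_unembed, vocab, k=5)
for token, value in neg:
    print(f"{token:<10}{value:+.2f}")
```

Sorting once with `argsort` and reading both ends avoids a second pass; for very large vocabularies, `np.argpartition` would be a cheaper way to pick the top k before sorting only those.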
Activation Density: 0.301% (fraction of tokens on which the feature is active)
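A density of 0.301% means the feature fires on roughly 3 of every 1,000 tokens. The sketch below assumes "active" means an activation strictly greater than zero (typical for ReLU-based sparse autoencoders). It uses NumPy and a hypothetical helper `activation_density`, and runs on toy data built to match the reported figure, not on real activations.

```python
import numpy as np

def activation_density(acts):
    """Hypothetical helper: fraction of token positions where the feature fires (activation > 0)."""
    acts = np.asarray(acts)
    return np.count_nonzero(acts > 0) / acts.size

# Toy data: the feature fires on 301 of 100,000 token positions.
acts = np.zeros(100_000)
acts[:301] = 1.7
density = activation_density(acts)
print(f"Activation Density: {density:.3%}")
```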