INDEX
Explanations
expressions related to restriction and loss of freedom
Negative Logits
Sachsen     -0.38
「           -0.34
te          -0.33
ome         -0.33
Jacobs      -0.33
1           -0.32
-           -0.32
stated      -0.32
declared    -0.32
/           -0.32
Positive Logits
zwiſchen     0.76
<unused41>   0.75
<unused42>   0.75
<unused43>   0.75
<unused74>   0.75
<unused8>    0.75
<pad>        0.75
<unused17>   0.74
<unused16>   0.74
[@BOS@]      0.74
Activations Density: 0.053%