Explanations
punctuation marks that indicate the end of sentences or phrases
string literals and names
Negative Logits
))      -0.66
]))     -0.63
])))    -0.63
)))     -0.62
)).     -0.62
).      -0.60
])      -0.57
)),     -0.57
),      -0.56
])),    -0.56
Positive Logits
<unused41>    1.15
[@BOS@]       1.15
<unused68>    1.15
<unused79>    1.15
<unused14>    1.14
<unused8>     1.14
<unused16>    1.14
<unused17>    1.14
<pad>         1.14
<unused3>     1.14
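The page does not say how these logit scores were produced. Feature dashboards often compute them by projecting the feature's decoder direction through the model's unembedding matrix and listing the most negative and most positive tokens. The sketch below assumes that method. `decoder_dir`, `W_U`, and the helper `top_logit_effects` are hypothetical names, and the toy matrix is made up for illustration. It does not reproduce the values above.

```python
import numpy as np

def top_logit_effects(decoder_dir, W_U, vocab, k=3):
    """Hypothetical helper (assumed method): project a feature's decoder
    direction through the unembedding and return the k most negative
    and k most positive (token, score) pairs."""
    logits = decoder_dir @ W_U            # one score per vocabulary token
    order = np.argsort(logits)            # indices sorted ascending by score
    neg = [(vocab[i], float(logits[i])) for i in order[:k]]
    pos = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return neg, pos

# Toy example: a 2-dim residual stream and a 5-token vocabulary (made-up numbers).
vocab = ["))", "]))", "<pad>", "<unused3>", "the"]
W_U = np.array([[-0.66, -0.63, 1.15, 1.14, 0.0],
                [0.0, 0.0, 0.0, 0.0, 0.0]])
decoder_dir = np.array([1.0, 0.0])

neg, pos = top_logit_effects(decoder_dir, W_U, vocab, k=2)
print(neg)  # most negative tokens first
print(pos)  # most positive tokens first
```

Under this reading, a negative score means the feature pushes the model away from predicting that token, and a positive score means it pushes toward it.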
Activations Density 0.022%
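Activation density is usually read as the fraction of token positions on which the feature fires. The sketch below assumes that definition, with "fires" meaning an activation strictly above a threshold of 0. The page does not state the threshold or the size of the sample used. `activation_density` is a hypothetical helper, and the 100,000-token array is synthetic.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Hypothetical helper (assumed definition): fraction of token
    positions whose feature activation exceeds `threshold`."""
    acts = np.asarray(acts)
    return float((acts > threshold).mean())

# Synthetic example: the feature fires on 22 of 100,000 tokens.
acts = np.zeros(100_000)
acts[:22] = 1.0
density = activation_density(acts)
print(f"{density:.3%}")  # formats the fraction as a percentage
```

At 0.022%, a feature like this one would be active on roughly one token in every 4,500.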