Explanations
recurring sequences or patterns in data
Code snippets and related formatting
code keywords and symbols
Negative Logits
iſen          -1.48
ロウィン      -1.38
queſta        -1.38
majánló       -1.37
ſind          -1.37
témoig        -1.34
ſchaft        -1.30
<unused14>    -1.30
<unused8>     -1.30
[@BOS@]       -1.30
Positive Logits
0             0.71
s             0.64
hline         0.62
1             0.62
-             0.59
[toxicity=0]  0.58
\             0.57
↵↵            0.57
9             0.57
2             0.57
Activations Density: 0.058%