Explanations
references to scientific measurements or metrics
lines, separators, and symbols
Negative Logits
iſen          -0.88
ſich          -0.85
ainfi         -0.84
Geſ           -0.82
帖最后由       -0.81
createSprite  -0.81
zoude         -0.78
ſein          -0.77
majánló       -0.77
ſei           -0.77
Positive Logits
1             0.47
0             0.44
<blockquote>  0.44
2             0.40
B             0.39
5             0.39
Y             0.39
As            0.39
9             0.39
[toxicity=0]  0.39
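The two lists above rank vocabulary tokens by how strongly this feature pushes their logits down or up. A minimal sketch of how such lists are commonly derived, assuming (not confirmed by this page) that each token's score is the dot product of the feature's decoder direction with that token's column of the model's unembedding matrix. The function names `logit_effects` and `top_and_bottom` and the toy numbers are hypothetical.

```python
# Hypothetical sketch: rank tokens by a feature's direct effect on the logits.
# Assumption: effect(token i) = sum_d decoder_dir[d] * unembed[d][i],
# i.e. the decoder direction projected through the unembedding matrix.

def logit_effects(decoder_dir, unembed, vocab):
    """Return {token: effect}; unembed[d][i] is the weight for model
    dimension d and vocabulary index i."""
    effects = {}
    for i, tok in enumerate(vocab):
        effects[tok] = sum(decoder_dir[d] * unembed[d][i]
                           for d in range(len(decoder_dir)))
    return effects


def top_and_bottom(effects, k):
    """Return (k most negative, k most positive) (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive


# Toy example with a 2-dimensional model and a 4-token vocabulary.
effects = logit_effects(
    [1.0, -0.5],
    [[0.4, 0.3, -0.5, -0.6],
     [-0.1, -0.2, 0.6, 0.5]],
    ["1", "0", "ſich", "iſen"],
)
neg, pos = top_and_bottom(effects, 2)
print([t for t, _ in neg], [t for t, _ in pos])  # -> ['iſen', 'ſich'] ['1', '0']
```

Real dashboards may additionally normalize or center these scores; the sketch shows only the raw projection.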
Activation Density: 0.003%
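Activation density is usually reported as the fraction of sampled tokens on which the feature fires, so 0.003% corresponds to roughly 3 active tokens per 100,000. A minimal sketch, assuming "fires" means an activation strictly above zero; `activation_density` and its `threshold` parameter are hypothetical names.

```python
# Hypothetical sketch: fraction of tokens whose activation exceeds a threshold.
# Assumption: a feature counts as active on a token when activation > threshold (default 0).

def activation_density(activations, threshold=0.0):
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# 3 active tokens out of 100,000 sampled tokens.
acts = [0.0] * 99_997 + [1.2, 0.5, 3.0]
density = activation_density(acts)
print(f"{density:.3%}")  # -> 0.003%
```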