INDEX
Explanations
unique tokens or identifiers from datasets or code snippets
Special characters and punctuation
code or mathematical context
Negative Logits
pleaſure
-1.04
IntoConstraints
-0.96
出版年
-0.92
poffible
-0.91
occaf
-0.87
stanovnika
-0.85
GEBURTSDATUM
-0.85
raiſ
-0.84
itſelf
-0.82
IsMutable
-0.82
Positive Logits
[toxicity=0]
1.20
}^{*}$
0.98
*
0.87
*",
0.85
*}$
0.83
endblock
0.80
*)
0.76
*}
0.75
')
0.75
*',
0.74
Activations Density 0.020%