Explanations
numerical data or statistics in a document
NEGATIVE LOGITS
Token       Logit
Efq         -0.91
Monfieur    -0.85
Eſ          -0.84
ſta         -0.78
LookAnd     -0.77
ſever       -0.76
Reſ         -0.76
faſt        -0.76
scattata    -0.76
auffi       -0.74
POSITIVE LOGITS
Token               Logit
[toxicity=0]        0.74
Slf                 0.64
*                   0.63
<eos>               0.62
(token not shown)   0.52
̀u                  0.52
(token not shown)   0.51
uxxxx               0.50
↑                   0.50
</s>                0.50
Activations Density: 0.164%