Explanations
symbols and specific characters within text
Negative Logits
Token                 Logit
G                     -0.18
âĢº                   -0.17
Pr                    -0.17
R                     -0.15
âĸº                   -0.15
L                     -0.15
(no visible token)    -0.15
910                   -0.14
?:                    -0.14
ÄįÃŃ                  -0.14
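Three of the tokens above (âĢº, âĸº, ÄįÃŃ) are not necessarily extraction damage. They match what a byte-level BPE vocabulary prints when a token's raw UTF-8 bytes are shown one character per byte. The page does not say which tokenizer the model uses, so the sketch below assumes a GPT-2-style byte-to-character table; `bytes_to_unicode` and `decode_token` are illustrative helper names, not part of any tool referenced here.

```python
def bytes_to_unicode():
    """Rebuild the GPT-2-style byte -> printable-character table (assumed tokenizer)."""
    # Bytes that are already printable keep their own code point.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to code point 256 + n, in byte order.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


def decode_token(token):
    """Map each displayed character back to its byte, then decode as UTF-8."""
    inverse = {ch: b for b, ch in bytes_to_unicode().items()}
    # Raises KeyError for characters outside the table (e.g. a literal space).
    raw = bytes(inverse[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


for tok in ["âĢº", "âĸº", "ÄįÃŃ"]:
    print(repr(tok), "->", repr(decode_token(tok)))
# 'âĢº'  -> '›'   (U+203A)
# 'âĸº'  -> '►'   (U+25BA)
# 'ÄįÃŃ' -> 'čí'  (U+010D, U+00ED)
```

Under that assumption, the most-suppressed tokens include two arrow-like symbols and an accented letter pair, alongside the plain ASCII entries.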
Positive Logits
Token                 Logit
<T                    0.27
<B                    0.24
<A                    0.24
<P                    0.22
<D                    0.21
<H                    0.18
&A                    0.18
<                     0.18
<F                    0.17
(TR                   0.17
Activations Density 0.010%