Explanations
The presence of specific chemical symbols or abbreviations related to scientific contexts.
Negative Logits
u               -0.77
ا               -0.76
y               -0.76
in              -0.73
is              -0.68
as              -0.66
ו               -0.62
an              -0.61
l               -0.61
at              -0.60
Positive Logits
parsedMessage    1.13
purpoſe          0.90
featureID        0.88
<bos>            0.85
houſe            0.82
<unused43>       0.81
<pad>            0.80
<unused41>       0.79
<unused17>       0.79
<unused23>       0.79
Activation density: 2.259%
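Activation density is commonly read as the percentage of sampled tokens on which the feature's activation is above zero. Under that assumption, 2.259% means the feature fires on roughly 1 in every 44 tokens. The sketch below computes the statistic in that sense. The function name `activation_density`, its `threshold` parameter, and the sample activations are all hypothetical, chosen for illustration only. They are not the dashboard's own code or data.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumes density means "share of tokens where the feature fires";
    `threshold` is a hypothetical knob (0.0 = any positive activation).
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical activations for one feature over 8 tokens: 3 are positive.
sample = [0.0, 1.2, 0.0, 0.0, 0.4, 0.0, 0.0, 2.5]
print(f"Activation density: {activation_density(sample):.3f}%")
```

Raising `threshold` above zero would count only stronger activations. This is one reason density figures from different tools may not match exactly.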