Explanations
punctuation marks and symbols used for sentence structure
Negative Logits
Token         Logit
"]=           -0.68
"]="          -0.64
ratulations   -0.63
]+=           -0.63
')]           -0.61
CURIAM        -0.57
exels         -0.57
})=\          -0.56
']=           -0.55
)$}           -0.55
Positive Logits
Token         Logit
which         1.35
and           1.31
and           1.20
but           1.18
which         1.17
with          1.16
as            1.04
or            1.03
because       1.00
but           0.96
Activations Density: 0.377%