INDEX
Explanations
symbols and mathematical expressions within the text
Numbers followed by brackets
`[` followed by punctuation
Negative Logits
Token     Logit
Ches      -0.57
Rueda     -0.57
hermes    -0.55
Hakim     -0.53
Dune      -0.52
Thea      -0.51
hermes    -0.50
wra       -0.50
Aya       -0.49
Ach       -0.49
Positive Logits
Token     Logit
.[        1.61
,[        1.55
)[        1.52
"[        1.50
$[        1.50
).[       1.48
'[        1.47
$[        1.46
?[        1.45
/[        1.45
Activations Density 1.954%