INDEX
Explanations
punctuation, particularly parentheses and brackets
Negative Logits
𝐝       -0.64
myth    -0.61
𝐮       -0.60
Gla     -0.59
Hamb    -0.57
the     -0.56
𝐡       -0.55
Chit    -0.54
Jop     -0.54
ad      -0.54
Positive Logits
})).          1.35
__).          1.32
()).          1.29
))).          1.28
])).          1.28
expandindo    1.27
}`).          1.24
)).           1.24
")).          1.20
').           1.20
Activations Density: 0.062%