Explanations
punctuation and structural markers in text
Negative Logits
iani      -0.18
oft       -0.16
ublik     -0.14
aqu       -0.14
addock    -0.14
icket     -0.14
bia       -0.14
oji       -0.14
ocab      -0.14
leak      -0.14
Positive Logits
-*-č↵     0.17
ilda      0.16
465       0.16
teg       0.15
ê¹Į       0.15
-toggler  0.15
Cha       0.15
ÄĽtÃŃ     0.15
Cha       0.15
cha       0.14
Activations Density 0.004%