Explanations
punctuation marks and their associated labels in structured text
Negative Logits
Token      Logit
fleisch    -0.53
uD         -0.51
inės       -0.49
aliyet     -0.49
väg        -0.48
kří        -0.48
of         -0.47
leſs       -0.47
הה         -0.47
gott       -0.46
Positive Logits
Token      Logit
",         2.54
)",        2.28
”,         2.15
?",        2.15
]",        2.06
,",        2.04
.",        2.04
!",        2.01
}",        2.00
:",        1.98
Activations Density: 0.118%