INDEX
Explanations
elements related to structured documentation or code
Negative Logits
queſta    -1.09
laſſen    -0.97
ſind      -0.96
ロウィン   -0.95
ſſung     -0.95
müſſen    -0.94
iſchen    -0.94
iſche     -0.94
ſchaft    -0.93
ſicht     -0.93
Positive Logits
hline     0.65
↵↵        0.57
0         0.56
(         0.54
In        0.49
if        0.47
So        0.46
S         0.46
,         0.45
1         0.45
Activations Density: 0.048%