Explanations
German job titles and endings
Negative Logits
\     1.20
к     1.02
ق     0.99
(     0.97
}     0.93
ів    0.90
τ     0.86
గ     0.85
c     0.84
ک     0.84
Positive Logits
3     1.20
2     1.09
0     0.90
8     0.89
to    0.87
the   0.83
7     0.81
9     0.78
in    0.76
ro    0.72
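Logit lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and reading off the extreme entries. The sketch below shows that idea under stated assumptions: `decoder_vec` (the feature's decoder row), `W_U` (unembedding, shape d_model x vocab_size), `vocab`, and the helper name `top_logits` are all hypothetical, and this is not necessarily how this particular dashboard computes its values.

```python
import numpy as np

def top_logits(decoder_vec, W_U, vocab, k=10):
    """Hypothetical helper: project a feature direction onto the vocabulary.

    Assumes decoder_vec has shape (d_model,) and W_U has shape (d_model, vocab_size).
    Returns (positive, negative) lists of (token, logit) pairs.
    """
    logits = decoder_vec @ W_U            # one score per vocabulary token
    order = np.argsort(logits)            # ascending: most negative first
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy example with a 2-dim residual stream and a 3-token vocabulary.
d = np.array([1.0, 0.0])
W = np.array([[2.0, -1.0, 0.5],
              [0.0,  0.0, 0.0]])
print(top_logits(d, W, ["a", "b", "c"], k=1))  # ([('a', 2.0)], [('b', -1.0)])
```

Taking the extremes of a single projection is cheap (one matrix-vector product) but ignores later layers, so it is only a rough view of what the feature promotes or suppresses.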
Activation density: 0.000%
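A minimal sketch of how a density figure like this could be computed, assuming "density" means the percentage of sampled tokens on which the feature's activation is strictly positive. The function name `activation_density` and the input format (a flat array of per-token activations) are hypothetical.

```python
import numpy as np

def activation_density(acts):
    """Hypothetical helper: percent of tokens where the feature fires (> 0)."""
    acts = np.asarray(acts, dtype=float)
    return 100.0 * np.count_nonzero(acts > 0) / acts.size

def format_density(acts):
    # Three decimal places, matching the "0.000%" style shown above.
    return f"{activation_density(acts):.3f}%"

print(format_density([0.0, 0.0, 0.5, 0.0]))  # 25.000%
```

With three decimals, any feature firing on fewer than 0.0005% of sampled tokens would display as 0.000%, so a zero reading here does not by itself mean the feature never activates.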