INDEX
Explanations
definite articles and pronouns in German
Negative Logits
brief     -0.15
swagen    -0.15
воля      -0.15
olec      -0.15
orks      -0.14
ean       -0.14
änder     -0.14
the       -0.14
rose      -0.13
ellen     -0.13
Positive Logits
ses         0.24
same        0.22
jen         0.21
meisten     0.21
same        0.17
sel         0.17
noch        0.17
respective  0.16
entire      0.16
beiden      0.16
Activations Density 0.047%