Explanations
articles and pronouns in the text
Negative Logits
Token        Logit
änder        -0.15
Gregg        -0.15
ewart        -0.14
³            -0.14
ean          -0.14
tej          -0.14
なる         -0.14
ysa          -0.14
DataStream   -0.13
的是         -0.13
Positive Logits
Token        Logit
same         0.23
meisten      0.23
jen          0.23
ses          0.21
beiden       0.19
noch         0.18
sel          0.17
same         0.17
ud           0.16
respective   0.15
Activations Density 0.051%
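
The density figure above is reported without a definition. A minimal sketch of one common reading, assuming it means the fraction of sampled token positions on which the feature's activation is above zero. The function name, the threshold parameter, and the sample data are all hypothetical and are not taken from the dashboard:

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means share of positions where the feature fires;
    the dashboard itself does not state how the figure is computed.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 51 of 100,000 token positions.
sample = [1.0] * 51 + [0.0] * (100_000 - 51)
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.051% would correspond to roughly one active position in every two thousand tokens.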