INDEX
Explanations
definite articles and their variations in German text
Negative Logits
lež        -0.53
piş        -0.48
aiment     -0.47
Geset      -0.46
amico      -0.46
gând       -0.45
memas      -0.45
lichem     -0.45
Peut       -0.45
voordeel   -0.44
Positive Logits
die        1.69
Die        1.45
Die        1.40
DIE        1.27
die        1.22
ihre       1.22
DIE        1.12
unsere     1.07
diese      1.06
Ihre       1.04
Activations Density 0.020%