INDEX
Explanations
terms related to gender and family structures
Negative Logits
ischen        -0.20
Ihren         -0.18
enden         -0.17
lichen        -0.17
ieten         -0.17
respectively  -0.17
uellen        -0.17
Antworten     -0.17
genden        -0.16
oden          -0.16
Positive Logits
erste         0.25
kleine        0.24
neue          0.23
große         0.22
ige           0.22
weitere       0.21
deutsche      0.20
andere        0.19
ganze         0.19
perman        0.19
Activations Density 0.038%