INDEX
Explanations
proper nouns related to locations and people
Negative Logits
pesada       -0.40
femenina     -0.36
masculina    -0.35
gustan       -0.35
secos        -0.35
pública      -0.35
mauvaise     -0.35
masculinos   -0.34
negra        -0.34
contigo      -0.34
Positive Logits
<unused43>   0.83
<unused76>   0.82
<unused41>   0.82
<unused17>   0.82
<unused23>   0.82
<pad>        0.82
<unused20>   0.82
<unused74>   0.82
<unused42>   0.82
<unused51>   0.82
Activations Density: 0.024%