INDEX
Explanations
references to specific documents or articles, particularly those that include categorizations, proof, or examples
Negative Logits
telinga        -0.44
extranjera     -0.39
colectiva      -0.38
istrinya       -0.35
ibunya         -0.33
dolayı         -0.33
keluarganya    -0.33
turística      -0.33
imaginación    -0.32
suaminya       -0.31
Positive Logits
ſind           1.19
ſelf           1.17
ſei            1.14
featureID      1.14
<pad>          1.13
<unused43>     1.12
<unused42>     1.11
<unused41>     1.11
<unused8>      1.10
<unused23>     1.10
Activations Density: 1.063%