Explanations
references to male pronouns and their variations
Negative Logits
الثة            -0.49
Ỏ               -0.45
wedi            -0.44
nhiêu           -0.44
követ           -0.43
脚注の使い方    -0.43
galima          -0.42
umumkan         -0.41
estudian        -0.41
goa             -0.41
Positive Logits
__*/            0.89
]]]             0.82
']]             0.81
OGND            0.79
AsUp            0.79
}))             0.77
Audiodateien    0.77
(!__            0.77
.")]            0.75
')))            0.74
Activations Density 0.193%
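
As a rough illustration of what a density figure like the one above could mean, the sketch below assumes that activation density is the fraction of token positions on which the feature's activation is above zero. The source does not state this definition. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means (number of active positions) / (total positions).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 1000 token positions, the feature fires on 2 of them.
sample = [0.0] * 998 + [1.7, 0.4]
density = activation_density(sample)
print(f"{density:.3%}")  # formatted as a percentage, like the dashboard value
```

Under this reading, a density of 0.193% would mean the feature is active on roughly 2 of every 1000 tokens in the sampled text.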