Explanations
gendered terms and specific article usage in context
Negative Logits
RegressionTest  -0.57
mattino  -0.57
∵  -0.57
ії  -0.55
समीक्षाओं  -0.54
avancé  -0.54
mourut  -0.54
onAnimation  -0.54
Denna  -0.53
Затем  -0.53
Positive Logits
Das  0.86
Das  0.83
ное  0.79
ું  0.79
das  0.77
Το  0.75
льное  0.75
ческое  0.74
Het  0.74
noe  0.72
Activations Density: 0.117%