Explanations
sex differences automatically translate
Negative Logits
Probably          0.42
VERY              0.41
$(\               0.40
そらく            0.38   (fragment of Japanese おそらく, "probably")
very              0.38
Confidence        0.38
slightly          0.37
সম্ভবত            0.37   (Bengali, "probably")
Confidence        0.37
明らかに          0.37   (Japanese, "clearly")
Positive Logits
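automaticamente   1.02   (Portuguese/Italian, "automatically")
автоматически     1.02   (Russian, "automatically")
necesariamente    1.01   (Spanish, "necessarily")
automatically     0.99
automáticamente   0.97   (Spanish, "automatically")
necessarily       0.95
necessariamente   0.93   (Portuguese/Italian, "necessarily")
automatically     0.93
automatiquement   0.91   (French, "automatically")
somehow           0.90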
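A common way to produce positive and negative logit lists like the ones above is to project the feature's decoder direction through the model's unembedding matrix and rank the resulting per-token effects. The sketch below assumes that convention; it is not confirmed to be how this dashboard computes its lists, and the helper `top_and_bottom_logits` and its arguments are hypothetical.

```python
import numpy as np

def top_and_bottom_logits(decoder_row, unembed, vocab, k=10):
    """Rank tokens by how strongly one feature's decoder direction pushes their logits.

    Hypothetical helper. decoder_row has shape (d_model,), unembed has shape
    (d_model, vocab_size), and vocab maps token ids to token strings.
    """
    # Direct effect of the feature direction on every token's logit.
    effects = np.asarray(decoder_row) @ np.asarray(unembed)  # shape (vocab_size,)
    order = np.argsort(effects)  # token ids from most negative to most positive
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return positive, negative


# Toy example: a 2-dimensional residual stream and a 3-token vocabulary.
pos, neg = top_and_bottom_logits(
    np.array([1.0, 0.0]),
    np.array([[0.5, -1.0, 2.0], [3.0, 3.0, 3.0]]),
    ["a", "b", "c"],
    k=1,
)
print(pos, neg)  # [('c', 2.0)] [('b', -1.0)]
```

Under this convention, repeated entries such as "automatically" or "Confidence" would be distinct token ids that render as the same string.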
Activations Density 0.075%
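Activation density is the share of token positions on which the feature fires. The sketch below assumes "fires" means an activation strictly above zero (the threshold is an assumption) and reports the result as a percentage; `activation_density` is a hypothetical helper, not part of any dashboard API.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Percentage of token positions where a feature's activation exceeds threshold.

    Hypothetical helper; treating "active" as strictly greater than zero is an assumption.
    """
    acts = np.asarray(activations, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size


# 75 active positions out of 100,000 tokens gives 0.075%.
acts = np.zeros(100_000)
acts[:75] = 1.0
print(activation_density(acts))
```

Read this way, a density of 0.075% means the feature is active on roughly 75 of every 100,000 tokens.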