INDEX
Explanations
comparisons and contrasts between subjects
Negative Logits
Token      Logit
?,?,?,?,   -0.23
$MESS      -0.15
Seven      -0.15
جات        -0.14
?,?,       -0.14
hiba       -0.14
MANY       -0.13
-many      -0.13
ικη        -0.13
خرى        -0.13
Positive Logits
Token      Logit
two        1.10
two        0.93
两个       0.82
Two        0.80
Two        0.78
TWO        0.78
_two       0.76
-two       0.74
两         0.73
zwei       0.72
Activation density: 0.572%
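The dashboard reports logit values and a density figure without saying how they are computed. The following is a minimal Python sketch that assumes the convention common to sparse-autoencoder feature dashboards: a feature's logit effect on a token is the dot product of the feature's decoder direction with that token's unembedding vector, and activation density is the percentage of tokens on which the feature's activation is above zero. The function names, the toy vectors, and the zero threshold are hypothetical and do not come from the source.

```python
from typing import Dict, List, Tuple

# Hypothetical inputs: a feature's decoder direction and a toy unembedding
# stored as {token: vector}. Neither comes from the dashboard above.


def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot product of the decoder direction with each token's unembedding vector."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}


def top_logits(effects: Dict[str, float], k: int
               ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k highest-valued and the k lowest-valued tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]          # lowest values first
    positive = ranked[::-1][:k]    # highest values first
    return positive, negative


def activation_density(acts: List[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds the (assumed) threshold."""
    if not acts:
        return 0.0
    fired = sum(1 for a in acts if a > threshold)
    return 100.0 * fired / len(acts)


if __name__ == "__main__":
    decoder_dir = [1.0, 0.0]
    unembed = {"two": [1.1, 0.3], "zwei": [0.7, -0.2],
               "Seven": [-0.15, 0.5], "MANY": [-0.13, 0.0]}
    pos, neg = top_logits(logit_effects(decoder_dir, unembed), k=2)
    print("positive:", pos)
    print("negative:", neg)
    print("density %:", activation_density([0, 0.5, 0, 0, 1.2, 0, 0, 0]))
```

Sorting once and reading both ends keeps the positive and negative lists consistent with each other; a real dashboard would run the same projection over the full vocabulary and the density count over a large token sample.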