INDEX
Explanations
East Asia, Eastern Europe, East Timor
Negative Logits
1  1.02
6  1.01
ي  0.96
ير  0.93
л  0.92
ل  0.91
ம்  0.89
ومي  0.85
েন  0.84
كان  0.83
Positive Logits
(  1.25
the  0.93
em  0.91
ना  0.91
t  0.87
-  0.86
not  0.85
s  0.84
س  0.81
ս  0.80
Activations Density 0.111%