INDEX
Explanations
2 small country
Negative Logits
↵
0.77
ل
0.58
ти
0.42
αν
0.41
l
0.40
ме
0.40
ל
0.39
л
0.37
ות
0.37
ين
0.36
Positive Logits
be
0.52
to
0.39
t
0.38
{
0.33
was
0.33
ت
0.33
را
0.33
at
0.31
are
0.31
it
0.30
Activations Density 5.396%