INDEX
Explanations
scientists, researchers, residents, locals
Negative Logits
ش    0.88
of    0.76
?    0.74
أ    0.74
會    0.71
to    0.70
بر    0.70
ur    0.69
on    0.68
性    0.67
Positive Logits
li    0.73
liğini    0.65
៤    0.62
lerce    0.62
៩    0.59
theologians    0.59
ih    0.58
ㅋㅋ    0.57
ли    0.57
மத்தியில்    0.57
Activations Density: 0.263%