INDEX
Explanations
"and" followed by pronouns or articles
Negative Logits
سل  0.83
س  0.83
femenina  0.82
0.79
bertahan  0.79
ക്കുറിച്ച  0.78
hoz  0.77
다  0.76
Subhanahu  0.75
스  0.74
Positive Logits
that  1.02
◄  1.00
aya  0.96
that  0.94
آیا  0.93
הם  0.92
ні  0.91
既然  0.91
았  0.91
,(  0.88
Activations Density 0.124%
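
As a rough illustration of the density figure above, the sketch below assumes that activation density means the fraction of sampled token positions on which the feature has a nonzero activation. The function name `activation_density`, the `threshold` parameter, and the toy data are hypothetical and are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" is read here as (active positions) / (all positions).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data (made up): 124 active positions among 100,000 sampled tokens.
toy_activations = [1.0] * 124 + [0.0] * (100_000 - 124)
density = activation_density(toy_activations)
print(f"{density:.3%}")  # prints 0.124%
```

Under that reading, a density of 0.124% would mean the feature fires on roughly one token in every 800.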