Explanations
No Explanations Found
Negative Logits
1.16  i
1.07  ir
1.05  ي
0.96  IS
0.92  نا
0.88  ش
0.82  k
0.81  m
0.80  ى
0.78  ر
Positive Logits
0.75  с
0.68  кий
0.67  sembled
0.66  고
0.65  ём
0.63  1
0.62  ди
0.59
0.59  եր
0.58  ತಮ್ಮ
Activations Density 4.712%