Explanations
accomplishing tasks successfully
Negative Logits
ktir  0.46
hazır  0.44
лық  0.41
ın  0.40
מר  0.39
당연  0.39
ının  0.39
এছাড়াও  0.38
كان  0.38
verwendet  0.38
Positive Logits
successfully  1.05
berhasil  0.97
conseguiu  0.92
succesfully  0.90
riesce  0.88
удалось  0.85
успешно  0.85
सफलतापूर्वक  0.85
udało  0.84
riusc  0.83
Activations Density 0.053%