Explanations
variables followed by special characters
Negative Logits
ي  0.61
බ  0.60
י  0.57
ح  0.55
ığı  0.55
ushchev  0.55
ώ  0.55
ται  0.54
ılar  0.54
ά  0.53
Positive Logits
$  0.86
as  0.80
k  0.71
IL  0.66
Полу  0.66
on  0.62
ik  0.62
.$  0.61
'$  0.61
↵  0.61
Activations Density: 0.022%