INDEX
Explanations
No Explanations Found
Negative Logits
Token    Logit
poor     1.91
HAN      1.91
KAN      1.90
Yup      1.90
CLES     1.89
Keg      1.82
proof    1.78
LEM      1.70
זאת      1.70
窕       1.70
Positive Logits
Token    Logit
на       4.03
ל        3.42
ان       3.27
ج        3.08
न        2.91
ك        2.80
نا       2.72
ン       2.70
ف        2.53
ת        2.44
Activations Density: 0.143%