INDEX
Explanations
No Explanations Found
Negative Logits
ك  1.52
ка  1.51
на  1.45
性  1.38
то  1.31
со  1.31
ل  1.27
по  1.27
ные  1.22
ر  1.21
Positive Logits
I  1.43
5  1.16
ją  1.16
4  1.12
8  1.05
yg  0.98
int  0.97
1  0.96
(blank)  0.96
ס  0.95
Activations Density 0.000%