Explanations
No Explanations Found
Negative Logits
Token      Logit
paljon     1.06
kao        1.02
maksimum   1.00
sitten     0.98
theless    0.93
saja       0.93
។          0.91
∈          0.89
asam       0.88
ipped      0.87
Positive Logits
Token      Logit
ar         0.94
s          0.86
ش          0.80
ق          0.79
ار         0.77
ра         0.77
ah         0.77
un         0.75
sberg      0.71
та         0.69
Activations Density: 0.001%