INDEX
Explanations
No Explanations Found
Negative Logits
f           1.03
Kate        1.00
کړئ         0.99
g           0.98
υ           0.96
conscience  0.95
atamente    0.95
atrice      0.93
ש           0.92
ɱ           0.92
Positive Logits
time          1.40
<unused2222>  1.38
sigh          1.32
dalam         1.27
kker          1.27
다            1.25
शासन          1.24
ක             1.24
ritical       1.23
dough         1.21
Activations Density: 0.000%
No Known Activations
This feature has no known activations.