INDEX
Explanations
No Explanations Found
Negative Logits
the    0.67
a    0.66
an    0.50
а    0.50
your    0.48
it    0.47
قول    0.47
an    0.43
ut    0.43
eb    0.43
Positive Logits
havde    0.63
️⃣    0.56
ين    0.54
⃣    0.53
horas    0.52
ήταν    0.51
был    0.49
horaires    0.49
cardiaque    0.49
했던    0.49
Activations Density 0.000%
This feature has no known activations.