INDEX
Explanations
No Explanations Found
Negative Logits
↵    0.96
et   0.77
ed   0.70
il   0.68
u    0.63
m    0.63
و    0.61
as   0.61
م    0.59
on   0.57
Positive Logits
0      0.73
are    0.71
of     0.66
were   0.64
was    0.62
(token not shown)   0.59
は     0.56
sont   0.50
是     0.49
0      0.45
Activations Density 0.000%
This feature has no known activations.