INDEX
Explanations
providing factual information
Negative Logits
to    1.00
be    0.84
ä    0.75
\    0.64
ö    0.63
but    0.63
_{    0.62
с    0.62
in    0.61
so    0.58
Positive Logits
i    0.71
ي    0.61
헹    0.60
exemplaires    0.59
Consultado    0.59
rameaux    0.59
r    0.59
conclu    0.58
anciens    0.58
envoyer    0.58
Activations Density 5.272%