Explanations
questions or phrases that seek clarification or information
Negative Logits
Infórmanos      -0.75
Them            -0.66
HtmlAttribute   -0.65
Geografi        -0.65
Datuak          -0.64
Inscrivez       -0.63
honom           -0.62
Descrip         -0.61
său             -0.61
Jangan          -0.61
Positive Logits
they            1.72
we              1.23
it              1.21
these           1.06
he              1.06
the             1.03
those           1.03
you             0.95
each            0.95
she             0.91
Activations Density 0.215%