Explanations
phrases indicating ordinal rankings or sequencing
NEGATIVE LOGITS
allas       -0.15
/goto       -0.15
achen       -0.14
IMA         -0.14
/entities   -0.14
etal        -0.13
jaw         -0.13
ltre        -0.13
åħħ         -0.13
_closure    -0.13
POSITIVE LOGITS
three       0.22
several     0.19
two         0.19
many        0.17
three       0.16
ä¸ī         0.15
two         0.15
ä¸ī         0.15
rin         0.15
lant        0.15
Activations Density 0.048%