Explanations
occurrences of the word "or."
Negative Logits
Token      Logit
653        -0.17
gaard      -0.16
arna       -0.16
uko        -0.16
ναν        -0.15
imated     -0.14
irthday    -0.14
ĥģ         -0.14
Fakat      -0.14
293        -0.14
Positive Logits
Token        Logit
several      0.26
more         0.22
possibly     0.20
fewer        0.20
multiple     0.20
sometimes    0.19
preferably   0.19
ific         0.18
many         0.18
few          0.18
Activations Density: 0.019%