INDEX
Explanations
phrases emphasizing totality or completeness
Negative Logits
sel        -0.16
iner       -0.15
ton        -0.14
olls       -0.14
ell        -0.14
dale       -0.13
icl        -0.13
et         -0.13
斗         -0.13
nowhere    -0.13
Positive Logits
uding       0.20
ayed        0.18
about       0.17
ivet        0.17
uring       0.17
uded        0.17
aylight     0.17
igned       0.16
Greek       0.16
些          0.15
Activations Density 0.030%