INDEX
Explanations
questions or phrases that quantify the extent or amount of something
NEGATIVE LOGITS
Token     Logit
why       -0.15
vang      -0.14
same      -0.14
esser     -0.14
arrass    -0.14
van       -0.14
van       -0.14
which     -0.14
with      -0.13
Same      -0.13
POSITIVE LOGITS
Token     Logit
/how      0.19
516       0.16
itzer     0.16
رÙĪØª     0.16
they      0.15
ailable   0.14
soever    0.14
ìĶ©       0.14
pace      0.14
Notice    0.14
Activations Density 0.037%