Explanations
repetitive phrases or terms that suggest an additive structure in the text
Negative Logits
allerdings    -0.17
amen          -0.17
however       -0.16
and           -0.15
inho          -0.13
either        -0.13
esson         -0.13
nt            -0.13
atoes         -0.13
rd            -0.13
Positive Logits
ebek          0.18
importantly   0.17
/OR           0.17
acen          0.16
vice          0.16
来说          0.15
forth         0.14
vice          0.14
eyen          0.14
yor           0.14
Activations Density 0.174%
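
The density figure is not defined on this page. One common reading, assumed here, is the fraction of sampled token positions at which the feature's activation is nonzero. The following is a minimal Python sketch under that assumption; the helper name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of positions where the feature fires.
    The dashboard's exact definition may differ.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 1,000 token positions, 2 of which activate the feature.
sample = [0.0] * 998 + [1.3, 0.4]
print(f"{activation_density(sample):.3%}")
```

Under this reading, a density of 0.174% would correspond to roughly 17 active positions per 10,000 tokens.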