INDEX
Explanations
instances of the word "and" and similar conjunctions
Negative Logits
diss     -0.07
elder    -0.07
.o       -0.06
unter    -0.06
exe      -0.06
kur      -0.06
amat     -0.06
ensus    -0.06
oyer     -0.06
iglia    -0.05
Positive Logits
although  0.07
éijij     0.07
çŀ        0.07
although  0.07
there     0.07
there     0.06
icorn     0.06
ithub     0.06
this      0.06
worthy    0.06
Activations Density 0.093%