INDEX
Explanations
transitional phrases and contrasting ideas in the text
Negative Logits
çͱäºİ     -0.15
omanip    -0.15
wort      -0.14
umhur     -0.14
onte      -0.14
inous     -0.13
>'.↵      -0.13
ÄĻż       -0.13
anca      -0.13
due       -0.13
Positive Logits
few       0.28
taken     0.25
maybe     0.23
what      0.21
beneath   0.21
credit    0.20
context   0.20
perhaps   0.20
such      0.20
few       0.20
Activations Density 0.276%