INDEX
Explanations
instances of the word "sure" indicating certainty or assurance
Negative Logits
THEM            -0.15
ynchronously    -0.14
normally        -0.14
him             -0.14
eux             -0.14
zeigt           -0.14
bisher          -0.14
ryn             -0.14
Worse           -0.13
IGO             -0.13
Positive Logits
none            0.29
everything      0.28
there           0.26
nothing         0.26
nobody          0.25
sure            0.24
they            0.24
everyone        0.22
no              0.21
everything      0.20
Activations Density: 0.042%