INDEX
Explanations
phrases indicating skepticism or doubt
Negative Logits
also         -0.58
snart        -0.58
hopefully    -0.57
also         -0.57
SOME         -0.57
ALREADY      -0.57
también      -0.56
tambi        -0.56
hopefully    -0.55
already      -0.55
Positive Logits
ever         1.72
EVER         1.33
siquiera     1.22
even         1.18
Ever         1.13
jemals       1.11
Ever         1.08
ever         1.08
bothered     1.06
bothering    1.04
Activations Density 0.702%
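
The density figure above is usually read as the fraction of token positions on which the feature has a nonzero activation. The following is a minimal sketch under that assumption; the function name `activation_density`, the `threshold` parameter, and the toy data are hypothetical and are not taken from this page or from the tool that produced it.

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions on which the
    feature fires at all, which is why the default threshold is zero.
    """
    acts = np.asarray(activations, dtype=float)
    if acts.size == 0:
        # No positions sampled: report zero rather than dividing by zero.
        return 0.0
    return float(np.count_nonzero(acts > threshold)) / acts.size


# Hypothetical toy data: 1000 token positions, 7 of which activate the feature.
toy = np.zeros(1000)
toy[:7] = 1.5

density = activation_density(toy)
print(f"{density:.3%}")
```

On real data the input would be the feature's activations over a large sample of tokens; a value under 1%, as reported here, would mean the feature fires on fewer than one token in a hundred.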