INDEX
Explanations
phrases indicating uncertainty or speculative statements
Negative Logits
Token        Logit
former       -0.61
ouf          -0.60
noon         -0.57
quartered    -0.56
76561        -0.56
AAA          -0.55
kie          -0.55
Saying       -0.54
Coffin       -0.52
Trouble      -0.52
Positive Logits
Token        Logit
beh           1.50
seems         1.30
becomes       1.24
begs          1.23
appears       1.16
shouldn       1.13
makes         1.06
feels         1.04
unes          1.04
hurts         1.03
Activations Density 0.150%