INDEX
Explanations
periods at the end of sentences
end punctuation marks
Negative Logits
personality                 -0.74
unex                        -0.66
disemb                      -0.66
unts                        -0.65
(token missing in source)   -0.65
clos                        -0.65
pillar                      -0.63
involuntary                 -0.63
tyr                         -0.63
utter                       -0.62
Positive Logits
Unfortunately               1.40
Ideally                     1.36
Instead                     1.17
Otherwise                   1.13
Specifically                1.13
Fortunately                 1.11
Luckily                     1.09
Sadly                       1.08
Doing                       1.06
But                         1.04
Activations Density: 0.523%
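The page does not say how the density figure is computed. A minimal sketch, assuming it is the fraction of sampled token positions on which the feature's activation is above zero (the function name, the threshold parameter, and the sample values below are hypothetical, chosen only for illustration):

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens on which the
    feature fires at all; the source page does not define the term.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: 1000 token positions, 5 of them with a nonzero activation.
sample = [0.0] * 995 + [0.8, 1.2, 0.3, 2.1, 0.05]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under that reading, a density of 0.523% would mean the feature fires on roughly 5 of every 1000 sampled tokens, which fits a feature tied to a narrow context such as the token right after sentence-final punctuation.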