Explanations
holiday greetings
punctuation marks, particularly periods
Negative Logits
Token       Logit
philos      -0.81
advoc       -0.76
princ       -0.75
perspect    -0.75
pudding     -0.74
undermin    -0.71
marqu       -0.71
commer      -0.71
glim        -0.70
sensitive   -0.70
Positive Logits
Token       Logit
9            1.61
5            1.59
7            1.59
6            1.58
8            1.58
4            1.49
3            1.46
2            1.36
1            1.30
98           1.24
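The page does not say how the logit lists are produced. A common approach on SAE dashboards is to project the feature's decoder direction onto each token's unembedding vector, then list the largest and smallest results. The sketch below is a hypothetical illustration of that approach, not this page's actual pipeline. It assumes an unembedding matrix stored as `d_model x vocab` rows and ignores any final LayerNorm. The names `logit_effects` and `top_and_bottom` and the toy numbers are invented for illustration.

```python
# Hypothetical sketch: rank tokens by a feature's direct effect on the output logits.
# Assumption: effect(token j) = dot(decoder_dir, unembed[:, j]); final LayerNorm ignored.

def logit_effects(decoder_dir, unembed, vocab):
    """Return {token: effect}, where unembed is a d_model x len(vocab) nested list."""
    d_model = len(decoder_dir)
    effects = {}
    for j, tok in enumerate(vocab):
        effects[tok] = sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
    return effects

def top_and_bottom(effects, k):
    """Return (k most positive, k most negative) (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k], ranked[::-1][:k]

# Toy example with made-up numbers (not the real model weights).
vocab = ["9", "5", "philos"]
decoder_dir = [1.0, 0.5]
unembed = [
    [1.0, 0.8, -0.5],
    [0.2, 0.4, -0.6],
]
effects = logit_effects(decoder_dir, unembed, vocab)
top, bottom = top_and_bottom(effects, 1)
print(top, bottom)
```

With these toy weights, the digit token "9" ranks highest and "philos" ranks lowest. This mirrors the pattern in the tables above only by construction.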
Activation density: 0.052%
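The sketch below assumes the usual dashboard convention: activation density is the fraction of sampled token positions where the feature's activation is above zero. Under that assumption, 0.052% means the feature fires on roughly 52 of every 100,000 tokens. The helper `activation_density` and the synthetic activations are hypothetical.

```python
# Hypothetical sketch, assuming density = fraction of tokens with activation > threshold.

def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature activation exceeds threshold."""
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)

# Synthetic data: 52 firing positions out of 100,000 tokens.
acts = [0.0] * 100_000
for i in range(52):
    acts[i * 1000] = 1.0  # arbitrary positive activation values
density = activation_density(acts)
print(f"{density * 100:.3f}%")
```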