INDEX
Explanations
dates written in the format "Month Dayth"
punctuation marks, particularly periods
Negative Logits
Token          Logit
downright      -0.80
forgetting     -0.77
deserved       -0.74
conceivable    -0.73
frankly        -0.72
tremend        -0.71
slapping       -0.71
smelling       -0.69
trusted        -0.69
unthinkable    -0.69
Positive Logits
Token            Logit
<|endoftext|>    1.50
Tickets          1.13
Coverage         1.08
®                1.01
Its              1.01
Previous         1.00
Previously       0.98
Admission        0.98
Prior            0.98
Additionally     0.98
Activations Density: 0.321%