INDEX
Explanations
quotations within the text
punctuation marks, specifically quotation marks
Negative Logits
favor      -0.84
grades     -0.80
grade      -0.73
catalog    -0.68
rumored    -0.67
vacation   -0.67
honors     -0.67
muse       -0.66
scheduled  -0.65
graded     -0.65
Positive Logits
We         1.15
Firstly    1.14
Our        1.10
There      1.04
It         1.02
Therefore  0.99
Clearly    0.99
Quite      0.98
Such       0.98
Obviously  0.98
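This page does not say how the two logit lists are produced. For sparse-autoencoder features, a common approach is to project the feature's decoder direction through the model's unembedding matrix and read off the lowest- and highest-scoring tokens. That is an assumption here, not something this page confirms. The following is a minimal plain-Python sketch of that projection. The helper `top_logits`, the two-dimensional toy vectors, and the four-token vocabulary are hypothetical and do not come from the real model.

```python
def top_logits(decoder_dir, unembed, vocab, k):
    """Return the k lowest- and k highest-scoring tokens for one feature.

    decoder_dir: the feature's decoder vector, length d_model.
    unembed: d_model x vocab_size matrix (list of rows); column j is token j.
    Assumption: a token's score is the dot product of decoder_dir with its
    unembedding column, i.e. the feature's direct effect on that logit.
    """
    scores = []
    for j, token in enumerate(vocab):
        score = sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
        scores.append((token, score))
    ranked = sorted(scores, key=lambda pair: pair[1])  # ascending by score
    negative = ranked[:k]                  # most negative first
    positive = list(reversed(ranked))[:k]  # most positive first
    return negative, positive


# Hypothetical toy data: d_model = 2, four-token vocabulary.
vocab = ["We", "Firstly", "grades", "favor"]
decoder_dir = [1.0, 0.5]
unembed = [
    [1.0, 0.8, -0.6, -0.7],
    [0.2, 0.4, -0.4, -0.3],
]
neg, pos = top_logits(decoder_dir, unembed, vocab, k=2)
print([t for t, _ in neg])  # ['favor', 'grades']
print([t for t, _ in pos])  # ['We', 'Firstly']
```

With these invented vectors, `favor` and `grades` score most negative (about -0.85 and -0.80), and `We` and `Firstly` score most positive (about 1.1 and 1.0). This mirrors the shape of the lists above, but the numbers themselves are made up.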
Activation Density 0.112%
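Activation density is usually read as the share of token positions on which the feature's activation is nonzero. At 0.112%, that works out to roughly one token in every 900. The following sketch assumes that definition, counting a strictly positive activation as firing. The helper `activation_density` and its input data are hypothetical, and this page does not state the threshold or dataset it actually used.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions where the activation exceeds threshold.

    Assumption: "firing" means activation > threshold, with threshold 0 by default.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Hypothetical data: 112 firing positions out of 100,000 tokens.
acts = [0.0] * 99_888 + [0.5] * 112
print(activation_density(acts))  # 0.112
```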