INDEX
Explanations
punctuation marks, particularly commas, in the text
Negative Logits
aez       -0.71
bucks     -0.66
======    -0.62
Abstract  -0.62
stract    -0.61
amoto     -0.60
onym      -0.59
TEXT      -0.58
ertodd    -0.58
illas     -0.56
POSITIVE LOGITS
whose   1.43
which   1.38
whose   1.35
which   1.21
whom    1.18
where   1.01
who     0.95
whence  0.91
where   0.88
who     0.85
Activations Density 0.839%