INDEX
Explanations
introductory phrases indicating a following statement
phrases indicating the acknowledgment or citation of statements
Negative Logits
elled      -0.76
Editors    -0.64
prus       -0.64
vous       -0.64
pins       -0.62
cat        -0.59
eatured    -0.59
Roads      -0.59
tele       -0.59
ARDS       -0.58
Positive Logits
itivity         0.80
caveats         0.79
lihood          0.77
hindsight       0.71
aside           0.69
ifference       0.69
anecd           0.69
respectfully    0.68
caveat          0.68
iments          0.66
Activations Density 0.032%