INDEX
Explanations
phrases indicating a specific quantity of items
repeated phrases that start with "of"
Negative Logits
surpr        -0.70
ende         -0.68
blance       -0.67
disadvant    -0.64
condem       -0.64
agre         -0.63
lapt         -0.63
awa          -0.63
rewriting    -0.58
itute        -0.58
Positive Logits
ses          0.72
these        0.69
them         0.67
us           0.67
course       0.65
Adams        0.64
them         0.64
icial        0.63
these        0.63
whom         0.63
Activations Density: 0.099%