Explanations
articles, particularly "a" and "an"
Negative Logits
Token      Logit
Attempts   -0.84
words      -0.83
evidence   -0.79
edit       -0.77
Accounts   -0.76
agents     -0.73
Types      -0.72
books      -0.71
Access     -0.69
external   -0.68
Positive Logits
Token       Logit
bunch       0.93
handful     0.85
lot         0.82
stationary  0.80
dozen       0.78
peac        0.77
piece       0.76
skysc       0.75
whole       0.74
semi        0.73
Activations Density: 0.097%