Explanations
questions asked with "aware of" or "best for"
Negative Logits
we            1.12
nobody        1.08
simply        1.07
nonetheless   1.04
doesn         1.03
there         1.03
don           1.00
since         0.99
here          0.98
she           0.97
Positive Logits
Of            2.86
For           2.80
And           2.67
To            2.56
With          2.53
On            2.52
At            2.39
From          2.36
By            2.35
For           2.29
Activations Density: 1.490%