INDEX
Explanations
specific references to published articles or news reports
occurrences of the word 'the' and its variants within various contexts
Negative Logits
thereof     -0.80
SPONSORED   -0.74
.''         -0.70
).          -0.68
!).         -0.66
.</         -0.66
!.          -0.66
thereby     -0.65
.�          -0.65
/           -0.65
Positive Logits
same        1.12
simplest    1.09
smallest    1.06
latest      1.01
entire      0.99
largest     0.99
oret        0.98
slightest   0.98
entirety    0.97
hardest     0.97
Activations Density 1.815%