INDEX
Explanations
instances of the word "the" in various contexts
Negative Logits
both       -0.56
/          -0.53
and        -0.53
,          -0.52
-          -0.50
whether    -0.47
another    -0.44
both       -0.43
using      -0.43
being      -0.43
Positive Logits
same            1.32
entire          1.24
majority        1.18
entirety        1.14
aforementioned  1.13
latter          1.10
slightest       1.08
meisten         1.07
following       1.05
whole           1.05
Activations Density: 3.042%