INDEX
Explanations
the definite article "the" repeated in various contexts
Negative Logits
Token         Logit
thood         -0.74
iffe          -0.68
den           -0.55
leeve         -0.55
suppose       -0.54
gat           -0.54
advertising   -0.53
assume        -0.53
ful           -0.53
outs          -0.52
Positive Logits
Token         Logit
ses           1.08
same          1.08
slightest     1.04
quickest      1.02
hardest       1.01
longest       1.01
fastest       0.99
way           0.95
entirety      0.93
ologically    0.92
Activations Density: 0.177%