INDEX
Explanations
occurrences of the word "the" with high activation values
the definite article "the"
Negative Logits
iffe       -0.77
thood      -0.63
aba        -0.60
leeve      -0.58
Pastebin   -0.58
claw       -0.57
craft      -0.57
bee        -0.56
Edit       -0.55
assume     -0.55
Positive Logits
same        1.16
ses         1.08
longest     1.05
hardest     1.01
fastest     1.00
entire      0.96
entirety    0.96
quickest    0.95
slightest   0.94
latter      0.94
Activations Density 0.244%
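
As a rough illustration of what a density figure like the one above might measure, the sketch below assumes that "activations density" means the fraction of token positions on which the feature's activation is above zero. That definition, the function name `activation_density`, the `threshold` parameter, and the toy data are all assumptions made for illustration; they are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: density counts positions with activation > threshold;
    the dashboard's exact definition is not stated on this page.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 1000 token positions, the feature fires on 2 of them.
toy = [0.0] * 998 + [3.2, 1.7]
density = activation_density(toy)
print(f"{density:.3%}")  # prints "0.200%"
```

Under this reading, a density of 0.244% would correspond to the feature being active on roughly 1 in 410 token positions.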