INDEX
Explanations
occurrences of the word "count" in various forms
Negative Logits
Mi        -0.17
orthand   -0.17
er        -0.17
ivery     -0.17
quate     -0.16
tion      -0.16
uset      -0.15
Mi        -0.15
Tap       -0.15
ated      -0.15
Positive Logits
erten     0.32
less      0.28
ess       0.28
erc       0.28
erv       0.26
ering     0.25
eless     0.24
isbury    0.23
erview    0.23
enance    0.23
Activations Density: 0.007%