Explanations
instances of the word "name" and its variations
Negative Logits
ness      -0.18
tica      -0.18
nds       -0.18
_named    -0.17
roy       -0.16
rego      -0.16
NESS      -0.16
neau      -0.15
ngo       -0.15
nas       -0.15
Positive Logits
plate      0.45
ake        0.41
plates     0.39
less       0.33
sake       0.31
cheap      0.29
AKE        0.29
akes       0.29
lessly     0.27
paced      0.27
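Lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and keeping the k most negative and k most positive entries. The sketch below assumes that method; it is not confirmed by this page. `decoder_dir`, `w_u`, the helper names, and the toy tokens and weights are all hypothetical.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  w_u: Sequence[Sequence[float]]) -> List[float]:
    """Project a (hypothetical) feature decoder direction through an
    unembedding matrix indexed [d_model][vocab]; one value per token."""
    vocab_size = len(w_u[0])
    return [
        sum(decoder_dir[i] * w_u[i][j] for i in range(len(decoder_dir)))
        for j in range(vocab_size)
    ]


def top_and_bottom(effects: Sequence[float], tokens: Sequence[str],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (bottom_k, top_k): most negative first, most positive first."""
    ranked = sorted(zip(tokens, effects), key=lambda pair: pair[1])
    bottom = ranked[:k]
    top = ranked[::-1][:k]
    return bottom, top


# Toy data (hypothetical): a 2-dimensional model with a 4-token vocabulary.
tokens = ["plate", "ness", "sake", "roy"]
decoder_dir = [1.0, 0.5]
w_u = [
    [0.4, -0.1, 0.3, -0.3],
    [0.2, -0.2, 0.0, 0.0],
]

effects = logit_effects(decoder_dir, w_u)
bottom, top = top_and_bottom(effects, tokens, k=2)
print("negative:", bottom)
print("positive:", top)
```

Sorting once and slicing both ends keeps the two lists consistent with each other; ties keep the input token order because Python's sort is stable.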
Activation Density: 0.118%
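Activation density is usually read as the percentage of sampled tokens on which the feature fires. The sketch below assumes "fires" means an activation strictly above a threshold of 0. That threshold, the helper name, and the toy activation list are hypothetical and are not taken from this page's data.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed 0)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy example (hypothetical): 12 active tokens out of 10,000.
acts = [0.0] * 9988 + [1.5] * 12
density = activation_density(acts)
print(f"{density:.3f}%")
```

A density this low means the feature is silent on the vast majority of tokens, which is typical of sparse autoencoder features.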