INDEX
Explanations
occurrences of the word "kid" and its variants in various contexts
Negative Logits
token       logit
xis         -0.17
eyer        -0.16
iants       -0.16
eres        -0.15
iola        -0.15
resident    -0.15
bern        -0.15
acam        -0.15
spo         -0.14
esome       -0.14
Positive Logits
token       logit
nap         0.32
ney         0.29
ults        0.23
friendly    0.22
gloves      0.22
neys        0.22
lington     0.21
Gloves      0.21
der         0.21
-friendly   0.20
Activations Density 0.009%