INDEX
Explanations
occurrences of the word "son" and its variations
Negative Logits
tures      -0.21
resse      -0.19
otland     -0.17
pery       -0.16
alls       -0.16
perator    -0.15
ees        -0.15
rottle     -0.15
IZER       -0.14
tick       -0.14
Positive Logits
der        0.23
ny         0.23
orous      0.23
ewhere     0.21
nen        0.21
etimes     0.20
ically     0.20
ewhat      0.19
ship       0.19
al         0.19
Activations Density: 0.002%