INDEX
Explanations
the phrase "Son of" and its variations
Negative Logits
tures     -0.18
eer       -0.18
resse     -0.17
ees       -0.17
lett      -0.17
TURE      -0.16
ément     -0.16
onium     -0.16
oons      -0.15
alls      -0.15
Positive Logits
orous      0.28
ny         0.26
ntag       0.25
der        0.24
nets       0.22
ething     0.22
nen        0.22
oma        0.21
ication    0.21
oran       0.21
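The dashboard does not say how these logit lists are produced. A common convention for sparse-autoencoder feature dashboards is to project the feature's decoder direction through the model's unembedding matrix and report the most negative and most positive entries. The sketch below assumes that convention. `top_logit_tokens`, `decoder_dir`, `W_U`, and `vocab` are hypothetical names, and the toy weights are not the real model's.

```python
import numpy as np

def top_logit_tokens(decoder_dir, W_U, vocab, k=10):
    """Hypothetical helper: project one feature's decoder direction onto the
    unembedding matrix and return the k most negative and k most positive
    (token, logit) pairs. Assumes W_U has shape (d_model, d_vocab)."""
    logits = decoder_dir @ W_U               # shape: (d_vocab,)
    order = np.argsort(logits)               # indices sorted by ascending logit
    neg = [(vocab[i], float(logits[i])) for i in order[:k]]
    pos = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return neg, pos

# Toy example: with decoder_dir = [1, 0], the logits are just row 0 of W_U.
W_U = np.array([[0.5, -0.2, 0.1],
                [0.3,  0.4, -0.9]])
decoder_dir = np.array([1.0, 0.0])
vocab = ["a", "b", "c"]
neg, pos = top_logit_tokens(decoder_dir, W_U, vocab, k=1)
print(neg, pos)
```

Sorting once with `np.argsort` and reading both ends gives the negative and positive lists from a single pass over the vocabulary.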
Activations Density 0.023%
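An activation density of 0.023% means the feature fires on roughly 23 of every 100,000 token positions. The sketch below assumes density is the percentage of token positions where the feature's activation is strictly above zero. `activation_density` and its `threshold` parameter are hypothetical names for illustration.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Hypothetical helper: percentage of token positions where the
    feature's activation is strictly greater than `threshold`."""
    acts = np.asarray(activations)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# Toy data: the feature fires on 23 of 100,000 positions.
acts = np.zeros(100_000)
acts[:23] = 1.0
print(activation_density(acts))
```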