INDEX
    Explanations

    the phrase "Son of" and its variations

    Negative Logits
    tures      -0.18
    eer        -0.18
    resse      -0.17
    ees        -0.17
    lett       -0.17
    TURE       -0.16
    ément      -0.16
    onium      -0.16
    oons       -0.15
    alls       -0.15
    Positive Logits
    orous      0.28
    ny         0.26
    ntag       0.25
    der        0.24
    nets       0.22
    ething     0.22
    nen        0.22
    oma        0.21
    ication    0.21
    oran       0.21
    Act Density 0.023%

    No Known Activations