INDEX
    Explanations

    references to loved ones and familial affection

    Negative Logits
    (token not shown)   -0.79
    "l"                 -0.70
    "y"                 -0.64
    "1"                 -0.62
    "i"                 -0.62
    "3"                 -0.61
    " McClure"          -0.61
    " Tor"              -0.61
    " Kondo"            -0.60
    (token not shown)   -0.60
    Positive Logits
    " loved"            1.87
    " Loved"            1.84
    "Loved"             1.63
    " LOVED"            1.62
    "loved"             1.50
    " liked"            1.22
    " gelieb"           1.21
    "ſelves"            1.19
    " uſed"             1.19
    " pleaſure"         1.15
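    Logit lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and ranking the vocabulary by the result. The sketch below assumes exactly that: `decoder_dir` (the feature's decoder vector) and `W_U` (the unembedding matrix) are assumed inputs, and `top_logit_effects` is a hypothetical helper. Whether this dashboard applies a final layer norm or any other scaling is not stated here, so treat this as an illustration rather than its actual procedure.

```python
import numpy as np

def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Rank vocabulary tokens by the direct logit effect of one feature direction.

    Hypothetical helper. Assumes:
      decoder_dir: shape (d_model,), the feature's decoder vector
      W_U:         shape (d_model, d_vocab), the unembedding matrix
      vocab:       list of d_vocab token strings
    """
    effects = decoder_dir @ W_U          # one logit contribution per token
    order = np.argsort(effects)          # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy example with a 2-dimensional model and a 4-token vocabulary (made-up numbers).
W_U = np.array([[1.0, 0.0, -1.0, 0.5],
                [0.0, 1.0,  0.0, 0.5]])
vocab = [" loved", " liked", " Tor", " pleasure"]
positive, negative = top_logit_effects(np.array([2.0, 1.0]), W_U, vocab, k=2)
print(positive)  # [(' loved', 2.0), (' pleasure', 1.5)]
print(negative)  # [(' Tor', -2.0), (' liked', 1.0)]
```

    Note that the tokens are kept as raw strings, leading spaces included, because " loved" and "loved" are distinct vocabulary entries with different scores.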
    Activation Density: 0.104%
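    Activation density is usually read as the fraction of sampled token positions on which the feature fires. The sketch below assumes "fires" means an activation strictly above zero; the dashboard's actual threshold and token sample are not stated here, and `activation_density` is a hypothetical helper.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of token positions where a feature's activation exceeds threshold.

    threshold=0.0 is an assumption: 'active' is taken to mean strictly positive.
    """
    acts = np.asarray(activations, dtype=float)
    return float(np.mean(acts > threshold))

# Toy activations over 8 hypothetical token positions; exactly one is active.
toy_acts = [0.0, 0.0, 0.0, 2.3, 0.0, 0.0, 0.0, 0.0]
density = activation_density(toy_acts)   # 0.125, i.e. 12.5%

# Read the other way: a density of 0.104% is about one active token in every ~960.
tokens_per_activation = 1 / 0.00104
```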

    No Known Activations