INDEX
    Explanations

    the word "No" with high activation values

    repeated occurrences of the word "No."

    NEGATIVE LOGITS
    rn          -0.67
    RAFT        -0.64
    MpServer    -0.64
    knit        -0.63
    tein        -0.62
    CLOSE       -0.60
    RANT        -0.59
    adobe       -0.59
    ULAR        -0.59
    iership     -0.59
    POSITIVE LOGITS
     kidding    1.08
     doubt      1.05
     wonder     1.00
    vel         1.00
    zzle        0.96
    isy         0.96
     matter     0.95
     longer     0.92
    emi         0.92
    ct          0.90
    Activation Density: 0.060%

    No Known Activations