INDEX
    Explanations

    the name "Sus" or "Suz" alongside some activations associated with different aspects

    references to the name "Susan" and variations thereof

    Negative Logits
    "覚醒"         -0.84
    " overhead"    -0.76
    " learning"    -0.74
    "OPLE"         -0.74
    " living"      -0.73
    " sorting"     -0.69
    " machinery"   -0.69
    "hetti"        -0.68
    "anwhile"      -0.68
    "erous"        -0.68
    Positive Logits
    "annah"        1.22
    "pect"         1.12
    "pected"       1.03
    "Sus"          1.00
    "pects"        0.99
    "pic"          0.95
    "pecting"      0.94
    "pir"          0.90
    "itiz"         0.90
    " Sus"         0.84
    Act Density 0.007%

    No Known Activations