    Explanations

    self-attention explanation

    Negative Logits
    Token              Value
    " Enabled"         0.49
    (not recovered)    0.48
    (not recovered)    0.48
    " Kaltura"         0.47
    "雅黑"             0.47
    " পুরাতন"           0.46
    (not recovered)    0.46
    "OAc"              0.46
    "Enhanced"         0.46
    (not recovered)    0.46
    Positive Logits
    Token              Value
    "מש"               0.50
    " shoulder"        0.47
    "sponge"           0.46
    " wearer"          0.46
    (not recovered)    0.46
    " rodz"            0.45
    " sponge"          0.45
    " ghostly"         0.45
    " suerte"          0.45
    " curvy"           0.45
    Activation Density: 0.003%

    No Known Activations