    Explanations

    contrasting or unexpected continuations

    Negative Logits
    "ель"          0.88
    "раст"         0.84
    "ليل"          0.75
    "اردوش"        0.75
    "ي"            0.73
    "ి"            0.72
    " confertim"   0.68
    "лай"          0.67
    "سم"           0.66
    "چل"           0.66
    Positive Logits
    "b"            0.78
    "am"           0.77
    "k"            0.77
    " it"          0.72
    "et"           0.70
    "ate"          0.70
    "_"            0.70
    (not rendered) 0.69
    " can"         0.68
    "fte"          0.68
    Act Density 0.004%

    No Known Activations