    Explanations

    **the beginning of phrases**

    Negative Logits
    й            0.88
                 0.86
                 0.81
     bungal      0.78
     ghat        0.77
     Bungal      0.77
    ي            0.77
    াং           0.75
    σσ           0.75
     Redeemer    0.75

    Positive Logits
    ew           0.76
    ad           0.75
    em           0.75
    ot           0.71
    el           0.70
    ate          0.70
    es           0.69
    oliko        0.69
    8            0.69
    D            0.68
    Act Density 0.000%

    No Known Activations