INDEX
    Explanations

    titles like Prof. or Dr.

    Negative Logits
    Token     Value
    "5"       0.80
    "g"       0.79
    "↵↵"      0.77
    (blank)   0.76
    "er"      0.75
    "'"       0.72
    "S"       0.72
    "4"       0.68
    "را"      0.66
    (blank)   0.66
    Positive Logits
    Token     Value
    (blank)   0.79
    (blank)   0.71
    " be"     0.69
    "كان"     0.69
    (blank)   0.69
    (blank)   0.67
    (blank)   0.66
    (blank)   0.66
    " et"     0.64
    (blank)   0.64
    Act Density 0.002%

    No Known Activations