INDEX
    Explanations

    generic followed by topic

    Negative Logits
    Token          Value
    (whitespace)   1.34
    " be"          1.16
    "'"            1.05
    " \"           0.89
    " ("           0.88
    " You"         0.85
    " politische"  0.83
    " चांगले"  0.82
    " NIV"         0.82
    " for"         0.79
    Positive Logits
    Token          Value
    "and"          1.54
    "ي"            1.45
    "ث"            1.37
    "ش"            1.36
    "ی"            1.33
    "ک"            1.33
    "ни"           1.30
    "generic"      1.12
    "ل"            1.11
    "ال"           1.10
    Act Density 0.004%

    No Known Activations