INDEX
    Explanations

    identifying bottlenecks

    Negative Logits
    Token              Logit
    "ми"               0.79
    "ческие"           0.71
    "見て"             0.67
    " алуу"            0.67
    "ות"               0.66
    "taker"            0.66
    "ä"                0.66
    "\"                0.66
    "ים"               0.65
    " exercitation"    0.64
    Positive Logits
    Token              Logit
    " bottlenecks"     1.04
    " bottleneck"      0.93
    "،"                0.83
    "ش"                0.81
    "ط"                0.77
    "}$:"              0.77
    "}$"               0.75
    "ف"                0.73
                       0.72
    "كتب"              0.71
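Lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and taking the highest and lowest scoring tokens. The sketch below is an assumption about how such a ranking could be computed, not a description of this dashboard's actual pipeline. The names `decoder_dir`, `W_U` and `top_logit_effects` are hypothetical, and the toy matrix is made up for illustration. In this sketch negative-direction scores keep their sign.

```python
import numpy as np

def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Rank tokens by the direct logit effect of a feature direction.

    decoder_dir: (d_model,) decoder vector of the feature (hypothetical input).
    W_U: (d_model, vocab_size) unembedding matrix (hypothetical input).
    vocab: list of token strings, indexed like the columns of W_U.
    """
    scores = decoder_dir @ W_U          # (vocab_size,) logit change per token
    order = np.argsort(scores)          # ascending: most negative first
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy example: 2-dimensional residual stream, 4-token vocabulary (made-up values).
vocab = [" bottlenecks", "ми", "x", "y"]
W_U = np.array([[1.04, -0.79, 0.2, 0.0],
                [0.0, 0.0, 0.0, 0.0]])
decoder_dir = np.array([1.0, 0.0])
pos, neg = top_logit_effects(decoder_dir, W_U, vocab, k=2)
print(pos)  # [(' bottlenecks', 1.04), ('x', 0.2)]
print(neg)  # [('ми', -0.79), ('y', 0.0)]
```

Note that leading spaces are part of the token strings, which is why " bottlenecks" and "bottlenecks" would be ranked as different tokens.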
    Activation Density: 0.005%
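Activation density is usually read as the fraction of token positions on which the feature fires. The sketch below assumes "fires" means an activation strictly above a threshold of 0 (an assumption; dashboards may use other cutoffs). Under that definition, 0.005% corresponds to roughly 5 active positions per 100,000 tokens. The helper `activation_density` is hypothetical.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`."""
    acts = np.asarray(acts, dtype=float)
    return float((acts > threshold).mean())

# Toy data: 5 active positions out of 100,000 tokens.
acts = np.zeros(100_000)
acts[:5] = 1.0
density = activation_density(acts)
print(f"{density:.3%}")  # 0.005%
```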

    No Known Activations