INDEX
    Negative Logits
    Token           Value
    f               1.21
    v               1.08
    and             1.00
    (whitespace)    0.99
    a               0.95
    ot              0.87
    _               0.86
    é               0.83
    \               0.82
    ug              0.81
    Positive Logits
    Token           Value
    (missing)       1.16
    К               1.05
    opérés          1.02
    (missing)       0.99
    ב               0.99
    ین              0.97
    в               0.97
    ות              0.96
    ിയ              0.95
    ز               0.95
    Act Density 0.108%

    No Known Activations