INDEX
    Explanations
    Negative Logits
        "y"           0.71
        "أ"           0.71
        "1"           0.70
        "ب"           0.70
        "م"           0.68
        " in"         0.67
        "ной"         0.66
        "تح"          0.66
        "ő"           0.66
        "In"          0.65
    Positive Logits
        "donut"       1.05
        " Donuts"     1.00
        "🍩"          0.99
        " donut"      0.93
        " donuts"     0.91
        " doughnuts"  0.84
        "st"          0.81
        " doughnut"   0.79
        "ва"          0.79
        "anez"        0.78
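    The two logit lists rank vocabulary tokens by how strongly this feature presumably raises (Positive Logits) or lowers (Negative Logits) their output logits. Below is a minimal sketch, assuming the common approach of dotting the feature's decoder direction with each token's column of the model's unembedding matrix and keeping the top k. The names `decoder_dir`, `unembed`, `logit_effects`, and `top_k` are hypothetical, and the toy vectors are invented for illustration, not taken from this feature. A real dashboard may also apply a final layer norm or other scaling that this sketch ignores.

```python
from typing import Dict, List, Tuple

def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot the feature's decoder direction with each token's unembedding column.

    Assumption: no final layer norm or rescaling is applied.
    """
    return {tok: sum(d * u for d, u in zip(decoder_dir, col))
            for tok, col in unembed.items()}

def top_k(effects: Dict[str, float], k: int,
          positive: bool = True) -> List[Tuple[str, float]]:
    """Return the k tokens with the largest (positive=True) or most negative effect."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=positive)
    return ranked[:k]

# Toy, hypothetical values: a 2-d decoder direction and three tokens.
decoder_dir = [1.0, 0.5]
unembed = {" donut": [1.0, 0.2], "🍩": [0.8, 0.4], " in": [-0.5, 0.0]}
effects = logit_effects(decoder_dir, unembed)
print(top_k(effects, 2))                  # highest-effect tokens first
print(top_k(effects, 1, positive=False))  # most negative token
```

    Sorting in descending order yields the positive list and ascending order yields the negative one, so a single helper covers both.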
    Activation Density: 0.002%
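    Activation density is presumably the percentage of tokens in the dashboard's sample on which the feature fires; at 0.002%, that works out to about 2 activating tokens per 100,000. A minimal sketch of that calculation, assuming "fires" means an activation strictly above a threshold of 0 (the function name `activation_density` and the threshold default are hypothetical, not the dashboard's own definition):

```python
from typing import Iterable

def activation_density(acts: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation is strictly above `threshold`."""
    total = 0
    active = 0
    for a in acts:
        total += 1
        if a > threshold:
            active += 1
    # Guard against an empty sample.
    return 100.0 * active / total if total else 0.0

# Hypothetical sample: 2 activating tokens out of 100,000.
acts = [0.0] * 99_998 + [1.3, 0.7]
print(f"{activation_density(acts):.3f}%")  # → 0.002%
```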

    No Known Activations