INDEX
    Explanations
    Negative Logits
    Sw  0.72
    ордина  0.66
    цтва  0.63
    Creating  0.62
    Tet  0.62
     toric  0.62
    щение  0.62
    لوب  0.62
    Loose  0.62
    sw  0.61
    Positive Logits
    ↵↵↵↵↵↵↵↵↵↵↵↵  0.90
    ↵↵↵↵↵↵  0.87
    ↵↵↵↵↵↵↵↵  0.87
    ↵↵↵↵↵↵↵↵↵↵↵↵↵  0.85
    ↵↵↵↵  0.83
    ↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵  0.82
    ↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵  0.82
    ↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵  0.81
    ↵↵↵↵↵  0.81
    ↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵  0.81
    Act Density 0.095%

    No Known Activations