    NEGATIVE LOGITS
    " It"     0.96
    "It"      0.88
    "ב"       0.82
    "IT"      0.75
    "东西"    0.73
    "ن"       0.71
    [token not shown]  0.71
    "יו"      0.65
    "I"       0.64
    "ён"      0.63
    POSITIVE LOGITS
    "rom"     1.24
    " Rom"    1.12
    " ROM"    1.06
    " rom"    1.03
    "_"       0.97
    "Rom"     0.91
    "ud"      0.90
    "um"      0.86
    "ка"      0.82
    "x"       0.81
    Activation Density: 0.003%

    No Known Activations