    NEGATIVE LOGITS
    "S"        0.70
    " is"      0.61
    " was"     0.58
    " ("       0.57
    "ات"       0.57
    "Parks"    0.57
    (blank)    0.57
    "ност"     0.56
    "我们"     0.55
    "اق"       0.54
    POSITIVE LOGITS
    "<unused99>"     0.68
    " 하시"          0.64
    "<unused992>"    0.62
    " Bauch"         0.57
    (blank)          0.57
    "}$\"            0.57
    "raient"         0.56
    " partnering"    0.55
    "?}"             0.55
    " Kogi"          0.55
    Activation Density: 0.002%

    No Known Activations