INDEX
    Explanations
    No Explanations Found
    Negative Logits
    Token         Logit
    "ב"           -0.07
    "provide"     -0.06
    "🏸"          -0.06
    "ew"          -0.06
    ".font"       -0.06
    "כ"           -0.06
    "large"       -0.06
    (blank)       -0.06
    "Б"           -0.06
    (blank)       -0.06
    Positive Logits
    Token            Logit
    " culturally"    0.07
    "不予"           0.07
    "agers"          0.07
    "风暴"           0.07
    " ):↵"           0.07
    "Regression"     0.06
    "oples"          0.06
    "sq"             0.06
    "ocation"        0.06
    " formulate"     0.06
    Activation Density: 0.123%

    No Known Activations