    Explanations

    Non-English text

    Negative Logits
    Token         Logit
    "Clause"      -0.08
    "Hindi"       -0.08
    " Dw"         -0.08
    "exclusive"   -0.07
    "KO"          -0.07
    "Probability" -0.07
    " DEN"        -0.07
    "imming"      -0.07
    " ]↵↵"        -0.07
    "amming"      -0.07
    Positive Logits
    Token         Logit
    (missing)     0.09
    (missing)     0.09
    "ರ್"          0.09
    (missing)     0.09
    (missing)     0.09
    (missing)     0.08
    (missing)     0.08
    (missing)     0.08
    "िस"          0.08
    (missing)     0.08
    Activation Density: 0.069%

    No Known Activations