INDEX
    NEGATIVE LOGITS
    "-support"      -0.07
    " Persona"      -0.06
    " rağmen"       -0.06
    ".'."           -0.06
    " tienes"       -0.06
    "abric"         -0.06
    " Bere"         -0.06
    "Laughs"        -0.06
    " doubling"     -0.06
    "_tf"           -0.06
    POSITIVE LOGITS
    " بند"                       0.07
    (whitespace-only token)      0.07
    " cellul"                    0.06
    " condo"                     0.06
    " промислов"                 0.06
    "AxisAlignment"              0.06
    "[np"                        0.06
    " Uniform"                   0.06
    "Foot"                       0.06
    "Military"                   0.06
    Activation Density: 0.160%

    No Known Activations